llvm-project

Commit Graph

Author	SHA1	Message	Date
Nikita Popov	f208644ed3	[CGBuilder] Remove CreateBitCast() method Use CreateElementBitCast() instead, or don't work on Address where not necessary.	2022-02-14 15:06:04 +01:00
Sander de Smalen	0b41238ae7	[AArch64] Emit TBAA metadata for SVE load/store intrinsics In Clang we can attach TBAA metadata based on the load/store intrinsics based on the operation's element type. This also contains changes to InstCombine where the AArch64-specific intrinsics are transformed into generic LLVM load/store operations, to ensure that all metadata is transferred to the new instruction. There will be some further work after this patch to also emit TBAA metadata for SVE's gather/scatter- and struct load/store intrinsics. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D119319	2022-02-11 09:00:29 +00:00
Simon Pilgrim	09857a4bd1	[X86] Remove __builtin_ia32_padd/psub saturated intrinsics and use generic __builtin_elementwise_add/sub_sat D117898 added the generic __builtin_elementwise_add_sat and __builtin_elementwise_sub_sat with the same integer behaviour as the SSE/AVX instructions This patch removes the __builtin_ia32_padd/psub saturated intrinsics and just uses the generics - the existing tests see no changes: __m256i test_mm256_adds_epi8(__m256i a, __m256i b) { // CHECK-LABEL: test_mm256_adds_epi8 // CHECK: call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> %{{.}}, <32 x i8> %{{.}}) return _mm256_adds_epi8(a, b); }	2022-02-08 15:00:10 +00:00
Simon Pilgrim	a59faf272e	Revert rG6c174ab2ad0676b295f11f6c3913eff9289fa6b9 "[X86] Remove __builtin_ia32_padd/psub saturated intrinsics and use generic __builtin_elementwise_add/sub_sat" Missed some legacy builtin tests that need cleaning up first	2022-02-08 14:45:28 +00:00
Simon Pilgrim	6c174ab2ad	[X86] Remove __builtin_ia32_padd/psub saturated intrinsics and use generic __builtin_elementwise_add/sub_sat D117898 added the generic __builtin_elementwise_add_sat and __builtin_elementwise_sub_sat with the same integer behaviour as the SSE/AVX instructions This patch removes the __builtin_ia32_padd/psub saturated intrinsics and just uses the generics - the existing tests see no changes: __m256i test_mm256_adds_epi8(__m256i a, __m256i b) { // CHECK-LABEL: test_mm256_adds_epi8 // CHECK: call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> %{{.}}, <32 x i8> %{{.}}) return _mm256_adds_epi8(a, b); }	2022-02-08 14:21:20 +00:00
Simon Pilgrim	c00db97159	[Clang] Add elementwise saturated add/sub builtins This patch implements `__builtin_elementwise_add_sat` and `__builtin_elementwise_sub_sat` builtins. These map to the add/sub saturated math intrinsics described here: https://llvm.org/docs/LangRef.html#saturation-arithmetic-intrinsics With this in place we should then be able to replace the x86 SSE adds/subs intrinsics with these generic variants - it looks like other targets should be able to use these as well (arm/aarch64/webassembly all have similar examples in cgbuiltin). Differential Revision: https://reviews.llvm.org/D117898	2022-02-08 11:22:01 +00:00
Nikita Popov	c45a99f36b	[MatrixBuilder] Require explicit element type in CreateColumnMajorLoad() This makes the method compatible with opaque pointers.	2022-02-07 16:57:33 +01:00
Nikita Popov	cdc0573f75	[MatrixBuilder] Remove unnecessary IRBuilder template (NFC) IRBuilderBase exists specifically to avoid the need for this.	2022-02-07 16:42:38 +01:00
John Brawn	bca998ed3c	[AArch64] Generate fcmps when appropriate for neon intrinsics Differential Revision: https://reviews.llvm.org/D118257	2022-02-04 12:55:38 +00:00
tyb0807	51e188d079	[AArch64] Support for memset tagged intrinsic This introduces a new ACLE intrinsic for memset tagged (https://github.com/ARM-software/acle/blob/next-release/main/acle.md#memcpy-family-of-operations-intrinsics---mops). void __builtin_arm_mops_memset_tag(void , int, size_t) A corresponding LLVM intrinsic is introduced: i8* llvm.aarch64.mops.memset.tag(i8*, i8, i64) The types match llvm.memset but the return type is not void. This is part 1/4 of a series of patches split from https://reviews.llvm.org/D117405 to facilitate reviewing. Patch by Tomas Matheson Differential Revision: https://reviews.llvm.org/D117753	2022-01-31 20:49:34 +00:00
David Green	82973edfb7	[ARM][AArch64] Introduce qrdmlah and qrdmlsh intrinsics Since it's introduction, the qrdmlah has been represented as a qrdmulh and a sadd_sat. This doesn't produce the same result for all input values though. This patch fixes that by introducing a qrdmlah (and qrdmlsh) intrinsic specifically for the vqrdmlah and sqrdmlah instructions. The old test cases will now produce a qrdmulh and sqadd, as expected. Fixes #53120 and #50905 and #51761. Differential Revision: https://reviews.llvm.org/D117592	2022-01-27 19:19:46 +00:00
JackAKirk	0ad19a8331	[CUDA,NVPTX] Corrected fragment size for tf32 LD B matrix. Signed-off-by: JackAKirk <jack.kirk@codeplay.com> Reviewed By: tra Differential Revision: https://reviews.llvm.org/D118023	2022-01-25 11:29:19 -08:00
Simon Pilgrim	e4074432d5	[X86] Remove avx512f integer and/or/xor/min/max reduction intrinsics and use generic equivalents None of these have any reordering issues, and they still emit the same reduction intrinsics without any change in the existing test coverage: llvm-project\clang\test\CodeGen\X86\avx512-reduceIntrin.c llvm-project\clang\test\CodeGen\X86\avx512-reduceMinMaxIntrin.c Differential Revision: https://reviews.llvm.org/D117881	2022-01-24 11:57:53 +00:00
Simon Pilgrim	3e50593b18	[X86] Remove `__builtin_ia32_pmax/min` intrinsics and use generic `__builtin_elementwise_max/min` D111985 added the generic `__builtin_elementwise_max` and `__builtin_elementwise_min` intrinsics with the same integer behaviour as the SSE/AVX instructions This patch removes the `__builtin_ia32_pmax/min` intrinsics and just uses `__builtin_elementwise_max/min` - the existing tests see no changes: ``` __m256i test_mm256_max_epu32(__m256i a, __m256i b) { // CHECK-LABEL: test_mm256_max_epu32 // CHECK: call <8 x i32> @llvm.umax.v8i32(<8 x i32> %{{.}}, <8 x i32> %{{.}}) return _mm256_max_epu32(a, b); } ``` This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`). Sibling patch to D117791 Differential Revision: https://reviews.llvm.org/D117798	2022-01-24 11:40:29 +00:00
Simon Pilgrim	e5147f82e1	[X86] Remove __builtin_ia32_pabs intrinsics and use generic __builtin_elementwise_abs D111986 added the generic `__builtin_elementwise_abs()` intrinsic with the same integer absolute behaviour as the SSE/AVX instructions (abs(INT_MIN) == INT_MIN) This patch removes the `__builtin_ia32_pabs` intrinsics and just uses `__builtin_elementwise_abs` - the existing tests see no changes: ``` __m256i test_mm256_abs_epi8(__m256i a) { // CHECK-LABEL: test_mm256_abs_epi8 // CHECK: [[ABS:%.]] = call <32 x i8> @llvm.abs.v32i8(<32 x i8> %{{.*}}, i1 false) return _mm256_abs_epi8(a); } ``` This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`). Differential Revision: https://reviews.llvm.org/D117791	2022-01-24 11:25:21 +00:00
Simon Pilgrim	0abaf64580	Revert rG4727d29d908f9dd608dd97a58c0af1ad579fd3ca "[X86] Remove __builtin_ia32_pabs intrinsics and use generic __builtin_elementwise_abs" Some build bots are referencing the `__builtin_ia32_pabs` intrinsics via alternative headers	2022-01-21 12:35:36 +00:00
Simon Pilgrim	3ef88b3184	Revert rG8ee135dcf8ff060656ad481c3e980fe8763576f5 "[X86] Remove `__builtin_ia32_pmax/min` intrinsics and use generic `__builtin_elementwise_max/min`" Some build bots are referencing the `__builtin_ia32_pmax/min` intrinsics via alternative headers	2022-01-21 12:34:19 +00:00
Simon Pilgrim	8ee135dcf8	[X86] Remove `__builtin_ia32_pmax/min` intrinsics and use generic `__builtin_elementwise_max/min` D111985 added the generic `__builtin_elementwise_max` and `__builtin_elementwise_min` intrinsics with the same integer behaviour as the SSE/AVX instructions This patch removes the `__builtin_ia32_pmax/min` intrinsics and just uses `__builtin_elementwise_max/min` - the existing tests see no changes: ``` __m256i test_mm256_max_epu32(__m256i a, __m256i b) { // CHECK-LABEL: test_mm256_max_epu32 // CHECK: call <8 x i32> @llvm.umax.v8i32(<8 x i32> %{{.}}, <8 x i32> %{{.}}) return _mm256_max_epu32(a, b); } ``` This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`). Sibling patch to D117791 Differential Revision: https://reviews.llvm.org/D117798	2022-01-21 12:24:58 +00:00
Simon Pilgrim	4727d29d90	[X86] Remove __builtin_ia32_pabs intrinsics and use generic __builtin_elementwise_abs D111986 added the generic `__builtin_elementwise_abs()` intrinsic with the same integer absolute behaviour as the SSE/AVX instructions (abs(INT_MIN) == INT_MIN) This patch removes the `__builtin_ia32_pabs` intrinsics and just uses `__builtin_elementwise_abs` - the existing tests see no changes: ``` __m256i test_mm256_abs_epi8(__m256i a) { // CHECK-LABEL: test_mm256_abs_epi8 // CHECK: [[ABS:%.]] = call <32 x i8> @llvm.abs.v32i8(<32 x i8> %{{.*}}, i1 false) return _mm256_abs_epi8(a); } ``` This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`). Differential Revision: https://reviews.llvm.org/D117791	2022-01-21 11:59:08 +00:00
Chenbing.Zheng	0be3da1fab	[RISCV] Add intrinsic for Zbt extension RV32: fsl, fsr, fsri RV64: fsl, fsr, fsri, fslw, fsrw, fsriw Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117468	2022-01-20 08:27:05 +00:00
Benjamin Kramer	765dd8b8a4	[CGBuiltin] Simplify code. NFCI.	2022-01-14 16:02:02 +01:00
Jun Zhang	8de0c1feca	[Clang] Add __builtin_reduce_or and __builtin_reduce_and This patch implements two builtins specified in D111529. The last __builtin_reduce_add will be seperated into another one. Differential Revision: https://reviews.llvm.org/D116736	2022-01-14 22:05:26 +08:00
Lian Wang	16877c5d2c	[RISCV] Add bfp and bfpw intrinsic in zbf extension Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116994	2022-01-13 02:53:00 +00:00
Simon Pilgrim	497a4b26c4	CGBuiltin - Use castAs<> instead of getAs<> to avoid dereference of nullptr The pointer is always dereferenced immediately below, so assert the cast is correct instead of returning nullptr	2022-01-12 15:35:37 +00:00
Marco Elver	732ad8ea62	[clang][auto-init] Provide __builtin_alloca*_uninitialized variants When `-ftrivial-auto-var-init=` is enabled, allocas unconditionally receive auto-initialization since [1]. In certain cases, it turns out, this is causing problems. For example, when using alloca to add a random stack offset, as the Linux kernel does on syscall entry [2]. In this case, none of the alloca'd stack memory is ever used, and initializing it should be controllable; furthermore, it is not always possible to safely call memset (see [2]). Introduce `__builtin_alloca_uninitialized()` (and `__builtin_alloca_with_align_uninitialized`), which never performs initialization when `-ftrivial-auto-var-init=` is enabled. [1] https://reviews.llvm.org/D60548 [2] https://lkml.kernel.org/r/YbHTKUjEejZCLyhX@elver.google.com Reviewed By: glider Differential Revision: https://reviews.llvm.org/D115440	2022-01-12 15:13:10 +01:00
Serge Guelton	d2cc6c2d0c	Use a sorted array instead of a map to store AttrBuilder string attributes Using and std::map<SmallString, SmallString> for target dependent attributes is inefficient: it makes its constructor slightly heavier, and involves extra allocation for each new string attribute. Storing the attribute key/value as strings implies extra allocation/copy step. Use a sorted vector instead. Given the low number of attributes generally involved, this is cheaper, as showcased by https://llvm-compile-time-tracker.com/compare.php?from=5de322295f4ade692dc4f1823ae4450ad3c48af2&to=05bc480bf641a9e3b466619af43a2d123ee3f71d&stat=instructions Differential Revision: https://reviews.llvm.org/D116599	2022-01-10 14:49:53 +01:00
Jun Zhang	5be131922c	[NFC] Test commit. This is just a test commit to check whether the permission I got is correct or not.	2022-01-08 10:36:09 +08:00
Jun Zhang	b2ed9f3f44	[Clang] Implement the rest of __builtin_elementwise_* functions. The patch implement the rest of __builtin_elementwise_* functions specified in D111529, including: * __builtin_elementwise_floor * __builtin_elementwise_roundeven * __builtin_elementwise_trunc Signed-off-by: Jun <jun@junz.org> Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D115429	2022-01-07 15:11:36 +00:00
Nikita Popov	e8b98a5216	[CodeGen] Emit elementtype attributes for indirect inline asm constraints This implements the clang side of D116531. The elementtype attribute is added for all indirect constraints (*) and tests are updated accordingly. Differential Revision: https://reviews.llvm.org/D116666	2022-01-06 09:29:22 +01:00
Jun Zhang	82020de532	Recommit "[Clang] Extend emitUnaryBuiltin to avoid duplicate logic."" This reverts the revert commit `f552ba6e84`. Recommit with fixed author name.	2022-01-04 13:46:41 +00:00
Florian Hahn	f552ba6e84	Revert "[Clang] Extend emitUnaryBuiltin to avoid duplicate logic." This reverts commit `5c57e6aa57`. Reverted due to a typo in the authors name. Will recommit soon with fixed authorship.	2022-01-04 13:45:28 +00:00
Jun Zhan	5c57e6aa57	[Clang] Extend emitUnaryBuiltin to avoid duplicate logic. This patch extends `emitUnaryBuiltin` so that we can better emitting IR when implement builtins specified in D111529. Also contains some NFC, applying it to existing code. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D116161	2022-01-04 11:47:41 +00:00
Krzysztof Parzyszek	dcb3e8083a	[Hexagon] Make conversions to vector predicate types explicit for builtins HVX does not have load/store instructions for vector predicates (i.e. bool vectors). Because of that, vector predicates need to be converted to another type before being stored, and the most convenient representation is an HVX vector. As a consequence, in C/C++, source-level builtins that either take or produce vector predicates take or return regular vectors instead. On the other hand, the corresponding LLVM intrinsics do have boolean types that, and so a conversion of the operand or the return value was necessary. This conversion would happen inside clang's codegen, but was somewhat fragile. This patch changes the strategy: a builtin that takes a vector predicate now really expects a vector predicate. Since such a predicate cannot be provided via a variable, this builtin must be composed with other builtins that either convert vector to a predicate (V6_vandvrt) or predicate to a vector (V6_vandqrt). For users using builtins defined in hvx_hexagon_protos.h there is no impact: the conversions were added to that file. Other users will need to insert - __builtin_HEXAGON_V6_vandvrt[_128B](V, -1) to convert vector V to a vector predicate, or - __builtin_HEXAGON_V6_vandqrt[_128B](Q, -1) to convert vector predicate Q to a vector. Builtins __builtin_HEXAGON_V6_vmaskedstore.* are a temporary exception to that, but they are deprecated and should not be used anyway. In the future they will either follow the same rule, or be removed.	2021-12-22 12:52:24 -08:00
Jun Zhan	b55ea2fbc0	[Clang] Add __builtin_reduce_xor This patch implements __builtin_reduce_xor as specified in D111529. Reviewed By: fhahn, aaron.ballman Differential Revision: https://reviews.llvm.org/D115231	2021-12-22 10:00:27 +00:00
Sami Tolvanen	ec2e26eaf6	[Clang] Add __builtin_function_start Control-Flow Integrity (CFI) replaces references to address-taken functions with pointers to the CFI jump table. This is a problem for low-level code, such as operating system kernels, which may need the address of an actual function body without the jump table indirection. This change adds the __builtin_function_start() builtin, which accepts an argument that can be constant-evaluated to a function, and returns the address of the function body. Link: https://github.com/ClangBuiltLinux/linux/issues/1353 Depends on D108478 Reviewed By: pcc, rjmccall Differential Revision: https://reviews.llvm.org/D108479	2021-12-20 12:55:33 -08:00
Nikita Popov	d930c3155c	[CodeGen] Pass element type to EmitCheckedInBoundsGEP() Same as for other GEP creation methods.	2021-12-15 14:03:33 +01:00
Nikita Popov	481de0ed80	[CodeGen] Prefer CreateElementBitCast() where possible CreateElementBitCast() can preserve the pointer element type in the presence of opaque pointers, so use it in place of CreateBitCast() in some places. This also sometimes simplifies the code a bit.	2021-12-15 11:48:39 +01:00
Nikita Popov	834c8ff587	[CodeGen] Avoid some uses of deprecated Address constructor Explicitly pass in the element type instead.	2021-12-15 11:13:10 +01:00
Nikita Popov	b4f46555d7	[CodeGen] Avoid some pointer element type accesses	2021-12-15 09:29:27 +01:00
Matt Devereau	41def32040	[AArch64][SVE][NEON] Add NEON-SVE-Bridge intrinsics Adds svset_neonq, svget_neonq, svdup_neonq AArch64 intrinsics. These are described in the ACLE specification: https://github.com/ARM-software/acle/pull/72 https://reviews.llvm.org/D114713	2021-12-13 11:31:57 +00:00
Chuanqi Xu	352e36e10d	[Coroutines] Remove unused coroutine builtin/intrinsics llvm.coro.param (NFC-ish) I found that the coroutine intrinsic llvm.coro.param in documentation (https://llvm.org/docs/Coroutines.html#id101) didn't get used actually since there isn't lowering codes in LLVM. I also checked the implementation of libstdc++ and libc++. Both of them didn't use llvm.coro.param. So I am pretty sure that the llvm.coro.param intrinsic is unused. I think it would be better t to remove it to avoid possible misleading understandings. Note: according to [class.copy.elision]/p1.3, this optimization is allowed by the C++ language specification. Let's make it someday. Reviewed By: rjmccall Differential Revision: https://reviews.llvm.org/D115222	2021-12-09 14:40:25 +08:00
Jun Zhang	8680f951c2	Add __builtin_elementwise_ceil This patch implements one of the missing builtin functions specified in https://reviews.llvm.org/D111529.	2021-12-08 08:29:33 -05:00
Aaron Ballman	6c75ab5f66	Introduce _BitInt, deprecate _ExtInt WG14 adopted the _ExtInt feature from Clang for C23, but renamed the type to be _BitInt. This patch does the vast majority of the work to rename _ExtInt to _BitInt, which accounts for most of its size. The new type is exposed in older C modes and all C++ modes as a conforming extension. However, there are functional changes worth calling out: * Deprecates _ExtInt with a fix-it to help users migrate to _BitInt. * Updates the mangling for the type. * Updates the documentation and adds a release note to warn users what is going on. * Adds new diagnostics for use of _BitInt to call out when it's used as a Clang extension or as a pre-C23 compatibility concern. * Adds new tests for the new diagnostic behaviors. I want to call out the ABI break specifically. We do not believe that this break will cause a significant imposition for early adopters of the feature, and so this is being done as a full break. If it turns out there are critical uses where recompilation is not an option for some reason, we can consider using ABI tags to ease the transition.	2021-12-06 12:52:01 -05:00
Jay Foad	2774bad112	[AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args The ray_origin, ray_dir and ray_inv_dir arguments should all be vec3 to match how the hardware instruction works. Don't change the API of the corresponding OpenCL builtins. Differential Revision: https://reviews.llvm.org/D115032	2021-12-04 10:32:11 +00:00
Qiu Chaofan	4f94c02616	[Clang] Mutate bulitin names under IEEE128 on PPC64 Glibc 2.32 and newer uses these symbol names to support IEEE-754 128-bit float. GCC transforms name of these builtins to align with Glibc header behavior. Since Clang doesn't have all GCC-compatible builtins implemented, this patch only mutates the implemented part. Note nexttoward is a special case (no nexttowardf128) so it's also handled here. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D112401	2021-12-03 17:50:18 +08:00
skc7	16b781e6d1	[AMDGPU][clang] Fix __builtin_nontemporal_store() failure on AMDGPU Reviewed By: yaxunl, sameerds Differential Revision: https://reviews.llvm.org/D114849	2021-12-02 05:53:25 +00:00
Ahsan Saghir	4c8b8e0154	[PowerPC] Allow MMA built-ins to accept non-void pointers and arrays Calls to MMA builtins that take pointer to void do not accept other pointers/arrays whereas normal functions with the same parameter do. This patch allows MMA built-ins to accept non-void pointers and arrays. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D113306	2021-11-16 09:14:41 -06:00
Jon Chesterfield	27177b82d4	[OpenMP] Lower printf to __llvm_omp_vprintf Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf` which takes the same const char, void arguments as cuda vprintf and also passes the size of the void* alloca which will be needed by a non-stub implementation of `__llvm_omp_vprintf` for amdgpu. This removes the amdgpu link error on any printf in a target region in favour of silently compiling code that doesn't print anything to stdout. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112680	2021-11-10 15:30:56 +00:00
Jon Chesterfield	0fa45d6d80	Revert "[OpenMP] Lower printf to __llvm_omp_vprintf" This reverts commit `db81d8f6c4`.	2021-11-08 20:28:57 +00:00
Jon Chesterfield	db81d8f6c4	[OpenMP] Lower printf to __llvm_omp_vprintf Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf` which takes the same const char, void arguments as cuda vprintf and also passes the size of the void* alloca which will be needed by a non-stub implementation of `__llvm_omp_vprintf` for amdgpu. This removes the amdgpu link error on any printf in a target region in favour of silently compiling code that doesn't print anything to stdout. The exact set of changes to check-openmp probably needs revision before commit Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112680	2021-11-08 18:38:00 +00:00
Florian Hahn	7999355106	[Clang] Add min/max reduction builtins. This patch implements __builtin_reduce_max and __builtin_reduce_min as specified in D111529. The order of operations does not matter for min or max reductions and they can be directly lowered to the corresponding llvm.vector.reduce.{fmin,fmax,umin,umax,smin,smax} intrinsic calls. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D112001	2021-11-02 15:01:42 +01:00
Thomas Lively	fb67f3d969	[WebAssembly] Add prototype relaxed float to int trunc instructions Add i32x4.relaxed_trunc_f32x4_s, i32x4.relaxed_trunc_f32x4_u, i32x4.relaxed_trunc_f64x2_s_zero, i32x4.relaxed_trunc_f64x2_u_zero. These are only exposed as builtins, and require user opt-in. Differential Revision: https://reviews.llvm.org/D112186	2021-10-28 14:01:53 -07:00
Florian Hahn	01870d51b8	[Clang] Add elementwise abs builtin. This patch implements __builtin_elementwise_abs as specified in D111529. Reviewed By: aaron.ballman, scanon Differential Revision: https://reviews.llvm.org/D111986	2021-10-27 21:01:44 +01:00
Florian Hahn	1ef25d28c1	[Clang] Add elementwise min/max builtins. This patch implements __builtin_elementwise_max and __builtin_elementwise_min, as specified in D111529. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D111985	2021-10-26 16:53:40 +01:00
Zhi An Ng	e1fb13401e	[WebAssembly] Add prototype relaxed float min max instructions Add relaxed. f32x4.min, f32x4.max, f64x2.min, f64x2.max. These are only exposed as builtins, and require user opt-in. Differential Revision: https://reviews.llvm.org/D112146	2021-10-20 09:41:51 -07:00
Zhi An Ng	2542bfa43a	[WebAssembly] Add prototype relaxed swizzle instructions Add i8x16 relaxed_swizzle instructions. These are only exposed as builtins, and require user opt-in. Differential Revision: https://reviews.llvm.org/D112022	2021-10-19 17:53:04 -07:00
Zhi An Ng	da07942834	[WebAssembly] Add prototype relaxed laneselect instructions Add i8x16, i16x8, i32x4, i64x2 laneselect instructions. These are only exposed as builtins, and require user opt-in.	2021-10-15 17:45:09 -07:00
Hsiangkai Wang	5158cfef8b	[RISCV] After reverting _mt builtins, add `ta` argument for LLVM IR. Previous patch only reverts C builtins for tail policy. In order to keep LLVM IR intact, add the `ta` argument in vector builtins.	2021-10-13 19:41:49 +08:00
Hsiangkai Wang	ff3ed78304	Revert "[RISCV] Define _m intrinsics as builtins, instead of macros." This reverts commit `97f0c63783`. As discussed in https://reviews.llvm.org/D110684, it increased the compile time and the binary size of clang more than 1%. I reverted this patch first to think about a better way to do it.	2021-10-13 12:21:51 +08:00
Hsiangkai Wang	97f0c63783	[RISCV] Define _m intrinsics as builtins, instead of macros. In the original design, we levarage _mt intrinsics to define macros for _m intrinsics. Such as, ``` __builtin_rvv_vadd_vv_i8m1_mt((vbool8_t)(op0), (vint8m1_t)(op1), (vint8m1_t)(op2), (vint8m1_t)(op3), (size_t)(op4), (size_t)VE_TAIL_AGNOSTIC) ``` However, we could not define generic interface for mask intrinsics any more due to clang_builtin_alias only accepts clang builtins as its argument. In the example, ``` __rvv_overloaded __attribute__((clang_builtin_alias(__builtin_rvv_vadd_vv_i8m1_mt))) vint8m1_t vadd(vbool8_t op0, vint8m1_t op1, vint8m1_t op2, vint8m1_t op3, size_t op4, size_t op5); ``` op5 is the tail policy argument. When users want to use vadd generic interface for masked vector add, they need to specify tail policy in the previous design. In this patch, we define _m intrinsics as clang builtins to solve the problem. Differential Revision: https://reviews.llvm.org/D110684	2021-10-12 10:47:55 +08:00
Stefan Pintilie	4fc2f4979c	[PowerPC] Fix __builtin_ppc_load2r to return short instead of int. This patch fixes the return value of the builtin __builtin_ppc_load2r to correctly return short instead of int. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D110771	2021-10-04 06:17:02 -05:00
Quinn Pham	67a3d1e275	[PowerPC] swdiv builtins for XL compatibility This patch is in a series of patches to provide builtins for compatibility with the XL compiler. This patch implements the software divide builtin as wrappers for a floating point divide. XL provided these builtins because it didn't produce software estimates by default at `-Ofast`. When compiled with `-Ofast` these builtins will produce the software estimate for divide. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D106959	2021-09-29 11:31:07 -05:00
Quinn Pham	70391b3468	[PowerPC] FP compare and test XL compat builtins. This patch is in a series of patches to provide builtins for compatability with the XL compiler. This patch adds builtins for compare exponent and test data class operations on floating point values. Reviewed By: #powerpc, lei Differential Revision: https://reviews.llvm.org/D109437	2021-09-28 11:01:51 -05:00
Ahsan Saghir	593b074a09	[PowerPC] MMA - Add __builtin_vsx_build_pair and __builtin_mma_build_acc builtins This patch adds the following built-ins: __builtin_vsx_build_pair __builtin_mma_build_acc Reviewed By: #powerpc, nemanjai, lei Differential Revision: https://reviews.llvm.org/D107647	2021-09-27 19:51:28 -05:00
Wang, Pengfei	7d6889964a	[X86][FP16] Add more builtins to avoid multi evaluation problems & add 2 missed intrinsics Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110336	2021-09-27 09:27:04 +08:00
Thomas Lively	2f519825ba	[WebAssembly] Add prototype relaxed SIMD fma/fms instructions Add experimental clang builtins, LLVM intrinsics, and backend definitions for the new {f32x4,f64x2}.{fma,fms} instructions in the relaxed SIMD proposal: https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md. Do not allow these instructions to be selected without explicit user opt-in. Differential Revision: https://reviews.llvm.org/D110295	2021-09-23 11:01:36 -07:00
Xiang1 Zhang	c81d6ab875	[X86] Adjust Keylocker handle mem size Reviewed By: Topper Craig Differential Revision: https://reviews.llvm.org/D109488	2021-09-13 18:03:27 +08:00
Xiang1 Zhang	bdce8d40c6	Revert "[X86] Adjust Keylocker handle mem size" This reverts commit `3731de6b7f`.	2021-09-13 18:00:46 +08:00
Xiang1 Zhang	3731de6b7f	[X86] Adjust Keylocker handle mem size Reviewed By: Topper Craig Differential Revision: https://reviews.llvm.org/D109354	2021-09-13 17:59:33 +08:00
Roman Lebedev	3f1f08f0ed	Revert @llvm.isnan intrinsic patchset. Please refer to https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html (and that whole thread.) TLDR: the original patch had no prior RFC, yet it had some changes that really need a proper RFC discussion. It won't be productive to discuss such an RFC, once it's actually posted, while said patch is already committed, because that introduces bias towards already-committed stuff, and the tree is potentially in broken state meanwhile. While the end result of discussion may lead back to the current design, it may also not lead to the current design. Therefore i take it upon myself to revert the tree back to last known good state. This reverts commit `4c4093e6e3`. This reverts commit `0a2b1ba33a`. This reverts commit `d9873711cb`. This reverts commit `791006fb8c`. This reverts commit `c22b64ef66`. This reverts commit `72ebcd3198`. This reverts commit `5fa6039a5f`. This reverts commit `9efda541bf`. This reverts commit `94d3ff09cf`.	2021-09-02 13:53:56 +03:00
Andrei Elovikov	1724a16437	[NFC][clang] Move IR-independent parts of target MV support to X86TargetParser.cpp ...that is located under llvm/lib/Support/. Reviewed By: erichkeane Differential Revision: https://reviews.llvm.org/D108423	2021-08-30 09:48:48 -07:00
Wang, Pengfei	c728bd5bba	[X86] AVX512FP16 instructions enabling 5/6 Enable FP16 FMA instructions. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105268	2021-08-24 09:07:19 +08:00
Andrei Elovikov	f5c2889488	[NFC][clang] Use X86 Features declaration from X86TargetParser ...instead of redeclaring them in clang's own X86Target.def. They were already required to be in sync (IIUC), so no reason to maintain two identical lists. Reviewed By: erichkeane, craig.topper Differential Revision: https://reviews.llvm.org/D108151	2021-08-23 12:30:28 -07:00
Simon Pilgrim	7f48bd3bed	CGBuiltin.cpp - pass SVETypeFlags by const reference. NFC. Don't pass the struct by value.	2021-08-22 12:13:17 +01:00
Wang, Pengfei	b088536ce9	[X86] AVX512FP16 instructions enabling 4/6 Enable FP16 unary operator instructions. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105267	2021-08-22 08:59:35 +08:00
Thomas Lively	88962cea46	[WebAssembly] Restore builtins and intrinsics for pmin/pmax Partially reverts `85157c0079`, which had removed these builtins and intrinsics in favor of normal codegen patterns. It turns out that it is possible for the patterns to be split over multiple basic blocks, however, which means that DAG ISel is not able to select them to the pmin/pmax instructions. To make sure the SIMD intrinsics generate the correct instructions in these cases, reintroduce the clang builtins and corresponding LLVM intrinsics, but also keep the normal pattern matching as well. Differential Revision: https://reviews.llvm.org/D108387	2021-08-20 09:21:31 -07:00
Martin Storsjö	cc3affd8b0	[clang] [MSVC] Implement __mulh and __umulh builtins for aarch64 The code is based on the same __mulh and __umulh intrinsics for x86. This should fix PR51128. Differential Revision: https://reviews.llvm.org/D106721	2021-08-19 11:29:55 +03:00
Arthur Eubanks	3f4d00bc3b	[NFC] More get/removeAttribute() cleanup	2021-08-17 21:05:41 -07:00
Wang, Pengfei	2379949aad	[X86] AVX512FP16 instructions enabling 3/6 Enable FP16 conversion instructions. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105265	2021-08-18 09:03:41 +08:00
Wang, Pengfei	f1de9d6dae	[X86] AVX512FP16 instructions enabling 2/6 Enable FP16 binary operator instructions. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105264	2021-08-15 08:56:33 +08:00
Wang, Pengfei	6f7f5b54c8	[X86] AVX512FP16 instructions enabling 1/6 1. Enable FP16 type support and basic declarations used by following patches. 2. Enable new instructions VMOVW and VMOVSH. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105263	2021-08-10 12:46:01 +08:00
Serge Pavlov	4c4093e6e3	Introduce intrinsic llvm.isnan This is recommit of the patch `16ff91ebcc`, reverted in `0c28a7c990` because it had an error in call of getFastMathFlags (base type should be FPMathOperator but not Instruction). The original commit message is duplicated below: Clang has builtin function '__builtin_isnan', which implements C library function 'isnan'. This function now is implemented entirely in clang codegen, which expands the function into set of IR operations. There are three mechanisms by which the expansion can be made. * The most common mechanism is using an unordered comparison made by instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. It however is not suitable if floating point exceptions are tracked. Corresponding IEEE 754 operation and C function must never raise FP exception, even if the argument is a signaling NaN. Compare instructions usually does not have such property, they raise 'invalid' exception in such case. So this mechanism is unsuitable when exception behavior is strict. In particular it could result in unexpected trapping if argument is SNaN. * Another solution was implemented in https://reviews.llvm.org/D95948. It is used in the cases when raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers one solution for all targets, however some can do the check in more efficient way. * Solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target specific code into IR. Now only SystemZ implements this hook and it generates a call to target specific intrinsic function. Although these mechanisms allow to implement 'isnan' with enough efficiency, expanding 'isnan' in clang has drawbacks: * The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics. It complicates analysis and can prevent some optimizations. * IR can be created by tools other than clang, in this case treatment of 'isnan' has to be duplicated in that tool. Another issue with the current implementation of 'isnan' comes from the use of options '-ffast-math' or '-fno-honor-nans'. If such option is specified, 'fcmp uno' may be optimized to 'false'. It is valid optimization in general, but it results in 'isnan' always returning 'false'. For example, in some libc++ implementations the following code returns 'false': std::isnan(std::numeric_limits<float>::quiet_NaN()) The options '-ffast-math' and '-fno-honor-nans' imply that FP operation operands are never NaNs. This assumption however should not be applied to the functions that check FP number properties, including 'isnan'. If such function returns expected result instead of actually making checks, it becomes useless in many cases. The option '-ffast-math' is often used for performance critical code, as it can speed up execution by the expense of manual treatment of corner cases. If 'isnan' returns assumed result, a user cannot use it in the manual treatment of NaNs and has to invent replacements, like making the check using integer operations. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion, that limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'. To overcome these drawbacks, this change introduces a new IR intrinsic function 'llvm.isnan', which realizes the check as specified by IEEE-754 and C standards in target-agnostic way. During IR transformations it does not undergo undesirable optimizations. It reaches instruction selection, where is lowered in target-dependent way. The lowering can vary depending on options like '-ffast-math' or '-ffp-model' so the resulting code satisfies requested semantics. Differential Revision: https://reviews.llvm.org/D104854	2021-08-06 14:32:27 +07:00
Anshil Gandhi	39dac1f7f6	[clang] Add clang builtins support for gfx90a Implement target builtins for gfx90a including fadd64, fadd32, add2h, max and min on various global, flat and ds address spaces for which intrinsics are implemented. Differential Revision: https://reviews.llvm.org/D106909	2021-08-05 02:08:06 -06:00
Serge Pavlov	0c28a7c990	Revert "Introduce intrinsic llvm.isnan" This reverts commit `16ff91ebcc`. Several errors were reported mainly test-suite execution time. Reverted for investigation.	2021-08-04 17:18:15 +07:00
Serge Pavlov	16ff91ebcc	Introduce intrinsic llvm.isnan Clang has builtin function '__builtin_isnan', which implements C library function 'isnan'. This function now is implemented entirely in clang codegen, which expands the function into set of IR operations. There are three mechanisms by which the expansion can be made. * The most common mechanism is using an unordered comparison made by instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. It however is not suitable if floating point exceptions are tracked. Corresponding IEEE 754 operation and C function must never raise FP exception, even if the argument is a signaling NaN. Compare instructions usually does not have such property, they raise 'invalid' exception in such case. So this mechanism is unsuitable when exception behavior is strict. In particular it could result in unexpected trapping if argument is SNaN. * Another solution was implemented in https://reviews.llvm.org/D95948. It is used in the cases when raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers one solution for all targets, however some can do the check in more efficient way. * Solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target specific code into IR. Now only SystemZ implements this hook and it generates a call to target specific intrinsic function. Although these mechanisms allow to implement 'isnan' with enough efficiency, expanding 'isnan' in clang has drawbacks: * The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics. It complicates analysis and can prevent some optimizations. * IR can be created by tools other than clang, in this case treatment of 'isnan' has to be duplicated in that tool. Another issue with the current implementation of 'isnan' comes from the use of options '-ffast-math' or '-fno-honor-nans'. If such option is specified, 'fcmp uno' may be optimized to 'false'. It is valid optimization in general, but it results in 'isnan' always returning 'false'. For example, in some libc++ implementations the following code returns 'false': std::isnan(std::numeric_limits<float>::quiet_NaN()) The options '-ffast-math' and '-fno-honor-nans' imply that FP operation operands are never NaNs. This assumption however should not be applied to the functions that check FP number properties, including 'isnan'. If such function returns expected result instead of actually making checks, it becomes useless in many cases. The option '-ffast-math' is often used for performance critical code, as it can speed up execution by the expense of manual treatment of corner cases. If 'isnan' returns assumed result, a user cannot use it in the manual treatment of NaNs and has to invent replacements, like making the check using integer operations. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion, that limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'. To overcome these drawbacks, this change introduces a new IR intrinsic function 'llvm.isnan', which realizes the check as specified by IEEE-754 and C standards in target-agnostic way. During IR transformations it does not undergo undesirable optimizations. It reaches instruction selection, where is lowered in target-dependent way. The lowering can vary depending on options like '-ffast-math' or '-ffp-model' so the resulting code satisfies requested semantics. Differential Revision: https://reviews.llvm.org/D104854	2021-08-04 15:27:49 +07:00
Kai Luo	e4902e69e9	[PowerPC] Fix return type of XL compat CAS `__compare_and_swap*` should return `i32` rather than `i1`. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D107077	2021-07-29 14:49:26 +00:00
Thomas Lively	33786576fd	[WebAssembly] Codegen for extmul SIMD instructions Replace the clang builtins and LLVM intrinsics for the SIMD extmul instructions with normal codegen patterns. Differential Revision: https://reviews.llvm.org/D106724	2021-07-27 08:41:30 -07:00
Nemanja Ivanovic	1c50a5da36	[PowerPC] Implement partial vector ld/st builtins for XL compatibility XL provides functions __vec_ldrmb/__vec_strmb for loading/storing a sequence of 1 to 16 bytes in big endian order, right justified in the vector register (regardless of target endianness). This is equivalent to vec_xl_len_r/vec_xst_len_r which are only available on Power9. This patch simply uses the Power9 functions when compiled for Power9, but provides a more general implementation for Power8. Differential revision: https://reviews.llvm.org/D106757	2021-07-26 13:19:52 -05:00
Thomas Lively	85157c0079	[WebAssembly] Codegen for pmin and pmax Replace the clang builtins and LLVM intrinsics for {f32x4,f64x2}.{pmin,pmax} with standard codegen patterns. Since wasm_simd128.h uses an integer vector as the standard single vector type, the IR for the pmin and pmax intrinsic functions contains bitcasts that would not be there otherwise. Add extra codegen patterns that can still select the pmin and pmax instructions in the presence of these bitcasts. Differential Revision: https://reviews.llvm.org/D106612	2021-07-23 14:49:21 -07:00
Kai Luo	e4ed93cb25	[PowerPC] Implement XL compatible behavior of __compare_and_swap According to https://www.ibm.com/docs/en/xl-c-and-cpp-aix/16.1?topic=functions-compare-swap-compare-swaplp XL's `__compare_and_swap` has a weird behavior that > In either case, the contents of the memory location specified by addr are copied into the memory location specified by old_val_addr. (unlike c11 `atomic_compare_exchange` specified in http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf) This patch let clang's implementation follow this behavior. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D106344	2021-07-23 01:16:02 +00:00
Thomas Lively	8af333cf1a	[WebAssembly] Replace @llvm.wasm.popcnt with @llvm.ctpop.v16i8 Use the standard target-independent intrinsic to take advantage of standard optimizations. Differential Revision: https://reviews.llvm.org/D106506	2021-07-21 16:45:54 -07:00
Thomas Lively	db7efcab7d	[WebAssembly] Remove clang builtins for extract_lane and replace_lane These builtins were added to capture the fact that the underlying Wasm instructions return i32s and implicitly sign or zero extend the extracted lanes in the case of the i8x16 and i16x8 variants. But we do sufficient optimizations during code gen that these low-level details do not need to be exposed to users. This commit replaces the use of the builtins in wasm_simd128.h with normal target-independent vector code. As a result, we can switch the relevant intrinsics to use functions rather than macros and can use more user-friendly return types rather than trying to precisely expose the underlying Wasm types. Note, however, that the generated LLVM IR is no different after this change. Differential Revision: https://reviews.llvm.org/D106500	2021-07-21 16:11:00 -07:00
Thomas Lively	1a57ee1276	[WebAssembly] Codegen for v128.load{32,64}_zero Replace the experimental clang builtins and LLVM intrinsics for these instructions with normal instruction selection patterns. The wasm_simd128.h intrinsics header was already using portable code for the corresponding intrinsics, so now it produces the correct instructions. Differential Revision: https://reviews.llvm.org/D106400	2021-07-21 09:02:12 -07:00
Quinn Pham	e002d251dd	[PowerPC] Floating Point Builtins for XL Compat. This patch is in a series of patches to provide builtins for compatibility with the XL compiler. This patch adds builtins related to floating point operations Reviewed By: #powerpc, nemanjai, amyk, NeHuang Differential Revision: https://reviews.llvm.org/D103986	2021-07-21 08:33:39 -05:00
Albion Fung	2fd1520247	[PowerPC] Implemented mtmsr, mfspr, mtspr Builtins Implemented builtins for mtmsr, mfspr, mtspr on PowerPC; the patch is intended for XL Compatibility. Differential revision: https://reviews.llvm.org/D106130	2021-07-20 17:51:00 -05:00
Albion Fung	3434ac9e39	[PowerPC] Store, load, move from and to registers related builtins This patch implements store, load, move from and to registers related builtins, as well as the builtin for stfiw. The patch aims to provide feature parady with xlC on AIX. Differential revision: https://reviews.llvm.org/D105946	2021-07-20 15:46:14 -05:00
Victor Huang	1a762f93f8	[PowerPC] Add PowerPC cmpb builtin and emit target indepedent code for XL compatibility This patch is in a series of patches to provide builtins for compatibility with the XL compiler. This patch add the builtin and emit target independent code for __cmpb. Reviewed By: nemanjai, #powerpc Differential revision: https://reviews.llvm.org/D105194	2021-07-20 13:06:22 -05:00
Quinn Pham	fd855c24c7	[PowerPC] Restore FastMathFlags of Builder for Vector FDiv Builtins This patch fixes `__builtin_ppc_recipdivf`, `__builtin_ppc_recipdivd`, `__builtin_ppc_rsqrtf`, and `__builtin_ppc_rsqrtd`. FastMathFlags are set to fast immediately before emitting these builtins. Now the flags are restored to their previous values after the builtins are emitted. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D105984	2021-07-20 09:41:00 -05:00
Stefan Pintilie	02cd937945	[PowerPC][Builtins] Added a number of builtins for compatibility with XL. Added a number of different builtins that exist in the XL compiler. Most of these builtins already exist in clang under a different name. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D104386	2021-07-20 08:57:55 -05:00
Quinn Pham	0268e123be	[PowerPC] swdiv_nochk Builtins for XL Compat This patch is in a series of patches to provide builtins for compatibility with the XL compiler. This patch adds software divide builtins with no checking. These builtins are each emitted as a fast fdiv. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D106150	2021-07-19 16:51:10 -05:00
Nikita Popov	2c68ecccc9	[OpaquePtr] Remove uses of CreateGEP() without element type Remove uses of to-be-deprecated API. In cases where the correct element type was not immediately obvious to me, fall back to explicit getPointerElementType().	2021-07-17 22:56:27 +02:00
Nikita Popov	6d3e7c783b	[OpaquePtr] Remove uses of CreateConstGEP1_32() without element type Remove uses of to-be-deprecated API. I've fallen back to calling getPointerElementType() in some cases where the correct type wasn't immediately obvious to me.	2021-07-17 18:32:36 +02:00
Nemanja Ivanovic	35a18a981f	[PowerPC] Implement intrinsics for mtfsf[i] This provides intrinsics for emitting instructions that set the FPSCR (`mtfsf/mtfsfi`). The patch also conservatively marks the rounding mode as an implicit def for both since they both may set the rounding mode depending on the operands. Reviewed By: #powerpc, qiucf Differential Revision: https://reviews.llvm.org/D105957	2021-07-16 16:26:11 -05:00
Victor Huang	4eb107ccba	[PowerPC] Add PowerPC population count, reversed load and store related builtins and instrinsics for XL compatibility This patch is in a series of patches to provide builtins for compatibility with the XL compiler. This patch adds the builtins and instrisics for population count, reversed load and store related operations. Reviewed By: nemanjai, #powerpc Differential revision: https://reviews.llvm.org/D106021	2021-07-15 17:23:56 -05:00
Artem Belevich	d774b4aa5e	[NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction. That should allow clang to compile mma.h from CUDA-11.3. Differential Revision: https://reviews.llvm.org/D105384	2021-07-15 12:02:09 -07:00
Quinn Pham	de3956605a	[PowerPC] Fix popcntb XL Compat Builtin for 32bit This patch implements the `__popcntb` XL compatibility builtin for 32bit in the frontend and backend. This patch also updates tests for `__popcntb` and other XL Compat sync related builtins. Reviewed By: #powerpc, nemanjai, amyk Differential Revision: https://reviews.llvm.org/D105360	2021-07-15 13:19:47 -05:00
Victor Huang	d40e8091bd	[PowerPC] Add PowerPC rotate related builtins and emit target independent code for XL compatibility This patch is in a series of patches to provide builtins for compatibility with the XL compiler. This patch adds the builtins and emit target independent code for rotate related operations. Reviewed By: nemanjai, #powerpc Differential revision: https://reviews.llvm.org/D104744	2021-07-15 10:23:54 -05:00
Thomas Lively	4a4229f70f	[WebAssembly] Codegen for v128.storeX_lane instructions Replace the experimental clang builtins and LLVM intrinsics for these instructions with normal codegen patterns. Resolves PR50435. Differential Revision: https://reviews.llvm.org/D106019	2021-07-14 16:15:25 -07:00
Thomas Lively	970e090010	[WebAssembly] Codegen for v128.loadX_lane instructions Replace the experimental clang builtin and LLVM intrinsics for these instructions with normal codegen patterns. Resolves PR50433. Differential Revision: https://reviews.llvm.org/D105950	2021-07-14 11:31:53 -07:00
Albion Fung	f1aca5ac96	[PowerPC] Fix L[D\|W]ARX Implementation LDARX and LWARX sometimes gets optimized out by the compiler when it is critical to the correctness of the code. This inline asm generation ensures that it preserved. Differential Revision: https://reviews.llvm.org/D105754	2021-07-13 11:02:07 -05:00
Thomas Lively	cbabfc63b1	[WebAssembly] Custom combines for f32x4.demote_zero_f64x2 Replace the clang builtin function and LLVM intrinsic for f32x4.demote_zero_f64x2 with combines from normal SDNodes. Also add missing combines for i32x4.trunc_sat_zero_f64x2_{s,u}, which share the same pattern. Differential Revision: https://reviews.llvm.org/D105755	2021-07-12 10:32:18 -07:00
Thomas Lively	e5220104d0	[WebAssembly] Custom combines for f64x2.promote_low_f32x4 Replace the clang builtin function and LLVM intrinsic previously used to select the f64x2.promote_low_f32x4 instruction with custom combines from standard SelectionDAG nodes. Implement the new combines to share code with the similar combines for f64x2.convert_low_i32x4_{s,u}. Resolves PR50232. Differential Revision: https://reviews.llvm.org/D105675	2021-07-09 18:59:29 -07:00
Hsiangkai Wang	593bf9b4de	[Clang][RISCV] Implement vlseg and vlsegff. Differential Revision: https://reviews.llvm.org/D103527	2021-07-07 13:44:40 +08:00
Xiang1 Zhang	a39bb960fc	[X86] Refine code of generating BB labels in Keylocker Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D105336	2021-07-05 09:29:51 +08:00
Nikita Popov	fabc17192e	[IRBuilder] Add type argument to CreateMaskedLoad/Gather Same as other CreateLoad-style APIs, these need an explicit type argument to support opaque pointers. Differential Revision: https://reviews.llvm.org/D105395	2021-07-04 12:17:59 +02:00
Yaxun (Sam) Liu	434bd5bf54	[AMDGPU] Add builtin functions image_bvh_intersect_ray Reviewed by: Stanislav Mekhanoshin, Matt Arsenault Differential Revision: https://reviews.llvm.org/D104946	2021-06-30 13:10:47 -04:00
Melanie Blower	e773216f46	[clang][patch] Add builtin __arithmetic_fence and option fprotect-parens This patch adds a new clang builtin, __arithmetic_fence. The purpose of the builtin is to provide the user fine control, at the expression level, over floating point optimization when -ffast-math (-ffp-model=fast) is enabled. The builtin prevents the optimizer from rearranging floating point expression evaluation. The new option fprotect-parens has the same effect on parenthesized expressions, forcing the optimizer to respect the parentheses. Reviewed By: aaron.ballman, kpn Differential Revision: https://reviews.llvm.org/D100118	2021-06-30 09:58:06 -04:00
Steffen Larsen	3644726a78	[Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA and MMA instructions Adds NVPTX builtins and intrinsics for the CUDA PTX `wmma.load`, `wmma.store`, `wmma.mma`, and `mma` instructions added in PTX 6.5 and 7.0. PTX ISA description of - `wmma.load`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-ld - `wmma.store`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-st - `wmma.mma`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-mma - `mma`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-mma Overview of `wmma.mma` and `mma` matrix shape/type combinations added with specific PTX versions: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-shape Authored-by: Steffen Larsen <steffen.larsen@codeplay.com> Co-Authored-by: Stuart Adams <stuart.adams@codeplay.com> Reviewed By: tra Differential Revision: https://reviews.llvm.org/D104847	2021-06-29 15:44:07 -07:00
Akira Hatanaka	8d21d54725	[CodeGen] Stop creating fake FunctionDecls when generating IR for functions implicitly generated by the compiler These fake functions would cause clang to crash if the changes proposed in https://reviews.llvm.org/D98799 were made.	2021-06-29 14:22:33 -07:00
Xiang1 Zhang	6d234a6908	[X86] Zero some outputs of Kelocker intrinsics in error case Reviewed By: WangPengfei Differential Revision: https://reviews.llvm.org/D104766	2021-06-29 13:35:40 +08:00
Melanie Blower	c27e5a2a8e	Revert "[clang][patch][fpenv] Add builtin __arithmetic_fence and option fprotect-parens" This reverts commit `4f1238e44d`. Buildbot fails on predecessor patch	2021-06-28 12:42:59 -04:00
Melanie Blower	4f1238e44d	[clang][patch][fpenv] Add builtin __arithmetic_fence and option fprotect-parens This patch adds a new clang builtin, __arithmetic_fence. The purpose of the builtin is to provide the user fine control, at the expression level, over floating point optimization when -ffast-math (-ffp-model=fast) is enabled. The builtin prevents the optimizer from rearranging floating point expression evaluation. The new option fprotect-parens has the same effect on parenthesized expressions, forcing the optimizer to respect the parentheses. Reviewed By: aaron.ballman, kpn Differential Revision: https://reviews.llvm.org/D100118	2021-06-28 12:26:53 -04:00
Jinsong Ji	eb237ffca8	[PowerPC] Add XL Compat fetch builtins Prototype ``` unsigned int __fetch_and_add (volatile unsigned int* addr, unsigned int val); unsigned long __fetch_and_addlp (volatile unsigned long* addr, unsigned long val); ``` Ref: https://www.ibm.com/docs/en/xl-c-and-cpp-linux/16.1.1?topic=functions-fetch Reviewed By: #powerpc, w2yehia, lkail Differential Revision: https://reviews.llvm.org/D104991	2021-06-28 02:52:32 +00:00
Craig Topper	7a112356e4	[X86] Correct the conversion of VALIGND/Q intrinsics to shufflevector. We need to mask the immediate to the width of a single vector rather than 2 vectors. If we use the width of 2 vectors then any shift larger than the length of 1 vector is going to overflow the shuffle indices. Fixes PR50895.	2021-06-26 19:06:00 -07:00
Jinsong Ji	f3ef4f5bff	[PowerPC] Add XL compat __compare_and_swap builtins Prototype int __compare_and_swap (volatile int* addr, int* old_val_addr, int new_val); int __compare_and_swaplp (volatile long* addr, long* old_val_addr, long new_val); Refer to https://www.ibm.com/docs/en/xl-c-and-cpp-aix/16.1?topic=functions-compare-swap-compare-swaplp Reviewed By: w2yehia Differential Revision: https://reviews.llvm.org/D104837	2021-06-25 01:08:48 +00:00
Bjorn Pettersson	4c7f820b2b	Update @llvm.powi to handle different int sizes for the exponent This can be seen as a follow up to commit `0ee439b705`, that changed the second argument of __powidf2, __powisf2 and __powitf2 in compiler-rt from si_int to int. That was to align with how those runtimes are defined in libgcc. One thing that seem to have been missing in that patch was to make sure that the rest of LLVM also handle that the argument now depends on the size of int (not using the si_int machine mode for 32-bit). When using __builtin_powi for a target with 16-bit int clang crashed. And when emitting libcalls to those rtlib functions, typically when lowering @llvm.powi), the backend would always prepare the exponent argument as an i32 which caused miscompiles when the rtlib was compiled with 16-bit int. The solution used here is to use an overloaded type for the second argument in @llvm.powi. This way clang can use the "correct" type when lowering __builtin_powi, and then later when emitting the libcall it is assumed that the type used in @llvm.powi matches the rtlib function. One thing that needed some extra attention was that when vectorizing calls several passes did not support that several arguments could be overloaded in the intrinsics. This patch allows overload of a scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with an entry for powi. Differential Revision: https://reviews.llvm.org/D99439	2021-06-17 09:38:28 +02:00
Simon Pilgrim	61cdaf66fe	[ADT] Remove APInt/APSInt toString() std::string variants <string> is currently the highest impact header in a clang+llvm build: https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps. This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code. Differential Revision: https://reviews.llvm.org/D103888	2021-06-11 13:19:15 +01:00
Bradley Smith	60c9b5f35c	[AArch64][SVE] Improve codegen for dupq SVE ACLE intrinsics Use llvm.experimental.vector.insert instead of storing into an alloca when generating code for these intrinsics. This defers the codegen of the generated vector to instruction selection, allowing existing shufflevector style optimizations to apply. Additionally, introduce a new target transform that can recognise fixed predicate patterns in the svbool variants of these intrinsics. Differential Revision: https://reviews.llvm.org/D103082	2021-06-07 12:21:38 +01:00
Irina Dobrescu	50511df32e	[AArch64] Lower bitreverse in ISel Adding lowering support for bitreverse. Previously, lowering bitreverse would expand it into a series of other instructions. This patch makes it so this produces a single rbit instruction instead. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D102397	2021-05-17 13:35:27 +01:00
Roman Lebedev	a624cec56d	[Clang][Codegen] Do not annotate thunk's this/return types with align/deref/nonnull attrs As it was discovered in post-commit feedback for `0aa0458f14`, we handle thunks incorrectly, and end up annotating their this/return with attributes that are valid for their callees, not for thunks themselves. While it would be good to fix this properly, and keep annotating them on thunks, i've tried doing that in https://reviews.llvm.org/D100388 with little success, and the patch is stuck for a month now. So for now, as a stopgap measure, subj.	2021-05-13 20:33:08 +03:00
Ahsan Saghir	25bbff632d	[PowerPC] Provide MMA builtins for compatibility Vector pair intrinsics and builtins were renamed in https://reviews.llvm.org/D91974 to replace the _mma_ prefix by _vsx_. However, some projects used the _mma_ version, so this patch adds these intrinsics to provide compatibility. Fixes Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=50159 Reviewed By: nemanjai, amyk Differential Revision: https://reviews.llvm.org/D100482	2021-05-07 09:10:16 -05:00
Nemanja Ivanovic	c3da07d216	[PowerPC] Provide fastmath sqrt and div functions in altivec.h This adds the long overdue implementations of these functions that have been part of the ABI document and are now part of the "Power Vector Intrinsic Programming Reference" (PVIPR). The approach is to add new builtins and to emit code with the fast flag regardless of whether fastmath was specified on the command line. Differential revision: https://reviews.llvm.org/D101209	2021-04-30 19:17:48 -05:00
Ryan Santhirarajan	0395f9e70b	[ARM] Neon Polynomial vadd Intrinsic fix The Neon vadd intrinsics were added to the ARMSIMD intrinsic map, however due to being defined under an AArch64 guard in arm_neon.td, were not previously useable on ARM. This change rectifies that. It is important to note that poly128 is not valid on ARM, thus it was extracted out of the original arm_neon.td definition and separated for the sake of AArch64. Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D100772	2021-04-28 11:59:40 -07:00
Levy Hsu	8cf54c7ff5	[RISCV] [1/2] Add IR intrinsic for Zbe extension RV32/64: bcompress bdecompress RV64 ONLY: bcompressw bdecompressw Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101143	2021-04-25 19:14:34 -07:00
Thomas Lively	502f54049d	[WebAssembly] Finalize wasm_simd128.h intrinsics Adds new intrinsics for instructions that are in the final SIMD spec but did not previously have intrinsics. Also updates the names of existing intrinsics to reflect the final names of the underlying instructions in the spec. Keeps the old names as deprecated functions to ease the transition to the new names. Differential Revision: https://reviews.llvm.org/D101112	2021-04-23 13:37:27 -07:00
Levy Hsu	b49337bbb9	[RISCV] [1/2] Add IR intrinsic for Zbp extension RV32/64: grev grevi gorc gorci shfl shfli unshfl unshfli RV64 ONLY: grevw greviw gorcw gorciw shflw shfli (For non-existing shfliw) unshfli (For non-existing unshfliw) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100830	2021-04-22 16:34:51 -07:00
Thomas Lively	5c729750a6	[WebAssembly] Remove saturating fp-to-int target intrinsics Use the target-independent @llvm.fptosi and @llvm.fptoui intrinsics instead. This includes removing the instrinsics for i32x4.trunc_sat_zero_f64x2_{s,u}, which are now represented in IR as a saturating truncation to a v2i32 followed by a concatenation with a zero vector. Differential Revision: https://reviews.llvm.org/D100596	2021-04-16 12:11:20 -07:00
Thomas Lively	6a18cc23ef	[WebAssembly] Codegen for i64x2.extend_{low,high}_i32x4_{s,u} Removes the builtins and intrinsics used to opt in to using these instructions and replaces them with normal ISel patterns now that they are no longer prototypes. Differential Revision: https://reviews.llvm.org/D100402	2021-04-14 13:43:09 -07:00
Thomas Lively	af7925b4dd	[WebAssembly] Codegen for f64x2.convert_low_i32x4_{s,u} Add a custom DAG combine and ISD opcode for detecting patterns like (uint_to_fp (extract_subvector ...)) before the extract_subvector is expanded to ensure that they will ultimately lower to f64x2.convert_low_i32x4_{s,u} instructions. Since these instructions are no longer prototypes and can now be produced via standard IR, this commit also removes the target intrinsics and builtins that had been used to prototype the instructions. Differential Revision: https://reviews.llvm.org/D100425	2021-04-14 10:42:45 -07:00
Thomas Lively	af7ab81ce3	[WebAssembly] Use standard intrinsics for f32x4 and f64x2 ops Now that these instructions are no longer prototypes, we do not need to be careful about keeping them opt-in and can use the standard LLVM infrastructure for them. This commit removes the bespoke intrinsics we were using to represent these operations in favor of the corresponding target-independent intrinsics. The clang builtins are preserved because there is no standard way to easily represent these operations in C/C++. For consistency with the scalar codegen in the Wasm backend, the intrinsic used to represent {f32x4,f64x2}.nearest is @llvm.nearbyint even though @llvm.roundeven better captures the semantics of the underlying Wasm instruction. Replacing our use of @llvm.nearbyint with use of @llvm.roundeven is left to a potential future patch. Differential Revision: https://reviews.llvm.org/D100411	2021-04-14 09:19:27 -07:00
Yaxun (Sam) Liu	25942d7c49	[AMDGPU] Allow relaxed/consume memory order for atomic inc/dec Reviewed by: Jon Chesterfield Differential Revision: https://reviews.llvm.org/D100144	2021-04-09 09:23:41 -04:00
Simon Pilgrim	2901dc7575	Don't directly dereference getAs<> casts to avoid potential null dereferences. NFCI. Replace with castAs<> which asserts the cast is valid. Fixes a number of static analyzer warnings.	2021-04-06 12:24:19 +01:00
Craig Topper	b4f2e80600	[RISCV] Refactor conversion of B extensions to IR intrinsics a little to reduce clang binary size. These all pass 1 type to getIntrinsic. So rather than assigning IntrinsicTypes for each builtin which invokes the SmallVector constructor, just select the intrinsic ID with a switch and share a single assignment of IntrinsicTypes.	2021-04-02 23:49:44 -07:00
Levy Hsu	f78d932cf2	[RISCV] Add IR intrinsics for Zbc extension Head files are included in a separate patch in case the name needs to be changed. RV32 / 64: clmul clmulh clmulr Differential Revision: https://reviews.llvm.org/D99711	2021-04-02 12:09:13 -07:00
Levy Hsu	944adbf285	Recommit "[RISCV] Add IR intrinsic for Zbb extension" Forgot to amend the Author. Original commit message: Header files are included in a separate patch in case the name needs to be changed. RV32 / 64: orc.b Differential Revision: https://reviews.llvm.org/D99320	2021-04-02 11:50:19 -07:00
Craig Topper	1f0b309f24	Revert "[RISCV] Add IR intrinsic for Zbb extension" This reverts commit `1808194590`. I forgot to change the author.	2021-04-02 11:47:02 -07:00
Craig Topper	1808194590	[RISCV] Add IR intrinsic for Zbb extension Header files are included in a separate patch in case the name needs to be changed. RV32 / 64: orc.b	2021-04-02 11:23:57 -07:00
Levy Hsu	b001d574d7	[RISCV] Add IR intrinsic for Zbr extension Implementation for RISC-V Zbr extension intrinsic. Header files are included in separate patch in case the name needs to be changed RV32 / 64: crc32b crc32h crc32w crc32cb crc32ch crc32cw RV64 Only: crc32d crc32cd Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99009	2021-04-02 10:58:45 -07:00
Thomas Lively	45783d0e8a	[WebAssembly] Implement i64x2 comparisons Removes the prototype builtin and intrinsic for i64x2.eq and implements that instruction as well as the other i64x2 comparison instructions in the final SIMD spec. Unsigned comparisons were not included in the final spec, so they still need to be scalarized via a custom lowering. Differential Revision: https://reviews.llvm.org/D99623	2021-03-31 10:46:17 -07:00
Yaxun (Sam) Liu	cc9477166a	[CUDA][HIP] add __builtin_get_device_side_mangled_name Add builtin function __builtin_get_device_side_mangled_name to get device side manged name for functions and global variables, which can be used to get symbol address of kernels or variables by mangled name in dynamically loaded bundled code objects at run time. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D99301	2021-03-25 15:25:29 -04:00

1 2 3 4 5 ...

1640 Commits