Commit Graph

1430 Commits

Author SHA1 Message Date
Thomas Lively cbabfc63b1 [WebAssembly] Custom combines for f32x4.demote_zero_f64x2
Replace the clang builtin function and LLVM intrinsic for
f32x4.demote_zero_f64x2 with combines from normal SDNodes. Also add missing
combines for i32x4.trunc_sat_zero_f64x2_{s,u}, which share the same pattern.

Differential Revision: https://reviews.llvm.org/D105755
2021-07-12 10:32:18 -07:00
Thomas Lively e5220104d0 [WebAssembly] Custom combines for f64x2.promote_low_f32x4
Replace the clang builtin function and LLVM intrinsic previously used to select
the f64x2.promote_low_f32x4 instruction with custom combines from standard
SelectionDAG nodes. Implement the new combines to share code with the similar
combines for f64x2.convert_low_i32x4_{s,u}. Resolves PR50232.

Differential Revision: https://reviews.llvm.org/D105675
2021-07-09 18:59:29 -07:00
Hsiangkai Wang 593bf9b4de [Clang][RISCV] Implement vlseg and vlsegff.
Differential Revision: https://reviews.llvm.org/D103527
2021-07-07 13:44:40 +08:00
Xiang1 Zhang a39bb960fc [X86] Refine code of generating BB labels in Keylocker
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D105336
2021-07-05 09:29:51 +08:00
Nikita Popov fabc17192e [IRBuilder] Add type argument to CreateMaskedLoad/Gather
Same as other CreateLoad-style APIs, these need an explicit type
argument to support opaque pointers.

Differential Revision: https://reviews.llvm.org/D105395
2021-07-04 12:17:59 +02:00
Yaxun (Sam) Liu 434bd5bf54 [AMDGPU] Add builtin functions image_bvh_intersect_ray
Reviewed by: Stanislav Mekhanoshin, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D104946
2021-06-30 13:10:47 -04:00
Melanie Blower e773216f46 [clang][patch] Add builtin __arithmetic_fence and option fprotect-parens
This patch adds a new clang builtin, __arithmetic_fence. The purpose of the
builtin is to provide the user fine control, at the expression level, over
floating point optimization when -ffast-math (-ffp-model=fast) is enabled.
The builtin prevents the optimizer from rearranging floating point expression
evaluation. The new option fprotect-parens has the same effect on
parenthesized expressions, forcing the optimizer to respect the parentheses.

Reviewed By: aaron.ballman, kpn

Differential Revision: https://reviews.llvm.org/D100118
2021-06-30 09:58:06 -04:00
Steffen Larsen 3644726a78 [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA and MMA instructions
Adds NVPTX builtins and intrinsics for the CUDA PTX `wmma.load`, `wmma.store`, `wmma.mma`, and `mma` instructions added in PTX 6.5 and 7.0.

PTX ISA description of

  - `wmma.load`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-ld
  - `wmma.store`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-st
  - `wmma.mma`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-mma
  - `mma`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-mma

Overview of `wmma.mma` and `mma` matrix shape/type combinations added with specific PTX versions: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-shape

Authored-by: Steffen Larsen <steffen.larsen@codeplay.com>
Co-Authored-by: Stuart Adams <stuart.adams@codeplay.com>

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D104847
2021-06-29 15:44:07 -07:00
Akira Hatanaka 8d21d54725 [CodeGen] Stop creating fake FunctionDecls when generating IR for
functions implicitly generated by the compiler

These fake functions would cause clang to crash if the changes proposed
in https://reviews.llvm.org/D98799 were made.
2021-06-29 14:22:33 -07:00
Xiang1 Zhang 6d234a6908 [X86] Zero some outputs of Kelocker intrinsics in error case
Reviewed By: WangPengfei

Differential Revision: https://reviews.llvm.org/D104766
2021-06-29 13:35:40 +08:00
Melanie Blower c27e5a2a8e Revert "[clang][patch][fpenv] Add builtin __arithmetic_fence and option fprotect-parens"
This reverts commit 4f1238e44d.
Buildbot fails on predecessor patch
2021-06-28 12:42:59 -04:00
Melanie Blower 4f1238e44d [clang][patch][fpenv] Add builtin __arithmetic_fence and option fprotect-parens
This patch adds a new clang builtin, __arithmetic_fence. The purpose of the
builtin is to provide the user fine control, at the expression level, over
floating point optimization when -ffast-math (-ffp-model=fast) is enabled.
The builtin prevents the optimizer from rearranging floating point expression
evaluation. The new option fprotect-parens has the same effect on
parenthesized expressions, forcing the optimizer to respect the parentheses.

Reviewed By: aaron.ballman, kpn

Differential Revision: https://reviews.llvm.org/D100118
2021-06-28 12:26:53 -04:00
Jinsong Ji eb237ffca8 [PowerPC] Add XL Compat fetch builtins
Prototype
```
unsigned int __fetch_and_add (volatile unsigned int* addr, unsigned int
val);
unsigned long __fetch_and_addlp (volatile unsigned long* addr, unsigned
long val);
```
Ref:
https://www.ibm.com/docs/en/xl-c-and-cpp-linux/16.1.1?topic=functions-fetch

Reviewed By: #powerpc, w2yehia, lkail

Differential Revision: https://reviews.llvm.org/D104991
2021-06-28 02:52:32 +00:00
Craig Topper 7a112356e4 [X86] Correct the conversion of VALIGND/Q intrinsics to shufflevector.
We need to mask the immediate to the width of a single vector
rather than 2 vectors. If we use the width of 2 vectors then
any shift larger than the length of 1 vector is going to overflow
the shuffle indices.

Fixes PR50895.
2021-06-26 19:06:00 -07:00
Jinsong Ji f3ef4f5bff [PowerPC] Add XL compat __compare_and_swap builtins
Prototype
int __compare_and_swap (volatile int* addr, int* old_val_addr, int
new_val);

int __compare_and_swaplp (volatile long* addr, long* old_val_addr, long
new_val);

Refer to
https://www.ibm.com/docs/en/xl-c-and-cpp-aix/16.1?topic=functions-compare-swap-compare-swaplp

Reviewed By: w2yehia

Differential Revision: https://reviews.llvm.org/D104837
2021-06-25 01:08:48 +00:00
Bjorn Pettersson 4c7f820b2b Update @llvm.powi to handle different int sizes for the exponent
This can be seen as a follow up to commit 0ee439b705,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seem to have been missing in that patch was to make
sure that the rest of LLVM also handle that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.

The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.

One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.

Differential Revision: https://reviews.llvm.org/D99439
2021-06-17 09:38:28 +02:00
Simon Pilgrim 61cdaf66fe [ADT] Remove APInt/APSInt toString() std::string variants
<string> is currently the highest impact header in a clang+llvm build:

https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html

One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps.

This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code.

Differential Revision: https://reviews.llvm.org/D103888
2021-06-11 13:19:15 +01:00
Bradley Smith 60c9b5f35c [AArch64][SVE] Improve codegen for dupq SVE ACLE intrinsics
Use llvm.experimental.vector.insert instead of storing into an alloca
when generating code for these intrinsics. This defers the codegen of
the generated vector to instruction selection, allowing existing
shufflevector style optimizations to apply.

Additionally, introduce a new target transform that can recognise fixed
predicate patterns in the svbool variants of these intrinsics.

Differential Revision: https://reviews.llvm.org/D103082
2021-06-07 12:21:38 +01:00
Irina Dobrescu 50511df32e [AArch64] Lower bitreverse in ISel
Adding lowering support for bitreverse.

Previously, lowering bitreverse would expand it into a series of other instructions. This patch makes it so this produces a single rbit instruction instead.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D102397
2021-05-17 13:35:27 +01:00
Roman Lebedev a624cec56d
[Clang][Codegen] Do not annotate thunk's this/return types with align/deref/nonnull attrs
As it was discovered in post-commit feedback
for 0aa0458f14,
we handle thunks incorrectly, and end up annotating
their this/return with attributes that are valid
for their callees, not for thunks themselves.

While it would be good to fix this properly,
and keep annotating them on thunks,
i've tried doing that in https://reviews.llvm.org/D100388
with little success, and the patch is stuck for a month now.

So for now, as a stopgap measure, subj.
2021-05-13 20:33:08 +03:00
Ahsan Saghir 25bbff632d [PowerPC] Provide MMA builtins for compatibility
Vector pair intrinsics and builtins were renamed in
https://reviews.llvm.org/D91974 to replace the _mma_ prefix by _vsx_.
However, some projects used the _mma_ version, so this patch adds
these intrinsics to provide compatibility.

Fixes Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=50159

Reviewed By: nemanjai, amyk

Differential Revision: https://reviews.llvm.org/D100482
2021-05-07 09:10:16 -05:00
Nemanja Ivanovic c3da07d216 [PowerPC] Provide fastmath sqrt and div functions in altivec.h
This adds the long overdue implementations of these functions
that have been part of the ABI document and are now part of
the "Power Vector Intrinsic Programming Reference" (PVIPR).

The approach is to add new builtins and to emit code with
the fast flag regardless of whether fastmath was specified
on the command line.

Differential revision: https://reviews.llvm.org/D101209
2021-04-30 19:17:48 -05:00
Ryan Santhirarajan 0395f9e70b [ARM] Neon Polynomial vadd Intrinsic fix
The Neon vadd intrinsics were added to the ARMSIMD intrinsic map,
however due to being defined under an AArch64 guard in arm_neon.td,
were not previously useable on ARM. This change rectifies that.

It is important to note that poly128 is not valid on ARM, thus it was
extracted out of the original arm_neon.td definition and separated
for the sake of AArch64.

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D100772
2021-04-28 11:59:40 -07:00
Levy Hsu 8cf54c7ff5 [RISCV] [1/2] Add IR intrinsic for Zbe extension
RV32/64:
bcompress
bdecompress

RV64 ONLY:
bcompressw
bdecompressw

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101143
2021-04-25 19:14:34 -07:00
Thomas Lively 502f54049d [WebAssembly] Finalize wasm_simd128.h intrinsics
Adds new intrinsics for instructions that are in the final SIMD spec but did not
previously have intrinsics. Also updates the names of existing intrinsics to
reflect the final names of the underlying instructions in the spec. Keeps the
old names as deprecated functions to ease the transition to the new names.

Differential Revision: https://reviews.llvm.org/D101112
2021-04-23 13:37:27 -07:00
Levy Hsu b49337bbb9 [RISCV] [1/2] Add IR intrinsic for Zbp extension
RV32/64:
    grev
    grevi
    gorc
    gorci
    shfl
    shfli
    unshfl
    unshfli

RV64 ONLY:
    grevw
    greviw
    gorcw
    gorciw
    shflw
    shfli     (For non-existing shfliw)
    unshfli   (For non-existing unshfliw)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100830
2021-04-22 16:34:51 -07:00
Thomas Lively 5c729750a6 [WebAssembly] Remove saturating fp-to-int target intrinsics
Use the target-independent @llvm.fptosi and @llvm.fptoui intrinsics instead.
This includes removing the instrinsics for i32x4.trunc_sat_zero_f64x2_{s,u},
which are now represented in IR as a saturating truncation to a v2i32 followed by
a concatenation with a zero vector.

Differential Revision: https://reviews.llvm.org/D100596
2021-04-16 12:11:20 -07:00
Thomas Lively 6a18cc23ef [WebAssembly] Codegen for i64x2.extend_{low,high}_i32x4_{s,u}
Removes the builtins and intrinsics used to opt in to using these instructions
and replaces them with normal ISel patterns now that they are no longer
prototypes.

Differential Revision: https://reviews.llvm.org/D100402
2021-04-14 13:43:09 -07:00
Thomas Lively af7925b4dd [WebAssembly] Codegen for f64x2.convert_low_i32x4_{s,u}
Add a custom DAG combine and ISD opcode for detecting patterns like

  (uint_to_fp (extract_subvector ...))

before the extract_subvector is expanded to ensure that they will ultimately
lower to f64x2.convert_low_i32x4_{s,u} instructions. Since these instructions
are no longer prototypes and can now be produced via standard IR, this commit
also removes the target intrinsics and builtins that had been used to prototype
the instructions.

Differential Revision: https://reviews.llvm.org/D100425
2021-04-14 10:42:45 -07:00
Thomas Lively af7ab81ce3 [WebAssembly] Use standard intrinsics for f32x4 and f64x2 ops
Now that these instructions are no longer prototypes, we do not need to be
careful about keeping them opt-in and can use the standard LLVM infrastructure
for them. This commit removes the bespoke intrinsics we were using to represent
these operations in favor of the corresponding target-independent intrinsics.
The clang builtins are preserved because there is no standard way to easily
represent these operations in C/C++.

For consistency with the scalar codegen in the Wasm backend, the intrinsic used
to represent {f32x4,f64x2}.nearest is @llvm.nearbyint even though
@llvm.roundeven better captures the semantics of the underlying Wasm
instruction. Replacing our use of @llvm.nearbyint with use of @llvm.roundeven is
left to a potential future patch.

Differential Revision: https://reviews.llvm.org/D100411
2021-04-14 09:19:27 -07:00
Yaxun (Sam) Liu 25942d7c49 [AMDGPU] Allow relaxed/consume memory order for atomic inc/dec
Reviewed by: Jon Chesterfield

Differential Revision: https://reviews.llvm.org/D100144
2021-04-09 09:23:41 -04:00
Simon Pilgrim 2901dc7575 Don't directly dereference getAs<> casts to avoid potential null dereferences. NFCI.
Replace with castAs<> which asserts the cast is valid.

Fixes a number of static analyzer warnings.
2021-04-06 12:24:19 +01:00
Craig Topper b4f2e80600 [RISCV] Refactor conversion of B extensions to IR intrinsics a little to reduce clang binary size.
These all pass 1 type to getIntrinsic. So rather than assigning
IntrinsicTypes for each builtin which invokes the SmallVector
constructor, just select the intrinsic ID with a switch and
share a single assignment of IntrinsicTypes.
2021-04-02 23:49:44 -07:00
Levy Hsu f78d932cf2 [RISCV] Add IR intrinsics for Zbc extension
Head files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
clmul
clmulh
clmulr

Differential Revision: https://reviews.llvm.org/D99711
2021-04-02 12:09:13 -07:00
Levy Hsu 944adbf285 Recommit "[RISCV] Add IR intrinsic for Zbb extension"
Forgot to amend the Author.

Original commit message:

Header files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
orc.b

Differential Revision: https://reviews.llvm.org/D99320
2021-04-02 11:50:19 -07:00
Craig Topper 1f0b309f24 Revert "[RISCV] Add IR intrinsic for Zbb extension"
This reverts commit 1808194590.

I forgot to change the author.
2021-04-02 11:47:02 -07:00
Craig Topper 1808194590 [RISCV] Add IR intrinsic for Zbb extension
Header files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
orc.b
2021-04-02 11:23:57 -07:00
Levy Hsu b001d574d7 [RISCV] Add IR intrinsic for Zbr extension
Implementation for RISC-V Zbr extension intrinsic.

Header files are included in separate patch in case the name needs to be changed

RV32 / 64:
        crc32b
        crc32h
        crc32w
        crc32cb
        crc32ch
        crc32cw

RV64 Only:
        crc32d
        crc32cd

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99009
2021-04-02 10:58:45 -07:00
Thomas Lively 45783d0e8a [WebAssembly] Implement i64x2 comparisons
Removes the prototype builtin and intrinsic for i64x2.eq and implements that
instruction as well as the other i64x2 comparison instructions in the final SIMD
spec. Unsigned comparisons were not included in the final spec, so they still
need to be scalarized via a custom lowering.

Differential Revision: https://reviews.llvm.org/D99623
2021-03-31 10:46:17 -07:00
Yaxun (Sam) Liu cc9477166a [CUDA][HIP] add __builtin_get_device_side_mangled_name
Add builtin function __builtin_get_device_side_mangled_name
to get device side manged name for functions and global
variables, which can be used to get symbol address of kernels
or variables by mangled name in dynamically loaded
bundled code objects at run time.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D99301
2021-03-25 15:25:29 -04:00
Nemanja Ivanovic 4020932706 [PowerPC] Make altivec.h work with AIX which has no __int128
There are a number of functions in altivec.h that use
vector __int128 which isn't supported on AIX. Those functions
need to be guarded for targets that don't support the type.
Furthermore, the functions that produce quadword instructions
without using the type need a builtin. This patch adds the
macro guards to altivec.h using the __SIZEOF_INT128__ which
is only defined on targets that support the __int128 type.
2021-03-24 00:35:51 -05:00
Thomas Lively f5764a8654 [WebAssembly] Finalize SIMD names and opcodes
Updates the names (e.g. widen => extend, saturate => sat) and opcodes of all
SIMD instructions to match the finalized SIMD spec. Deliberately does not change
the public interface in wasm_simd128.h yet; that will require more care.

Depends on D98466.

Differential Revision: https://reviews.llvm.org/D98676
2021-03-18 11:21:25 -07:00
Thomas Lively 2f2ae08da9 [WebAssembly] Remove experimental SIMD instructions
Removes the instruction definitions, intrinsics, and builtins for qfma/qfms,
signselect, and prefetch instructions, which were not included in the final
WebAssembly SIMD spec.

Depends on D98457.

Differential Revision: https://reviews.llvm.org/D98466
2021-03-18 11:21:24 -07:00
Bradley Smith cf0da91ba5 [AArch64][SVE/NEON] Add support for FROUNDEVEN for both NEON and fixed length SVE
Previously NEON used a target specific intrinsic for frintn, given that
the FROUNDEVEN ISD node now exists, move over to that instead and add
codegen support for that node for both NEON and fixed length SVE.

Differential Revision: https://reviews.llvm.org/D98487
2021-03-17 11:41:22 +00:00
Amy Huang f5352dd9da Emit inline implementation of __builtin__wmemchr on MSVCRT platforms.
The MSVC runtime library doesn't have a definition for wmemchr,
so provide an inline implementation.

Differential Revision: https://reviews.llvm.org/D98472
2021-03-15 15:30:55 -07:00
Stelios Ioannou ab86edbc88 [AArch64] Implement __rndr, __rndrrs intrinsics
This patch implements the __rndr and __rndrrs intrinsics to provide access to the random
number instructions introduced in Armv8.5-A. They are only defined for the AArch64
execution state and are available when __ARM_FEATURE_RNG is defined.

These intrinsics store the random number in their pointer argument and return a status
code if the generation succeeded. The difference between __rndr __rndrrs, is that the latter
intrinsic reseeds the random number generator.

The instructions write the NZCV flags indicating the success of the operation that we can
then read with a CSET.

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
[2] https://bugs.llvm.org/show_bug.cgi?id=47838

Differential Revision: https://reviews.llvm.org/D98264

Change-Id: I8f92e7bf5b450e5da3e59943b53482edf0df6efc
2021-03-15 17:51:48 +00:00
Thomas Preud'homme f60b35340f Stop traping on sNaN in __builtin_isinf
__builtin_isinf currently generates a floating-point compare operation
which triggers a trap when faced with a signaling NaN in StrictFP mode.
This commit uses integer operations instead to not generate any trap in
such a case.

Reviewed By: mibintc

Differential Revision: https://reviews.llvm.org/D97125
2021-03-15 15:38:08 +00:00
Nikita Popov 42eb658f65 [OpaquePtrs] Remove some uses of type-less CreateGEP() (NFC)
This removes some (but not all) uses of type-less CreateGEP()
and CreateInBoundsGEP() APIs, which are incompatible with opaque
pointers.

There are a still a number of tricky uses left, as well as many
more variation APIs for CreateGEP.
2021-03-12 21:01:16 +01:00
Nikita Popov 46354bac76 [OpaquePtrs] Remove some uses of type-less CreateLoad APIs (NFC)
Explicitly pass loaded type when creating loads, in preparation
for the deprecation of these APIs.

There are still a couple of uses left.
2021-03-11 14:40:57 +01:00
Nikita Popov 68e01339cc [CGBuilder] Remove type-less CreateAlignedLoad() APIs (NFC)
These are incompatible with opaque pointers. This is in preparation
of dropping this API on the IRBuilder side as well.

Instead explicitly pass the loaded type.
2021-03-11 10:41:23 +01:00