Commit Graph

1464 Commits

Author SHA1 Message Date
Simon Pilgrim 61cdaf66fe [ADT] Remove APInt/APSInt toString() std::string variants
<string> is currently the highest impact header in a clang+llvm build:

https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html

One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps.

This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code.

Differential Revision: https://reviews.llvm.org/D103888
2021-06-11 13:19:15 +01:00
Bradley Smith 60c9b5f35c [AArch64][SVE] Improve codegen for dupq SVE ACLE intrinsics
Use llvm.experimental.vector.insert instead of storing into an alloca
when generating code for these intrinsics. This defers the codegen of
the generated vector to instruction selection, allowing existing
shufflevector style optimizations to apply.

Additionally, introduce a new target transform that can recognise fixed
predicate patterns in the svbool variants of these intrinsics.

Differential Revision: https://reviews.llvm.org/D103082
2021-06-07 12:21:38 +01:00
Irina Dobrescu 50511df32e [AArch64] Lower bitreverse in ISel
Adding lowering support for bitreverse.

Previously, lowering bitreverse would expand it into a series of other instructions. This patch makes it so this produces a single rbit instruction instead.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D102397
2021-05-17 13:35:27 +01:00
Roman Lebedev a624cec56d
[Clang][Codegen] Do not annotate thunk's this/return types with align/deref/nonnull attrs
As it was discovered in post-commit feedback
for 0aa0458f14,
we handle thunks incorrectly, and end up annotating
their this/return with attributes that are valid
for their callees, not for thunks themselves.

While it would be good to fix this properly,
and keep annotating them on thunks,
i've tried doing that in https://reviews.llvm.org/D100388
with little success, and the patch is stuck for a month now.

So for now, as a stopgap measure, subj.
2021-05-13 20:33:08 +03:00
Ahsan Saghir 25bbff632d [PowerPC] Provide MMA builtins for compatibility
Vector pair intrinsics and builtins were renamed in
https://reviews.llvm.org/D91974 to replace the _mma_ prefix by _vsx_.
However, some projects used the _mma_ version, so this patch adds
these intrinsics to provide compatibility.

Fixes Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=50159

Reviewed By: nemanjai, amyk

Differential Revision: https://reviews.llvm.org/D100482
2021-05-07 09:10:16 -05:00
Nemanja Ivanovic c3da07d216 [PowerPC] Provide fastmath sqrt and div functions in altivec.h
This adds the long overdue implementations of these functions
that have been part of the ABI document and are now part of
the "Power Vector Intrinsic Programming Reference" (PVIPR).

The approach is to add new builtins and to emit code with
the fast flag regardless of whether fastmath was specified
on the command line.

Differential revision: https://reviews.llvm.org/D101209
2021-04-30 19:17:48 -05:00
Ryan Santhirarajan 0395f9e70b [ARM] Neon Polynomial vadd Intrinsic fix
The Neon vadd intrinsics were added to the ARMSIMD intrinsic map,
however due to being defined under an AArch64 guard in arm_neon.td,
were not previously useable on ARM. This change rectifies that.

It is important to note that poly128 is not valid on ARM, thus it was
extracted out of the original arm_neon.td definition and separated
for the sake of AArch64.

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D100772
2021-04-28 11:59:40 -07:00
Levy Hsu 8cf54c7ff5 [RISCV] [1/2] Add IR intrinsic for Zbe extension
RV32/64:
bcompress
bdecompress

RV64 ONLY:
bcompressw
bdecompressw

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101143
2021-04-25 19:14:34 -07:00
Thomas Lively 502f54049d [WebAssembly] Finalize wasm_simd128.h intrinsics
Adds new intrinsics for instructions that are in the final SIMD spec but did not
previously have intrinsics. Also updates the names of existing intrinsics to
reflect the final names of the underlying instructions in the spec. Keeps the
old names as deprecated functions to ease the transition to the new names.

Differential Revision: https://reviews.llvm.org/D101112
2021-04-23 13:37:27 -07:00
Levy Hsu b49337bbb9 [RISCV] [1/2] Add IR intrinsic for Zbp extension
RV32/64:
    grev
    grevi
    gorc
    gorci
    shfl
    shfli
    unshfl
    unshfli

RV64 ONLY:
    grevw
    greviw
    gorcw
    gorciw
    shflw
    shfli     (For non-existing shfliw)
    unshfli   (For non-existing unshfliw)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100830
2021-04-22 16:34:51 -07:00
Thomas Lively 5c729750a6 [WebAssembly] Remove saturating fp-to-int target intrinsics
Use the target-independent @llvm.fptosi and @llvm.fptoui intrinsics instead.
This includes removing the instrinsics for i32x4.trunc_sat_zero_f64x2_{s,u},
which are now represented in IR as a saturating truncation to a v2i32 followed by
a concatenation with a zero vector.

Differential Revision: https://reviews.llvm.org/D100596
2021-04-16 12:11:20 -07:00
Thomas Lively 6a18cc23ef [WebAssembly] Codegen for i64x2.extend_{low,high}_i32x4_{s,u}
Removes the builtins and intrinsics used to opt in to using these instructions
and replaces them with normal ISel patterns now that they are no longer
prototypes.

Differential Revision: https://reviews.llvm.org/D100402
2021-04-14 13:43:09 -07:00
Thomas Lively af7925b4dd [WebAssembly] Codegen for f64x2.convert_low_i32x4_{s,u}
Add a custom DAG combine and ISD opcode for detecting patterns like

  (uint_to_fp (extract_subvector ...))

before the extract_subvector is expanded to ensure that they will ultimately
lower to f64x2.convert_low_i32x4_{s,u} instructions. Since these instructions
are no longer prototypes and can now be produced via standard IR, this commit
also removes the target intrinsics and builtins that had been used to prototype
the instructions.

Differential Revision: https://reviews.llvm.org/D100425
2021-04-14 10:42:45 -07:00
Thomas Lively af7ab81ce3 [WebAssembly] Use standard intrinsics for f32x4 and f64x2 ops
Now that these instructions are no longer prototypes, we do not need to be
careful about keeping them opt-in and can use the standard LLVM infrastructure
for them. This commit removes the bespoke intrinsics we were using to represent
these operations in favor of the corresponding target-independent intrinsics.
The clang builtins are preserved because there is no standard way to easily
represent these operations in C/C++.

For consistency with the scalar codegen in the Wasm backend, the intrinsic used
to represent {f32x4,f64x2}.nearest is @llvm.nearbyint even though
@llvm.roundeven better captures the semantics of the underlying Wasm
instruction. Replacing our use of @llvm.nearbyint with use of @llvm.roundeven is
left to a potential future patch.

Differential Revision: https://reviews.llvm.org/D100411
2021-04-14 09:19:27 -07:00
Yaxun (Sam) Liu 25942d7c49 [AMDGPU] Allow relaxed/consume memory order for atomic inc/dec
Reviewed by: Jon Chesterfield

Differential Revision: https://reviews.llvm.org/D100144
2021-04-09 09:23:41 -04:00
Simon Pilgrim 2901dc7575 Don't directly dereference getAs<> casts to avoid potential null dereferences. NFCI.
Replace with castAs<> which asserts the cast is valid.

Fixes a number of static analyzer warnings.
2021-04-06 12:24:19 +01:00
Craig Topper b4f2e80600 [RISCV] Refactor conversion of B extensions to IR intrinsics a little to reduce clang binary size.
These all pass 1 type to getIntrinsic. So rather than assigning
IntrinsicTypes for each builtin which invokes the SmallVector
constructor, just select the intrinsic ID with a switch and
share a single assignment of IntrinsicTypes.
2021-04-02 23:49:44 -07:00
Levy Hsu f78d932cf2 [RISCV] Add IR intrinsics for Zbc extension
Head files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
clmul
clmulh
clmulr

Differential Revision: https://reviews.llvm.org/D99711
2021-04-02 12:09:13 -07:00
Levy Hsu 944adbf285 Recommit "[RISCV] Add IR intrinsic for Zbb extension"
Forgot to amend the Author.

Original commit message:

Header files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
orc.b

Differential Revision: https://reviews.llvm.org/D99320
2021-04-02 11:50:19 -07:00
Craig Topper 1f0b309f24 Revert "[RISCV] Add IR intrinsic for Zbb extension"
This reverts commit 1808194590.

I forgot to change the author.
2021-04-02 11:47:02 -07:00
Craig Topper 1808194590 [RISCV] Add IR intrinsic for Zbb extension
Header files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
orc.b
2021-04-02 11:23:57 -07:00
Levy Hsu b001d574d7 [RISCV] Add IR intrinsic for Zbr extension
Implementation for RISC-V Zbr extension intrinsic.

Header files are included in separate patch in case the name needs to be changed

RV32 / 64:
        crc32b
        crc32h
        crc32w
        crc32cb
        crc32ch
        crc32cw

RV64 Only:
        crc32d
        crc32cd

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99009
2021-04-02 10:58:45 -07:00
Thomas Lively 45783d0e8a [WebAssembly] Implement i64x2 comparisons
Removes the prototype builtin and intrinsic for i64x2.eq and implements that
instruction as well as the other i64x2 comparison instructions in the final SIMD
spec. Unsigned comparisons were not included in the final spec, so they still
need to be scalarized via a custom lowering.

Differential Revision: https://reviews.llvm.org/D99623
2021-03-31 10:46:17 -07:00
Yaxun (Sam) Liu cc9477166a [CUDA][HIP] add __builtin_get_device_side_mangled_name
Add builtin function __builtin_get_device_side_mangled_name
to get device side manged name for functions and global
variables, which can be used to get symbol address of kernels
or variables by mangled name in dynamically loaded
bundled code objects at run time.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D99301
2021-03-25 15:25:29 -04:00
Nemanja Ivanovic 4020932706 [PowerPC] Make altivec.h work with AIX which has no __int128
There are a number of functions in altivec.h that use
vector __int128 which isn't supported on AIX. Those functions
need to be guarded for targets that don't support the type.
Furthermore, the functions that produce quadword instructions
without using the type need a builtin. This patch adds the
macro guards to altivec.h using the __SIZEOF_INT128__ which
is only defined on targets that support the __int128 type.
2021-03-24 00:35:51 -05:00
Thomas Lively f5764a8654 [WebAssembly] Finalize SIMD names and opcodes
Updates the names (e.g. widen => extend, saturate => sat) and opcodes of all
SIMD instructions to match the finalized SIMD spec. Deliberately does not change
the public interface in wasm_simd128.h yet; that will require more care.

Depends on D98466.

Differential Revision: https://reviews.llvm.org/D98676
2021-03-18 11:21:25 -07:00
Thomas Lively 2f2ae08da9 [WebAssembly] Remove experimental SIMD instructions
Removes the instruction definitions, intrinsics, and builtins for qfma/qfms,
signselect, and prefetch instructions, which were not included in the final
WebAssembly SIMD spec.

Depends on D98457.

Differential Revision: https://reviews.llvm.org/D98466
2021-03-18 11:21:24 -07:00
Bradley Smith cf0da91ba5 [AArch64][SVE/NEON] Add support for FROUNDEVEN for both NEON and fixed length SVE
Previously NEON used a target specific intrinsic for frintn, given that
the FROUNDEVEN ISD node now exists, move over to that instead and add
codegen support for that node for both NEON and fixed length SVE.

Differential Revision: https://reviews.llvm.org/D98487
2021-03-17 11:41:22 +00:00
Amy Huang f5352dd9da Emit inline implementation of __builtin__wmemchr on MSVCRT platforms.
The MSVC runtime library doesn't have a definition for wmemchr,
so provide an inline implementation.

Differential Revision: https://reviews.llvm.org/D98472
2021-03-15 15:30:55 -07:00
Stelios Ioannou ab86edbc88 [AArch64] Implement __rndr, __rndrrs intrinsics
This patch implements the __rndr and __rndrrs intrinsics to provide access to the random
number instructions introduced in Armv8.5-A. They are only defined for the AArch64
execution state and are available when __ARM_FEATURE_RNG is defined.

These intrinsics store the random number in their pointer argument and return a status
code if the generation succeeded. The difference between __rndr __rndrrs, is that the latter
intrinsic reseeds the random number generator.

The instructions write the NZCV flags indicating the success of the operation that we can
then read with a CSET.

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
[2] https://bugs.llvm.org/show_bug.cgi?id=47838

Differential Revision: https://reviews.llvm.org/D98264

Change-Id: I8f92e7bf5b450e5da3e59943b53482edf0df6efc
2021-03-15 17:51:48 +00:00
Thomas Preud'homme f60b35340f Stop traping on sNaN in __builtin_isinf
__builtin_isinf currently generates a floating-point compare operation
which triggers a trap when faced with a signaling NaN in StrictFP mode.
This commit uses integer operations instead to not generate any trap in
such a case.

Reviewed By: mibintc

Differential Revision: https://reviews.llvm.org/D97125
2021-03-15 15:38:08 +00:00
Nikita Popov 42eb658f65 [OpaquePtrs] Remove some uses of type-less CreateGEP() (NFC)
This removes some (but not all) uses of type-less CreateGEP()
and CreateInBoundsGEP() APIs, which are incompatible with opaque
pointers.

There are a still a number of tricky uses left, as well as many
more variation APIs for CreateGEP.
2021-03-12 21:01:16 +01:00
Nikita Popov 46354bac76 [OpaquePtrs] Remove some uses of type-less CreateLoad APIs (NFC)
Explicitly pass loaded type when creating loads, in preparation
for the deprecation of these APIs.

There are still a couple of uses left.
2021-03-11 14:40:57 +01:00
Nikita Popov 68e01339cc [CGBuilder] Remove type-less CreateAlignedLoad() APIs (NFC)
These are incompatible with opaque pointers. This is in preparation
of dropping this API on the IRBuilder side as well.

Instead explicitly pass the loaded type.
2021-03-11 10:41:23 +01:00
Zakk Chen d6a0560bf2 [Clang][RISCV] Add custom TableGen backend for riscv-vector intrinsics.
Demonstrate how to generate vadd/vfadd intrinsic functions

1. add -gen-riscv-vector-builtins for clang builtins.
2. add -gen-riscv-vector-builtin-codegen for clang codegen.
3. add -gen-riscv-vector-header for riscv_vector.h. It also generates
ifdef directives with extension checking, base on D94403.
4. add -gen-riscv-vector-generic-header for riscv_vector_generic.h.
Generate overloading version Header for generic api.
https://github.com/riscv/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#c11-generic-interface
5. update tblgen doc for riscv related options.

riscv_vector.td also defines some unused type transformers for vadd,
because I think it could demonstrate how tranfer type work and we need
them for the whole intrinsic functions implementation in the future.

Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>

Reviewed By: jrtc27, craig.topper, HsiangKai, Jim, Paul-C-Anagnostopoulos

Differential Revision: https://reviews.llvm.org/D95016
2021-03-10 18:43:43 -08:00
Jingu Kang 25951c5ab8 [AArch64] Add missing intrinsics for scalar FP rounding
Differential Revision: https://reviews.llvm.org/D98269
2021-03-10 13:22:29 +00:00
Jingu Kang 9b302513f6 [AArch64] Add missing intrinsics for vrnd 2021-03-05 11:26:12 +00:00
Thomas Preud'homme b7aeece47c Revert "Stop traping on sNaN in __builtin_isinf"
This reverts commit 1b6eb56aa0 because the
invert logic for isfinite is incorrect.
2021-03-04 12:07:35 +00:00
Soumi Manna eec7f8f7b1 [WebAssembly] Add missing default cases in switch statements
unsigned variable 'IntNo' has been declared but not been defined inside function
EmitWebAssemblyBuiltinExpr().

static code analysis tool complains about uninitialized variable "IntNo" since
this enters to default branch without setting any intrinsics and calls Function
*Callee = CGM.getIntrinsic(IntNo).

This patch fixes the problem by adding default cases in switch statements.
2021-03-03 13:15:23 -08:00
Thomas Preud'homme 1b6eb56aa0 Stop traping on sNaN in __builtin_isinf
__builtin_isinf currently generates a floating-point compare operation
which triggers a trap when faced with a signaling NaN in StrictFP mode.
This commit uses integer operations instead to not generate any trap in
such a case.

Reviewed By: mibintc

Differential Revision: https://reviews.llvm.org/D97125
2021-03-02 15:54:56 +00:00
Hsiangkai Wang 1a35a1b074 [RISCV] Add vadd with mask and without mask builtin.
Demonstrate how to add RISC-V V builtins and lower them to IR intrinsics for V extension.

Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com>

Differential Revision: https://reviews.llvm.org/D93446
2021-02-24 07:57:31 +08:00
Ryan Santhiraraja 2c25efcbd3 [AArch64] Adding SHA3 Intrinsics support
This patch adds the following SHA3 Intrinsics:
        vsha512hq_u64,
        vsha512h2q_u64,
        vsha512su0q_u64,
        vsha512su1q_u64
        veor3q_u8
        veor3q_u16
        veor3q_u32
        veor3q_u64
        veor3q_s8
        veor3q_s16
        veor3q_s32
        veor3q_s64
        vrax1q_u64
        vxarq_u64
        vbcaxq_u8
        vbcaxq_u16
        vbcaxq_u32
        vbcaxq_u64
        vbcaxq_s8
        vbcaxq_s16
        vbcaxq_s32
        vbcaxq_s64

    Note need to include +sha3 and +crypto when building from the front-end

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D96381
2021-02-22 12:09:20 +00:00
Christopher Tetreault 55448ab540 [AArch64] Adding Neon Polynomial vadd Intrinsics
This patch adds the following intrinsics:
            vadd_p8
            vadd_p16
            vadd_p64
            vaddq_p8
            vaddq_p16
            vaddq_p64
            vaddq_p128

Reviewed By: t.p.northover, DavidSpickett, ctetreau

Differential Revision: https://reviews.llvm.org/D96825
2021-02-19 14:48:12 -08:00
Pengxuan Zheng 0ec32f1326 Revert "[AArch64] Adding Neon Polynomial vadd Intrinsics"
Revert the patch due to buildbot failures.

This reverts commit d9645059c5.
2021-02-18 12:38:16 -08:00
Pengxuan Zheng d9645059c5 [AArch64] Adding Neon Polynomial vadd Intrinsics
This patch adds the following intrinsics:
            vadd_p8
            vadd_p16
            vadd_p64
            vaddq_p8
            vaddq_p16
            vaddq_p64
            vaddq_p128

Reviewed By: t.p.northover, DavidSpickett

Differential Revision: https://reviews.llvm.org/D96825
2021-02-18 11:33:24 -08:00
Jonas Paulsson e57bd1ff4f [CFE, SystemZ] New target hook testFPKind() for checks of FP values.
The recent commit 00a6254 "Stop traping on sNaN in builtin_isnan" changed the
lowering in constrained FP mode of builtin_isnan from an FP comparison to
integer operations to avoid trapping.

SystemZ has a special instruction "Test Data Class" which is the preferred
way to do this check. This patch adds a new target hook "testFPKind()" that
lets SystemZ emit the s390_tdc intrinsic instead.

testFPKind() takes the BuiltinID as an argument and is expected to soon
handle more opcodes than just 'builtin_isnan'.

Review: Thomas Preud'homme, Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D96568
2021-02-18 12:36:46 -06:00
Wang, Pengfei 61da20575d [X86] Convert fmin/fmax _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506)
This is a follow up of D92940.

We have successfully converted fadd/fmul _mm_reduce_* intrinsics to
llvm.reduction + reassoc flag. We can do the same approach for fmin/fmax
too, i.e. llvm.reduction + nnan flag.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D93179
2021-02-15 08:52:06 +08:00
Pengxuan Zheng 61cca0f2e5 [AArch64] Adding Neon Sm3 & Sm4 Intrinsics
This adds SM3 and SM4 Intrinsics support for AArch64, specifically:
        vsm3ss1q_u32
        vsm3tt1aq_u32
        vsm3tt1bq_u32
        vsm3tt2aq_u32
        vsm3tt2bq_u32
        vsm3partw1q_u32
        vsm3partw2q_u32
        vsm4eq_u32
        vsm4ekeyq_u32

Reviewed By: labrinea

Differential Revision: https://reviews.llvm.org/D95655
2021-02-11 14:20:20 -08:00
Wang, Pengfei dd2460ed5d [X86] Always assign reassoc flag for intrinsics *reduce_add/mul_ps/pd.
Intrinsics *reduce_add/mul_ps/pd have assumption that the elements in
the vector are reassociable. So we need to always assign the reassoc
flag when we call _mm_reduce_* intrinsics.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D96231
2021-02-09 21:14:06 +08:00
Thomas Preud'homme 00a62547da Stop traping on sNaN in __builtin_isnan
__builtin_isnan currently generates a floating-point compare operation
which triggers a trap when faced with a signaling NaN in StrictFP mode.
This commit uses integer operations instead to not generate any trap in
such a case.

Reviewed By: kpn

Differential Revision: https://reviews.llvm.org/D95948
2021-02-05 18:28:48 +00:00
Kevin P. Neal 81b69879c9 [FPEnv][X86] Platform builtins edition: clang should get from the AST the metadata for constrained FP builtins
Currently clang is not correctly retrieving from the AST the metadata for
constrained FP builtins. This patch fixes that for the X86 specific builtins.

Differential Revision: https://reviews.llvm.org/D94614
2021-02-03 11:49:17 -05:00
Thomas Lively 4b68b64dcc [WebAssembly] Prototype i8x16 to i32x4 widening instructions
As proposed in https://github.com/WebAssembly/simd/pull/395 and matching the
opcodes used in V8:
https://chromium-review.googlesource.com/c/v8/v8/+/2617385/4/src/wasm/wasm-opcodes.h

Differential Revision: https://reviews.llvm.org/D95557
2021-01-28 10:59:32 -08:00
Thomas Lively 11802eced5 [WebAssembly] Prototype new f64x2 conversions
As proposed in https://github.com/WebAssembly/simd/pull/383.

Differential Revision: https://reviews.llvm.org/D95012
2021-01-20 11:28:06 -08:00
Qiu Chaofan 168be42083 [Clang] Mutate long-double math builtins into f128 under IEEE-quad
Under -mabi=ieeelongdouble on PowerPC, IEEE-quad floating point semantic
is used for long double. This patch mutates call to related builtins
into f128 version on PowerPC. And in theory, this should be applied to
other targets when their backend supports IEEE 128-bit style libcalls.

GCC already has these mutations except nansl, which is not available on
PowerPC along with other variants (nans, nansf).

Reviewed By: RKSimon, nemanjai

Differential Revision: https://reviews.llvm.org/D92080
2021-01-15 16:56:20 +08:00
Lucas Prates 2b1e25befe [AArch64] Adding ACLE intrinsics for the LS64 extension
This introduces the ARMv8.7-A LS64 extension's intrinsics for 64 bytes
atomic loads and stores: `__arm_ld64b`, `__arm_st64b`, `__arm_st64bv`,
and `__arm_st64bv0`. These are selected into the LS64 instructions
LD64B, ST64B, ST64BV and ST64BV0, respectively.

Based on patches written by Simon Tatham.

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D93232
2021-01-14 09:43:58 +00:00
Heejin Ahn 7be271537e [WebAssembly] Rename wasm_rethrow_in_catch intrinsic/builtin
`wasm_rethrow_in_catch` intrinsic and builtin are used in order to
rethrow an exception when the exception is caught but there is no
matching clause within the current `catch`. For example,
```
try {
  foo();
} catch (int n) {
  ...
}
```
If the caught exception does not correspond to C++ `int` type, it should
be rethrown. These intrinsic/builtin were renamed `rethrow_in_catch`
because at the time I thought there would be another intrinsic for C++'s
`throw` keyword, which rethrows an exception. It turned out that `throw`
keyword doesn't require wasm's `rethrow` instruction, so we rename
`rethrow_in_catch` to just `rethrow` here.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D94038
2021-01-08 06:55:04 -08:00
Wang, Pengfei c102b9697b [X86] Correct the comments about comparison intrinsics. NFCI. 2021-01-08 15:36:15 +08:00
Thomas Lively 497026c902 [WebAssembly] Prototype prefetch instructions
As proposed in https://github.com/WebAssembly/simd/pull/352 and using the
opcodes used in the V8 prototype:
https://chromium-review.googlesource.com/c/v8/v8/+/2543167. These instructions
are only usable via intrinsics and clang builtins to make them opt-in while they
are being benchmarked.

Differential Revision: https://reviews.llvm.org/D93883
2021-01-05 11:32:03 -08:00
Thorsten Schütt 2fd11e0b1e Revert "[NFC, Refactor] Modernize StorageClass from Specifiers.h to a scoped enum (II)"
This reverts commit efc82c4ad2.
2021-01-04 23:17:45 +01:00
Thorsten Schütt efc82c4ad2 [NFC, Refactor] Modernize StorageClass from Specifiers.h to a scoped enum (II)
Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D93765
2021-01-04 22:58:26 +01:00
Brandon Bergren 6cee9d0cf8 [PowerPC] Support powerpcle target in Clang [3/5]
Add powerpcle support to clang.

For FreeBSD, assume a freestanding environment for now, as we only need it in the first place to build loader, which runs in the OpenFirmware environment instead of the FreeBSD environment.

For Linux, recognize glibc and musl environments to match current usage in Void Linux PPC.

Adjust driver to match current binutils behavior regarding machine naming.

Adjust and expand tests.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D93919
2021-01-02 12:17:58 -06:00
Juneyoung Lee 9b29610228 Use unary CreateShuffleVector if possible
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.

Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)

The order is swapped, but in terms of correctness it is still fine.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D93923
2020-12-30 22:36:08 +09:00
Thomas Lively 5e09e9979b [WebAssembly] Prototype extending pairwise add instructions
As proposed in https://github.com/WebAssembly/simd/pull/380. This commit makes
the new instructions available only via clang builtins and LLVM intrinsics to
make their use opt-in while they are still being evaluated for inclusion in the
SIMD proposal.

Depends on D93771.

Differential Revision: https://reviews.llvm.org/D93775
2020-12-28 14:11:14 -08:00
Tom Stellard 3203143f13 CodeGen: Improve generated IR for __builtin_mul_overflow(uint, uint, int)
Add a special case for handling __builtin_mul_overflow with unsigned
inputs and a signed output to avoid emitting the __muloti4 library
call on x86_64.  __muloti4 is not implemented in libgcc, so avoiding
this call fixes compilation of some programs that call
__builtin_mul_overflow with these arguments.

For example, this fixes the build of cpio with clang, which includes code from
gnulib that calls __builtin_mul_overflow with these argument types.

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D84405
2020-12-17 14:30:31 -08:00
Baptiste Saleil c2892978e9 [PowerPC] Rename the vector pair intrinsics and builtins to replace the _mma_ prefix by _vsx_
On PPC, the vector pair instructions are independent from MMA.
This patch renames the vector pair LLVM intrinsics and Clang builtins to replace the _mma_ prefix by _vsx_ in their names.
We also move the vector pair type/intrinsic/builtin tests to their own files.

Differential Revision: https://reviews.llvm.org/D91974
2020-12-17 13:19:27 -05:00
Simon Pilgrim 4855a1004d [X86] Convert fadd/fmul _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506)
Followup to D87604, having confirmed on PR47506 that we can use the llvm codegen expansion for fadd/fmul as well.

Differential Revision: https://reviews.llvm.org/D92940
2020-12-13 15:37:35 +00:00
Melanie Blower 320af6b138 Create SPIRABIInfo to enable SPIR_FUNC calling convention.
Background: Call to library arithmetic functions for div is emitted by the
compiler and it set wrong “C” calling convention for calls to these functions,
whereas library functions are declared with `spir_function` calling convention.
InstCombine optimization replaces such calls with “unreachable” instruction.
It looks like clang lacks SPIRABIInfo class which should specify default
calling conventions for “system” function calls. SPIR supports only
SPIR_FUNC and SPIR_KERNEL calling convention.

Reviewers: Erich Keane, Anastasia

Differential Revision: https://reviews.llvm.org/D92721
2020-12-12 05:48:20 -08:00
Florian Hahn 9c4cddb53a
[Clang] Add vcmla and rotated variants for Arm ACLE.
This patch adds vcmla and the rotated variants as defined in
"Arm Neon Intrinsics Reference for ACLE Q3 2020" [1]

The *_lane_* are still missing, but they can be added separately.

This patch only adds the builtin mapping for AArch64.

[1] https://developer.arm.com/documentation/ihi0073/latest

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D92930
2020-12-10 16:54:08 +00:00
Kevin P. Neal abfbc5579b [FPEnv] clang should get from the AST the metadata for constrained FP builtins
Currently clang is not correctly retrieving from the AST the metadata for
constrained FP builtins. This patch fixes that for the non-target specific
builtins.

Differential Revision: https://reviews.llvm.org/D92122
2020-11-30 11:59:37 -05:00
Reid Kleckner 1e843a987d [MS] Add more 128bit cmpxchg intrinsics for AArch64
The MSVC STL for requires this on ARM64.
Requested in https://llvm.org/pr47099

Depends on D92061

Differential Revision: https://reviews.llvm.org/D92062
2020-11-25 12:07:28 -08:00
Reid Kleckner 3bd0672726 [MS] Fix double evaluation of MSVC builtin arguments
This code got quite twisted because we consider some MSVC builtins to be
target agnostic, and some to be target specific. Target specific
intrinsics have a pattern of doing up-front argument evaluation, while
general intrinsics do not evaluate their arguments up front. As we tried
to share codepaths between the target-specific and target-agnostic
handling, we ended up doing double evaluation.

Instead, have each target handle MSVC intrinsics consistently before up
front argument evaluation. This requires passing less data around and is
more consistent with target independent intrinsic handling.

See D50979 for past examples of this bug. I noticed this while looking
into adding some more intrinsics.

Differential Revision: https://reviews.llvm.org/D92061
2020-11-25 11:55:01 -08:00
Florian Hahn ca2e7e5999 [IRGen] Add !annotation metadata for auto-init stores.
This patch updates Clang's IRGen to add !annotation nodes with an
"auto-init" annotation to all stores for auto-initialization.

As discussed in 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)
this allows using optimization remarks to track down where auto-init
code was inserted (and not removed by optimizations).

There are a few cases in the tests where !annotation gets dropped by
optimizations. Those optimizations will be updated in subsequent
patches.

This patch is based on a patch by Francis Visoiu Mistrih.

Reviewed By: thegameg, paquette

Differential Revision: https://reviews.llvm.org/D91417
2020-11-16 10:37:02 +00:00
Mehdi Amini 42e88bd6b1 Replace sequences of v.push_back(v[i]); v.erase(&v[i]); with std::rotate (NFC)
The code has a few sequence that looked like:

    Ops.push_back(Ops[0]);
    Ops.erase(Ops.begin());

And are equivalent to:

    std::rotate(Ops.begin(), Ops.begin() + 1, Ops.end());

The latter has the advantage of never reallocating the vector, which
would be a bug in the original code as push_back would read from the
memory it deallocated.
2020-11-14 00:55:33 +00:00
Heejin Ahn 902ea588ea [WebAssembly] Rename atomic.notify and *.atomic.wait
- atomic.notify -> memory.atomic.notify
- i32.atomic.wait -> memory.atomic.wait32
- i64.atomic.wait -> memory.atomic.wait64

See https://github.com/WebAssembly/threads/pull/149.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D91447
2020-11-13 12:04:48 -08:00
Baptiste Saleil 3f78605a8c [PowerPC] Add paired vector load and store builtins and intrinsics
This patch adds the Clang builtins and LLVM intrinsics to load and store vector pairs.

Differential Revision: https://reviews.llvm.org/D90799
2020-11-13 12:35:10 -06:00
Qiu Chaofan 7faf62a80b [Clang] Add more fp128 math library function builtins
Since glibc has supported math library functions conforming IEEE 128-bit
floating point types on some platform (like ppc64le), we can fix clang's
math builtins missing this type.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D90593
2020-11-04 17:58:42 +08:00
Baptiste Saleil daa127d77e [PowerPC] Add MMA builtin decoding and definitions
Add MMA builtin decoding. These builtins use the new PowerPC-specific types __vector_pair and __vector_quad.
So to avoid pervasive changes, we use custom type descriptors and custom decoding for these builtins.
We also use custom code generation to expand builtin calls with pointers to simpler intrinsic calls with non-pointer types.

Differential Revision: https://reviews.llvm.org/D81748
2020-11-03 15:08:46 -06:00
Thomas Lively a787e09779 [WebAssembly] Prototype i64x2.bitmask
As proposed in https://github.com/WebAssembly/simd/pull/368.

Differential Revision: https://reviews.llvm.org/D90514
2020-10-30 17:23:30 -07:00
Thomas Lively 0a512a555a [WebAssembly] Prototype i64x2.eq
As proposed in https://github.com/WebAssembly/simd/pull/381. Since it is still
in the prototyping phase, it is only accessible via a target builtin function
and a target intrinsic.

Depends on D90504.

Differential Revision: https://reviews.llvm.org/D90508
2020-10-30 16:38:15 -07:00
Thomas Lively 1cb0b56607 [WebAssembly] Prototype i64x2.widen_{low,high}_i32x4_{s,u}
As proposed in https://github.com/WebAssembly/simd/pull/290. As usual, these
instructions are available only via builtin functions and intrinsics while they
are in the prototyping stage.

Differential Revision: https://reviews.llvm.org/D90504
2020-10-30 15:44:04 -07:00
Thomas Lively be6f50798e [WebAssembly] Implement SIMD signselect instructions
As proposed in https://github.com/WebAssembly/simd/pull/124, using the opcodes
adopted by V8 in
https://chromium-review.googlesource.com/c/v8/v8/+/2486235/2/src/wasm/wasm-opcodes.h.
Uses new builtin functions and a new target intrinsic exclusively to ensure that
the new instructions are only emitted when a user explicitly opts in to using
them since they are still in the prototyping and evaluation phase.

Differential Revision: https://reviews.llvm.org/D90357
2020-10-29 11:06:20 -07:00
Jon Chesterfield dee7704829 [AMDGPU] Add __builtin_amdgcn_grid_size
[AMDGPU] Add __builtin_amdgcn_grid_size

Similar to D76772, loads the data from the dispatch pointer. Marked invariant.

Patch also updates the openmp devicertl to use this builtin.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D90251
2020-10-29 16:25:13 +00:00
Thomas Lively 5b464f2aa5 [WebAssembly] Fix incorrectly named target builtin
Rename __builtin_wasm_q15mulr_saturate_s_i8x16 to
__builtin_wasm_q15mulr_saturate_s_i16x8, fixing the implied lane interpretation
of the result.
2020-10-28 10:22:43 -07:00
Heejin Ahn 98941279b9 [WebAssembly] Clang-format builtins generation (NFC)
Differential Revision: https://reviews.llvm.org/D90294
2020-10-28 10:01:21 -07:00
Thomas Lively 31e944556f [WebAssembly] Prototype extending multiplication SIMD instructions
As proposed in https://github.com/WebAssembly/simd/pull/376. This commit
implements new builtin functions and intrinsics for these instructions, but does
not yet add them to wasm_simd128.h because they have not yet been merged to the
proposal. These are the first instructions with opcodes greater than 0xff, so
this commit updates the MC layer and disassembler to handle that correctly.

Differential Revision: https://reviews.llvm.org/D90253
2020-10-28 09:38:59 -07:00
Tyker d3205bbca3 [Annotation] Allows annotation to carry some additional constant arguments.
This allows using annotation in a much more contexts than it currently has.
especially when annotation with template or constexpr.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D88645
2020-10-26 10:50:05 +01:00
Caroline Concatto e8d9ee9c7c [SVE][CodeGen]Use getFixedSize() function for TypeSize comparison in clang
This patch makes sure that the instance of TypeSize comparison operator
is done with a fixed type size.

Differential Revision: https://reviews.llvm.org/D89312
2020-10-16 10:56:39 +01:00
Fangrui Song 5a338599fb [CGBuiltin] Respect asm labels and redefine_extname for builtins with specialized emitting
rL131311 added `asm()` support for builtin functions, but `asm()` for builtins with
specialized emitting (e.g. memcpy, various math functions) still do not work.

This patch makes these functions work for `asm()` and `#pragma redefine_extname`.
glibc uses `asm()` to redirect internal libc function calls to hidden aliases.

Limitation: such a function is a builtin in clang, but will not be recognized as
a libcall in optimization passes because Clang does not annotate the renamed
function as a libcall.  In GCC -O1 or above, `abs` can be optimized out but we can't.
Additionally, we cannot redirect `__builtin_sin` to `real_sin` in the following example:

  double sin(double x) asm("real_sin");
  double f(double d) { return __builtin_sin(d); }

---

According to @rsmith, the following three statements cannot be simultaneously true:

(1) The frontend function foo has known, builtin semantics X.
(2) The symbol foo has known, builtin semantics X.
(3) It's not correct to lower a call to the frontend function foo to the symbol foo.

People do want (1) (if it is profitable to expand a memcpy, do it).
This also means that people do not want to add -fno-builtin-memcpy.
People do want (3): that is why they use asm("__GI_memcpy") in the first place.

So unfortunately we make a compromise by not refuting (2) (see the limitation above).
For most libcalls, there is a small loss because compilers don't synthesize them.
For the few glibc cares about, it uses `asm("memcpy = __GI_memcpy");` to make
the assembly level redirection.
(Changing function names (e.g. `__memcpy`) is a hit to ergonomics which is not acceptable).

Reviewed By: rsmith

Differential Revision: https://reviews.llvm.org/D88712
2020-10-15 15:14:38 -07:00
Thomas Lively 1992e30c2d [WebAssembly] Prototype i8x16.popcnt
As proposed at https://github.com/WebAssembly/simd/pull/379. Use a target
builtin and intrinsic rather than normal codegen patterns to make the
instruction opt-in until it is merged to the proposal and stabilized in engines.

Differential Revision: https://reviews.llvm.org/D89446
2020-10-15 21:18:22 +00:00
Thomas Lively 3f738d1f5e Reland "[WebAssembly] v128.load{8,16,32,64}_lane instructions"
This reverts commit 7c8385a352 with a typing fix
to an instruction selection pattern.
2020-10-15 19:32:34 +00:00
Thomas Lively 7c8385a352 Revert "[WebAssembly] v128.load{8,16,32,64}_lane instructions"
This reverts commit 7c6bfd90ab.
2020-10-15 15:49:36 +00:00
Thomas Lively 7c6bfd90ab [WebAssembly] v128.load{8,16,32,64}_lane instructions
Prototype the newly proposed load_lane instructions, as specified in
https://github.com/WebAssembly/simd/pull/350. Since these instructions are not
available to origin trial users on Chrome stable, make them opt-in by only
selecting them from intrinsics rather than normal ISel patterns. Since we only
need rough prototypes to measure performance right now, this commit does not
implement all the load and store patterns that would be necessary to make full
use of the offset immediate. However, the full suite of offset tests is included
to make it easy to track improvements in the future.

Since these are the first instructions to have a memarg immediate as well as an
additional immediate, the disassembler needed some additional hacks to be able
to parse them correctly. Making that code more principled is left as future
work.

Differential Revision: https://reviews.llvm.org/D89366
2020-10-15 15:33:10 +00:00
Simon Pilgrim d7fa9030d4 [CodeGen][X86] Emit fshl/fshr ir intrinsics for shiftleft128/shiftright128 ms intrinsics
Now that funnel shift handling is pretty good, we can use the intrinsics directly and avoid a lot of zext/trunc issues.

https://godbolt.org/z/YqhnnM

Differential Revision: https://reviews.llvm.org/D89405
2020-10-15 10:22:41 +01:00
Simon Pilgrim 6c23cbc560 [X86] Convert integer _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506)
Emit the equivalent integer reduction intrinsics in IR instead of expanding to shuffle+arithmetic sequences.

The fadd/fmul reductions might be trickier as they assume a similar bisection reduction while the generic intrinsics assume a sequential reduction (intel docs are ambiguous on the correct approach) - I'm not sure if we want to always tag them with reassoc? Anyway, that issue can wait until a separate fp patch along with the fmin/fmax reductions.

Differential Revision: https://reviews.llvm.org/D87604
2020-10-13 09:28:39 +01:00
Thomas Lively d8f58bf53a [WebAssembly] Prototype i16x8.q15mulr_sat_s
This saturating, rounding, Q-format multiplication instruction is proposed in
https://github.com/WebAssembly/simd/pull/365.

Differential Revision: https://reviews.llvm.org/D88968
2020-10-09 21:17:53 +00:00
Craig Topper a02b449bb1 [X86] Sync AESENC/DEC Key Locker builtins with gcc.
For the wide builtins, pass a single input and output pointer to
the builtins. Emit the GEPs and input loads from CGBuiltin.
2020-10-04 12:09:41 -07:00
Craig Topper 230c57b0bd [X86] Synchronize the encodekey builtins with gcc. Don't assume void* is 16 byte aligned.
We were taking multiple pointer arguments in the builtin.
gcc accepts a single void*.

The cast from void* to _m128i* caused the IR generation to assume
the pointer was aligned.

Instead make the builtin take a single void*, emit i8* GEPs to
adjust then cast to <2 x i64>* and perform a store with align of 1.
2020-10-04 12:09:35 -07:00
Richard Smith 8fb2a235b0 Don't reject calls to MinGW's unusual _setjmp declaration.
We now recognize this function as a builtin despite it having an
unexpected number of parameters; make sure we don't enforce that it has
only 1 argument for its 2 parameters.
2020-10-02 15:12:15 -07:00
Xiang1 Zhang 413577a879 [X86] Support Intel Key Locker
Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access
to the raw key value by converting AES keys into “handles”. These handles can be used to perform the
same encryption and decryption operations as the original AES keys, but they only work on the current
system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot),
then any previous handles can no longer be used.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D88398
2020-09-30 18:08:45 +08:00
Craig Topper 288c5776c9 [X86] Use inlineasm flag output for the _bittest* intrinsics.
Instead of expliciting emitting a setc in the inline asm instructions,
we can use flag output. This allows the backend to use the flag
directly if it is needed by a branch. Previously we needed a test
instruction to convert the register back to a flag.

If the flag can't be used directly, the backend will emit a setcc.

Differential Revision: https://reviews.llvm.org/D87888
2020-09-28 13:33:22 -07:00