Commit Graph

1607 Commits

Author SHA1 Message Date
Luo, Yuanke 979d876bb4 [X86][AMX] enable amx cast intrinsics in FE.
We have some discission in D99152 and llvm-dev and finially come up with
a solution to add amx specific cast intrinsics. We've support the
intrinsics in llvm IR. This patch is to replace bitcast with amx cast
intrinsics in code emitting in FE.

Differential Revision: https://reviews.llvm.org/D122567
2022-04-02 14:02:35 +08:00
wangyihan 907d3acefc [Clang][CodeGen]Beautify dump format, add indent for nested struct and struct members
Beautify dump format, add indent for nested struct and struct members, also fix test cases in dump-struct-builtin.c
for example:
struct:
```
  struct A {
    int a;
    struct B {
      int b;
      struct C {
        struct D {
          int d;
          union E {
            int x;
            int y;
          } e;
        } d;
        int c;
      } c;
    } b;
  };
```
Before:
```
struct A {
int a = 0
struct B {
    int b = 0
struct C {
struct D {
            int d = 0
union E {
                int x = 0
                int y = 0
                }
            }
        int c = 0
        }
    }
}
```
After:
```
struct A {
    int a = 0
    struct B {
        int b = 0
        struct C {
            struct D {
                int d = 0
                union E {
                    int x = 0
                    int y = 0
                }
            }
            int c = 0
        }
    }
}
```

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D122704
2022-03-31 07:38:37 +08:00
wangyihan de7cd3ccf5 [Clang][CodeGen]Remove anonymous tag locations
Remove anonymous tag locations, powered by 'PrintingPolicy',
@aaron.ballman once suggested removing this extra information in
https://reviews.llvm.org/D122248
struct:

  struct S {
    int a;
    struct /* Anonymous*/ {
      int x;
    } b;
    int c;
  };

Before:

  struct S {
    int a = 0
      struct S::(unnamed at ./builtin_dump_struct.c:20:3) {
        int x = 0
      }
    int c = 0
  }

After:

struct S {
  int a = 0
    struct S::(unnamed) {
      int x = 0
    }
  int c = 0
}

Differntial Revision: https://reviews.llvm.org/D122670
2022-03-29 11:38:29 -07:00
wangyihan 7faa95624e [clang][CodeGen]Fix clang crash and add bitfield support in __builtin_dump_struct
Fix clang crash and add bitfield support in __builtin_dump_struct.

In clang13.0.x, a struct with three or more members and a bitfield at
the same time will cause a crash. In clang15.x, as long as the struct
has one bitfield, it will cause a crash in clang.

Open issue: https://github.com/llvm/llvm-project/issues/54462

Differential Revision: https://reviews.llvm.org/D122248
2022-03-24 12:23:29 -07:00
Alan Zhao 8cd8bd4a5c Implement __cpuid and __cpuidex as Clang builtins
https://reviews.llvm.org/D23944 implemented the #pragma intrinsic from
MSVC. This causes the statement #pragma intrinsic(cpuid) to fail [0]
on Clang because cpuid is currently implemented in intrin.h instead
of a Clang builtin. Reimplementing cpuid (as well as it's releated
function, cpuidex) should resolve this.

[0]: https://crbug.com/1279344

Differential revision: https://reviews.llvm.org/D121653
2022-03-18 18:13:52 +01:00
Changpeng Fang dd5895cc39 AMDGPU: Use the implicit kernargs for code object version 5
Summary:
  Specifically, for trap handling, for targets that do not support getDoorbellID,
we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1].
To get aperture bases when targets do not have aperture registers, we load
private_base or shared_base directly from the implicit kernarg. In clang, we use
implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}.

Reviewers: arsenm, sameerds, yaxunl

Differential Revision: https://reviews.llvm.org/D120265
2022-03-17 14:12:36 -07:00
Nikita Popov 2edac9d962 [CodeGen] Avoid some pointer element type accesses 2022-03-17 16:32:45 +01:00
Eli Friedman 04ba344176 [CodeGen] Inline _byteswap_* builtins.
As discussed in D57915.

Fixes https://github.com/llvm/llvm-project/issues/39999 .

Differential Revision: https://reviews.llvm.org/D121865
2022-03-16 16:18:51 -07:00
Arthur Eubanks 2371c5a0e0 [OpaquePtr][ARM] Use elementtype on ldrex/ldaex/stlex/strex
Includes verifier changes checking the elementtype, clang codegen
changes to emit the elementtype, and ISel changes using the elementtype.

Basically the same as D120527.

Reviewed By: #opaque-pointers, nikic

Differential Revision: https://reviews.llvm.org/D121847
2022-03-16 14:11:53 -07:00
Thomas Lively 7e8913d775 [WebAssembly] Fix names of SIMD instructions containing '_zero'
Fix the instruction names to match the WebAssembly spec:

 - `i32x4.trunc_sat_zero_f64x2_{s,u}` => `i32x4.trunc_sat_f64x2_{s,u}_zero`
 - `f32x4.demote_zero_f64x2` => `f32x4.demote_f64x2_zero`

Also rename related things like intrinsics, builtins, and test functions to
match.

Reviewed By: aheejin

Differential Revision: https://reviews.llvm.org/D121661
2022-03-16 13:34:57 -07:00
Arthur Eubanks 250620f76e [OpaquePtr][AArch64] Use elementtype on ldxr/stxr
Includes verifier changes checking the elementtype, clang codegen
changes to emit the elementtype, and ISel changes using the elementtype.

Reviewed By: #opaque-pointers, nikic

Differential Revision: https://reviews.llvm.org/D120527
2022-03-14 10:09:59 -07:00
Kazushi (Jam) Marukawa b1b4b6f366 [Clang][VE] Add vector load intrinsics
Add vector load intrinsic instructions for VE.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D121049
2022-03-12 09:09:57 +09:00
Akira Hatanaka 9bb8c80bea [NFC][Clang][OpaquePtr] Remove calls to Address::deprecated in
CGBuiltin.cpp

Differential Revision: https://reviews.llvm.org/D121153
2022-03-08 09:45:15 -08:00
Stanislav Mekhanoshin 932f628121 [AMDGPU] new gfx940 fp atomics
Differential Revision: https://reviews.llvm.org/D121028
2022-03-07 12:32:02 -08:00
Qiu Chaofan b2497e5435 [PowerPC] Add generic fnmsub intrinsic
Currently in Clang, we have two types of builtins for fnmsub operation:
one for float/double vector, they'll be transformed into IR operations;
one for float/double scalar, they'll generate corresponding intrinsics.

But for the vector version of builtin, the 3 op chain may be recognized
as expensive by some passes (like early cse). We need some way to keep
the fnmsub form until code generation.

This patch introduces ppc.fnmsub.* intrinsic to unify four fnmsub
intrinsics.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D116015
2022-03-07 13:00:06 +08:00
Shao-Ce SUN fa9c8bab0c [RISCV] Support k-ext clang intrinsics
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D112774
2022-03-05 13:57:18 +08:00
Nikita Popov 5065076698 [CodeGen] Rename deprecated Address constructor
To make uses of the deprecated constructor easier to spot, and to
ensure that no new uses are introduced, rename it to
Address::deprecated().

While doing the rename, I've filled in element types in cases
where it was relatively obvious, but we're still left with 135
calls to the deprecated constructor.
2022-02-17 11:26:42 +01:00
Nikita Popov f208644ed3 [CGBuilder] Remove CreateBitCast() method
Use CreateElementBitCast() instead, or don't work on Address
where not necessary.
2022-02-14 15:06:04 +01:00
Sander de Smalen 0b41238ae7 [AArch64] Emit TBAA metadata for SVE load/store intrinsics
In Clang we can attach TBAA metadata based on the load/store intrinsics
based on the operation's element type.

This also contains changes to InstCombine where the AArch64-specific
intrinsics are transformed into generic LLVM load/store operations,
to ensure that all metadata is transferred to the new instruction.

There will be some further work after this patch to also emit TBAA
metadata for SVE's gather/scatter- and struct load/store intrinsics.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D119319
2022-02-11 09:00:29 +00:00
Simon Pilgrim 09857a4bd1 [X86] Remove __builtin_ia32_padd/psub saturated intrinsics and use generic __builtin_elementwise_add/sub_sat
D117898 added the generic __builtin_elementwise_add_sat and __builtin_elementwise_sub_sat with the same integer behaviour as the SSE/AVX instructions

This patch removes the __builtin_ia32_padd/psub saturated intrinsics and just uses the generics - the existing tests see no changes:

__m256i test_mm256_adds_epi8(__m256i a, __m256i b) {
  // CHECK-LABEL: test_mm256_adds_epi8
  // CHECK: call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> %{{.*}}, <32 x i8> %{{.*}})
  return _mm256_adds_epi8(a, b);
}
2022-02-08 15:00:10 +00:00
Simon Pilgrim a59faf272e Revert rG6c174ab2ad0676b295f11f6c3913eff9289fa6b9 "[X86] Remove __builtin_ia32_padd/psub saturated intrinsics and use generic __builtin_elementwise_add/sub_sat"
Missed some legacy builtin tests that need cleaning up first
2022-02-08 14:45:28 +00:00
Simon Pilgrim 6c174ab2ad [X86] Remove __builtin_ia32_padd/psub saturated intrinsics and use generic __builtin_elementwise_add/sub_sat
D117898 added the generic __builtin_elementwise_add_sat and __builtin_elementwise_sub_sat with the same integer behaviour as the SSE/AVX instructions

This patch removes the __builtin_ia32_padd/psub saturated intrinsics and just uses the generics - the existing tests see no changes:

__m256i test_mm256_adds_epi8(__m256i a, __m256i b) {
  // CHECK-LABEL: test_mm256_adds_epi8
  // CHECK: call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> %{{.*}}, <32 x i8> %{{.*}})
  return _mm256_adds_epi8(a, b);
}
2022-02-08 14:21:20 +00:00
Simon Pilgrim c00db97159 [Clang] Add elementwise saturated add/sub builtins
This patch implements `__builtin_elementwise_add_sat` and `__builtin_elementwise_sub_sat` builtins.

These map to the add/sub saturated math intrinsics described here:
https://llvm.org/docs/LangRef.html#saturation-arithmetic-intrinsics

With this in place we should then be able to replace the x86 SSE adds/subs intrinsics with these generic variants - it looks like other targets should be able to use these as well (arm/aarch64/webassembly all have similar examples in cgbuiltin).

Differential Revision: https://reviews.llvm.org/D117898
2022-02-08 11:22:01 +00:00
Nikita Popov c45a99f36b [MatrixBuilder] Require explicit element type in CreateColumnMajorLoad()
This makes the method compatible with opaque pointers.
2022-02-07 16:57:33 +01:00
Nikita Popov cdc0573f75 [MatrixBuilder] Remove unnecessary IRBuilder template (NFC)
IRBuilderBase exists specifically to avoid the need for this.
2022-02-07 16:42:38 +01:00
John Brawn bca998ed3c [AArch64] Generate fcmps when appropriate for neon intrinsics
Differential Revision: https://reviews.llvm.org/D118257
2022-02-04 12:55:38 +00:00
tyb0807 51e188d079 [AArch64] Support for memset tagged intrinsic
This introduces a new ACLE intrinsic for memset tagged
(https://github.com/ARM-software/acle/blob/next-release/main/acle.md#memcpy-family-of-operations-intrinsics---mops).

  void *__builtin_arm_mops_memset_tag(void *, int, size_t)

A corresponding LLVM intrinsic is introduced:

  i8* llvm.aarch64.mops.memset.tag(i8*, i8, i64)

The types match llvm.memset but the return type is not void.

This is part 1/4 of a series of patches split from
https://reviews.llvm.org/D117405 to facilitate reviewing.

Patch by Tomas Matheson

Differential Revision: https://reviews.llvm.org/D117753
2022-01-31 20:49:34 +00:00
David Green 82973edfb7 [ARM][AArch64] Introduce qrdmlah and qrdmlsh intrinsics
Since it's introduction, the qrdmlah has been represented as a qrdmulh
and a sadd_sat. This doesn't produce the same result for all input
values though. This patch fixes that by introducing a qrdmlah (and
qrdmlsh) intrinsic specifically for the vqrdmlah and sqrdmlah
instructions. The old test cases will now produce a qrdmulh and sqadd,
as expected.

Fixes #53120 and #50905 and #51761.

Differential Revision: https://reviews.llvm.org/D117592
2022-01-27 19:19:46 +00:00
JackAKirk 0ad19a8331 [CUDA,NVPTX] Corrected fragment size for tf32 LD B matrix.
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D118023
2022-01-25 11:29:19 -08:00
Simon Pilgrim e4074432d5 [X86] Remove avx512f integer and/or/xor/min/max reduction intrinsics and use generic equivalents
None of these have any reordering issues, and they still emit the same reduction intrinsics without any change in the existing test coverage:

llvm-project\clang\test\CodeGen\X86\avx512-reduceIntrin.c
llvm-project\clang\test\CodeGen\X86\avx512-reduceMinMaxIntrin.c

Differential Revision: https://reviews.llvm.org/D117881
2022-01-24 11:57:53 +00:00
Simon Pilgrim 3e50593b18 [X86] Remove `__builtin_ia32_pmax/min` intrinsics and use generic `__builtin_elementwise_max/min`
D111985 added the generic `__builtin_elementwise_max` and `__builtin_elementwise_min` intrinsics with the same integer behaviour as the SSE/AVX instructions

This patch removes the `__builtin_ia32_pmax/min` intrinsics and just uses `__builtin_elementwise_max/min` - the existing tests see no changes:
```
__m256i test_mm256_max_epu32(__m256i a, __m256i b) {
  // CHECK-LABEL: test_mm256_max_epu32
  // CHECK: call <8 x i32> @llvm.umax.v8i32(<8 x i32> %{{.*}}, <8 x i32> %{{.*}})
  return _mm256_max_epu32(a, b);
}
```
This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`).

Sibling patch to D117791

Differential Revision: https://reviews.llvm.org/D117798
2022-01-24 11:40:29 +00:00
Simon Pilgrim e5147f82e1 [X86] Remove __builtin_ia32_pabs intrinsics and use generic __builtin_elementwise_abs
D111986 added the generic `__builtin_elementwise_abs()` intrinsic with the same integer absolute behaviour as the SSE/AVX instructions (abs(INT_MIN) == INT_MIN)

This patch removes the `__builtin_ia32_pabs*` intrinsics and just uses `__builtin_elementwise_abs` - the existing tests see no changes:
```
__m256i test_mm256_abs_epi8(__m256i a) {
  // CHECK-LABEL: test_mm256_abs_epi8
  // CHECK: [[ABS:%.*]] = call <32 x i8> @llvm.abs.v32i8(<32 x i8> %{{.*}}, i1 false)
  return _mm256_abs_epi8(a);
}
```
This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`).

Differential Revision: https://reviews.llvm.org/D117791
2022-01-24 11:25:21 +00:00
Simon Pilgrim 0abaf64580 Revert rG4727d29d908f9dd608dd97a58c0af1ad579fd3ca "[X86] Remove __builtin_ia32_pabs intrinsics and use generic __builtin_elementwise_abs"
Some build bots are referencing the `__builtin_ia32_pabs` intrinsics via alternative headers
2022-01-21 12:35:36 +00:00
Simon Pilgrim 3ef88b3184 Revert rG8ee135dcf8ff060656ad481c3e980fe8763576f5 "[X86] Remove `__builtin_ia32_pmax/min` intrinsics and use generic `__builtin_elementwise_max/min`"
Some build bots are referencing the `__builtin_ia32_pmax/min` intrinsics via alternative headers
2022-01-21 12:34:19 +00:00
Simon Pilgrim 8ee135dcf8 [X86] Remove `__builtin_ia32_pmax/min` intrinsics and use generic `__builtin_elementwise_max/min`
D111985 added the generic `__builtin_elementwise_max` and `__builtin_elementwise_min` intrinsics with the same integer behaviour as the SSE/AVX instructions

This patch removes the `__builtin_ia32_pmax/min` intrinsics and just uses `__builtin_elementwise_max/min` - the existing tests see no changes:
```
__m256i test_mm256_max_epu32(__m256i a, __m256i b) {
  // CHECK-LABEL: test_mm256_max_epu32
  // CHECK: call <8 x i32> @llvm.umax.v8i32(<8 x i32> %{{.*}}, <8 x i32> %{{.*}})
  return _mm256_max_epu32(a, b);
}
```
This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`).

Sibling patch to D117791

Differential Revision: https://reviews.llvm.org/D117798
2022-01-21 12:24:58 +00:00
Simon Pilgrim 4727d29d90 [X86] Remove __builtin_ia32_pabs intrinsics and use generic __builtin_elementwise_abs
D111986 added the generic `__builtin_elementwise_abs()` intrinsic with the same integer absolute behaviour as the SSE/AVX instructions (abs(INT_MIN) == INT_MIN)

This patch removes the `__builtin_ia32_pabs*` intrinsics and just uses `__builtin_elementwise_abs` - the existing tests see no changes:
```
__m256i test_mm256_abs_epi8(__m256i a) {
  // CHECK-LABEL: test_mm256_abs_epi8
  // CHECK: [[ABS:%.*]] = call <32 x i8> @llvm.abs.v32i8(<32 x i8> %{{.*}}, i1 false)
  return _mm256_abs_epi8(a);
}
```
This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`).

Differential Revision: https://reviews.llvm.org/D117791
2022-01-21 11:59:08 +00:00
Chenbing.Zheng 0be3da1fab [RISCV] Add intrinsic for Zbt extension
RV32: fsl, fsr, fsri
RV64: fsl, fsr, fsri, fslw, fsrw, fsriw

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117468
2022-01-20 08:27:05 +00:00
Benjamin Kramer 765dd8b8a4 [CGBuiltin] Simplify code. NFCI. 2022-01-14 16:02:02 +01:00
Jun Zhang 8de0c1feca
[Clang] Add __builtin_reduce_or and __builtin_reduce_and
This patch implements two builtins specified in D111529.
The last __builtin_reduce_add will be seperated into another one.

Differential Revision: https://reviews.llvm.org/D116736
2022-01-14 22:05:26 +08:00
Lian Wang 16877c5d2c [RISCV] Add bfp and bfpw intrinsic in zbf extension
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116994
2022-01-13 02:53:00 +00:00
Simon Pilgrim 497a4b26c4 CGBuiltin - Use castAs<> instead of getAs<> to avoid dereference of nullptr
The pointer is always dereferenced immediately below, so assert the cast is correct instead of returning nullptr
2022-01-12 15:35:37 +00:00
Marco Elver 732ad8ea62 [clang][auto-init] Provide __builtin_alloca*_uninitialized variants
When `-ftrivial-auto-var-init=` is enabled, allocas unconditionally
receive auto-initialization since [1].

In certain cases, it turns out, this is causing problems. For example,
when using alloca to add a random stack offset, as the Linux kernel does
on syscall entry [2]. In this case, none of the alloca'd stack memory is
ever used, and initializing it should be controllable; furthermore, it
is not always possible to safely call memset (see [2]).

Introduce `__builtin_alloca_uninitialized()` (and
`__builtin_alloca_with_align_uninitialized`), which never performs
initialization when `-ftrivial-auto-var-init=` is enabled.

[1] https://reviews.llvm.org/D60548
[2] https://lkml.kernel.org/r/YbHTKUjEejZCLyhX@elver.google.com

Reviewed By: glider

Differential Revision: https://reviews.llvm.org/D115440
2022-01-12 15:13:10 +01:00
Serge Guelton d2cc6c2d0c Use a sorted array instead of a map to store AttrBuilder string attributes
Using and std::map<SmallString, SmallString> for target dependent attributes is
inefficient: it makes its constructor slightly heavier, and involves extra
allocation for each new string attribute. Storing the attribute key/value as
strings implies extra allocation/copy step.

Use a sorted vector instead. Given the low number of attributes generally
involved, this is cheaper, as showcased by

https://llvm-compile-time-tracker.com/compare.php?from=5de322295f4ade692dc4f1823ae4450ad3c48af2&to=05bc480bf641a9e3b466619af43a2d123ee3f71d&stat=instructions

Differential Revision: https://reviews.llvm.org/D116599
2022-01-10 14:49:53 +01:00
Jun Zhang 5be131922c
[NFC] Test commit.
This is just a test commit to check whether the permission I got is
correct or not.
2022-01-08 10:36:09 +08:00
Jun Zhang b2ed9f3f44
[Clang] Implement the rest of __builtin_elementwise_* functions.
The patch implement the rest of __builtin_elementwise_* functions
specified in D111529, including:
* __builtin_elementwise_floor
* __builtin_elementwise_roundeven
* __builtin_elementwise_trunc

Signed-off-by: Jun <jun@junz.org>

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D115429
2022-01-07 15:11:36 +00:00
Nikita Popov e8b98a5216 [CodeGen] Emit elementtype attributes for indirect inline asm constraints
This implements the clang side of D116531. The elementtype
attribute is added for all indirect constraints (*) and tests are
updated accordingly.

Differential Revision: https://reviews.llvm.org/D116666
2022-01-06 09:29:22 +01:00
Jun Zhang 82020de532
Recommit "[Clang] Extend emitUnaryBuiltin to avoid duplicate logic.""
This reverts the revert commit f552ba6e84.

Recommit with fixed author name.
2022-01-04 13:46:41 +00:00
Florian Hahn f552ba6e84
Revert "[Clang] Extend emitUnaryBuiltin to avoid duplicate logic."
This reverts commit 5c57e6aa57.

Reverted due to a typo in the authors name. Will recommit soon with
fixed authorship.
2022-01-04 13:45:28 +00:00
Jun Zhan 5c57e6aa57
[Clang] Extend emitUnaryBuiltin to avoid duplicate logic.
This patch extends `emitUnaryBuiltin` so that we can better emitting IR when
implement builtins specified in D111529.

Also contains some NFC, applying it to existing code.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D116161
2022-01-04 11:47:41 +00:00
Krzysztof Parzyszek dcb3e8083a [Hexagon] Make conversions to vector predicate types explicit for builtins
HVX does not have load/store instructions for vector predicates (i.e. bool
vectors). Because of that, vector predicates need to be converted to another
type before being stored, and the most convenient representation is an HVX
vector.
As a consequence, in C/C++, source-level builtins that either take or
produce vector predicates take or return regular vectors instead. On the
other hand, the corresponding LLVM intrinsics do have boolean types that,
and so a conversion of the operand or the return value was necessary.
This conversion would happen inside clang's codegen, but was somewhat
fragile.

This patch changes the strategy: a builtin that takes a vector predicate
now really expects a vector predicate. Since such a predicate cannot be
provided via a variable, this builtin must be composed with other builtins
that either convert vector to a predicate (V6_vandvrt) or predicate to a
vector (V6_vandqrt).

For users using builtins defined in hvx_hexagon_protos.h there is no impact:
the conversions were added to that file. Other users will need to insert
- __builtin_HEXAGON_V6_vandvrt[_128B](V, -1) to convert vector V to a
  vector predicate, or
- __builtin_HEXAGON_V6_vandqrt[_128B](Q, -1) to convert vector predicate Q
  to a vector.

Builtins __builtin_HEXAGON_V6_vmaskedstore.* are a temporary exception to
that, but they are deprecated and should not be used anyway. In the future
they will either follow the same rule, or be removed.
2021-12-22 12:52:24 -08:00
Jun Zhan b55ea2fbc0
[Clang] Add __builtin_reduce_xor
This patch implements __builtin_reduce_xor as specified in D111529.

Reviewed By: fhahn, aaron.ballman

Differential Revision: https://reviews.llvm.org/D115231
2021-12-22 10:00:27 +00:00
Sami Tolvanen ec2e26eaf6 [Clang] Add __builtin_function_start
Control-Flow Integrity (CFI) replaces references to address-taken
functions with pointers to the CFI jump table. This is a problem
for low-level code, such as operating system kernels, which may
need the address of an actual function body without the jump table
indirection.

This change adds the __builtin_function_start() builtin, which
accepts an argument that can be constant-evaluated to a function,
and returns the address of the function body.

Link: https://github.com/ClangBuiltLinux/linux/issues/1353

Depends on D108478

Reviewed By: pcc, rjmccall

Differential Revision: https://reviews.llvm.org/D108479
2021-12-20 12:55:33 -08:00
Nikita Popov d930c3155c [CodeGen] Pass element type to EmitCheckedInBoundsGEP()
Same as for other GEP creation methods.
2021-12-15 14:03:33 +01:00
Nikita Popov 481de0ed80 [CodeGen] Prefer CreateElementBitCast() where possible
CreateElementBitCast() can preserve the pointer element type in
the presence of opaque pointers, so use it in place of CreateBitCast()
in some places. This also sometimes simplifies the code a bit.
2021-12-15 11:48:39 +01:00
Nikita Popov 834c8ff587 [CodeGen] Avoid some uses of deprecated Address constructor
Explicitly pass in the element type instead.
2021-12-15 11:13:10 +01:00
Nikita Popov b4f46555d7 [CodeGen] Avoid some pointer element type accesses 2021-12-15 09:29:27 +01:00
Matt Devereau 41def32040 [AArch64][SVE][NEON] Add NEON-SVE-Bridge intrinsics
Adds svset_neonq, svget_neonq, svdup_neonq AArch64 intrinsics.

These are described in the ACLE specification:
https://github.com/ARM-software/acle/pull/72

https://reviews.llvm.org/D114713
2021-12-13 11:31:57 +00:00
Chuanqi Xu 352e36e10d [Coroutines] Remove unused coroutine builtin/intrinsics llvm.coro.param (NFC-ish)
I found that the coroutine intrinsic llvm.coro.param in documentation
(https://llvm.org/docs/Coroutines.html#id101) didn't get used actually
since there isn't lowering codes in LLVM. I also checked the
implementation of libstdc++ and libc++. Both of them didn't use
llvm.coro.param. So I am pretty sure that the llvm.coro.param intrinsic
is unused. I think it would be better t to remove it to avoid possible
misleading understandings.

Note: according to [class.copy.elision]/p1.3, this optimization is
allowed by the C++ language specification. Let's make it someday.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D115222
2021-12-09 14:40:25 +08:00
Jun Zhang 8680f951c2 Add __builtin_elementwise_ceil
This patch implements one of the missing builtin functions specified
in https://reviews.llvm.org/D111529.
2021-12-08 08:29:33 -05:00
Aaron Ballman 6c75ab5f66 Introduce _BitInt, deprecate _ExtInt
WG14 adopted the _ExtInt feature from Clang for C23, but renamed the
type to be _BitInt. This patch does the vast majority of the work to
rename _ExtInt to _BitInt, which accounts for most of its size. The new
type is exposed in older C modes and all C++ modes as a conforming
extension. However, there are functional changes worth calling out:

* Deprecates _ExtInt with a fix-it to help users migrate to _BitInt.
* Updates the mangling for the type.
* Updates the documentation and adds a release note to warn users what
is going on.
* Adds new diagnostics for use of _BitInt to call out when it's used as
a Clang extension or as a pre-C23 compatibility concern.
* Adds new tests for the new diagnostic behaviors.

I want to call out the ABI break specifically. We do not believe that
this break will cause a significant imposition for early adopters of
the feature, and so this is being done as a full break. If it turns out
there are critical uses where recompilation is not an option for some
reason, we can consider using ABI tags to ease the transition.
2021-12-06 12:52:01 -05:00
Jay Foad 2774bad112 [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args
The ray_origin, ray_dir and ray_inv_dir arguments should all be vec3 to
match how the hardware instruction works.

Don't change the API of the corresponding OpenCL builtins.

Differential Revision: https://reviews.llvm.org/D115032
2021-12-04 10:32:11 +00:00
Qiu Chaofan 4f94c02616 [Clang] Mutate bulitin names under IEEE128 on PPC64
Glibc 2.32 and newer uses these symbol names to support IEEE-754 128-bit
float. GCC transforms name of these builtins to align with Glibc header
behavior.

Since Clang doesn't have all GCC-compatible builtins implemented, this
patch only mutates the implemented part.

Note nexttoward is a special case (no nexttowardf128) so it's also
handled here.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D112401
2021-12-03 17:50:18 +08:00
skc7 16b781e6d1 [AMDGPU][clang] Fix __builtin_nontemporal_store() failure on AMDGPU
Reviewed By: yaxunl, sameerds

Differential Revision: https://reviews.llvm.org/D114849
2021-12-02 05:53:25 +00:00
Ahsan Saghir 4c8b8e0154 [PowerPC] Allow MMA built-ins to accept non-void pointers and arrays
Calls to MMA builtins that take pointer to void
do not accept other pointers/arrays whereas normal
functions with the same parameter do. This patch
allows MMA built-ins to accept non-void pointers
and arrays.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D113306
2021-11-16 09:14:41 -06:00
Jon Chesterfield 27177b82d4 [OpenMP] Lower printf to __llvm_omp_vprintf
Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf`
which takes the same const char*, void* arguments as cuda vprintf and also
passes the size of the void* alloca which will be needed by a non-stub
implementation of `__llvm_omp_vprintf` for amdgpu.

This removes the amdgpu link error on any printf in a target region in favour
of silently compiling code that doesn't print anything to stdout.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112680
2021-11-10 15:30:56 +00:00
Jon Chesterfield 0fa45d6d80 Revert "[OpenMP] Lower printf to __llvm_omp_vprintf"
This reverts commit db81d8f6c4.
2021-11-08 20:28:57 +00:00
Jon Chesterfield db81d8f6c4 [OpenMP] Lower printf to __llvm_omp_vprintf
Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf`
which takes the same const char*, void* arguments as cuda vprintf and also
passes the size of the void* alloca which will be needed by a non-stub
implementation of `__llvm_omp_vprintf` for amdgpu.

This removes the amdgpu link error on any printf in a target region in favour
of silently compiling code that doesn't print anything to stdout.

The exact set of changes to check-openmp probably needs revision before commit

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112680
2021-11-08 18:38:00 +00:00
Florian Hahn 7999355106
[Clang] Add min/max reduction builtins.
This patch implements __builtin_reduce_max and __builtin_reduce_min as
specified in D111529.

The order of operations does not matter for min or max reductions and
they can be directly lowered to the corresponding
llvm.vector.reduce.{fmin,fmax,umin,umax,smin,smax} intrinsic calls.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D112001
2021-11-02 15:01:42 +01:00
Thomas Lively fb67f3d969 [WebAssembly] Add prototype relaxed float to int trunc instructions
Add i32x4.relaxed_trunc_f32x4_s, i32x4.relaxed_trunc_f32x4_u,
i32x4.relaxed_trunc_f64x2_s_zero, i32x4.relaxed_trunc_f64x2_u_zero.

These are only exposed as builtins, and require user opt-in.

Differential Revision: https://reviews.llvm.org/D112186
2021-10-28 14:01:53 -07:00
Florian Hahn 01870d51b8
[Clang] Add elementwise abs builtin.
This patch implements __builtin_elementwise_abs as specified in
D111529.

Reviewed By: aaron.ballman, scanon

Differential Revision: https://reviews.llvm.org/D111986
2021-10-27 21:01:44 +01:00
Florian Hahn 1ef25d28c1
[Clang] Add elementwise min/max builtins.
This patch implements __builtin_elementwise_max and
__builtin_elementwise_min, as specified in D111529.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D111985
2021-10-26 16:53:40 +01:00
Zhi An Ng e1fb13401e [WebAssembly] Add prototype relaxed float min max instructions
Add relaxed. f32x4.min, f32x4.max, f64x2.min, f64x2.max. These are only
exposed as builtins, and require user opt-in.

Differential Revision: https://reviews.llvm.org/D112146
2021-10-20 09:41:51 -07:00
Zhi An Ng 2542bfa43a [WebAssembly] Add prototype relaxed swizzle instructions
Add i8x16 relaxed_swizzle instructions. These are only
exposed as builtins, and require user opt-in.

Differential Revision: https://reviews.llvm.org/D112022
2021-10-19 17:53:04 -07:00
Zhi An Ng da07942834 [WebAssembly] Add prototype relaxed laneselect instructions
Add i8x16, i16x8, i32x4, i64x2 laneselect instructions. These are only
exposed as builtins, and require user opt-in.
2021-10-15 17:45:09 -07:00
Hsiangkai Wang 5158cfef8b [RISCV] After reverting _mt builtins, add `ta` argument for LLVM IR.
Previous patch only reverts C builtins for tail policy. In order to keep
LLVM IR intact, add the `ta` argument in vector builtins.
2021-10-13 19:41:49 +08:00
Hsiangkai Wang ff3ed78304 Revert "[RISCV] Define _m intrinsics as builtins, instead of macros."
This reverts commit 97f0c63783.

As discussed in https://reviews.llvm.org/D110684, it increased the
compile time and the binary size of clang more than 1%. I reverted
this patch first to think about a better way to do it.
2021-10-13 12:21:51 +08:00
Hsiangkai Wang 97f0c63783 [RISCV] Define _m intrinsics as builtins, instead of macros.
In the original design, we levarage _mt intrinsics to define macros for
_m intrinsics. Such as,

```
__builtin_rvv_vadd_vv_i8m1_mt((vbool8_t)(op0), (vint8m1_t)(op1), (vint8m1_t)(op2), (vint8m1_t)(op3), (size_t)(op4), (size_t)VE_TAIL_AGNOSTIC)
```

However, we could not define generic interface for mask intrinsics any
more due to clang_builtin_alias only accepts clang builtins as its
argument.

In the example,

```
 __rvv_overloaded
 __attribute__((clang_builtin_alias(__builtin_rvv_vadd_vv_i8m1_mt)))
  vint8m1_t vadd(vbool8_t op0, vint8m1_t op1, vint8m1_t op2, vint8m1_t
  op3, size_t op4, size_t op5);
```

op5 is the tail policy argument. When users want to use vadd generic
interface for masked vector add, they need to specify tail policy in the
previous design. In this patch, we define _m intrinsics as clang
builtins to solve the problem.

Differential Revision: https://reviews.llvm.org/D110684
2021-10-12 10:47:55 +08:00
Stefan Pintilie 4fc2f4979c [PowerPC] Fix __builtin_ppc_load2r to return short instead of int.
This patch fixes the return value of the builtin __builtin_ppc_load2r to
correctly return short instead of int.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D110771
2021-10-04 06:17:02 -05:00
Quinn Pham 67a3d1e275 [PowerPC] swdiv builtins for XL compatibility
This patch is in a series of patches to provide builtins for compatibility with
the XL compiler. This patch implements the software divide builtin as
wrappers for a floating point divide. XL provided these builtins because it
didn't produce software estimates by default at `-Ofast`. When compiled
with `-Ofast` these builtins will produce the software estimate for divide.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D106959
2021-09-29 11:31:07 -05:00
Quinn Pham 70391b3468 [PowerPC] FP compare and test XL compat builtins.
This patch is in a series of patches to provide builtins for
compatability with the XL compiler. This patch adds builtins for compare
exponent and test data class operations on floating point values.

Reviewed By: #powerpc, lei

Differential Revision: https://reviews.llvm.org/D109437
2021-09-28 11:01:51 -05:00
Ahsan Saghir 593b074a09 [PowerPC] MMA - Add __builtin_vsx_build_pair and __builtin_mma_build_acc builtins
This patch adds the following built-ins:

__builtin_vsx_build_pair
__builtin_mma_build_acc

Reviewed By: #powerpc, nemanjai, lei

Differential Revision: https://reviews.llvm.org/D107647
2021-09-27 19:51:28 -05:00
Wang, Pengfei 7d6889964a [X86][FP16] Add more builtins to avoid multi evaluation problems & add 2 missed intrinsics
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110336
2021-09-27 09:27:04 +08:00
Thomas Lively 2f519825ba [WebAssembly] Add prototype relaxed SIMD fma/fms instructions
Add experimental clang builtins, LLVM intrinsics, and backend definitions for
the new {f32x4,f64x2}.{fma,fms} instructions in the relaxed SIMD proposal:
https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md.
Do not allow these instructions to be selected without explicit user opt-in.

Differential Revision: https://reviews.llvm.org/D110295
2021-09-23 11:01:36 -07:00
Xiang1 Zhang c81d6ab875 [X86] Adjust Keylocker handle mem size
Reviewed By: Topper Craig

Differential Revision: https://reviews.llvm.org/D109488
2021-09-13 18:03:27 +08:00
Xiang1 Zhang bdce8d40c6 Revert "[X86] Adjust Keylocker handle mem size"
This reverts commit 3731de6b7f.
2021-09-13 18:00:46 +08:00
Xiang1 Zhang 3731de6b7f [X86] Adjust Keylocker handle mem size
Reviewed By: Topper Craig

Differential Revision: https://reviews.llvm.org/D109354
2021-09-13 17:59:33 +08:00
Roman Lebedev 3f1f08f0ed
Revert @llvm.isnan intrinsic patchset.
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)

TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in broken state meanwhile.

While the end result of discussion may lead back to the current design,
it may also not lead to the current design.

Therefore i take it upon myself
to revert the tree back to last known good state.

This reverts commit 4c4093e6e3.
This reverts commit 0a2b1ba33a.
This reverts commit d9873711cb.
This reverts commit 791006fb8c.
This reverts commit c22b64ef66.
This reverts commit 72ebcd3198.
This reverts commit 5fa6039a5f.
This reverts commit 9efda541bf.
This reverts commit 94d3ff09cf.
2021-09-02 13:53:56 +03:00
Andrei Elovikov 1724a16437 [NFC][clang] Move IR-independent parts of target MV support to X86TargetParser.cpp
...that is located under llvm/lib/Support/.

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D108423
2021-08-30 09:48:48 -07:00
Wang, Pengfei c728bd5bba [X86] AVX512FP16 instructions enabling 5/6
Enable FP16 FMA instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105268
2021-08-24 09:07:19 +08:00
Andrei Elovikov f5c2889488 [NFC][clang] Use X86 Features declaration from X86TargetParser
...instead of redeclaring them in clang's own X86Target.def. They were already
required to be in sync (IIUC), so no reason to maintain two identical lists.

Reviewed By: erichkeane, craig.topper

Differential Revision: https://reviews.llvm.org/D108151
2021-08-23 12:30:28 -07:00
Simon Pilgrim 7f48bd3bed CGBuiltin.cpp - pass SVETypeFlags by const reference. NFC.
Don't pass the struct by value.
2021-08-22 12:13:17 +01:00
Wang, Pengfei b088536ce9 [X86] AVX512FP16 instructions enabling 4/6
Enable FP16 unary operator instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105267
2021-08-22 08:59:35 +08:00
Thomas Lively 88962cea46 [WebAssembly] Restore builtins and intrinsics for pmin/pmax
Partially reverts 85157c0079, which had removed these builtins and intrinsics
in favor of normal codegen patterns. It turns out that it is possible for the
patterns to be split over multiple basic blocks, however, which means that DAG
ISel is not able to select them to the pmin/pmax instructions. To make sure the
SIMD intrinsics generate the correct instructions in these cases, reintroduce
the clang builtins and corresponding LLVM intrinsics, but also keep the normal
pattern matching as well.

Differential Revision: https://reviews.llvm.org/D108387
2021-08-20 09:21:31 -07:00
Martin Storsjö cc3affd8b0 [clang] [MSVC] Implement __mulh and __umulh builtins for aarch64
The code is based on the same __mulh and __umulh intrinsics for
x86.

This should fix PR51128.

Differential Revision: https://reviews.llvm.org/D106721
2021-08-19 11:29:55 +03:00
Arthur Eubanks 3f4d00bc3b [NFC] More get/removeAttribute() cleanup 2021-08-17 21:05:41 -07:00
Wang, Pengfei 2379949aad [X86] AVX512FP16 instructions enabling 3/6
Enable FP16 conversion instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105265
2021-08-18 09:03:41 +08:00
Wang, Pengfei f1de9d6dae [X86] AVX512FP16 instructions enabling 2/6
Enable FP16 binary operator instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105264
2021-08-15 08:56:33 +08:00
Wang, Pengfei 6f7f5b54c8 [X86] AVX512FP16 instructions enabling 1/6
1. Enable FP16 type support and basic declarations used by following patches.
2. Enable new instructions VMOVW and VMOVSH.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105263
2021-08-10 12:46:01 +08:00
Serge Pavlov 4c4093e6e3 Introduce intrinsic llvm.isnan
This is recommit of the patch 16ff91ebcc,
reverted in 0c28a7c990 because it had
an error in call of getFastMathFlags (base type should be FPMathOperator
but not Instruction). The original commit message is duplicated below:

    Clang has builtin function '__builtin_isnan', which implements C
    library function 'isnan'. This function now is implemented entirely in
    clang codegen, which expands the function into set of IR operations.
    There are three mechanisms by which the expansion can be made.

    * The most common mechanism is using an unordered comparison made by
      instruction 'fcmp uno'. This simple solution is target-independent
      and works well in most cases. It however is not suitable if floating
      point exceptions are tracked. Corresponding IEEE 754 operation and C
      function must never raise FP exception, even if the argument is a
      signaling NaN. Compare instructions usually does not have such
      property, they raise 'invalid' exception in such case. So this
      mechanism is unsuitable when exception behavior is strict. In
      particular it could result in unexpected trapping if argument is SNaN.

    * Another solution was implemented in https://reviews.llvm.org/D95948.
      It is used in the cases when raising FP exceptions by 'isnan' is not
      allowed. This solution implements 'isnan' using integer operations.
      It solves the problem of exceptions, but offers one solution for all
      targets, however some can do the check in more efficient way.

    * Solution implemented by https://reviews.llvm.org/D96568 introduced a
      hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
      specific code into IR. Now only SystemZ implements this hook and it
      generates a call to target specific intrinsic function.

    Although these mechanisms allow to implement 'isnan' with enough
    efficiency, expanding 'isnan' in clang has drawbacks:

    * The operation 'isnan' is hidden behind generic integer operations or
      target-specific intrinsics. It complicates analysis and can prevent
      some optimizations.

    * IR can be created by tools other than clang, in this case treatment
      of 'isnan' has to be duplicated in that tool.

    Another issue with the current implementation of 'isnan' comes from the
    use of options '-ffast-math' or '-fno-honor-nans'. If such option is
    specified, 'fcmp uno' may be optimized to 'false'. It is valid
    optimization in general, but it results in 'isnan' always returning
    'false'. For example, in some libc++ implementations the following code
    returns 'false':

        std::isnan(std::numeric_limits<float>::quiet_NaN())

    The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
    operands are never NaNs. This assumption however should not be applied
    to the functions that check FP number properties, including 'isnan'. If
    such function returns expected result instead of actually making
    checks, it becomes useless in many cases. The option '-ffast-math' is
    often used for performance critical code, as it can speed up execution
    by the expense of manual treatment of corner cases. If 'isnan' returns
    assumed result, a user cannot use it in the manual treatment of NaNs
    and has to invent replacements, like making the check using integer
    operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
    which also expresses the opinion, that limitations imposed by
    '-ffast-math' should be applied only to 'math' functions but not to
    'tests'.

    To overcome these drawbacks, this change introduces a new IR intrinsic
    function 'llvm.isnan', which realizes the check as specified by IEEE-754
    and C standards in target-agnostic way. During IR transformations it
    does not undergo undesirable optimizations. It reaches instruction
    selection, where is lowered in target-dependent way. The lowering can
    vary depending on options like '-ffast-math' or '-ffp-model' so the
    resulting code satisfies requested semantics.

    Differential Revision: https://reviews.llvm.org/D104854
2021-08-06 14:32:27 +07:00
Anshil Gandhi 39dac1f7f6 [clang] Add clang builtins support for gfx90a
Implement target builtins for gfx90a including fadd64, fadd32, add2h,
max and min on various global, flat and ds address spaces for which
intrinsics are implemented.

Differential Revision: https://reviews.llvm.org/D106909
2021-08-05 02:08:06 -06:00