Commit Graph

1159 Commits

David Major 027bb52790 [COFF][ARM64] Reorder handling of aarch64 MSVC builtins
In `CodeGenFunction::EmitAArch64BuiltinExpr()`, bulk move all of the aarch64 MSVC-builtin cases to an earlier point in the function (the `// Handle non-overloaded intrinsics first` switch block) in order to avoid an unreachable in `GetNeonType()`. The NEON type-overloading logic is not appropriate for the Windows builtins.

Fixes https://llvm.org/pr42775

Differential Revision: https://reviews.llvm.org/D65403

llvm-svn: 367323
2019-07-30 15:32:49 +00:00
JF Bastien dbc0a5df8d Allow prefetching from non-zero address spaces
Summary:
This is useful for targets which have prefetch instructions for non-default address spaces.
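
A minimal sketch of what this enables (the address-space number is an arbitrary illustration; whether it maps to real prefetch hardware is target-specific):

    // Hypothetical target address space 3.
    typedef __attribute__((address_space(3))) int as3_int;

    void warm(as3_int *p) {
      __builtin_prefetch(p, /*rw=*/0, /*locality=*/3);
    }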

<rdar://problem/42662136>

Subscribers: nemanjai, javed.absar, hiraditya, kbarton, jkorous, dexonsmith, cfe-commits, llvm-commits, RKSimon, hfinkel, t.p.northover, craig.topper, anemet

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D65254

llvm-svn: 367032
2019-07-25 16:11:57 +00:00
Christudasan Devadasan 8c5e6fa657 Updated the signature for some stack related intrinsics (CLANG)
Modified the intrinsics int_addressofreturnaddress, int_frameaddress & int_sponentry.
This commit depends on the changes in rL366679

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D64563

llvm-svn: 366683
2019-07-22 12:50:30 +00:00
Guanzhong Chen 5204f7611f [WebAssembly] Compute and export TLS block alignment
Summary:
Add immutable WASM global `__tls_align` which stores the alignment
requirements of the TLS segment.

Add `__builtin_wasm_tls_align()` intrinsic to get this alignment in Clang.

The expected usage has now changed to:

    __wasm_init_tls(memalign(__builtin_wasm_tls_align(),
                             __builtin_wasm_tls_size()));

Reviewers: tlively, aheejin, sbc100, sunfish, alexcrichton

Reviewed By: tlively

Subscribers: dschuff, jgravelle-google, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D65028

llvm-svn: 366624
2019-07-19 23:34:16 +00:00
Guanzhong Chen 801fa8e6b9 [WebAssembly] Implement __builtin_wasm_tls_base intrinsic
Summary:
Add `__builtin_wasm_tls_base` so that LeakSanitizer can find the thread-local
block and scan through it for memory leaks.
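
A hedged sketch of the intended use (`lsan_register_region` is a hypothetical hook; only the two `__builtin_wasm_tls_*` builtins come from these commits, and they assume a threads-enabled wasm target):

    #include <stddef.h>

    extern void lsan_register_region(void *begin, size_t size);

    // Report this thread's TLS block so the leak checker can scan it.
    void register_thread_locals(void) {
      lsan_register_region(__builtin_wasm_tls_base(),
                           __builtin_wasm_tls_size());
    }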

Reviewers: tlively, aheejin, sbc100

Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D64900

llvm-svn: 366475
2019-07-18 17:53:22 +00:00
Matt Arsenault e56865d40c AMDGPU: Add some missing builtins
llvm-svn: 366286
2019-07-17 00:01:03 +00:00
Guanzhong Chen 42bba4b852 [WebAssembly] Implement thread-local storage (local-exec model)
Summary:
Thread local variables are placed inside a `.tdata` segment. Their symbols are
offsets from the start of the segment. The address of a thread local variable
is computed as `__tls_base` + the offset from the start of the segment.

The `.tdata` segment is a passive segment and `memory.init` is used once per thread
to initialize the thread-local storage.

`__tls_base` is a wasm global. Since each thread has its own wasm instance,
it is effectively thread local. Currently, `__tls_base` must be initialized
at thread startup, and so cannot be used with dynamic libraries.

`__tls_base` is to be initialized with a new linker-synthesized function,
`__wasm_init_tls`, which takes as an argument a block of memory to use as the
storage for thread locals. It then initializes the block of memory and sets
`__tls_base`. As `__wasm_init_tls` will handle the memory initialization,
the memory does not have to be zeroed.

To help allocate memory for thread-local storage, a new compiler intrinsic
is introduced: `__builtin_wasm_tls_size()`. This intrinsic function returns
the size of the thread-local storage for the current function.

The expected usage is to run something like the following upon thread startup:

    __wasm_init_tls(malloc(__builtin_wasm_tls_size()));

Reviewers: tlively, aheejin, kripken, sbc100

Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, jfb, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D64537

llvm-svn: 366272
2019-07-16 22:00:45 +00:00
Kyrylo Tkachov eb72138340 [AArch64] Implement __jcvt intrinsic from Armv8.3-A
The jcvt intrinsic defined in ACLE [1] is available when __ARM_FEATURE_JCVT is defined.

This change introduces the AArch64 intrinsic, wires it up to the instruction and a new clang builtin function.
The __ARM_FEATURE_JCVT macro is now defined when an Armv8.3-A or higher target is used.
I've implemented the target detection logic in Clang so that this feature is enabled for architectures from armv8.3-a onwards (so -march=armv8.4-a also enables this, for example).
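
A usage sketch (guarded by the new feature macro):

    #include <arm_acle.h>

    #if defined(__ARM_FEATURE_JCVT)
    int32_t to_js_int(double d) {
      return __jcvt(d); // single FJCVTZS instruction
    }
    #endif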

make check-all didn't show any new failures.

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics

Differential Revision: https://reviews.llvm.org/D64495

llvm-svn: 366197
2019-07-16 09:27:39 +00:00
Rui Ueyama 49a3ad21d6 Fix parameter name comments using clang-tidy. NFC.
This patch applies clang-tidy's bugprone-argument-comment tool
to LLVM, clang and lld source trees. Here is how I created this
patch:

$ git clone https://github.com/llvm/llvm-project.git
$ cd llvm-project
$ mkdir build
$ cd build
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \
    -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \
    -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \
    -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm
$ ninja
$ parallel clang-tidy -checks='-*,bugprone-argument-comment' \
    -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \
    ::: ../llvm/lib/**/*.{cpp,h} ../clang/lib/**/*.{cpp,h} ../lld/**/*.{cpp,h}

llvm-svn: 366177
2019-07-16 04:46:31 +00:00
Ulrich Weigand b98bf60ef7 [SystemZ] Add support for new cpu architecture - arch13
This patch series adds support for the next-generation arch13
CPU architecture to the SystemZ backend.

This includes:
- Basic support for the new processor and its features.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining __VEC__ == 10303.

Note: No currently available Z system supports the arch13
architecture.  Once new systems become available, the
official system name will be added as a supported -march name.

llvm-svn: 365933
2019-07-12 18:14:51 +00:00
Benjamin Kramer 3b5e60b695 [CodeGen] NVPTX: Switch from atomic.load.add.f32 to atomicrmw fadd
llvm-svn: 365798
2019-07-11 17:44:11 +00:00
Craig Topper caf6b71ab2 [X86] Change the IR sequence for _mm_storeh_pi and _mm_storel_pi to perform the store as a <2 x float> instead of i64.
This is similar to what we do for loadl_pi and loadh_pi.
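
For reference, a usage sketch (the source-level interface is unchanged; only the emitted IR differs):

    #include <xmmintrin.h>

    void split_store(__m64 *lo, __m64 *hi, __m128 v) {
      _mm_storel_pi(lo, v); // low two floats, now stored as <2 x float>
      _mm_storeh_pi(hi, v); // high two floats, likewise
    }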

llvm-svn: 365669
2019-07-10 17:11:29 +00:00
Pengfei Wang a50bbfc470 [NFC] [X86] Fix scan-build complaining
Summary:
Remove unused variable. This fixes bug:
https://bugs.llvm.org/show_bug.cgi?id=42526

Signed-off-by: pengfei <pengfei.wang@intel.com>

Reviewers: RKSimon, xiangzhangllvm, craig.topper

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D64389

llvm-svn: 365473
2019-07-09 12:41:12 +00:00
Yonghong Song 048493f882 [BPF] Preserve debuginfo array/union/struct type/access index
For background of BPF CO-RE project, please refer to
  http://vger.kernel.org/bpfconf2019.html
In summary, BPF CO-RE intends to compile bpf programs that are
adjustable to struct/union layout changes, so the same program
can run on multiple kernels, with adjustments made before
loading based on the native kernel's structures.

In order to do this, we need to keep track of the base and result
debuginfo types of GEP (getelementptr) instructions, so we can
adjust on the host based on kernel BTF info.
Capturing such information as an IR optimization is hard,
as various optimizations may have tweaked the GEPs; also, since
unions are replaced by structures, it is impossible to track the
field index for union member accesses.

Three intrinsic functions, preserve_{array,union,struct}_access_index,
are introduced.
  addr = preserve_array_access_index(base, index, dimension)
  addr = preserve_union_access_index(base, di_index)
  addr = preserve_struct_access_index(base, gep_index, di_index)
here,
  base: the base pointer for the array/union/struct access.
  index: the last access index for array, the same for IR/DebugInfo layout.
  dimension: the array dimension.
  gep_index: the access index based on IR layout.
  di_index: the access index based on user/debuginfo types.

If using these intrinsics blindly, i.e., transforming all GEPs
to these intrinsics and later on reducing them to GEPs, we have
seen up to 7% more instructions generated. To avoid such an overhead,
a clang builtin is proposed:
  base = __builtin_preserve_access_index(base)
such that the user wraps to-be-relocated GEPs in this builtin
and preserve_*_access_index intrinsics only apply to
those GEPs. Such a builtin will prevent performance degradation
if people do not use CO-RE, even for programs which use
bpf_probe_read().

For example, consider the following program:
  $ cat test.c
  struct sk_buff {
     int i;
     int b1:1;
     int b2:2;
     union {
       struct {
         int o1;
         int o2;
       } o;
       struct {
         char flags;
         char dev_id;
       } dev;
       int netid;
     } u[10];
  };

  static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr)
      = (void *) 4;

  #define _(x) (__builtin_preserve_access_index(x))

  int bpf_prog(struct sk_buff *ctx) {
    char dev_id;
    bpf_probe_read(&dev_id, sizeof(char), _(&ctx->u[5].dev.dev_id));
    return dev_id;
  }
  $ clang -target bpf -O2 -g -emit-llvm -S -mllvm -print-before-all \
    test.c >& log

The generated IR looks like below:
  ...
  define dso_local i32 @bpf_prog(%struct.sk_buff*) #0 !dbg !15 {
    %2 = alloca %struct.sk_buff*, align 8
    %3 = alloca i8, align 1
    store %struct.sk_buff* %0, %struct.sk_buff** %2, align 8, !tbaa !45
    call void @llvm.dbg.declare(metadata %struct.sk_buff** %2, metadata !43, metadata !DIExpression()), !dbg !49
    call void @llvm.lifetime.start.p0i8(i64 1, i8* %3) #4, !dbg !50
    call void @llvm.dbg.declare(metadata i8* %3, metadata !44, metadata !DIExpression()), !dbg !51
    %4 = load i32 (i8*, i32, i8*)*, i32 (i8*, i32, i8*)** @bpf_probe_read, align 8, !dbg !52, !tbaa !45
    %5 = load %struct.sk_buff*, %struct.sk_buff** %2, align 8, !dbg !53, !tbaa !45
    %6 = call [10 x %union.anon]* @llvm.preserve.struct.access.index.p0a10s_union.anons.p0s_struct.sk_buffs(
         %struct.sk_buff* %5, i32 2, i32 3), !dbg !53, !llvm.preserve.access.index !19
    %7 = call %union.anon* @llvm.preserve.array.access.index.p0s_union.anons.p0a10s_union.anons(
         [10 x %union.anon]* %6, i32 1, i32 5), !dbg !53
    %8 = call %union.anon* @llvm.preserve.union.access.index.p0s_union.anons.p0s_union.anons(
         %union.anon* %7, i32 1), !dbg !53, !llvm.preserve.access.index !26
    %9 = bitcast %union.anon* %8 to %struct.anon.0*, !dbg !53
    %10 = call i8* @llvm.preserve.struct.access.index.p0i8.p0s_struct.anon.0s(
         %struct.anon.0* %9, i32 1, i32 1), !dbg !53, !llvm.preserve.access.index !34
    %11 = call i32 %4(i8* %3, i32 1, i8* %10), !dbg !52
    %12 = load i8, i8* %3, align 1, !dbg !54, !tbaa !55
    %13 = sext i8 %12 to i32, !dbg !54
    call void @llvm.lifetime.end.p0i8(i64 1, i8* %3) #4, !dbg !56
    ret i32 %13, !dbg !57
  }

  !19 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "sk_buff", file: !3, line: 1, size: 704, elements: !20)
  !26 = distinct !DICompositeType(tag: DW_TAG_union_type, scope: !19, file: !3, line: 5, size: 64, elements: !27)
  !34 = distinct !DICompositeType(tag: DW_TAG_structure_type, scope: !26, file: !3, line: 10, size: 16, elements: !35)

Note that @llvm.preserve.{struct,union}.access.index calls have metadata llvm.preserve.access.index
attached to instructions to provide struct/union debuginfo type information.

For &ctx->u[5].dev.dev_id,
  . The "%6 = ..." represents struct member "u" with index 2 for IR layout and index 3 for DI layout.
  . The "%7 = ..." represents array subscript "5".
  . The "%8 = ..." represents union member "dev" with index 1 for DI layout.
  . The "%10 = ..." represents struct member "dev_id" with index 1 for both IR and DI layout.

Basically, by traversing the use-def chain recursively for the 3rd argument of bpf_probe_read() and
examining all preserve_*_access_index calls, the debuginfo struct/union/array access indices
can be recovered.

The intrinsics also contain enough information to regenerate code for the IR layout.
For array and structure intrinsics, the proper GEP can be constructed.
For union intrinsics, replacing all uses of "addr" with "base" should be enough.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D61809

llvm-svn: 365438
2019-07-09 04:21:50 +00:00
Yonghong Song e085b40e9c Revert "[BPF] Preserve debuginfo array/union/struct type/access index"
This reverts commit r365435.

Forgot adding the Differential Revision link. Will add to the
commit message and resubmit.

llvm-svn: 365436
2019-07-09 04:15:12 +00:00
Yonghong Song f21eeafcd9 [BPF] Preserve debuginfo array/union/struct type/access index
For background of BPF CO-RE project, please refer to
  http://vger.kernel.org/bpfconf2019.html
In summary, BPF CO-RE intends to compile bpf programs that are
adjustable to struct/union layout changes, so the same program
can run on multiple kernels, with adjustments made before
loading based on the native kernel's structures.

In order to do this, we need to keep track of the base and result
debuginfo types of GEP (getelementptr) instructions, so we can
adjust on the host based on kernel BTF info.
Capturing such information as an IR optimization is hard,
as various optimizations may have tweaked the GEPs; also, since
unions are replaced by structures, it is impossible to track the
field index for union member accesses.

Three intrinsic functions, preserve_{array,union,struct}_access_index,
are introduced.
  addr = preserve_array_access_index(base, index, dimension)
  addr = preserve_union_access_index(base, di_index)
  addr = preserve_struct_access_index(base, gep_index, di_index)
here,
  base: the base pointer for the array/union/struct access.
  index: the last access index for array, the same for IR/DebugInfo layout.
  dimension: the array dimension.
  gep_index: the access index based on IR layout.
  di_index: the access index based on user/debuginfo types.

If using these intrinsics blindly, i.e., transforming all GEPs
to these intrinsics and later on reducing them to GEPs, we have
seen up to 7% more instructions generated. To avoid such an overhead,
a clang builtin is proposed:
  base = __builtin_preserve_access_index(base)
such that the user wraps to-be-relocated GEPs in this builtin
and preserve_*_access_index intrinsics only apply to
those GEPs. Such a builtin will prevent performance degradation
if people do not use CO-RE, even for programs which use
bpf_probe_read().

For example, consider the following program:
  $ cat test.c
  struct sk_buff {
     int i;
     int b1:1;
     int b2:2;
     union {
       struct {
         int o1;
         int o2;
       } o;
       struct {
         char flags;
         char dev_id;
       } dev;
       int netid;
     } u[10];
  };

  static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr)
      = (void *) 4;

  #define _(x) (__builtin_preserve_access_index(x))

  int bpf_prog(struct sk_buff *ctx) {
    char dev_id;
    bpf_probe_read(&dev_id, sizeof(char), _(&ctx->u[5].dev.dev_id));
    return dev_id;
  }
  $ clang -target bpf -O2 -g -emit-llvm -S -mllvm -print-before-all \
    test.c >& log

The generated IR looks like below:
  ...
  define dso_local i32 @bpf_prog(%struct.sk_buff*) #0 !dbg !15 {
    %2 = alloca %struct.sk_buff*, align 8
    %3 = alloca i8, align 1
    store %struct.sk_buff* %0, %struct.sk_buff** %2, align 8, !tbaa !45
    call void @llvm.dbg.declare(metadata %struct.sk_buff** %2, metadata !43, metadata !DIExpression()), !dbg !49
    call void @llvm.lifetime.start.p0i8(i64 1, i8* %3) #4, !dbg !50
    call void @llvm.dbg.declare(metadata i8* %3, metadata !44, metadata !DIExpression()), !dbg !51
    %4 = load i32 (i8*, i32, i8*)*, i32 (i8*, i32, i8*)** @bpf_probe_read, align 8, !dbg !52, !tbaa !45
    %5 = load %struct.sk_buff*, %struct.sk_buff** %2, align 8, !dbg !53, !tbaa !45
    %6 = call [10 x %union.anon]* @llvm.preserve.struct.access.index.p0a10s_union.anons.p0s_struct.sk_buffs(
         %struct.sk_buff* %5, i32 2, i32 3), !dbg !53, !llvm.preserve.access.index !19
    %7 = call %union.anon* @llvm.preserve.array.access.index.p0s_union.anons.p0a10s_union.anons(
         [10 x %union.anon]* %6, i32 1, i32 5), !dbg !53
    %8 = call %union.anon* @llvm.preserve.union.access.index.p0s_union.anons.p0s_union.anons(
         %union.anon* %7, i32 1), !dbg !53, !llvm.preserve.access.index !26
    %9 = bitcast %union.anon* %8 to %struct.anon.0*, !dbg !53
    %10 = call i8* @llvm.preserve.struct.access.index.p0i8.p0s_struct.anon.0s(
         %struct.anon.0* %9, i32 1, i32 1), !dbg !53, !llvm.preserve.access.index !34
    %11 = call i32 %4(i8* %3, i32 1, i8* %10), !dbg !52
    %12 = load i8, i8* %3, align 1, !dbg !54, !tbaa !55
    %13 = sext i8 %12 to i32, !dbg !54
    call void @llvm.lifetime.end.p0i8(i64 1, i8* %3) #4, !dbg !56
    ret i32 %13, !dbg !57
  }

  !19 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "sk_buff", file: !3, line: 1, size: 704, elements: !20)
  !26 = distinct !DICompositeType(tag: DW_TAG_union_type, scope: !19, file: !3, line: 5, size: 64, elements: !27)
  !34 = distinct !DICompositeType(tag: DW_TAG_structure_type, scope: !26, file: !3, line: 10, size: 16, elements: !35)

Note that @llvm.preserve.{struct,union}.access.index calls have metadata llvm.preserve.access.index
attached to instructions to provide struct/union debuginfo type information.

For &ctx->u[5].dev.dev_id,
  . The "%6 = ..." represents struct member "u" with index 2 for IR layout and index 3 for DI layout.
  . The "%7 = ..." represents array subscript "5".
  . The "%8 = ..." represents union member "dev" with index 1 for DI layout.
  . The "%10 = ..." represents struct member "dev_id" with index 1 for both IR and DI layout.

Basically, by traversing the use-def chain recursively for the 3rd argument of bpf_probe_read() and
examining all preserve_*_access_index calls, the debuginfo struct/union/array access indices
can be recovered.

The intrinsics also contain enough information to regenerate code for the IR layout.
For array and structure intrinsics, the proper GEP can be constructed.
For union intrinsics, replacing all uses of "addr" with "base" should be enough.

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 365435
2019-07-09 04:04:21 +00:00
Fangrui Song 7264a474b7 Change std::{lower,upper}_bound to llvm::{lower,upper}_bound or llvm::partition_point. NFC
llvm-svn: 365006
2019-07-03 08:13:17 +00:00
Stanislav Mekhanoshin 8a8131a3f6 [AMDGPU] gfx1010 wave32 clang support
Differential Revision: https://reviews.llvm.org/D63209

llvm-svn: 363341
2019-06-13 23:47:59 +00:00
Pengfei Wang 244062eece [X86] Enable intrinsics that convert float and bf16 data to each other
Scalar version:
_mm_cvtsbh_ss, _mm_cvtness_sbh

Vector version:
_mm512_cvtpbh_ps, _mm256_cvtpbh_ps
_mm512_maskz_cvtpbh_ps, _mm256_maskz_cvtpbh_ps
_mm512_mask_cvtpbh_ps, _mm256_mask_cvtpbh_ps
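
A hedged sketch of the scalar pair (type spelling per the avx512bf16 headers of this era; assumes -mavx512bf16 -mavx512vl):

    #include <immintrin.h>

    float roundtrip_bf16(float x) {
      __bfloat16 b = _mm_cvtness_sbh(x); // float -> bf16, round-to-nearest-even
      return _mm_cvtsbh_ss(b);           // bf16 -> float
    }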

Patch by Shengchen Kan (skan)

Differential Revision: https://reviews.llvm.org/D62363

llvm-svn: 363018
2019-06-11 01:17:28 +00:00
Tim Northover c46827c7ed LLVM IR: Generate new-style byval-with-Type from Clang
LLVM IR recently added a Type parameter to the byval Attribute, so that
when pointers become opaque and no longer have an element type the
information will still be present in IR.

For now the Type parameter is optional (which is why Clang didn't need
this change at the time), but it will become mandatory soon.

llvm-svn: 362652
2019-06-05 21:12:14 +00:00
Pengfei Wang cc3629d545 [X86] Add VP2INTERSECT instructions
Support intel AVX512 VP2INTERSECT instructions in clang

Patch by Xiang Zhang (xiangzhangllvm)

Differential Revision: https://reviews.llvm.org/D62367

llvm-svn: 362196
2019-05-31 06:09:35 +00:00
Adhemerval Zanella 1468991073 [clang] Handle lrint/llrint builtins
As with other floating-point rounding builtins that can be optimized
when built with -fno-math-errno, this patch adds support for lrint
and llrint.  It currently only optimizes for the AArch64 backend.
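
A small example; with -fno-math-errno each of these can lower to a single rounding instruction on AArch64 instead of a libm call:

    long to_long(double x)       { return __builtin_lrint(x); }
    long long to_llong(double x) { return __builtin_llrint(x); }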

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D62019

llvm-svn: 361878
2019-05-28 21:16:04 +00:00
Dmitri Gribenko 39192043bb Delete default constructors, copy constructors, move constructors, copy assignment, move assignment operators on Expr, Stmt and Decl
Reviewers: ilya-biryukov, rsmith

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D62187

llvm-svn: 361468
2019-05-23 09:22:43 +00:00
Simon Pilgrim 823458f9b8 [CGBuiltin] dumpRecord - remove unused field offset. NFCI.
llvm-svn: 361238
2019-05-21 10:48:42 +00:00
Craig Topper af7a188453 [Intrinsics] Merge lround.i32 and lround.i64 into a single intrinsic with overloaded result type. Make result type for llvm.llround overloaded instead of fixing to i64
We shouldn't really make assumptions about possible sizes for long and long long. And longer term we should probably support vectorizing these intrinsics. By making the result types not fixed we can support vectors as well.

Differential Revision: https://reviews.llvm.org/D62026

llvm-svn: 361169
2019-05-20 16:27:09 +00:00
Craig Topper 20040db9a6 [X86] Stop implicitly enabling avx512vl when avx512bf16 is enabled.
Previously we were doing this so that the 256 bit selectw builtin could be used in the implementation of the 512->256 bit conversion intrinsic.

After this commit we now use a masked convert builtin that will emit the intrinsic call and the 256-bit select from custom code in CGBuiltin. Then the header only needs to call that one intrinsic.

llvm-svn: 360924
2019-05-16 18:28:17 +00:00
Adhemerval Zanella 0d9dcd7bf0 [clang] Handle lround/llround builtins
As with other floating-point rounding builtins that can be optimized
when built with -fno-math-errno, this patch adds support for lround
and llround.  It currently only optimizes for the AArch64 backend.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D61392

llvm-svn: 360896
2019-05-16 13:43:25 +00:00
Martin Storsjo 7037a13679 [AArch64] Add __builtin_sponentry, for calling setjmp in MinGW
In MinGW, setjmp isn't expanded as a builtin in the compiler (like it
is for MSVC), but manually hooked up as calls to the right underlying
functions in headers. Using the actual CRT's real setjmp/longjmp
functions requires this intrinsic. (Currently this is worked around by
using MinGW specific reimplementations of setjmp/longjmp on aarch64.)
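
A hedged sketch of the intended wiring (illustrative only, not the actual mingw-w64 header contents):

    // Pass the stack pointer at function entry to the CRT's two-argument
    // _setjmp, as the MSVC environment expects on aarch64.
    #define setjmp(env) _setjmp((env), __builtin_sponentry())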

Differential Revision: https://reviews.llvm.org/D61592

llvm-svn: 360082
2019-05-06 21:19:07 +00:00
Luo, Yuanke 844f662932 Enable intrinsics of AVX512_BF16, which are supported for BFLOAT16 in Cooper Lake
Summary:
1. Enable infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake;
2. Enable intrinsics for VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision.
For more details about BF16 intrinsic, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference
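
A hedged sketch exercising two of the new instruction groups (intrinsic spellings per the avx512bf16 headers; assumes -mavx512bf16):

    #include <immintrin.h>

    __m512 dot_step(__m512 acc, __m512 a, __m512 b) {
      __m512bh packed = _mm512_cvtne2ps_pbh(a, b);  // VCVTNE2PS2BF16
      return _mm512_dpbf16_ps(acc, packed, packed); // DPBF16PS
    }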

Patch by LiuTianle

Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, spatel, RKSimon

Reviewed By: craig.topper

Subscribers: mgorny, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D60552

llvm-svn: 360018
2019-05-06 08:25:11 +00:00
Richard Smith 31cfb311c5 Reinstate r359059, reverted in r359361, with a fix to properly prevent
us emitting the operand of __builtin_constant_p if it has side-effects.

Original commit message:

Fix interactions between __builtin_constant_p and constexpr to match
current trunk GCC.

GCC permits information from outside the operand of
__builtin_constant_p (but in the same constant evaluation context) to be
used within that operand; clang now does so too. A few other minor
deviations from GCC's behavior showed up in my testing and are also
fixed (matching GCC):
  * Clang now supports nullptr_t as the argument type for
    __builtin_constant_p
  * Clang now returns true from __builtin_constant_p if called with a
    null pointer
  * Clang now returns true from __builtin_constant_p if called with an
    integer cast to pointer type
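
A sketch of the two pointer cases in C's spelling (the nullptr_t case is C++-only; per the list above, both now fold to true):

    _Static_assert(__builtin_constant_p((int *)0), "null pointer");
    _Static_assert(__builtin_constant_p((int *)4), "integer cast to pointer");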

llvm-svn: 359367
2019-04-27 02:58:17 +00:00
Javed Absar 18b0c40bc5 [AArch64] Add support for MTE intrinsics
This provides intrinsics support for Memory Tagging Extension (MTE),
which was introduced with the Armv8.5-a architecture.
These intrinsics are available when __ARM_FEATURE_MEMORY_TAGGING is defined.
Each intrinsic is described in detail in the ACLE Q1 2019 documentation:
https://developer.arm.com/docs/101028/latest
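
A hedged usage sketch (intrinsic names per the ACLE document above; assumes an MTE-enabled target such as -march=armv8.5-a+memtag):

    #include <arm_acle.h>

    #if defined(__ARM_FEATURE_MEMORY_TAGGING)
    void *tag_granule(void *p) {
      void *t = __arm_mte_create_random_tag(p, 0); // IRG: choose a random tag
      __arm_mte_set_tag(t);                        // STG: tag the 16-byte granule
      return t;
    }
    #endif
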
Reviewed By: Tim Northover, David Spickett
Differential Revision: https://reviews.llvm.org/D60485

llvm-svn: 359348
2019-04-26 21:08:11 +00:00
Artem Belevich 5fe85a003f [CUDA] Implemented _[bi]mma* builtins.
These builtins provide access to the new integer and
sub-integer variants of MMA (matrix multiply-accumulate) instructions
provided by CUDA-10.x on sm_75 (AKA Turing) GPUs.

Also added a feature for PTX 6.4. While Clang/LLVM does not generate
any PTX instructions that need it, we still need to pass it through to
ptxas in order to be able to compile code that uses the new 'mma'
instruction as inline assembly (e.g used by NVIDIA's CUTLASS library
https://github.com/NVIDIA/cutlass/blob/master/cutlass/arch/mma.h#L101)

Differential Revision: https://reviews.llvm.org/D60279

llvm-svn: 359248
2019-04-25 22:28:09 +00:00
Diogo N. Sampaio eb312ddfdf [Aarch64] Add v8.2-a half precision element extract intrinsics
Summary:
Implements the intrinsics defined in the ACLE to extract half-precision fp scalar elements from float16x4_t and float16x8_t vector types.
a.k.a:
vduph_lane_f16
vduph_laneq_f16
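
Usage sketch (assumes a target with the v8.2-a half-precision extension, e.g. -march=armv8.2-a+fp16):

    #include <arm_neon.h>

    float16_t lane1(float16x4_t v) { return vduph_lane_f16(v, 1); }
    float16_t lane5(float16x8_t v) { return vduph_laneq_f16(v, 5); }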

Reviewers: pablooliveira, olista01, LukeGeeson, DavidSpickett

Reviewed By: DavidSpickett

Subscribers: DavidSpickett, javed.absar, kristof.beyls, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D60272

llvm-svn: 358276
2019-04-12 10:43:48 +00:00
JF Bastien ef202c308b Variable auto-init: also auto-init alloca
Summary:
alloca isn’t auto-init’d right now because it’s a different path in clang than
all the other stuff we support (it’s a builtin, not an expression).
Interestingly, alloca doesn’t have a type (as opposed to even VLAs) so we can
really only initialize it with memset.
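
A sketch of the newly covered pattern (assuming -ftrivial-auto-var-init=pattern; `consume` is a hypothetical sink):

    extern void consume(char *buf, unsigned len);

    void demo(unsigned n) {
      char *p = __builtin_alloca(n); // now memset with the init pattern first
      consume(p, n);
    }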

<rdar://problem/49794007>

Subscribers: jkorous, dexonsmith, cfe-commits, rjmccall, glider, kees, kcc, pcc

Tags: #clang

Differential Revision: https://reviews.llvm.org/D60548

llvm-svn: 358243
2019-04-12 00:11:27 +00:00
Alexey Sotkin 1b01f9728f [OpenCL] Re-fix invalid address space generation for clk_event_t arguments of enqueue_kernel builtin function
Summary:
https://reviews.llvm.org/D53809 fixed the wrong address space (assert in debug build)
generated for the event_ret argument. But exactly the same problem exists for
the event_wait_list argument. This patch should fix both.

Reviewers: Anastasia, yaxunl

Reviewed By:  Anastasia

Subscribers: kristina, ebevhan, cfe-commits

Differential Revision: https://reviews.llvm.org/D59985

llvm-svn: 358151
2019-04-11 06:18:17 +00:00
Vedant Kumar 37b0f9ad95 [os_log] Mark os_log_helper `nounwind`
Allow the optimizer to remove unnecessary EH cleanups surrounding calls
to os_log_helper, to save some code size.

As a follow-up, it might be worthwhile to add a BasicNoexcept exception
spec to os_log_helper, and to then teach CGCall to emit direct calls for
callees which can't throw. This could save some compile-time.

Differential Revision: https://reviews.llvm.org/D60108

llvm-svn: 357501
2019-04-02 17:42:38 +00:00
Reid Kleckner 73253bdefc [MS] Make __iso_volatile_* available on all targets
Future versions of MSVC make these intrinsics available on x86 & x64,
according to:
http://lists.llvm.org/pipermail/cfe-dev/2019-March/061711.html

The purpose of these builtins is to emit plain, non-atomic, volatile
stores when /volatile:ms (-cc1 -fms-volatile) is enabled.
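
Usage sketch (these are the MS spellings; after this change they compile on any target):

    int  read_once(const volatile int *p)   { return __iso_volatile_load32(p); }
    void write_once(volatile int *p, int v) { __iso_volatile_store32(p, v); }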

llvm-svn: 357220
2019-03-28 22:59:09 +00:00
Amara Emerson c10b24691a [AArch64] Split the neon.addp intrinsic into integer and fp variants.
This is the result of discussions on the list about how to deal with intrinsics
which require codegen to disambiguate them via only the integer/fp overloads.
It causes problems for GlobalISel as some of that information is lost during
translation, while with other operations like IR instructions the information is
encoded into the instruction opcode.

This patch changes clang to emit the new faddp intrinsic if the vector operands
to the builtin have FP element types. LLVM IR AutoUpgrade has been taught to
upgrade existing calls to aarch64.neon.addp with fp vector arguments, and
we remove the workarounds introduced for GlobalISel in r355865.

This is a more permanent solution to PR40968.

Differential Revision: https://reviews.llvm.org/D59655

llvm-svn: 356722
2019-03-21 22:31:37 +00:00
Heejin Ahn 7e66a50bb4 [WebAssembly] Use rethrow intrinsic in the rethrow block
Summary:
Because in wasm we merge all catch clauses into one big catchpad, in
case none of the types in catch handlers matches after we test against
each of them, we should unwind to the next EH enclosing scope. For this,
we should NOT use a call to `__cxa_rethrow` but rather a call to our own
rethrow intrinsic, because what we're trying to do here is just to
transfer the control flow into the next enclosing EH pad (or the
caller). Calls to `__cxa_rethrow` should only be used after a call to
`__cxa_begin_catch`.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D59353

llvm-svn: 356317
2019-03-16 05:39:12 +00:00
Erik Pilkington 02886e5476 Revert "Add a new attribute, fortify_stdlib"
This reverts commit r353765. After talking with our c stdlib folks, we decided
to use the existing pass_object_size attribute to implement _FORTIFY_SOURCE
wrappers, like Bionic does (I didn't realize that pass_object_size could be used
for this purpose). Sorry for the flip/flop, and thanks to James Y. Knight for
pointing this out to me.

llvm-svn: 356103
2019-03-13 21:37:01 +00:00
Kristina Brooks 103799c060 Fix accidentally used hard tabs. NFC
Big sorry. This undoes the indentation mess I made
in r354751.

llvm-svn: 354752
2019-02-24 18:06:10 +00:00
Kristina Brooks 716cbfb464 Wrap code for builtin_assume_aligned at 80 cols. NFC
Minor style fix to avoid going over 80 cols in handling
of case for Builtin::BI__builtin_assume_aligned. NFC.

llvm-svn: 354751
2019-02-24 17:57:33 +00:00
Thomas Lively de7a0a1526 [WebAssembly] Bulk memory intrinsics and builtins
Summary:
implements llvm intrinsics and clang intrinsics for
memory.init and data.drop.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D57736

llvm-svn: 353983
2019-02-13 22:11:16 +00:00
Nico Weber acf81a7c14 Re-enable the test disabled in r353836 and hopefully make it pass in gcc builds
Argument evaluation order is different between gcc and clang, so pull out
the Builder calls to make the generated IR independent of the host compiler's
argument evaluation order.  Thanks to rnk for reminding me of this clang/gcc
difference.

llvm-svn: 353969
2019-02-13 19:04:26 +00:00
Erik Pilkington e3cd735ea6 Add a new attribute, fortify_stdlib
This attribute applies to declarations of C stdlib functions
(sprintf, memcpy...) that have known fortified variants
(__sprintf_chk, __memcpy_chk, ...). When applied, clang will emit
calls to the fortified variant functions instead of calls to the
defaults.

In GCC, this is done by adding gnu_inline-style wrapper functions,
but that doesn't work for us for variadic functions because we don't
support __builtin_va_arg_pack (and have no intention to).

This attribute takes two arguments: the first is a 'type' argument
passed through to __builtin_object_size, and the second is a flag
argument that gets passed through to the variadic checking variants.
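
A hedged sketch of how a declaration might carry the (since-reverted) attribute; the exact spelling and argument values here are illustrative, not taken from the patch:

    // First argument: the __builtin_object_size 'type'; second: the flag
    // forwarded to the variadic __*_chk variants.
    __attribute__((fortify_stdlib(1, 0)))
    int sprintf(char *str, const char *format, ...);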

rdar://47905754

Differential revision: https://reviews.llvm.org/D57918

llvm-svn: 353765
2019-02-11 23:21:39 +00:00
James Y Knight 751fe286dc [opaque pointer types] Cleanup CGBuilder's Create*GEP.
The various EltSize, Offset, DataLayout, and StructLayout arguments
are all computable from the Address's element type and the DataLayout
which the CGBuilder already has access to.

After having previously asserted that the computed values are the same
as those passed in, now remove the redundant arguments from
CGBuilder's Create*GEP functions.

Differential Revision: https://reviews.llvm.org/D57767

llvm-svn: 353629
2019-02-09 22:22:28 +00:00
Volodymyr Sapsai 1570571ded [CodeGen][ObjC] Fix assert on calling `__builtin_constant_p` with ObjC objects.
When we are calling `__builtin_constant_p` with ObjC objects of
different classes, we hit the assertion

> Assertion failed: (isa<X>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file include/llvm/Support/Casting.h, line 254.

It happens because LLVM types for `ObjCInterfaceType` are opaque and
have no name (see `CodeGenTypes::ConvertType`). As a result, for
different ObjC classes we have different `is_constant` intrinsics with
the same name `llvm.is.constant.p0s_s`. When we try to reuse an
intrinsic with the same name, we fail because of type mismatch.

Fix by bitcasting `ObjCObjectPointerType` to `id` prior to passing as an
argument to `__builtin_constant_p`. This results in using intrinsic
`llvm.is.constant.p0i8` and correct types.

rdar://problem/47499250

Reviewers: rjmccall, ahatanak, void

Reviewed By: void, ahatanak

Subscribers: ddunbar, jkorous, hans, dexonsmith, cfe-commits

Differential Revision: https://reviews.llvm.org/D57427

llvm-svn: 353577
2019-02-08 23:02:13 +00:00
Eli Friedman 3189d5f48c [COFF, ARM64] Fix types for _ReadStatusReg, _WriteStatusReg
r344765 added those intrinsics, but used the wrong types.

Patch by Mike Hommey

Differential Revision: https://reviews.llvm.org/D57636

llvm-svn: 353493
2019-02-08 01:17:49 +00:00
Tom Tan dcb9e08fae [COFF, ARM64] Add ARM64 support for MS intrinsic _fastfail
The MSDN document was also updated to reflect this, but it will probably take a few days to show at the link below.

https://docs.microsoft.com/en-us/cpp/intrinsics/fastfail

Differential Revision: https://reviews.llvm.org/D57631

llvm-svn: 353337
2019-02-06 20:08:26 +00:00
James Y Knight 76f787424d [opaque pointer types] More trivial changes to pass FunctionType to CallInst.
Change various functions to use FunctionCallee or Function*.

Pass function type through __builtin_dump_struct's dumpRecord helper.

llvm-svn: 353199
2019-02-05 19:17:50 +00:00
James Y Knight 9871db064d [opaque pointer types] Pass function types for runtime function calls.
Emit{Nounwind,}RuntimeCall{,OrInvoke} have been modified to take a
FunctionCallee as an argument, and CreateRuntimeFunction has been
modified to return a FunctionCallee. All callers have been updated.

Additionally, CreateBuiltinFunction is removed, as it was redundant
with CreateRuntimeFunction after some previous changes.

Differential Revision: https://reviews.llvm.org/D57668

llvm-svn: 353184
2019-02-05 16:42:33 +00:00
James Y Knight 8799caee8d [opaque pointer types] Trivial changes towards CallInst requiring
explicit function types.

llvm-svn: 353009
2019-02-03 21:53:49 +00:00
Erik Pilkington 9c3b588db9 Add a new builtin: __builtin_dynamic_object_size
This builtin has the same UI as __builtin_object_size, but has the
potential to be evaluated dynamically. It is meant to be used as a
drop-in replacement for libraries that use __builtin_object_size when
a dynamic checking mode is enabled. For instance,
__builtin_object_size fails to provide any extra checking in the
following function:

  void f(size_t alloc) {
    char* p = malloc(alloc);
    strcpy(p, "foobar"); // expands to __builtin___strcpy_chk(p, "foobar", __builtin_object_size(p, 0))
  }

This is an overflow if alloc < 7, but because LLVM can't fold the
object size intrinsic statically, it folds __builtin_object_size to
-1. With __builtin_dynamic_object_size, alloc is passed through to
__builtin___strcpy_chk.

rdar://32212419

Differential revision: https://reviews.llvm.org/D56760

llvm-svn: 352665
2019-01-30 20:34:53 +00:00
Erik Pilkington 600e9deacf Add a 'dynamic' parameter to the objectsize intrinsic
This is meant to be used with clang's __builtin_dynamic_object_size.
When 'true' is passed to this parameter, the intrinsic has the
potential to be folded into instructions that will be evaluated
at run time. When 'false', the objectsize intrinsic behaviour is
unchanged.

rdar://32212419

Differential revision: https://reviews.llvm.org/D56761

llvm-svn: 352664
2019-01-30 20:34:35 +00:00
James Y Knight 3933addd30 Cleanup: replace uses of CallSite with CallBase.
llvm-svn: 352595
2019-01-30 02:54:28 +00:00
Matt Arsenault b72888647b AMDGPU: Add ds append/consume builtins
llvm-svn: 352443
2019-01-28 23:59:18 +00:00
Craig Topper 07b6d3de1b [X86] Add new variadic avx512 compress/expand intrinsics that use vXi1 types for the mask argument.
Custom lower the builtins to these intrinsics. This enables the middle end to optimize out bitcasts for the masks.

llvm-svn: 352344
2019-01-28 07:03:10 +00:00
Craig Topper bd7884ed79 [X86] Custom codegen 512-bit cvt(u)qq2tops, cvt(u)qqtopd, and cvt(u)dqtops intrinsics.
Summary:
The 512-bit cvt(u)qq2tops, cvt(u)qqtopd, and cvt(u)dqtops intrinsics all have the possibility of taking an explicit rounding mode argument. If the rounding mode is CUR_DIRECTION we'd like to emit a sitofp/uitofp instruction and a select like we do for 256-bit intrinsics.

For cvt(u)qqtopd and cvt(u)dqtops we do this when the form of the software intrinsics that doesn't take a rounding mode argument is used. This is done by using convertvector in the header with the select builtin. But if the explicit rounding mode form of the intrinsic is used and CUR_DIRECTION is passed, we don't do this. We shouldn't have this inconsistency.

For cvt(u)qqtops nothing is done because we can't use the select builtin in the header without avx512vl. So we need to use custom codegen for this.

Even when the rounding mode isn't CUR_DIRECTION we should also use select in IR for consistency. And it will remove another scalar integer mask from our intrinsics.

To accomplish all of these goals I've taken a slightly unusual approach. I've added two new X86 specific intrinsics for sitofp/uitofp with rounding. These intrinsics are variadic on the input and output type so we only need 2 instead of 6. This avoids the need for a switch to map them in CGBuiltin.cpp. We just need to check signed vs unsigned. I believe other targets also use variadic intrinsics like this.

So if the rounding mode is CUR_DIRECTION we'll use an sitofp/uitofp instruction. Otherwise we'll use one of the new intrinsics. After that we'll emit a select instruction if needed.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D56998

llvm-svn: 352267
2019-01-26 02:42:01 +00:00
Simon Pilgrim a7bcd72c0a [X86] Replace VPCOM/VPCOMU with generic integer comparisons (clang)
These intrinsics can always be replaced with generic integer comparisons without any regression in codegen, even for -O0/-fast-isel cases.

Noticed while cleaning up vector integer comparison costs for PR40376.

A future commit will remove/autoupgrade the existing VPCOM/VPCOMU llvm intrinsics.

llvm-svn: 351687
2019-01-20 16:40:33 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Anton Korobeynikov 81cff31ccf CodeGen: Cast llvm.flt.rounds result to match __builtin_flt_rounds
llvm.flt.rounds returns an i32, but the builtin expects an integer.
On targets where integers are not 32 bits, clang tries to bitcast the result, causing an assertion failure.
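
The builtin in question; after this change the i32 result is truncated (or extended) to the target's 'int':

    int rounding_mode(void) {
      return __builtin_flt_rounds(); // 0: toward zero, 1: to nearest, ...
    }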

The patch enables newlib build for msp430.

Patch by Edward Jones!

Differential Revision: https://reviews.llvm.org/D24461

llvm-svn: 351449
2019-01-17 15:21:55 +00:00
Craig Topper 015585abb2 [X86] Add custom emission for the avx512 scatter builtins to convert from scalar integer to vXi1 for the mask arguments to the intrinsics.
llvm-svn: 351408
2019-01-17 00:34:19 +00:00
Craig Topper 931779761e Recommit r351160 "[X86] Make _xgetbv/_xsetbv on non-windows platforms"
V8 has been fixed now.

llvm-svn: 351391
2019-01-16 22:56:25 +00:00
Craig Topper bb5b06603b [X86] Add versions of the avx512 gather intrinsics that take the mask as a vXi1 vector instead of a scalar
We need to custom handle these so we can turn the scalar mask into a vXi1 vector.

Differential Revision: https://reviews.llvm.org/D56530

llvm-svn: 351390
2019-01-16 22:34:33 +00:00
Benjamin Kramer 9c53890833 Revert "[X86] Make _xgetbv/_xsetbv on non-windows platforms"
This reverts commit r351160. Breaks building v8.

llvm-svn: 351210
2019-01-15 17:23:36 +00:00
Roman Lebedev bd1c087019 [clang][UBSan] Sanitization for alignment assumptions.
Summary:
UB isn't nice. It's cool and powerful, but not nice.
Having a way to detect it is nice though.
[[ https://wg21.link/p1007r3 | P1007R3: std::assume_aligned ]] / http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1007r2.pdf says:
```
We propose to add this functionality via a library function instead of a core language attribute.
...
If the pointer passed in is not aligned to at least N bytes, calling assume_aligned results in undefined behaviour.
```

This differential teaches clang to sanitize all the various variants of this assume-aligned attribute.
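
A sketch of the three sanitized variants (declarations only; each now gets a run-time alignment check under -fsanitize=alignment):

    void *via_builtin(void *p) { return __builtin_assume_aligned(p, 32); }
    __attribute__((assume_aligned(64))) void *via_attr(void);
    __attribute__((alloc_align(1)))     void *via_alloc_align(int align);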

Requires D54588 for LLVM IRBuilder changes.
The compiler-rt part is D54590.

This is a second commit, the original one was r351105,
which was mass-reverted in r351159 because 2 compiler-rt tests were failing.

Reviewers: ABataev, craig.topper, vsk, rsmith, rnk, #sanitizers, erichkeane, filcab, rjmccall

Reviewed By: rjmccall

Subscribers: chandlerc, ldionne, EricWF, mclow.lists, cfe-commits, bkramer

Tags: #sanitizers

Differential Revision: https://reviews.llvm.org/D54589

llvm-svn: 351177
2019-01-15 09:44:25 +00:00
Craig Topper 69aed7c364 [X86] Make _xgetbv/_xsetbv on non-windows platforms
Summary:
This patch attempts to redo what was tried in r278783, but was reverted.

These intrinsics should be available on non-windows platforms with "xsave" feature check. But on Windows platforms they shouldn't have feature check since that's how MSVC behaves.

To accomplish this I've added a MS builtin with no feature check. And a normal gcc builtin with a feature check. When _MSC_VER is not defined _xgetbv/_xsetbv will be macros pointing to the gcc builtin name.

I've moved the forward declarations from intrin.h to immintrin.h to match the MSDN documentation and used that as the header file for the MS builtin.

I'm not super happy with this implementation, and I'm open to suggestions for better ways to do it.
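
Usage sketch (MS spellings; on non-Windows this assumes the xsave feature, e.g. -mxsave):

    #include <immintrin.h>

    unsigned long long read_xcr0(void) {
      return _xgetbv(0); // XCR0: which x87/SSE/AVX state components are enabled
    }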

Reviewers: rnk, RKSimon, spatel

Reviewed By: rnk

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D56686

llvm-svn: 351160
2019-01-15 05:03:18 +00:00
Vlad Tsyrklevich 86e68fda3b Revert alignment assumptions changes
Revert r351104-6, r351109, r351110, r351119, r351134, and r351153. These
changes fail on the sanitizer bots.

llvm-svn: 351159
2019-01-15 03:38:02 +00:00
Roman Lebedev 7892c37455 [clang][UBSan] Sanitization for alignment assumptions.
Summary:
UB isn't nice. It's cool and powerful, but not nice.
Having a way to detect it is nice though.
[[ https://wg21.link/p1007r3 | P1007R3: std::assume_aligned ]] / http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1007r2.pdf says:
```
We propose to add this functionality via a library function instead of a core language attribute.
...
If the pointer passed in is not aligned to at least N bytes, calling assume_aligned results in undefined behaviour.
```

This differential teaches clang to sanitize all the various variants of this assume-aligned attribute.

Requires D54588 for LLVM IRBuilder changes.
The compiler-rt part is D54590.

Reviewers: ABataev, craig.topper, vsk, rsmith, rnk, #sanitizers, erichkeane, filcab, rjmccall

Reviewed By: rjmccall

Subscribers: chandlerc, ldionne, EricWF, mclow.lists, cfe-commits, bkramer

Tags: #sanitizers

Differential Revision: https://reviews.llvm.org/D54589

llvm-svn: 351105
2019-01-14 19:09:27 +00:00
Dan Gohman 51532a524e [WebAssembly] Remove old builtins
This removes the old grow_memory and mem.grow-style builtins, leaving just
the memory.grow-style builtins.

Differential Revision: https://reviews.llvm.org/D56645

llvm-svn: 351089
2019-01-14 18:28:10 +00:00
Craig Topper 49488407aa [X86] Remove mask parameter from avx512 pmultishiftqb intrinsics. Use select in IR instead.
Fixes PR40259

llvm-svn: 351036
2019-01-14 08:46:51 +00:00
Craig Topper 689b3b71af [X86] Remove mask parameter from vpshufbitqmb intrinsics. Change result to a vXi1 vector.
We'll do the scalar<->vXi1 conversions with bitcasts in IR.

Fixes PR40258

llvm-svn: 351029
2019-01-14 00:03:55 +00:00
Craig Topper cd9e232a4d Recommit r350555 "[X86] Use funnel shift intrinsics for the VBMI2 vshld/vshrd builtins."
The MSVC limit hit in AutoUpgrade.cpp has been worked around for now.

llvm-svn: 350568
2019-01-07 21:00:41 +00:00
Craig Topper 33c9088783 Revert r350555 "[X86] Use funnel shift intrinsics for the VBMI2 vshld/vshrd builtins."
Had to revert the LLVM patch this depends on to fix a MSVC compiler limit in AutoUpgrade.cpp

llvm-svn: 350563
2019-01-07 19:39:25 +00:00
Craig Topper e34f2bb807 [X86] Use funnel shift intrinsics for the VBMI2 vshld/vshrd builtins.
Differential Revision: https://reviews.llvm.org/D56365

llvm-svn: 350555
2019-01-07 19:10:22 +00:00
Haibo Huang 303b2333e4 Declares __cpu_model as dso local
__builtin_cpu_supports and __builtin_cpu_is use information in __cpu_model to decide CPU features. Before this change, __cpu_model was not declared as dso local, so the generated code looks up its address in the GOT when reading __cpu_model. This makes it impossible to use these functions in an ifunc resolver, because at that time GOT entries have not been relocated. This change makes it dso local.
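
A sketch of the ifunc pattern this unblocks (ELF/x86; all names here are illustrative):

    static void impl_avx2(void)    { /* fast path */ }
    static void impl_generic(void) { /* fallback */ }

    // Runs during relocation; safe now that reading __cpu_model needs no GOT load.
    static void (*resolve_entry(void))(void) {
      return __builtin_cpu_supports("avx2") ? impl_avx2 : impl_generic;
    }

    void entry(void) __attribute__((ifunc("resolve_entry")));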

Differential Revision: https://reviews.llvm.org/D53850

llvm-svn: 349825
2018-12-20 21:33:59 +00:00
Simon Pilgrim 4597379227 [X86] Auto upgrade XOP/AVX512 rotation intrinsics to generic funnel shift intrinsics (clang)
This emits FSHL/FSHR generic intrinsics for the XOP VPROT and AVX512 VPROL/VPROR rotation intrinsics.

LLVM counterpart: https://reviews.llvm.org/D55938

Differential Revision: https://reviews.llvm.org/D55937

llvm-svn: 349796
2018-12-20 19:01:13 +00:00
Simon Pilgrim 313dc85ce0 [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (clang)
This emits SADD_SAT/SSUB_SAT generic intrinsics for the SSE signed saturated math intrinsics.

LLVM counterpart: https://reviews.llvm.org/D55894

Differential Revision: https://reviews.llvm.org/D55890

llvm-svn: 349743
2018-12-20 11:53:45 +00:00
Simon Pilgrim a7b30b4a58 [X86][SSE] Auto upgrade PADDUS/PSUBUS intrinsics to UADD_SAT/USUB_SAT generic intrinsics (clang)
Sibling patch to D55855, this emits UADD_SAT/USUB_SAT generic intrinsics for the SSE saturated math intrinsics instead of expanding to an IR code sequence that could be difficult to reassemble.

Differential Revision: https://reviews.llvm.org/D55879

llvm-svn: 349631
2018-12-19 14:43:47 +00:00
Vedant Kumar 77dfca88b2 [CodeGen] Handle mixed-width ops in mixed-sign mul-with-overflow lowering
The special lowering for __builtin_mul_overflow introduced in r320902
fixed an ICE seen when passing mixed-sign operands to the builtin.

This patch extends the special lowering to cover mixed-width, mixed-sign
operands. In a few common scenarios, calls to muloti4 will no longer be
emitted.
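
A mixed-width, mixed-sign case now covered by the special lowering (previously this could emit a call to __muloti4):

    #include <stdbool.h>

    bool mul_fits(long long a, unsigned b, long long *out) {
      return !__builtin_mul_overflow(a, b, out);
    }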

This should address the latest comments in PR34920 and work around the
link failure seen in:

  https://bugzilla.redhat.com/show_bug.cgi?id=1657544

Testing:
- check-clang
- A/B output comparison with: https://gist.github.com/vedantk/3eb9c88f82e5c32f2e590555b4af5081

Differential Revision: https://reviews.llvm.org/D55843

llvm-svn: 349542
2018-12-18 21:05:03 +00:00
Eric Fiselier 261875054e [Clang] Add __builtin_launder
Summary:
This patch adds `__builtin_launder`, which is required to implement `std::launder`. Additionally GCC provides `__builtin_launder`, so this brings Clang in line with GCC.

I'm not exactly sure what magic `__builtin_launder` requires, but based on previous discussions this patch applies a `@llvm.invariant.group.barrier`. As noted in previous discussions, this may not be enough to correctly handle vtables.

Reviewers: rnk, majnemer, rsmith

Reviewed By: rsmith

Subscribers: kristina, Romain-Geissler-1A, erichkeane, amharc, jroelofs, cfe-commits, Prazek

Differential Revision: https://reviews.llvm.org/D40218

llvm-svn: 349195
2018-12-14 21:11:28 +00:00
Craig Topper 1f2b181689 [Builltins][X86] Provide implementations of __lzcnt16, __lzcnt, __lzcnt64 for MS compatibility. Remove declarations from intrin.h and implementations from lzcntintrin.h
intrin.h had forward declarations for these and lzcntintrin.h had implementations that were only available with -mlzcnt or a -march that supported the lzcnt feature.

For MS compatibility we should always have these builtins available regardless of X86 being the target or the CPU supporting the lzcnt instruction. The backends should be able to gracefully fall back to something supported even if it's just shifts and bit ops.

Unfortunately, gcc also implements 2 of the 3 function names here on X86 when the lzcnt feature is enabled.

This patch adds builtins for these for MSVC compatibility and drops the forward declarations from intrin.h. To keep the gcc compatibility the two intrinsics that collided have been turned into macros that use the X86 specific builtins with the lzcnt feature check. These macros are only defined when _MSC_VER is not defined. Without them being macros we can get a redefinition error because -ms-extensions doesn't seem to set _MSC_VER but does make the MS builtins available.
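
Usage sketch; with this change these compile for MS compatibility even without -mlzcnt:

    unsigned short     lz16(unsigned short x)     { return __lzcnt16(x); }
    unsigned int       lz32(unsigned int x)       { return __lzcnt(x);   }
    unsigned long long lz64(unsigned long long x) { return __lzcnt64(x); }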

Should fix PR40014

Differential Revision: https://reviews.llvm.org/D55677

llvm-svn: 349098
2018-12-14 00:21:02 +00:00
Haibo Huang e177082972 Revert "Declares __cpu_model as dso local"
This reverts r348978

llvm-svn: 348982
2018-12-12 22:39:51 +00:00
Haibo Huang 6b22f59207 Declares __cpu_model as dso local
__builtin_cpu_supports and __builtin_cpu_is use information in __cpu_model to decide CPU features. Before this change, __cpu_model was not declared as dso local, so the generated code looks up its address in the GOT when reading __cpu_model. This makes it impossible to use these functions in an ifunc resolver, because at that time GOT entries have not been relocated. This change makes it dso local.

Differential Revision: https://reviews.llvm.org/D53850

llvm-svn: 348978
2018-12-12 22:04:12 +00:00
Raphael Isemann b23ccecbb0 Misc typos fixes in ./lib folder
Summary: Found via `codespell -q 3 -I ../clang-whitelist.txt -L uint,importd,crasher,gonna,cant,ue,ons,orign,ned`

Reviewers: teemperor

Reviewed By: teemperor

Subscribers: teemperor, jholewinski, jvesely, nhaehnle, whisperity, jfb, cfe-commits

Differential Revision: https://reviews.llvm.org/D55475

llvm-svn: 348755
2018-12-10 12:37:46 +00:00
Craig Topper 6d7a7ef9eb [X86] Remove the addcarry builtins. Leaving only the addcarryx builtins since that matches gcc.
The addcarry and addcarryx builtins do the same thing. The only difference is that addcarryx previously required the adx feature.

This commit removes the adx feature check from addcarryx and removes the addcarry builtin. This matches the builtins that gcc has. We don't guarantee compatibility in builtins, but we generally try to be consistent if it's not a burden.
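
Usage sketch of the surviving spelling (no adx feature requirement anymore):

    #include <immintrin.h>

    // Add two 64-bit values expressed as 32-bit halves, chaining the carry.
    unsigned char add_u64_halves(unsigned a_lo, unsigned a_hi,
                                 unsigned b_lo, unsigned b_hi,
                                 unsigned out[2]) {
      unsigned char c = _addcarryx_u32(0, a_lo, b_lo, &out[0]);
      return _addcarryx_u32(c, a_hi, b_hi, &out[1]);
    }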

llvm-svn: 348738
2018-12-10 06:07:59 +00:00
Fangrui Song 407659ab0a Revert "Revert r347417 "Re-Reinstate 347294 with a fix for the failures.""
It seems the two failing tests can be simply fixed after r348037

Fix 3 cases in Analysis/builtin-functions.cpp
Delete the bad CodeGen/builtin-constant-p.c for now

llvm-svn: 348053
2018-11-30 23:41:18 +00:00
Fangrui Song f5d3335d75 Revert r347417 "Re-Reinstate 347294 with a fix for the failures."
Kept the "indirect_builtin_constant_p" test case in test/SemaCXX/constant-expression-cxx1y.cpp
while we are investigating why the following snippet fails:

  extern char extern_var;
  struct { int a; } a = {__builtin_constant_p(extern_var)};

llvm-svn: 348039
2018-11-30 21:26:09 +00:00
Hans Wennborg 48ee4ad325 Re-commit r347417 "Re-Reinstate 347294 with a fix for the failures."
This was reverted in r347656 due to me thinking it caused a miscompile of
Chromium. Turns out it was the Chromium code that was broken.

llvm-svn: 347756
2018-11-28 14:04:12 +00:00
Hans Wennborg 8c79706e89 Revert r347417 "Re-Reinstate 347294 with a fix for the failures."
This caused a miscompile in Chrome (see crbug.com/908372) that's
illustrated by this small reduction:

  static bool f(int *a, int *b) {
    return !__builtin_constant_p(b - a) || (!(b - a));
  }

  int arr[] = {1,2,3};

  bool g() {
    return f(arr, arr + 3);
  }

  $ clang -O2 -S -emit-llvm a.cc -o -

g() should return true, but after r347417 it became false for some reason.

This also reverts the follow-up commits.

r347417:
> Re-Reinstate 347294 with a fix for the failures.
>
> Don't try to emit a scalar expression for a non-scalar argument to
> __builtin_constant_p().
>
> Third time's a charm!

r347446:
> The result of is.constant() is unsigned.

r347480:
> A __builtin_constant_p() returns 0 with a function type.

r347512:
> isEvaluatable() implies a constant context.
>
> Assume that we're in a constant context if we're asking if the expression can
> be compiled into a constant initializer. This fixes the issue where a
> __builtin_constant_p() in a compound literal was diagnosed as not being
> constant, even though it's always possible to convert the builtin into a
> constant.

r347531:
> A "constexpr" is evaluated in a constant context. Make sure this is reflected
> if a __builtin_constant_p() is a part of a constexpr.

llvm-svn: 347656
2018-11-27 14:01:40 +00:00
Sanjay Patel c6fa5bc7c7 [CodeGen] translate MS rotate builtins to LLVM funnel-shift intrinsics
This was originally part of:
D50924

and should resolve PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387

...but it was reverted because some bots using a gcc host compiler 
would crash for unknown reasons with this included in the patch. 
Trying again now to see if that's still a problem.

llvm-svn: 347527
2018-11-25 17:53:16 +00:00
Bill Wendling 46acc72cf4 A __builtin_constant_p() returns 0 with a function type.
llvm-svn: 347480
2018-11-22 22:58:06 +00:00
Bill Wendling 2a6c59ea2a The result of is.constant() is unsigned.
llvm-svn: 347446
2018-11-22 09:31:08 +00:00
Bill Wendling 6ff1751f7d Re-Reinstate 347294 with a fix for the failures.
Don't try to emit a scalar expression for a non-scalar argument to
__builtin_constant_p().

Third time's a charm!

llvm-svn: 347417
2018-11-21 20:44:18 +00:00
Nico Weber 9f0246d473 Revert r347364 again, the fix was incomplete.
llvm-svn: 347389
2018-11-21 12:47:43 +00:00
Bill Wendling 91549ed15f Reinstate 347294 with a fix for the failures.
EvaluateAsInt() is sometimes called in a constant context. When that's the
case, we need to specify it as such.

llvm-svn: 347364
2018-11-20 23:24:16 +00:00
Nico Weber 6438972553 Revert 347294, it turned many bots on lab.llvm.org:8011/console red.
llvm-svn: 347314
2018-11-20 15:27:43 +00:00
Bill Wendling 107b0e9881 Use is.constant intrinsic for __builtin_constant_p
Summary:
A __builtin_constant_p may end up with a constant after inlining. Use
the is.constant intrinsic if it's a variable that's in a context where
it may resolve to a constant, e.g., an argument to a function after
inlining.
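
A sketch of the motivating pattern (`slow_log2` is a hypothetical out-of-line fallback):

    extern int slow_log2(unsigned v);

    static inline int fast_log2(unsigned v) {
      // After inlining with a literal argument, llvm.is.constant folds to
      // true and the slow branch is eliminated.
      return __builtin_constant_p(v) ? 31 - __builtin_clz(v) : slow_log2(v);
    }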

Reviewers: rsmith, shafik

Subscribers: jfb, kristina, cfe-commits, nickdesaulniers, jyknight

Differential Revision: https://reviews.llvm.org/D54355

llvm-svn: 347294
2018-11-20 08:53:30 +00:00
Alexey Sotkin 692f12b389 [OpenCL] Fix invalid address space generation for clk_event_t
Summary:
Addrspace(32) was generated when passing 0 as the clk_event_t * event_ret
parameter of the enqueue_kernel function.

Patch by Viktoria Maksimova

Reviewers: Anastasia, yaxunl, AlexeySotkin

Reviewed By:  Anastasia, AlexeySotkin

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D53809

llvm-svn: 346838
2018-11-14 09:40:05 +00:00
Erich Keane de6480a38c [NFC] Move storage of dispatch-version to GlobalDecl
As suggested by Richard Smith, and initially put up for review here:
https://reviews.llvm.org/D53341, this patch removes a hack that was used
to ensure that proper target-feature lists were used when emitting
cpu-dispatch (and eventually, target-clones) implementations. As a part
of this, the GlobalDecl object is proliferated to a bunch more
locations.

Originally, this was put up for review (see above) to get acceptance on
the approach, though discussion with Richard in San Diego showed he
approved of the approach taken here.  Thus, I believe this is acceptable
for Review-After-commit

Differential Revision: https://reviews.llvm.org/D53341

Change-Id: I0a0bd673340d334d93feac789d653e03d9f6b1d5
llvm-svn: 346757
2018-11-13 15:48:08 +00:00