llvm-project

Commit Graph

Author	SHA1	Message	Date
JF Bastien	dbc0a5df8d	Allow prefetching from non-zero address spaces Summary: This is useful for targets which have prefetch instructions for non-default address spaces. <rdar://problem/42662136> Subscribers: nemanjai, javed.absar, hiraditya, kbarton, jkorous, dexonsmith, cfe-commits, llvm-commits, RKSimon, hfinkel, t.p.northover, craig.topper, anemet Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D65254 llvm-svn: 367032	2019-07-25 16:11:57 +00:00
Christudasan Devadasan	8c5e6fa657	Updated the signature for some stack related intrinsics (CLANG) Modified the intrinsics int_addressofreturnaddress, int_frameaddress & int_sponentry. This commit depends on the changes in rL366679 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D64563 llvm-svn: 366683	2019-07-22 12:50:30 +00:00
Guanzhong Chen	5204f7611f	[WebAssembly] Compute and export TLS block alignment Summary: Add immutable WASM global `__tls_align` which stores the alignment requirements of the TLS segment. Add `__builtin_wasm_tls_align()` intrinsic to get this alignment in Clang. The expected usage has now changed to: __wasm_init_tls(memalign(__builtin_wasm_tls_align(), __builtin_wasm_tls_size())); Reviewers: tlively, aheejin, sbc100, sunfish, alexcrichton Reviewed By: tlively Subscribers: dschuff, jgravelle-google, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D65028 llvm-svn: 366624	2019-07-19 23:34:16 +00:00
Guanzhong Chen	801fa8e6b9	[WebAssembly] Implement __builtin_wasm_tls_base intrinsic Summary: Add `__builtin_wasm_tls_base` so that LeakSanitizer can find the thread-local block and scan through it for memory leaks. Reviewers: tlively, aheejin, sbc100 Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D64900 llvm-svn: 366475	2019-07-18 17:53:22 +00:00
Matt Arsenault	e56865d40c	AMDGPU: Add some missing builtins llvm-svn: 366286	2019-07-17 00:01:03 +00:00
Guanzhong Chen	42bba4b852	[WebAssembly] Implement thread-local storage (local-exec model) Summary: Thread local variables are placed inside a `.tdata` segment. Their symbols are offsets from the start of the segment. The address of a thread local variable is computed as `__tls_base` + the offset from the start of the segment. `.tdata` segment is a passive segment and `memory.init` is used once per thread to initialize the thread local storage. `__tls_base` is a wasm global. Since each thread has its own wasm instance, it is effectively thread local. Currently, `__tls_base` must be initialized at thread startup, and so cannot be used with dynamic libraries. `__tls_base` is to be initialized with a new linker-synthesized function, `__wasm_init_tls`, which takes as an argument a block of memory to use as the storage for thread locals. It then initializes the block of memory and sets `__tls_base`. As `__wasm_init_tls` will handle the memory initialization, the memory does not have to be zeroed. To help allocating memory for thread-local storage, a new compiler intrinsic is introduced: `__builtin_wasm_tls_size()`. This instrinsic function returns the size of the thread-local storage for the current function. The expected usage is to run something like the following upon thread startup: __wasm_init_tls(malloc(__builtin_wasm_tls_size())); Reviewers: tlively, aheejin, kripken, sbc100 Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, jfb, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D64537 llvm-svn: 366272	2019-07-16 22:00:45 +00:00
Kyrylo Tkachov	eb72138340	[AArch64] Implement __jcvt intrinsic from Armv8.3-A The jcvt intrinsic defined in ACLE [1] is available when ARM_FEATURE_JCVT is defined. This change introduces the AArch64 intrinsic, wires it up to the instruction and a new clang builtin function. The __ARM_FEATURE_JCVT macro is now defined when an Armv8.3-A or higher target is used. I've implemented the target detection logic in Clang so that this feature is enabled for architectures from armv8.3-a onwards (so -march=armv8.4-a also enables this, for example). make check-all didn't show any new failures. [1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics Differential Revision: https://reviews.llvm.org/D64495 llvm-svn: 366197	2019-07-16 09:27:39 +00:00
Rui Ueyama	49a3ad21d6	Fix parameter name comments using clang-tidy. NFC. This patch applies clang-tidy's bugprone-argument-comment tool to LLVM, clang and lld source trees. Here is how I created this patch: $ git clone https://github.com/llvm/llvm-project.git $ cd llvm-project $ mkdir build $ cd build $ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \ -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \ -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \ -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm $ ninja $ parallel clang-tidy -checks='-,bugprone-argument-comment' \ -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \ ::: ../llvm/lib//.{cpp,h} ../clang/lib/*/.{cpp,h} ../lld/*/.{cpp,h} llvm-svn: 366177	2019-07-16 04:46:31 +00:00
Ulrich Weigand	b98bf60ef7	[SystemZ] Add support for new cpu architecture - arch13 This patch series adds support for the next-generation arch13 CPU architecture to the SystemZ backend. This includes: - Basic support for the new processor and its features. - Support for low-level builtins mapped to new LLVM intrinsics. - New high-level intrinsics in vecintrin.h. - Indicate support by defining __VEC__ == 10303. Note: No currently available Z system supports the arch13 architecture. Once new systems become available, the official system name will be added as supported -march name. llvm-svn: 365933	2019-07-12 18:14:51 +00:00
Benjamin Kramer	3b5e60b695	[CodeGen] NVPTX: Switch from atomic.load.add.f32 to atomicrmw fadd llvm-svn: 365798	2019-07-11 17:44:11 +00:00
Craig Topper	caf6b71ab2	[X86] Change the IR sequence for _mm_storeh_pi and _mm_storel_pi to perform the store as a <2 x float> instead of i64. This is similar to what we do for loadl_pi and loadh_pi. llvm-svn: 365669	2019-07-10 17:11:29 +00:00
Pengfei Wang	a50bbfc470	[NFC] [X86] Fix scan-build complaining Summary: Remove unused variable. This fixes bug: https://bugs.llvm.org/show_bug.cgi?id=42526 Signed-off-by: pengfei <pengfei.wang@intel.com> Reviewers: RKSimon, xiangzhangllvm, craig.topper Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D64389 llvm-svn: 365473	2019-07-09 12:41:12 +00:00
Yonghong Song	048493f882	[BPF] Preserve debuginfo array/union/struct type/access index For background of BPF CO-RE project, please refer to http://vger.kernel.org/bpfconf2019.html In summary, BPF CO-RE intends to compile bpf programs adjustable on struct/union layout change so the same program can run on multiple kernels with adjustment before loading based on native kernel structures. In order to do this, we need keep track of GEP(getelementptr) instruction base and result debuginfo types, so we can adjust on the host based on kernel BTF info. Capturing such information as an IR optimization is hard as various optimization may have tweaked GEP and also union is replaced by structure it is impossible to track fieldindex for union member accesses. Three intrinsic functions, preserve_{array,union,struct}_access_index, are introducted. addr = preserve_array_access_index(base, index, dimension) addr = preserve_union_access_index(base, di_index) addr = preserve_struct_access_index(base, gep_index, di_index) here, base: the base pointer for the array/union/struct access. index: the last access index for array, the same for IR/DebugInfo layout. dimension: the array dimension. gep_index: the access index based on IR layout. di_index: the access index based on user/debuginfo types. If using these intrinsics blindly, i.e., transforming all GEPs to these intrinsics and later on reducing them to GEPs, we have seen up to 7% more instructions generated. To avoid such an overhead, a clang builtin is proposed: base = __builtin_preserve_access_index(base) such that user wraps to-be-relocated GEPs in this builtin and preserve__access_index intrinsics only apply to those GEPs. Such a buyin will prevent performance degradation if people do not use CO-RE, even for programs which use bpf_probe_read(). For example, for the following example, $ cat test.c struct sk_buff { int i; int b1:1; int b2:2; union { struct { int o1; int o2; } o; struct { char flags; char dev_id; } dev; int netid; } u[10]; }; static int (bpf_probe_read)(void dst, int size, const void unsafe_ptr) = (void ) 4; #define _(x) (__builtin_preserve_access_index(x)) int bpf_prog(struct sk_buff ctx) { char dev_id; bpf_probe_read(&dev_id, sizeof(char), _(&ctx->u[5].dev.dev_id)); return dev_id; } $ clang -target bpf -O2 -g -emit-llvm -S -mllvm -print-before-all \ test.c >& log The generated IR looks like below: ... define dso_local i32 @bpf_prog(%struct.sk_buff) #0 !dbg !15 { %2 = alloca %struct.sk_buff, align 8 %3 = alloca i8, align 1 store %struct.sk_buff* %0, %struct.sk_buff %2, align 8, !tbaa !45 call void @llvm.dbg.declare(metadata %struct.sk_buff %2, metadata !43, metadata !DIExpression()), !dbg !49 call void @llvm.lifetime.start.p0i8(i64 1, i8* %3) #4, !dbg !50 call void @llvm.dbg.declare(metadata i8* %3, metadata !44, metadata !DIExpression()), !dbg !51 %4 = load i32 (i8, i32, i8), i32 (i8, i32, i8)* @bpf_probe_read, align 8, !dbg !52, !tbaa !45 %5 = load %struct.sk_buff, %struct.sk_buff* %2, align 8, !dbg !53, !tbaa !45 %6 = call [10 x %union.anon]* @llvm.preserve.struct.access.index.p0a10s_union.anons.p0s_struct.sk_buffs( %struct.sk_buff* %5, i32 2, i32 3), !dbg !53, !llvm.preserve.access.index !19 %7 = call %union.anon* @llvm.preserve.array.access.index.p0s_union.anons.p0a10s_union.anons( [10 x %union.anon]* %6, i32 1, i32 5), !dbg !53 %8 = call %union.anon* @llvm.preserve.union.access.index.p0s_union.anons.p0s_union.anons( %union.anon* %7, i32 1), !dbg !53, !llvm.preserve.access.index !26 %9 = bitcast %union.anon* %8 to %struct.anon.0, !dbg !53 %10 = call i8 @llvm.preserve.struct.access.index.p0i8.p0s_struct.anon.0s( %struct.anon.0* %9, i32 1, i32 1), !dbg !53, !llvm.preserve.access.index !34 %11 = call i32 %4(i8* %3, i32 1, i8* %10), !dbg !52 %12 = load i8, i8* %3, align 1, !dbg !54, !tbaa !55 %13 = sext i8 %12 to i32, !dbg !54 call void @llvm.lifetime.end.p0i8(i64 1, i8* %3) #4, !dbg !56 ret i32 %13, !dbg !57 } !19 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "sk_buff", file: !3, line: 1, size: 704, elements: !20) !26 = distinct !DICompositeType(tag: DW_TAG_union_type, scope: !19, file: !3, line: 5, size: 64, elements: !27) !34 = distinct !DICompositeType(tag: DW_TAG_structure_type, scope: !26, file: !3, line: 10, size: 16, elements: !35) Note that @llvm.preserve.{struct,union}.access.index calls have metadata llvm.preserve.access.index attached to instructions to provide struct/union debuginfo type information. For &ctx->u[5].dev.dev_id, . The "%6 = ..." represents struct member "u" with index 2 for IR layout and index 3 for DI layout. . The "%7 = ..." represents array subscript "5". . The "%8 = ..." represents union member "dev" with index 1 for DI layout. . The "%10 = ..." represents struct member "dev_id" with index 1 for both IR and DI layout. Basically, traversing the use-def chain recursively for the 3rd argument of bpf_probe_read() and examining all preserve_*_access_index calls, the debuginfo struct/union/array access index can be achieved. The intrinsics also contain enough information to regenerate codes for IR layout. For array and structure intrinsics, the proper GEP can be constructed. For union intrinsics, replacing all uses of "addr" with "base" should be enough. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61809 llvm-svn: 365438	2019-07-09 04:21:50 +00:00
Yonghong Song	e085b40e9c	Revert "[BPF] Preserve debuginfo array/union/struct type/access index" This reverts commit r365435. Forgot adding the Differential Revision link. Will add to the commit message and resubmit. llvm-svn: 365436	2019-07-09 04:15:12 +00:00
Yonghong Song	f21eeafcd9	[BPF] Preserve debuginfo array/union/struct type/access index For background of BPF CO-RE project, please refer to http://vger.kernel.org/bpfconf2019.html In summary, BPF CO-RE intends to compile bpf programs adjustable on struct/union layout change so the same program can run on multiple kernels with adjustment before loading based on native kernel structures. In order to do this, we need keep track of GEP(getelementptr) instruction base and result debuginfo types, so we can adjust on the host based on kernel BTF info. Capturing such information as an IR optimization is hard as various optimization may have tweaked GEP and also union is replaced by structure it is impossible to track fieldindex for union member accesses. Three intrinsic functions, preserve_{array,union,struct}_access_index, are introducted. addr = preserve_array_access_index(base, index, dimension) addr = preserve_union_access_index(base, di_index) addr = preserve_struct_access_index(base, gep_index, di_index) here, base: the base pointer for the array/union/struct access. index: the last access index for array, the same for IR/DebugInfo layout. dimension: the array dimension. gep_index: the access index based on IR layout. di_index: the access index based on user/debuginfo types. If using these intrinsics blindly, i.e., transforming all GEPs to these intrinsics and later on reducing them to GEPs, we have seen up to 7% more instructions generated. To avoid such an overhead, a clang builtin is proposed: base = __builtin_preserve_access_index(base) such that user wraps to-be-relocated GEPs in this builtin and preserve__access_index intrinsics only apply to those GEPs. Such a buyin will prevent performance degradation if people do not use CO-RE, even for programs which use bpf_probe_read(). For example, for the following example, $ cat test.c struct sk_buff { int i; int b1:1; int b2:2; union { struct { int o1; int o2; } o; struct { char flags; char dev_id; } dev; int netid; } u[10]; }; static int (bpf_probe_read)(void dst, int size, const void unsafe_ptr) = (void ) 4; #define _(x) (__builtin_preserve_access_index(x)) int bpf_prog(struct sk_buff ctx) { char dev_id; bpf_probe_read(&dev_id, sizeof(char), _(&ctx->u[5].dev.dev_id)); return dev_id; } $ clang -target bpf -O2 -g -emit-llvm -S -mllvm -print-before-all \ test.c >& log The generated IR looks like below: ... define dso_local i32 @bpf_prog(%struct.sk_buff) #0 !dbg !15 { %2 = alloca %struct.sk_buff, align 8 %3 = alloca i8, align 1 store %struct.sk_buff* %0, %struct.sk_buff %2, align 8, !tbaa !45 call void @llvm.dbg.declare(metadata %struct.sk_buff %2, metadata !43, metadata !DIExpression()), !dbg !49 call void @llvm.lifetime.start.p0i8(i64 1, i8* %3) #4, !dbg !50 call void @llvm.dbg.declare(metadata i8* %3, metadata !44, metadata !DIExpression()), !dbg !51 %4 = load i32 (i8, i32, i8), i32 (i8, i32, i8)* @bpf_probe_read, align 8, !dbg !52, !tbaa !45 %5 = load %struct.sk_buff, %struct.sk_buff* %2, align 8, !dbg !53, !tbaa !45 %6 = call [10 x %union.anon]* @llvm.preserve.struct.access.index.p0a10s_union.anons.p0s_struct.sk_buffs( %struct.sk_buff* %5, i32 2, i32 3), !dbg !53, !llvm.preserve.access.index !19 %7 = call %union.anon* @llvm.preserve.array.access.index.p0s_union.anons.p0a10s_union.anons( [10 x %union.anon]* %6, i32 1, i32 5), !dbg !53 %8 = call %union.anon* @llvm.preserve.union.access.index.p0s_union.anons.p0s_union.anons( %union.anon* %7, i32 1), !dbg !53, !llvm.preserve.access.index !26 %9 = bitcast %union.anon* %8 to %struct.anon.0, !dbg !53 %10 = call i8 @llvm.preserve.struct.access.index.p0i8.p0s_struct.anon.0s( %struct.anon.0* %9, i32 1, i32 1), !dbg !53, !llvm.preserve.access.index !34 %11 = call i32 %4(i8* %3, i32 1, i8* %10), !dbg !52 %12 = load i8, i8* %3, align 1, !dbg !54, !tbaa !55 %13 = sext i8 %12 to i32, !dbg !54 call void @llvm.lifetime.end.p0i8(i64 1, i8* %3) #4, !dbg !56 ret i32 %13, !dbg !57 } !19 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "sk_buff", file: !3, line: 1, size: 704, elements: !20) !26 = distinct !DICompositeType(tag: DW_TAG_union_type, scope: !19, file: !3, line: 5, size: 64, elements: !27) !34 = distinct !DICompositeType(tag: DW_TAG_structure_type, scope: !26, file: !3, line: 10, size: 16, elements: !35) Note that @llvm.preserve.{struct,union}.access.index calls have metadata llvm.preserve.access.index attached to instructions to provide struct/union debuginfo type information. For &ctx->u[5].dev.dev_id, . The "%6 = ..." represents struct member "u" with index 2 for IR layout and index 3 for DI layout. . The "%7 = ..." represents array subscript "5". . The "%8 = ..." represents union member "dev" with index 1 for DI layout. . The "%10 = ..." represents struct member "dev_id" with index 1 for both IR and DI layout. Basically, traversing the use-def chain recursively for the 3rd argument of bpf_probe_read() and examining all preserve_*_access_index calls, the debuginfo struct/union/array access index can be achieved. The intrinsics also contain enough information to regenerate codes for IR layout. For array and structure intrinsics, the proper GEP can be constructed. For union intrinsics, replacing all uses of "addr" with "base" should be enough. Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 365435	2019-07-09 04:04:21 +00:00
Fangrui Song	7264a474b7	Change std::{lower,upper}_bound to llvm::{lower,upper}_bound or llvm::partition_point. NFC llvm-svn: 365006	2019-07-03 08:13:17 +00:00
Stanislav Mekhanoshin	8a8131a3f6	[AMDGPU] gfx1010 wave32 clang support Differential Revision: https://reviews.llvm.org/D63209 llvm-svn: 363341	2019-06-13 23:47:59 +00:00
Pengfei Wang	244062eece	[X86] Enable intrinsics that convert float and bf16 data to each other Scalar version : _mm_cvtsbh_ss , _mm_cvtness_sbh Vector version: _mm512_cvtpbh_ps , _mm256_cvtpbh_ps _mm512_maskz_cvtpbh_ps , _mm256_maskz_cvtpbh_ps _mm512_mask_cvtpbh_ps , _mm256_mask_cvtpbh_ps Patch by Shengchen Kan (skan) Differential Revision: https://reviews.llvm.org/D62363 llvm-svn: 363018	2019-06-11 01:17:28 +00:00
Tim Northover	c46827c7ed	LLVM IR: Generate new-style byval-with-Type from Clang LLVM IR recently added a Type parameter to the byval Attribute, so that when pointers become opaque and no longer have an element type the information will still be present in IR. For now the Type parameter is optional (which is why Clang didn't need this change at the time), but it will become mandatory soon. llvm-svn: 362652	2019-06-05 21:12:14 +00:00
Pengfei Wang	cc3629d545	[X86] Add VP2INTERSECT instructions Support intel AVX512 VP2INTERSECT instructions in clang Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D62367 llvm-svn: 362196	2019-05-31 06:09:35 +00:00
Adhemerval Zanella	1468991073	[clang] Handle lrint/llrint builtins As for other floating-point rounding builtins that can be optimized when build with -fno-math-errno, this patch adds support for lrint and llrint. It currently only optimize for AArch64 backend. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D62019 llvm-svn: 361878	2019-05-28 21:16:04 +00:00
Dmitri Gribenko	39192043bb	Delete default constructors, copy constructors, move constructors, copy assignment, move assignment operators on Expr, Stmt and Decl Reviewers: ilya-biryukov, rsmith Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D62187 llvm-svn: 361468	2019-05-23 09:22:43 +00:00
Simon Pilgrim	823458f9b8	[CGBuiltin] dumpRecord - remove unused field offset. NFCI. llvm-svn: 361238	2019-05-21 10:48:42 +00:00
Craig Topper	af7a188453	[Intrinsics] Merge lround.i32 and lround.i64 into a single intrinsic with overloaded result type. Make result type for llvm.llround overloaded instead of fixing to i64 We shouldn't really make assumptions about possible sizes for long and long long. And longer term we should probably support vectorizing these intrinsics. By making the result types not fixed we can support vectors as well. Differential Revision: https://reviews.llvm.org/D62026 llvm-svn: 361169	2019-05-20 16:27:09 +00:00
Craig Topper	20040db9a6	[X86] Stop implicitly enabling avx512vl when avx512bf16 is enabled. Previously we were doing this so that the 256 bit selectw builtin could be used in the implementation of the 512->256 bit conversion intrinsic. After this commit we now use a masked convert builtin that will emit the intrinsic call and the 256-bit select from custom code in CGBuiltin. Then the header only needs to call that one intrinsic. llvm-svn: 360924	2019-05-16 18:28:17 +00:00
Adhemerval Zanella	0d9dcd7bf0	[clang] Handle lround/llround builtins As for other floating-point rounding builtins that can be optimized when build with -fno-math-errno, this patch adds support for lround and llround. It currently only optimize for AArch64 backend. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D61392 llvm-svn: 360896	2019-05-16 13:43:25 +00:00
Martin Storsjo	7037a13679	[AArch64] Add __builtin_sponentry, for calling setjmp in MinGW In MinGW, setjmp isn't expanded as a builtin in the compiler (like it is for MSVC), but manually hooked up as calls to the right underlying functions in headers. Using the actual CRT's real setjmp/longjmp functions requires this intrinsic. (Currently this is worked around by using MinGW specific reimplementations of setjmp/longjmp on aarch64.) Differential Revision: https://reviews.llvm.org/D61592 llvm-svn: 360082	2019-05-06 21:19:07 +00:00
Luo, Yuanke	844f662932	Enable intrinsics of AVX512_BF16, which are supported for BFLOAT16 in Cooper Lake Summary: 1. Enable infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake; 2. Enable intrinsics for VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision. For more details about BF16 intrinsic, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference Patch by LiuTianle Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, spatel, RKSimon Reviewed By: craig.topper Subscribers: mgorny, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D60552 llvm-svn: 360018	2019-05-06 08:25:11 +00:00
Richard Smith	31cfb311c5	Reinstate r359059, reverted in r359361, with a fix to properly prevent us emitting the operand of __builtin_constant_p if it has side-effects. Original commit message: Fix interactions between __builtin_constant_p and constexpr to match current trunk GCC. GCC permits information from outside the operand of __builtin_constant_p (but in the same constant evaluation context) to be used within that operand; clang now does so too. A few other minor deviations from GCC's behavior showed up in my testing and are also fixed (matching GCC): * Clang now supports nullptr_t as the argument type for __builtin_constant_p * Clang now returns true from __builtin_constant_p if called with a null pointer * Clang now returns true from __builtin_constant_p if called with an integer cast to pointer type llvm-svn: 359367	2019-04-27 02:58:17 +00:00
Javed Absar	18b0c40bc5	[AArch64] Add support for MTE intrinsics This provides intrinsics support for Memory Tagging Extension (MTE), which was introduced with the Armv8.5-a architecture. These intrinsics are available when __ARM_FEATURE_MEMORY_TAGGING is defined. Each intrinsic is described in detail in the ACLE Q1 2019 documentation: https://developer.arm.com/docs/101028/latest Reviewed By: Tim Nortover, David Spickett Differential Revision: https://reviews.llvm.org/D60485 llvm-svn: 359348	2019-04-26 21:08:11 +00:00
Artem Belevich	5fe85a003f	[CUDA] Implemented _[bi]mma* builtins. These builtins provide access to the new integer and sub-integer variants of MMA (matrix multiply-accumulate) instructions provided by CUDA-10.x on sm_75 (AKA Turing) GPUs. Also added a feature for PTX 6.4. While Clang/LLVM does not generate any PTX instructions that need it, we still need to pass it through to ptxas in order to be able to compile code that uses the new 'mma' instruction as inline assembly (e.g used by NVIDIA's CUTLASS library https://github.com/NVIDIA/cutlass/blob/master/cutlass/arch/mma.h#L101) Differential Revision: https://reviews.llvm.org/D60279 llvm-svn: 359248	2019-04-25 22:28:09 +00:00
Diogo N. Sampaio	eb312ddfdf	[Aarch64] Add v8.2-a half precision element extract intrinsics Summary: Implements the intrinsics define on the ACLE to extract half precision fp scalar elements from float16x4_t and float16x8_t vector types. a.k.a: vduph_lane_f16 vduph_laneq_f16 Reviewers: pablooliveira, olista01, LukeGeeson, DavidSpickett Reviewed By: DavidSpickett Subscribers: DavidSpickett, javed.absar, kristof.beyls, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D60272 llvm-svn: 358276	2019-04-12 10:43:48 +00:00
JF Bastien	ef202c308b	Variable auto-init: also auto-init alloca Summary: alloca isn’t auto-init’d right now because it’s a different path in clang that all the other stuff we support (it’s a builtin, not an expression). Interestingly, alloca doesn’t have a type (as opposed to even VLA) so we can really only initialize it with memset. <rdar://problem/49794007> Subscribers: jkorous, dexonsmith, cfe-commits, rjmccall, glider, kees, kcc, pcc Tags: #clang Differential Revision: https://reviews.llvm.org/D60548 llvm-svn: 358243	2019-04-12 00:11:27 +00:00
Alexey Sotkin	1b01f9728f	[OpenCL] Re-fix invalid address space generation for clk_event_t arguments of enqueue_kernel builtin function Summary: https://reviews.llvm.org/D53809 fixed wrong address space(assert in debug build) generated for event_ret argument. But exactly the same problem exists for event_wait_list argument. This patch should fix both. Reviewers: Anastasia, yaxunl Reviewed By: Anastasia Subscribers: kristina, ebevhan, cfe-commits Differential Revision: https://reviews.llvm.org/D59985 llvm-svn: 358151	2019-04-11 06:18:17 +00:00
Vedant Kumar	37b0f9ad95	[os_log] Mark os_log_helper `nounwind` Allow the optimizer to remove unnecessary EH cleanups surrounding calls to os_log_helper, to save some code size. As a follow-up, it might be worthwhile to add a BasicNoexcept exception spec to os_log_helper, and to then teach CGCall to emit direct calls for callees which can't throw. This could save some compile-time. Differential Revision: https://reviews.llvm.org/D60108 llvm-svn: 357501	2019-04-02 17:42:38 +00:00
Reid Kleckner	73253bdefc	[MS] Make __iso_volatile_* available on all targets Future versions of MSVC make these intrinsics available on x86 & x64, according to: http://lists.llvm.org/pipermail/cfe-dev/2019-March/061711.html The purpose of these builtins is to emit plain, non-atomic, volatile stores when /volatile:ms (-cc1 -fms-volatile) is enabled. llvm-svn: 357220	2019-03-28 22:59:09 +00:00
Amara Emerson	c10b24691a	[AArch64] Split the neon.addp intrinsic into integer and fp variants. This is the result of discussions on the list about how to deal with intrinsics which require codegen to disambiguate them via only the integer/fp overloads. It causes problems for GlobalISel as some of that information is lost during translation, while with other operations like IR instructions the information is encoded into the instruction opcode. This patch changes clang to emit the new faddp intrinsic if the vector operands to the builtin have FP element types. LLVM IR AutoUpgrade has been taught to upgrade existing calls to aarch64.neon.addp with fp vector arguments, and we remove the workarounds introduced for GlobalISel in r355865. This is a more permanent solution to PR40968. Differential Revision: https://reviews.llvm.org/D59655 llvm-svn: 356722	2019-03-21 22:31:37 +00:00
Heejin Ahn	7e66a50bb4	[WebAssembly] Use rethrow intrinsic in the rethrow block Summary: Because in wasm we merge all catch clauses into one big catchpad, in case none of the types in catch handlers matches after we test against each of them, we should unwind to the next EH enclosing scope. For this, we should NOT use a call to `__cxa_rethrow` but rather a call to our own rethrow intrinsic, because what we're trying to do here is just to transfer the control flow into the next enclosing EH pad (or the caller). Calls to `__cxa_rethrow` should only be used after a call to `__cxa_begin_catch`. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D59353 llvm-svn: 356317	2019-03-16 05:39:12 +00:00
Erik Pilkington	02886e5476	Revert "Add a new attribute, fortify_stdlib" This reverts commit r353765. After talking with our c stdlib folks, we decided to use the existing pass_object_size attribute to implement _FORTIFY_SOURCE wrappers, like Bionic does (I didn't realize that pass_object_size could be used for this purpose). Sorry for the flip/flop, and thanks to James Y. Knight for pointing this out to me. llvm-svn: 356103	2019-03-13 21:37:01 +00:00
Kristina Brooks	103799c060	Fix accidentally used hard tabs. NFC Big sorry. This undoes the indentation mess I made in r354751. llvm-svn: 354752	2019-02-24 18:06:10 +00:00
Kristina Brooks	716cbfb464	Wrap code for builtin_assume_aligned at 80 col.NFC Minor style fix to avoid going over 80 cols in handling of case for Builtin::BI__builtin_assume_aligned. NFC. llvm-svn: 354751	2019-02-24 17:57:33 +00:00
Thomas Lively	de7a0a1526	[WebAssembly] Bulk memory intrinsics and builtins Summary: implements llvm intrinsics and clang intrinsics for memory.init and data.drop. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D57736 llvm-svn: 353983	2019-02-13 22:11:16 +00:00
Nico Weber	acf81a7c14	Re-enable the test disabled in r353836 and hopefully make it pass in gcc builds Argument evaluation order is different between gcc and clang, so pull out the Builder calls to make the generated IR independent of the host compiler's argument evaluation order. Thanks to rnk for reminding me of this clang/gcc difference. llvm-svn: 353969	2019-02-13 19:04:26 +00:00
Erik Pilkington	e3cd735ea6	Add a new attribute, fortify_stdlib This attribute applies to declarations of C stdlib functions (sprintf, memcpy...) that have known fortified variants (__sprintf_chk, __memcpy_chk, ...). When applied, clang will emit calls to the fortified variant functions instead of calls to the defaults. In GCC, this is done by adding gnu_inline-style wrapper functions, but that doesn't work for us for variadic functions because we don't support __builtin_va_arg_pack (and have no intention to). This attribute takes two arguments, the first is 'type' argument passed through to __builtin_object_size, and the second is a flag argument that gets passed through to the variadic checking variants. rdar://47905754 Differential revision: https://reviews.llvm.org/D57918 llvm-svn: 353765	2019-02-11 23:21:39 +00:00
James Y Knight	751fe286dc	[opaque pointer types] Cleanup CGBuilder's CreateGEP. The various EltSize, Offset, DataLayout, and StructLayout arguments are all computable from the Address's element type and the DataLayout which the CGBuilder already has access to. After having previously asserted that the computed values are the same as those passed in, now remove the redundant arguments from CGBuilder's CreateGEP functions. Differential Revision: https://reviews.llvm.org/D57767 llvm-svn: 353629	2019-02-09 22:22:28 +00:00
Volodymyr Sapsai	1570571ded	[CodeGen][ObjC] Fix assert on calling `__builtin_constant_p` with ObjC objects. When we are calling `__builtin_constant_p` with ObjC objects of different classes, we hit the assertion > Assertion failed: (isa<X>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file include/llvm/Support/Casting.h, line 254. It happens because LLVM types for `ObjCInterfaceType` are opaque and have no name (see `CodeGenTypes::ConvertType`). As the result, for different ObjC classes we have different `is_constant` intrinsics with the same name `llvm.is.constant.p0s_s`. When we try to reuse an intrinsic with the same name, we fail because of type mismatch. Fix by bitcasting `ObjCObjectPointerType` to `id` prior to passing as an argument to `__builtin_constant_p`. This results in using intrinsic `llvm.is.constant.p0i8` and correct types. rdar://problem/47499250 Reviewers: rjmccall, ahatanak, void Reviewed By: void, ahatanak Subscribers: ddunbar, jkorous, hans, dexonsmith, cfe-commits Differential Revision: https://reviews.llvm.org/D57427 llvm-svn: 353577	2019-02-08 23:02:13 +00:00
Eli Friedman	3189d5f48c	[COFF, ARM64] Fix types for _ReadStatusReg, _WriteStatusReg r344765 added those intrinsics, but used the wrong types. Patch by Mike Hommey Differential Revision: https://reviews.llvm.org/D57636 llvm-svn: 353493	2019-02-08 01:17:49 +00:00
Tom Tan	dcb9e08fae	[COFF, ARM64] Add ARM64 support for MS intrinsic _fastfail The MSDN document was also updated to reflect this, but it probably will take a few days to show in below link. https://docs.microsoft.com/en-us/cpp/intrinsics/fastfail Differential Revision: https://reviews.llvm.org/D57631 llvm-svn: 353337	2019-02-06 20:08:26 +00:00
James Y Knight	76f787424d	[opaque pointer types] More trivial changes to pass FunctionType to CallInst. Change various functions to use FunctionCallee or Function*. Pass function type through __builtin_dump_struct's dumpRecord helper. llvm-svn: 353199	2019-02-05 19:17:50 +00:00
James Y Knight	9871db064d	[opaque pointer types] Pass function types for runtime function calls. Emit{Nounwind,}RuntimeCall{,OrInvoke} have been modified to take a FunctionCallee as an argument, and CreateRuntimeFunction has been modified to return a FunctionCallee. All callers have been updated. Additionally, CreateBuiltinFunction is removed, as it was redundant with CreateRuntimeFunction after some previous changes. Differential Revision: https://reviews.llvm.org/D57668 llvm-svn: 353184	2019-02-05 16:42:33 +00:00
James Y Knight	8799caee8d	[opaque pointer types] Trivial changes towards CallInst requiring explicit function types. llvm-svn: 353009	2019-02-03 21:53:49 +00:00
Erik Pilkington	9c3b588db9	Add a new builtin: __builtin_dynamic_object_size This builtin has the same UI as __builtin_object_size, but has the potential to be evaluated dynamically. It is meant to be used as a drop-in replacement for libraries that use __builtin_object_size when a dynamic checking mode is enabled. For instance, __builtin_object_size fails to provide any extra checking in the following function: void f(size_t alloc) { char* p = malloc(alloc); strcpy(p, "foobar"); // expands to __builtin___strcpy_chk(p, "foobar", __builtin_object_size(p, 0)) } This is an overflow if alloc < 7, but because LLVM can't fold the object size intrinsic statically, it folds __builtin_object_size to -1. With __builtin_dynamic_object_size, alloc is passed through to __builtin___strcpy_chk. rdar://32212419 Differential revision: https://reviews.llvm.org/D56760 llvm-svn: 352665	2019-01-30 20:34:53 +00:00
Erik Pilkington	600e9deacf	Add a 'dynamic' parameter to the objectsize intrinsic This is meant to be used with clang's __builtin_dynamic_object_size. When 'true' is passed to this parameter, the intrinsic has the potential to be folded into instructions that will be evaluated at run time. When 'false', the objectsize intrinsic behaviour is unchanged. rdar://32212419 Differential revision: https://reviews.llvm.org/D56761 llvm-svn: 352664	2019-01-30 20:34:35 +00:00
James Y Knight	3933addd30	Cleanup: replace uses of CallSite with CallBase. llvm-svn: 352595	2019-01-30 02:54:28 +00:00
Matt Arsenault	b72888647b	AMDGPU: Add ds append/consume builtins llvm-svn: 352443	2019-01-28 23:59:18 +00:00
Craig Topper	07b6d3de1b	[X86] Add new variadic avx512 compress/expand intrinsics that use vXi1 types for the mask argument. Custom lower the builtins to these intrinsics. This enables the middle end to optimize out bitcasts for the masks. llvm-svn: 352344	2019-01-28 07:03:10 +00:00
Craig Topper	bd7884ed79	[X86] Custom codegen 512-bit cvt(u)qq2tops, cvt(u)qqtopd, and cvt(u)dqtops intrinsics. Summary: The 512-bit cvt(u)qq2tops, cvt(u)qqtopd, and cvt(u)dqtops intrinsics all have the possibility of taking an explicit rounding mode argument. If the rounding mode is CUR_DIRECTION we'd like to emit a sitofp/uitofp instruction and a select like we do for 256-bit intrinsics. For cvt(u)qqtopd and cvt(u)dqtops we do this when the form of the software intrinsics that doesn't take a rounding mode argument is used. This is done by using convertvector in the header with the select builtin. But if the explicit rounding mode form of the intrinsic is used and CUR_DIRECTION is passed, we don't do this. We shouldn't have this inconsistency. For cvt(u)qqtops nothing is done because we can't use the select builtin in the header without avx512vl. So we need to use custom codegen for this. Even when the rounding mode isn't CUR_DIRECTION we should also use select in IR for consistency. And it will remove another scalar integer mask from our intrinsics. To accomplish all of these goals I've taken a slightly unusual approach. I've added two new X86 specific intrinsics for sitofp/uitofp with rounding. These intrinsics are variadic on the input and output type so we only need 2 instead of 6. This avoids the need for a switch to map them in CGBuiltin.cpp. We just need to check signed vs unsigned. I believe other targets also use variadic intrinsics like this. So if the rounding mode is CUR_DIRECTION we'll use an sitofp/uitofp instruction. Otherwise we'll use one of the new intrinsics. After that we'll emit a select instruction if needed. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D56998 llvm-svn: 352267	2019-01-26 02:42:01 +00:00
Simon Pilgrim	a7bcd72c0a	[X86] Replace VPCOM/VPCOMU with generic integer comparisons (clang) These intrinsics can always be replaced with generic integer comparisons without any regression in codegen, even for -O0/-fast-isel cases. Noticed while cleaning up vector integer comparison costs for PR40376. A future commit will remove/autoupgrade the existing VPCOM/VPCOMU llvm intrinsics. llvm-svn: 351687	2019-01-20 16:40:33 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Anton Korobeynikov	81cff31ccf	CodeGen: Cast llvm.flt.rounds result to match __builtin_flt_rounds llvm.flt.rounds returns an i32, but the builtin expects an integer. On targets where integers are not 32-bits clang tries to bitcast the result, causing an assertion failure. The patch enables newlib build for msp430. Patch by Edward Jones! Differential Revision: https://reviews.llvm.org/D24461 llvm-svn: 351449	2019-01-17 15:21:55 +00:00
Craig Topper	015585abb2	[X86] Add custom emission for the avx512 scatter builtins to convert from scalar integer to vXi1 for the mask arguments to the intrinsics. llvm-svn: 351408	2019-01-17 00:34:19 +00:00
Craig Topper	931779761e	Recommit r351160 "[X86] Make _xgetbv/_xsetbv on non-windows platforms" V8 has been fixed now. llvm-svn: 351391	2019-01-16 22:56:25 +00:00
Craig Topper	bb5b06603b	[X86] Add versions of the avx512 gather intrinsics that take the mask as a vXi1 vector instead of a scalar We need to custom handle these so we can turn the scalar mask into a vXi1 vector. Differential Revision: https://reviews.llvm.org/D56530 llvm-svn: 351390	2019-01-16 22:34:33 +00:00
Benjamin Kramer	9c53890833	Revert "[X86] Make _xgetbv/_xsetbv on non-windows platforms" This reverts commit r351160. Breaks building v8. llvm-svn: 351210	2019-01-15 17:23:36 +00:00
Roman Lebedev	bd1c087019	[clang][UBSan] Sanitization for alignment assumptions. Summary: UB isn't nice. It's cool and powerful, but not nice. Having a way to detect it is nice though. [[ https://wg21.link/p1007r3 \| P1007R3: std::assume_aligned ]] / http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1007r2.pdf says: ``` We propose to add this functionality via a library function instead of a core language attribute. ... If the pointer passed in is not aligned to at least N bytes, calling assume_aligned results in undefined behaviour. ``` This differential teaches clang to sanitize all the various variants of this assume-aligned attribute. Requires D54588 for LLVM IRBuilder changes. The compiler-rt part is D54590. This is a second commit, the original one was r351105, which was mass-reverted in r351159 because 2 compiler-rt tests were failing. Reviewers: ABataev, craig.topper, vsk, rsmith, rnk, #sanitizers, erichkeane, filcab, rjmccall Reviewed By: rjmccall Subscribers: chandlerc, ldionne, EricWF, mclow.lists, cfe-commits, bkramer Tags: #sanitizers Differential Revision: https://reviews.llvm.org/D54589 llvm-svn: 351177	2019-01-15 09:44:25 +00:00
Craig Topper	69aed7c364	[X86] Make _xgetbv/_xsetbv on non-windows platforms Summary: This patch attempts to redo what was tried in r278783, but was reverted. These intrinsics should be available on non-windows platforms with "xsave" feature check. But on Windows platforms they shouldn't have feature check since that's how MSVC behaves. To accomplish this I've added a MS builtin with no feature check. And a normal gcc builtin with a feature check. When _MSC_VER is not defined _xgetbv/_xsetbv will be macros pointing to the gcc builtin name. I've moved the forward declarations from intrin.h to immintrin.h to match the MSDN documentation and used that as the header file for the MS builtin. I'm not super happy with this implementation, and I'm open to suggestions for better ways to do it. Reviewers: rnk, RKSimon, spatel Reviewed By: rnk Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D56686 llvm-svn: 351160	2019-01-15 05:03:18 +00:00
Vlad Tsyrklevich	86e68fda3b	Revert alignment assumptions changes Revert r351104-6, r351109, r351110, r351119, r351134, and r351153. These changes fail on the sanitizer bots. llvm-svn: 351159	2019-01-15 03:38:02 +00:00
Roman Lebedev	7892c37455	[clang][UBSan] Sanitization for alignment assumptions. Summary: UB isn't nice. It's cool and powerful, but not nice. Having a way to detect it is nice though. [[ https://wg21.link/p1007r3 \| P1007R3: std::assume_aligned ]] / http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1007r2.pdf says: ``` We propose to add this functionality via a library function instead of a core language attribute. ... If the pointer passed in is not aligned to at least N bytes, calling assume_aligned results in undefined behaviour. ``` This differential teaches clang to sanitize all the various variants of this assume-aligned attribute. Requires D54588 for LLVM IRBuilder changes. The compiler-rt part is D54590. Reviewers: ABataev, craig.topper, vsk, rsmith, rnk, #sanitizers, erichkeane, filcab, rjmccall Reviewed By: rjmccall Subscribers: chandlerc, ldionne, EricWF, mclow.lists, cfe-commits, bkramer Tags: #sanitizers Differential Revision: https://reviews.llvm.org/D54589 llvm-svn: 351105	2019-01-14 19:09:27 +00:00
Dan Gohman	51532a524e	[WebAssembly] Remove old builtins This removes the old grow_memory and mem.grow-style builtins, leaving just the memory.grow-style builtins. Differential Revision: https://reviews.llvm.org/D56645 llvm-svn: 351089	2019-01-14 18:28:10 +00:00
Craig Topper	49488407aa	[X86] Remove mask parameter from avx512 pmultishiftqb intrinsics. Use select in IR instead. Fixes PR40259 llvm-svn: 351036	2019-01-14 08:46:51 +00:00
Craig Topper	689b3b71af	[X86] Remove mask parameter from vpshufbitqmb intrinsics. Change result to a vXi1 vector. We'll do the scalar<->vXi1 conversions with bitcasts in IR. Fixes PR40258 llvm-svn: 351029	2019-01-14 00:03:55 +00:00
Craig Topper	cd9e232a4d	Recommit r350555 "[X86] Use funnel shift intrinsics for the VBMI2 vshld/vshrd builtins." The MSVC limit hit in AutoUpgrade.cpp has been worked around for now. llvm-svn: 350568	2019-01-07 21:00:41 +00:00
Craig Topper	33c9088783	Revert r350555 "[X86] Use funnel shift intrinsics for the VBMI2 vshld/vshrd builtins." Had to revert the LLVM patch this depends on to fix a MSVC compiler limit in AutoUpgrade.cpp llvm-svn: 350563	2019-01-07 19:39:25 +00:00
Craig Topper	e34f2bb807	[X86] Use funnel shift intrinsics for the VBMI2 vshld/vshrd builtins. Differential Revision: https://reviews.llvm.org/D56365 llvm-svn: 350555	2019-01-07 19:10:22 +00:00
Haibo Huang	303b2333e4	Declares __cpu_model as dso local __builtin_cpu_supports and __builtin_cpu_is use information in __cpu_model to decide cpu features. Before this change, __cpu_model was not declared as dso local. The generated code looks up the address in GOT when reading __cpu_model. This makes it impossible to use these functions in ifunc, because at that time GOT entries have not been relocated. This change makes it dso local. Differential Revision: https://reviews.llvm.org/D53850 llvm-svn: 349825	2018-12-20 21:33:59 +00:00
Simon Pilgrim	4597379227	[X86] Auto upgrade XOP/AVX512 rotation intrinsics to generic funnel shift intrinsics (clang) This emits FSHL/FSHR generic intrinsics for the XOP VPROT and AVX512 VPROL/VPROR rotation intrinsics. LLVM counterpart: https://reviews.llvm.org/D55938 Differential Revision: https://reviews.llvm.org/D55937 llvm-svn: 349796	2018-12-20 19:01:13 +00:00
Simon Pilgrim	313dc85ce0	[X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (clang) This emits SADD_SAT/SSUB_SAT generic intrinsics for the SSE signed saturated math intrinsics. LLVM counterpart: https://reviews.llvm.org/D55894 Differential Revision: https://reviews.llvm.org/D55890 llvm-svn: 349743	2018-12-20 11:53:45 +00:00
Simon Pilgrim	a7b30b4a58	[X86][SSE] Auto upgrade PADDUS/PSUBUS intrinsics to UADD_SAT/USUB_SAT generic intrinsics (clang) Sibling patch to D55855, this emits UADD_SAT/USUB_SAT generic intrinsics for the SSE saturated math intrinsics instead of expanding to a IR code sequence that could be difficult to reassemble. Differential Revision: https://reviews.llvm.org/D55879 llvm-svn: 349631	2018-12-19 14:43:47 +00:00
Vedant Kumar	77dfca88b2	[CodeGen] Handle mixed-width ops in mixed-sign mul-with-overflow lowering The special lowering for __builtin_mul_overflow introduced in r320902 fixed an ICE seen when passing mixed-sign operands to the builtin. This patch extends the special lowering to cover mixed-width, mixed-sign operands. In a few common scenarios, calls to muloti4 will no longer be emitted. This should address the latest comments in PR34920 and work around the link failure seen in: https://bugzilla.redhat.com/show_bug.cgi?id=1657544 Testing: - check-clang - A/B output comparison with: https://gist.github.com/vedantk/3eb9c88f82e5c32f2e590555b4af5081 Differential Revision: https://reviews.llvm.org/D55843 llvm-svn: 349542	2018-12-18 21:05:03 +00:00
Eric Fiselier	261875054e	[Clang] Add __builtin_launder Summary: This patch adds `__builtin_launder`, which is required to implement `std::launder`. Additionally GCC provides `__builtin_launder`, so thing brings Clang in-line with GCC. I'm not exactly sure what magic `__builtin_launder` requires, but based on previous discussions this patch applies a `@llvm.invariant.group.barrier`. As noted in previous discussions, this may not be enough to correctly handle vtables. Reviewers: rnk, majnemer, rsmith Reviewed By: rsmith Subscribers: kristina, Romain-Geissler-1A, erichkeane, amharc, jroelofs, cfe-commits, Prazek Differential Revision: https://reviews.llvm.org/D40218 llvm-svn: 349195	2018-12-14 21:11:28 +00:00
Craig Topper	1f2b181689	[Builltins][X86] Provide implementations of __lzcnt16, __lzcnt, __lzcnt64 for MS compatibility. Remove declarations from intrin.h and implementations from lzcntintrin.h intrin.h had forward declarations for these and lzcntintrin.h had implementations that were only available with -mlzcnt or a -march that supported the lzcnt feature. For MS compatibility we should always have these builtins available regardless of X86 being the target or the CPU support the lzcnt instruction. The backends should be able to gracefully fallback to something support even if its just shifts and bit ops. Unfortunately, gcc also implements 2 of the 3 function names here on X86 when lzcnt feature is enabled. This patch adds builtins for these for MSVC compatibility and drops the forward declarations from intrin.h. To keep the gcc compatibility the two intrinsics that collided have been turned into macros that use the X86 specific builtins with the lzcnt feature check. These macros are only defined when _MSC_VER is not defined. Without them being macros we can get a redefinition error because -ms-extensions doesn't seem to set _MSC_VER but does make the MS builtins available. Should fix PR40014 Differential Revision: https://reviews.llvm.org/D55677 llvm-svn: 349098	2018-12-14 00:21:02 +00:00
Haibo Huang	e177082972	Revert "Declares __cpu_model as dso local" This reverts r348978 llvm-svn: 348982	2018-12-12 22:39:51 +00:00
Haibo Huang	6b22f59207	Declares __cpu_model as dso local __builtin_cpu_supports and __builtin_cpu_is use information in __cpu_model to decide cpu features. Before this change, __cpu_model was not declared as dso local. The generated code looks up the address in GOT when reading __cpu_model. This makes it impossible to use these functions in ifunc, because at that time GOT entries have not been relocated. This change makes it dso local. Differential Revision: https://reviews.llvm.org/D53850 llvm-svn: 348978	2018-12-12 22:04:12 +00:00
Raphael Isemann	b23ccecbb0	Misc typos fixes in ./lib folder Summary: Found via `codespell -q 3 -I ../clang-whitelist.txt -L uint,importd,crasher,gonna,cant,ue,ons,orign,ned` Reviewers: teemperor Reviewed By: teemperor Subscribers: teemperor, jholewinski, jvesely, nhaehnle, whisperity, jfb, cfe-commits Differential Revision: https://reviews.llvm.org/D55475 llvm-svn: 348755	2018-12-10 12:37:46 +00:00
Craig Topper	6d7a7ef9eb	[X86] Remove the addcarry builtins. Leaving only the addcarryx builtins since that matches gcc. The addcarry and addcarryx builtins do the same thing. The only difference is that addcarryx previously required adx feature. This commit removes the adx feature check from addcarryx and removes the addcarry builtin. This matches the builtins that gcc has. We don't guarantee compatibility in builtins, but we generally try to be consistent if its not a burden. llvm-svn: 348738	2018-12-10 06:07:59 +00:00
Fangrui Song	407659ab0a	Revert "Revert r347417 "Re-Reinstate 347294 with a fix for the failures."" It seems the two failing tests can be simply fixed after r348037 Fix 3 cases in Analysis/builtin-functions.cpp Delete the bad CodeGen/builtin-constant-p.c for now llvm-svn: 348053	2018-11-30 23:41:18 +00:00
Fangrui Song	f5d3335d75	Revert r347417 "Re-Reinstate 347294 with a fix for the failures." Kept the "indirect_builtin_constant_p" test case in test/SemaCXX/constant-expression-cxx1y.cpp while we are investigating why the following snippet fails: extern char extern_var; struct { int a; } a = {__builtin_constant_p(extern_var)}; llvm-svn: 348039	2018-11-30 21:26:09 +00:00
Hans Wennborg	48ee4ad325	Re-commit r347417 "Re-Reinstate 347294 with a fix for the failures." This was reverted in r347656 due to me thinking it caused a miscompile of Chromium. Turns out it was the Chromium code that was broken. llvm-svn: 347756	2018-11-28 14:04:12 +00:00
Hans Wennborg	8c79706e89	Revert r347417 "Re-Reinstate 347294 with a fix for the failures." This caused a miscompile in Chrome (see crbug.com/908372) that's illustrated by this small reduction: static bool f(int a, int b) { return !__builtin_constant_p(b - a) \|\| (!(b - a)); } int arr[] = {1,2,3}; bool g() { return f(arr, arr + 3); } $ clang -O2 -S -emit-llvm a.cc -o - g() should return true, but after r347417 it became false for some reason. This also reverts the follow-up commits. r347417: > Re-Reinstate 347294 with a fix for the failures. > > Don't try to emit a scalar expression for a non-scalar argument to > __builtin_constant_p(). > > Third time's a charm! r347446: > The result of is.constant() is unsigned. r347480: > A __builtin_constant_p() returns 0 with a function type. r347512: > isEvaluatable() implies a constant context. > > Assume that we're in a constant context if we're asking if the expression can > be compiled into a constant initializer. This fixes the issue where a > __builtin_constant_p() in a compound literal was diagnosed as not being > constant, even though it's always possible to convert the builtin into a > constant. r347531: > A "constexpr" is evaluated in a constant context. Make sure this is reflected > if a __builtin_constant_p() is a part of a constexpr. llvm-svn: 347656	2018-11-27 14:01:40 +00:00
Sanjay Patel	c6fa5bc7c7	[CodeGen] translate MS rotate builtins to LLVM funnel-shift intrinsics This was originally part of: D50924 and should resolve PR37387: https://bugs.llvm.org/show_bug.cgi?id=37387 ...but it was reverted because some bots using a gcc host compiler would crash for unknown reasons with this included in the patch. Trying again now to see if that's still a problem. llvm-svn: 347527	2018-11-25 17:53:16 +00:00
Bill Wendling	46acc72cf4	A __builtin_constant_p() returns 0 with a function type. llvm-svn: 347480	2018-11-22 22:58:06 +00:00
Bill Wendling	2a6c59ea2a	The result of is.constant() is unsigned. llvm-svn: 347446	2018-11-22 09:31:08 +00:00
Bill Wendling	6ff1751f7d	Re-Reinstate 347294 with a fix for the failures. Don't try to emit a scalar expression for a non-scalar argument to __builtin_constant_p(). Third time's a charm! llvm-svn: 347417	2018-11-21 20:44:18 +00:00
Nico Weber	9f0246d473	Revert r347364 again, the fix was incomplete. llvm-svn: 347389	2018-11-21 12:47:43 +00:00
Bill Wendling	91549ed15f	Reinstate 347294 with a fix for the failures. EvaluateAsInt() is sometimes called in a constant context. When that's the case, we need to specify it as so. llvm-svn: 347364	2018-11-20 23:24:16 +00:00
Nico Weber	6438972553	Revert 347294, it turned many bots on lab.llvm.org:8011/console red. llvm-svn: 347314	2018-11-20 15:27:43 +00:00
Bill Wendling	107b0e9881	Use is.constant intrinsic for __builtin_constant_p Summary: A __builtin_constant_p may end up with a constant after inlining. Use the is.constant intrinsic if it's a variable that's in a context where it may resolve to a constant, e.g., an argument to a function after inlining. Reviewers: rsmith, shafik Subscribers: jfb, kristina, cfe-commits, nickdesaulniers, jyknight Differential Revision: https://reviews.llvm.org/D54355 llvm-svn: 347294	2018-11-20 08:53:30 +00:00
Alexey Sotkin	692f12b389	[OpenCL] Fix invalid address space generation for clk_event_t Summary: Addrspace(32) was generated when putting 0 in clk_event_t * event_ret parameter for enqueue_kernel function. Patch by Viktoria Maksimova Reviewers: Anastasia, yaxunl, AlexeySotkin Reviewed By: Anastasia, AlexeySotkin Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D53809 llvm-svn: 346838	2018-11-14 09:40:05 +00:00
Erich Keane	de6480a38c	[NFC] Move storage of dispatch-version to GlobalDecl As suggested by Richard Smith, and initially put up for review here: https://reviews.llvm.org/D53341, this patch removes a hack that was used to ensure that proper target-feature lists were used when emitting cpu-dispatch (and eventually, target-clones) implementations. As a part of this, the GlobalDecl object is proliferated to a bunch more locations. Originally, this was put up for review (see above) to get acceptance on the approach, though discussion with Richard in San Diego showed he approved of the approach taken here. Thus, I believe this is acceptable for Review-After-commit Differential Revision: https://reviews.llvm.org/D53341 Change-Id: I0a0bd673340d334d93feac789d653e03d9f6b1d5 llvm-svn: 346757	2018-11-13 15:48:08 +00:00
Jonas Devlieghere	64a2630825	Pass the function type instead of the return type to FunctionDecl::Create Fix places where the return type of a FunctionDecl was being used in place of the function type FunctionDecl::Create() takes as its T parameter the type of function that should be created, not the return type. Passing in the return type looks to have been copypasta'd around a bit, but the number of correct usages outweighs the incorrect ones so I've opted for keeping what T is the same and fixing up the call sites instead. This fixes a crash in Clang when attempting to compile the following snippet of code with -fblocks -fsanitize=function -x objective-c++ (my original repro case): void g(void(^)()); void f() { __block int a = 0; g(^(){ a++; }); } as well as the following which only requires -fsanitize=function -x c++: void f(char * buf) { __builtin_os_log_format(buf, ""); } Patch by: Ben (bobsayshilol) Differential revision: https://reviews.llvm.org/D53263 llvm-svn: 346601	2018-11-11 00:56:15 +00:00
Kadir Cetinkaya	b1501462e2	T was unused on assertion disabled builds. llvm-svn: 346216	2018-11-06 08:59:25 +00:00
Akira Hatanaka	908aabb783	Cast to uint64_t instead of to unsigned. This is a follow-up to r346211. llvm-svn: 346212	2018-11-06 07:12:28 +00:00
Akira Hatanaka	d572cf496d	os_log: Allow specifying mask type in format string. A mask type is a 1 to 8-byte string that follows the "mask." annotation in the format string. This enables obfuscating data in the event the provided privacy level isn't enabled. rdar://problem/36756282 llvm-svn: 346211	2018-11-06 07:05:14 +00:00
Mandeep Singh Grang	574caddc0d	[COFF, ARM64] Implement InterlockedDecrement_ builtins This is eight in a series of patches to move intrinsic definitions out of intrin.h. Differential: https://reviews.llvm.org/D54068 llvm-svn: 346208	2018-11-06 05:07:43 +00:00
Mandeep Singh Grang	fdf74d9751	[COFF, ARM64] Implement InterlockedIncrement_ builtins This is seventh in a series of patches to move intrinsic definitions out of intrin.h. Differential: https://reviews.llvm.org/D54067 llvm-svn: 346207	2018-11-06 05:05:32 +00:00
Mandeep Singh Grang	c89157b5c1	[COFF, ARM64] Implement InterlockedAnd_ builtins This is sixth in a series of patches to move intrinsic definitions out of intrin.h. Differential: https://reviews.llvm.org/D54066 llvm-svn: 346206	2018-11-06 05:03:13 +00:00
Mandeep Singh Grang	806f10701b	[COFF, ARM64] Implement InterlockedXor_ builtins This is fifth in a series of patches to move intrinsic definitions out of intrin.h. Note: This was reviewed and approved in D54065 but somehow that diff was messed up. Committing this again with the proper diff. llvm-svn: 346205	2018-11-06 04:55:20 +00:00
Mandeep Singh Grang	d9f70b1495	Revert "[COFF, ARM64] Implement InterlockedXor_ builtins" This reverts commit cc3d3cd0fbeb88412d332354c261ff139c4ede6b. llvm-svn: 346192	2018-11-06 01:14:24 +00:00
Mandeep Singh Grang	d8a4455d97	[COFF, ARM64] Implement InterlockedXor_ builtins Summary: This is fifth in a series of patches to move intrinsic definitions out of intrin.h. Reviewers: rnk, efriedma, mstorsjo, TomTan Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, chrib, jfb, kristina, cfe-commits Differential Revision: https://reviews.llvm.org/D54065 llvm-svn: 346191	2018-11-06 01:12:29 +00:00
Mandeep Singh Grang	ec62b31e2c	[COFF, ARM64] Implement InterlockedOr_ builtins This is fourth in a series of patches to move intrinsic definitions out of intrin.h. llvm-svn: 346190	2018-11-06 01:11:25 +00:00
Mandeep Singh Grang	6b880689f0	[COFF, ARM64] Implement InterlockedCompareExchange_ builtins Summary: This is third in a series of patches to move intrinsic definitions out of intrin.h. Reviewers: rnk, efriedma, mstorsjo, TomTan Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, chrib, jfb, kristina, cfe-commits Differential Revision: https://reviews.llvm.org/D54062 llvm-svn: 346189	2018-11-06 00:36:48 +00:00
Mandeep Singh Grang	7fa07e554d	[COFF, ARM64] Implement InterlockedExchange_ builtins Summary: Windows SDK needs these intrinsics to be proper builtins. This is second in a series of patches to move intrinsic defintions out of intrin.h. Reviewers: rnk, mstorsjo, efriedma, TomTan Reviewed By: rnk, efriedma Subscribers: javed.absar, kristof.beyls, chrib, jfb, kristina, cfe-commits Differential Revision: https://reviews.llvm.org/D54046 llvm-svn: 346044	2018-11-02 21:18:23 +00:00
Mandeep Singh Grang	6cef4e5c87	[COFF, ARM64] Change setjmp for AArch64 Windows to use Intrinsic.sponentry Summary: ARM64 setjmp expects sp on entry instead of framepointer. Patch by: Yin Ma (yinma@codeaurora.org) Reviewers: mgrang, eli.friedman, ssijaric, mstorsjo, rnk, compnerd Reviewed By: mgrang Subscribers: efriedma, javed.absar, kristof.beyls, chrib, cfe-commits Differential Revision: https://reviews.llvm.org/D53998 llvm-svn: 346024	2018-11-02 18:10:07 +00:00
Tim Northover	314fbfa1c4	Reapply Logging: make os_log buffer size an integer constant expression. The size of an os_log buffer is known at any stage of compilation, so making it a constant expression means that the common idiom of declaring a buffer for it won't result in a VLA. That allows the compiler to skip saving and restoring the stack pointer around such buffers. This also moves the OSLog and other FormatString helpers from libclangAnalysis to libclangAST to avoid a circular dependency. llvm-svn: 345971	2018-11-02 13:14:11 +00:00
Reid Kleckner	4dc0b1ac60	Fix clang -Wimplicit-fallthrough warnings across llvm, NFC This patch should not introduce any behavior changes. It consists of mostly one of two changes: 1. Replacing fall through comments with the LLVM_FALLTHROUGH macro 2. Inserting 'break' before falling through into a case block consisting of only 'break'. We were already using this warning with GCC, but its warning behaves slightly differently. In this patch, the following differences are relevant: 1. GCC recognizes comments that say "fall through" as annotations, clang doesn't 2. GCC doesn't warn on "case N: foo(); default: break;", clang does 3. GCC doesn't warn when the case contains a switch, but falls through the outer case. I will enable the warning separately in a follow-up patch so that it can be cleanly reverted if necessary. Reviewers: alexfh, rsmith, lattner, rtrieu, EricWF, bollu Differential Revision: https://reviews.llvm.org/D53950 llvm-svn: 345882	2018-11-01 19:54:45 +00:00
Mandeep Singh Grang	5c39b6ab7f	Revert "[COFF, ARM64] Change setjmp for AArch64 Windows to use Intrinsic.sponentry" This reverts commit 619111f5ccf349b635e4987ec02d15777c571495. llvm-svn: 345872	2018-11-01 18:38:26 +00:00
Tim Northover	eedc0f0f1a	Revert "Reapply Logging: make os_log buffer size an integer constant expression." Still more dependency hell. llvm-svn: 345871	2018-11-01 18:37:42 +00:00
Tim Northover	c1ac697ab7	Reapply Logging: make os_log buffer size an integer constant expression. The size of an os_log buffer is known at any stage of compilation, so making it a constant expression means that the common idiom of declaring a buffer for it won't result in a VLA. That allows the compiler to skip saving and restoring the stack pointer around such buffers. This also moves the OSLog helpers from libclangAnalysis to libclangAST to avoid a circular dependency. llvm-svn: 345866	2018-11-01 18:04:49 +00:00
Tim Northover	d686dbbc7c	Revert "Logging: make os_log buffer size an integer constant expression. This also reverts a couple of follow-up commits trying to fix the dependency issues. Latest revision added a cyclic dependency that can't just be patched up in 5 minutes. llvm-svn: 345846	2018-11-01 16:15:24 +00:00
Tim Northover	a94ecc619b	Logging: make os_log buffer size an integer constant expression. The size of an os_log buffer is known at any stage of compilation, so making it a constant expression means that the common idiom of declaring a buffer for it won't result in a VLA. That allows the compiler to skip saving and restoring the stack pointer around such buffers. llvm-svn: 345828	2018-11-01 13:49:54 +00:00
Mandeep Singh Grang	be0e78e017	[COFF, ARM64] Implement llvm.addressofreturnaddress intrinsic llvm-svn: 345808	2018-11-01 01:35:34 +00:00
Thomas Lively	6940328d02	[WebAssembly] Fix type names in truncation builtins Summary: Use the same convention as all the other WebAssembly builtin names. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, kristina, cfe-commits Differential Revision: https://reviews.llvm.org/D53724 llvm-svn: 345804	2018-11-01 01:03:17 +00:00
Mandeep Singh Grang	e7c7934a11	[COFF, ARM64] Change setjmp for AArch64 Windows to use Intrinsic.sponentry Summary: ARM64 setjmp expects sp on entry instead of framepointer. Reviewers: mgrang, rnk, TomTan, compnerd, mstorsjo, efriedma Reviewed By: mstorsjo Subscribers: javed.absar, kristof.beyls, chrib, cfe-commits Differential Revision: https://reviews.llvm.org/D53684 llvm-svn: 345792	2018-10-31 23:17:36 +00:00
Eli Friedman	b262d1631e	[ARM64] [Windows] Implement _InterlockedExchangeAdd_ builtins. These apparently need to be proper builtins to handle the Windows SDK. Differential Revision: https://reviews.llvm.org/D53916 llvm-svn: 345779	2018-10-31 21:31:09 +00:00
Bryan Chan	223307b3dc	[AArch64] Implement FP16FML intrinsics Generate the FP16FML intrinsics into arm_neon.h (AArch64 only for now). Add two new type modifiers to NeonEmitter to handle the new prototypes. Define __ARM_FEATURE_FP16FML when +fp16fml is enabled and guard the intrinsics with the macro in arm_neon.h. Based on a patch by Gao Yiling. Differential Revision: https://reviews.llvm.org/D53633 llvm-svn: 345344	2018-10-25 23:47:00 +00:00
Thomas Lively	d4bf99a540	[WebAssembly] Bitselect and min/max builtins Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits Differential Revision: https://reviews.llvm.org/D53685 llvm-svn: 345301	2018-10-25 19:11:41 +00:00
Thomas Lively	535b4df75a	[WebAssembly] Lower to target-independent saturating add Summary: Goes along with D53721. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits Differential Revision: https://reviews.llvm.org/D53722 llvm-svn: 345300	2018-10-25 19:06:15 +00:00
Craig Topper	4d8ced1807	[X86] Add support for more than 32 features for __builtin_cpu_is libgcc supports more than 32 features by adding a new 32-bit variable __cpu_features2. This adds the clang support for checking these feature bits. Patches for compiler-rt and llvm to support this are coming as well. Probably still need an additional patch for target multiversioning in clang. Differential Revision: https://reviews.llvm.org/D53458 llvm-svn: 344832	2018-10-20 03:51:52 +00:00
Craig Topper	9c8f3c9654	[X86] When checking the bits in cpu_features for function multiversioning dispatcher in the resolver, make sure all the required bits are set. Not just one of them Summary: The multiversioning code repurposed the code from __builtin_cpu_supports for checking if a single feature is enabled. That code essentially performed (_cpu_features & (1 << C)) != 0. But with the multiversioning path, the mask is no longer guaranteed to be a power of 2. So we return true anytime any one of the bits in the mask is set not just all of the bits. The correct check is (_cpu_features & mask) == mask Reviewers: erichkeane, echristo Reviewed By: echristo Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D53460 llvm-svn: 344824	2018-10-20 01:30:00 +00:00
Mandeep Singh Grang	2147b1af95	[COFF, ARM64] Add _ReadStatusReg and_WriteStatusReg intrinsics Reviewers: rnk, compnerd, mstorsjo, efriedma, TomTan, haripul, javed.absar Reviewed By: efriedma Subscribers: dmajor, kristof.beyls, chrib, cfe-commits Differential Revision: https://reviews.llvm.org/D53115 llvm-svn: 344765	2018-10-18 23:35:35 +00:00
Yaxun Liu	aae1e87f4b	AMDGPU: add __builtin_amdgcn_update_dpp Emit llvm.amdgcn.update.dpp for both __builtin_amdgcn_mov_dpp and __builtin_amdgcn_update_dpp. The first argument to llvm.amdgcn.update.dpp will be undef for __builtin_amdgcn_mov_dpp. Differential Revision: https://reviews.llvm.org/D52320 llvm-svn: 344665	2018-10-17 02:32:26 +00:00
Thomas Lively	07ce6df879	[WebAssembly] Saturating float-to-int builtins Summary: Depends on D53007 and D53004. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, kristina, cfe-commits Differential Revision: https://reviews.llvm.org/D53009 llvm-svn: 344205	2018-10-11 00:07:55 +00:00
Mandeep Singh Grang	df7929676d	[COFF, ARM64] Add _InterlockedAdd intrinsic Reviewers: rnk, mstorsjo, compnerd, TomTan, haripul, javed.absar, efriedma Reviewed By: efriedma Subscribers: efriedma, kristof.beyls, chrib, jfb, cfe-commits Differential Revision: https://reviews.llvm.org/D52811 llvm-svn: 343894	2018-10-05 21:57:41 +00:00
Mandeep Singh Grang	15e0f7fa28	[COFF, ARM64] Add _InterlockedCompareExchangePointer_nf intrinsic Reviewers: rnk, mstorsjo, compnerd, TomTan, haripul, efriedma Reviewed By: efriedma Subscribers: efriedma, kristof.beyls, chrib, jfb, cfe-commits Differential Revision: https://reviews.llvm.org/D52807 llvm-svn: 343881	2018-10-05 19:49:36 +00:00
Thomas Lively	d2a293c562	[WebAssembly] abs and sqrt builtins Summary: Depends on D52910. Reviewers: aheejin, dschuff, craig.topper Subscribers: sbc100, jgravelle-google, sunfish, kristina, cfe-commits Differential Revision: https://reviews.llvm.org/D52913 llvm-svn: 343838	2018-10-05 01:02:54 +00:00
Thomas Lively	291d75b0de	[WebAssembly] any_true and all_true builtins Summary: Depends on D52858. Reviewers: aheejin, dschuff, craig.topper Subscribers: sbc100, jgravelle-google, sunfish, kristina, cfe-commits Differential Revision: https://reviews.llvm.org/D52910 llvm-svn: 343837	2018-10-05 00:59:37 +00:00
Thomas Lively	9034a47e79	[WebAssembly] saturating arithmetic builtins Summary: Depends on D52856. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, kristina, cfe-commits Differential Revision: https://reviews.llvm.org/D52858 llvm-svn: 343836	2018-10-05 00:58:56 +00:00
Thomas Lively	a347436f09	[WebAssembly] __builtin_wasm_replace_lane_* builtins Summary: Depends on D52852. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, kristina, cfe-commits Differential Revision: https://reviews.llvm.org/D52856 llvm-svn: 343835	2018-10-05 00:58:07 +00:00
Thomas Lively	d6792c0c28	[WebAssembly] __builtin_wasm_extract_lane_* builtins Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, kristina, cfe-commits Differential Revision: https://reviews.llvm.org/D52852 llvm-svn: 343834	2018-10-05 00:54:44 +00:00
Mandeep Singh Grang	ecc82ef0c2	[COFF, ARM64] Add __getReg intrinsic Reviewers: rnk, mstorsjo, compnerd, TomTan, haripul, javed.absar, efriedma Reviewed By: efriedma Subscribers: peter.smith, efriedma, kristof.beyls, chrib, cfe-commits Differential Revision: https://reviews.llvm.org/D52838 llvm-svn: 343824	2018-10-04 22:32:42 +00:00
Mandeep Singh Grang	aef87980a9	[COFF, ARM64] Add _ReadWriteBarrier intrinsic Reviewers: rnk, mstorsjo, compnerd, TomTan, haripul, javed.absar Reviewed By: rnk Subscribers: kristof.beyls, chrib, jfb, cfe-commits Differential Revision: https://reviews.llvm.org/D52809 llvm-svn: 343699	2018-10-03 17:24:21 +00:00
Craig Topper	fb5d9f2849	[X86] For lzcnt/tzcnt intrinsics use cttz/ctlz intrinsics with zero_undef flag set to false. Previously we used a select and the zero_undef=true intrinsic. In -O2 this pattern will get optimized to zero_undef=false. But in -O0 this optimization won't happen. This results in a compare and cmov being wrapped around a tzcnt/lzcnt instruction. By using the zero_undef=false intrinsic directly without the select, we can improve the -O0 codegen to just an lzcnt/tzcnt instruction. Differential Revision: https://reviews.llvm.org/D52392 llvm-svn: 343126	2018-09-26 17:01:44 +00:00
QingShan Zhang	accb65b994	[PowerPC] [Clang] Add vector int128 pack/unpack builtins unsigned long long builtin_unpack_vector_int128 (vector int128_t, int); vector int128_t builtin_pack_vector_int128 (unsigned long long, unsigned long long); Builtins should behave the same way as in GCC. Patch By: wuzish (Zixuan Wu) Differential Revision: https://reviews.llvm.org/D52074 llvm-svn: 342614	2018-09-20 05:04:57 +00:00
Craig Topper	ecf2e2fe31	[X86] Custom emit __builtin_rdtscp so we can emit an explicit store for the out parameter This is the clang side of D51803. The llvm intrinsic now returns two results. So we need to emit an explicit store in IR for the out parameter. This is similar to addcarry/subborrow/rdrand/rdseed. Differential Revision: https://reviews.llvm.org/D51805 llvm-svn: 341699	2018-09-07 19:14:24 +00:00
Craig Topper	52a61fc2ac	[X86] Modify addcarry/subborrow builtins to emit an 2 result and intrinsic and an store instruction. This is the clang side of D51769. The llvm intrinsics now return two results instead of using an out parameter. Differential Revision: https://reviews.llvm.org/D51771 llvm-svn: 341678	2018-09-07 16:58:57 +00:00
Craig Topper	d88f76a891	[X86] Add ktest intrinsics to match gcc and icc. These aren't documented in the Intel Intrinsics Guide, but are supported by gcc and icc. Includes these intrinsics: _ktestc_mask8_u8, _ktestz_mask8_u8, _ktest_mask8_u8 _ktestc_mask16_u8, _ktestz_mask16_u8, _ktest_mask16_u8 _ktestc_mask32_u8, _ktestz_mask32_u8, _ktest_mask32_u8 _ktestc_mask64_u8, _ktestz_mask64_u8, _ktest_mask64_u8 llvm-svn: 341265	2018-08-31 22:29:56 +00:00
Craig Topper	42a4d0822e	[X86] Add k-mask conversion and load/store instrinsics to match gcc and icc. This adds: _cvtmask8_u32, _cvtmask16_u32, _cvtmask32_u32, _cvtmask64_u64 _cvtu32_mask8, _cvtu32_mask16, _cvtu32_mask32, _cvtu64_mask64 _load_mask8, _load_mask16, _load_mask32, _load_mask64 _store_mask8, _store_mask16, _store_mask32, _store_mask64 These are currently missing from the Intel Intrinsics Guide webpage. llvm-svn: 341251	2018-08-31 20:41:06 +00:00
Craig Topper	2aa8efc820	[X86] Add kshift intrinsics to match gcc and icc. This adds the following intrinsics: _kshiftli_mask8 _kshiftli_mask16 _kshiftli_mask32 _kshiftli_mask64 _kshiftri_mask8 _kshiftri_mask16 _kshiftri_mask32 _kshiftri_mask64 llvm-svn: 341234	2018-08-31 18:22:52 +00:00
Craig Topper	a65bf65e0b	[X86] Add kadd intrinsics to match gcc and icc. This adds the following intrinsics: _kadd_mask64 _kadd_mask32 _kadd_mask16 _kadd_mask8 These are missing from the Intel Intrinsics Guide, but are implemented by both gcc and icc. llvm-svn: 340879	2018-08-28 22:32:14 +00:00
Craig Topper	cb5fd56c7f	[X86] Add kortest intrinsics for 8, 32, and 64 bit masks. Add new intrinsic names for 16 bit masks. This matches gcc and icc despite not being documented in the Intel Intrinsics Guide. llvm-svn: 340798	2018-08-28 06:28:25 +00:00
Craig Topper	c330ca8611	[X86] Add intrinsics for kand/kandn/knot/kor/kxnor/kxor with 8, 32, and 64-bit mask registers. This also adds a second intrinsic name for the 16-bit mask versions. These intrinsics match gcc and icc. They just aren't published in the Intel Intrinsics Guide so I only recently found they existed. llvm-svn: 340719	2018-08-27 06:20:22 +00:00
Nico Weber	14a577bfd1	Eliminate instances of `EmitScalarExpr(E->getArg(n))` in EmitX86BuiltinExpr(). EmitX86BuiltinExpr() emits all args into Ops at the beginning, so don't do that work again. This changes behavior: If e.g. ++a was passed as an arg, we incremented a twice previously. This change fixes that bug. https://reviews.llvm.org/D50979 llvm-svn: 340348	2018-08-21 22:19:55 +00:00
Sanjay Patel	ad82390d3f	[CodeGen] add rotate builtins that map to LLVM funnel shift This is a partial retry of rL340137 (reverted at rL340138 because of gcc host compiler crashing) with 1 change: Remove the changes to make microsoft builtins also use the LLVM intrinsics. This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang (when both halves of a funnel shift are the same value, it's a rotate). We're free to name these as we want because we're not copying gcc, but if there's some other existing art (eg, the microsoft ops) that we want to replicate, we can change the names. The funnel shift intrinsics were added here: https://reviews.llvm.org/D49242 With improved codegen in: https://reviews.llvm.org/rL337966 https://reviews.llvm.org/rL339359 And basic IR optimization added in: https://reviews.llvm.org/rL338218 https://reviews.llvm.org/rL340022 ...so these are expected to produce asm output that's equal or better to the multi-instruction alternatives using primitive C/IR ops. In the motivating loop example from PR37387: https://bugs.llvm.org/show_bug.cgi?id=37387#c7 ...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source. Differential Revision: https://reviews.llvm.org/D50924 llvm-svn: 340141	2018-08-19 16:50:30 +00:00
Sanjay Patel	a09ae4b8a6	revert r340137: [CodeGen] add rotate builtins At least a couple of bots (gcc host compiler on PPC only?) are showing the compiler dying while trying to compile. llvm-svn: 340138	2018-08-19 15:31:42 +00:00
Sanjay Patel	446529b0d9	[CodeGen] add/fix rotate builtins that map to LLVM funnel shift (retry) This is a retry of rL340135 (reverted at rL340136 because of gcc host compiler crashing) with 2 changes: 1. Move the code into a helper to reduce code duplication (and hopefully work-around the crash). 2. The original commit had a formatting bug in the docs (missing an underscore). Original commit message: This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang (when both halves of a funnel shift are the same value, it's a rotate). We're free to name these as we want because we're not copying gcc, but if there's some other existing art (eg, the microsoft ops that are modified in this patch) that we want to replicate, we can change the names. The funnel shift intrinsics were added here: https://reviews.llvm.org/D49242 With improved codegen in: https://reviews.llvm.org/rL337966 https://reviews.llvm.org/rL339359 And basic IR optimization added in: https://reviews.llvm.org/rL338218 https://reviews.llvm.org/rL340022 ...so these are expected to produce asm output that's equal or better to the multi-instruction alternatives using primitive C/IR ops. In the motivating loop example from PR37387: https://bugs.llvm.org/show_bug.cgi?id=37387#c7 ...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source. Differential Revision: https://reviews.llvm.org/D50924 llvm-svn: 340137	2018-08-19 14:44:47 +00:00
Sanjay Patel	39b4dd2da7	revert r340135: [CodeGen] add rotate builtins At least a couple of bots (PPC only?) are showing the compiler dying while trying to compile: http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/11065/steps/build%20stage%201/logs/stdio http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/18267/steps/build%20stage%201/logs/stdio llvm-svn: 340136	2018-08-19 13:48:06 +00:00
Sanjay Patel	9116f0438c	[CodeGen] add rotate builtins This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang (when both halves of a funnel shift are the same value, it's a rotate). We're free to name these as we want because we're not copying gcc, but if there's some other existing art (eg, the microsoft ops that are modified in this patch) that we want to replicate, we can change the names. The funnel shift intrinsics were added here: D49242 With improved codegen in: rL337966 rL339359 And basic IR optimization added in: rL338218 rL340022 ...so these are expected to produce asm output that's equal or better to the multi-instruction alternatives using primitive C/IR ops. In the motivating loop example from PR37387: https://bugs.llvm.org/show_bug.cgi?id=37387#c7 ...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source. Differential Revision: https://reviews.llvm.org/D50924 llvm-svn: 340135	2018-08-19 13:12:40 +00:00
Nico Weber	b2c53d3393	Make __shiftleft128 / __shiftright128 real compiler built-ins. r337619 added __shiftleft128 / __shiftright128 as functions in intrin.h. Microsoft's STL plans on using these functions, and they're using intrin0.h which just has declarations of built-ins to not pull in the huge intrin.h header in the standard library headers. That requires that these functions are real built-ins. https://reviews.llvm.org/D50907 llvm-svn: 340048	2018-08-17 17:19:06 +00:00
Craig Topper	72a7606433	[X86] Remove masking from the 512-bit paddus/psubus builtins. Use a select builtin instead. llvm-svn: 339845	2018-08-16 07:28:06 +00:00
Tomasz Krupa	e8cf972d86	[X86] Lowering addus/subus intrinsics to native IR Summary: This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations. Reviewers: craig.topper, spatel, RKSimon Reviewed By: craig.topper Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D46892 llvm-svn: 339651	2018-08-14 08:01:38 +00:00
Stephen Kelly	f2ceec4811	Port getLocStart -> getBeginLoc Reviewers: teemperor! Subscribers: jholewinski, whisperity, jfb, cfe-commits Differential Revision: https://reviews.llvm.org/D50350 llvm-svn: 339385	2018-08-09 21:08:08 +00:00
Craig Topper	0a4f6be443	[Builtins] Implement __builtin_clrsb to be compatible with gcc gcc defines an intrinsic called __builtin_clrsb which counts the number of extra sign bits on a number. This is equivalent to counting the number of leading zeros on a positive number or the number of leading ones on a negative number and subtracting one from the result. Since we can't count leading ones we need to invert negative numbers to count zeros. This patch will cause the builtin to be expanded inline while gcc uses a call to a function like clrsbdi2 that is implemented in libgcc. But this is similar to what we already do for popcnt. And I don't think compiler-rt supports clrsbdi2. Differential Revision: https://reviews.llvm.org/D50168 llvm-svn: 339282	2018-08-08 19:55:52 +00:00
Scott Linder	f8b3df4dec	[OpenCL] Restore r338899 (reverted in r338904), fixing stack-use-after-return Always emit alloca in entry block for enqueue_kernel builtin. Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC later because it is not in the entry block. llvm-svn: 339150	2018-08-07 15:52:49 +00:00
Vlad Tsyrklevich	c7d3d34b98	Revert "[OpenCL] Always emit alloca in entry block for enqueue_kernel builtin" This reverts commit r338899, it was causing ASan test failures on sanitizer-x86_64-linux-fast. llvm-svn: 338904	2018-08-03 17:47:58 +00:00
Scott Linder	91f578467c	[OpenCL] Always emit alloca in entry block for enqueue_kernel builtin Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC later because it is not in the entry block. Differential Revision: https://reviews.llvm.org/D50104 llvm-svn: 338899	2018-08-03 15:50:52 +00:00
Heejin Ahn	00aa81b4df	[WebAssembly] Support for atomic.wait / atomic.wake builtins Summary: Add support for atomic.wait / atomic.wake builtins based on the Wasm thread proposal. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits Differential Revision: https://reviews.llvm.org/D49396 llvm-svn: 338771	2018-08-02 21:44:40 +00:00
Matt Arsenault	c65f966d76	Try to make builtin address space declarations not useless The way address space declarations for builtins currently work is nearly useless. The code assumes the address spaces used for builtins is a confusingly named "target address space" from user code using __attribute__((address_space(N))) that matches the builtin declaration. There's no way to use this to declare a builtin that returns a language specific address space. The terminology used is highly cofusing since it has nothing to do with the the address space selected by the target to use for a language address space. This feature is essentially unused as-is. AMDGPU and NVPTX are the only in-tree targets attempting to use this. The AMDGPU builtins certainly do not behave as intended (i.e. all of the builtins returning pointers can never compile because the numbered address space never matches the expected named address space). The NVPTX builtins are missing tests for some, and the others seem to rely on an implicit addrspacecast. Change the used address space for builtins based on a target hook to allow using a language address space for a builtin. This allows the same builtin declaration to be used for multiple languages with similarly purposed address spaces (e.g. the same AMDGPU builtin can be used in OpenCL and CUDA even though the constant address spaces are arbitarily different). This breaks the possibility of using arbitrary numbered address spaces alongside the named address spaces for builtins. If this is an issue we probably need to introduce another builtin declaration character to distinguish language address spaces from so-called "target address spaces". llvm-svn: 338707	2018-08-02 12:14:28 +00:00
Fangrui Song	6907ce2f8f	Remove trailing space sed -Ei 's/[[:space:]]+$//' include/*/.{def,h,td} lib/*/.{cpp,h} llvm-svn: 338291	2018-07-30 19:24:48 +00:00
Ivan A. Kosarev	8264bb8d34	[NEON] Fix support for vrndi_f32(), vrndiq_f32() and vrndns_f32() intrinsics This patch adds support for vrndi_f32() and vrndiq_f32() intrinsics in AArch32 mode and for vrndns_f32() intrinsic in AArch64 mode. Differential Revision: https://reviews.llvm.org/D48829 llvm-svn: 337690	2018-07-23 13:26:37 +00:00
Erich Keane	3efe00206f	Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552	2018-07-20 14:13:28 +00:00
Fangrui Song	99337e246c	Change \t to spaces llvm-svn: 337530	2018-07-20 08:19:20 +00:00
Nemanja Ivanovic	2600b839d5	NFC: Remove extraneous semicolons as pointed out in the differential review The commit for https://reviews.llvm.org/D49424 missed the comment about the extraneous semicolons. Remove them. llvm-svn: 337451	2018-07-19 12:49:27 +00:00
Nemanja Ivanovic	1ac56bd33f	[PowerPC] Handle __builtin_xxpermdi the same way as GCC does The codegen for this builtin was initially implemented to match GCC. However, due to interest from users GCC changed behaviour to account for the big endian bias of the instruction and correct it. This patch brings the handling inline with GCC. Fixes https://bugs.llvm.org/show_bug.cgi?id=38192 Differential Revision: https://reviews.llvm.org/D49424 llvm-svn: 337449	2018-07-19 12:44:15 +00:00
Mandeep Singh Grang	0054f48b44	[COFF] Add more missing MSVC ARM64 intrinsics Summary: Added the following intrinsics: _BitScanForward, _BitScanReverse, _BitScanForward64, _BitScanReverse64 _InterlockedAnd64, _InterlockedDecrement64, _InterlockedExchange64, _InterlockedExchangeAdd64, _InterlockedExchangeSub64, _InterlockedIncrement64, _InterlockedOr64, _InterlockedXor64. Reviewers: compnerd, mstorsjo, rnk, javed.absar Reviewed By: mstorsjo Subscribers: kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D49445 llvm-svn: 337327	2018-07-17 22:03:24 +00:00
Craig Topper	1faf953d75	[X86] Remove custom handling for __builtin_ia32_divss_round_mask and __builtin_ia32_divsd_round_mask. llvm-svn: 336628	2018-07-10 00:50:03 +00:00
Craig Topper	638426fc36	[X86] Add __builtin_ia32_selectss_128 and __builtin_ia32_selectsd_128 that is suitable for use in scalar mask intrinsics. This will convert the i8 mask argument to <8 x i1> and extract an i1 and then emit a select instruction. This replaces the '(__U & 1)" and ternary operator used in some of intrinsics. The old sequence was lowered to a scalar and and compare. The new sequence uses an i1 vector that will interoperate better with other mask intrinsics. This removes the need to handle div_ss/sd specially in CGBuiltin.cpp. A follow up patch will add the GCCBuiltin name back in llvm and remove the custom handling. I made some adjustments to legacy move_ss/sd intrinsics which we reused here to do a simpler extract and insert instead of 2 extracts and two inserts or a shuffle. llvm-svn: 336622	2018-07-10 00:37:25 +00:00
Craig Topper	74c10e3236	[Builtins][Attributes][X86] Tag all X86 builtins with their required vector width. Add a min_vector_width function attribute and tag all x86 instrinsics with it This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targetting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU, the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter. Of course if the user uses explicit vector code in their source code we need to not split those operations. Especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible. To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be reponsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future. To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h. Either as an always_inline function containing calls to builtins(can be target specific or target independent) or vector extension code. Or as a macro wrapper around a taget specific builtin. I believe I've removed all cases where the macro was around a target independent builtin. To support the always_inline function case this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute. To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per builtin basis and avoid any impact to target independent builtins. There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch. Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue. Differential Revision: https://reviews.llvm.org/D48617 llvm-svn: 336583	2018-07-09 19:00:16 +00:00
Craig Topper	8a8d72794f	[X86] Add new scalar fma intrinsics with rounding mode that use f32/f64 types. This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma llvm-svn: 336507	2018-07-08 01:10:47 +00:00
Craig Topper	f89f62a680	[X86] When creating a select for scalar masked sqrt and div builtins make sure we optimize the all ones mask case. This case occurs in the intrinsic headers so we should avoid emitting the mask in those cases. Factor the code into a helper function to make this easy. llvm-svn: 336472	2018-07-06 22:46:52 +00:00
Craig Topper	be4c2933a2	[X86] Implement _builtin_ia32_vfmaddss and _builtin_ia32_vfmaddsd with native IR using llvm.fma intrinsic. This generates some extra zeroing currently, but we should be able to quickly address that with some isel patterns. llvm-svn: 336417	2018-07-06 07:14:47 +00:00
Craig Topper	284c5f342c	[X86] Use shufflevector instead of a select with a constant mask for fmaddsub/fmsubadd IR emission. Shufflevector is easier to generate and matches what the backend pattern matches without relying on constant selects being turned into shuffles. While I was there I also made the IR regular expressions a little stricter to ensure operand order on the shuffle. llvm-svn: 336388	2018-07-05 20:38:31 +00:00
Gabor Buella	9679eb6527	[X86] Fix some vector cmp builtins - TRUE/FALSE predicates This patch removes on optimization used with the TRUE/FALSE predicates, as was suggested in https://reviews.llvm.org/D45616 for r335339. The optimization was buggy, since r335339 used it also for *_mask builtins, without actually applying the mask -- the mask argument was just ignored. Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D48715 llvm-svn: 336355	2018-07-05 14:26:56 +00:00
Craig Topper	8bf793fb35	[X86] Remove masking from the avx512 packed sqrt builtins. Use select builtins instead. llvm-svn: 335945	2018-06-29 05:43:33 +00:00
Craig Topper	851f363691	[X86] Rename llvm.x86.avx512.mask.fpclass.p* to exclude 'mask.' from the name to match llvm. llvm-svn: 335745	2018-06-27 15:57:57 +00:00
Ivan A. Kosarev	a9f484ac4a	[NEON] Support vldNq intrinsics in AArch32 (Clang part) This patch reworks the support for dup NEON intrinsics as described in D48439. Differential Revision: https://reviews.llvm.org/D48440 llvm-svn: 335734	2018-06-27 13:58:43 +00:00
Craig Topper	4ef61aecbd	[X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction. Additional IR is emitted to convert between scalar and vXi1 type to match the expected software inferface for the builtin that clang exposes. llvm-svn: 335564	2018-06-26 00:44:02 +00:00
Gabor Buella	716863c820	[X86] Lower _mm[256\|512]_cmp[.]_mask intrinsics to native llvm IR Summary: Lowering some vector comparision builtins to fcmp IR instructions. This ignores the signaling behaviour specified in the predicate argument of said builtins. Affected AVX512 builtins: __builtin_ia32_cmpps128_mask __builtin_ia32_cmpps256_mask __builtin_ia32_cmpps512_mask __builtin_ia32_cmppd128_mask __builtin_ia32_cmppd256_mask __builtin_ia32_cmppd512_mask Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma Reviewed By: craig.topper, spatel, efriedma Differential Revision: https://reviews.llvm.org/D45616 llvm-svn: 335339	2018-06-22 11:59:16 +00:00
Craig Topper	342b095689	[X86] Update handling in CGBuiltin to be tolerant of out of range immediates. D48464 contains changes that will loosen some of the range checks in SemaChecking to a DefaultError warning that can be disabled. This patch adds explicit masking to avoid using the upper bits of immediates to gracefully handle the warning being disabled. llvm-svn: 335308	2018-06-21 23:39:47 +00:00
Tomasz Krupa	83ba6fa98d	Fix a bug introduced by rL334850 Summary: All *_sqrt_round_s[s\|d] intrinsics should execute a square root on zeroth element from B (Ops[1]) and insert in to A (Ops[0]), not the other way around. Reviewers: itaraban, craig.topper Reviewed By: craig.topper Subscribers: craig.topper, cfe-commits Differential Revision: https://reviews.llvm.org/D48288 llvm-svn: 334964	2018-06-18 17:57:05 +00:00
Tomasz Krupa	f1792bb3d6	[X86] Lowering sqrt intrinsics to native IR Reviewers: craig.topper, spatel, RKSimon, igorb, uriel.k Reviewed By: craig.topper Subscribers: tkrupa, cfe-commits Differential Revision: https://reviews.llvm.org/D41168 llvm-svn: 334850	2018-06-15 18:05:59 +00:00
Luke Geeson	da2b2e8c26	[AArch64] Reverted rC334696 with Clang VCVTA test fix llvm-svn: 334820	2018-06-15 10:10:45 +00:00
Craig Topper	31730ae761	[X86] Rename __builtin_ia32_pslldqi128 to __builtin_ia32_pslldqi128_byteshift and similar for other sizes. Remove the multiply by 8 from the header files. The previous names took the shift amount in bits to match gcc and required a multiply by 8 in the header. This creates a misleading error message when we check the range of the immediate to the builtin since the allowed range also got multiplied by 8. This commit changes the builtins to use a byte shift amount to match the underlying instruction and the Intel intrinsic. Fixes the remaining issue from PR37795. llvm-svn: 334773	2018-06-14 22:02:35 +00:00
Tomasz Krupa	82aa42af49	[X86] Lowering Mask Scalar intrinsics to native IR (Clang part) Summary: Lowering add, sub, mul, and div mask scalar intrinsic calls to native IR. Reviewers: craig.topper, RKSimon, spatel, sroland Reviewed By: craig.topper Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D47979 llvm-svn: 334741	2018-06-14 17:36:23 +00:00
Luke Geeson	bb399f8013	[AArch64] reverting rC334693 due to build failures llvm-svn: 334696	2018-06-14 08:59:33 +00:00
Luke Geeson	010bbbf390	[AArch64] Added support for the vcvta_u16_f16 instrinsic for FP16 Armv8.2-A llvm-svn: 334693	2018-06-14 08:28:56 +00:00
Mandeep Singh Grang	2d28383097	[COFF] Add ARM64 intrinsics: __yield, __wfe, __wfi, __sev, __sevl Summary: These intrinsics result in hint instructions. They are provided here for MSVC ARM64 compatibility. Reviewers: mstorsjo, compnerd, javed.absar Reviewed By: mstorsjo Subscribers: kristof.beyls, chrib, cfe-commits Differential Revision: https://reviews.llvm.org/D48132 llvm-svn: 334639	2018-06-13 18:49:35 +00:00
Craig Topper	201b9dd334	[X86] Fix operand order in the shuffle created for blend builtins. This was broken when the builtin was added in r334249. llvm-svn: 334422	2018-06-11 17:06:01 +00:00
Craig Topper	3cce6a7ed9	[X86] Use target independent masked expandload and compressstore intrinsics to implement expandload/compressstore builtins. Summary: We've had these target independent intrinsics for at least a year and a half. Looks like they do exactly what we need here and the backend already supports them. Reviewers: RKSimon, delena, spatel, GBuella Reviewed By: RKSimon Subscribers: cfe-commits, llvm-commits Differential Revision: https://reviews.llvm.org/D47693 llvm-svn: 334366	2018-06-10 17:27:05 +00:00
Ivan A. Kosarev	73c76c35a5	[NEON] Support VST1xN intrinsics in AArch32 mode (Clang part) We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics. Differential Revision: https://reviews.llvm.org/D47446 llvm-svn: 334362	2018-06-10 09:28:10 +00:00
Craig Topper	88097d9355	[X86] Add back some masked vector truncate builtins. Custom IRgen a a few others. I'd like to make the select builtins require an avx512f, avx512bw, or avx512vl fature to match what is normally required to get masking. Truncate is special in that there are instructions with a 128/256-bit masked result even without avx512vl. By using special buitlins we can emit a select without using the 128/256-bit select builtins. llvm-svn: 334331	2018-06-08 21:50:08 +00:00
Craig Topper	5f50f33806	[X86] Fold masking into subvector extract builtins. I'm looking into making the select builtins require avx512f, avx512bw, or avx512vl since masking operations generally require those features. The extract builtins are funny because the 512-bit versions return a 128 or 256 bit vector with masking even when avx512vl is not supported. llvm-svn: 334330	2018-06-08 21:50:07 +00:00
Craig Topper	03f4f04b91	[X86] Add builtins for vpermq/vpermpd instructions to enable target feature checking. llvm-svn: 334311	2018-06-08 18:00:25 +00:00
Craig Topper	422a1bbb84	[X86] Add builtins for shufps and shufpd to enable target feature and immediate range checking. llvm-svn: 334266	2018-06-08 07:18:33 +00:00
Craig Topper	03de166ccd	[X86] Add builtins for pshufd, pshuflw, and pshufhw to enable target feature and immediate range checking. llvm-svn: 334265	2018-06-08 06:13:16 +00:00
Craig Topper	3428beeb2f	[X86] Add subvector insert and extract builtins to enable target feature checking and immediate range checking. Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc. llvm-svn: 334261	2018-06-08 03:24:47 +00:00
Craig Topper	acf5601961	[X86] Add builtins for vpermilps/pd instructions to enable target feature checking. llvm-svn: 334256	2018-06-08 00:59:27 +00:00
Craig Topper	7d17d7278b	[X86] Add builtins for blend with immediate control to enforce target feature requirements and check immediate range. llvm-svn: 334249	2018-06-08 00:00:21 +00:00
Craig Topper	9392136414	[X86] Add builtins for shuff32x4/shuff64x2/shufi32x4/shuff64x2 to enable target feature checking and immediate range checking. llvm-svn: 334244	2018-06-07 23:03:08 +00:00
Reid Kleckner	aa46ed9278	[MS] Re-add support for the ARM interlocked bittest intrinscs Adds support for these intrinsics, which are ARM and ARM64 only: _interlockedbittestandreset_acq _interlockedbittestandreset_rel _interlockedbittestandreset_nf _interlockedbittestandset_acq _interlockedbittestandset_rel _interlockedbittestandset_nf Refactor the bittest intrinsic handling to decompose each intrinsic into its action, its width, and its atomicity. llvm-svn: 334239	2018-06-07 21:39:04 +00:00
Craig Topper	e56819eb69	[X86] Add builtins for VALIGNQ/VALIGND to enable proper target feature checking. We still emit shufflevector instructions we just do it from CGBuiltin.cpp now. This ensures the intrinsics that use this are only available on CPUs that support the feature. I also added range checking to the immediate, but only checked it is 8 bits or smaller. We should maybe be stricter since we never use all 8 bits, but gcc doesn't seem to do that. llvm-svn: 334237	2018-06-07 21:27:41 +00:00
Craig Topper	d3623155a2	[X86] Add back builtins for _mm_slli_si128/_mm_srli_si128 and similar intrinsics. We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits. This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously. llvm-svn: 334208	2018-06-07 17:28:03 +00:00
Craig Topper	b92c77d176	[X86] Add back _mask, _maskz, and _mask3 builtins for some 512-bit fmadd/fmsub/fmaddsub/fmsubadd builtins. Summary: We recently switch to using a selects in the intrinsics header files for FMA instructions. But the 512-bit versions support flavors with rounding mode which must be an Integer Constant Expression. This has forced those intrinsics to be implemented as macros. As it stands now the mask and mask3 intrinsics evaluate one of their macro arguments twice. If that argument itself is another intrinsic macro, we can end up over expanding macros. Or if its something we can CSE later it would show up multiple times when it shouldn't. I tried adding __extension__ around the macro and making it an expression statement and declaring a local variable. But whatever name you choose for the local variable can never be used as the name of an input to the macro in user code. If that happens you would end up with the same name on the LHS and RHS of an assignment after expansion. We might be safe if we use __ in front of the variable names because those names are reserved and user code shouldn't use that, but I wasn't sure I wanted to make that claim. The other option which I've chosen here, is to add back _mask, _maskz, and _mask3 flavors of the builtin which we will expand in CGBuiltin.cpp to replicate the argument as needed and insert any fneg needed on the third operand to make a subtract. The _maskz isn't truly necessary if we have an unmasked version or if we use the masked version with a -1 mask and wrap a select around it. But I've chosen to make things more uniform. I separated out the scalar builtin handling to avoid too many things going on in EmitX86FMAExpr. It was different enough due to the extract and insert that the minor duplication of the CreateCall was probably worth it. Reviewers: tkrupa, RKSimon, spatel, GBuella Reviewed By: tkrupa Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D47724 llvm-svn: 334159	2018-06-07 02:46:02 +00:00
Reid Kleckner	11c99ed05f	[MS][ARM64]: Promote _setjmp to_setjmpex as there is no _setjmp in the ARM64 libvcruntime.lib Factor out the common setjmp call emission code. Based on a patch by Chris January Differential Revision: https://reviews.llvm.org/D47784 llvm-svn: 334112	2018-06-06 18:39:47 +00:00
Reid Kleckner	05df851327	Fix std::tuple errors llvm-svn: 334060	2018-06-06 01:44:10 +00:00
Reid Kleckner	368d52b7e0	Implement bittest intrinsics generically for non-x86 platforms I tested these locally on an x86 machine by disabling the inline asm codepath and confirming that it does the same bitflips as we do with the inline asm. Addresses code review feedback. llvm-svn: 334059	2018-06-06 01:35:08 +00:00
Craig Topper	f3914b74c1	[X86] Add builtins for vector element insert and extract for different 128 and 256 bit vector types. Use them to implement the extract and insert intrinsics. Previously we were just using extended vector operations in the header file. This unfortunately allowed non-constant indices to be used with the intrinsics. This is incompatible with gcc, icc, and MSVC. It also introduces a different performance characteristic because non-constant index gets lowered to a vector store and an element sized load. By adding the builtins we can check for the index to be a constant and ensure its in range of the vector element count. User code still has the option to use extended vector operations themselves if they need non-constant indexing. llvm-svn: 334057	2018-06-06 00:24:55 +00:00
Craig Topper	6b5b5ce06c	[X86] Implement __builtin_ia32_vec_ext_v2si correctly even though we only use it with an index of 0. This builtin takes an index as its second operand, but the codegen hardcodes an index of 0 and doesn't use the operand. The only use of the builtin in the header file passes 0 to the operand so this works for that usage. But its more correct to use the real operand. llvm-svn: 334054	2018-06-05 22:40:03 +00:00
Reid Kleckner	1d9c249db5	Reimplement the bittest intrinsic family as builtins with inline asm We need to implement _interlockedbittestandset as a builtin for windows.h, so we might as well do the whole family. It reduces code duplication anyway. Fixes PR33188, a long standing bug in our bittest implementation encountered by Chakra. llvm-svn: 333978	2018-06-05 01:33:40 +00:00
Reid Kleckner	89fbd55145	Revert r333791 "Cap "voluntary" vector alignment at 16 for all Darwin platforms." Adding __attribute__((aligned(32))) to __m256 breaks the implementation of _mm256_loadu_ps on Windows. On Windows, alignment attributes have higher precedence than packing attributes. We also might want to carefully consider the consequences of changing our vector typedefs, since many users copy them and invent their own new, non-Intel specific vector type names. llvm-svn: 333958	2018-06-04 21:39:20 +00:00
Craig Topper	6fb26f93ef	[X86] Replace __builtin_ia32_vbroadcastf128_pd256 and __builtin_ia32_vbroadcastf128_ps256 with an unaligned load intrinsics and a __builtin_shufflevector call. llvm-svn: 333853	2018-06-03 19:42:59 +00:00
Craig Topper	f886b44693	[X86] Pass ArrayRef instead of SmallVectorImpl& to the X86 builtin helper functions. NFC llvm-svn: 333851	2018-06-03 19:02:57 +00:00
Craig Topper	8508c1db98	Revert r333848 "[X86] Pass ArrayRef instead of SmallVectorImpl& to the X86 builtin helper functions. NFC" Looks like I missed some changes to make this work. llvm-svn: 333850	2018-06-03 18:41:22 +00:00
Craig Topper	d4a610f6f7	[X86] Pass ArrayRef instead of SmallVectorImpl& to the X86 builtin helper functions. NFC llvm-svn: 333848	2018-06-03 18:08:37 +00:00
Craig Topper	21f56f5b9c	[X86] When emitting masked loads/stores don't check for all ones mask. This seems like a premature optimization. It's unlikely a user would pass something the frontend can tell is all ones to the masked load/store intrinsics. We do this optimization for emitting select for masking because we have builtin calls in header files that pass an all ones mask in. Though at this point we may not longer have any builtins that emit some IR and a select. We may only have the select builtins so maybe we can remove that optimization too. llvm-svn: 333847	2018-06-03 18:08:36 +00:00
Ivan A. Kosarev	9c40c0ad0c	[NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part) We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics. Differential Revision: https://reviews.llvm.org/D47121 llvm-svn: 333829	2018-06-02 17:42:59 +00:00
John McCall	280c656031	Cap "voluntary" vector alignment at 16 for all Darwin platforms. This fixes two major problems: - We were not capping vector alignment as desired on 32-bit ARM. - We were using different alignments based on the AVX settings on Intel, so we did not have a consistent ABI. This is an ABI break, but we think we can get away with it because vectors tend to be used mostly in inline code (which is why not having a consistent ABI has not proven disastrous on Intel). Intel's AVX types are specified as having 32-byte / 64-byte alignment, so align them explicitly instead of relying on the base ABI rule. Note that this sort of attribute is stripped from template arguments in template substitution, so there's a possibility that code templated over vectors will produce inadequately-aligned objects. The right long-term solution for this is for alignment attributes to be interpreted as true qualifiers and thus preserved in the canonical type. llvm-svn: 333791	2018-06-01 21:34:26 +00:00
Dan Gohman	9f8ee03772	[WebAssembly] Update to the new names for the memory builtin functions. The WebAssembly committee has decided on the names `memory.size` and `memory.grow` for the memory intrinsics, so update the clang builtin functions to follow those names, keeping both sets of old names in place for compatibility. llvm-svn: 333712	2018-06-01 00:05:51 +00:00
Gabor Buella	70d8d51073	[X86] Lowering FMA intrinsics to native IR (Clang part) This patch replaces all packed (and scalar without rounding mode) fused intrinsics with fmadd/fmaddsub variations. Then fmadd/fmaddsub are lowered to native IR. Patch by tkrupa Reviewers: craig.topper, sroland, spatel, RKSimon Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D47444 llvm-svn: 333555	2018-05-30 15:27:49 +00:00
Simon Tatham	89e31fa7fc	Support __iso_volatile_load8 etc on aarch64-win32. These intrinsics are used by MSVC's header files on AArch64 Windows as well as AArch32, so we should support them for both targets. I've factored them out of CodeGenFunction::EmitARMBuiltinExpr into separate functions that EmitAArch64BuiltinExpr can call as well. Reviewers: javed.absar, mstorsjo Reviewed By: mstorsjo Subscribers: kristof.beyls, cfe-commits Differential Revision: https://reviews.llvm.org/D47476 llvm-svn: 333513	2018-05-30 07:54:05 +00:00
Craig Topper	f2043b08b4	[X86] Remove mask argument from more builtins that are handled completely in CGBuiltin.cpp. Just wrap a select builtin around them in the header file instead. llvm-svn: 333061	2018-05-23 04:51:54 +00:00
Sanjay Patel	74c7fb002f	[CodeGen] use nsw negation for builtin abs The clang builtins have the same semantics as the stdlib functions. The stdlib functions are defined in section 7.20.6.1 of the C standard with: "If the result cannot be represented, the behavior is undefined." That lets us mark the negation with 'nsw' because "sub i32 0, INT_MIN" would be UB/poison. Differential Revision: https://reviews.llvm.org/D47202 llvm-svn: 333038	2018-05-22 23:02:13 +00:00
Craig Topper	8e3689c066	[X86] Remove mask argument from some builtins that are handled completely in CGBuiltin.cpp. Just wrap a select builtin around them in the header file instead. llvm-svn: 333027	2018-05-22 20:48:24 +00:00
Sanjay Patel	1ff6b27940	[CodeGen] produce the LLVM canonical form of abs We chose the 'slt' form as canonical in IR with: rL332819 ...so we should generate that form directly for efficiency. llvm-svn: 332989	2018-05-22 15:36:50 +00:00
Craig Topper	288bd2e5a0	[X86] Remove masking from pternlog llvm intrinsics and use a select instruction instead. Because the intrinsics in the headers are implemented as macros, we can't just use a select builtin and pternlog builtin. This would require one of the macro arguments to be used twice. Depending on what was passed to the macro we could expand an expression twice leading to weird behavior. We could maybe declare our local variable in the macro, but that would need to worry about name collisions. To avoid that just generate IR directly in CGBuiltin.cpp. Differential Revision: https://reviews.llvm.org/D47125 llvm-svn: 332891	2018-05-21 20:58:23 +00:00
Daniil Fukalov	1b14a3ad3d	[AMDGPU] fixes for lds f32 builtins 1. added restrictions to memory scope, order and volatile parameters 2. added custom processing for these builtins - currently is not used code, needed to switch off GCCBuiltin link to the builtins (ongoing change to llvm tree) 3. builtins renamed as requested Differential Revision: https://reviews.llvm.org/D43281 llvm-svn: 332848	2018-05-21 16:18:07 +00:00
Craig Topper	74ac0eda68	[X86] Change the implementation of scalar masked load/store intrinsics to not use a 512-bit intermediate vector. This is unnecessary for AVX512VL supporting CPUs like SKX. We can just emit a 128-bit masked load/store here no matter what. The backend will widen it to 512-bits on KNL CPUs. Fixes the frontend portion of PR37386. Need to fix the backend to optimize the new sequences well. llvm-svn: 331958	2018-05-10 05:43:43 +00:00
Craig Topper	2b248849ae	[Builtins] Improve the IR emitted for MSVC compatible rotr/rotl builtins to match what the middle and backends understand Previously we emitted something like rotl(x, n) { n &= bitwidth-1; return n != 0 ? ((x << n) \| (x >> (bitwidth - n)) : x; } We use a select to avoid the undefined behavior on the (bitwidth - n) shift. The middle and backend don't really recognize this as a rotate and end up emitting a cmov or control flow because of the select. A better pattern is (x << (n & mask)) \| (x << (-n & mask)) where mask is bitwidth - 1. Fixes the main complaint in PR37387. There's still some work to be done if the user writes that sequence directly on a short or char where type promotion rules can prevent it from being recognized. The builtin is emitting direct IR with unpromoted types so that isn't a problem for it. Differential Revision: https://reviews.llvm.org/D46656 llvm-svn: 331943	2018-05-10 00:05:13 +00:00
Yaxun Liu	3cab24aa4f	[OpenCL] Fix typos in emitted enqueue kernel function names Two typos: vaarg => vararg get_kernel_preferred_work_group_multiple => get_kernel_preferred_work_group_size_multiple Differential Revision: https://reviews.llvm.org/D46601 llvm-svn: 331895	2018-05-09 17:07:06 +00:00
Adrian Prantl	9fc8faf9e6	Remove \brief commands from doxygen comments. This is similar to the LLVM change https://reviews.llvm.org/D46290. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46320 llvm-svn: 331834	2018-05-09 01:00:01 +00:00
Oliver Stannard	2fcee8bd52	[ARM,AArch64] Add intrinsics for dot product instructions The ACLE spec which describes these intrinsics hasn't been published yet, but this is based on the final draft which will be published soon, and these have already been implemented by GCC. Differential revision: https://reviews.llvm.org/D46109 llvm-svn: 331039	2018-04-27 14:03:32 +00:00
Sven van Haastregt	4700faa28e	[OpenCL] Add separate read_only and write_only pipe IR types SPIR-V encodes the read_only and write_only access qualifiers of pipes, so separate LLVM IR types are required to target SPIR-V. Other backends may also find this useful. These new types are `opencl.pipe_ro_t` and `opencl.pipe_wo_t`, which replace `opencl.pipe_t`. This replaces __get_pipe_num_packets(...) and __get_pipe_max_packets(...) which took a read_only pipe with separate versions for read_only and write_only pipes, namely: * __get_pipe_num_packets_ro(...) * __get_pipe_num_packets_wo(...) * __get_pipe_max_packets_ro(...) * __get_pipe_max_packets_wo(...) These separate versions exist to avoid needing a bitcast to one of the two qualified pipe types. Patch by Stuart Brady. Differential Revision: https://reviews.llvm.org/D46015 llvm-svn: 331026	2018-04-27 10:37:04 +00:00
Chandler Carruth	16429acacb	[x86] Revert r330322 (& r330323): Lowering x86 adds/addus/subs/subus intrinsics The LLVM commit introduces a crash in LLVM's instruction selection. I filed http://llvm.org/PR37260 with the test case. llvm-svn: 330997	2018-04-26 21:46:01 +00:00
Alexander Ivchenko	d96ddccdb4	Lowering x86 adds/addus/subs/subus intrinsics (clang) This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations. Patch by tkrupa Differential Revision: https://reviews.llvm.org/D44786 llvm-svn: 330323	2018-04-19 12:15:11 +00:00
Artem Belevich	0ae8590354	[NVPTX, CUDA] Added support for m8n32k16 and m32n8k16 variants of wmma instructions. The new instructions were added added for sm_70+ GPUs in CUDA-9.1. Differential Revision: https://reviews.llvm.org/D45068 llvm-svn: 330296	2018-04-18 21:51:48 +00:00
Keith Wyss	f437e35671	[XRay] Add clang builtin for xray typed events. Summary: A clang builtin for xray typed events. Differs from __xray_customevent(...) by the presence of a type tag that is vended by compiler-rt in typical usage. This allows xray handlers to expand logged events with their type description and plugins to process traced events based on type. This change depends on D45633 for the intrinsic definition. Reviewers: dberris, pelikan, rnk, eizan Subscribers: cfe-commits, llvm-commits Differential Revision: https://reviews.llvm.org/D45716 llvm-svn: 330220	2018-04-17 21:32:43 +00:00
Aaron Ballman	fe93546b11	Add modifiers for unsigned char and signed char field printing for __builtin_dump_struct. Patch by Paul Semel. llvm-svn: 330188	2018-04-17 14:00:06 +00:00
Aaron Ballman	b6a7702297	Add checks for format specifiers used by __builtin_dump_struct and added a new specifier for null-terminated constant strings. Patch by Paul Semel. llvm-svn: 330185	2018-04-17 11:57:47 +00:00
Ivan A. Kosarev	9cdb2c75d9	[NEON] Support vrndns_f32 intrinsic Differential Revision: https://reviews.llvm.org/D45515 llvm-svn: 330012	2018-04-13 12:46:02 +00:00
Dean Michael Berris	488f7c2b67	[XRay][clang] Add flag to choose instrumentation bundles Summary: This change addresses http://llvm.org/PR36926 by allowing users to pick which instrumentation bundles to use, when instrumenting with XRay. In particular, the flag `-fxray-instrumentation-bundle=` has four valid values: - `all`: the default, emits all instrumentation kinds - `none`: equivalent to -fnoxray-instrument - `function`: emits the entry/exit instrumentation - `custom`: emits the custom event instrumentation These can be combined either as comma-separated values, or as repeated flag values. Reviewers: echristo, kpw, eizan, pelikan Reviewed By: pelikan Subscribers: mgorny, cfe-commits Differential Revision: https://reviews.llvm.org/D44970 llvm-svn: 329985	2018-04-13 02:31:58 +00:00
Aaron Ballman	0652534131	Introduce a new builtin, __builtin_dump_struct, that is useful for dumping structure contents at runtime in circumstances where debuggers may not be easily available (such as in kernel work). Patch by Paul Semel. llvm-svn: 329762	2018-04-10 21:58:13 +00:00

... 3 4 5 6 7 ...

1308 Commits