llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	c7536a5d60	AMDGPU: Remove legacy ldexp builtin llvm-svn: 275623	2016-07-15 21:33:06 +00:00
Matt Arsenault	c86671da09	AMDGPU: Update for rsq intrinsic changes llvm-svn: 275622	2016-07-15 21:33:02 +00:00
Wei Ding	ea41f356bb	AMDGPU: Add Clang Builtin for v_lerp_u8 Differential Revision: http://reviews.llvm.org/D22380 llvm-svn: 275577	2016-07-15 16:43:03 +00:00
Jan Vesely	d7e03a5bd9	AMDGPU: Export workitem builtins Reviewers: tstellardAMD Differential Revision: http://reviews.llvm.org/D20299 llvm-svn: 275030	2016-07-10 22:38:04 +00:00
Craig Topper	f2f1a099a7	[CodeGen] Use llvm::Type::getVectorNumElements instead of casting to llvm::VectorType and calling getNumElements. This is equivalent and shorter. llvm-svn: 274823	2016-07-08 02:17:35 +00:00
Craig Topper	0160063aeb	[X86] Reuse existing lambda and remove unnecessary argument from vector cmp builtin handling. NFC llvm-svn: 274821	2016-07-08 01:57:24 +00:00
Craig Topper	925ef0a135	[X86] Remove a couple calls to create V2F64 and V4F32 types for builtin handling. Just get the type from the operand of the builtin instead. NFC llvm-svn: 274820	2016-07-08 01:48:44 +00:00
Craig Topper	425d02d33e	[X86] Use native IR for immediate values 0-7 of packed fp cmp builtins. This makes them the same as what is done when using the SSE builtins for these same encodings. llvm-svn: 274608	2016-07-06 06:27:31 +00:00
Craig Topper	46e7555d4b	[AVX512] Use the generic ctlz intrinsic to implement the vplzcntd/q builtins. llvm-svn: 274603	2016-07-06 04:24:29 +00:00
Anastasia Stulova	db7a31cce7	[OpenCL] An implementation of device side enqueue (DSE) from OpenCL v2.0 s6.13.17. - Added new Builtins: enqueue_kernel, get_kernel_work_group_size and get_kernel_preferred_work_group_size_multiple. These Builtins use custom check to diagnose parameters of the passed Blocks i. e. variable number of 'local void*' type params, and check different overloads specified in Table 6.31 of OpenCL v2.0. - IR is generated as an internal library call for each OpenCL Builtin, reusing ObjC Block implementation. Review: http://reviews.llvm.org/D20249 llvm-svn: 274540	2016-07-05 11:31:24 +00:00
Anastasia Stulova	7f8d6dc0ef	[OpenCL] Make OpenCL Builtins added according to the right version. Currently we only have OpenCL 2.0 Builtins i.e. pipes or address space conversions. They have to be added only in the version 2.0 compilation mode to make the identifiers available for use in the other versions. Review: http://reviews.llvm.org/D20249 llvm-svn: 274509	2016-07-04 16:07:18 +00:00
Craig Topper	ac1823f6e9	[AVX512] Modify what indices we emit for the zero vector we use for zero extension of the result of a v2i1 or v4i1 masked compare. This way we emit something that the backend easily interprets as a concatenation rather than a true shuffle. This delivers slightly better codegen with the current backend capabilities. llvm-svn: 274484	2016-07-04 07:09:46 +00:00
Matt Arsenault	f652caea65	Emit more intrinsics for builtin functions This is important for building libclc. Since r273039 tests are failing due to now emitting calls to these functions instead of emitting the DAG node. The libm function names are implemented for OpenCL, and should call the locally defined versions, so -fno-builtin is used. The IR Some functions use the __builtins and expect the intrinsics to be emitted. Without this we end up with nobuiltin calls to intrinsics or to unsupported library calls. llvm-svn: 274370	2016-07-01 17:38:14 +00:00
Igor Breger	2c880cf9b1	[AVX512] Zero extend cmp intrinsic return value. Differential Revision: http://reviews.llvm.org/D21746 llvm-svn: 274110	2016-06-29 08:14:17 +00:00
Matt Arsenault	64665bc50d	AMDGPU: Add builtin to read exec mask llvm-svn: 273965	2016-06-28 00:13:17 +00:00
Craig Topper	d1691c7026	[AVX512] Replace masked integer cmp and ucmp builtins with native IR. llvm-svn: 273378	2016-06-22 04:47:58 +00:00
Simon Pilgrim	d39d026324	[X86][SSE4A] Use native IR for mask movntsd/movntss intrinsics. Depends on llvm side commit r273002. llvm-svn: 273003	2016-06-17 14:28:16 +00:00
Ranjeet Singh	ca2b3e7b5c	[ARM] Add mrrc/mrrc2 intrinsics and update existing mcrr/mcrr2 intrinsics. Reapplying patch in r272777 which was reverted because the llvm patch which added support for generating the mcrr/mcrr2 instructions from the intrinsic was causing an assertion failure. This has now been fixed in llvm. llvm-svn: 272983	2016-06-17 00:59:41 +00:00
Sanjay Patel	dbd68dd09d	[x86] generate IR for AVX2 integer min/max builtins Sibling patch to r272932: http://reviews.llvm.org/rL272932 llvm-svn: 272933	2016-06-16 18:45:01 +00:00
Marcin Koscielnicki	a46fade624	[Builtin] Make __builtin_thread_pointer target-independent. This is now supported for ARM, AArch64, PowerPC, SystemZ, SPARC, Mips. Differential Revision: http://reviews.llvm.org/D19589 llvm-svn: 272893	2016-06-16 13:41:54 +00:00
Sanjay Patel	280cfd1a69	[x86] translate SSE packed FP comparison builtins to IR As noted in the code comment, a potential follow-on would be to remove the builtins themselves. Other than ord/unord, this already works as expected. Eg: typedef float v4sf __attribute__((__vector_size__(16))); v4sf fcmpgt(v4sf a, v4sf b) { return a > b; } Differential Revision: http://reviews.llvm.org/D21268 llvm-svn: 272840	2016-06-15 21:20:04 +00:00
Sanjay Patel	7495ec026e	[x86] generate IR for SSE integer min/max builtins Sibling patch to r272806: http://reviews.llvm.org/rL272806 llvm-svn: 272807	2016-06-15 17:18:50 +00:00
Ranjeet Singh	d48760da64	Reverting r272777 because one of the tests added in the llvm patch is causing an assertion to fail. llvm-svn: 272790	2016-06-15 14:21:28 +00:00
Craig Topper	a54c21e742	[AVX512] Use native IR for mask pcmpeq/pcmpgt intrinsics. llvm-svn: 272787	2016-06-15 14:06:34 +00:00
Ranjeet Singh	8d5ad5bdf2	[ARM] Add mrrc/mrrc2 intrinsics and update existing mcrr/mcrr2 intrinsics. Patch adds intrinsics for mrrc/mrrc2. The intrinsics for mrrc/mrrc2 return a single uint64_t to represent two 32 bit values. The mcrr/mcrr2 intrinsic was changed to accept a single uint64_t instead of two 32 bit values as the input for consistency. Differential Revision: http://reviews.llvm.org/D21179 llvm-svn: 272777	2016-06-15 11:32:18 +00:00
Simon Pilgrim	532de1ceb9	Fix unused variable warning llvm-svn: 272541	2016-06-13 10:05:19 +00:00
Simon Pilgrim	beca5f295c	[Clang][X86] Convert non-temporal store builtins to generic __builtin_nontemporal_store in headers We can now use __builtin_nontemporal_store instead of target specific builtins for naturally aligned nontemporal stores which avoids the need for handling in CGBuiltin.cpp The scalar integer nontemporal (unaligned) store builtins will have to wait as __builtin_nontemporal_store currently assumes natural alignment and doesn't accept the 'packed struct' trick that we use for normal unaligned load/stores. The nontemporal loads require further backend support before we can safely convert them to __builtin_nontemporal_load Differential Revision: http://reviews.llvm.org/D21272 llvm-svn: 272540	2016-06-13 09:57:52 +00:00
Craig Topper	d1cb4ceacd	[CodeGen] Update to use an ArrayRef of uint32_t instead of int in calls to CreateShuffleVector to match llvm interface change. llvm-svn: 272492	2016-06-12 00:41:24 +00:00
Craig Topper	2769bb5753	[X86] Handle AVX2 pslldqi and psrldqi intrinsics shufflevector creation directly in the header file instead of in CGBuiltin.cpp. Simplify the sse2 equivalents as well. llvm-svn: 272246	2016-06-09 05:15:12 +00:00
Craig Topper	c1442973c8	[X86] Reuse the EmitX86Select routine to handle the select for masked palignr too. llvm-svn: 272245	2016-06-09 05:15:00 +00:00
Igor Breger	aadb876200	[AVX512] Emit select instruction instead of using x86 specific intrinsics. This will allow us to remove the x86 intrinsics from the backend. Differential Revision: http://reviews.llvm.org/D21060 llvm-svn: 272141	2016-06-08 13:59:20 +00:00
Craig Topper	f51cc07719	[AVX512] Convert masked palignr builtins directly to native IR similar to the other palignr builtins, but with a select to handle masking. llvm-svn: 271873	2016-06-06 06:13:01 +00:00
Craig Topper	4b060e31c9	[AVX512] Convert masked load builtins to generic masked load intrinsics instead of the x86 specific ones. This will allow the x86 intrinsics to be removed from the backend. llvm-svn: 271253	2016-05-31 06:58:07 +00:00
Craig Topper	6e891fbdd2	[AVX512] Emit generic masked store intrinsics instead of using x86 specific intrinsics. This will allow us to remove the x86 intrinsics from the backend. llvm-svn: 271246	2016-05-31 01:50:10 +00:00
Craig Topper	b8b4b7eb01	[X86] Simplify alignr builtin support by recognizing that NumLaneElts is always 16. NFC llvm-svn: 271176	2016-05-29 07:06:02 +00:00
Craig Topper	832caf041f	[CodeGen] Use the ArrayRef form CreateShuffleVector instead of building ConstantVectors or ConstantDataVectors and calling the other form. llvm-svn: 271165	2016-05-29 02:39:30 +00:00
Matt Arsenault	2d51059ebb	AMDGPU: Add fract builtin llvm-svn: 271080	2016-05-28 00:43:27 +00:00
David Majnemer	e6abf3d29f	[CodeGen] Don't crash when sizeof(long) != 4 for some intrins _InterlockedIncrement and _InterlockedDecrement have 'long' in their prototypes. We assumed 'long' was the same size as an i32 which is incorrect for other targets. This fixes PR27892. llvm-svn: 270953	2016-05-27 02:06:19 +00:00
Yaxun Liu	f7449a179b	[OpenCL] Add to_{global|local|private} builtin functions. OpenCL builtin functions to_{global|local|private} accepts argument of pointer type to arbitrary pointee type, and return a pointer to the same pointee type in different addr space, i.e. global gentype *to_global(gentype *p); It is not desirable to declare it as global void *to_global(void *); in opencl header file since it misses diagnostics. This patch implements these builtin functions as Clang builtin functions. In the builtin def file they are defined to have signature void*(void*). When handling call expressions, their declarations are re-written to have correct parameter type and return type corresponding to the call argument. In codegen call to addr void *to_addr(void*) is generated with addrcasts or bitcasts to facilitate implementation in builtin library. Differential Revision: http://reviews.llvm.org/D19932 llvm-svn: 270261	2016-05-20 19:54:38 +00:00
Benjamin Kramer	f4c520d5d2	Add all the avx512 flavors to __builtin_cpu_supports's list. This is matching what trunk gcc is accepting. Also adds a missing ssse3 case. PR27779. The amount of duplication here is annoying, maybe it should be factored into a separate .def file? llvm-svn: 270224	2016-05-20 15:21:08 +00:00
Justin Lebar	2e4ecfdebe	[CUDA] Implement __ldg using intrinsics. Summary: Previously it was implemented as inline asm in the CUDA headers. This change allows us to use the [addr+imm] addressing mode when executing ld.global.nc instructions. This translates into a 1.3x speedup on some benchmarks that call this instruction from within an unrolled loop. Reviewers: tra, rsmith Subscribers: jhen, cfe-commits, jholewinski Differential Revision: http://reviews.llvm.org/D19990 llvm-svn: 270150	2016-05-19 22:49:13 +00:00
Derek Schuff	dbd24b4593	[WebAssembly] Rename memory_size intrinsic to current_memory This follows the recent change in the wasm spec. llvm-svn: 268256	2016-05-02 17:26:19 +00:00
Marcin Koscielnicki	4005070e1b	[AArch64] Fix D19098 fallout. The intrinsic is now called llvm.thread.pointer, not llvm.aarch64.thread.pointer. Also, the code handling it in CGBuiltin.cpp is dead - it's already covered by GCCBuiltin. Remove it. Differential Revision: http://reviews.llvm.org/D19099 llvm-svn: 266817	2016-04-19 20:51:00 +00:00
Ahmed Bougacha	1d9de10130	[ARM NEON] Define vfms_f32 on ARM, and all vfms using vfma. r259537 added vfma/vfms to armv7, but the builtin was only lowered on the AArch64 side. Instead of supporting it on ARM, get rid of it. The vfms builtin lowered to: %nb = fsub float -0.0, %b %r = @llvm.fma.f32(%a, %nb, %c) Instead, define the operation in terms of vfma, and swap the multiplicands. It now lowers to: %na = fsub float -0.0, %a %r = @llvm.fma.f32(%na, %b, %c) This matches the instruction more closely, and lets current LLVM generate the "natural" operand ordering: fmls.2s v0, v1, v2 instead of the crooked (but equivalent): fmls.2s v0, v2, v1 Except for theses changes, assembly is identical. LLVM accepts both commutations, and the LLVM tests in: test/CodeGen/AArch64/arm64-fmadd.ll test/CodeGen/AArch64/fp-dp3.ll test/CodeGen/AArch64/neon-fma.ll test/CodeGen/ARM/fusedMAC.ll already check either the new one only, or both. Also verified against the test-suite unittests. llvm-svn: 266807	2016-04-19 19:44:45 +00:00
Sanjay Patel	ae7a9df7bf	make __builtin_isfinite more efficient (PR27145) isinf (is infinite) and isfinite should be implemented with the same function except we change the comparison operator. See PR27145 for more details: https://llvm.org/bugs/show_bug.cgi?id=27145 Ref: forked off of the discussion in D18513. Differential Revision: http://reviews.llvm.org/D18648 llvm-svn: 265675	2016-04-07 14:29:05 +00:00
JF Bastien	92f4ef1017	NFC: make AtomicOrdering an enum class Summary: See LLVM change D18775 for details, this change depends on it. Reviewers: jyknight, reames Subscribers: cfe-commits Differential Revision: http://reviews.llvm.org/D18776 llvm-svn: 265569	2016-04-06 17:26:42 +00:00
Matt Arsenault	3fb963389e	AMDGPU: Add frexp_mant + frexp_exp builtins llvm-svn: 264960	2016-03-30 22:57:40 +00:00
Aaron Ballman	abd466ed04	Silencing warnings from MSVC 2015 Update 2. Both of these changes silence "C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC. llvm-svn: 264932	2016-03-30 21:33:34 +00:00
Matt Arsenault	08087c52eb	Add missing __builtin_bitreverse8 Also add documentation for bitreverse builtins llvm-svn: 264203	2016-03-23 22:14:43 +00:00
Justin Lebar	717d2b0a0d	[CUDA] Implement atomicInc and atomicDec builtins These functions cannot be implemented as atomicrmw or cmpxchg instructions, so they are implemented as a call to the NVVM intrinsics @llvm.nvvm.atomic.load.inc.32.p0i32 and @llvm.nvvm.atomic.load.dec.32.p0i32. Patch by Jason Henline. Reviewers: jlebar Differential Revision: http://reviews.llvm.org/D18322 llvm-svn: 264009	2016-03-22 00:09:28 +00:00

695 Commits