llvm-project

Author	SHA1	Message	Date
Johannes Doerfert	48d6f52401	[CUDA][FIX] Make shfl[_sync] for unsigned long long non-recursive A copy-paste error caused UB in the definition of the unsigned long long versions of the shfl intrinsics. Reported and diagnosed by @trws. Differential Revision: https://reviews.llvm.org/D129536	2022-07-21 12:36:54 -05:00
Evgeny Mankov	c23147106f	[clang][CUDA][Windows] Fix compilation error on Windows with `uint32_t __nvvm_get_smem_pointer` The change fixes https://github.com/llvm/llvm-project/issues/54609 (the second reported issue) by eliminating a compilation error occurring only on Windows while trying to compile any CUDA source file by clang (-x cuda). [Repro] clang -x cuda <any_cu_source> [Error] __clang_cuda_runtime_wrapper.h:473: __clang_cuda_intrinsics.h(517,19): error GC871EEFB: unknown type name 'uint32_t'; did you mean 'cuuint32_t'? __device__ inline uint32_t __nvvm_get_smem_pointer(void *__ptr) { ^ C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.6/include\cuda.h:57:26: note: 'cuuint32_t' declared here typedef unsigned __int32 cuuint32_t; Reviewed By: tra Differential Revision: https://reviews.llvm.org/D122897	2022-04-21 00:41:20 +03:00
Kristina Bessonova	57aaab3b17	[NVPTX] Fix nvvm.match.sync*.i64 intrinsics return type (i64 -> i32) NVVM IR specification defines them with i32 return type: declare i32 @llvm.nvvm.match.any.sync.i64(i32 %membermask, i64 %value) declare {i32, i1} @llvm.nvvm.match.all.sync.i64(i32 %membermask, i64 %value) ... The i32 return value is a 32-bit mask where bit position in mask corresponds to thread’s laneid. as well as PTX ISA: 9.7.12.8. Parallel Synchronization and Communication Instructions: match.sync match.any.sync.type d, a, membermask; match.all.sync.type d[|p], a, membermask; ... Destination d is a 32-bit mask where bit position in mask corresponds to thread’s laneid. Additionally, ptxas doesn't accept the instructions produced by the NVPTX backend. After this patch, it compiles with no issues. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D120499	2022-03-01 12:26:16 +02:00
Artem Belevich	f526ee5b85	[CUDA] Provide address space conversion builtins. CUDA-11 headers rely on these NVCC builtins. Despite having `__nv` prefix, those are not provided by libdevice. Differential Revision: https://reviews.llvm.org/D111665	2021-10-12 14:56:39 -07:00
Artem Belevich	cc14de88da	[CUDA] Fix order of memcpy arguments in __shfl_*(<64-bit type>). Wrong argument order resulted in broken shfl ops for 64-bit types.	2020-01-23 13:17:52 -08:00
Artem Belevich	ce94ec661f	[CUDA] Use activemask.b32 instruction to implement __activemask w/ CUDA-9.2+ vote.ballot instruction is gone in recent CUDA versions and vote.sync.ballot can not be used because it needs a thread mask parameter. Fortunately PTX 6.2 (introduced with CUDA-9.2) provides activemask.b32 instruction for this. Differential Revision: https://reviews.llvm.org/D66665 llvm-svn: 370792	2019-09-03 17:31:58 +00:00
Chandler Carruth	4cf5743b77	Move the builtin headers to use the new license file header. Summary: These all had somewhat custom file headers with different text from the ones I searched for previously, and so I missed them. Thanks to Hal and Kristina and others who prompted me to fix this, and sorry it took so long. Reviewers: hfinkel Subscribers: mcrosier, javed.absar, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D60406 llvm-svn: 357941	2019-04-08 20:51:30 +00:00
Artem Belevich	5832eb4cfd	[CUDA] added missing __ldg(const signed char *) Differential Revision: https://reviews.llvm.org/D45780 llvm-svn: 330280	2018-04-18 18:33:43 +00:00
Artem Belevich	3cebc738b6	[CUDA] More fixes for __shfl_* intrinsics. * __shfl_{up,down}* uses unsigned int for the third parameter. * added [unsigned] long overloads for non-sync shuffles. Differential Revision: https://reviews.llvm.org/D41521 llvm-svn: 321326	2017-12-21 23:52:09 +00:00
Artem Belevich	a659d2590e	[NVPTX,CUDA] Added llvm.nvvm.fns intrinsic and matching __nvvm_fns builtin in clang. Differential Revision: https://reviews.llvm.org/D40872 llvm-svn: 319909	2017-12-06 17:50:05 +00:00
Artem Belevich	4631ef1e43	[CUDA] Added overloads for '[unsigned] long' variants of shfl builtins. Differential Revision: https://reviews.llvm.org/D40871 llvm-svn: 319908	2017-12-06 17:40:35 +00:00
Jonas Hahnfeld	f21a60233c	[CUDA] Fix name of __activemask() The name has two underscores in the official CUDA documentation: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-vote-functions Differential Revision: https://reviews.llvm.org/D38468 llvm-svn: 314691	2017-10-02 17:50:11 +00:00
Artem Belevich	bab95c7087	[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins. Differential Revision: https://reviews.llvm.org/D38191 llvm-svn: 314223	2017-09-26 17:07:23 +00:00
Justin Lebar	d31d5e6aa2	Revert "[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.", rL314135. Causing assertion failures on macos: > Assertion failed: (Num < NumOperands && "Invalid child # of SDNode!"), > function getOperand, file > /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm/include/llvm/CodeGen/SelectionDAGNodes.h, > line 835. http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/42739/testReport/LLVM/CodeGen_NVPTX/surf_read_cuda_ll/ llvm-svn: 314142	2017-09-25 19:41:56 +00:00
Artem Belevich	9941ee9529	[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins. Differential Revision: https://reviews.llvm.org/D38191 llvm-svn: 314135	2017-09-25 18:53:57 +00:00
Artem Belevich	4d80105792	[CUDA] Fix names of __nvvm_vote* intrinsics. Also fixed a syntax error in activemask(). Differential Revision: https://reviews.llvm.org/D38188 llvm-svn: 314129	2017-09-25 17:55:26 +00:00
Artem Belevich	b542f1f3df	[CUDA] Fixed order of words in the names of shfl builtins. Differential Revision: https://reviews.llvm.org/D38147 llvm-svn: 313899	2017-09-21 18:46:39 +00:00
Artem Belevich	42960b4188	[NVPTX] Implemented bar.warp.sync, barrier.sync, and vote{.sync} instructions/intrinsics/builtins. Differential Revision: https://reviews.llvm.org/D38148 llvm-svn: 313898	2017-09-21 18:44:49 +00:00
Artem Belevich	4654dc89be	[NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins. Differential Revision: https://reviews.llvm.org/D38090 llvm-svn: 313820	2017-09-20 21:23:07 +00:00
Justin Lebar	b8f7a3b8b1	[CUDA] Rename keywords used in macro so they don't conflict with MSVC. Summary: MSVC seems to use "__in" and "__out" for its own purposes, so we have to pick different names in this macro. Reviewers: tra Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D28325 llvm-svn: 291138	2017-01-05 16:54:11 +00:00
Justin Bogner	2f8de9fb4f	NVPTX: Rename __builtin_ptx_shfl -> __nvvm_shfl To match "NVPTX: Make the llvm.nvvm.shfl intrinsics and builtin names consistent" in LLVM. llvm-svn: 274663	2016-07-06 19:52:32 +00:00
Justin Lebar	4fb5711751	[CUDA] Implement __shfl* intrinsics in clang headers. Summary: Clang changes to make use of the LLVM intrinsics added in D21160. Reviewers: tra Subscribers: jholewinski, cfe-commits Differential Revision: http://reviews.llvm.org/D21162 llvm-svn: 272299	2016-06-09 20:04:57 +00:00
Justin Lebar	720f8da33a	[CUDA] Fix order of vectorized ldg intrinsics' elements. Summary: The order is [x, y, z, w], not [w, x, y, z]. Subscribers: cfe-commits, tra Differential Revision: http://reviews.llvm.org/D20794 llvm-svn: 271215	2016-05-30 17:12:55 +00:00
Justin Lebar	2e4ecfdebe	[CUDA] Implement __ldg using intrinsics. Summary: Previously it was implemented as inline asm in the CUDA headers. This change allows us to use the [addr+imm] addressing mode when executing ld.global.nc instructions. This translates into a 1.3x speedup on some benchmarks that call this instruction from within an unrolled loop. Reviewers: tra, rsmith Subscribers: jhen, cfe-commits, jholewinski Differential Revision: http://reviews.llvm.org/D19990 llvm-svn: 270150	2016-05-19 22:49:13 +00:00

24 Commits