This builtin allows the creation of custom scheduling pipelines on a per-region
basis. Like the sched_barrier builtin this is intended to be used either for
testing, in situations where the default scheduler heuristics cannot be
improved, or in critical kernels where users are trying to get performance that
is close to handwritten assembly. Obviously using these builtins will require
extra work from the kernel writer to maintain the desired behavior.
The builtin can be used to create groups of instructions called "scheduling
groups" where ordering between the groups is enforced by the scheduler.
__builtin_amdgcn_sched_group_barrier takes three parameters. The first parameter
is a mask that determines the types of instructions that you would like to
synchronize around and add to a scheduling group. These instructions will be
selected from the bottom up starting from the sched_group_barrier's location
during instruction scheduling. The second parameter is the number of matching
instructions that will be associated with this sched_group_barrier. The third
parameter is an identifier which is used to describe what other
sched_group_barriers should be synchronized with. Note that multiple
sched_group_barriers must be added in order for them to be useful since they
only synchronize with other sched_group_barriers. Only "scheduling groups" with
a matching third parameter will have any enforced ordering between them.
As an example, the code below tries to create a pipeline of 1 VMEM_READ
instruction followed by 1 VALU instruction followed by 5 MFMA instructions...
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 1 VALU
__builtin_amdgcn_sched_group_barrier(2, 1, 0)
// 5 MFMA
__builtin_amdgcn_sched_group_barrier(8, 5, 0)
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 3 VALU
__builtin_amdgcn_sched_group_barrier(2, 3, 0)
// 2 VMEM_WRITE
__builtin_amdgcn_sched_group_barrier(64, 2, 0)
Reviewed By: jrbyrnes
Differential Revision: https://reviews.llvm.org/D128158
Clang has traditionally allowed C programs to implicitly convert
integers to pointers and pointers to integers, despite it not being
valid to do so except under special circumstances (like converting the
integer 0, which is the null pointer constant, to a pointer). In C89,
this would result in undefined behavior per 3.3.4, and in C99 this rule
was strengthened to be a constraint violation instead. Constraint
violations are most often handled as an error.
This patch changes the warning to default to an error in all C modes
(it is already an error in C++). This gives us better security posture
by calling out potential programmer mistakes in code but still allows
users who need this behavior to use -Wno-error=int-conversion to retain
the warning behavior, or -Wno-int-conversion to silence the diagnostic
entirely.
Differential Revision: https://reviews.llvm.org/D129881
Adds an intrinsic/builtin that can be used to fine tune scheduler behavior. If
there is a need to have highly optimized codegen and kernel developers have
knowledge of inter-wave runtime behavior which is unknown to the compiler this
builtin can be used to tune scheduling.
This intrinsic creates a barrier between scheduling regions. The immediate
parameter is a mask to determine the types of instructions that should be
prevented from crossing the sched_barrier. In this initial patch, there are only
two variations. A mask of 0 means that no instructions may be scheduled across
the sched_barrier. A mask of 1 means that non-memory, non-side-effect inducing
instructions may cross the sched_barrier.
Note that this intrinsic is only meant to work with the scheduling passes. Any
other transformations that may move code will not be impacted in the ways
described above.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D124700
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.
Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert
Reviewed By: arsenm, sameerds
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80804
Expose llvm fence instruction as clang builtin for AMDGPU target
__builtin_amdgcn_fence(unsigned int memoryOrdering, const char *syncScope)
The first argument of this builtin is one of the memory-ordering specifiers
__ATOMIC_ACQUIRE, __ATOMIC_RELEASE, __ATOMIC_ACQ_REL, or __ATOMIC_SEQ_CST
following C++11 memory model semantics. This is mapped to corresponding
LLVM atomic memory ordering for the fence instruction using LLVM atomic C
ABI. The second argument is an AMDGPU-specific synchronization scope
defined as string.
Reviewed By: sameerds
Differential Revision: https://reviews.llvm.org/D75917
Emit llvm.amdgcn.update.dpp for both __builtin_amdgcn_mov_dpp and
__builtin_amdgcn_update_dpp. The first argument to
llvm.amdgcn.update.dpp will be undef for __builtin_amdgcn_mov_dpp.
Differential Revision: https://reviews.llvm.org/D52320
llvm-svn: 344665
1. added restrictions to memory scope, order and volatile parameters
2. added custom processing for these builtins - currently is not used code,
needed to switch off GCCBuiltin link to the builtins (ongoing change to llvm
tree)
3. builtins renamed as requested
Differential Revision: https://reviews.llvm.org/D43281
llvm-svn: 332848