==============================
User Guide for AMDGPU Back-end
==============================

Introduction
============

The AMDGPU back-end provides ISA code generation for AMD GPUs, from the R600
family up to the current Volcanic Islands (GCN Gen 3).

Conventions
===========

Address Spaces
--------------

The AMDGPU back-end uses the following address space mapping:

   ============= ============================================
   Address Space Memory Space
   ============= ============================================
   0             Private
   1             Global
   2             Constant
   3             Local
   4             Generic (Flat)
   5             Region
   ============= ============================================

The terminology in the table, aside from the region memory space, is from the
OpenCL standard.

Assembler
=========

The assembler is currently considered experimental.

For syntax examples look in test/MC/AMDGPU.

Below are some of the currently supported features (modulo bugs).  These
all apply to the Southern Islands ISA.  Sea Islands and Volcanic Islands
are also supported but may be missing some instructions and have more bugs.

DS Instructions
---------------
All DS instructions are supported.

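As an illustration only, DS (local data share) instructions can be written as
in the sketch below.  The register numbers and the offset value are arbitrary
choices, and the exact operand syntax may differ between assembler versions,
so test/MC/AMDGPU remains the reference for accepted forms.

.. code-block:: nasm

   ; Read one dword from the local-memory address held in v2 into v8.
   ds_read_b32 v8, v2

   ; Write v4 to the local-memory address held in v2, using an offset of 16.
   ds_write_b32 v2, v4 offset:16
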
FLAT Instructions
-----------------
These instructions are only present in the Sea Islands and Volcanic Islands
instruction set.  All FLAT instructions are supported for these architectures.

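A minimal sketch of FLAT loads and stores follows.  The register numbers are
arbitrary, and the sketch assumes that the -mcpu option selects a Sea Islands
or Volcanic Islands target, since other targets do not have these
instructions.

.. code-block:: nasm

   ; Load one dword from the 64-bit flat address in v[3:4] into v1.
   flat_load_dword v1, v[3:4]

   ; Store v0 to the 64-bit flat address in v[1:2].
   flat_store_dword v[1:2], v0
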
MUBUF Instructions
------------------
All non-atomic MUBUF instructions are supported.

SMRD Instructions
-----------------
Only the s_load_dword* SMRD instructions are supported.

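For illustration, two members of the s_load_dword* family might be written as
follows.  The registers and immediate offsets are arbitrary choices, not
values required by the assembler.

.. code-block:: nasm

   ; Load one dword from the address in s[2:3] plus an immediate offset.
   s_load_dword s1, s[2:3], 0x1

   ; Load four consecutive dwords into s[4:7].
   s_load_dwordx4 s[4:7], s[2:3], 0x0
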
SOP1 Instructions
-----------------
All SOP1 instructions are supported.

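SOP1 instructions take one scalar source operand.  A short sketch, with
arbitrary register numbers:

.. code-block:: nasm

   ; Copy s2 into s1.
   s_mov_b32 s1, s2

   ; Store the bitwise complement of s2 in s1.
   s_not_b32 s1, s2
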
SOP2 Instructions
-----------------
All SOP2 instructions are supported.

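SOP2 instructions take two scalar source operands.  A short sketch, with
arbitrary register numbers:

.. code-block:: nasm

   ; 32-bit unsigned add: s1 = s2 + s3.
   s_add_u32 s1, s2, s3

   ; 64-bit bitwise AND on register pairs: s[2:3] = s[4:5] & s[6:7].
   s_and_b64 s[2:3], s[4:5], s[6:7]
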
SOPC Instructions
-----------------
All SOPC instructions are supported.

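SOPC instructions compare two scalar operands and have no destination
register operand.  A short sketch, with arbitrary register numbers:

.. code-block:: nasm

   ; Signed 32-bit equality comparison of s1 and s2.
   s_cmp_eq_i32 s1, s2

   ; Unsigned 32-bit less-than comparison of s1 and s2.
   s_cmp_lt_u32 s1, s2
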
SOPP Instructions
-----------------

Unless otherwise mentioned, all SOPP instructions that have one or more
operands accept integer operands only.  No verification is performed
on the operands, so it is up to the programmer to be familiar with the
range of acceptable values.

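The sketch below shows the general shape of SOPP instructions with and without
an operand.  The integer values are arbitrary examples, and, as noted above,
the assembler does not check that they are in range.

.. code-block:: nasm

   ; One plain integer operand.
   s_nop 0
   s_sleep 10

   ; No operand at all.
   s_endpgm
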
s_waitcnt
^^^^^^^^^

s_waitcnt accepts named arguments to specify which memory counter(s) to
wait for.

.. code-block:: nasm

   ; Wait for all counters to be 0
   s_waitcnt 0

   ; Equivalent to s_waitcnt 0.  Counter names can also be delimited by
   ; '&' or ','.
   s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

   ; Wait for vmcnt counter to be 1.
   s_waitcnt vmcnt(1)

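The alternative delimiters mentioned in the comment above can be sketched as
follows.  Both lines are intended to be equivalent to the space-separated
form.

.. code-block:: nasm

   ; Counter names separated by '&'.
   s_waitcnt vmcnt(0) & expcnt(0) & lgkmcnt(0)

   ; Counter names separated by ','.
   s_waitcnt vmcnt(0), expcnt(0), lgkmcnt(0)
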
VOP1, VOP2, VOP3, VOPC Instructions
-----------------------------------

All 32-bit and 64-bit encodings should work.

The assembler will automatically detect which encoding size to use for
VOP1, VOP2, and VOPC instructions based on the operands.  If you want to force
a specific encoding size, you can add an _e32 (for 32-bit encoding) or
_e64 (for 64-bit encoding) suffix to the instruction.  Most, but not all,
instructions support an explicit suffix.  These are all valid assembly
strings:

.. code-block:: nasm

   v_mul_i32_i24 v1, v2, v3
   v_mul_i32_i24_e32 v1, v2, v3
   v_mul_i32_i24_e64 v1, v2, v3

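As a hedged illustration of operand-based detection, the sketch below relies
on an assumption about the hardware encodings rather than on anything stated
in this guide: the 32-bit VOP2 encoding requires a VGPR in the second source
position, while the 64-bit encoding also accepts an SGPR there.  The register
numbers are arbitrary.

.. code-block:: nasm

   ; Both sources are VGPRs, so the 32-bit encoding is sufficient.
   v_add_f32 v1, v2, v3

   ; An SGPR in the second source position does not fit the 32-bit encoding
   ; (assumption stated above), so the 64-bit encoding is expected here.
   v_add_f32 v1, v2, s3
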
Assembler Directives
--------------------

.hsa_code_object_version major, minor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major* and *minor* are integers that specify the version of the HSA code
object that will be generated by the assembler.  This value will be stored
in an entry of the .note section.

.hsa_code_object_isa [major, minor, stepping, vendor, arch]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major*, *minor*, and *stepping* are all integers that describe the instruction
set architecture (ISA) version of the assembly program.

*vendor* and *arch* are quoted strings.  *vendor* should always be equal to
"AMD" and *arch* should always be equal to "AMDGPU".

If no arguments are specified, then the assembler will derive the ISA version,
*vendor*, and *arch* from the value of the -mcpu option that is passed to the
assembler.

ISA version, *vendor*, and *arch* will all be stored in a single entry of the
.note section.

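A sketch of both directives with every argument written out is shown below.
The ISA version 7, 0, 0 is only an illustrative value and should match the
actual target.  Omitting the arguments of .hsa_code_object_isa lets the
assembler derive them from -mcpu instead.

.. code-block:: nasm

   ; Code object version 1.0.
   .hsa_code_object_version 1,0

   ; Explicit ISA version (illustrative) with the fixed vendor and arch strings.
   .hsa_code_object_isa 7,0,0,"AMD","AMDGPU"
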
.amd_kernel_code_t
^^^^^^^^^^^^^^^^^^

This directive marks the beginning of a list of key / value pairs that are used
to specify the amd_kernel_code_t object that will be emitted by the assembler.
The list must be terminated by the *.end_amd_kernel_code_t* directive.  A
default value is used for any amd_kernel_code_t value that is left
unspecified.  The default value for all keys is 0, with the following
exceptions:

- *kernel_code_version_major* defaults to 1.
- *machine_kind* defaults to 1.
- *machine_version_major*, *machine_version_minor*, and
  *machine_version_stepping* are derived from the value of the -mcpu option
  that is passed to the assembler.
- *kernel_code_entry_byte_offset* defaults to 256.
- *wavefront_size* defaults to 6.
- *kernarg_segment_alignment*, *group_segment_alignment*, and
  *private_segment_alignment* default to 4.  Note that alignments are specified
  as a power of two, so a value of **n** means an alignment of 2^ **n**.

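To make the power-of-two convention concrete, the fragment below is a sketch
that overrides one alignment key and leaves every other key at its default.
The value 5 is an arbitrary example and requests an alignment of 2^5 = 32,
whereas the default of 4 corresponds to 2^4 = 16.  The fragment omits the
function label and the instructions that a complete kernel needs.

.. code-block:: nasm

   .amd_kernel_code_t
      kernarg_segment_alignment = 5
   .end_amd_kernel_code_t
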
The *.amd_kernel_code_t* directive must be placed immediately after the
function label and before any instructions.

For a full list of amd_kernel_code_t keys, see the examples in
test/CodeGen/AMDGPU/hsa.s.  For an explanation of the meanings of the different
keys, see the comments in lib/Target/AMDGPU/AmdKernelCodeT.h.

Here is an example of a minimal amd_kernel_code_t specification:

.. code-block:: none

   .hsa_code_object_version 1,0
   .hsa_code_object_isa

   .hsatext
   .globl  hello_world
   .p2align 8
   .amdgpu_hsa_kernel hello_world

   hello_world:

      .amd_kernel_code_t
         enable_sgpr_kernarg_segment_ptr = 1
         is_ptr64 = 1
         compute_pgm_rsrc1_vgprs = 0
         compute_pgm_rsrc1_sgprs = 0
         compute_pgm_rsrc2_user_sgpr = 2
         kernarg_segment_byte_size = 8
         wavefront_sgpr_count = 2
         workitem_vgpr_count = 3
      .end_amd_kernel_code_t

      s_load_dwordx2 s[0:1], s[0:1] 0x0
      v_mov_b32 v0, 3.14159
      s_waitcnt lgkmcnt(0)
      v_mov_b32 v1, s0
      v_mov_b32 v2, s1
      flat_store_dword v[1:2], v0
      s_endpgm
   .Lfunc_end0:
      .size   hello_world, .Lfunc_end0-hello_world