Commit Graph

245 Commits

Author SHA1 Message Date
Kazu Hirata 20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
Ron Lieberman ca856fff1c Revert "enable code-object-version=5"
very sorry wrong repo.

This reverts commit d882ba7aea.
2022-11-29 15:21:09 -06:00
Ron Lieberman d882ba7aea enable code-object-version=5 2022-11-29 15:11:57 -06:00
Mateja Marjanovic 595a08847a [AMDGPU] Add support for new LLVM vector types
Add VReg, AReg and SReg on AMDGPU for bit widths: 288, 320, 352 and 384.

Differential Revision: https://reviews.llvm.org/D138205
2022-11-29 17:02:04 +01:00
Dmitry Preobrazhensky 9b8eb5fa8e [AMDGPU][MC][GFX11] Correct op_sel handling for permlane*16
Differential Revision: https://reviews.llvm.org/D137969
2022-11-29 18:45:22 +03:00
Dmitry Preobrazhensky 869fc7eabd [AMDGPU][MC][MI100+] Enable VOP3 variants of dot2c/dot4c/dot8c opcodes
Differential Revision: https://reviews.llvm.org/D138494
2022-11-29 17:38:18 +03:00
Kazu Hirata aad2d272bf [Utils] Use std::optional in AMDGPUBaseInfo.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-25 22:43:00 -08:00
Dmitry Preobrazhensky 96155bf44b [AMDGPU][GFX11][NFC] Refactor VOPD operands handling (part 2)
Rename interface functions and operands to make code clearer.

Differential Revision: https://reviews.llvm.org/D138133
2022-11-18 14:15:05 +03:00
Dmitry Preobrazhensky e468b1b740 [AMDGPU][GFX11] Refactor VOPD operands handling
Differential Revision: https://reviews.llvm.org/D137952
2022-11-16 16:29:12 +03:00
Pierre van Houtryve 7425077e31 [AMDGPU] Add & use `hasNamedOperand`, NFC
In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1.
This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually.

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D137540
2022-11-08 07:57:21 +00:00
Joe Nash 01b8140d3a [AMDGPU] Fix delay alu for VOPD with src2acc
V_FMAC_F32 and V_DOT2C_F32_F16 have a dummy src2 operand tied to vdst to
inform passes that the instructions read the dst operand. The VOPD
versions of these instructions lacked the dummy operand, which was a
problem for inserting s_delay_alu.
Introduce the dummy src2 operand on the VOPD versions, and fix the VOPD operand
tracking logic to account for it.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D136629
2022-10-25 13:11:17 -04:00
Dmitry Preobrazhensky fd7b0eeaf6 [AMDGPU][MC][GFX11] Add VOPD VGPR bank access validation
Differential Revision: https://reviews.llvm.org/D134960
2022-10-07 15:52:59 +03:00
Jay Foad ddfa0f62d8 [AMDGPU] Add GFX11 feature for subtargets with more VGPRs
The full complement of physical VGPRs for GFX11 is 50% more than GFX10.
Some subtargets have this, others stay the same as GFX10. This affects
occupancy calculations.

Differential Revision: https://reviews.llvm.org/D134522
2022-09-23 20:18:23 +01:00
Joe Nash b982ba2a6e [AMDGPU][GFX11] Use VGPR_32_Lo128 for VOP1,2,C
Due to the encoding changes in GFX11, we had a hack in place that
    disables the use of VGPRs above 128. This patch removes the need for
    that hack.

    We introduce a new register class VGPR_32_Lo128 which is used for 16-bit
    operands of VOP1, VOP2, and VOPC instructions. This register class only has the
    low 128 VGPRs, but is otherwise identical to VGPR_32. Therefore, 16-bit VOP1,
    VOP2, and VOPC instructions are correctly limited to use the first 128
    VGPRs, while the other instructions can freely use all 256.

    We introduce new pseduo-instructions used on GFX11 which have the suffix
    t16 (True 16) to use the VGPR_32_Lo128 register class.

Reviewed By: foad, rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D133723
2022-09-20 09:56:28 -04:00
Dmitry Preobrazhensky c89e60bf1f [AMDGPU][MC][GFX11] Add VOPD literals validation
Differential Revision: https://reviews.llvm.org/D133864
2022-09-15 16:29:53 +03:00
Dmitry Preobrazhensky a80116efec [AMDGPU][MC][GFX11] Add a helper function for identification of VOPD instructions
Differential Revision: https://reviews.llvm.org/D133608
2022-09-13 12:41:39 +03:00
Kazu Hirata 7094ab4ee7 [llvm] Modernize bool literals (NFC)
Identified with modernize-use-bool-literals.
2022-07-17 18:08:51 -07:00
Joe Nash d1af09ad96 [AMDGPU] gfx11 Generate VOPD Instructions
We form VOPD  instructions in the GCNCreateVOPD pass by combining
back-to-back component instructions. There are strict register
constraints for creating a legal VOPD, namely that the matching operands
(e.g. src0x and src0y, src1x and src1y) must be in different register
banks. We add a PostRA scheduler
mutation to put possible VOPD components back-to-back.

Depends on D128442, D128270

Reviewed By: #amdgpu, rampitec

Differential Revision: https://reviews.llvm.org/D128656
2022-07-05 09:18:19 -04:00
Piotr Sobczak 4874838a63 [AMDGPU] gfx11 WMMA instruction support
gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate)
instructions.

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D128756
2022-06-30 11:13:45 -04:00
Joe Nash 07b7fada73 [AMDGPU] gfx11 VOPD instructions MC support
VOPD is a new encoding for dual-issue instructions for use in wave32.
This patch includes MC layer support only.

A VOPD instruction is constituted of an X component (for which there are
13 possible opcodes) and a Y component (for which there are the 13 X
opcodes plus 3 more). Most of the complexity in defining and parsing
a VOPD operation arises from the possible different total numbers of
operands and deferred parsing of certain operands depending on the
constituent X and Y opcodes.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D128218
2022-06-24 11:08:39 -04:00
Dmitry Preobrazhensky 485e8b4f63 [AMDGPU][MC][GFX11] Correct disassembly of DPP variants of VOPC64 opcodes
Fix bugs https://github.com/llvm/llvm-project/issues/56091, https://github.com/llvm/llvm-project/issues/56065.

Differential Revision: https://reviews.llvm.org/D128075
2022-06-20 14:23:07 +03:00
Jay Foad 7050d5b98c [AMDGPU] Limit GFX11 to using 128 VGPRs
This is a temporary measure to avoid generating incorrect code until the
compiler understands the new way that GFX11 encodes 16-bit operands in
VOP instructions.

Differential Revision: https://reviews.llvm.org/D128054
2022-06-20 07:58:27 +01:00
Stanislav Mekhanoshin cb9ae93712 [AMDGPU] Define SGPR_NULL64 register. NFCI.
On gfx10+ null register can be used as both 32 and 64 bit operand.
Define a 64 bit version of the register to use during codegen.

Differential Revision: https://reviews.llvm.org/D127527
2022-06-13 13:23:33 -07:00
Fangrui Song 95a134254a Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options 2022-06-05 01:07:51 -07:00
Joe Nash 835e09c4c3 [AMDGPU] gfx11 FLAT Instructions
MachineCode Support for FLAT type instructions

Contributors:
Sebastian Neubauer <sebastian.neubauer@amd.com>

Patch 12/N for upstreaming of AMDGPU gfx11 architecture.

Depends on D125989

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D125992
2022-05-25 15:29:39 -04:00
Joe Nash 1a51ab766f [AMDGPU] gfx11 export instructions
Contributors:
Jay Foad <jay.foad@amd.com>
Dmitry Preobrazhensky <d-pre@mail.ru>

Patch 10/N for upstreaming of AMDGPU gfx11 architecture.

Depends on D125822

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D125824
2022-05-25 14:44:09 -04:00
Joe Nash d21b9b4946 [AMDGPU] gfx11 scalar alu instructions
MC layer support for SOP(scalar alu operations) including encoding
support for s_delay_alu and s_sendmsg_rtn.

Contributors:
Jay Foad <jay.foad@amd.com>

Patch 7/N for upstreaming of AMDGPU gfx11 architecture.

Depends on D125319

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D125498
2022-05-17 13:35:41 -04:00
Joe Nash c70259405c [AMDGPU] gfx11 BUF Instructions
Includes MachineCode layer support and tests, and MIR tests not requiring
CodeGen pass changes.
Includes a small change in SMInstructions.td to correct encoded bits.

Contributors:
Petar Avramovic <Petar.Avramovic@amd.com>
Dmitry Preobrazhensky <dmitry.preobrazhensky@amd.com>

Depends on D125316

Patch 6/N for upstreaming of AMDGPU gfx11 architecture.

Reviewed By: dp, Petar.Avramovic

Differential Revision: https://reviews.llvm.org/D125319
2022-05-16 09:41:40 -04:00
Dmitry Preobrazhensky e01dbabdd1 [AMDGPU][MC] Corrected error message "image data size does not match dmask and tfe"
Differential Revision: https://reviews.llvm.org/D123929
2022-04-19 13:52:58 +03:00
Changpeng Fang 8edaf25986 AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally
Summary:
  Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is default.
We use implicitarg_ptr + offset to check whether the multigrid synchronization
pointer is used. If yes, we remove this attribute and also remove
amdgpu-no-implicitarg-ptr. We generate metadata for the hidden_multigrid_sync_arg
only when the amdgpu-no-multigrid-sync-arg attribute is removed from the function.

Reviewers: arsenm, sameerds, b-sumner and foad

Differential Revision: https://reviews.llvm.org/D123548
2022-04-12 12:36:30 -07:00
Dmitry Preobrazhensky 1f6aa90386 [AMDGPU][MC][GFX10] Added syntactic sugar for s_waitcnt_depctr operand
Added the following helpers:

    depctr_hold_cnt(...)
    depctr_sa_sdst(...)
    depctr_va_vdst(...)
    depctr_va_sdst(...)
    depctr_va_ssrc(...)
    depctr_va_vcc(...)
    depctr_vm_vsrc(...)

Differential Revision: https://reviews.llvm.org/D123022
2022-04-07 17:03:44 +03:00
Stanislav Mekhanoshin 64838ba365 [AMDGPU] Use GenericTable to classify DGEMM
Since there is a table introduced for MAI instructions extend it
to use for DGEMM classification.

Differential Revision: https://reviews.llvm.org/D122337
2022-03-24 13:00:37 -07:00
Stanislav Mekhanoshin cad9de71d7 [AMDGPU] gfx940 MAI hazard recognizer
Differential Revision: https://reviews.llvm.org/D122263
2022-03-24 12:59:52 -07:00
Dmitry Preobrazhensky 1d817a1448 [AMDGPU][MC][NFC] Refactored sendmsg(...) handling
Differential Revision: https://reviews.llvm.org/D121995
2022-03-21 15:37:30 +03:00
Changpeng Fang dd5895cc39 AMDGPU: Use the implicit kernargs for code object version 5
Summary:
  Specifically, for trap handling, for targets that do not support getDoorbellID,
we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1].
To get aperture bases when targets do not have aperture registers, we load
private_base or shared_base directly from the implicit kernarg. In clang, we use
implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}.

Reviewers: arsenm, sameerds, yaxunl

Differential Revision: https://reviews.llvm.org/D120265
2022-03-17 14:12:36 -07:00
Dmitry Preobrazhensky 5977dfba64 [AMDGPU][MC][NFC] Refactored custom operands handling
The original design of custom operands support assumed that most GPUs
have the same or very similar operand names end encodings. This is
no longer the case. As a result the support code becomes over-complicated
and difficult to maintain.

This change implements a different design with the following benefits:

- support of aliases;
- support of operands with overlapped encodings;
- identification of defined but unsupported operands.

Differential Revision: https://reviews.llvm.org/D121696
2022-03-16 16:04:55 +03:00
Stanislav Mekhanoshin 8dd3d1cf1f [AMDGPU] Add symbolic names for gfx940 HWREGs
The namespaces of HWREGs is now overlapping with gfx10. Thus the
patch is longer than necessary to just support new names. It also
need to handle proper error messages, i.e. to issue a "specified
hardware register is not supported on this GPU" message.

This may need a major refactoring in the future.

Differential Revision: https://reviews.llvm.org/D121418
2022-03-14 16:13:33 -07:00
Changpeng Fang 0f20a35b9e AMDGPU: Set up User SGPRs for queue_ptr only when necessary
Summary:
  In general, we need queue_ptr for aperture bases and trap handling,
and user SGPRs have to be set up to hold queue_ptr. In current implementation,
user SGPRs are set up unnecessarily for some cases. If the target has aperture
registers, queue_ptr is not needed to reference aperture bases. For trap
handling, if target suppots getDoorbellID, queue_ptr is also not necessary.
Futher, code object version 5 introduces new kernel ABI which passes queue_ptr
as an implicit kernel argument, so user SGPRs are no longer necessary for
queue_ptr. Based on the trap handling document:
https://llvm.org/docs/AMDGPUUsage.html#amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table,
llvm.debugtrap does not need queue_ptr, we remove queue_ptr suport for llvm.debugtrap
in the backend.

Reviewers: sameerds, arsenm

Fixes: SWDEV-307189

Differential Revision: https://reviews.llvm.org/D119762
2022-03-09 10:14:05 -08:00
Stanislav Mekhanoshin 8992b50e2f [AMDGPU] gfx940 uses new names for coherency bits
Differential Revision: https://reviews.llvm.org/D120855
2022-03-07 11:50:07 -08:00
Changpeng Fang ca62b1db9f [AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg
Summary:
  Emit metadata for hidden_heap_v1 kernarg

Reviewers:
  sameerds, b-sumner

Fixes:
  SWDEV-307188

Differential Revision:
  https://reviews.llvm.org/D119027
2022-02-25 10:45:35 -08:00
Jacob Lambert 7470244475 [AMDGPU] Add agpr_count to metadata and AsmParser
gfx90a allows the number of ACC registers (AGPRs) to be set
independently to the VGPR registers. For both HSA and PAL metadata, we
now include an "agpr_count" key to report the number of AGPRs set for
supported devices (gfx90a, gfx908, as determined by hasMAIInsts()).
This is collected from SIProgramInfo.NumAccVGPR for both HSA and PAL.
The AsmParser also now recognizes ".kernel.agpr_count" for supported
devices.

Differential Revision: https://reviews.llvm.org/D116140
2022-02-16 15:17:23 -08:00
Jacob Lambert 0bad7cb565 Hoist getTotalNumVGPRs into AMDGPUBaseInfo for use in both codegen and MC
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D119912
2022-02-16 11:04:08 -08:00
Dmitry Preobrazhensky 6655c5a6bb [AMDGPU][MC][GFX10] Added an alias for HW_REG_HW_ID1
Enabled HW_REG_HW_ID as an alias for HW_REG_HW_ID1. This is required for compatibility with existing code.

Differential Revision: https://reviews.llvm.org/D119939
2022-02-16 19:45:44 +03:00
Jay Foad cb199e0fca [MC] Define and use MCRegisterInfo::regsOverlap
Separate MCRegisterInfo::regsOverlap out from
TargetRegisterInfo::regsOverlap. This is useful in the AMDGPU AsmParser
where we only have access to MCRegisterInfo.

Differential Revision: https://reviews.llvm.org/D119533
2022-02-14 20:46:02 +00:00
Stanislav Mekhanoshin c7eb846345 [AMDGPU] Merge AMDGPULDSUtils into AMDGPUMemoryUtils
Differential Revision: https://reviews.llvm.org/D119502
2022-02-11 10:32:24 -08:00
Sameer Sahasrabuddhe d8f99bb6e0 [AMDGPU] replace hostcall module flag with function attribute
The module flag to indicate use of hostcall is insufficient to catch
all cases where hostcall might be in use by a kernel. This is now
replaced by a function attribute that gets propagated to top-level
kernel functions via their respective call-graph.

If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the
default behaviour is to emit kernel metadata indicating that the
kernel uses the hostcall buffer pointer passed as an implicit
argument.

The attribute may be placed explicitly by the user, or inferred by the
AMDGPU attributor by examining the call-graph. The attribute is
inferred only if the function is not being sanitized, and the
implictarg_ptr does not result in a load of any byte in the hostcall
pointer argument.

Reviewed By: jdoerfert, arsenm, kpyzhov

Differential Revision: https://reviews.llvm.org/D119216
2022-02-11 22:51:56 +05:30
Changpeng Fang 1194b9cdda AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args
Summary:
  Add code object v5 support (deafult is still v4)
  Generate metadata for implicit kernel args for the new ABI
  Set the metadata version to be 1.2

Reviewers:
  t-tye, b-sumner, arsenm, and bcahoon

Fixes:
  SWDEV-307188, SWDEV-307189

Differential Revision:
  https://reviews.llvm.org/D118272
2022-01-31 18:07:47 -08:00
Sebastian Neubauer 80532ebb50 [AMDGPU][InstCombine] Remove zero image offset
Remove the offset parameter if it is zero.

Differential Revision: https://reviews.llvm.org/D117876
2022-01-24 18:06:33 +01:00
Sebastian Neubauer 603d18033c [AMDGPU][InstCombine] Remove zero LOD bias
If the bias is zero, we can remove it from the image instruction.
Also copy other image optimizations (l->lz, mip->nomip) to IR combines.

Differential Revision: https://reviews.llvm.org/D116042
2022-01-21 12:09:07 +01:00
Dmitry Preobrazhensky c7ca4c6365 [AMDGPU][GFX10][MC] Updated symbolic names of internal HW registers
GFX10 no longer support HW_ID. It has been replaced with HW_ID1 and HW_ID2.
See bug 52904: https://github.com/llvm/llvm-project/issues/52904

Differential Revision: https://reviews.llvm.org/D117313
2022-01-17 20:29:10 +03:00