Commit Graph

209 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin 8dd3d1cf1f [AMDGPU] Add symbolic names for gfx940 HWREGs
The namespaces of HWREGs is now overlapping with gfx10. Thus the
patch is longer than necessary to just support new names. It also
need to handle proper error messages, i.e. to issue a "specified
hardware register is not supported on this GPU" message.

This may need a major refactoring in the future.

Differential Revision: https://reviews.llvm.org/D121418
2022-03-14 16:13:33 -07:00
Changpeng Fang 0f20a35b9e AMDGPU: Set up User SGPRs for queue_ptr only when necessary
Summary:
  In general, we need queue_ptr for aperture bases and trap handling,
and user SGPRs have to be set up to hold queue_ptr. In current implementation,
user SGPRs are set up unnecessarily for some cases. If the target has aperture
registers, queue_ptr is not needed to reference aperture bases. For trap
handling, if target suppots getDoorbellID, queue_ptr is also not necessary.
Futher, code object version 5 introduces new kernel ABI which passes queue_ptr
as an implicit kernel argument, so user SGPRs are no longer necessary for
queue_ptr. Based on the trap handling document:
https://llvm.org/docs/AMDGPUUsage.html#amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table,
llvm.debugtrap does not need queue_ptr, we remove queue_ptr suport for llvm.debugtrap
in the backend.

Reviewers: sameerds, arsenm

Fixes: SWDEV-307189

Differential Revision: https://reviews.llvm.org/D119762
2022-03-09 10:14:05 -08:00
Stanislav Mekhanoshin 8992b50e2f [AMDGPU] gfx940 uses new names for coherency bits
Differential Revision: https://reviews.llvm.org/D120855
2022-03-07 11:50:07 -08:00
Changpeng Fang ca62b1db9f [AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg
Summary:
  Emit metadata for hidden_heap_v1 kernarg

Reviewers:
  sameerds, b-sumner

Fixes:
  SWDEV-307188

Differential Revision:
  https://reviews.llvm.org/D119027
2022-02-25 10:45:35 -08:00
Jacob Lambert 7470244475 [AMDGPU] Add agpr_count to metadata and AsmParser
gfx90a allows the number of ACC registers (AGPRs) to be set
independently to the VGPR registers. For both HSA and PAL metadata, we
now include an "agpr_count" key to report the number of AGPRs set for
supported devices (gfx90a, gfx908, as determined by hasMAIInsts()).
This is collected from SIProgramInfo.NumAccVGPR for both HSA and PAL.
The AsmParser also now recognizes ".kernel.agpr_count" for supported
devices.

Differential Revision: https://reviews.llvm.org/D116140
2022-02-16 15:17:23 -08:00
Jacob Lambert 0bad7cb565 Hoist getTotalNumVGPRs into AMDGPUBaseInfo for use in both codegen and MC
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D119912
2022-02-16 11:04:08 -08:00
Dmitry Preobrazhensky 6655c5a6bb [AMDGPU][MC][GFX10] Added an alias for HW_REG_HW_ID1
Enabled HW_REG_HW_ID as an alias for HW_REG_HW_ID1. This is required for compatibility with existing code.

Differential Revision: https://reviews.llvm.org/D119939
2022-02-16 19:45:44 +03:00
Jay Foad cb199e0fca [MC] Define and use MCRegisterInfo::regsOverlap
Separate MCRegisterInfo::regsOverlap out from
TargetRegisterInfo::regsOverlap. This is useful in the AMDGPU AsmParser
where we only have access to MCRegisterInfo.

Differential Revision: https://reviews.llvm.org/D119533
2022-02-14 20:46:02 +00:00
Stanislav Mekhanoshin c7eb846345 [AMDGPU] Merge AMDGPULDSUtils into AMDGPUMemoryUtils
Differential Revision: https://reviews.llvm.org/D119502
2022-02-11 10:32:24 -08:00
Sameer Sahasrabuddhe d8f99bb6e0 [AMDGPU] replace hostcall module flag with function attribute
The module flag to indicate use of hostcall is insufficient to catch
all cases where hostcall might be in use by a kernel. This is now
replaced by a function attribute that gets propagated to top-level
kernel functions via their respective call-graph.

If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the
default behaviour is to emit kernel metadata indicating that the
kernel uses the hostcall buffer pointer passed as an implicit
argument.

The attribute may be placed explicitly by the user, or inferred by the
AMDGPU attributor by examining the call-graph. The attribute is
inferred only if the function is not being sanitized, and the
implictarg_ptr does not result in a load of any byte in the hostcall
pointer argument.

Reviewed By: jdoerfert, arsenm, kpyzhov

Differential Revision: https://reviews.llvm.org/D119216
2022-02-11 22:51:56 +05:30
Changpeng Fang 1194b9cdda AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args
Summary:
  Add code object v5 support (deafult is still v4)
  Generate metadata for implicit kernel args for the new ABI
  Set the metadata version to be 1.2

Reviewers:
  t-tye, b-sumner, arsenm, and bcahoon

Fixes:
  SWDEV-307188, SWDEV-307189

Differential Revision:
  https://reviews.llvm.org/D118272
2022-01-31 18:07:47 -08:00
Sebastian Neubauer 80532ebb50 [AMDGPU][InstCombine] Remove zero image offset
Remove the offset parameter if it is zero.

Differential Revision: https://reviews.llvm.org/D117876
2022-01-24 18:06:33 +01:00
Sebastian Neubauer 603d18033c [AMDGPU][InstCombine] Remove zero LOD bias
If the bias is zero, we can remove it from the image instruction.
Also copy other image optimizations (l->lz, mip->nomip) to IR combines.

Differential Revision: https://reviews.llvm.org/D116042
2022-01-21 12:09:07 +01:00
Dmitry Preobrazhensky c7ca4c6365 [AMDGPU][GFX10][MC] Updated symbolic names of internal HW registers
GFX10 no longer support HW_ID. It has been replaced with HW_ID1 and HW_ID2.
See bug 52904: https://github.com/llvm/llvm-project/issues/52904

Differential Revision: https://reviews.llvm.org/D117313
2022-01-17 20:29:10 +03:00
Kazu Hirata 0542d15211 Remove redundant string initialization (NFC)
Identified with readability-redundant-string-init.
2021-12-26 09:39:26 -08:00
Christudasan Devadasan 654c89d85a [AMDGPU] Make vector superclasses allocatable
The combined vector register classes with both
VGPRs and AGPRs are currently unallocatable.
This patch turns them into allocatable as a
prerequisite to enable copy between VGPR and
AGPR registers during regalloc.

Also, added the missing AV register classes from
192b to 1024b.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D109300
2021-11-26 00:42:12 -05:00
Joe Nash b4b7e605a6 [AMDGPU] Support shared literals in FMAMK/FMAAK
These instructions should allow src0 to be a literal with the same
value as the mandatory other literal. Enable it by introducing an
operand that defers adding its value to the MI when decoding till
the mandatory literal is parsed.

Reviewed By: dp, foad

Differential Revision: https://reviews.llvm.org/D111067

Change-Id: I22b0ae0d35bad17b6f976808e48bffe9a6af70b7
2021-10-11 13:09:54 -04:00
Simon Pilgrim 21661607ca [llvm] Replace report_fatal_error(std::string) uses with report_fatal_error(Twine)
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.
2021-10-06 12:04:30 +01:00
Stanislav Mekhanoshin 082e22f3d7 [AMDGPU] Always reserve flat scratch SGPR for architected flat scratch
With architected flat scratch it becomes readonly. We must always
reserve SGPR pair for it even if we do not use scratch at all since
an attempt to write to SGPRs mapped to FLAT_SCRATCH results in
memory violation.

This is not needed since GFX10 with architected flat scratch though
since special SGPRs are not carving space from normal SGPRs.

Differential Revision: https://reviews.llvm.org/D110376
2021-09-24 09:46:31 -07:00
Arthur Eubanks a0c42ca56c [NFC] Remove AttributeList::hasParamAttribute()
It's the same as AttributeList::hasParamAttr().
2021-08-13 10:58:21 -07:00
Carl Ritson 9dcd75f86f [AMDGPU] Allow frontends to disable null export for pixel shaders
Disable null export (for kills) when a frontend defines a pixel
shader as not exporting using amdgpu-color-export and
amdgpu-depth-export function attrbutes.
This allows the generation of export free pixel shaders.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D105683
2021-07-22 10:20:46 +09:00
Carl Ritson 98f48723f2 [AMDGPU] Add 224-bit vector types and link 192-bit types to MVTs
Add SReg_224, VReg_224, AReg_224, etc.
Link 224-bit types with v7i32/v7f32.
Link existing 192-bit types to newly added v3i64/v3f64/v6i32/v6f32.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D104622
2021-06-24 12:41:22 +09:00
Brendon Cahoon 294efbbd3e Reland "[AMDGPU] Add gfx1013 target"
This reverts commit 211e584fa2.

Fixed a use-after-free error that caused the sanitizers to fail.
2021-06-08 21:15:35 -04:00
Brendon Cahoon 211e584fa2 Revert "[AMDGPU] Add gfx1013 target"
This reverts commit ea10a86984.

A sanitizer buildbot reports an error.
2021-06-08 16:29:41 -04:00
Brendon Cahoon ea10a86984 [AMDGPU] Add gfx1013 target
Differential Revision: https://reviews.llvm.org/D103663
2021-06-08 12:49:49 -04:00
Stanislav Mekhanoshin 6fb02596a2 [AMDGPU] Add support for architected flat scratch
Add support for the readonly flat Scratch register initialized
by the SPI.

Differential Revision: https://reviews.llvm.org/D102432
2021-05-14 10:53:48 -07:00
David Stuttard 72d570ca08 [AMDGPU][AsmParser/Disassembler] Correct A16 and G16 handling
A16 support for image instructions assembly/disassembly (gfx10) was missing

Also refactor MIMG op addr size calcs to common function

We'd got 3 places where the same operation was being done.

One test is now marked XFAIL until a related codegen patch is in place

Differential Revision: https://reviews.llvm.org/D102231

Change-Id: I7e86e730ef8c71901457855cba570581f4f576bb
2021-05-14 09:25:44 +01:00
Austin Kerbow 4433f4601e [AMDGPU] Fix extra waitcnt being added with BUFFER_INVL2
The waitcnt pass would increment the number of vmem events for some buffer
invalidates that were not handled by the pass.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D102252
2021-05-11 13:17:33 -07:00
Konstantin Zhuravlyov 4fae63c612 AMDGPU: Add gfx90c support to code object v2 for backwards compatibility
Differential Revision: https://reviews.llvm.org/D100126
2021-04-08 16:42:43 -04:00
Dmitry Preobrazhensky 0f5ebbcc7f [AMDGPU][MC] Added flag to identify VOP instructions which have a single variant
By convention, VOP1/2/C instructions which can be promoted to VOP3 have _e32 suffix while promoted instructions have _e64 suffix. Instructions which have a single variant should have no _e32/_e64 suffix. Unfortunately there was no simple way to identify single variant instructions - it was implemented by a hack. See bug https://bugs.llvm.org/show_bug.cgi?id=39086.

This fix simplifies handling of single VOP instructions by adding a dedicated flag.

Differential Revision: https://reviews.llvm.org/D99408
2021-04-01 13:53:12 +03:00
Konstantin Zhuravlyov f4ace63737 AMDGPU: Add target id and code object v4 support
- Add target id support (https://clang.llvm.org/docs/ClangOffloadBundler.html#target-id)
  - Add code object v4 support (https://llvm.org/docs/AMDGPUUsage.html#elf-code-object)
    - Add kernarg_size to kernel descriptor
    - Change trap handler ABI to no longer move queue pointer into s[0:1]
  - Cleanup ELF definitions
    - Add V2, V3, V4 suffixes to make a clear distinction for code object version
    - Consolidate note names

Differential Revision: https://reviews.llvm.org/D95638
2021-03-24 11:54:05 -04:00
Matt Arsenault 78b6d73a93 AMDGPU: Add even aligned VGPR/AGPR register classes
gfx90a operations require even aligned registers, but this was
previously achieved by reserving registers inside the full class.

Ideally this would be captured in the static instruction definitions
for the operands, and we would have different instructions per
subtarget. The hackiest part of this is we need to manually reassign
AGPR register classes after instruction selection (we get away without
this for VGPRs since those types are actually registered for legal
types).
2021-02-24 14:49:37 -05:00
Jay Foad 67f0620831 [AMDGPU] Update s_sendmsg messages
Update the list of s_sendmsg messages known to the assembler and
disassembler and validate the ones that were added or removed in gfx9
and gfx10.

Differential Revision: https://reviews.llvm.org/D97295
2021-02-24 13:07:00 +00:00
Stanislav Mekhanoshin a8d9d50762 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
Austin Kerbow 2291bd137d [AMDGPU] Update subtarget features for new target ID support
Support for XNACK and SRAMECC is not static on some GPUs. We must be able
to differentiate between different scenarios for these dynamic subtarget
features.

The possible settings are:

- Unsupported: The GPU has no support for XNACK/SRAMECC.
- Any: Preference is unspecified. Use conservative settings that can run anywhere.
- Off: Request support for XNACK/SRAMECC Off
- On: Request support for XNACK/SRAMECC On

GCNSubtarget will track the four options based on the following criteria. If
the subtarget does not support XNACK/SRAMECC we say the setting is
"Unsupported". If no subtarget features for XNACK/SRAMECC are requested we
must support "Any" mode. If the subtarget features XNACK/SRAMECC exist in the
feature string when initializing the subtarget, the settings are "On/Off".

The defaults are updated to be conservatively correct, meaning if no setting
for XNACK or SRAMECC is explicitly requested, defaults will be used which
generate code that can be run anywhere. This corresponds to the "Any" setting.

Differential Revision: https://reviews.llvm.org/D85882
2021-01-26 11:25:51 -08:00
Dmitry Preobrazhensky 745064e36b [AMDGPU][MC] Refactored exp tgt handling
Summary:
- Separated tgt encoding from parsing;
- Separated tgt decoding from printing;
- Improved errors handling;
- Disabled leading zeroes in index. The following code is no longer accepted: exp pos00 v3, v2, v1, v0

Reviewers: arsenm, rampitec, foad

Differential Revision: https://reviews.llvm.org/D95216
2021-01-26 14:54:15 +03:00
dfukalov 560d7e0411 [NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036
2021-01-20 22:22:45 +03:00
Jay Foad 18cb7441b6 [AMDGPU] Simpler names for arch-specific ttmp registers. NFC.
Rename the *_gfx9_gfx10 ttmp registers to *_gfx9plus for simplicity,
and use the corresponding isGFX9Plus predicate to decide when to use
them instead of the old *_vi versions.

Differential Revision: https://reviews.llvm.org/D94975
2021-01-19 18:47:14 +00:00
dfukalov 6a87e9b08b [NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813
2021-01-07 22:22:05 +03:00
Sebastian Neubauer 91445979be [AMDGPU] Unify flat offset logic
Move getNumFlatOffsetBits from AMDGPUAsmParser and SIInstrInfo into
AMDGPUBaseInfo.

Differential Revision: https://reviews.llvm.org/D93287
2020-12-15 14:59:59 +01:00
Sebastian Neubauer 5733167f54 [AMDGPU] Mark amdgpu_gfx functions as module entry function
- Allows lds allocations
- Writes resource usage into COMPUTE_PGM_RSRC1 registers in PAL metadata

Differential Revision: https://reviews.llvm.org/D92946
2020-12-14 10:43:39 +01:00
Jay Foad 4f87d30a06 [AMDGPU] Introduce and use isGFX10Plus. NFC.
It's more future-proof to use isGFX10Plus from the start, on the
assumption that future architectures will be based on current
architectures.

Also make use of the existing isGFX9Plus in a few places.

Differential Revision: https://reviews.llvm.org/D92092
2020-11-26 09:02:36 +00:00
Sebastian Neubauer a022b1ccd8 [AMDGPU] Add amdgpu_gfx calling convention
Add a calling convention called amdgpu_gfx for real function calls
within graphics shaders. For the moment, this uses the same calling
convention as other calls in amdgpu, with registers excluded for return
address, stack pointer and stack buffer descriptor.

Differential Revision: https://reviews.llvm.org/D88540
2020-11-09 16:51:44 +01:00
Konstantin Zhuravlyov 3fdf3b1539 AMDGPU: Update AMDHSA code object version handling
Differential Revision: https://reviews.llvm.org/D89076
2020-10-14 13:04:27 -04:00
Stanislav Mekhanoshin acce6b6082 [AMDGPU] Create isGFX9Plus utility function
Introduce a utility function to make it more
convenient to write code that is the same on
the GFX9 and GFX10 subtargets.

Use isGFX9Plus in the AsmParser for AMDGPU.

Authored By: Joe_Nash

Differential Revision: https://reviews.llvm.org/D88908
2020-10-06 10:18:43 -07:00
dfukalov c259d3a061 [AMDGPU] Fix for folding v2.16 literals.
It was found some packed immediate operands (e.g. `<half 1.0, half 2.0>`) are
incorrectly processed so one of two packed values were lost.

Introduced new function to check immediate 32-bit operand can be folded.
Converted condition about current op_sel flags value to fall-through.

Fixes: SWDEV-247595

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D87158
2020-09-10 01:39:25 +03:00
Sebastian Neubauer 9aa0ff77bd [AMDGPU] Enable .rodata for amdpal os
PAL recently got support for multiple ELF sections and relocations,
therefore we can now use .rodata sections instead of forcing constants
into .text.

Differential Revision: https://reviews.llvm.org/D85895
2020-08-14 09:05:48 +02:00
Dmitry Preobrazhensky 6b8948922c [AMDGPU][MC] Added support of SP3 syntax for MTBUF format modifier
Currently supported LLVM MTBUF syntax is shown below. It is not compatible with SP3.

    op     dst, addr, rsrc, FORMAT, soffset

This change adds support for SP3 syntax:

    op     dst, addr, rsrc, soffset SP3FORMAT

In addition to being compatible with SP3, this syntax allows using symbolic names for data, numeric and unified formats. Below is a list of added syntax variants.

format:<expression>
format:[<numeric-format-name>,<data-format-name>]
format:[<data-format-name>,<numeric-format-name>]
format:[<data-format-name>]
format:[<numeric-format-name>]
format:[<unified-format-name>]

The last syntax variant is supported for GFX10 only.

See llvm bug 37738

Reviewers: arsenm, rampitec, vpykhtin

Differential Revision: https://reviews.llvm.org/D84026
2020-07-24 16:41:03 +03:00
Dmitry Preobrazhensky e122eba185 [AMDGPU][MC] Corrected MTBUF parsing and decoding
MTBUF implementation has many issues and this change addresses most of these:
- refactored duplicated code;
- hardcoded constants moved out of high-level code;
- fixed a decoding error when nfmt or dfmt are zero (bug 36932);
- corrected parsing of operand separators (bug 46403);
- corrected handling of missing operands (bug 46404);
- corrected handling of out-of-range modifiers (bug 46421);
- corrected default value (bug 46467).

Reviewers: arsenm, rampitec, vpykhtin, artem.tamazov, kzhuravl

Differential Revision: https://reviews.llvm.org/D83760
2020-07-15 19:46:00 +03:00
Guillaume Chatelet 52911428ef [Alignment][NFC] Migrate AMDGPU backend to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82743
2020-06-29 11:56:06 +00:00