Commit Graph

119 Commits

Author SHA1 Message Date
Petar Avramovic 6db7921b65 AMDGPU: Use tablegen patterns for buffer global and flat atomic fadd
Remove manual selection for atomic fadd from global-isel.
Stop pre-isel translation to AtomicLoadFAdd/G_ATOMICRMW_FADD
which corresponds to llvm-ir's atomicrmw fadd instruction.

global and flat atomic fadd patterns changes:
Split rtn/no-rtn patterns
Add missing patterns or fix predicates
Remove atomicrmw patterns for v2f16 (atomic rmw doesn't support vectors).
Patterns now check addrspace of pointer, added patterns for flat intrinsic.
with global addrspace pointer that selects into global atomic instruction.

buffer atomic fadd patterns changes:
Rdit patterns to import into global-isel.
Remove gfx6/gfx7 _addr64 and _offset patterns.
Remove patterns that can't be reached (same pattern but different feature).

Differential Revision: https://reviews.llvm.org/D130579
2022-09-23 17:52:10 +02:00
Petar Avramovic e03d36d4ae [AMDGPU] Add FeatureFlatAtomicFaddF32Inst
Feature used by targets that have flat_atomic_add_f32 instruction
(gfx940 and gfx11). Remove isGFX940GFX11Plus.
Add hasFlatAtomicFaddF32Inst Subtarget check for codegen.

Differential Revision: https://reviews.llvm.org/D134532
2022-09-23 17:52:10 +02:00
Abinav Puthan Purayil 7504c7a877 [AMDGPU] Use AddedComplexity for ret and noret atomic ops selection
This patch removes the predicate for return atomic ops and uses
AddedComplexity to distinguish its selection from its no return variant.
This will produce better matchers that doesn't unnecessarily check for
the negated predicate if the initial predicate failed. Also, it
simplifies the enabling of no return atomic ops selection in GlobalISel.

Differential Revision: https://reviews.llvm.org/D128241
2022-07-08 09:47:33 +05:30
Joe Nash 835e09c4c3 [AMDGPU] gfx11 FLAT Instructions
MachineCode Support for FLAT type instructions

Contributors:
Sebastian Neubauer <sebastian.neubauer@amd.com>

Patch 12/N for upstreaming of AMDGPU gfx11 architecture.

Depends on D125989

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D125992
2022-05-25 15:29:39 -04:00
Stanislav Mekhanoshin a09af86693 [AMDGPU] Enable FLAT LDS DMA on gfx9/10 before gfx940
We always had global and scratch loads to LDS in the gfx9,
but did not handle it. These were available via the 'lds'
encoding bit. In gfx940 this bit was reused as 'svs' which
resulted in new '_lds' opcodes effectively pushing this
bit into the opcode, but functionally it is the same. These
instructions are also available on gfx10.

Differential Revision: https://reviews.llvm.org/D125126
2022-05-17 12:16:37 -07:00
Joe Nash 7e71a03966 [AMDGPU] Split FeatureAtomicFaddInsts
FeatureAtomicFaddInsts is replaced with three more granular features.
Contributors:
Petar Avramovic <Petar.Avramovic@amd.com>

Patch 3/N for upstreaming of AMDGPU gfx11 architecture

Depends on D124537

Reviewed By: foad, #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D124538
2022-05-05 13:27:45 -04:00
Stanislav Mekhanoshin a9ccc7bc54 [AMDGPU] Properly mark MUBUF and FLAT LDS DMA instructions. NFC.
Add these bits to the MUBUF and FLAT LDS DMA instructions:

- LGKM_CNT - these operate on LDS;
- VALU - SPG 3.9.8: This instruction acts as both a MUBUF and
VALU instruction;

Codegen currently does not produce any of this, so the change is NFC.

Differential Revision: https://reviews.llvm.org/D124472
2022-04-26 14:20:26 -07:00
Matt Arsenault 0ecbb683a2 TableGen/GlobalISel: Make address space/align predicates consistent
The builtin predicate handling has a strange behavior where the code
assumes that a PatFrag is a stack of PatFrags, and each level adds at
most one predicate. I don't think this particularly makes sense,
especially without a diagnostic to ensure you aren't trying to set
multiple at once.

This wasn't followed for address spaces and alignment, which could
potentially fall through to report no builtin predicate was
added. Just switch these to follow the existing convention for now.
2022-04-22 15:48:07 -04:00
Abinav Puthan Purayil 272a876804 [AMDGPU] Rename the FlatSignedIntrPat multiclass to FlatSignedAtomicIntrPat. NFC 2022-04-22 11:47:23 +05:30
Abinav Puthan Purayil 165ae7276c [AMDGPU] Remove atomic pattern args in FLAT_[Global_]Atomic_Pseudo defs
We already have explicit patterns for these.

Differential Revision: https://reviews.llvm.org/D124084
2022-04-22 09:37:40 +05:30
Abinav Puthan Purayil 45ca94334e [AMDGPU] Select no-return atomic intrinsics in tblgen
This is to avoid relying on the post-isel hook.

This change also enable the saddr pattern selection for atomic
intrinsics in GlobalISel.

Differential Revision: https://reviews.llvm.org/D123583
2022-04-22 09:37:40 +05:30
Jay Foad f707e1255e [AMDGPU] Select d16 stores even when sramecc is enabled
The sramecc feature changes the behaviour of d16 loads so they do not
preserve the unused 16 bits of the result register, but it has no impact
on d16 stores, so we should make use of them even when the feature is
enabled.

Differential Revision: https://reviews.llvm.org/D104912
2022-04-19 09:34:32 +01:00
Matt Arsenault df29ec2f54 AMDGPU: Select i8/i16 global and flat atomic load/store
As far as I know these should be atomic anyway, as long as the address
is aligned. Unaligned atomics hit an ugly error in AtomicExpand.
2022-04-14 20:52:05 -04:00
Jay Foad fb8d23b8e7 [AMDGPU] Define new feature HasFlatScratchSVSMode. NFC.
This is by analogy with HasFlatScratchSTMode and is slightly more
informative than using isGFX940Plus.

Differential Revision: https://reviews.llvm.org/D121804
2022-03-16 19:54:02 +00:00
Stanislav Mekhanoshin 23499103f7 [AMDGPU] Support for gfx940 flat lds opcodes
Differential Revision: https://reviews.llvm.org/D121414
2022-03-14 15:46:19 -07:00
Stanislav Mekhanoshin 36fe3f13a9 [AMDGPU] flat scratch SVS addressing mode for gfx940
Both VADDR and SADDR are used in SVS mode.

Differential Revision: https://reviews.llvm.org/D121254
2022-03-14 15:23:36 -07:00
Jay Foad c7218164c4 [AMDGPU] Remove HasAtomicFaddInstsGFX90X and HasAtomicFaddInstsGFX940
These compound predicates are not required, since we can use a
combination of setting the SubtargetPredicate (to a subtarget
predicate like isGFX940Plus) and OtherPredicates (to a list of feature
predicates like HasAtomicFaddInsts) instead. NFC.

Differential Revision: https://reviews.llvm.org/D121289
2022-03-09 18:02:21 +00:00
Stanislav Mekhanoshin 932f628121 [AMDGPU] new gfx940 fp atomics
Differential Revision: https://reviews.llvm.org/D121028
2022-03-07 12:32:02 -08:00
Abinav Puthan Purayil 29bd3fadbc [AMDGPU] Select no-return atomic ops in FLATInstructions.td.
This change adds the selection for the no-return global_* and flat_*
instructions in tblgen.  The motivation for this is to get the no-return atomic
isel working without relying on post-isel hooks so that GlobalISel can start
selecting them (once GlobalISelEmitter allows no return atomic patterns like
how DAGISel does).

Differential Revision: https://reviews.llvm.org/D119227
2022-02-10 09:26:37 +05:30
Matt Arsenault 8ff3c9e0be AMDGPU/GlobalISel: Fix selection of gfx90a FP atomics
The struct/raw forms for the buffer atomics now work as
expected. However, we're incorrectly handling the legacy form (which
we probably shouldn't handle at all). We also are not diagnosing the
use of the return value on gfx908. These will be addressed separately.
2022-01-20 12:12:06 -05:00
Jessica Clarke 3ee56eed2f [AMDGPU][NFC] Alter ComplexPattern types to be consistent with their uses
When used as a non-leaf node, TableGen does not currently use the type
of a ComplexPattern for type inference, which also means it does not
check it doesn't conflict with the use. This differs from when used as a
leaf value, where the type is used for inference. Fixing that
discrepancy is something I intend to upstream as a subsequent review.

AMDGPU currently has several ComplexPatterns that are used in contexts
where they're expected to be an iPTR, and where using an iPTR instead of
a fixed-width integer type matters. With my locally-patched TableGen,
none of these mismatches result in type contradictions, but do change
the patterns and cause various failures to select. These changes to the
ComplexPatterns' types reflect how they are actually used, result in
bit-for-bit identical TableGen output (without my local TableGen patch),
and ensure that with improved type inference AMDGPU's backend will
continue to work.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D109032
2021-12-03 07:04:59 +00:00
Dmitry Preobrazhensky 91f4650ebb [AMDGPU][MC][GFX10] Corrected global_atomic_fcmpswap*
Corrected src data size of global_atomic_fcmpswap and global_atomic_fcmpswap_x2 opcodes.

Differential Revision: https://reviews.llvm.org/D113746
2021-11-15 12:51:12 +03:00
Dmitry Preobrazhensky b8e7f53208 [AMDGPU][MC][GFX10] Enabled dlc for FLAT and GLOBAL atomics
Differential Revision: https://reviews.llvm.org/D109614
2021-09-21 16:23:20 +03:00
Jay Foad 128a49727a [AMDGPU] Fix upcoming TableGen warnings on unused template arguments. NFC.
The warning is implemented by D109359 which is still in review.

Differential Revision: https://reviews.llvm.org/D109826
2021-09-16 09:07:18 +01:00
Joe Nash e381833ba5 [AMDGPU] Support global_atomic_fmin/max on gfx10
Makes patterns added for gfx90a usable with the gfx10 versions of the
insts.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108654

Change-Id: I86167bf6b4823f975f74ccb619bd6190331ba16b
2021-08-25 09:35:10 -04:00
Jay Foad 24ffc343f9 [AMDGPU] Set IsAtomicRet and IsAtomicNoRet on Real instructions
This does not affect codegen but might benefit llvm-mca.
2021-06-16 12:23:29 +01:00
Jay Foad 323b3e645d [AMDGPU] Set mayLoad and mayStore on Real instructions
This does not affect codegen but might benefit llvm-mca.
2021-06-16 12:10:23 +01:00
Jay Foad 6f778fed8e [AMDGPU] Set more flags on Real instructions
This does not affect codegen, which only tests these flags on Pseudo
instructions, but might help llvm-mca which has to work with Real
instructions. In particular setting LGKM_CNT on DS instructions helps
with the problem identified in D104149.

Differential Revision: https://reviews.llvm.org/D104293
2021-06-16 09:58:50 +01:00
Matt Arsenault ed633a1daa AMDGPU: Restore atomic fp feature on FP atomic instruction definitions
9931b1f7a4 switched this to checking for
the two specific subtargets, instead of the dedicated feature. This
broke supporting functions which force added the feature when emitting
targets that do not actually support them. This stil does not work for
the targets that use the gfx6/7 or gfx10 encodings.
2021-04-22 21:32:01 -04:00
Sebastian Neubauer cc7add5298 [AMDGPU] Use SIInstrFlags for flat variants. NFC
Use SIInstrFlags to differentiate between the different
variants of flat instructions (flat, global and scratch).
This should make it easier to bundle the immediate offset logic in a
single place and implement restrictions and bug workarounds.

Fixed version of D99587, which does not rely on the address space.

Differential Revision: https://reviews.llvm.org/D99743
2021-04-09 12:28:36 +02:00
Sebastian Neubauer 36138db116 [AMDGPU] IsFlatScratch/Global -> FlatScratch/Global
Remove 'Is' from IsFlatScratch/Global. NFC

Differential Revision: https://reviews.llvm.org/D100108
2021-04-09 11:20:31 +02:00
Jay Foad fc7e3e7dd9 [AMDGPU] Set SchedRW on real instructions
Coyp SchedRW from pseudos to real instructions so that llvm-mca has
access to it. This is NFC for normal compiler codegen, which schedules
pseudos not real instructions.

Add an llvm-mca test for some high latency double-precision instructions
as a smoke test.

Differential Revision: https://reviews.llvm.org/D99187
2021-03-23 15:38:11 +00:00
Stanislav Mekhanoshin 3bffb1cd0e [AMDGPU] Use single cache policy operand
Replace individual operands GLC, SLC, and DLC with a single cache_policy
bitmask operand. This will reduce the number of operands in MIR and I hope
the amount of code. These operands are mostly 0 anyway.

Additional advantage that parser will accept these flags in any order unlike
now.

Differential Revision: https://reviews.llvm.org/D96469
2021-03-15 13:00:59 -07:00
Stanislav Mekhanoshin 9931b1f7a4 [AMDGPU] Disable SCC bit on fp atomics
Differential Revision: https://reviews.llvm.org/D98221
2021-03-10 12:36:09 -08:00
Stanislav Mekhanoshin a8d9d50762 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
Stanislav Mekhanoshin 5cf9292ce3 [AMDGPU] Add two TSFlags: IsAtomicNoRtn and IsAtomicRtn
We are using AtomicNoRet map in multiple places to determine
if an instruction atomic, rtn or nortn atomic. This method
does not work always since we have some instructions which
only has rtn or nortn version.

One such instruction is ds_wrxchg_rtn_b32 which does not have
nortn version. This has caused changes in memory legalizer
tests.

Differential Revision: https://reviews.llvm.org/D96639
2021-02-15 11:27:59 -08:00
Sebastian Neubauer 409a2f0f9e [AMDGPU] Allow no saddr for global addtid insts
I think the global_load/store_dword_addtid instructions support
switching off the scalar address.
Add assembler and disassembler support for this.

Differential Revision: https://reviews.llvm.org/D93288
2020-12-16 10:01:40 +01:00
Stanislav Mekhanoshin f738aee0bb [AMDGPU] Add default 1 glc operand to rtn atomics
This change adds a real glc operand to the return atomic
instead of just string " glc" in the middle of the asm
string.

Improves asm parser diagnostics.

Differential Revision: https://reviews.llvm.org/D90730
2020-11-05 10:41:59 -08:00
Stanislav Mekhanoshin c9d6fe6f7d [AMDGPU] Improve FLAT scratch detection
We were useing too broad check for isFLATScratch() which also
includes FLAT global.

Differential Revision: https://reviews.llvm.org/D90505
2020-11-02 11:37:33 -08:00
Stanislav Mekhanoshin 038d884a50 [AMDGPU] Use flat scratch instructions where available
The support is disabled by default. So far there is instruction
selection, spilling, and frame elimination. It also changes SP
from unswizzled to swizzled as used by flat scratch instructions,
so it cannot be mixed with MUBUF stack access.

At the very least missing:

- GlobalISel;
- Some optimizations in frame elimination in between vector
  and scalar ALU;
- It shall finally allow to always materialize frame index
  as an SGPR, but that is not implemented and frame elimination
  cannot handle it yet;
- Unaligned and/or multidword flat scratch shall work, but it
  is legalized now for MUBUF;
- Operand folding cannot optimize FI like with MUBUF yet;
- It will need scaling the value of the SP/FP in the DWARF
  expression to recover the unswizzled scratch address;

Differential Revision: https://reviews.llvm.org/D89170
2020-10-26 14:40:42 -07:00
Jay Foad f6a5699c6c [AMDGPU][TableGen] Make more use of !ne !not !and !or. NFC. 2020-10-21 09:56:43 +01:00
Stanislav Mekhanoshin 6ddadf9901 [AMDGPU] flat scratch ST addressing mode on gfx10
GFX10 enables third addressing mode for flat scratch instructions,
an ST mode. In that mode both register operands are omitted and
only swizzled offset is used in addition to flat_scratch base.

Differential Revision: https://reviews.llvm.org/D89501
2020-10-19 15:29:52 -07:00
Stanislav Mekhanoshin 45014ce36f [AMDGPU] Add tied operand to d16 scratch loads
This is still no-op because there is no selection for these
opcodes.

Differential Revision: https://reviews.llvm.org/D88927
2020-10-07 11:13:01 -07:00
Stanislav Mekhanoshin 7361ce73ef [AMDGPU] Use default zero flag operands in flat scratch
This is no-op so far because we do not select these yet.

Differential Revision: https://reviews.llvm.org/D88920
2020-10-07 10:56:47 -07:00
Stanislav Mekhanoshin 277de43d88 [AMDGPU] Unify intrinsic ret/nortn interface
We have a single noret intrinsic an a lot of special handling
around it. Declare it just as any other but do not define rtn
instructions itself instead.

Differential Revision: https://reviews.llvm.org/D87719
2020-09-15 15:26:42 -07:00
Matt Arsenault e1a2f4713c AMDGPU: Match global saddr addressing mode
The previous implementation was incorrect, and based off incorrect
instruction definitions. Unfortunately we can't match natural
addressing in a lot of cases due to the shift/scale applied in
getelementptrs. This relies on reducing the 64-bit shift to 32-bits.
2020-08-17 15:28:14 -04:00
Matt Arsenault e0375dbcb3 AMDGPU: Fix using wrong offsets for global atomic fadd intrinsics
Global instructions have the signed offsets.
2020-08-17 09:19:15 -04:00
Matt Arsenault f0af434b79 AMDGPU: Remove register class params from flat memory patterns 2020-08-15 12:12:33 -04:00
Matt Arsenault a7455652c0 AMDGPU: Fix global atomic saddr operand class 2020-08-15 12:12:28 -04:00
Matt Arsenault 625db2fe5b AMDGPU: Remove slc from flat offset complex patterns
This was always set to 0. Use a default value of 0 in this context to
satisfy the instruction definition patterns. We can't unconditionally
use SLC with a default value of 0 due to limitations in TableGen's
handling of defaulted operands when followed by non-default operands.
2020-08-15 12:12:24 -04:00