Remove manual selection for atomic fadd from global-isel.
Stop pre-isel translation to AtomicLoadFAdd/G_ATOMICRMW_FADD
which corresponds to llvm-ir's atomicrmw fadd instruction.
global and flat atomic fadd patterns changes:
Split rtn/no-rtn patterns
Add missing patterns or fix predicates
Remove atomicrmw patterns for v2f16 (atomic rmw doesn't support vectors).
Patterns now check addrspace of pointer, added patterns for flat intrinsic.
with global addrspace pointer that selects into global atomic instruction.
buffer atomic fadd patterns changes:
Rdit patterns to import into global-isel.
Remove gfx6/gfx7 _addr64 and _offset patterns.
Remove patterns that can't be reached (same pattern but different feature).
Differential Revision: https://reviews.llvm.org/D130579
Feature used by targets that have flat_atomic_add_f32 instruction
(gfx940 and gfx11). Remove isGFX940GFX11Plus.
Add hasFlatAtomicFaddF32Inst Subtarget check for codegen.
Differential Revision: https://reviews.llvm.org/D134532
This patch removes the predicate for return atomic ops and uses
AddedComplexity to distinguish its selection from its no return variant.
This will produce better matchers that doesn't unnecessarily check for
the negated predicate if the initial predicate failed. Also, it
simplifies the enabling of no return atomic ops selection in GlobalISel.
Differential Revision: https://reviews.llvm.org/D128241
MachineCode Support for FLAT type instructions
Contributors:
Sebastian Neubauer <sebastian.neubauer@amd.com>
Patch 12/N for upstreaming of AMDGPU gfx11 architecture.
Depends on D125989
Reviewed By: rampitec, #amdgpu
Differential Revision: https://reviews.llvm.org/D125992
We always had global and scratch loads to LDS in the gfx9,
but did not handle it. These were available via the 'lds'
encoding bit. In gfx940 this bit was reused as 'svs' which
resulted in new '_lds' opcodes effectively pushing this
bit into the opcode, but functionally it is the same. These
instructions are also available on gfx10.
Differential Revision: https://reviews.llvm.org/D125126
FeatureAtomicFaddInsts is replaced with three more granular features.
Contributors:
Petar Avramovic <Petar.Avramovic@amd.com>
Patch 3/N for upstreaming of AMDGPU gfx11 architecture
Depends on D124537
Reviewed By: foad, #amdgpu, arsenm
Differential Revision: https://reviews.llvm.org/D124538
Add these bits to the MUBUF and FLAT LDS DMA instructions:
- LGKM_CNT - these operate on LDS;
- VALU - SPG 3.9.8: This instruction acts as both a MUBUF and
VALU instruction;
Codegen currently does not produce any of this, so the change is NFC.
Differential Revision: https://reviews.llvm.org/D124472
The builtin predicate handling has a strange behavior where the code
assumes that a PatFrag is a stack of PatFrags, and each level adds at
most one predicate. I don't think this particularly makes sense,
especially without a diagnostic to ensure you aren't trying to set
multiple at once.
This wasn't followed for address spaces and alignment, which could
potentially fall through to report no builtin predicate was
added. Just switch these to follow the existing convention for now.
This is to avoid relying on the post-isel hook.
This change also enable the saddr pattern selection for atomic
intrinsics in GlobalISel.
Differential Revision: https://reviews.llvm.org/D123583
The sramecc feature changes the behaviour of d16 loads so they do not
preserve the unused 16 bits of the result register, but it has no impact
on d16 stores, so we should make use of them even when the feature is
enabled.
Differential Revision: https://reviews.llvm.org/D104912
This is by analogy with HasFlatScratchSTMode and is slightly more
informative than using isGFX940Plus.
Differential Revision: https://reviews.llvm.org/D121804
These compound predicates are not required, since we can use a
combination of setting the SubtargetPredicate (to a subtarget
predicate like isGFX940Plus) and OtherPredicates (to a list of feature
predicates like HasAtomicFaddInsts) instead. NFC.
Differential Revision: https://reviews.llvm.org/D121289
This change adds the selection for the no-return global_* and flat_*
instructions in tblgen. The motivation for this is to get the no-return atomic
isel working without relying on post-isel hooks so that GlobalISel can start
selecting them (once GlobalISelEmitter allows no return atomic patterns like
how DAGISel does).
Differential Revision: https://reviews.llvm.org/D119227
The struct/raw forms for the buffer atomics now work as
expected. However, we're incorrectly handling the legacy form (which
we probably shouldn't handle at all). We also are not diagnosing the
use of the return value on gfx908. These will be addressed separately.
When used as a non-leaf node, TableGen does not currently use the type
of a ComplexPattern for type inference, which also means it does not
check it doesn't conflict with the use. This differs from when used as a
leaf value, where the type is used for inference. Fixing that
discrepancy is something I intend to upstream as a subsequent review.
AMDGPU currently has several ComplexPatterns that are used in contexts
where they're expected to be an iPTR, and where using an iPTR instead of
a fixed-width integer type matters. With my locally-patched TableGen,
none of these mismatches result in type contradictions, but do change
the patterns and cause various failures to select. These changes to the
ComplexPatterns' types reflect how they are actually used, result in
bit-for-bit identical TableGen output (without my local TableGen patch),
and ensure that with improved type inference AMDGPU's backend will
continue to work.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D109032
Makes patterns added for gfx90a usable with the gfx10 versions of the
insts.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D108654
Change-Id: I86167bf6b4823f975f74ccb619bd6190331ba16b
This does not affect codegen, which only tests these flags on Pseudo
instructions, but might help llvm-mca which has to work with Real
instructions. In particular setting LGKM_CNT on DS instructions helps
with the problem identified in D104149.
Differential Revision: https://reviews.llvm.org/D104293
9931b1f7a4 switched this to checking for
the two specific subtargets, instead of the dedicated feature. This
broke supporting functions which force added the feature when emitting
targets that do not actually support them. This stil does not work for
the targets that use the gfx6/7 or gfx10 encodings.
Use SIInstrFlags to differentiate between the different
variants of flat instructions (flat, global and scratch).
This should make it easier to bundle the immediate offset logic in a
single place and implement restrictions and bug workarounds.
Fixed version of D99587, which does not rely on the address space.
Differential Revision: https://reviews.llvm.org/D99743
Coyp SchedRW from pseudos to real instructions so that llvm-mca has
access to it. This is NFC for normal compiler codegen, which schedules
pseudos not real instructions.
Add an llvm-mca test for some high latency double-precision instructions
as a smoke test.
Differential Revision: https://reviews.llvm.org/D99187
Replace individual operands GLC, SLC, and DLC with a single cache_policy
bitmask operand. This will reduce the number of operands in MIR and I hope
the amount of code. These operands are mostly 0 anyway.
Additional advantage that parser will accept these flags in any order unlike
now.
Differential Revision: https://reviews.llvm.org/D96469
We are using AtomicNoRet map in multiple places to determine
if an instruction atomic, rtn or nortn atomic. This method
does not work always since we have some instructions which
only has rtn or nortn version.
One such instruction is ds_wrxchg_rtn_b32 which does not have
nortn version. This has caused changes in memory legalizer
tests.
Differential Revision: https://reviews.llvm.org/D96639
I think the global_load/store_dword_addtid instructions support
switching off the scalar address.
Add assembler and disassembler support for this.
Differential Revision: https://reviews.llvm.org/D93288
This change adds a real glc operand to the return atomic
instead of just string " glc" in the middle of the asm
string.
Improves asm parser diagnostics.
Differential Revision: https://reviews.llvm.org/D90730
The support is disabled by default. So far there is instruction
selection, spilling, and frame elimination. It also changes SP
from unswizzled to swizzled as used by flat scratch instructions,
so it cannot be mixed with MUBUF stack access.
At the very least missing:
- GlobalISel;
- Some optimizations in frame elimination in between vector
and scalar ALU;
- It shall finally allow to always materialize frame index
as an SGPR, but that is not implemented and frame elimination
cannot handle it yet;
- Unaligned and/or multidword flat scratch shall work, but it
is legalized now for MUBUF;
- Operand folding cannot optimize FI like with MUBUF yet;
- It will need scaling the value of the SP/FP in the DWARF
expression to recover the unswizzled scratch address;
Differential Revision: https://reviews.llvm.org/D89170
GFX10 enables third addressing mode for flat scratch instructions,
an ST mode. In that mode both register operands are omitted and
only swizzled offset is used in addition to flat_scratch base.
Differential Revision: https://reviews.llvm.org/D89501
We have a single noret intrinsic an a lot of special handling
around it. Declare it just as any other but do not define rtn
instructions itself instead.
Differential Revision: https://reviews.llvm.org/D87719
The previous implementation was incorrect, and based off incorrect
instruction definitions. Unfortunately we can't match natural
addressing in a lot of cases due to the shift/scale applied in
getelementptrs. This relies on reducing the 64-bit shift to 32-bits.
This was always set to 0. Use a default value of 0 in this context to
satisfy the instruction definition patterns. We can't unconditionally
use SLC with a default value of 0 due to limitations in TableGen's
handling of defaulted operands when followed by non-default operands.