Commit Graph

1097 Commits

Author SHA1 Message Date
Jay Foad 3d9e226081 [AMDGPU] Use s_cmp instead of s_cmpk
Don't bother pre-shrinking "s_cmp_lg_u32 reg, 0" to s_cmpk_lg_u32
because 0 is already an inline constant so the s_cmpk form is no
smaller.

This is just for consistency with the surrounding code and to simplify a
downstream patch.

Differential Revision: https://reviews.llvm.org/D138993
2022-11-30 18:02:39 +00:00
Pierre van Houtryve a88deb4b65 [AMDGPU] Use aperture registers instead of S_GETREG
Fixes a longstanding TODO in the codebase where we were using S_GETREG + shift to do something that could simply be done with an inline constant (register).

Patch based on D31874 by @kzhuravl
Depends on D137767

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D137542
2022-11-30 12:25:10 +00:00
Nicolai Hähnle 4e79764f0a AMDGPU: Fixup tests 2022-11-30 12:37:40 +01:00
Nicolai Hähnle b7f44f7cf9 AMDGPU: Remove ImagePSV and move images to addrspace 7
Following up on the removal of BufferPSV in commit 43b86bf992 ("AMDGPU:
Remove BufferPseudoSourceValue")

It is unclear what exactly the right address space for images should be.
They seem morally closest to buffers, so that's what I went with. In
practical terms, address space 7 is better than address space 0 because
it can't alias with LDS.

Differential Revision: https://reviews.llvm.org/D138949
2022-11-30 11:32:34 +01:00
Matt Arsenault c08d55623d AMDGPU: Fix creating illegal f16 fp_class
We were missing legality checks. The device library build was broken
for targets without f16 support. Technically the first pattern isn't
tested by this patch; it only triggers with the isBeforeLegalize check
in performAndCombine removed. I'm not sure how to trick this into
appearing post-legalization.
2022-11-29 18:24:30 -05:00
Benjamin Kramer 3a86931f56 AMDGPU: Remove unused variables. NFC 2022-11-29 22:57:00 +01:00
Nicolai Hähnle 43b86bf992 AMDGPU: Remove BufferPseudoSourceValue
The use of a PSV for buffer intrinsics is misleading because it may be
misinterpreted as all buffer intrinsics accessing the same address in
memory, which is clearly not true.

Instead, build MachineMemOperands without a pointer value but with an
address space, so that address space-based alias analysis can still
work.

There is a lot of test churn because previously address space 4
(constant address space) was used as an address space for buffer
intrinsics. This doesn't make much sense and seems to have been an
accident -- see the change in
AMDGPUTargetMachine::getAddressSpaceForPseudoSourceKind.

Differential Revision: https://reviews.llvm.org/D138711
2022-11-29 22:15:11 +01:00
Jay Foad 0daa3df3a5 [AMDGPU] Use GCNSubtarget::hasInstPrefetch instead of generation check. NFC. 2022-11-29 17:36:33 +00:00
Mateja Marjanovic 595a08847a [AMDGPU] Add support for new LLVM vector types
Add VReg, AReg and SReg on AMDGPU for bit widths: 288, 320, 352 and 384.

Differential Revision: https://reviews.llvm.org/D138205
2022-11-29 17:02:04 +01:00
Stanislav Mekhanoshin 28eb9ed3bb [AMDGPU] Fine tune LDS misaligned access speed
Differential Revision: https://reviews.llvm.org/D124219
2022-11-28 16:12:02 -08:00
Janek van Oirschot 322966f8f8 [AMDGPU] Add llvm.is.fpclass intrinsic to existing SelectionDAG fp
class support and introduce GlobalISel implementation for AMDGPU

Uses existing SelectionDAG lowering of the llvm.amdgcn.class intrinsic
for llvm.is.fpclass
2022-11-28 16:00:36 -05:00
Ivan Kosarev ec8ede8177 [AMDGPU][CodeGen] Support raw format TFE buffer loads other than byte, short and d16 ones.
Differential Revision: https://reviews.llvm.org/D138215
2022-11-24 10:50:26 +00:00
Matt Arsenault fe56afc4d7 AMDGPU: Fix fcanonicalize constant folding not correctly handling -0.0 2022-11-18 10:03:29 -08:00
Stanislav Mekhanoshin bcaf31ec3f [AMDGPU] Allow finer grain control of an unaligned access speed
A target can return if a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.

A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.

Subsequent patch will start using an actual value of speed in
the load/store vectorizer to compare if a vectorized access going
to be not just fast, but not slower than before.

Differential Revision: https://reviews.llvm.org/D124217
2022-11-17 09:23:53 -08:00
Nikita Popov feda983ff8 [TableGen] Use MemoryEffects to represent intrinsic memory effects (NFCI)
The TableGen implementation was using a homegrown implementation of
FunctionModRefInfo. This switches it to use MemoryEffects instead.
This makes the code simpler, and will allow exposing the full
representational power of MemoryEffects in the future. Among other
things, this will allow us to map IntrHasSideEffects to an
inaccessiblemem readwrite, rather than just ignoring it entirely
in most cases.

To avoid layering issues, this moves the ModRef.h header from IR
to Support, so that it can be included in the TableGen layer.

Differential Revision: https://reviews.llvm.org/D137641
2022-11-14 10:52:04 +01:00
Matt Arsenault 3cfa03856f AtomicExpand: Support cmpxchg expansion for small FP types
Handles f16 atomics for AMDGPU.
2022-11-10 22:16:11 -08:00
Pierre van Houtryve 7425077e31 [AMDGPU] Add & use `hasNamedOperand`, NFC
In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1.
This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually.

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D137540
2022-11-08 07:57:21 +00:00
Matt Arsenault ada6aa3f5c AMDGPU: Fold undef rcp to qnan
This matches the behavior in instcombine, and for fdiv.
2022-11-04 15:49:37 -07:00
Shilei Tian 1186e9d59f [LLVM][AMDGPU] Specialize 32-bit atomic fadd instruction for generic address space
The 32-bit floating-point atomic add instructions on AMDGPUs does not support a
"flat" or "generic" address space. So, if the address space cannot be determined
statically, the AMDGPU backend will fall back to a CAS loop (which does support
"flat" addressing). Instead, this patch emits runtime address-space checks to
allow native FP atomic add instructions for global and LDS memory (and non-atomic
FP add instructions for private/scratch memory).

In order to do that, this patch introduces a new interface function
`emitExpandAtomicRMW`. It is expected to be called when a common atomic expand
doesn't work for a specific target, such as the case we discussed here.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D129690
2022-11-04 14:11:05 -04:00
Nikita Popov 304f1d59ca [IR] Switch everything to use memory attribute
This switches everything to use the memory attribute proposed in
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly
attributes are dropped. The readnone, readonly and writeonly attributes
are restricted to parameters only.

The old attributes are auto-upgraded both in bitcode and IR.
The bitcode upgrade is a policy requirement that has to be retained
indefinitely. The IR upgrade is mainly there so it's not necessary
to update all tests using memory attributes in this patch, which
is already large enough. We could drop that part after migrating
tests, or retain it longer term, to make it easier to import IR
from older LLVM versions.

High-level Function/CallBase APIs like doesNotAccessMemory() or
setDoesNotAccessMemory() are mapped transparently to the memory
attribute. Code that directly manipulates attributes (e.g. via
AttributeList) on the other hand needs to switch to working with
the memory attribute instead.

Differential Revision: https://reviews.llvm.org/D135780
2022-11-04 10:21:38 +01:00
Jay Foad 5073ae2a88 [AMDGPU] Fix duplicated words in comments 2022-11-03 15:33:30 +00:00
Jay Foad 9bb1e21f07 [AMDGPU] Clean up calls to MachineOperand::setIsDead and friends. NFC. 2022-10-28 10:44:08 +01:00
Pierre van Houtryve 75b292cb14 [AMDGPU][DAG] Fix insert_vector_elt lowering for 8 bit elements
The bitmask used to extract the bits assumed 16 bit elements and wasn't taking the size of the elements into account.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D135156
2022-10-04 14:48:15 +00:00
Pierre van Houtryve d8258508d4 [AMDGPU][GISel] Update `isCanonicalized`
Recognize more opcodes in the function.
Fixes some regressions introduced in D134857 for fdiv.f16 too.

Depends on D134857

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D134862
2022-09-30 14:13:35 +00:00
Matt Arsenault 7a84624079 AMDGPU: Make various vector undefs legal
Surprisingly these were getting legalized to something
zero initialized.

This fixes an infinite loop when combining some vector types.
Also fixes zero initializing some undef values.

SimplifyDemandedVectorElts / SimplifyDemandedBits are not checking
for the legality of the output undefs they are replacing unused
operations with. This resulted in turning vectors into undefs
that were later re-legalized back into zero vectors.
2022-09-28 10:48:52 -04:00
Jon Chesterfield 80ba432821 [amdgpu][nfc] Allocate kernel-specific LDS struct deterministically
A kernel may have an associated struct for laying out LDS variables.
This patch puts that instance, if present, at a deterministic address by
allocating it at the same time as the module scope instance.

This is relatively likely to be where the instance was allocated anyway (~NFC)
but will allow later patches to calculate where a given field can be found,
which means a function which is only reachable from a single kernel will be
able to access a LDS variable with zero overhead. That will be particularly
helpful for applications that instantiate a function template containing LDS
variables once per kernel.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D127052
2022-09-28 14:55:16 +01:00
Carl Ritson 266b5dbc5d [AMDGPU] Add MIMG NSA threshold configuration attribute
Make MIMG NSA minimum addresses threshold an attribute that can
be set on a function or configured via command line.
This enables frontend tuning which allows increased NSA usage
where beneficial.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D134780
2022-09-28 20:03:18 +09:00
Petar Avramovic 6db7921b65 AMDGPU: Use tablegen patterns for buffer global and flat atomic fadd
Remove manual selection for atomic fadd from global-isel.
Stop pre-isel translation to AtomicLoadFAdd/G_ATOMICRMW_FADD
which corresponds to llvm-ir's atomicrmw fadd instruction.

global and flat atomic fadd patterns changes:
Split rtn/no-rtn patterns
Add missing patterns or fix predicates
Remove atomicrmw patterns for v2f16 (atomic rmw doesn't support vectors).
Patterns now check addrspace of pointer, added patterns for flat intrinsic.
with global addrspace pointer that selects into global atomic instruction.

buffer atomic fadd patterns changes:
Rdit patterns to import into global-isel.
Remove gfx6/gfx7 _addr64 and _offset patterns.
Remove patterns that can't be reached (same pattern but different feature).

Differential Revision: https://reviews.llvm.org/D130579
2022-09-23 17:52:10 +02:00
Petar Avramovic 5cee9047d5 AMDGPU: Improve atomicrmw fadd selection
Use same atomicrmw fadd expansion rules for gfx908, gfx940 and gfx11
as for gfx90a. Add missing globalisel legalizer support for flat
atomicrmw fadd f32 on gfx940 and gfx11.
Isel support for gfx11 will be added in D130579.

Differential Revision: https://reviews.llvm.org/D131560
2022-09-23 17:52:10 +02:00
Alexander Timofeev fbdea5a2e9 [AMDGPU] Always select s_cselect_b32 for uniform 'select' SDNode
This patch contains changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler.  It adds the edge in the SDNodeScheduler dependency graph instead of inserting the SCC copy between each definition and use. This approach lets the scheduler place instructions in an optimal way placing the copy only when the dependency cannot be resolved.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D133593
2022-09-15 22:03:56 +02:00
Sergei Barannikov c6acb4eb0f [SDAG] Add `getCALLSEQ_END` overload taking `uint64_t`s
All in-tree targets pass pointer-sized ConstantSDNodes to the
method. This overload reduced amount of boilerplate code a bit.  This
also makes getCALLSEQ_END consistent with getCALLSEQ_START, which
already takes uint64_ts.
2022-09-15 14:02:12 -04:00
Jay Foad 3822a01e0b [AMDGPU] Add GFX11 ds_bvh_stack_rtn_b32 instruction
Differential Revision: https://reviews.llvm.org/D133928
2022-09-15 16:46:14 +01:00
Matt Arsenault 7834194837 TableGen: Introduce generated getSubRegisterClass function
Currently there isn't a generic way to get a smaller register class
that can be produced from a subregister of a larger class. Replaces a
manually implemented version for AMDGPU. This will be used to improve
subregister support in the allocator.
2022-09-12 09:03:37 -04:00
Daniil Fukalov 7ed3d81333 [NFCI] Move cost estimation from TargetLowering to TargetTransformInfo.
TragetLowering had two last InstructionCost related `getTypeLegalizationCost()`
and `getScalingFactorCost()` members, but all other costs are processed in TTI.

E.g. it is not comfortable to use other TTI members in these two functions
overrided in a target.

Minor refactoring: `getTypeLegalizationCost()` now doesn't need DataLayout
parameter - it was always passed from TTI.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D117723
2022-08-18 00:38:55 +03:00
Kazu Hirata 109df7f9a4 [llvm] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-13 12:55:42 -07:00
Fangrui Song de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Leon Clark 6a275cd53c Transform illegal intrinsics to V_ILLEGAL
Related tasks:

- SWDEV-240194
- SWDEV-309417
- SWDEV-334876

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D123693
2022-08-06 08:59:00 +01:00
David Stuttard b14d7bf750 AMDGPU: Turn off force init 16 input SGPRS for pal
Pal uses a different mechanism for user sgprs.

Differential Revision: https://reviews.llvm.org/D129566
2022-07-25 10:52:46 +01:00
Kazu Hirata 0387da6f4f Use value instead of getValue (NFC) 2022-07-19 21:18:26 -07:00
Kazu Hirata 41ae78ea3a Use has_value instead of hasValue (NFC) 2022-07-19 20:15:44 -07:00
Jon Chesterfield 3a20597776 [amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use lowering LDS variables to different
addresses from different kernels. This will allow kernels that cannot
reach an LDS variable to avoid wasting space for it.

There are a number of implicit arguments accessed by intrinsic already
so this implementation closely follows the existing handling. It is slightly
novel in that this SGPR is written by the kernel prologue.

It is necessary in the general case to put variables at different addresses
such that they can be compactly allocated and thus necessary for an
indirect function call to have some means of determining where a
given variable was allocated. Claiming an arbitrary SGPR into which
an integer can be written by the kernel, in this implementation based
on metadata associated with that kernel, which is then passed on to
indirect call sites is sufficient to determine the variable address.

The intent is to emit a __const array of LDS addresses and index into it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D125060
2022-07-19 17:46:19 +01:00
Piotr Sobczak 2bd8e74b94 [AMDGPU] Fix bitcast v4i64/v16i16
Fix a regression introduced in D128865.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D129375
2022-07-11 22:27:52 +02:00
Piotr Sobczak bd675af2a2 [AMDGPU] Make v16i16/v16f16 legal
There are upcoming intrinsics to use the new types.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D128865
2022-06-30 23:08:40 +02:00
Matt Arsenault 0bdaef38c9 AMDGPU: Add gfx11 feature to force initializing 16 input SGPRs
The total user+system SGPR count needs to be padded out to 16 if fewer
inputs are enabled.
2022-06-29 14:52:19 -04:00
Jay Foad 3fbc945c3a [AMDGPU] llvm.amdgcn.exp.compr is not supported on GFX11
Differential Revision: https://reviews.llvm.org/D128259
2022-06-28 14:48:25 +01:00
Joe Nash ae72fee74e [AMDGPU] gfx11 Select on Buffer Atomic FAdd Rtn type
Reviewed By: #amdgpu, foad, rampitec

Differential Revision: https://reviews.llvm.org/D128205
2022-06-23 11:05:32 -04:00
Rodrigo Dominguez 971fa4b196 [AMDGPU] GFX11: remove ShaderType from ds_ordered_count offset field
In GFX11 ShaderType is determined by the hardware and should no longer
be written into bits[3:2] of the ds_ordered_count offset field.

Differential Revision: https://reviews.llvm.org/D128196
2022-06-23 14:20:33 +01:00
Jay Foad c155a944fb [AMDGPU] GFX11 CodeGen support for MIMG instructions
This includes:
- New llvm.amdgcn.image.msaa.load.* intrinsics
- NSA changes, because MIMG-NSA is now limited to 3 dwords
- Split CD forms of IMAGE_SAMPLE instructions out into separate
  test files since they are no longer supported in GFX11

Differential Revision: https://reviews.llvm.org/D127837
2022-06-16 18:23:14 +01:00
Jay Foad 4b2d70fa5b [AMDGPU] Basic implementation of isExtractSubvectorCheap
Add a basic implementation of isExtractSubvectorCheap that only
considers extracts at offset 0.

Differential Revision: https://reviews.llvm.org/D127385
2022-06-10 14:43:07 +01:00
Guillaume Chatelet 0788186182 [Alignment][NFC] Remove usage of MemSDNode::getAlignment
I can't remove the function just yet as it is used in the generated .inc files.
I would also like to provide a way to compare alignment with TypeSize since it came up a few times.

Differential Revision: https://reviews.llvm.org/D126910
2022-06-07 13:52:20 +00:00