There are cases where infer address spaces pass cannot yet
infer an address space in the opt pipeline and then in the
llc pipeline it runs too late for atomic expand pass to
benefit from a specific address space.
Move atomic expand pass past the infer address spaces.
Fixes: SWDEV-293410
Differential Revision: https://reviews.llvm.org/D105511
Update AMDGPU gfx90a memory model to make coarse grain memory allocations
consistent when fine grained system scope atomic acquire and release is
performed.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D105137
Previously we were allowing to use FP atomics without
-amdgpu-unsafe-fp-atomics option if a scope is less then
system. This is not safe just as well if we have UC memory.
This change only allows global and flat FP atomics with
the unsafe option. Consequentially that makes a check for
denorm mode redundant since we skip it with the unsafe
option and do not have a way to produce these instructions
without it anyway.
Differential Revision: https://reviews.llvm.org/D102347
If amdgpu-unsafe-fp-atomics is specified, allow {flat|global}_atomic_add_f32 even if atomic modes don't match.
Differential Revision: https://reviews.llvm.org/D95391
We have a single noret intrinsic an a lot of special handling
around it. Declare it just as any other but do not define rtn
instructions itself instead.
Differential Revision: https://reviews.llvm.org/D87719
Global TableGen let override blocks are pretty dangerous and override
any local special cases. In this case, the broader HasFlatGlobalInsts
was overriding the more specific predicate for
FeatureAtomicFaddInsts. Make sure HasFlatGlobalInsts is implied by
FeatureAtomicFaddInsts, and make sure the right predicate is used.
One issue with independently setting the subtarget features on
incompatible targets is all of the encoding families do not define all
opcodes. This will hit an assert on gfx10 for example, since we set
the encoding independently based on the generation and not based on a
feature.