Commit Graph

140 Commits

Author SHA1 Message Date
Kazu Hirata 20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
Ivan Kosarev 1b560e6ab7 [AMDGPU][MC] Support TFE modifiers in MUBUF loads and stores.
Reviewed By: dp, arsenm

Differential Revision: https://reviews.llvm.org/D137783
2022-11-14 15:36:18 +00:00
Pierre van Houtryve 7425077e31 [AMDGPU] Add & use `hasNamedOperand`, NFC
In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1.
This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually.

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D137540
2022-11-08 07:57:21 +00:00
Ivan Kosarev 693f816288 [AMDGPU][SILoadStoreOptimizer] Merge SGPR_IMM scalar buffer loads.
Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D133787
2022-09-15 13:48:51 +01:00
Fangrui Song de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Carl Ritson 4c4db81630 [AMDGPU] Extend SILoadStoreOptimizer to s_load instructions
Apply merging to s_load as is done for s_buffer_load.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D130742
2022-07-30 11:38:39 +09:00
Stanislav Mekhanoshin 33fb23f728 [AMDGPU] Merge flat with global in the SILoadStoreOptimizer
Flat can be merged with flat global since address cast is a no-op.
A combined memory operation needs to be promoted to flat.

Differential Revision: https://reviews.llvm.org/D120431
2022-03-09 10:04:37 -08:00
Stanislav Mekhanoshin 517171ce20 [AMDGPU] Extend SILoadStoreOptimizer to handle flat load/stores
TODO: merge flat with global promoting to flat.

Differential Revision: https://reviews.llvm.org/D120351
2022-02-28 11:27:30 -08:00
Stanislav Mekhanoshin 3279e44063 [AMDGPU] Extend SILoadStoreOptimizer to handle global stores
TODO: merge flat load/stores.
TODO: merge flat with global promoting to flat.

Differential Revision: https://reviews.llvm.org/D120346
2022-02-24 11:09:51 -08:00
Stanislav Mekhanoshin cefa1c5ca9 [AMDGPU] Fix combined MMO in load-store merge
Loads and stores can be out of order in the SILoadStoreOptimizer.
When combining MachineMemOperands of two instructions operands are
sent in the IR order into the combineKnownAdjacentMMOs. At the
moment it picks the first operand and just replaces its offset and
size. This essentially loses alignment information and may generally
result in an incorrect base pointer to be used.

Use a base pointer in memory addresses order instead and only adjust
size.

Differential Revision: https://reviews.llvm.org/D120370
2022-02-24 10:47:57 -08:00
Stanislav Mekhanoshin 9e055c0fff [AMDGPU] Extend SILoadStoreOptimizer to handle global saddr loads
This adds handling of the _SADDR forms to the GLOBAL_LOAD combining.

TODO: merge global stores.
TODO: merge flat load/stores.
TODO: merge flat with global promoting to flat.

Differential Revision: https://reviews.llvm.org/D120285
2022-02-22 09:01:43 -08:00
Stanislav Mekhanoshin ba17bd2674 [AMDGPU] Extend SILoadStoreOptimizer to handle global loads
There can be situations where global and flat loads and stores are not
combined by the vectorizer, in particular if their address space
differ in the IR but they end up the same class instructions after
selection. For example a divergent load from constant address space
ends up being the same global_load as a load from global address space.

TODO: merge global stores.
TODO: handle SADDR forms.
TODO: merge flat load/stores.
TODO: merge flat with global promoting to flat.

Differential Revision: https://reviews.llvm.org/D120279
2022-02-22 08:42:36 -08:00
Stanislav Mekhanoshin dc0981562e [AMDGPU] Remove redundand check in the SILoadStoreOptimizer
Differential Revision: https://reviews.llvm.org/D120268
2022-02-21 15:04:44 -08:00
Jay Foad 359a792f9b [AMDGPU] SILoadStoreOptimizer: avoid unbounded register pressure increases
Previously when combining two loads this pass would sink the
first one down to the second one, putting the combined load
where the second one was. It would also sink any intervening
instructions which depended on the first load down to just
after the combined load.

For example, if we started with this sequence of
instructions (code flowing from left to right):

  X A B C D E F Y

After combining loads X and Y into XY we might end up with:

  A B C D E F XY

But if B D and F depended on X, we would get:

  A C E XY B D F

Now if the original code had some short disjoint live ranges
from A to B, C to D and E to F, in the transformed code
these live ranges will be long and overlapping. In this way
a single merge of two loads could cause an unbounded
increase in register pressure.

To fix this, change the way the way that loads are moved in
order to merge them so that:
- The second load is moved up to the first one. (But when
  merging stores, we still move the first store down to the
  second one.)
- Intervening instructions are never moved.
- Instead, if we find an intervening instruction that would
  need to be moved, give up on the merge. But this case
  should now be pretty rare because normal stores have no
  outputs, and normal loads only have address register
  inputs, but these will be identical for any pair of loads
  that we try to merge.

As well as fixing the unbounded register pressure increase
problem, moving loads up and stores down seems like it
should usually be a win for memory latency reasons.

Differential Revision: https://reviews.llvm.org/D119006
2022-02-21 10:51:14 +00:00
Sebastian Neubauer 6527b2a4d5 [AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.

Differential Revision: https://reviews.llvm.org/D119235
2022-02-18 15:05:21 +01:00
Jay Foad a456ace9c1 [AMDGPU] SILoadStoreOptimizer: rewrite checkAndPrepareMerge. NFCI.
Separate the function clearly into:
- Checks that can be done on CI and Paired before the loop.
- The loop over all instructions between CI and Paired.
- Checks that must be done on InstsToMove after the loop.

Previously these were mostly done inside the loop in a very
confusing way.

Differential Revision: https://reviews.llvm.org/D118994
2022-02-04 17:17:29 +00:00
Jay Foad 001cb43159 [AMDGPU] SILoadStoreOptimizer: fewer calls to offsetsCanBeCombined
Only call offsetsCanBeCombined with Modify = true in cases
where it will really do something. NFC.
2022-02-04 14:52:11 +00:00
Jay Foad 00bbda07ae [AMDGPU] SILoadStoreOptimizer: simplify class/subclass checks
Also add a comment explaining the difference between class
and subclass. NFCI.
2022-02-04 14:08:04 +00:00
Jay Foad 33ef8bdf36 [AMDGPU] SILoadStoreOptimizer: simplify optimizeInstsWithSameBaseAddr
Common up all the calls to CI.setMI. NFCI.
2022-02-04 13:25:05 +00:00
Jay Foad ca05edd927 [AMDGPU] SILoadStoreOptimizer: simplify OptimizeListAgain test
At this point CI represents the combined access (original CI combined
with Paired) so it doesn't make any sense to add in Paired.width again.
NFCI.
2022-02-04 13:02:19 +00:00
Jay Foad 68e3946270 [AMDGPU] SILoadStoreOptimizer: break lists on instructions with side effects
This just helps to keep the lists shorter and faster to sort. NFCI.

Differential Revision: https://reviews.llvm.org/D118384
2022-01-28 18:03:42 +00:00
Jay Foad 4b133cee80 [AMDGPU] SILoadStoreOptimizer: reject AGPR DS_WRITE sooner
Rejecting AGPR DS_WRITE instructions before adding them to any mergeable
list seems cleaner than adding them to the list and rejecting them
later.

Differential Revision: https://reviews.llvm.org/D118368
2022-01-27 18:20:46 +00:00
Jay Foad 94a4594c54 [AMDGPU] SILoadStoreOptimizer: use separate lists for AGPR instructions
Using separate lists for AGPR and non-AGPR instructions seems like a
cleaner solution than putting them all in the same list and then later
refusing to merge instructions of different AGPR-ness.

Differential Revision: https://reviews.llvm.org/D118367
2022-01-27 18:20:46 +00:00
Jay Foad 8a52fef1e0 [AMDGPU] SILoadStoreOptimizer: tweak API of CombineInfo::setMI. NFC.
Change CombineInfo::setMI to take a reference to the
SILoadStoreOptimizer instance, for easy access to common fields like
TII and STM.

Differential Revision: https://reviews.llvm.org/D118366
2022-01-27 18:20:46 +00:00
Jay Foad 185cb8e82c [AMDGPU] SILoadStoreOptimizer: Allow merging across a swizzled access
Swizzled accesses are not merged, but there is no particular reason not
to merge two instructions if any of the intervening instructions happens
to be a swizzled access.

This moves the check for swizzled accesses out of checkAndPrepareMerge
into collectMergeableInsts where I think it makes more sense.

Differential Revision: https://reviews.llvm.org/D118267
2022-01-27 14:40:58 +00:00
Jay Foad 95857a7058 [AMDGPU] SILoadStoreOptimizer: Remove redundant check for volatile
SILoadStoreOptimizer::collectMergeableInsts already ends the current
block if it sees a volatile (or ordered) memory access, so there is no
need to check for them again when scanning the instructions between two
pairing candidates in a block.

Differential Revision: https://reviews.llvm.org/D118266
2022-01-27 10:14:53 +00:00
Jay Foad 63eea41de6 [AMDGPU] Simplify SILoadStoreOptimizer::getSubRegIdxs. NFC. 2022-01-19 15:20:35 +00:00
Kazu Hirata 5a667c0e74 [llvm] Use nullptr instead of 0 (NFC)
Identified with modernize-use-nullptr.
2021-12-28 08:52:25 -08:00
Christudasan Devadasan 654c89d85a [AMDGPU] Make vector superclasses allocatable
The combined vector register classes with both
VGPRs and AGPRs are currently unallocatable.
This patch turns them into allocatable as a
prerequisite to enable copy between VGPR and
AGPR registers during regalloc.

Also, added the missing AV register classes from
192b to 1024b.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D109300
2021-11-26 00:42:12 -05:00
Neubauer, Sebastian d1f45ed58f [AMDGPU][NFC] Fix typos
Differential Revision: https://reviews.llvm.org/D113672
2021-11-12 11:37:21 +01:00
Martin Liska c5029023fb Fix building with GCC 12:
Fixes: https://bugs.llvm.org/show_bug.cgi?id=52380

Differential Revision: https://reviews.llvm.org/D112990
2021-11-02 14:28:00 +01:00
Piotr Sobczak 30d6c39bca [AMDGPU] Add merging into S_BUFFER_LOAD_DWORDX8_IMM
Extend SILoadStoreOptimizer to merge into DWORDX8 variant of S_BUFFER_LOAD.

Merging into DWORDX2 and DWORDX4 variants is handled already.

Differential Revision: https://reviews.llvm.org/D108909
2021-09-02 16:26:25 +02:00
Carl Ritson 99c790dc21 [AMDGPU] Make BVH isel consistent with other MIMG opcodes
Suffix opcodes with _gfx10.
Remove direct references to architecture specific opcodes.
Add a BVH flag and apply this to diassembly.
Fix a number of disassembly errors on gfx90a target caused by
previous incorrect BVH detection code.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108117
2021-08-17 10:42:22 +09:00
Matt Arsenault cc79aaced0 AMDGPU: Fix SILoadStoreOptimizer for gfx90a
This was hardcoding the register class to use for the newly created
pointer registers, violating the aligned VGPR requirement.
2021-05-11 21:26:43 -04:00
Stanislav Mekhanoshin c297709ee1 [AMDGPU] Fixed msan failure with uninitialized value 2021-03-15 13:58:19 -07:00
Stanislav Mekhanoshin 3bffb1cd0e [AMDGPU] Use single cache policy operand
Replace individual operands GLC, SLC, and DLC with a single cache_policy
bitmask operand. This will reduce the number of operands in MIR and I hope
the amount of code. These operands are mostly 0 anyway.

Additional advantage that parser will accept these flags in any order unlike
now.

Differential Revision: https://reviews.llvm.org/D96469
2021-03-15 13:00:59 -07:00
Matt Arsenault 78b6d73a93 AMDGPU: Add even aligned VGPR/AGPR register classes
gfx90a operations require even aligned registers, but this was
previously achieved by reserving registers inside the full class.

Ideally this would be captured in the static instruction definitions
for the operands, and we would have different instructions per
subtarget. The hackiest part of this is we need to manually reassign
AGPR register classes after instruction selection (we get away without
this for VGPRs since those types are actually registered for legal
types).
2021-02-24 14:49:37 -05:00
Stanislav Mekhanoshin 75997e8407 [AMDGPU] Fixed msan build
LoadStoreOptimizer was using uninitialized SCC value for
instructions where it is unsupported.
2021-02-17 18:01:23 -08:00
Stanislav Mekhanoshin a8d9d50762 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
Jay Foad 23db2d363f [AMDGPU] Better selection of base offset when merging DS reads/writes
When merging a pair of DS reads or writes needs to materialize the base
offset in a vgpr, choose a value that is aligned to as high a power of
two as possible. This maximises the chance that different pairs can use
the same base offset, in which case the base offset registers can be
commoned up by MachineCSE.

Differential Revision: https://reviews.llvm.org/D96421
2021-02-11 17:46:09 +00:00
Jay Foad 2114b458b0 [AMDGPU] Fix comments in SILoadStoreOptimizer::offsetsCanBeCombined 2021-02-10 14:49:33 +00:00
dfukalov 560d7e0411 [NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036
2021-01-20 22:22:45 +03:00
dfukalov 6a87e9b08b [NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813
2021-01-07 22:22:05 +03:00
Stanislav Mekhanoshin 91f503c3af [AMDGPU] gfx1030 RT support
Differential Revision: https://reviews.llvm.org/D87782
2020-09-16 11:40:58 -07:00
Jay Foad 3497860203 [AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister
... in favour of the isPhysical/isVirtual methods.
2020-08-20 17:59:11 +01:00
Matt Arsenault 79f67cae91 AMDGPU: Rename add/sub with carry out instructions
The hardware has created a real mess in the naming for add/sub, which
have been renamed basically every generation. Switch the carry out
pseudos to have the gfx9/gfx10 names. We were using the original SI/CI
v_add_i32/v_sub_i32 names. Later targets reintroduced these names as
carryless instructions with a saturating clamp bit, which we do not
define. Do this rename so we can unambiguously add these missing
instructions.

The carry-in versions should also be renamed, but at least those had a
consistent _u32 name to begin with. The 16-bit instructions were also
renamed, but aren't ambiguous.

This does regress assembler error message quality in some cases. In
mismatched wave32/wave64 situations, this will switch from
"unsupported instruction" to "invalid operand", with the error
pointing at the wrong position. I couldn't quite follow how the
assembler selects these, but the previous behavior seemed accidental
to me. It looked like there was a partial attempt to handle this which
was never completed (i.e. there is an AMDGPUOperand::isBoolReg but it
isn't used for anything).
2020-07-16 13:16:30 -04:00
Jay Foad 47788b97a9 SILoadStoreOptimizer: add support for GFX10 image instructions
GFX10 image instructions use one or more address operands starting at
vaddr0, instead of a single vaddr operand, to allow for NSA forms.

Differential Revision: https://reviews.llvm.org/D81675
2020-07-08 19:15:46 +01:00
Matt Arsenault d0b0b252e1 AMDGPU: Use IsSSA property check instead of asserting on isSSA
Also fix an SSA violation in a test the MIRParser/verifier fails to
catch. It's illegal to define a subregister in SSA. For the purpose of
the test, it just needs to define the super-register to use the
subregister in the use operand.
2020-06-29 10:05:23 -04:00
Matt Arsenault 35e6a9c839 AMDGPU: Break read2/write2 search range on a memory fence
This is to fix performance regressions introduced by
86c944d790.

The old search would collect all potentially mergeable instructions in
the entire block. In this case, the same address is written in
multiple places in the block on the other side of a fence. When sorted
by offset, the two unmergeable, identical addresses would be next to
each other and the merge would give up.

Break the search space when we encounter an instruction we won't be
able to merge across. This will keep the identical addresses in
different merge attempts.

This may also improve compile time by reducing the merge list size.
2020-04-24 15:53:30 -04:00
Matt Arsenault 6bffd0df78 AMDGPU: Fix redundant members 2020-04-23 23:14:01 -04:00