Commit Graph

120 Commits

Author SHA1 Message Date
Jay Foad 68e3946270 [AMDGPU] SILoadStoreOptimizer: break lists on instructions with side effects
This just helps to keep the lists shorter and faster to sort. NFCI.

Differential Revision: https://reviews.llvm.org/D118384
2022-01-28 18:03:42 +00:00
Jay Foad 4b133cee80 [AMDGPU] SILoadStoreOptimizer: reject AGPR DS_WRITE sooner
Rejecting AGPR DS_WRITE instructions before adding them to any mergeable
list seems cleaner than adding them to the list and rejecting them
later.

Differential Revision: https://reviews.llvm.org/D118368
2022-01-27 18:20:46 +00:00
Jay Foad 94a4594c54 [AMDGPU] SILoadStoreOptimizer: use separate lists for AGPR instructions
Using separate lists for AGPR and non-AGPR instructions seems like a
cleaner solution than putting them all in the same list and then later
refusing to merge instructions of different AGPR-ness.

Differential Revision: https://reviews.llvm.org/D118367
2022-01-27 18:20:46 +00:00
Jay Foad 8a52fef1e0 [AMDGPU] SILoadStoreOptimizer: tweak API of CombineInfo::setMI. NFC.
Change CombineInfo::setMI to take a reference to the
SILoadStoreOptimizer instance, for easy access to common fields like
TII and STM.

Differential Revision: https://reviews.llvm.org/D118366
2022-01-27 18:20:46 +00:00
Jay Foad 185cb8e82c [AMDGPU] SILoadStoreOptimizer: Allow merging across a swizzled access
Swizzled accesses are not merged, but there is no particular reason not
to merge two instructions if any of the intervening instructions happens
to be a swizzled access.

This moves the check for swizzled accesses out of checkAndPrepareMerge
into collectMergeableInsts where I think it makes more sense.

Differential Revision: https://reviews.llvm.org/D118267
2022-01-27 14:40:58 +00:00
Jay Foad 95857a7058 [AMDGPU] SILoadStoreOptimizer: Remove redundant check for volatile
SILoadStoreOptimizer::collectMergeableInsts already ends the current
block if it sees a volatile (or ordered) memory access, so there is no
need to check for them again when scanning the instructions between two
pairing candidates in a block.

Differential Revision: https://reviews.llvm.org/D118266
2022-01-27 10:14:53 +00:00
Jay Foad 63eea41de6 [AMDGPU] Simplify SILoadStoreOptimizer::getSubRegIdxs. NFC. 2022-01-19 15:20:35 +00:00
Kazu Hirata 5a667c0e74 [llvm] Use nullptr instead of 0 (NFC)
Identified with modernize-use-nullptr.
2021-12-28 08:52:25 -08:00
Christudasan Devadasan 654c89d85a [AMDGPU] Make vector superclasses allocatable
The combined vector register classes with both
VGPRs and AGPRs are currently unallocatable.
This patch turns them into allocatable as a
prerequisite to enable copy between VGPR and
AGPR registers during regalloc.

Also, added the missing AV register classes from
192b to 1024b.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D109300
2021-11-26 00:42:12 -05:00
Neubauer, Sebastian d1f45ed58f [AMDGPU][NFC] Fix typos
Differential Revision: https://reviews.llvm.org/D113672
2021-11-12 11:37:21 +01:00
Martin Liska c5029023fb Fix building with GCC 12:
Fixes: https://bugs.llvm.org/show_bug.cgi?id=52380

Differential Revision: https://reviews.llvm.org/D112990
2021-11-02 14:28:00 +01:00
Piotr Sobczak 30d6c39bca [AMDGPU] Add merging into S_BUFFER_LOAD_DWORDX8_IMM
Extend SILoadStoreOptimizer to merge into DWORDX8 variant of S_BUFFER_LOAD.

Merging into DWORDX2 and DWORDX4 variants is handled already.

Differential Revision: https://reviews.llvm.org/D108909
2021-09-02 16:26:25 +02:00
Carl Ritson 99c790dc21 [AMDGPU] Make BVH isel consistent with other MIMG opcodes
Suffix opcodes with _gfx10.
Remove direct references to architecture specific opcodes.
Add a BVH flag and apply this to diassembly.
Fix a number of disassembly errors on gfx90a target caused by
previous incorrect BVH detection code.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108117
2021-08-17 10:42:22 +09:00
Matt Arsenault cc79aaced0 AMDGPU: Fix SILoadStoreOptimizer for gfx90a
This was hardcoding the register class to use for the newly created
pointer registers, violating the aligned VGPR requirement.
2021-05-11 21:26:43 -04:00
Stanislav Mekhanoshin c297709ee1 [AMDGPU] Fixed msan failure with uninitialized value 2021-03-15 13:58:19 -07:00
Stanislav Mekhanoshin 3bffb1cd0e [AMDGPU] Use single cache policy operand
Replace individual operands GLC, SLC, and DLC with a single cache_policy
bitmask operand. This will reduce the number of operands in MIR and I hope
the amount of code. These operands are mostly 0 anyway.

Additional advantage that parser will accept these flags in any order unlike
now.

Differential Revision: https://reviews.llvm.org/D96469
2021-03-15 13:00:59 -07:00
Matt Arsenault 78b6d73a93 AMDGPU: Add even aligned VGPR/AGPR register classes
gfx90a operations require even aligned registers, but this was
previously achieved by reserving registers inside the full class.

Ideally this would be captured in the static instruction definitions
for the operands, and we would have different instructions per
subtarget. The hackiest part of this is we need to manually reassign
AGPR register classes after instruction selection (we get away without
this for VGPRs since those types are actually registered for legal
types).
2021-02-24 14:49:37 -05:00
Stanislav Mekhanoshin 75997e8407 [AMDGPU] Fixed msan build
LoadStoreOptimizer was using uninitialized SCC value for
instructions where it is unsupported.
2021-02-17 18:01:23 -08:00
Stanislav Mekhanoshin a8d9d50762 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
Jay Foad 23db2d363f [AMDGPU] Better selection of base offset when merging DS reads/writes
When merging a pair of DS reads or writes needs to materialize the base
offset in a vgpr, choose a value that is aligned to as high a power of
two as possible. This maximises the chance that different pairs can use
the same base offset, in which case the base offset registers can be
commoned up by MachineCSE.

Differential Revision: https://reviews.llvm.org/D96421
2021-02-11 17:46:09 +00:00
Jay Foad 2114b458b0 [AMDGPU] Fix comments in SILoadStoreOptimizer::offsetsCanBeCombined 2021-02-10 14:49:33 +00:00
dfukalov 560d7e0411 [NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036
2021-01-20 22:22:45 +03:00
dfukalov 6a87e9b08b [NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813
2021-01-07 22:22:05 +03:00
Stanislav Mekhanoshin 91f503c3af [AMDGPU] gfx1030 RT support
Differential Revision: https://reviews.llvm.org/D87782
2020-09-16 11:40:58 -07:00
Jay Foad 3497860203 [AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister
... in favour of the isPhysical/isVirtual methods.
2020-08-20 17:59:11 +01:00
Matt Arsenault 79f67cae91 AMDGPU: Rename add/sub with carry out instructions
The hardware has created a real mess in the naming for add/sub, which
have been renamed basically every generation. Switch the carry out
pseudos to have the gfx9/gfx10 names. We were using the original SI/CI
v_add_i32/v_sub_i32 names. Later targets reintroduced these names as
carryless instructions with a saturating clamp bit, which we do not
define. Do this rename so we can unambiguously add these missing
instructions.

The carry-in versions should also be renamed, but at least those had a
consistent _u32 name to begin with. The 16-bit instructions were also
renamed, but aren't ambiguous.

This does regress assembler error message quality in some cases. In
mismatched wave32/wave64 situations, this will switch from
"unsupported instruction" to "invalid operand", with the error
pointing at the wrong position. I couldn't quite follow how the
assembler selects these, but the previous behavior seemed accidental
to me. It looked like there was a partial attempt to handle this which
was never completed (i.e. there is an AMDGPUOperand::isBoolReg but it
isn't used for anything).
2020-07-16 13:16:30 -04:00
Jay Foad 47788b97a9 SILoadStoreOptimizer: add support for GFX10 image instructions
GFX10 image instructions use one or more address operands starting at
vaddr0, instead of a single vaddr operand, to allow for NSA forms.

Differential Revision: https://reviews.llvm.org/D81675
2020-07-08 19:15:46 +01:00
Matt Arsenault d0b0b252e1 AMDGPU: Use IsSSA property check instead of asserting on isSSA
Also fix an SSA violation in a test the MIRParser/verifier fails to
catch. It's illegal to define a subregister in SSA. For the purpose of
the test, it just needs to define the super-register to use the
subregister in the use operand.
2020-06-29 10:05:23 -04:00
Matt Arsenault 35e6a9c839 AMDGPU: Break read2/write2 search range on a memory fence
This is to fix performance regressions introduced by
86c944d790.

The old search would collect all potentially mergeable instructions in
the entire block. In this case, the same address is written in
multiple places in the block on the other side of a fence. When sorted
by offset, the two unmergeable, identical addresses would be next to
each other and the merge would give up.

Break the search space when we encounter an instruction we won't be
able to merge across. This will keep the identical addresses in
different merge attempts.

This may also improve compile time by reducing the merge list size.
2020-04-24 15:53:30 -04:00
Matt Arsenault 6bffd0df78 AMDGPU: Fix redundant members 2020-04-23 23:14:01 -04:00
Matt Arsenault 50128f8a33 AMDGPU: Use Register 2020-04-23 22:25:36 -04:00
Jay Foad 0337017a9f [AMDGPU] Use SGPR instead of SReg classes
12994a70cf did this for 128-bit classes:

    SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds
    the additional non-allocatable TTMP registers. There's no point in
    allocating SReg_128 vregs. This shrinks the size of the classes
    regalloc needs to consider, which is usually good.

This patch extends it to all classes > 64 bits, for consistency.

Differential Revision: https://reviews.llvm.org/D78622
2020-04-23 11:45:22 +01:00
Stanislav Mekhanoshin f2334a7ef2 [AMDGPU] Fix crash in SILoadStoreOptimizer
SILoadStoreOptimizer::checkAndPrepareMerge() expects base and
paired instruction to come in order and scans MBB from base to
the paired instruction. An original order can be changed if
there were a dependent instruction in between and base instruction
was moved.

Fixed by bailing the optimization. In theory it might be possible
still to perform a merge by swapping instructions, but on practice
it bails anyway because it finds dependency on that same instruction
which has resulted in the base move.

Differential Revision: https://reviews.llvm.org/D77245
2020-04-02 10:26:47 -07:00
Sebastian Neubauer 8756869170 [AMDGPU] Add a16 feature to gfx10
Based on D72931

This adds a new feature called A16 which is enabled for gfx10.
gfx9 keeps the R128A16 feature so it can share all the instruction encodings
with gfx7/8.

Differential Revision: https://reviews.llvm.org/D73956
2020-02-10 09:04:23 +01:00
Matt Arsenault 0426c2d07d Reapply "AMDGPU: Cleanup and fix SMRD offset handling"
This reverts commit 6a4acb9d80.
2020-01-31 06:01:28 -08:00
Matt Arsenault 6a4acb9d80 Revert "AMDGPU: Cleanup and fix SMRD offset handling"
This reverts commit 17dbc6611d.

A test is failing on some bots
2020-01-30 15:39:51 -08:00
Matt Arsenault 17dbc6611d AMDGPU: Cleanup and fix SMRD offset handling
I believe this also fixes bugs with CI 32-bit handling, which was
incorrectly skipping offsets that look like signed 32-bit values. Also
validate the offsets are dword aligned before folding.
2020-01-30 15:04:21 -08:00
Tom Stellard cb297050bb AMDGPU/SILoadStoreOptimizer: Fix uninitialized variable error
This was introduced by 86c944d790 and
caught by the sanitizer-x86_64-linux-fast bot.
2020-01-24 21:53:05 -08:00
Tom Stellard 86c944d790 AMDGPU/SILoadStoreOptimizer: Improve merging of out of order offsets
Summary:
This improves merging of sequences like:

store a, ptr + 4
store b, ptr + 8
store c, ptr + 12
store d, ptr + 16
store e, ptr + 20
store f, ptr

Prior to this patch the basic block was scanned in order to find instructions
to merge and the above sequence would be transformed to:

store4 <a, b, c, d>, ptr + 4
store e, ptr + 20
store r, ptr

With this change, we now sort all the candidate merge instructions by their offset,
so instructions are visited in offset order rather than in the order they appear
in the basic block.  We now transform this sequnce into:

store4 <f, a, b, c>, ptr
store2 <d, e>, ptr + 16

Another benefit of this change is that since we have sorted the mergeable lists
by offset, we can easily check if an instruction is mergeable by checking the
offset of the instruction that becomes before or after it in the sorted list.
Once we determine an instruction is not mergeable we can remove it from the list
and avoid having to do the more expensive mergeablilty checks.

Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin

Reviewed By: arsenm, nhaehnle

Subscribers: kerbowa, merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65966
2020-01-24 19:45:56 -08:00
Tom Stellard c3bc805f4f AMDGPU/SILoadStoreOptimillzer: Refactor CombineInfo struct
Summary:
Modify CombineInfo to only store information about a single instruction.
This is a little easier to work with and removes a lot of duplicate
initialization code.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm, nhaehnle

Subscribers: merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71045
2019-12-17 13:43:10 -08:00
Tom Stellard bf13a71095 AMDGPU/SILoadStoreOptimizer: Simplify function
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71044
2019-12-12 06:53:03 -08:00
Piotr Sobczak 4a801170f3 [AMDGPU][SILoadStoreOptimizer] Merge TBUFFER loads/stores
Summary: Extend SILoadStoreOptimizer to merge tbuffer loads and stores.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69794
2019-11-20 22:59:30 +01:00
Michael Liao 4a308d302c [AMDGPU] Keep consistent check of legal addressing mode.
Summary:
- Add test cases for GFX10, which has narrower offset range compared to
  GFX9.

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70473
2019-11-20 15:08:17 -05:00
Nicolai Hähnle d8f7c68e28 AMDGPU/SILoadStoreOptimizer: fix a likely bug introduced recently
Summary:
We should check for same instruction class before checking whether they
have the same base address, else we might iterate out of bounds of a
MachineInstr operands list. The InstClass check is also cheaper.

This was introduced in SVN r373630.

Reviewers: tstellar

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68690
2019-11-16 11:35:34 +01:00
Reid Kleckner 05da2fe521 Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.

I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
  recompiles    touches affected_files  header
  342380        95      3604    llvm/include/llvm/ADT/STLExtras.h
  314730        234     1345    llvm/include/llvm/InitializePasses.h
  307036        118     2602    llvm/include/llvm/ADT/APInt.h
  213049        59      3611    llvm/include/llvm/Support/MathExtras.h
  170422        47      3626    llvm/include/llvm/Support/Compiler.h
  162225        45      3605    llvm/include/llvm/ADT/Optional.h
  158319        63      2513    llvm/include/llvm/ADT/Triple.h
  140322        39      3598    llvm/include/llvm/ADT/StringRef.h
  137647        59      2333    llvm/include/llvm/Support/Error.h
  131619        73      1803    llvm/include/llvm/Support/FileSystem.h

Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.

Reviewers: bkramer, asbirlea, bollu, jdoerfert

Differential Revision: https://reviews.llvm.org/D70211
2019-11-13 16:34:37 -08:00
Dávid Bolvanský c3d6f0ddee [SILoadStoreOptimizer] Fixed typo. NFCI. 2019-11-03 20:38:29 +01:00
Piotr Sobczak 02baaca742 [AMDGPU] Extend the SI Load/Store optimizer
Summary:
Extend the SI Load/Store optimizer to merge MIMG load instructions. Handle
different flavours of image_load and image_sample instructions.

When the instructions of the same subclass differ only in dmask, merge
them and update dmask accordingly.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64911

llvm-svn: 374984
2019-10-16 10:17:02 +00:00
Matt Arsenault 12994a70cf AMDGPU: Use SGPR_128 instead of SReg_128 for vregs
SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds
the additional non-allocatable TTMP registers. There's no point in
allocating SReg_128 vregs. This shrinks the size of the classes
regalloc needs to consider, which is usually good.

llvm-svn: 374284
2019-10-10 07:11:33 +00:00
Matt Arsenault 001826835a AMDGPU: Relax register classes used
llvm-svn: 374254
2019-10-09 22:44:48 +00:00
Piotr Sobczak 165e469145 [AMDGPU][SILoadStoreOptimizer] NFC: Refactor code
Summary:
This patch fixes a potential aliasing problem in InstClassEnum,
where local values were mixed with machine opcodes.

Introducing InstSubclass will keep them separate and help extending
InstClassEnum with other instruction types (e.g. MIMG) in the future.

This patch also makes getSubRegIdxs() more concise.

Reviewers: nhaehnle, arsenm, tstellar

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68384

llvm-svn: 373699
2019-10-04 07:09:40 +00:00