Commit Graph

7105 Commits

Author SHA1 Message Date
Abinav Puthan Purayil 7504c7a877 [AMDGPU] Use AddedComplexity for ret and noret atomic ops selection
This patch removes the predicate for return atomic ops and uses
AddedComplexity to distinguish its selection from its no return variant.
This will produce better matchers that doesn't unnecessarily check for
the negated predicate if the initial predicate failed. Also, it
simplifies the enabling of no return atomic ops selection in GlobalISel.

Differential Revision: https://reviews.llvm.org/D128241
2022-07-08 09:47:33 +05:30
Austin Kerbow 6817031d0b [AMDGPU] Disable FillMFMAShadowMutation by default
Disable amdgpu mfma power sched.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D129172
2022-07-07 09:34:45 -07:00
Shilei Tian 1023ddaf77 [LLVM] Add the support for fmax and fmin in atomicrmw instruction
This patch adds the support for `fmax` and `fmin` operations in `atomicrmw`
instruction. For now (at least in this patch), the instruction will be expanded
to CAS loop. There are already a couple of targets supporting the feature. I'll
create another patch(es) to enable them accordingly.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D127041
2022-07-06 10:57:53 -04:00
Thomas Symalla 86bd7e2065 [NFC][AMDGPU] Cleanup the SIOptimizeExecMasking pass.
This patch removes a bit of code duplication and
moves the v_cmpx optimization out of the
runOnMachineFunction pass.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D129086
2022-07-06 11:03:03 +02:00
Carl Ritson 8bc5e7ac51 [AMDGPU] Additional liveness tests for si-optimize-exec-masking-pre-ra
Merge tests and fixes from D128110 and D128315 on top of already
committed D128800.

Original author: arsenm

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D128882
2022-07-06 15:05:32 +09:00
Jay Foad 4dbc2876cf [AMDGPU] GFX11 trivial NFC tweaks
A few miscellaneous comment, whitespace and indentation tweaks.
2022-07-05 17:20:17 +01:00
Jay Foad 12fd00ee17 [AMDGPU] Add patterns for GFX11 v_minmax and v_maxmin instructions
Differential Revision: https://reviews.llvm.org/D128445
2022-07-05 16:07:47 +01:00
Joe Nash 0483c91eee [AMDGPU] gfx11 CodeGen for new DPP instructions
Modifies the GCNDPPCombine pass to enable DPP formation for the new DPP
instruction in gfx11, namely VOP3 encoded instructions with DPP and VOPC
with DPP.

Depends on D128656

Reviewed By: #amdgpu, rampitec

Differential Revision: https://reviews.llvm.org/D128682
2022-07-05 10:17:59 -04:00
Joe Nash d1af09ad96 [AMDGPU] gfx11 Generate VOPD Instructions
We form VOPD  instructions in the GCNCreateVOPD pass by combining
back-to-back component instructions. There are strict register
constraints for creating a legal VOPD, namely that the matching operands
(e.g. src0x and src0y, src1x and src1y) must be in different register
banks. We add a PostRA scheduler
mutation to put possible VOPD components back-to-back.

Depends on D128442, D128270

Reviewed By: #amdgpu, rampitec

Differential Revision: https://reviews.llvm.org/D128656
2022-07-05 09:18:19 -04:00
Ivan Kosarev 4696a33dfa [AMDGPU][NFC] Refine matching SMRD offsets.
Tell the matcher what we are looking for instead of matching everything
and then discarding the result if doesn't fit.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D128171
2022-07-05 14:07:22 +01:00
Ivan Kosarev 8cd79bc12c [AMDGPU][GlobalISel] Support register offsets for SMRDs.
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D128836
2022-07-05 13:41:06 +01:00
Thomas Symalla 04c5fed5e0 [NFC] Fix wrong comment. 2022-07-05 13:37:44 +02:00
Nikita Popov 8e70258b18 [AMDGPUCodeGenPrepare] Check result of ConstantFoldBinaryOpOperands()
This function will become fallible once we don't support constant
expressions for all binops, so make sure to check the result.
2022-07-04 14:20:23 +02:00
Mirko Brkusanin 2208342c9b [AMDGPU][GlobalISel] Always use VGPR bank for G_FCMP
Differential Revision: https://reviews.llvm.org/D128980
2022-07-01 15:03:37 +02:00
Piotr Sobczak b6ef36a1c4 [AMDGPU] Update WMMA intrinsics with explicit f16 types
Update intrinsics to use n x f16 and n x i16 instead
of 32-bit types. This may avoid the need for a bitcast
and is probably less confusing.

Depends on making v16f16 and v16i16 types legal.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D128951
2022-07-01 08:55:25 +02:00
Piotr Sobczak bd675af2a2 [AMDGPU] Make v16i16/v16f16 legal
There are upcoming intrinsics to use the new types.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D128865
2022-06-30 23:08:40 +02:00
Jay Foad 0f94d2b385 [AMDGPU] GFX11: automatically release VGPRs at the end of the shader
GFX11 has a new message type MSG_DEALLOC_VGPRS which can be used to
release a shader's VGPRs. Sending this at the end of a shader (just
before the s_endpgm) can help overall system performance in cases where
the s_endpgm would have to wait for outstanding VMEM stores to complete
before releasing the VGPRs.

Differential Revision: https://reviews.llvm.org/D128442
2022-06-30 20:55:14 +01:00
Piotr Sobczak 4874838a63 [AMDGPU] gfx11 WMMA instruction support
gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate)
instructions.

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D128756
2022-06-30 11:13:45 -04:00
Carl Ritson d0f6641615 [AMDGPU] Fix liveness for loops in si-optimize-exec-masking-pre-ra
Follow up to D127894, new liveness update code needs to handle
the case where S_ANDN2 input must be extended through loops when
V_CNDMASK_B32 has been hoisted.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D128800
2022-06-30 15:26:50 +09:00
Jay Foad cfb7ffdec0 [AMDGPU] New AMDGPUInsertDelayAlu pass
Differential Revision: https://reviews.llvm.org/D128270
2022-06-29 21:30:20 +01:00
Matt Arsenault 0bdaef38c9 AMDGPU: Add gfx11 feature to force initializing 16 input SGPRs
The total user+system SGPR count needs to be padded out to 16 if fewer
inputs are enabled.
2022-06-29 14:52:19 -04:00
Matt Arsenault ffd6aaf5b6 AMDGPU: Make packed 32-bit instructions rematerializable 2022-06-29 11:57:54 -04:00
Matt Arsenault 4c400dc103 AMDGPU: Make 16-bit pk instructions rematerializable 2022-06-29 11:57:53 -04:00
Matt Arsenault da6d7728d4 AMDGPU: Mark more instructions as rematerializable
D106023 excluded 16-bit instructions from rematerialization, with the
justification that we can't rematerialize instructions that preserve
the high bits (plus the instructions which do are a confusing mess
between different subtargets). This doesn't make sense to me as a
problem since cases where we would rely on the high bit behavior would
still need to be represented as a register value constraint with a
tied operand. It's not a hidden side effect and should still be
rematerializable.
2022-06-29 11:19:15 -04:00
Matt Arsenault d342d130da AMDGPU: Use isMeta flags on pseudoinstructions 2022-06-29 10:31:29 -04:00
Stanislav Mekhanoshin 21895c6b50 [AMDGPU] Relax verification of soffset in scalar stores
It must use m0 only on GFX8. Later chips can use ang SGPR.

Differential Revision: https://reviews.llvm.org/D128765
2022-06-28 16:10:08 -07:00
Jay Foad 3fbc945c3a [AMDGPU] llvm.amdgcn.exp.compr is not supported on GFX11
Differential Revision: https://reviews.llvm.org/D128259
2022-06-28 14:48:25 +01:00
Joe Nash f1cfaa956d [AMDGPU] Use GFX11 S_PACK_HL instruction in more cases
Differential Revision: https://reviews.llvm.org/D128527
2022-06-28 14:35:19 +01:00
Jay Foad b5818e4eb4 [AMDGPU] Cluster stores as well as loads for GFX11
Differential Revision: https://reviews.llvm.org/D128517
2022-06-27 16:41:41 +01:00
Jay Foad 77e63b25f9 [AMDGPU] Fix assertion failure on mad with negative immediate addend
Without this, the new test case would fail with:

AMDGPUInstPrinter.cpp:545: void llvm::AMDGPUInstPrinter::printImmediate64(uint64_t, const llvm::MCSubtargetInfo &, llvm::raw_ostream &): Assertion `isUInt<32>(Imm) || Imm == 0x3fc45f306dc9c882' failed.

Differential Revision: https://reviews.llvm.org/D128435
2022-06-27 09:49:20 +01:00
Kazu Hirata a7938c74f1 [llvm] Don't use Optional::hasValue (NFC)
This patch replaces Optional::hasValue with the implicit cast to bool
in conditionals only.
2022-06-25 21:42:52 -07:00
Kazu Hirata 3b7c3a654c Revert "Don't use Optional::hasValue (NFC)"
This reverts commit aa8feeefd3.
2022-06-25 11:56:50 -07:00
Kazu Hirata aa8feeefd3 Don't use Optional::hasValue (NFC) 2022-06-25 11:55:57 -07:00
Min-Yih Hsu 97579dcc6d [MCA] Introducing incremental SourceMgr and resumable pipeline
The new resumable mca::Pipeline capability introduced in this patch
allows users to save the current state of pipeline and resume from the
very checkpoint.
It is better (but not require) to use with the new IncrementalSourceMgr,
where users can add mca::Instruction incrementally rather than having a
fixed number of instructions ahead-of-time.

Note that we're using unit tests to test these new features. Because
integrating them into the `llvm-mca` tool will make too many churns.

Differential Revision: https://reviews.llvm.org/D127083
2022-06-24 15:39:51 -07:00
Joe Nash 07b7fada73 [AMDGPU] gfx11 VOPD instructions MC support
VOPD is a new encoding for dual-issue instructions for use in wave32.
This patch includes MC layer support only.

A VOPD instruction is constituted of an X component (for which there are
13 possible opcodes) and a Y component (for which there are the 13 X
opcodes plus 3 more). Most of the complexity in defining and parsing
a VOPD operation arises from the possible different total numbers of
operands and deferred parsing of certain operands depending on the
constituent X and Y opcodes.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D128218
2022-06-24 11:08:39 -04:00
Konstantin Zhuravlyov 7736ce1c56 AMDGPU: Clear kill flags when optimizing vcmp save exec sequence
It was causing bad machine code for several blender scenes:
  *** Bad machine code: Using an undefined physical register ***
  - function:    kernel_holdout_emission_blurring_pathtermination_ao
  - basic block: %bb.28 if.end40.i (0x7f84861a2320)
  - instruction: V_CMPX_EQ_U32_nosdst_e64 0, $vgpr3, implicit-def $exec, implicit $exec
  - operand 1:   $vgpr3

Differential Revision: https://reviews.llvm.org/D127768
2022-06-24 11:30:22 -04:00
Joe Nash ae72fee74e [AMDGPU] gfx11 Select on Buffer Atomic FAdd Rtn type
Reviewed By: #amdgpu, foad, rampitec

Differential Revision: https://reviews.llvm.org/D128205
2022-06-23 11:05:32 -04:00
Baptiste Saleil 79e77a9f39 [AMDGPU] Flush the vmcnt counter in loop preheaders when necessary
waitcnt vmcnt instructions are currently generated in loop bodies before using
values loaded outside of the loop. In some cases, it is better to flush the
vmcnt counter in a loop preheader before entering the loop body. This patch
detects these cases and generates waitcnt instructions to flush the counter.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D115747
2022-06-23 10:53:21 -04:00
Rodrigo Dominguez 971fa4b196 [AMDGPU] GFX11: remove ShaderType from ds_ordered_count offset field
In GFX11 ShaderType is determined by the hardware and should no longer
be written into bits[3:2] of the ds_ordered_count offset field.

Differential Revision: https://reviews.llvm.org/D128196
2022-06-23 14:20:33 +01:00
Ruiling Song 49b8ca3f7c AMDGPU: Don't crash on global_ctor/dtor declaration
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D128320
2022-06-23 21:04:54 +08:00
Dmitry Preobrazhensky dcb24f93af [AMDGPU][MC][GFX11] Correct disassembly of VOP3.DPP8 opcodes
Fix bug #56163.
Add W32/W64 tests for all VOP3.DPP opcodes.

Differential Revision: https://reviews.llvm.org/D128369
2022-06-23 13:07:45 +03:00
Matt Arsenault b03d902b61 AMDGPU: Fix invalid liveness after si-optimize-exec-masking-pre-ra
This was leaving behind a use at the deleted instruction which the
verifier would fail during allocation.
2022-06-22 20:49:03 -04:00
serge-sans-paille 27fd01d3f8 [iwyu] Handle regressions in libLLVM header include
Running iwyu-diff on LLVM codebase since fb67d683db detected a few
regressions, fixing them.

The impact on preprocessed output is negligible: -4k lines.
2022-06-22 18:50:39 +02:00
Guillaume Chatelet cef65864af [Alignment] Use Align for MaxKernArgAlign
Differential Revision: https://reviews.llvm.org/D128118
2022-06-22 13:40:37 +00:00
Ruiling Song 4dcb42fae5 AMDGPU: Skip unexpected CFG in SIOptimizeVGPRLiveRange
There are some cases that we use si_if/si_else in unatural way.
Just skip them.

Fixes: https://github.com/llvm/llvm-project/issues/55922

Reviewed by: critson

Differential Revision: https://reviews.llvm.org/D128193
2022-06-22 12:49:41 +08:00
Joe Nash 90254d524f [AMDGPU] gfx11 Remove SDWA from shuffle_vector ISel
gfx11 does not have SDWA

Reviewed By: #amdgpu, rampitec

Differential Revision: https://reviews.llvm.org/D128208
2022-06-21 14:55:00 -04:00
Jay Foad 929a8ad2b6 [AMDGPU] Update SPI_SHADER_PGM_RSRC2_PS.EXTRA_LDS_SIZE for GFX11
The granularity of SPI_SHADER_PGM_RSRC2_PS.EXTRA_LDS_SIZE changed
in GFX11. It is now in units of 256 dwords instead of 128 dwords.

COMPUTE_PGM_RSRC2.LDS_SIZE is unaffected. It is still in units of
128 dwords.

Differential Revision: https://reviews.llvm.org/D128179
2022-06-21 14:48:12 +01:00
Carl Ritson 62abc8c200 [AMDGPU] Set GFX11 null export target based on export attributes
If shader only has depth exports use MRTZ otherwise use MRT0.

Differential Revision: https://reviews.llvm.org/D128185
2022-06-21 09:40:31 +01:00
Kazu Hirata 7a47ee51a1 [llvm] Don't use Optional::getValue (NFC) 2022-06-20 22:45:45 -07:00
Kazu Hirata 064a08cd95 Don't use Optional::hasValue (NFC) 2022-06-20 20:05:16 -07:00
Ruiling Song 732eed40fd [AMDGPU] Mark GFX11 dual source blend export as strict-wqm
The instructions that generate the source of dual source blend export
should run in strict-wqm. That is if any lane in a quad is active,
we need to enable all four lanes of that quad to make the shuffling
operation before exporting to dual source blend target work correctly.

Differential Revision: https://reviews.llvm.org/D127981
2022-06-20 21:58:12 +01:00
Piotr Sobczak 29621c13ef [AMDGPU] Tag GFX11 LDS loads as using strict_wqm
LDS_PARAM_LOAD and LDS_DIRECT_LOAD use EXEC per quad
(if any pixel is enabled in the quad, data is written
to all 4 pixels/threads in the quad).

Tag LDS_PARAM_LOAD and LDS_DIRECT_LOAD as using strict_wqm
to enforce this and avoid lane clobbering issues.
Note that only the instruction itself is tagged.
The implicit uses of these do not need to be set WQM.
The reduces unnecessary WQM calculation of M0.

Differential Revision: https://reviews.llvm.org/D127977
2022-06-20 21:58:12 +01:00
Jay Foad 13107c2770 [AMDGPU] Add support for GFX11 LDSDIR hazards
Detect LDS direct WAR/WAW hazards and compute values for
wait_vdst (va_vdst) parameter.  Where appropriate this
raises wait_vdst from the default 0 to allow concurrent
issue of LDS direct with VALU execution.

Also detect LDS direct versus VMEM source VGPR hazards
and insert vm_vsrc=0 waits using s_waitcnt_depctr.

Differential Revision: https://reviews.llvm.org/D127963
2022-06-20 21:58:12 +01:00
Guillaume Chatelet d154d0ac06 [NFC] Simplify code 2022-06-20 15:15:52 +00:00
Jay Foad ba306216d2 [AMDGPU] Reorder cases. NFC. 2022-06-20 14:30:17 +01:00
Jay Foad d7762a3b36 [AMDGPU] Increase instruction cache line size to 128 bytes for GFX11
Differential Revision: https://reviews.llvm.org/D128189
2022-06-20 14:25:10 +01:00
Jay Foad b8e32e808d [AMDGPU] Remove a duplicate atomic fadd pattern
This was left over after D124538.
2022-06-20 14:08:57 +01:00
Dmitry Preobrazhensky 485e8b4f63 [AMDGPU][MC][GFX11] Correct disassembly of DPP variants of VOPC64 opcodes
Fix bugs https://github.com/llvm/llvm-project/issues/56091, https://github.com/llvm/llvm-project/issues/56065.

Differential Revision: https://reviews.llvm.org/D128075
2022-06-20 14:23:07 +03:00
Mirko Brkusanin 6cae753bf4 [AMDGPU][GlobalISel] Legalize G_FSUB for s16
Differential Revision: https://reviews.llvm.org/D128066
2022-06-20 12:25:49 +02:00
Guillaume Chatelet f1255186c7 [NFC][Alignment] Remove max functions between Align and MaybeAlign
`llvm::max(Align, MaybeAlign)` and `llvm::max(MaybeAlign, Align)` are
not used often enough to be required. They also make the code more opaque.

Differential Revision: https://reviews.llvm.org/D128121
2022-06-20 08:37:48 +00:00
Jay Foad 7050d5b98c [AMDGPU] Limit GFX11 to using 128 VGPRs
This is a temporary measure to avoid generating incorrect code until the
compiler understands the new way that GFX11 encodes 16-bit operands in
VOP instructions.

Differential Revision: https://reviews.llvm.org/D128054
2022-06-20 07:58:27 +01:00
Kazu Hirata 129b531c9c [llvm] Use value_or instead of getValueOr (NFC) 2022-06-18 23:07:11 -07:00
Kazu Hirata 437f960062 [llvm] Call *set::insert without checking membership first (NFC) 2022-06-18 10:22:05 -07:00
Kazu Hirata 4271a1ff33 [llvm] Call *set::insert without checking membership first (NFC) 2022-06-18 10:17:22 -07:00
Joe Nash 2a68364745 [AMDGPU] gfx11 waitcnt support for VINTERP and LDSDIR instructions
Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D127781
2022-06-17 09:30:37 -04:00
Joe Nash 20d20156f4 [AMDGPU] gfx11 VINTERP intrinsics and ISel support
Depends on D127664

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D127756
2022-06-17 09:16:59 -04:00
Joe Nash 6d5d8b1313 [AMDGPU] gfx11 ldsdir intrinsics and ISel
Reviewed By: #amdgpu, rampitec

Differential Revision: https://reviews.llvm.org/D127664
2022-06-17 09:03:16 -04:00
LiaoChunyu 6181c19283 [AMDGPU][NFC] Remove isConstantAddr
fix isConstantAddr defined but not used

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D127959
2022-06-17 08:49:29 +08:00
Joe Nash 2d43de13df [AMDGPU] gfx11 new dot instruction codegen support
Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D127904
2022-06-16 14:19:34 -04:00
Jay Foad 7e681ef35e [AMDGPU] Add GFX11 codegen for llvm.amdgcn.mov.dpp8
Differential Revision: https://reviews.llvm.org/D127980
2022-06-16 19:44:28 +01:00
Jay Foad 36ec1fcaac [AMDGPU] Add GFX11 llvm.amdgcn.ds.add.gs.reg.rtn / llvm.amdgcn.ds.sub.gs.reg.rtn intrinsics
Differential Revision: https://reviews.llvm.org/D127955
2022-06-16 18:23:14 +01:00
Jay Foad c155a944fb [AMDGPU] GFX11 CodeGen support for MIMG instructions
This includes:
- New llvm.amdgcn.image.msaa.load.* intrinsics
- NSA changes, because MIMG-NSA is now limited to 3 dwords
- Split CD forms of IMAGE_SAMPLE instructions out into separate
  test files since they are no longer supported in GFX11

Differential Revision: https://reviews.llvm.org/D127837
2022-06-16 18:23:14 +01:00
Jay Foad 445a483b41 [AMDGPU] Add new GFX11 intrinsic llvm.amdgcn.exp.row
Differential Revision: https://reviews.llvm.org/D127671
2022-06-16 18:23:14 +01:00
Dmitry Preobrazhensky b26afab9d1 [AMDGPU][MC][GFX11] Correct src0 for dpp variants of v_cvt_*_e64
Differential Revision: https://reviews.llvm.org/D127847
2022-06-16 13:48:43 +03:00
David Stuttard 77851cc1cf [AMDGPU] Change use null for dead sdst to be gfx1030+
Pre gfx1030 null for sdst is different.
c97436f8b6 [AMDGPU] Use null for dead sdst operand - requires a change to make
it not apply to pre gfx1030

Differential Revision: https://reviews.llvm.org/D127869
2022-06-16 10:39:06 +01:00
Jay Foad 9dff14be9e [AMDGPU] Add support for GFX11 hazards
Add support for partial stall over EXEC hazard and trans use hazard.

Differential Revision: https://reviews.llvm.org/D127872
2022-06-16 08:15:21 +01:00
Austin Kerbow 4bba82116a [AMDGPU] Fix buildbot failures after 48ebc1af29
Some buildbots (lto, windows) were failing due to some function reference
variables being improperly initialized.
2022-06-15 00:23:30 -07:00
Austin Kerbow 48ebc1af29 [AMDGPU] Add more expressive sched_barrier controls
The sched_barrier builtin allow the scheduler's behavior to be shaped by users
when very specific codegen is needed in order to create highly optimized code.
This patch adds more granular control over the types of instructions that are
allowed to be reordered with respect to one or multiple sched_barriers. A mask
is used to specify groups of instructions that should be allowed to be scheduled
around a sched_barrier. The details about this mask may be used can be found in
llvm/include/llvm/IR/IntrinsicsAMDGPU.td.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D127123
2022-06-14 22:03:05 -07:00
Austin Kerbow bd9eed3aec [AMDGPU] Add isMFMA helper function. NFC
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D127124
2022-06-14 22:01:49 -07:00
Joe Nash 989bd57f98 [AMDGPU] gfx11 support add_f16
The instruction was skipped in the earlier large patch adding
VOP2, https://reviews.llvm.org/D126917.

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D127697
2022-06-14 08:59:45 -04:00
Dmitry Preobrazhensky 365d827f65 [AMDGPU][MC][GFX11] Correct ds_swizzle_b32
Enable offset parsing.

Differential Revision: https://reviews.llvm.org/D127404
2022-06-14 12:58:03 +03:00
Stanislav Mekhanoshin c97436f8b6 [AMDGPU] Use null for dead sdst operand
Differential Revision: https://reviews.llvm.org/D127542
2022-06-13 14:41:40 -07:00
Stanislav Mekhanoshin cb9ae93712 [AMDGPU] Define SGPR_NULL64 register. NFCI.
On gfx10+ null register can be used as both 32 and 64 bit operand.
Define a 64 bit version of the register to use during codegen.

Differential Revision: https://reviews.llvm.org/D127527
2022-06-13 13:23:33 -07:00
Jay Foad bfcfd53b92 [AMDGPU] Add GFX11 llvm.amdgcn.permlane64 intrinsic
Compared to permlane16, permlane64 has no BC input because it has no
boundary conditions, no fi input because the instruction acts as if FI
were always enabled, and no OLD input because it always writes to every
active lane.

Also use the new intrinsic in the atomic optimizer pass.

Differential Revision: https://reviews.llvm.org/D127662
2022-06-13 21:12:11 +01:00
Jay Foad 7b9f620e78 [AMDGPU] Work around GFX11 flat scratch SVS swizzling bug
Differential Revision: https://reviews.llvm.org/D127635
2022-06-13 21:00:42 +01:00
Jay Foad d943c51465 [AMDGPU] Fix GFX11 codegen for V_MAD_U64_U32 and V_MAD_I64_I32
GFX11 uses different pseudos for these because of a new constraint
on which operands' registers can overlap.

Differential Revision: https://reviews.llvm.org/D127659
2022-06-13 20:59:18 +01:00
Stanislav Mekhanoshin 0f81830632 [AMDGPU] Make temp vgpr selection stable in indirectCopyToAGPR
This uses rotating reminder of division by 3 to select another
temp vgpr each next time in a sequence of several agpr copies.
Therefore, temp vgpr selection depends on the generated agpr
number. This number could change with any unrelated change to
the register definitions.

Stabilize the selection by using a real agpr number.

Differential Revision: https://reviews.llvm.org/D127524
2022-06-13 09:39:46 -07:00
Fangrui Song adf4142f76 [MC] De-capitalize SwitchSection. NFC
Add SwitchSection to return switchSection. The API will be removed soon.
2022-06-10 22:50:55 -07:00
Jay Foad ff85d61a6e Update *_TMPRING_SIZE.WAVESIZE for GFX11
The encoding of COMPUTE_TMPRING_SIZE.WAVESIZE and
SPI_TMPRING_SIZE.WAVESIZE has changed in GFX11: it is now in units
of 64 dwords instead of 256 dwords, and the field has been widened
from 13 bits to 15 bits.

Depends on D126989

Reviewed By: rampitec, arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D127248
2022-06-10 13:24:00 -04:00
Joe Nash ea3c9a87d3 [AMDGPU] gfx11 add bits to COMPUTE_PGM_RSRC3
Contributors:
Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>

Patch 21/N for upstreaming of AMDGPU gfx11 architecture

Depends on D127143

Reviewed By: rampitec, #amdgpu, kzhuravl

Differential Revision: https://reviews.llvm.org/D127241
2022-06-10 13:07:14 -04:00
Joe Nash 78d8fdb88b [AMDGPU] NFC. Comment change to GFX10+ in AsmParser 2022-06-10 12:34:07 -04:00
Joe Nash 9175ab7746 [AMDGPU] gfx11 SRC_POPS_EXISTING_WAVE_ID is removed 2022-06-10 12:32:22 -04:00
Joe Nash fd3304ef85 [AMDGPU] gfx11 EXECZ and VCCZ are no longer allowed to be used as
sources to SALU and VALU instructions.

Contributors:
Baptiste Saleil <baptiste.saleil@amd.com>

Patch 20/N for upstreaming of AMDGPU gfx11 architecture

Depends on D126989

Reviewed By: rampitec, foad, #amdgpu

Differential Revision: https://reviews.llvm.org/D127143
2022-06-10 10:03:43 -04:00
Jay Foad 4b2d70fa5b [AMDGPU] Basic implementation of isExtractSubvectorCheap
Add a basic implementation of isExtractSubvectorCheap that only
considers extracts at offset 0.

Differential Revision: https://reviews.llvm.org/D127385
2022-06-10 14:43:07 +01:00
Ivan Kosarev 60d6fbb621 [AMDGPU][GFX9][GFX10] Support base+soffset+offset SMEM atomics.
Resolves a part of
https://github.com/llvm/llvm-project/issues/38652

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D127314
2022-06-10 13:22:41 +01:00
Dmitry Preobrazhensky f8aba9995a [AMDGPU][MC][GFX1013] Enable image_msaa_load
Differential Revision: https://reviews.llvm.org/D127198
2022-06-10 13:42:05 +03:00
Jay Foad 6c372daa84 [AMDGPU] New GFX11 intrinsic llvm.amdgcn.s.sendmsg.rtn
Add new intrinsic and codegen support for the s_sendmsg_rtn_b32 and
s_sendmsg_rtn_b64 instructions.

Differential Revision: https://reviews.llvm.org/D127315
2022-06-10 08:15:23 +01:00
Jay Foad b0a3849439 [AMDGPU] Update dlc usage for GFX11
In GFX10 dlc controlled L1 cache bypass. In GFX11 it has been repurposed
to control MALL NOALLOC, and glc controls L1 as well as L0 cache bypass.

Update the documentation and SIMemoryLegalizer accordingly. Set dlc for
nontemporal and volatile accesses.

Differential Revision: https://reviews.llvm.org/D127405
2022-06-10 08:10:34 +01:00
Jay Foad ffe86e3bdd [AMDGPU] Update SIInsertHardClauses for GFX11
Changes for GFX11:
- Clauses may not mix instructions of different types, and there are
  more types. For example image instructions with and without a sampler
  are now different types.
- The max size of a clause is explicitly documented as 63 instructions.
  Previously it was implicitly assumed to be 64. This is such a tiny
  difference that it does not seem worth making it conditional on the
  subtarget.
- It can be beneficial to clause stores as well as loads.

Differential Revision: https://reviews.llvm.org/D127391
2022-06-09 21:29:56 +01:00
Joe Nash be1082c6d5 [AMDGPU] gfx11 VOPC instructions
Supports encoding existing instrutions on gfx11 and MC support for the new VOPC
dpp instructions.

Patch 19/N for upstreaming of AMDGPU gfx11 architecture

Depends on D126978

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D126989
2022-06-09 15:22:42 -04:00