Commit Graph

7069 Commits

Author SHA1 Message Date
jeff 8a12f20ef7 [AMDGPU] Update the mechanism used to check for cycles and add eges in power-sched mutation 2022-07-14 16:24:13 -07:00
Alexander Timofeev 2e29b0138c [AMDGPU] Lowering VGPR to SGPR copies to v_readfirstlane_b32 if profitable.
Since the divergence-driven instruction selection has been enabled for AMDGPU,
 all the uniform instructions are expected to be selected to SALU form, except those not having one.
 VGPR to SGPR copies appear in MIR to connect values producers and consumers. This change implements an algorithm
 that evolves a reasonable tradeoff between the profit achieved from keeping the uniform instructions in SALU form
 and overhead introduced by the data transfer between the VGPRs and SGPRs.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D128252
2022-07-14 23:59:02 +02:00
Jay Foad e45aa230ad [AMDGPU] Update LiveVariables after killing an immediate def
D114999 added code to kill an immediate def if it was folded into its
only use by convertToThreeAddress. This patch updates LiveVariables when
that happens in order to fix verification failures exposed by D129213.

Differential Revision: https://reviews.llvm.org/D129661
2022-07-14 10:49:41 +01:00
David Green 3e0bf1c7a9 [CodeGen] Move instruction predicate verification to emitInstruction
D25618 added a method to verify the instruction predicates for an
emitted instruction, through verifyInstructionPredicates added into
<Target>MCCodeEmitter::encodeInstruction. This is a very useful idea,
but the implementation inside MCCodeEmitter made it only fire for object
files, not assembly which most of the llvm test suite uses.

This patch moves the code into the <Target>_MC::verifyInstructionPredicates
method, inside the InstrInfo.  The allows it to be called from other
places, such as in this patch where it is called from the
<Target>AsmPrinter::emitInstruction methods which should trigger for
both assembly and object files. It can also be called from other places
such as verifyInstruction, but that is not done here (it tends to catch
errors earlier, but in reality just shows all the mir tests that have
incorrect feature predicates). The interface was also simplified
slightly, moving computeAvailableFeatures into the function so that it
does not need to be called externally.

The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently
show errors in the test-suite, so have been disabled with FIXME
comments.

Recommitted with some fixes for the leftover MCII variables in release
builds.

Differential Revision: https://reviews.llvm.org/D129506
2022-07-14 09:33:28 +01:00
Jannik Silvanus e5c4cde451 [AMDGPU] SIMachineScheduler: Add support for several MachineScheduler features
The SI machine scheduler inherits from ScheduleDAGMI.
This patch adds support for a few features that are implemented
in ScheduleDAGMI (or its base classes) that were missing so far
because their support is implemented in overridden functions.

* Support cl::opt -view-misched-dags
  This option allows to open a graphical window of the scheduling DAG.

* Support cl::opt -misched-print-dags
  This option allows to print the scheduling DAG in text form.

* After constructing the scheduling DAG, call postprocessDAG()
  to apply any registered DAG mutations.
  Note that currently there are no mutations defined in AMDGPUTargetMachine.cpp
  in case SIScheduler is used.
  Still add this to avoid surprises in the future in case mutations are added.

Differential Revision: https://reviews.llvm.org/D128808
2022-07-14 09:45:31 +02:00
Kazu Hirata 611ffcf4e4 [llvm] Use value instead of getValue (NFC) 2022-07-13 23:11:56 -07:00
David Green 95252133e1 Revert "Move instruction predicate verification to emitInstruction"
This reverts commit e2fb8c0f4b as it does
not build for Release builds, and some buildbots are giving more warning
than I saw locally. Reverting to fix those issues.
2022-07-13 13:28:11 +01:00
David Green e2fb8c0f4b Move instruction predicate verification to emitInstruction
D25618 added a method to verify the instruction predicates for an
emitted instruction, through verifyInstructionPredicates added into
<Target>MCCodeEmitter::encodeInstruction. This is a very useful idea,
but the implementation inside MCCodeEmitter made it only fire for object
files, not assembly which most of the llvm test suite uses.

This patch moves the code into the <Target>_MC::verifyInstructionPredicates
method, inside the InstrInfo.  The allows it to be called from other
places, such as in this patch where it is called from the
<Target>AsmPrinter::emitInstruction methods which should trigger for
both assembly and object files. It can also be called from other places
such as verifyInstruction, but that is not done here (it tends to catch
errors earlier, but in reality just shows all the mir tests that have
incorrect feature predicates). The interface was also simplified
slightly, moving computeAvailableFeatures into the function so that it
does not need to be called externally.

The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently
show errors in the test-suite, so have been disabled with FIXME
comments.

Differential Revision: https://reviews.llvm.org/D129506
2022-07-13 12:53:32 +01:00
Jay Foad 5d41fe0768 [AMDGPU] SILowerControlFlow uses LiveIntervals
The availability of LiveIntervals affects kill flags in the output, so
declare the use to avoid strange effects where the output of this pass
is different depending on what other passes are scheduled after it.

Differential Revision: https://reviews.llvm.org/D129555
2022-07-12 16:53:53 +01:00
Piotr Sobczak 2bd8e74b94 [AMDGPU] Fix bitcast v4i64/v16i16
Fix a regression introduced in D128865.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D129375
2022-07-11 22:27:52 +02:00
NAKAMURA Takumi 393e12bddd R600ISelLowering.h: Silence a warning. [-Warray-parameter]
FIXME: Could it be rewritten with llvm::ArrayRef ?
2022-07-10 18:29:55 +09:00
David Blaikie 9008d0a38e Fix -Warray-parameter warning
Remove the bound in the definition, since it's not guaranteed/could
provide a false sense of security (I'd be inclined to go further and
change this to a pointer parameter, since that's what it really is - but
figured I'd preserve some of the author's intent here)
2022-07-09 17:04:01 +00:00
serge-sans-paille e1272ab6ec [AMDGPU][NFC] Harmonize decl&def of R600TargetLowering::OptimizeSwizzle
The freshly baked -Warray-parameter warning discovered an inconsistency in
argument declaration, use the stricter one.

This fixes build issues like https://lab.llvm.org/buildbot#builders/18/builds/5305
2022-07-09 09:07:31 +02:00
Abinav Puthan Purayil 17a81ecf85 [AMDGPU] Use the HasNoUse predicate for no-ret atomic op selection
This change replaces the C++ predicates with the HasNoUse builtin
predicate that would enable the no-ret atomic op selection in
GlobalISel.

Differential Revision: https://reviews.llvm.org/D125213
2022-07-08 09:47:33 +05:30
Abinav Puthan Purayil 7504c7a877 [AMDGPU] Use AddedComplexity for ret and noret atomic ops selection
This patch removes the predicate for return atomic ops and uses
AddedComplexity to distinguish its selection from its no return variant.
This will produce better matchers that doesn't unnecessarily check for
the negated predicate if the initial predicate failed. Also, it
simplifies the enabling of no return atomic ops selection in GlobalISel.

Differential Revision: https://reviews.llvm.org/D128241
2022-07-08 09:47:33 +05:30
Austin Kerbow 6817031d0b [AMDGPU] Disable FillMFMAShadowMutation by default
Disable amdgpu mfma power sched.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D129172
2022-07-07 09:34:45 -07:00
Shilei Tian 1023ddaf77 [LLVM] Add the support for fmax and fmin in atomicrmw instruction
This patch adds the support for `fmax` and `fmin` operations in `atomicrmw`
instruction. For now (at least in this patch), the instruction will be expanded
to CAS loop. There are already a couple of targets supporting the feature. I'll
create another patch(es) to enable them accordingly.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D127041
2022-07-06 10:57:53 -04:00
Thomas Symalla 86bd7e2065 [NFC][AMDGPU] Cleanup the SIOptimizeExecMasking pass.
This patch removes a bit of code duplication and
moves the v_cmpx optimization out of the
runOnMachineFunction pass.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D129086
2022-07-06 11:03:03 +02:00
Carl Ritson 8bc5e7ac51 [AMDGPU] Additional liveness tests for si-optimize-exec-masking-pre-ra
Merge tests and fixes from D128110 and D128315 on top of already
committed D128800.

Original author: arsenm

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D128882
2022-07-06 15:05:32 +09:00
Jay Foad 4dbc2876cf [AMDGPU] GFX11 trivial NFC tweaks
A few miscellaneous comment, whitespace and indentation tweaks.
2022-07-05 17:20:17 +01:00
Jay Foad 12fd00ee17 [AMDGPU] Add patterns for GFX11 v_minmax and v_maxmin instructions
Differential Revision: https://reviews.llvm.org/D128445
2022-07-05 16:07:47 +01:00
Joe Nash 0483c91eee [AMDGPU] gfx11 CodeGen for new DPP instructions
Modifies the GCNDPPCombine pass to enable DPP formation for the new DPP
instruction in gfx11, namely VOP3 encoded instructions with DPP and VOPC
with DPP.

Depends on D128656

Reviewed By: #amdgpu, rampitec

Differential Revision: https://reviews.llvm.org/D128682
2022-07-05 10:17:59 -04:00
Joe Nash d1af09ad96 [AMDGPU] gfx11 Generate VOPD Instructions
We form VOPD  instructions in the GCNCreateVOPD pass by combining
back-to-back component instructions. There are strict register
constraints for creating a legal VOPD, namely that the matching operands
(e.g. src0x and src0y, src1x and src1y) must be in different register
banks. We add a PostRA scheduler
mutation to put possible VOPD components back-to-back.

Depends on D128442, D128270

Reviewed By: #amdgpu, rampitec

Differential Revision: https://reviews.llvm.org/D128656
2022-07-05 09:18:19 -04:00
Ivan Kosarev 4696a33dfa [AMDGPU][NFC] Refine matching SMRD offsets.
Tell the matcher what we are looking for instead of matching everything
and then discarding the result if doesn't fit.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D128171
2022-07-05 14:07:22 +01:00
Ivan Kosarev 8cd79bc12c [AMDGPU][GlobalISel] Support register offsets for SMRDs.
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D128836
2022-07-05 13:41:06 +01:00
Thomas Symalla 04c5fed5e0 [NFC] Fix wrong comment. 2022-07-05 13:37:44 +02:00
Nikita Popov 8e70258b18 [AMDGPUCodeGenPrepare] Check result of ConstantFoldBinaryOpOperands()
This function will become fallible once we don't support constant
expressions for all binops, so make sure to check the result.
2022-07-04 14:20:23 +02:00
Mirko Brkusanin 2208342c9b [AMDGPU][GlobalISel] Always use VGPR bank for G_FCMP
Differential Revision: https://reviews.llvm.org/D128980
2022-07-01 15:03:37 +02:00
Piotr Sobczak b6ef36a1c4 [AMDGPU] Update WMMA intrinsics with explicit f16 types
Update intrinsics to use n x f16 and n x i16 instead
of 32-bit types. This may avoid the need for a bitcast
and is probably less confusing.

Depends on making v16f16 and v16i16 types legal.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D128951
2022-07-01 08:55:25 +02:00
Piotr Sobczak bd675af2a2 [AMDGPU] Make v16i16/v16f16 legal
There are upcoming intrinsics to use the new types.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D128865
2022-06-30 23:08:40 +02:00
Jay Foad 0f94d2b385 [AMDGPU] GFX11: automatically release VGPRs at the end of the shader
GFX11 has a new message type MSG_DEALLOC_VGPRS which can be used to
release a shader's VGPRs. Sending this at the end of a shader (just
before the s_endpgm) can help overall system performance in cases where
the s_endpgm would have to wait for outstanding VMEM stores to complete
before releasing the VGPRs.

Differential Revision: https://reviews.llvm.org/D128442
2022-06-30 20:55:14 +01:00
Piotr Sobczak 4874838a63 [AMDGPU] gfx11 WMMA instruction support
gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate)
instructions.

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D128756
2022-06-30 11:13:45 -04:00
Carl Ritson d0f6641615 [AMDGPU] Fix liveness for loops in si-optimize-exec-masking-pre-ra
Follow up to D127894, new liveness update code needs to handle
the case where S_ANDN2 input must be extended through loops when
V_CNDMASK_B32 has been hoisted.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D128800
2022-06-30 15:26:50 +09:00
Jay Foad cfb7ffdec0 [AMDGPU] New AMDGPUInsertDelayAlu pass
Differential Revision: https://reviews.llvm.org/D128270
2022-06-29 21:30:20 +01:00
Matt Arsenault 0bdaef38c9 AMDGPU: Add gfx11 feature to force initializing 16 input SGPRs
The total user+system SGPR count needs to be padded out to 16 if fewer
inputs are enabled.
2022-06-29 14:52:19 -04:00
Matt Arsenault ffd6aaf5b6 AMDGPU: Make packed 32-bit instructions rematerializable 2022-06-29 11:57:54 -04:00
Matt Arsenault 4c400dc103 AMDGPU: Make 16-bit pk instructions rematerializable 2022-06-29 11:57:53 -04:00
Matt Arsenault da6d7728d4 AMDGPU: Mark more instructions as rematerializable
D106023 excluded 16-bit instructions from rematerialization, with the
justification that we can't rematerialize instructions that preserve
the high bits (plus the instructions which do are a confusing mess
between different subtargets). This doesn't make sense to me as a
problem since cases where we would rely on the high bit behavior would
still need to be represented as a register value constraint with a
tied operand. It's not a hidden side effect and should still be
rematerializable.
2022-06-29 11:19:15 -04:00
Matt Arsenault d342d130da AMDGPU: Use isMeta flags on pseudoinstructions 2022-06-29 10:31:29 -04:00
Stanislav Mekhanoshin 21895c6b50 [AMDGPU] Relax verification of soffset in scalar stores
It must use m0 only on GFX8. Later chips can use ang SGPR.

Differential Revision: https://reviews.llvm.org/D128765
2022-06-28 16:10:08 -07:00
Jay Foad 3fbc945c3a [AMDGPU] llvm.amdgcn.exp.compr is not supported on GFX11
Differential Revision: https://reviews.llvm.org/D128259
2022-06-28 14:48:25 +01:00
Joe Nash f1cfaa956d [AMDGPU] Use GFX11 S_PACK_HL instruction in more cases
Differential Revision: https://reviews.llvm.org/D128527
2022-06-28 14:35:19 +01:00
Jay Foad b5818e4eb4 [AMDGPU] Cluster stores as well as loads for GFX11
Differential Revision: https://reviews.llvm.org/D128517
2022-06-27 16:41:41 +01:00
Jay Foad 77e63b25f9 [AMDGPU] Fix assertion failure on mad with negative immediate addend
Without this, the new test case would fail with:

AMDGPUInstPrinter.cpp:545: void llvm::AMDGPUInstPrinter::printImmediate64(uint64_t, const llvm::MCSubtargetInfo &, llvm::raw_ostream &): Assertion `isUInt<32>(Imm) || Imm == 0x3fc45f306dc9c882' failed.

Differential Revision: https://reviews.llvm.org/D128435
2022-06-27 09:49:20 +01:00
Kazu Hirata a7938c74f1 [llvm] Don't use Optional::hasValue (NFC)
This patch replaces Optional::hasValue with the implicit cast to bool
in conditionals only.
2022-06-25 21:42:52 -07:00
Kazu Hirata 3b7c3a654c Revert "Don't use Optional::hasValue (NFC)"
This reverts commit aa8feeefd3.
2022-06-25 11:56:50 -07:00
Kazu Hirata aa8feeefd3 Don't use Optional::hasValue (NFC) 2022-06-25 11:55:57 -07:00
Min-Yih Hsu 97579dcc6d [MCA] Introducing incremental SourceMgr and resumable pipeline
The new resumable mca::Pipeline capability introduced in this patch
allows users to save the current state of pipeline and resume from the
very checkpoint.
It is better (but not require) to use with the new IncrementalSourceMgr,
where users can add mca::Instruction incrementally rather than having a
fixed number of instructions ahead-of-time.

Note that we're using unit tests to test these new features. Because
integrating them into the `llvm-mca` tool will make too many churns.

Differential Revision: https://reviews.llvm.org/D127083
2022-06-24 15:39:51 -07:00
Joe Nash 07b7fada73 [AMDGPU] gfx11 VOPD instructions MC support
VOPD is a new encoding for dual-issue instructions for use in wave32.
This patch includes MC layer support only.

A VOPD instruction is constituted of an X component (for which there are
13 possible opcodes) and a Y component (for which there are the 13 X
opcodes plus 3 more). Most of the complexity in defining and parsing
a VOPD operation arises from the possible different total numbers of
operands and deferred parsing of certain operands depending on the
constituent X and Y opcodes.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D128218
2022-06-24 11:08:39 -04:00
Konstantin Zhuravlyov 7736ce1c56 AMDGPU: Clear kill flags when optimizing vcmp save exec sequence
It was causing bad machine code for several blender scenes:
  *** Bad machine code: Using an undefined physical register ***
  - function:    kernel_holdout_emission_blurring_pathtermination_ao
  - basic block: %bb.28 if.end40.i (0x7f84861a2320)
  - instruction: V_CMPX_EQ_U32_nosdst_e64 0, $vgpr3, implicit-def $exec, implicit $exec
  - operand 1:   $vgpr3

Differential Revision: https://reviews.llvm.org/D127768
2022-06-24 11:30:22 -04:00