llvm-project

Commit Graph

Author	SHA1	Message	Date
jeff	8a12f20ef7	[AMDGPU] Update the mechanism used to check for cycles and add edges in power-sched mutation	2022-07-14 16:24:13 -07:00
Alexander Timofeev	2e29b0138c	[AMDGPU] Lowering VGPR to SGPR copies to v_readfirstlane_b32 if profitable. Since the divergence-driven instruction selection has been enabled for AMDGPU, all the uniform instructions are expected to be selected to SALU form, except those not having one. VGPR to SGPR copies appear in MIR to connect values producers and consumers. This change implements an algorithm that evolves a reasonable tradeoff between the profit achieved from keeping the uniform instructions in SALU form and overhead introduced by the data transfer between the VGPRs and SGPRs. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D128252	2022-07-14 23:59:02 +02:00
Jay Foad	e45aa230ad	[AMDGPU] Update LiveVariables after killing an immediate def D114999 added code to kill an immediate def if it was folded into its only use by convertToThreeAddress. This patch updates LiveVariables when that happens in order to fix verification failures exposed by D129213. Differential Revision: https://reviews.llvm.org/D129661	2022-07-14 10:49:41 +01:00
David Green	3e0bf1c7a9	[CodeGen] Move instruction predicate verification to emitInstruction D25618 added a method to verify the instruction predicates for an emitted instruction, through verifyInstructionPredicates added into <Target>MCCodeEmitter::encodeInstruction. This is a very useful idea, but the implementation inside MCCodeEmitter made it only fire for object files, not assembly which most of the llvm test suite uses. This patch moves the code into the <Target>_MC::verifyInstructionPredicates method, inside the InstrInfo. The allows it to be called from other places, such as in this patch where it is called from the <Target>AsmPrinter::emitInstruction methods which should trigger for both assembly and object files. It can also be called from other places such as verifyInstruction, but that is not done here (it tends to catch errors earlier, but in reality just shows all the mir tests that have incorrect feature predicates). The interface was also simplified slightly, moving computeAvailableFeatures into the function so that it does not need to be called externally. The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently show errors in the test-suite, so have been disabled with FIXME comments. Recommitted with some fixes for the leftover MCII variables in release builds. Differential Revision: https://reviews.llvm.org/D129506	2022-07-14 09:33:28 +01:00
Jannik Silvanus	e5c4cde451	[AMDGPU] SIMachineScheduler: Add support for several MachineScheduler features The SI machine scheduler inherits from ScheduleDAGMI. This patch adds support for a few features that are implemented in ScheduleDAGMI (or its base classes) that were missing so far because their support is implemented in overridden functions. * Support cl::opt -view-misched-dags This option allows to open a graphical window of the scheduling DAG. * Support cl::opt -misched-print-dags This option allows to print the scheduling DAG in text form. * After constructing the scheduling DAG, call postprocessDAG() to apply any registered DAG mutations. Note that currently there are no mutations defined in AMDGPUTargetMachine.cpp in case SIScheduler is used. Still add this to avoid surprises in the future in case mutations are added. Differential Revision: https://reviews.llvm.org/D128808	2022-07-14 09:45:31 +02:00
Kazu Hirata	611ffcf4e4	[llvm] Use value instead of getValue (NFC)	2022-07-13 23:11:56 -07:00
David Green	95252133e1	Revert "Move instruction predicate verification to emitInstruction" This reverts commit `e2fb8c0f4b` as it does not build for Release builds, and some buildbots are giving more warning than I saw locally. Reverting to fix those issues.	2022-07-13 13:28:11 +01:00
David Green	e2fb8c0f4b	Move instruction predicate verification to emitInstruction D25618 added a method to verify the instruction predicates for an emitted instruction, through verifyInstructionPredicates added into <Target>MCCodeEmitter::encodeInstruction. This is a very useful idea, but the implementation inside MCCodeEmitter made it only fire for object files, not assembly which most of the llvm test suite uses. This patch moves the code into the <Target>_MC::verifyInstructionPredicates method, inside the InstrInfo. The allows it to be called from other places, such as in this patch where it is called from the <Target>AsmPrinter::emitInstruction methods which should trigger for both assembly and object files. It can also be called from other places such as verifyInstruction, but that is not done here (it tends to catch errors earlier, but in reality just shows all the mir tests that have incorrect feature predicates). The interface was also simplified slightly, moving computeAvailableFeatures into the function so that it does not need to be called externally. The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently show errors in the test-suite, so have been disabled with FIXME comments. Differential Revision: https://reviews.llvm.org/D129506	2022-07-13 12:53:32 +01:00
Jay Foad	5d41fe0768	[AMDGPU] SILowerControlFlow uses LiveIntervals The availability of LiveIntervals affects kill flags in the output, so declare the use to avoid strange effects where the output of this pass is different depending on what other passes are scheduled after it. Differential Revision: https://reviews.llvm.org/D129555	2022-07-12 16:53:53 +01:00
Piotr Sobczak	2bd8e74b94	[AMDGPU] Fix bitcast v4i64/v16i16 Fix a regression introduced in D128865. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D129375	2022-07-11 22:27:52 +02:00
NAKAMURA Takumi	393e12bddd	R600ISelLowering.h: Silence a warning. [-Warray-parameter] FIXME: Could it be rewritten with llvm::ArrayRef ?	2022-07-10 18:29:55 +09:00
David Blaikie	9008d0a38e	Fix -Warray-parameter warning Remove the bound in the definition, since it's not guaranteed/could provide a false sense of security (I'd be inclined to go further and change this to a pointer parameter, since that's what it really is - but figured I'd preserve some of the author's intent here)	2022-07-09 17:04:01 +00:00
serge-sans-paille	e1272ab6ec	[AMDGPU][NFC] Harmonize decl&def of R600TargetLowering::OptimizeSwizzle The freshly baked -Warray-parameter warning discovered an inconsistency in argument declaration, use the stricter one. This fixes build issues like https://lab.llvm.org/buildbot#builders/18/builds/5305	2022-07-09 09:07:31 +02:00
Abinav Puthan Purayil	17a81ecf85	[AMDGPU] Use the HasNoUse predicate for no-ret atomic op selection This change replaces the C++ predicates with the HasNoUse builtin predicate that would enable the no-ret atomic op selection in GlobalISel. Differential Revision: https://reviews.llvm.org/D125213	2022-07-08 09:47:33 +05:30
Abinav Puthan Purayil	7504c7a877	[AMDGPU] Use AddedComplexity for ret and noret atomic ops selection This patch removes the predicate for return atomic ops and uses AddedComplexity to distinguish its selection from its no return variant. This will produce better matchers that doesn't unnecessarily check for the negated predicate if the initial predicate failed. Also, it simplifies the enabling of no return atomic ops selection in GlobalISel. Differential Revision: https://reviews.llvm.org/D128241	2022-07-08 09:47:33 +05:30
Austin Kerbow	6817031d0b	[AMDGPU] Disable FillMFMAShadowMutation by default Disable amdgpu mfma power sched. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D129172	2022-07-07 09:34:45 -07:00
Shilei Tian	1023ddaf77	[LLVM] Add the support for fmax and fmin in atomicrmw instruction This patch adds the support for `fmax` and `fmin` operations in `atomicrmw` instruction. For now (at least in this patch), the instruction will be expanded to CAS loop. There are already a couple of targets supporting the feature. I'll create another patch(es) to enable them accordingly. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D127041	2022-07-06 10:57:53 -04:00
Thomas Symalla	86bd7e2065	[NFC][AMDGPU] Cleanup the SIOptimizeExecMasking pass. This patch removes a bit of code duplication and moves the v_cmpx optimization out of the runOnMachineFunction pass. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D129086	2022-07-06 11:03:03 +02:00
Carl Ritson	8bc5e7ac51	[AMDGPU] Additional liveness tests for si-optimize-exec-masking-pre-ra Merge tests and fixes from D128110 and D128315 on top of already committed D128800. Original author: arsenm Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D128882	2022-07-06 15:05:32 +09:00
Jay Foad	4dbc2876cf	[AMDGPU] GFX11 trivial NFC tweaks A few miscellaneous comment, whitespace and indentation tweaks.	2022-07-05 17:20:17 +01:00
Jay Foad	12fd00ee17	[AMDGPU] Add patterns for GFX11 v_minmax and v_maxmin instructions Differential Revision: https://reviews.llvm.org/D128445	2022-07-05 16:07:47 +01:00
Joe Nash	0483c91eee	[AMDGPU] gfx11 CodeGen for new DPP instructions Modifies the GCNDPPCombine pass to enable DPP formation for the new DPP instruction in gfx11, namely VOP3 encoded instructions with DPP and VOPC with DPP. Depends on D128656 Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D128682	2022-07-05 10:17:59 -04:00
Joe Nash	d1af09ad96	[AMDGPU] gfx11 Generate VOPD Instructions We form VOPD instructions in the GCNCreateVOPD pass by combining back-to-back component instructions. There are strict register constraints for creating a legal VOPD, namely that the matching operands (e.g. src0x and src0y, src1x and src1y) must be in different register banks. We add a PostRA scheduler mutation to put possible VOPD components back-to-back. Depends on D128442, D128270 Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D128656	2022-07-05 09:18:19 -04:00
Ivan Kosarev	4696a33dfa	[AMDGPU][NFC] Refine matching SMRD offsets. Tell the matcher what we are looking for instead of matching everything and then discarding the result if it doesn't fit. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D128171	2022-07-05 14:07:22 +01:00
Ivan Kosarev	8cd79bc12c	[AMDGPU][GlobalISel] Support register offsets for SMRDs. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D128836	2022-07-05 13:41:06 +01:00
Thomas Symalla	04c5fed5e0	[NFC] Fix wrong comment.	2022-07-05 13:37:44 +02:00
Nikita Popov	8e70258b18	[AMDGPUCodeGenPrepare] Check result of ConstantFoldBinaryOpOperands() This function will become fallible once we don't support constant expressions for all binops, so make sure to check the result.	2022-07-04 14:20:23 +02:00
Mirko Brkusanin	2208342c9b	[AMDGPU][GlobalISel] Always use VGPR bank for G_FCMP Differential Revision: https://reviews.llvm.org/D128980	2022-07-01 15:03:37 +02:00
Piotr Sobczak	b6ef36a1c4	[AMDGPU] Update WMMA intrinsics with explicit f16 types Update intrinsics to use n x f16 and n x i16 instead of 32-bit types. This may avoid the need for a bitcast and is probably less confusing. Depends on making v16f16 and v16i16 types legal. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D128951	2022-07-01 08:55:25 +02:00
Piotr Sobczak	bd675af2a2	[AMDGPU] Make v16i16/v16f16 legal There are upcoming intrinsics to use the new types. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D128865	2022-06-30 23:08:40 +02:00
Jay Foad	0f94d2b385	[AMDGPU] GFX11: automatically release VGPRs at the end of the shader GFX11 has a new message type MSG_DEALLOC_VGPRS which can be used to release a shader's VGPRs. Sending this at the end of a shader (just before the s_endpgm) can help overall system performance in cases where the s_endpgm would have to wait for outstanding VMEM stores to complete before releasing the VGPRs. Differential Revision: https://reviews.llvm.org/D128442	2022-06-30 20:55:14 +01:00
Piotr Sobczak	4874838a63	[AMDGPU] gfx11 WMMA instruction support gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D128756	2022-06-30 11:13:45 -04:00
Carl Ritson	d0f6641615	[AMDGPU] Fix liveness for loops in si-optimize-exec-masking-pre-ra Follow up to D127894, new liveness update code needs to handle the case where S_ANDN2 input must be extended through loops when V_CNDMASK_B32 has been hoisted. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D128800	2022-06-30 15:26:50 +09:00
Jay Foad	cfb7ffdec0	[AMDGPU] New AMDGPUInsertDelayAlu pass Differential Revision: https://reviews.llvm.org/D128270	2022-06-29 21:30:20 +01:00
Matt Arsenault	0bdaef38c9	AMDGPU: Add gfx11 feature to force initializing 16 input SGPRs The total user+system SGPR count needs to be padded out to 16 if fewer inputs are enabled.	2022-06-29 14:52:19 -04:00
Matt Arsenault	ffd6aaf5b6	AMDGPU: Make packed 32-bit instructions rematerializable	2022-06-29 11:57:54 -04:00
Matt Arsenault	4c400dc103	AMDGPU: Make 16-bit pk instructions rematerializable	2022-06-29 11:57:53 -04:00
Matt Arsenault	da6d7728d4	AMDGPU: Mark more instructions as rematerializable D106023 excluded 16-bit instructions from rematerialization, with the justification that we can't rematerialize instructions that preserve the high bits (plus the instructions which do are a confusing mess between different subtargets). This doesn't make sense to me as a problem since cases where we would rely on the high bit behavior would still need to be represented as a register value constraint with a tied operand. It's not a hidden side effect and should still be rematerializable.	2022-06-29 11:19:15 -04:00
Matt Arsenault	d342d130da	AMDGPU: Use isMeta flags on pseudoinstructions	2022-06-29 10:31:29 -04:00
Stanislav Mekhanoshin	21895c6b50	[AMDGPU] Relax verification of soffset in scalar stores It must use m0 only on GFX8. Later chips can use any SGPR. Differential Revision: https://reviews.llvm.org/D128765	2022-06-28 16:10:08 -07:00
Jay Foad	3fbc945c3a	[AMDGPU] llvm.amdgcn.exp.compr is not supported on GFX11 Differential Revision: https://reviews.llvm.org/D128259	2022-06-28 14:48:25 +01:00
Joe Nash	f1cfaa956d	[AMDGPU] Use GFX11 S_PACK_HL instruction in more cases Differential Revision: https://reviews.llvm.org/D128527	2022-06-28 14:35:19 +01:00
Jay Foad	b5818e4eb4	[AMDGPU] Cluster stores as well as loads for GFX11 Differential Revision: https://reviews.llvm.org/D128517	2022-06-27 16:41:41 +01:00
Jay Foad	77e63b25f9	[AMDGPU] Fix assertion failure on mad with negative immediate addend Without this, the new test case would fail with: AMDGPUInstPrinter.cpp:545: void llvm::AMDGPUInstPrinter::printImmediate64(uint64_t, const llvm::MCSubtargetInfo &, llvm::raw_ostream &): Assertion `isUInt<32>(Imm) || Imm == 0x3fc45f306dc9c882' failed. Differential Revision: https://reviews.llvm.org/D128435	2022-06-27 09:49:20 +01:00
Kazu Hirata	a7938c74f1	[llvm] Don't use Optional::hasValue (NFC) This patch replaces Optional::hasValue with the implicit cast to bool in conditionals only.	2022-06-25 21:42:52 -07:00
Kazu Hirata	3b7c3a654c	Revert "Don't use Optional::hasValue (NFC)" This reverts commit `aa8feeefd3`.	2022-06-25 11:56:50 -07:00
Kazu Hirata	aa8feeefd3	Don't use Optional::hasValue (NFC)	2022-06-25 11:55:57 -07:00
Min-Yih Hsu	97579dcc6d	[MCA] Introducing incremental SourceMgr and resumable pipeline The new resumable mca::Pipeline capability introduced in this patch allows users to save the current state of pipeline and resume from the very checkpoint. It is better (but not require) to use with the new IncrementalSourceMgr, where users can add mca::Instruction incrementally rather than having a fixed number of instructions ahead-of-time. Note that we're using unit tests to test these new features. Because integrating them into the `llvm-mca` tool will make too many churns. Differential Revision: https://reviews.llvm.org/D127083	2022-06-24 15:39:51 -07:00
Joe Nash	07b7fada73	[AMDGPU] gfx11 VOPD instructions MC support VOPD is a new encoding for dual-issue instructions for use in wave32. This patch includes MC layer support only. A VOPD instruction is constituted of an X component (for which there are 13 possible opcodes) and a Y component (for which there are the 13 X opcodes plus 3 more). Most of the complexity in defining and parsing a VOPD operation arises from the possible different total numbers of operands and deferred parsing of certain operands depending on the constituent X and Y opcodes. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D128218	2022-06-24 11:08:39 -04:00
Konstantin Zhuravlyov	7736ce1c56	AMDGPU: Clear kill flags when optimizing vcmp save exec sequence It was causing bad machine code for several blender scenes: * Bad machine code: Using an undefined physical register * - function: kernel_holdout_emission_blurring_pathtermination_ao - basic block: %bb.28 if.end40.i (0x7f84861a2320) - instruction: V_CMPX_EQ_U32_nosdst_e64 0, $vgpr3, implicit-def $exec, implicit $exec - operand 1: $vgpr3 Differential Revision: https://reviews.llvm.org/D127768	2022-06-24 11:30:22 -04:00


7069 Commits