llvm-project

Author	SHA1	Message	Date
Joe Nash	b982ba2a6e	[AMDGPU][GFX11] Use VGPR_32_Lo128 for VOP1,2,C Due to the encoding changes in GFX11, we had a hack in place that disables the use of VGPRs above 128. This patch removes the need for that hack. We introduce a new register class VGPR_32_Lo128 which is used for 16-bit operands of VOP1, VOP2, and VOPC instructions. This register class only has the low 128 VGPRs, but is otherwise identical to VGPR_32. Therefore, 16-bit VOP1, VOP2, and VOPC instructions are correctly limited to use the first 128 VGPRs, while the other instructions can freely use all 256. We introduce new pseudo-instructions used on GFX11 which have the suffix t16 (True 16) to use the VGPR_32_Lo128 register class. Reviewed By: foad, rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D133723	2022-09-20 09:56:28 -04:00
Ruiling Song	0404aafbe3	AMDGPU: Factor out hasDivergentBranch(). NFC This is helpful for detecting whether a block ends with divergent branch in passes before lowering the pseudo control flow instructions. Differential Revision: https://reviews.llvm.org/D133184	2022-09-14 13:27:21 +08:00
Matt Arsenault	7834194837	TableGen: Introduce generated getSubRegisterClass function Currently there isn't a generic way to get a smaller register class that can be produced from a subregister of a larger class. Replaces a manually implemented version for AMDGPU. This will be used to improve subregister support in the allocator.	2022-09-12 09:03:37 -04:00
Jay Foad	afa0ed33df	[AMDGPU] Fix shrinking of F16 FMA on newer subtargets D125803 introduced shrinking of F16 FMA to FMAAK/FMAMK in SIShrinkInstructions (useful on GFX10+ where VOP3 instructions may have a literal operand) but failed to handle the V_FMA_F16_gfx9_e64 form of the opcode which is used on GFX9+. Differential Revision: https://reviews.llvm.org/D133489	2022-09-08 16:41:04 +01:00
Kazu Hirata	21de2888a4	Use llvm::is_contained (NFC)	2022-08-27 09:53:11 -07:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Kazu Hirata	ba0407ba86	[llvm] Use range-based for loops (NFC)	2022-08-07 00:16:21 -07:00
Kazu Hirata	d0ec61c9ff	[Target] Remove unused forward declarations (NFC)	2022-08-07 00:16:16 -07:00
Jay Foad	c24d68fff1	[AMDGPU] Take advantage of VOP3 literals in convertToThreeAddress This improves a corner case where v_fmac can be converted to v_fma on GFX10+ even if it has a literal operand. Differential Revision: https://reviews.llvm.org/D130992	2022-08-02 17:27:11 +01:00
Stanislav Mekhanoshin	68901fdbeb	[AMDGPU] Consider S_SETPRIO a scheduling boundary The instruction is used to modify wave priority with the intent to affect VALU execution and currently we can reschedule VALU around it since that VALU does not have side effects. Differential Revision: https://reviews.llvm.org/D130654	2022-07-27 11:50:23 -07:00
Joe Nash	b28bb8cc9c	[AMDGPU] Remove old operand from VOPC DPP For most DPP instructions, the old operand stores the value that was in the current lane before the DPP operation, and is tied to the destination. For VOPC DPP, this is unnecessary and incorrect. There appears to have been a latent bug related to D122737 with SIInstrInfo::isOperandLegal. If you checked if a register operand was legal when the InstructionDesc expected an immediate, it reported that it is valid. Its fix is necessary for and tested in this patch. Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D130040	2022-07-19 09:35:05 -04:00
Matt Arsenault	8d0383eb69	CodeGen: Remove AliasAnalysis from regalloc This was stored in LiveIntervals, but not actually used for anything related to LiveIntervals. It was only used in one check for if a load instruction is rematerializable. I also don't think this was entirely correct, since it was implicitly assuming constant loads are also dereferenceable. Remove this and rely only on the invariant+dereferenceable flags in the memory operand. Set the flag based on the AA query upfront. This should have the same net benefit, but has the possible disadvantage of making this AA query nonlazy. Preserve the behavior of assuming pointsToConstantMemory implying dereferenceable for now, but maybe this should be changed.	2022-07-18 17:23:41 -04:00
Ivan Kosarev	432cbd7827	[AMDGPU][CodeGen] Support (register + immediate) SMRD offsets. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D129381	2022-07-18 11:29:31 +01:00
Jay Foad	e45aa230ad	[AMDGPU] Update LiveVariables after killing an immediate def D114999 added code to kill an immediate def if it was folded into its only use by convertToThreeAddress. This patch updates LiveVariables when that happens in order to fix verification failures exposed by D129213. Differential Revision: https://reviews.llvm.org/D129661	2022-07-14 10:49:41 +01:00
Joe Nash	d1af09ad96	[AMDGPU] gfx11 Generate VOPD Instructions We form VOPD instructions in the GCNCreateVOPD pass by combining back-to-back component instructions. There are strict register constraints for creating a legal VOPD, namely that the matching operands (e.g. src0x and src0y, src1x and src1y) must be in different register banks. We add a PostRA scheduler mutation to put possible VOPD components back-to-back. Depends on D128442, D128270 Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D128656	2022-07-05 09:18:19 -04:00
Piotr Sobczak	4874838a63	[AMDGPU] gfx11 WMMA instruction support gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D128756	2022-06-30 11:13:45 -04:00
Matt Arsenault	d342d130da	AMDGPU: Use isMeta flags on pseudoinstructions	2022-06-29 10:31:29 -04:00
Stanislav Mekhanoshin	21895c6b50	[AMDGPU] Relax verification of soffset in scalar stores It must use m0 only on GFX8. Later chips can use any SGPR. Differential Revision: https://reviews.llvm.org/D128765	2022-06-28 16:10:08 -07:00
Joe Nash	f1cfaa956d	[AMDGPU] Use GFX11 S_PACK_HL instruction in more cases Differential Revision: https://reviews.llvm.org/D128527	2022-06-28 14:35:19 +01:00
Austin Kerbow	bd9eed3aec	[AMDGPU] Add isMFMA helper function. NFC Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D127124	2022-06-14 22:01:49 -07:00
Stanislav Mekhanoshin	cb9ae93712	[AMDGPU] Define SGPR_NULL64 register. NFCI. On gfx10+ null register can be used as both 32 and 64 bit operand. Define a 64 bit version of the register to use during codegen. Differential Revision: https://reviews.llvm.org/D127527	2022-06-13 13:23:33 -07:00
Stanislav Mekhanoshin	0f81830632	[AMDGPU] Make temp vgpr selection stable in indirectCopyToAGPR This uses a rotating remainder of division by 3 to select a different temp vgpr each time in a sequence of several agpr copies. Therefore, temp vgpr selection depends on the generated agpr number. This number could change with any unrelated change to the register definitions. Stabilize the selection by using a real agpr number. Differential Revision: https://reviews.llvm.org/D127524	2022-06-13 09:39:46 -07:00
Matt Arsenault	0e1c71e4a4	CodeGen: Move getAddressSpaceForPseudoSourceKind into TargetMachine Avoid the dependency on TargetInstrInfo, which depends on the subtarget and therefore the individual function. Currently AMDGPU is constructing PseudoSourceValue instances in MachineFunctionInfo. In order to facilitate copying MachineFunctionInfo, we need to stop allocating these there. Alternatively we could allow targets to subclass PseudoSourceValueManager, and allocate them similarly to MachineFunctionInfo.	2022-06-01 09:45:40 -04:00
Stanislav Mekhanoshin	5df6669d45	[AMDGPU] Enforce alignment of image vaddr on gfx90a Even though single address image instructions only use a single VGPR, HW accesses 4 or 5, which creates an alignment requirement. Fixes: SWDEV-316648 Differential Revision: https://reviews.llvm.org/D126009	2022-05-24 10:05:39 -07:00
Jay Foad	78ec59e6ae	[AMDGPU] Handle mandatory literals in isOperandLegal Extend SIInstrInfo::isOperandLegal to enforce a limit on the number of literal operands for all VALU instructions, not just VOP3. In particular it now handles VOP2 instructions with a mandatory literal operand like V_FMAAK_F32. Differential Revision: https://reviews.llvm.org/D126064	2022-05-20 16:14:00 +01:00
Jay Foad	5b18ef7256	[AMDGPU] Add verification for mandatory literals Extend the literal operand checking in SIInstrInfo::verifyInstruction to check VOP2 instructions like V_FMAAK_F32 which have a mandatory literal operand. The rule is that src0 can also be a literal, but only if it is the same literal value. AMDGPUAsmParser::validateConstantBusLimitations already handles this correctly. Differential Revision: https://reviews.llvm.org/D126063	2022-05-20 16:14:00 +01:00
Jay Foad	d14f2a6359	[AMDGPU] Allow multiple uses of the same literal in SOP2/SOPC AMDGPUAsmParser::validateSOPLiteral already knew about this but SIInstrInfo::verifyInstruction did not. Differential Revision: https://reviews.llvm.org/D125976	2022-05-19 16:42:20 +01:00
Stanislav Mekhanoshin	dee3190293	[AMDGPU] Add llvm.amdgcn.global.load.lds intrinsic Differential Revision: https://reviews.llvm.org/D125279	2022-05-17 12:35:27 -07:00
Stanislav Mekhanoshin	791ec1c68e	[AMDGPU] Add intrinsics llvm.amdgcn.{raw|struct}.buffer.load.lds Differential Revision: https://reviews.llvm.org/D124884	2022-05-17 10:32:13 -07:00
Joe Nash	c70259405c	[AMDGPU] gfx11 BUF Instructions Includes MachineCode layer support and tests, and MIR tests not requiring CodeGen pass changes. Includes a small change in SMInstructions.td to correct encoded bits. Contributors: Petar Avramovic <Petar.Avramovic@amd.com> Dmitry Preobrazhensky <dmitry.preobrazhensky@amd.com> Depends on D125316 Patch 6/N for upstreaming of AMDGPU gfx11 architecture. Reviewed By: dp, Petar.Avramovic Differential Revision: https://reviews.llvm.org/D125319	2022-05-16 09:41:40 -04:00
Jay Foad	dfb006c0c9	[AMDGPU] Extract SIInstrInfo::removeModOperands. NFC. Make this an externally callable function for use in a future patch. Differential Revision: https://reviews.llvm.org/D125565	2022-05-16 09:43:41 +01:00
Austin Kerbow	2db700215a	[AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic Adds an intrinsic/builtin that can be used to fine tune scheduler behavior. If there is a need to have highly optimized codegen and kernel developers have knowledge of inter-wave runtime behavior which is unknown to the compiler this builtin can be used to tune scheduling. This intrinsic creates a barrier between scheduling regions. The immediate parameter is a mask to determine the types of instructions that should be prevented from crossing the sched_barrier. In this initial patch, there are only two variations. A mask of 0 means that no instructions may be scheduled across the sched_barrier. A mask of 1 means that non-memory, non-side-effect inducing instructions may cross the sched_barrier. Note that this intrinsic is only meant to work with the scheduling passes. Any other transformations that may move code will not be impacted in the ways described above. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D124700	2022-05-11 13:22:51 -07:00
Joe Nash	18ed279a3a	[AMDGPU] gfx11 subtarget features & early tests Tablegen definitions for subtarget features and cpp predicate functions to access the features. New Sub-TargetProcessors and common latencies. Simple changes to MIR codegen tests which pass on gfx11 because they have the same output as previous subtargets or operate on pseudo instructions which are reused from previous subtargets. Contributors: Jay Foad <jay.foad@amd.com> Petar Avramovic <Petar.Avramovic@amd.com> Patch 4/N for upstreaming of AMDGPU gfx11 architecture Depends on D124538 Reviewed By: Petar.Avramovic, foad Differential Revision: https://reviews.llvm.org/D125261	2022-05-11 10:31:49 -04:00
Ivan Kosarev	88f04bdbd8	[AMDGPU][GFX10] Support base+soffset+offset SMEM loads. Also makes a step towards resolving https://github.com/llvm/llvm-project/issues/38652 Reviewed By: foad, dp Differential Revision: https://reviews.llvm.org/D125117	2022-05-10 16:17:14 +01:00
Jay Foad	879ac41089	[AMDGPU] Fix crash in SIOptimizeExecMaskingPreRA When folding a COPY of exec into another COPY, the call to TII->isOperandLegal would crash because COPYs don't have defined register classes for their operands. Differential Revision: https://reviews.llvm.org/D122737	2022-04-20 14:42:48 +01:00
Matt Arsenault	c528fbf882	AMDGPU: Fix assert if v_mov_b32_dpp is last instruction in the block This can happen if the use instruction is a phi. Fixes issue 49961	2022-04-14 20:21:22 -04:00
hsmahesha	ea47373af4	[AMDGPU][NFC] Organize code around reserving VGPR32 for AGPR copy. This is an NFC patch in preparation to fix a bug related to always reserving VGPR32 for AGPR copy. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D123651	2022-04-14 12:51:33 +05:30
Changpeng Fang	1711020c37	AMDGPU: Use isLiteralConstantLike to check whether the operand could ever be literal Summary: To compute the size of a VALU/SALU instruction, we need to check whether an operand could ever be literal. Previously isLiteralConstant was used, which missed cases like global variables or external symbols. These misses lead to under-estimation of the instruction size and branch offset, and thus incorrectly skip the necessary branch relaxation when the branch offset is actually greater than what the branch bits can hold. In this work, we use isLiteralConstantLike to check the operands. It may be conservative, but it is safe. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D122778	2022-03-31 08:06:31 -07:00
Carl Ritson	1f52d02ceb	[AMDGPU] Split waterfall loop exec manipulation Split waterfall loops into multiple blocks so that exec mask manipulation (s_and_saveexec) does not occur in the middle of a block. VGPR live range optimizer is updated to handle waterfall loops spanning multiple blocks. Reviewed By: ruiling Differential Revision: https://reviews.llvm.org/D122200	2022-03-28 17:44:54 +09:00
Thomas Symalla	718aec209c	[AMDGPU] Improve v_cmpx usage on GFX10.3. On GFX10.3 targets, the following instruction sequence v_cmp_* SGPR, ... s_and_saveexec ..., SGPR leads to a fairly long stall caused by a VALU write to a SGPR and having the following SALU wait for the SGPR. An equivalent sequence is to save the exec mask manually instead of letting s_and_saveexec do the work and use a v_cmpx instruction instead to do the comparison. This patch modifies the SIOptimizeExecMasking pass as this is the last position where s_and_saveexec instructions are inserted. It does the transformation by trying to find the pattern, extracting the operands and generating the new instruction sequence. It also changes some existing lit tests and introduces a few new tests to show the changed behavior on GFX10.3 targets. Same as D119696 including a buildbot and MIR test fix. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D122332	2022-03-25 11:40:18 +01:00
Stanislav Mekhanoshin	6e3e14f600	[AMDGPU] Support gfx940 smfmac instructions Differential Revision: https://reviews.llvm.org/D122191	2022-03-24 12:40:42 -07:00
hsmahesha	f5b6866d7e	[AMDGPU] Add missing testcase for SGPR to AGPR copy, and also update the function indirectCopyToAGPR() to ensure that it is called only on GFX908 sub-target. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D122286	2022-03-23 21:38:04 +05:30
Stanislav Mekhanoshin	72c1a0d9c2	[AMDGPU] Allow v_accvgpr_write to use SGPR on gfx90a This is undocumented, but it should work. Differential Revision: https://reviews.llvm.org/D122252	2022-03-22 13:52:29 -07:00
Jay Foad	321d3aae7c	[AMDGPU] SIInstrInfo::verifyInstruction tweaks. NFCI. Simplify some for loops. Don't bother checking src2 operand for writelane because it doesn't have one. Check all VALU instructions, not just VOP1/2/3/C/SDWA.	2022-03-21 11:15:55 +00:00
Thomas Symalla	7de6107dce	Revert "[AMDGPU] Improve v_cmpx usage on GFX10.3." This reverts commit `011c64191e` and `e725e2afe0`. Differential Revision: https://reviews.llvm.org/D122117	2022-03-21 09:50:44 +01:00
Thomas Symalla	011c64191e	[AMDGPU] Improve v_cmpx usage on GFX10.3. On GFX10.3 targets, the following instruction sequence v_cmp_* SGPR, ... s_and_saveexec ..., SGPR leads to a fairly long stall caused by a VALU write to a SGPR and having the following SALU wait for the SGPR. An equivalent sequence is to save the exec mask manually instead of letting s_and_saveexec do the work and use a v_cmpx instruction instead to do the comparison. This patch modifies the SIOptimizeExecMasking pass as this is the last position where s_and_saveexec instructions are inserted. It does the transformation by trying to find the pattern, extracting the operands and generating the new instruction sequence. It also changes some existing lit tests and introduces a few new tests to show the changed behavior on GFX10.3 targets. Reviewed By: sebastian-ne, critson Differential Revision: https://reviews.llvm.org/D119696	2022-03-21 09:31:59 +01:00
Stanislav Mekhanoshin	522b259976	[AMDGPU] Allow v_accvgpr_write to use SGPR src on gfx940 Differential Revision: https://reviews.llvm.org/D121843	2022-03-17 12:12:06 -07:00
Christudasan Devadasan	af717d4aca	[AMDGPU][MachineVerifier] Alignment check for fp32 packed math instructions The fp32 packed math instructions are introduced in gfx90a. If their vector register operands are not properly aligned, the verifier should flag them. Currently, the verifier failed to report it and the compiler ended up emitting a broken assembly. This patch fixes that missed case in TII::verifyInstruction. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D121794	2022-03-17 08:21:35 +05:30
Shengchen Kan	37b378386e	[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments	2022-03-16 20:25:42 +08:00
serge-sans-paille	989f1c72e0	Cleanup codegen includes This is a (fixed) recommit of https://reviews.llvm.org/D121169 after: 1061034926 before: 1063332844 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681	2022-03-16 08:43:00 +01:00

682 Commits