llvm-project

Commit Graph

Author	SHA1	Message	Date
Ruiling Song	0404aafbe3	AMDGPU: Factor out hasDivergentBranch(). NFC This is helpful for detecting whether a block ends with divergent branch in passes before lowering the pseudo control flow instructions. Differential Revision: https://reviews.llvm.org/D133184	2022-09-14 13:27:21 +08:00
Matt Arsenault	8d0383eb69	CodeGen: Remove AliasAnalysis from regalloc This was stored in LiveIntervals, but not actually used for anything related to LiveIntervals. It was only used in one check for if a load instruction is rematerializable. I also don't think this was entirely correct, since it was implicitly assuming constant loads are also dereferenceable. Remove this and rely only on the invariant+dereferenceable flags in the memory operand. Set the flag based on the AA query upfront. This should have the same net benefit, but has the possible disadvantage of making this AA query nonlazy. Preserve the behavior of assuming pointsToConstantMemory implying dereferenceable for now, but maybe this should be changed.	2022-07-18 17:23:41 -04:00
Joe Nash	0483c91eee	[AMDGPU] gfx11 CodeGen for new DPP instructions Modifies the GCNDPPCombine pass to enable DPP formation for the new DPP instruction in gfx11, namely VOP3 encoded instructions with DPP and VOPC with DPP. Depends on D128656 Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D128682	2022-07-05 10:17:59 -04:00
Piotr Sobczak	4874838a63	[AMDGPU] gfx11 WMMA instruction support gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D128756	2022-06-30 11:13:45 -04:00
Ruiling Song	732eed40fd	[AMDGPU] Mark GFX11 dual source blend export as strict-wqm The instructions that generate the source of dual source blend export should run in strict-wqm. That is if any lane in a quad is active, we need to enable all four lanes of that quad to make the shuffling operation before exporting to dual source blend target work correctly. Differential Revision: https://reviews.llvm.org/D127981	2022-06-20 21:58:12 +01:00
Austin Kerbow	bd9eed3aec	[AMDGPU] Add isMFMA helper function. NFC Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D127124	2022-06-14 22:01:49 -07:00
Matt Arsenault	0e1c71e4a4	CodeGen: Move getAddressSpaceForPseudoSourceKind into TargetMachine Avoid the dependency on TargetInstrInfo, which depends on the subtarget and therefore the individual function. Currently AMDGPU is constructing PseudoSourceValue instances in MachineFunctionInfo. In order to facilitate copying MachineFunctionInfo, we need to stop allocating these there. Alternatively we could allow targets to subclass PseudoSourceValueManager, and allocate them similarly to MachineFunctionInfo.	2022-06-01 09:45:40 -04:00
Joe Nash	ef1ea5ac01	[AMDGPU] gfx11 vinterp instructions MC support A new instruction encoding. Some of these instructions were previously VOP3 encoded. Contributors: Carl Ritson <carl.ritson@amd.com> Patch 11/N for upstreaming of AMDGPU gfx11 architecture. Depends on D125824 Reviewed By: critson Differential Revision: https://reviews.llvm.org/D125989	2022-05-25 14:59:16 -04:00
Stanislav Mekhanoshin	5df6669d45	[AMDGPU] Enforce alignment of image vaddr on gfx90a Even though single address image instructions only use a single VGPR, HW accesses 4 or 5, which creates an alignment requirement. Fixes: SWDEV-316648 Differential Revision: https://reviews.llvm.org/D126009	2022-05-24 10:05:39 -07:00
Joe Nash	729467acef	[AMDGPU] gfx11 LDSDIR instructions MC support Contributors: Carl Ritson <carl.ritson@amd.com> Patch 8/N for upstreaming of AMDGPU gfx11 architecture. Depends on D125498 Reviewed By: critson, rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D125820	2022-05-19 10:08:47 -04:00
Dmitry Preobrazhensky	95a8af2750	[AMDGPU][MC][NFC] MUBUF code cleanup Removed code that is no longer used after https://reviews.llvm.org/D124485. Differential Revision: https://reviews.llvm.org/D125811	2022-05-18 15:00:38 +03:00
Jay Foad	dfb006c0c9	[AMDGPU] Extract SIInstrInfo::removeModOperands. NFC. Make this an externally callable function for use in a future patch. Differential Revision: https://reviews.llvm.org/D125565	2022-05-16 09:43:41 +01:00
Thomas Symalla	718aec209c	[AMDGPU] Improve v_cmpx usage on GFX10.3. On GFX10.3 targets, the following instruction sequence v_cmp_* SGPR, ... s_and_saveexec ..., SGPR leads to a fairly long stall caused by a VALU write to a SGPR and having the following SALU wait for the SGPR. An equivalent sequence is to save the exec mask manually instead of letting s_and_saveexec do the work and use a v_cmpx instruction instead to do the comparison. This patch modifies the SIOptimizeExecMasking pass as this is the last position where s_and_saveexec instructions are inserted. It does the transformation by trying to find the pattern, extracting the operands and generating the new instruction sequence. It also changes some existing lit tests and introduces a few new tests to show the changed behavior on GFX10.3 targets. Same as D119696 including a buildbot and MIR test fix. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D122332	2022-03-25 11:40:18 +01:00
Thomas Symalla	7de6107dce	Revert "[AMDGPU] Improve v_cmpx usage on GFX10.3." This reverts commit `011c64191e` and `e725e2afe0`. Differential Revision: https://reviews.llvm.org/D122117	2022-03-21 09:50:44 +01:00
Thomas Symalla	011c64191e	[AMDGPU] Improve v_cmpx usage on GFX10.3. On GFX10.3 targets, the following instruction sequence v_cmp_* SGPR, ... s_and_saveexec ..., SGPR leads to a fairly long stall caused by a VALU write to a SGPR and having the following SALU wait for the SGPR. An equivalent sequence is to save the exec mask manually instead of letting s_and_saveexec do the work and use a v_cmpx instruction instead to do the comparison. This patch modifies the SIOptimizeExecMasking pass as this is the last position where s_and_saveexec instructions are inserted. It does the transformation by trying to find the pattern, extracting the operands and generating the new instruction sequence. It also changes some existing lit tests and introduces a few new tests to show the changed behavior on GFX10.3 targets. Reviewed By: sebastian-ne, critson Differential Revision: https://reviews.llvm.org/D119696	2022-03-21 09:31:59 +01:00
Stanislav Mekhanoshin	36fe3f13a9	[AMDGPU] flat scratch SVS addressing mode for gfx940 Both VADDR and SADDR are used in SVS mode. Differential Revision: https://reviews.llvm.org/D121254	2022-03-14 15:23:36 -07:00
Jay Foad	ddd3807e69	[AMDGPU] Use new target MMO flag MONoClobber This allows us to set the noclobber flag on (the MMO of) a load instruction instead of on the pointer. This fixes a bug where noclobber was being applied to all loads from the same pointer, even if some of them were clobbered. Differential Revision: https://reviews.llvm.org/D118775	2022-02-02 17:12:36 +00:00
Stanislav Mekhanoshin	dbf278b984	[AMDGPU] Prevent aliasing of SrcC and Dst in MAI From the MAI spec: It’s ok that Src_C and vDst are the exact same VGPRs or Src_C and vDst are completely separated. The case that Src_C and vDst are overlapping should be avoided as new value could be written to accumulator input before it gets read. Note that this inevitably increases register pressure to the point where some programs will become uncompilable. This patch separates MAC and FMA versions of MFMA instructions using either tied dst and src2 or earlyclobber dst. Fixes: SWDEV-318900 Differential Revision: https://reviews.llvm.org/D117844	2022-01-26 14:48:20 -08:00
David Salinas	c0581f7df6	Revert D109159 : Revert "[amdgpu] Enable selection of `s_cselect_b64`." This reverts commit `640beb38e7`. That commit caused performance degradation in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort). Reverting until we have a better solution to s_cselect_b64 codegen cleanup Change-Id: Ifc167b3c2dae7a65920676f22a97ba76485f3456 Reviewed By: kzhuravl Differential Revision: https://reviews.llvm.org/D116686 Change-Id: I1abf49b74a7e2ba0e0205f747a4154a468b9d7f2	2022-01-11 21:14:09 +00:00
Nico Weber	085f078307	Revert "Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`."" This reverts commit `859ebca744`. The change contained many unrelated changes and e.g. restored unit test failures for the old lld port.	2022-01-05 13:10:25 -05:00
David Salinas	859ebca744	Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`." This reverts commit `640beb38e7`. That commit caused performance degradation in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort). Reverting until we have a better solution to s_cselect_b64 codegen cleanup Change-Id: Ibf8e397df94001f248fba609f072088a46abae08 Reviewed By: kzhuravl Differential Revision: https://reviews.llvm.org/D115960 Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105	2022-01-05 17:57:32 +00:00
Jay Foad	3264e95938	[CodeGen] Update LiveIntervals in TargetInstrInfo::convertToThreeAddress Delegate updating of LiveIntervals to each target's convertToThreeAddress implementation, instead of repairing LiveIntervals after the fact in TwoAddressInstruction::convertInstTo3Addr. Differential Revision: https://reviews.llvm.org/D113493	2021-11-17 10:16:47 +00:00
Michael Liao	e6a4ba3aa6	[amdgpu] Handle the case where there is no scavenged register. - When an unconditional branch is expanded into an indirect branch, if there is no scavenged register, an SGPR pair needs spilling to enable the destination PC calculation. In addition, before jumping into the destination, that clobbered SGPR pair needs restoring. - As SGPR cannot be spilled to or restored from memory directly, the spilling/restoring of that SGPR pair reuses the regular SGPR spilling support but without spilling it into memory. As that spilling and restoring points are fully controlled, we only need to spill that SGPR into the temporary VGPR, which needs spilling into its emergency slot. - The target-specific hook is revised to take an additional restore block, where the restoring code is filled. After that, the relaxation will place that restore block directly before the destination block and insert an unconditional branch in any fall-through block into the destination block. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D106449	2021-10-27 18:37:27 -04:00
Jay Foad	3f34f75a68	[AMDGPU] Fix latency for implicit vcc_lo operands on GFX10 wave32 As described in the comment, the way we change vcc to vcc_lo in these operands confuses addPhysRegDataDeps into treating them as implicit pseudo operands. Fix this by setting the correct latency from the SchedModel after addPhysRegDataDeps wrongly set it to 0. Differential Revision: https://reviews.llvm.org/D112317	2021-10-22 20:03:29 +01:00
Jay Foad	6cef28ed2d	[TII] Remove the MFI argument to convertToThreeAddress. NFC. This simplifies the API and addresses a FIXME in TwoAddressInstructionPass::convertInstTo3Addr. Differential Revision: https://reviews.llvm.org/D110229	2021-09-23 08:58:46 +01:00
Joe Nash	3ce1b9631a	[AMDGPU] Switch PostRA sched to MachineSched Use GCNHazardRecognizer in postra sched. Updated tests for the new schedules. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D109536 Change-Id: Ia86ba2ae168f12fb34b4d8efdab491f84d936cde	2021-09-14 15:11:27 -04:00
Michael Liao	640beb38e7	[amdgpu] Enable selection of `s_cselect_b64`. Differential Revision: https://reviews.llvm.org/D109159	2021-09-07 10:45:07 -04:00
Stanislav Mekhanoshin	2cfda6a691	[AMDGPU] Fold immediates in the optimizeCompareInstr Peephole works before the first SIFoldOperands so most of the immediates are in registers. Differential Revision: https://reviews.llvm.org/D109186	2021-09-02 17:23:26 -07:00
Stanislav Mekhanoshin	bf77b11277	[AMDGPU] Introduce optimizeCompareInstr The following patterns are currently handled: s_cmp_eq_u32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1 s_cmp_eq_i32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1 s_cmp_eq_u64 (s_and_b64 $src, 1), 1 => s_and_b64 $src, 1 s_cmp_ge_u32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1 s_cmp_ge_i32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1 s_cmp_lg_u32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1 s_cmp_lg_i32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1 s_cmp_lg_u64 (s_and_b64 $src, 1), 0 => s_and_b64 $src, 1 s_cmp_gt_u32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1 s_cmp_gt_i32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1 Differential Revision: https://reviews.llvm.org/D109031	2021-09-01 15:57:05 -07:00
alex-t	ed0f4415f0	[AMDGPU] Divergence-driven compare operations instruction selection Description: This change enables the compare operations to be selected to SALU/VALU form depending on the SDNode divergence flag. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D106079	2021-08-25 18:30:49 +03:00
Michael Liao	b0402a35fc	[amdgpu] Add 64-bit PC support when expanding unconditional branches. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D106445	2021-07-26 14:50:30 -04:00
Stanislav Mekhanoshin	76b7d3432e	[AMDGPU] Add TII::isIgnorableUse() to allow VOP rematerialization Any def of EXEC prevents rematerialization of any VOP instruction because of the physreg use. Create a callback to check if the physreg use can be ignored to allow rematerialization. Differential Revision: https://reviews.llvm.org/D105836	2021-07-14 13:03:58 -07:00
Brendon Cahoon	3f7b7e7393	[AMDGPU] Update SCC defs to VCC when uses are changed to VCC The FixSGPRCopies pass converts instructions to VALU when removing illegal VGPR to SGPR copies. Instructions that use SCC are changed to use VCC instead. When that happens, the pass must also change instructions that define SCC to define VCC. The pass was not changing the SCC definition when an ADDC is converted due to an input that is a VGPR to SGPR copy. But, the initial ADD instruction, which defines SCC, is not converted. This causes a compilation failure due to a use of an undefined physical register. This patch adds code that inserts the SCC definition in the MoveToVALU worklist when a SCC use is converted to a VCC use. Differential Revision: https://reviews.llvm.org/D102111	2021-05-14 18:05:05 -04:00
Stanislav Mekhanoshin	4d6ebe8ac0	[AMDGPU] Change FLAT Scratch SADDR to VADDR form in moveToVALU Extend the legalization of global SADDR loads and stores with changing to VADDR to the FLAT scratch instructions. Differential Revision: https://reviews.llvm.org/D101408	2021-05-03 10:57:14 -07:00
Stanislav Mekhanoshin	89a94be16b	[AMDGPU] Change FLAT SADDR to VADDR form in moveToVALU Instead of legalizing saddr operand with a readfirstlane when address is moved from SGPR to VGPR we can just change the opcode. Differential Revision: https://reviews.llvm.org/D101405	2021-05-03 10:36:26 -07:00
Sebastian Neubauer	cc7add5298	[AMDGPU] Use SIInstrFlags for flat variants. NFC Use SIInstrFlags to differentiate between the different variants of flat instructions (flat, global and scratch). This should make it easier to bundle the immediate offset logic in a single place and implement restrictions and bug workarounds. Fixed version of D99587, which does not rely on the address space. Differential Revision: https://reviews.llvm.org/D99743	2021-04-09 12:28:36 +02:00
Sebastian Neubauer	36138db116	[AMDGPU] IsFlatScratch/Global -> FlatScratch/Global Remove 'Is' from IsFlatScratch/Global. NFC Differential Revision: https://reviews.llvm.org/D100108	2021-04-09 11:20:31 +02:00
Jay Foad	3ad5216ed8	[AMDGPU] Better codegen for i64 bitreverse Differential Revision: https://reviews.llvm.org/D97547	2021-02-26 15:51:36 +00:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
Stanislav Mekhanoshin	5cf9292ce3	[AMDGPU] Add two TSFlags: IsAtomicNoRtn and IsAtomicRtn We are using AtomicNoRet map in multiple places to determine if an instruction is atomic, and whether it is an rtn or nortn atomic. This method does not always work since we have some instructions which only have an rtn or nortn version. One such instruction is ds_wrxchg_rtn_b32 which does not have a nortn version. This has caused changes in memory legalizer tests. Differential Revision: https://reviews.llvm.org/D96639	2021-02-15 11:27:59 -08:00
Sebastian Neubauer	8214982b50	[AMDGPU] Implement mir parseCustomPseudoSourceValue Allow parsing generated mir with custom pseudo source value tokens. Also rename pseudo source values to have more meaningful names. Relands `ba7dcd8542`, which had memory leaks. Differential Revision: https://reviews.llvm.org/D95215	2021-01-22 11:24:08 +01:00
Sebastian Neubauer	4dbdff66fe	Revert "[AMDGPU] Implement mir parseCustomPseudoSourceValue" This reverts commit `ba7dcd8542`. (caused memory leaks)	2021-01-21 18:11:48 +01:00
Sebastian Neubauer	ba7dcd8542	[AMDGPU] Implement mir parseCustomPseudoSourceValue Allow parsing generated mir with custom pseudo source value tokens. Also rename pseudo source values to have more meaningful names. Differential Revision: https://reviews.llvm.org/D94768	2021-01-21 16:32:17 +01:00
dfukalov	6a87e9b08b	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
Stanislav Mekhanoshin	ae8f4b2178	[AMDGPU] Folding of FI operand with flat scratch Differential Revision: https://reviews.llvm.org/D93501	2020-12-22 10:48:04 -08:00
Sebastian Neubauer	91445979be	[AMDGPU] Unify flat offset logic Move getNumFlatOffsetBits from AMDGPUAsmParser and SIInstrInfo into AMDGPUBaseInfo. Differential Revision: https://reviews.llvm.org/D93287	2020-12-15 14:59:59 +01:00
Austin Kerbow	4aa842a800	[AMDGPU] Add new pseudos for indirect addressing with VGPR Indexing It is possible for copies or spills to be inserted in the middle of indirect addressing sequences which use VGPR indexing. Spills to accvgprs could be affected by the indexing mode. Add new pseudo instructions that are expanded after register allocation to avoid the problematic spill or copy placement. Differential Revision: https://reviews.llvm.org/D91048	2020-12-08 12:24:12 -08:00
Jay Foad	4926eed59c	[AMDGPU] Add a TRANS bit to TSFlags. NFC. This is used to mark transcendental instructions that execute on a separate pipeline from the normal VALU pipeline. Differential Revision: https://reviews.llvm.org/D92042	2020-11-24 17:49:56 +00:00
Matt Arsenault	e722943e05	AMDGPU: Factor out large flat offset splitting	2020-11-13 11:22:13 -05:00
Sebastian Neubauer	31a0b2834f	[AMDGPU] Fix iterating in SIFixSGPRCopies The insertion of waterfall loops splits the current basic block into three blocks. So the basic block that we iterate over must be updated. This failed assert(!NodePtr->isKnownSentinel()) in ilist_iterator for divergent calls in branches before. Differential Revision: https://reviews.llvm.org/D90596	2020-11-04 18:43:19 +01:00


250 Commits