llvm-project

Commit Graph

Author	SHA1	Message	Date
Sebastian Neubauer	b76c2a6c2b	[AMDGPU] Fix saving fp and bp Spilling the fp or bp to scratch could overwrite VGPRs of inactive lanes. Fix that by using only the active lanes of the scavenged VGPR. This builds on the assumptions that 1. a function is never called with exec=0 2. lanes do not die in a function, i.e. exec!=0 in the function epilog 3. no new lanes are active when exiting the function, i.e. exec in the epilog is a subset of exec in the prolog. Differential Revision: https://reviews.llvm.org/D96869	2021-04-12 11:52:55 +02:00
Sebastian Neubauer	32bc9a9bc3	[AMDGPU] Unify spill code Instead of reimplementing spilling in prolog and epilog, reuse buildSpillLoadStore. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D99269	2021-04-12 11:19:08 +02:00
Sebastian Neubauer	f9a8c6a0e5	[AMDGPU] Save VGPR of whole wave when spilling Spilling SGPRs to scratch uses a temporary VGPR. LLVM currently cannot determine if a VGPR is used in other lanes or not, so we need to save all lanes of the VGPR. We even need to save the VGPR if it is marked as dead. The generated code depends on two things: - Can we scavenge an SGPR to save EXEC? - And can we scavenge a VGPR? If we can scavenge an SGPR, we - save EXEC into the SGPR - set the needed lane mask - save the temporary VGPR - write the spilled SGPR into VGPR lanes - save the VGPR again to the target stack slot - restore the VGPR - restore EXEC If we were not able to scavenge an SGPR, we do the same operations, but every time the temporary VGPR is written to memory, we - write VGPR to memory - flip exec (s_not exec, exec) - write VGPR again (previously inactive lanes) Surprisingly often, we are able to scavenge an SGPR, even though we are at the brink of running out of SGPRs. Scavenging a VGPR does not have a great effect (saves three instructions if no SGPR was scavenged), but we need to know if the VGPR we use is live before or not, otherwise the machine verifier complains. Differential Revision: https://reviews.llvm.org/D96336	2021-04-12 11:01:38 +02:00
Sebastian Neubauer	cc7add5298	[AMDGPU] Use SIInstrFlags for flat variants. NFC Use SIInstrFlags to differentiate between the different variants of flat instructions (flat, global and scratch). This should make it easier to bundle the immediate offset logic in a single place and implement restrictions and bug workarounds. Fixed version of D99587, which does not rely on the address space. Differential Revision: https://reviews.llvm.org/D99743	2021-04-09 12:28:36 +02:00
Sebastian Neubauer	c10cc4ea27	[AMDGPU] Fix computing live registers in prolog ScratchExecCopy needs to be marked as live, we cannot use that register while EXEC is stored in there. Marking SGPRForFPSaveRestoreCopy and SGPRForBPSaveRestoreCopy as available is unnecessary, they should not be live at that point anyway. Differential Revision: https://reviews.llvm.org/D100098	2021-04-08 14:52:50 +02:00
Sebastian Neubauer	2dc6be5209	[AMDGPU] Update SGPRSpillVGPRCSR name. NFC The struct is now used for both callee-save and caller-save registers. The frame index is not set for entrypoints, as we do not need to save the registers then. Update the struct name to reflect that. Differential Revision: https://reviews.llvm.org/D99722	2021-04-07 16:30:40 +02:00
Tomas Matheson	a9968c0a33	[NFC][CodeGen] Tidy up TargetRegisterInfo stack realignment functions Currently needsStackRealignment returns false if canRealignStack returns false. This means that the behavior of needsStackRealignment does not correspond to its name and description; a function might need stack realignment, but if it is not possible then this function returns false. Furthermore, needsStackRealignment is not virtual and therefore some backends have made use of canRealignStack to indicate whether a function needs stack realignment. This patch attempts to clarify the situation by separating them and introducing new names: - shouldRealignStack - true if there is any reason the stack should be realigned - canRealignStack - true if we are still able to realign the stack (e.g. we can still reserve/have reserved a frame pointer) - hasStackRealignment = shouldRealignStack && canRealignStack (not target customisable) Targets can now override shouldRealignStack to indicate that stack realignment is required. This change will make it easier in a future change to handle the case where we need to realign the stack but can't do so (for example when the register allocator creates an aligned spill after the frame pointer has been eliminated). Differential Revision: https://reviews.llvm.org/D98716 Change-Id: Ib9a4d21728bf9d08a545b4365418d3ffe1af4d87	2021-03-30 17:31:39 +01:00
Sebastian Neubauer	1c3b74f0ab	[AMDGPU] Remove outdated TODOs. NFC spillSGPRToVGPR is already respected in these places since D95768. Differential Revision: https://reviews.llvm.org/D99570	2021-03-30 15:18:49 +02:00
RamNalamothu	43f2d269b3	[AMDGPU, NFC] Refactor FP/BP spill index code in emitPrologue/emitEpilogue Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D98617	2021-03-16 19:19:45 +05:30
Stanislav Mekhanoshin	3bffb1cd0e	[AMDGPU] Use single cache policy operand Replace individual operands GLC, SLC, and DLC with a single cache_policy bitmask operand. This will reduce the number of operands in MIR and, I hope, the amount of code. These operands are mostly 0 anyway. An additional advantage is that the parser will accept these flags in any order, unlike now. Differential Revision: https://reviews.llvm.org/D96469	2021-03-15 13:00:59 -07:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
Sebastian Neubauer	6c59dc474d	[AMDGPU] Save all lanes for reserved VGPRs When SGPRs are spilled to VGPRs, they can overwrite any lane. We need to preserve the value of inactive lanes in function calls, so we save the register even if it is marked as caller saved. Also, teach buildPrologSpill to work when no registers are free like in CodeGen/AMDGPU/pei-scavenge-vgpr-spill.mir and update the comment on findScratchNonCalleeSaveRegister as it is not used anymore to realign the stack pointer since D95865. Differential Revision: https://reviews.llvm.org/D95946	2021-02-04 09:56:36 +01:00
Sebastian Neubauer	8b898b19a8	[AMDGPU] Remove unused tmp register The temporary register is only used to compute the frame pointer. The frame pointer is overwritten and not used in between, so we can reuse the frame pointer for the computation, saving one register. Differential Revision: https://reviews.llvm.org/D95865	2021-02-02 17:17:54 +01:00
Sebastian Neubauer	6b6ae583cf	[AMDGPU] Save fp/bp after csr saves Saving callee-save registers happens in whole wave mode. Exec is saved to a free register, which can be reused to save the frame pointer. Therefore, saving the fp needs to happen after saving csrs. Differential Revision: https://reviews.llvm.org/D95861	2021-02-02 17:17:54 +01:00
Sebastian Neubauer	b91afa474e	[AMDGPU] Mark epilog restores as frame-destroy I guess instructions were marked as frame-setup by accident, they are restores as part of the epilog. Differential Revision: https://reviews.llvm.org/D95783	2021-02-02 10:24:37 +01:00
Austin Kerbow	e068e236c3	[AMDGPU] Fix release build after `0397dca0`.	2021-02-01 08:55:14 -08:00
Austin Kerbow	0397dca021	[AMDGPU] Fix crash with sgpr spills to vgpr disabled This would assert with amdgpu-spill-sgpr-to-vgpr disabled when trying to spill the FP. Fixes: SWDEV-262704 Reviewed By: RamNalamothu Differential Revision: https://reviews.llvm.org/D95768	2021-02-01 08:35:25 -08:00
Matt Arsenault	5f9707b796	AMDGPU: Fix redundant FP spilling/assert in some functions If a function has stack objects, and a call, we require an FP. If we did not initially have any stack objects, and only introduced them during PrologEpilogInserter for CSR VGPR spills, SILowerSGPRSpills would end up spilling the FP register as if it were a normal register. This would result in an assert in a debug build, or redundant handling of the FP register in a release build. Try to predict that we will have an FP later, although this is ugly.	2021-01-26 13:01:45 -05:00
Matt Arsenault	92d1195b5f	AMDGPU: Add assertion to determineCalleeSaves Make sure this isn't getting called multiple times. I was surprised we were modifying the function here, which I think is a bit questionable.	2021-01-26 13:01:45 -05:00
dfukalov	560d7e0411	[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets ... to reduce headers dependency. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D95036	2021-01-20 22:22:45 +03:00
Hsiangkai Wang	914e2f5a02	[NFC] Use generic name for scalable vector stack ID. Differential Revision: https://reviews.llvm.org/D94471	2021-01-13 10:57:43 +08:00
dfukalov	6a87e9b08b	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
Sebastian Neubauer	7a18bdb350	[AMDGPU] Implement flat scratch init for pal Extract the scratch offset from the scratch buffer descriptor that is stored in the global table. Differential Revision: https://reviews.llvm.org/D91701	2020-11-20 11:14:30 +01:00
Stanislav Mekhanoshin	5ab1702129	[AMDGPU] Remove scratch rsrc from spill pseudos Differential Revision: https://reviews.llvm.org/D91110	2020-11-12 15:23:37 -08:00
Stanislav Mekhanoshin	d5a465866e	[AMDGPU] Omit buffer resource with flat scratch. Differential Revision: https://reviews.llvm.org/D90979	2020-11-09 08:05:20 -08:00
Sander de Smalen	d57bba7cf8	[SVE] Return StackOffset for TargetFrameLowering::getFrameIndexReference. To accommodate frame layouts that have both fixed and scalable objects on the stack, describing a stack location or offset using a pointer + uint64_t is not sufficient. For this reason, we've introduced the StackOffset class, which models both the fixed- and scalable sized offsets. The TargetFrameLowering::getFrameIndexReference is made to return a StackOffset, so that this can be used in other interfaces, such as to eliminate frame indices in PEI or to emit Debug locations for variables on the stack. This patch is purely mechanical and doesn't change the behaviour of how the result of this function is used for fixed-sized offsets. The patch adds various checks to assert that the offset has no scalable component, as frame offsets with a scalable component are not yet supported in various places. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D90018	2020-11-05 11:02:18 +00:00
Jay Foad	58de4b2053	[AMDGPU] Use pseudo instructions for readlane/writelane This reverts r227987 "R600/SI: Determine target-specific encoding of READLANE and WRITELANE early v2". All the codegen changes are caused by the post-RA scheduler no longer treating readlane/writelane as scheduling barriers due to having unmodelled side effects. (The pseudos are hasSideEffects = 0, but the real instructions are hasSideEffects = ? which TableGen conservatively treats as 1.) Differential Revision: https://reviews.llvm.org/D90401	2020-10-29 16:00:53 +00:00
Stanislav Mekhanoshin	038d884a50	[AMDGPU] Use flat scratch instructions where available The support is disabled by default. So far there is instruction selection, spilling, and frame elimination. It also changes SP from unswizzled to swizzled as used by flat scratch instructions, so it cannot be mixed with MUBUF stack access. At the very least missing: - GlobalISel; - Some optimizations in frame elimination in between vector and scalar ALU; - It shall finally allow to always materialize frame index as an SGPR, but that is not implemented and frame elimination cannot handle it yet; - Unaligned and/or multidword flat scratch shall work, but it is legalized now for MUBUF; - Operand folding cannot optimize FI like with MUBUF yet; - It will need scaling the value of the SP/FP in the DWARF expression to recover the unswizzled scratch address; Differential Revision: https://reviews.llvm.org/D89170	2020-10-26 14:40:42 -07:00
Sebastian Neubauer	f53b43c00a	[AMDGPU] Use isLegalMUBUFImmOffset more Instead of hardcoding isUInt<12>. Differential Revision: https://reviews.llvm.org/D88961	2020-10-08 14:31:44 +02:00
vnalamot	aff94ec0f4	[AMDGPU] Remove the dead spill slots while spilling FP/BP to memory During the PEI pass, the dead TargetStackID::SGPRSpill spill slots are not being removed while spilling the FP/BP to memory. Fixes: SWDEV-250393 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D87032	2020-09-06 07:04:25 +05:30
vnalamot	54d8ded4b1	allSGPRSpillsAreDead() should use actual FP/BP frame indices The SGPR spills happen in SILowerSGPRSpills() and allSGPRSpillsAreDead() makes sure there are no SGPR spills pending during PEI. But the FP/BP spills happen during PEI and are exceptions. Use actual frame indices of FP/BP in allSGPRSpillsAreDead() to accommodate the exceptions. Differential Revision: https://reviews.llvm.org/D86291	2020-08-20 16:15:53 -04:00
Austin Kerbow	7d1cb187fb	[AMDGPU] Fix FP/BP spills when MUBUF constant offset exceeded If we need a scratch register for the spill, don't use the same scratch register that is being used for the MUBUF offset. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D85772	2020-08-13 14:12:00 -07:00
Matt Arsenault	ec8c172d01	AMDGPU: Correct prolog SP initialization logic Having callees that will read SP is not the only reason we need to reference the stack pointer.	2020-08-05 15:47:53 -04:00
madhur13490	4a577c3a22	[AMDGPU] Fix incorrect arch assert while setting up FlatScratchInit Reviewers: arsenm, foad, rampitec, scott.linder Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D84391	2020-07-24 18:19:04 +00:00
Guillaume Chatelet	28de229bc6	[Alignment][NFC] Migrate MachineFrameInfo::CreateStackObject to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82894	2020-07-01 07:28:11 +00:00
vnalamot	2e28009981	[NFC] Move getAll{S,V}GPR{32,128} methods to SIFrameLowering Summary: Future patch needs some of these in multiple places. The definitions of these can't be in the header and be eligible for inlining without making the full declaration of GCNSubtarget visible. I'm not sure what the right trade-off is, but I opted to not bloat SIRegisterInfo.h Reviewers: arsenm, cdevadas Reviewed By: arsenm Subscribers: RamNalamothu, qcolombet, jvesely, wdng, nhaehnle, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79878	2020-06-17 12:08:09 -04:00
Christudasan Devadasan	7c4e711ef8	[AMDGPU] Enable base pointer. When the callee requires a dynamic stack realignment, it is not possible to correctly access the incoming stack arguments using the stack pointer. We reserve a base pointer in such cases to access the function arguments inside the callee. The base pointer will hold the incoming stack pointer value before any kind of delta added to it. Reviewed By: arsenm, scott.linder Differential Revision: https://reviews.llvm.org/D78811	2020-05-17 16:13:55 +05:30
Eric Christopher	59a299cbb3	Fix a release+noasserts werror for unused variable.	2020-05-11 20:03:23 -07:00
Austin Kerbow	09253b608a	[AMDGPU] Allow spilling FP to memory If there are no available lanes in a reserved VGPR, no free SGPR, and no unused CSR VGPR when trying to save the FP it needs to be spilled to memory as a last resort. This can be done in the prolog/epilog if we manually add the spill and manage exec. Differential Revision: https://reviews.llvm.org/D79610	2020-05-11 16:42:59 -07:00
Ram Nalamothu	f7060f4f88	For PAL, make sure Scratch Buffer Descriptor does not clobber GIT pointer Since SRSRC has alignment requirements, first find non GIT pointer clobbered registers for SRSRC and then if those registers clobber preloaded Scratch Wave Offset register, copy the Scratch Wave Offset register to a free SGPR.	2020-05-06 10:31:15 -04:00
Christudasan Devadasan	207cd5f68f	[AMDGPU] Add the SGPR used for FP copy to block livein lists. The temporary register used for FP copy should be live throughout the function.	2020-04-24 11:47:38 +05:30
Matt Arsenault	7dece2fde3	AMDGPU: Use Register	2020-04-21 15:19:35 -04:00
Matt Arsenault	2481f26ac3	CodeGen: Use Register in TargetFrameLowering	2020-04-07 17:07:44 -04:00
Guillaume Chatelet	c9d5c19597	[Alignment][NFC] Transitioning more getMachineMemOperand call sites Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, Jim, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77121	2020-03-31 08:36:18 +00:00
Guillaume Chatelet	b727aabcb8	[Alignment][NFC] Use llvmTargetFrameLowering::getStackAlign Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Reviewed By: courbet Subscribers: wuzish, arsenm, jyknight, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, fedor.sergeev, jrtc27, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76613	2020-03-26 18:15:53 +00:00
Scott Linder	60b1967c39	[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to remove the scratch wave offset register from the calling convention ABI. As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative. Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first. Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention. Tags: #llvm Differential Revision: https://reviews.llvm.org/D75138	2020-03-19 15:35:16 -04:00
Scott Linder	30bb113beb	[AMDGPU][NFC] Refactor emitEntryFunctionPrologue Remove dead code and factor repeated conditions out into a single check. Rename and move code to make it more obvious what is running only for entry functions. Simplify function arguments to make it clearer what the relevant inputs are. Make flat scratch init accept an MBB iterator and move it to where it was logically being emitted within the prologue. These changes will make a future update to the calling convention simpler. Tags: #llvm Differential Revision: https://reviews.llvm.org/D75092	2020-03-19 15:35:16 -04:00
Guillaume Chatelet	d000655a8c	[Alignment][NFC] Deprecate getMaxAlignment Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jholewinski, arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76348	2020-03-18 14:48:45 +01:00
Matt Arsenault	0426c2d07d	Reapply "AMDGPU: Cleanup and fix SMRD offset handling" This reverts commit `6a4acb9d80`.	2020-01-31 06:01:28 -08:00
Matt Arsenault	6a4acb9d80	Revert "AMDGPU: Cleanup and fix SMRD offset handling" This reverts commit `17dbc6611d`. A test is failing on some bots	2020-01-30 15:39:51 -08:00
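
The fallback path described in commit f9a8c6a0e5 (no SGPR could be scavenged to hold a copy of EXEC) can be illustrated with a small lane-level model. This is only a sketch in Python, not LLVM code: the 8-lane wave size, the helper names `store_active_lanes` and `save_whole_vgpr_without_sgpr`, and the final flip that restores EXEC are assumptions made for illustration. Only the three steps (write, `s_not exec, exec`, write again) come from the commit message.

```python
WAVE_SIZE = 8  # assumption: a tiny wave keeps the example readable
FULL_MASK = (1 << WAVE_SIZE) - 1


def store_active_lanes(vgpr, memory, exec_mask):
    """Write only the lanes whose EXEC bit is set (models a per-lane store)."""
    for lane in range(WAVE_SIZE):
        if exec_mask & (1 << lane):
            memory[lane] = vgpr[lane]


def save_whole_vgpr_without_sgpr(vgpr, exec_mask):
    """Save every lane of `vgpr` when no SGPR is free to hold a copy of EXEC."""
    memory = [None] * WAVE_SIZE
    store_active_lanes(vgpr, memory, exec_mask)  # write VGPR to memory
    exec_mask = ~exec_mask & FULL_MASK           # flip exec (s_not exec, exec)
    store_active_lanes(vgpr, memory, exec_mask)  # write previously inactive lanes
    exec_mask = ~exec_mask & FULL_MASK           # assumption: flip back to restore EXEC
    return memory, exec_mask
```

Because the two stores cover complementary lane sets, every lane ends up in memory regardless of the starting EXEC value, which is why no extra register is needed to remember the original mask.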
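
The relationship between the three stack realignment predicates introduced in commit a9968c0a33 can be sketched as a small model. This is an illustrative Python sketch, not the LLVM implementation: the `FrameState` class and its fields are hypothetical, and treating over-aligned stack objects as the only reason to realign is a simplifying assumption. Only the final conjunction is taken from the commit message.

```python
from dataclasses import dataclass


@dataclass
class FrameState:
    """Hypothetical stand-in for the per-function state a target would inspect."""
    max_align: int        # largest alignment required by any stack object
    stack_align: int      # alignment guaranteed on function entry
    can_reserve_fp: bool  # whether a frame pointer can still be reserved


def should_realign_stack(f: FrameState) -> bool:
    # True if there is any reason the stack should be realigned.
    # Assumption: over-aligned stack objects are the only reason modelled here.
    return f.max_align > f.stack_align


def can_realign_stack(f: FrameState) -> bool:
    # True if realignment is still possible, e.g. a frame pointer can be reserved.
    return f.can_reserve_fp


def has_stack_realignment(f: FrameState) -> bool:
    # From the commit message:
    # hasStackRealignment = shouldRealignStack && canRealignStack
    return should_realign_stack(f) and can_realign_stack(f)
```

Keeping the "should" and "can" questions separate makes the problematic case visible: a function for which `should_realign_stack` is true but `can_realign_stack` is false, which the old combined predicate reported simply as false.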


137 Commits