llvm-project

Commit Graph

Author	SHA1	Message	Date
Neubauer, Sebastian	d1f45ed58f	[AMDGPU][NFC] Fix typos Differential Revision: https://reviews.llvm.org/D113672	2021-11-12 11:37:21 +01:00
Kazu Hirata	2ca45adf24	[CodeGen, Target] Use MachineRegisterInfo::use_operands (NFC)	2021-11-11 22:28:55 -08:00
kpyzhov	c9690092c8	[AMDGPU] Small correction in SITargetLowering::performOrCombine(). Differential Revision: https://reviews.llvm.org/D113203	2021-11-10 21:07:27 -05:00
Stanislav Mekhanoshin	476ab0f809	[AMDGPU] Fixed stack pointer init with architected flat scratch Even if wave offset is not present we still need to do the rest of the initialization. The mov into s32 was missing in the kernels. Fixes: SWDEV-310935 Differential Revision: https://reviews.llvm.org/D113628	2021-11-10 17:18:38 -08:00
Matt Arsenault	c7a0c2d0f7	AMDGPU: Report large stack usage for recursive calls We were previously setting an ignored bit in the kernel headers. The current behavior is to add the large amount on top of the statically known size of a single stack frame. I'm not sure if we should just use the large size as the entire reported size instead.	2021-11-10 20:02:01 -05:00
Matt Arsenault	90ff148719	AMDGPU: Account for implicit argument alignment for kernarg segment If a kernel had no formal arguments but did have the implicit arguments, we were reporting a required kernarg alignment of 4. For some reason we require an 8-byte alignment for this, even though there's no real advantage and I don't see where this is documented in the ABI. The code object header code also claims the minimum alignment is 16, which is what I thought you always got at runtime anyway so I don't know why this matters.	2021-11-09 17:48:37 -05:00
Michael Liao	bf225939bc	[InferAddressSpaces] Support assumed addrspaces from addrspace predicates. - CUDA cannot associate memory space with pointer types. Even though Clang could add extra attributes to specify the address space explicitly on a pointer type, it breaks the portability between Clang and NVCC. - This change proposes to assume the address space from a pointer from the assumption built upon target-specific address space predicates, such as `__isGlobal` from CUDA. E.g., ``` foo(float *p) { __builtin_assume(__isGlobal(p)); // From there, we could assume p is a global pointer instead of a // generic one. } ``` This makes the code portable without introducing the implementation-specific features. Note that NVCC starts to support __builtin_assume from version 11. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112041	2021-11-08 16:51:57 -05:00
Joe Nash	79f52af4cd	[AMDGPU] Make getInstSizeInBytes more generic NFC. This check mainly handles size affecting literals. Make it check all explicit operands instead of a few by name. Also make the isLiteral check handle the KIMM operands, see https://reviews.llvm.org/D111067 Change-Id: I1a362d55b2a10f5c74d445272e8b7829a8b77597 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D113318 Change-Id: Ie6c688f30a71e0335d1c6dd1ff65019bd7ce684e	2021-11-08 10:34:49 -05:00
Simon Pilgrim	f059b04f7b	[DAG] Add SelectionDAG::ComputeMinSignedBits helper As suggested on D113371, this adds a wrapper to SelectionDAG::ComputeNumSignBits, similar to the llvm::ComputeMinSignedBits wrapper. I've included some usage, its not exhaustive, just the more obvious cases where the intention is obvious. Differential Revision: https://reviews.llvm.org/D113396	2021-11-08 14:12:45 +00:00
skc7	a0633f5ccb	[AMDGPU] Test Commit. NFC Reviewed By: hsmhsm Differential Revision: https://reviews.llvm.org/D113379	2021-11-08 07:09:09 +00:00
Kazu Hirata	aee86f9b6c	[AMDGPU] Remove unused declaration selectSMRD (NFC) The function body proper was removed on Feb 20, 2019 in commit `79b5c3842b`.	2021-11-07 09:53:18 -08:00
Benjamin Kramer	9b8b16457c	Put implementation details into anonymous namespaces. NFCI.	2021-11-07 15:18:30 +01:00
Kazu Hirata	e4bab21848	[AMDGPU] Use MachineBasicBlock::{predecessors,successors} (NFC)	2021-11-06 19:31:20 -07:00
Kazu Hirata	14d656b3d8	[Target] Use llvm::reverse (NFC)	2021-11-06 13:08:21 -07:00
Kazu Hirata	2c4ba3e9d3	[Target] Use make_early_inc_range (NFC)	2021-11-05 09:14:32 -07:00
Jay Foad	c93bf53a3e	[AMDGPU] NFC formatting fixes in SIMemoryLegalizer	2021-11-05 09:10:24 +00:00
Thomas Symalla	76cbe62262	[AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values. This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unneccessary s_mov instructions for arguments the caller would otherwise save and restore in these registers. Reviewed By: sebastian-ne Differential Revision: https://reviews.llvm.org/D111637	2021-11-04 21:50:18 +01:00
RamNalamothu	539f500e78	[AMDGPU] Do not add debug locations to the code inside prologue There is no real source location for code inside prologue as it is generated by compiler but source locations are being added to code inside prologue as a side effect of https://reviews.llvm.org/D99269 because buildSpillLoadStore() is using source location of the real instruction in the basic block if any. Fixes: SWDEV-307590 Reviewed By: scott.linder, sebastian-ne Differential Revision: https://reviews.llvm.org/D113100	2021-11-04 08:02:41 +05:30
alex-t	0a3d755ee9	[AMDGPU] Enable divergence-driven BFE selection Detailed description: This change enables the bit field extract patterns selection to s_bfe_u32 or v_bfe_u32 dependent on the pattern root node divergence. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D110950	2021-11-03 23:26:59 +03:00
Kazu Hirata	4bef0304e1	[AArch64, AMDGPU] Use make_early_inc_range (NFC)	2021-11-03 09:22:51 -07:00
Abinav Puthan Purayil	fbe61fb0aa	[AMDGPU] Fix SGPR checks in S_MOV_B64_IMM_PSEUDO generation. The function to generate S_MOV_B64_IMM_PSEUDO was recently modified to optimize AGPR to AGPR copy but it missed checking for the SGPR clobbering for the S_MOV_B64_IMM_PSEUDO generation. Differential Revision: https://reviews.llvm.org/D113005	2021-11-03 09:09:24 +05:30
Jay Foad	be1a8f8834	[AMDGPU] Really preserve LiveVariables in SILowerControlFlow https://bugs.llvm.org/show_bug.cgi?id=52204 Differential Revision: https://reviews.llvm.org/D112731	2021-11-02 15:03:37 +00:00
Martin Liska	c5029023fb	Fix building with GCC 12: Fixes: https://bugs.llvm.org/show_bug.cgi?id=52380 Differential Revision: https://reviews.llvm.org/D112990	2021-11-02 14:28:00 +01:00
Jay Foad	7afef22926	[AMDGPU] Use MachineInstrBuilder::addReg. NFC.	2021-11-01 15:29:51 +00:00
Jay Foad	2b548b18c1	[AMDGPU] Shrink v_mac_legacy_f32 and v_fmac_legacy_f32 Differential Revision: https://reviews.llvm.org/D112917	2021-11-01 13:55:53 +00:00
Kazu Hirata	72710af233	[CodeGen, Target] Use MachineBasicBlock::terminators (NFC)	2021-10-31 07:57:34 -07:00
Christudasan Devadasan	aa2d3b59ce	GlobalISel/Utils: Use incoming regbank while constraining the superclasses Register operands with superclasses can possibly have multiple regBanks if they have different register types. The regBank ambiguity resolved during regbankselect should be used to constrain the operand regclass instead of obtaining one from the MCInstrDesc. This is a prerequisite patch for D109300 that introduces allocatable AV_* Superclasses for AMDGPU by combining both VGPRs and AGPRs and we want to restrain the regclass to either A or V based on the incoming regbank. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112323	2021-10-30 07:20:45 -04:00
Stanislav Mekhanoshin	e5340ed30c	[AMDGPU] Fix global isel for kernels using agprs on gfx90a With Global ISel getReservedRegs() is called before function is regbank selected for the first time. Defer caching of usesAGPRs() in this case. Differential Revision: https://reviews.llvm.org/D112644	2021-10-29 14:23:14 -07:00
Jay Foad	21a1d4cf71	[AMDGPU] Change numBitsSigned for simplicity and document it. NFC. Change numBitsSigned to return the minimum size of a signed integer that can hold the value. This is different by one from the previous result but is more consistent with numBitsUnsigned. Update all callers. All callers are now more consistent between the signed and unsigned cases, and some callers get simpler, especially the ones that deal with quantities like numBitsSigned(LHS) + numBitsSigned(RHS). Differential Revision: https://reviews.llvm.org/D112813	2021-10-29 14:22:06 +01:00
Vang Thao	52b43d1549	[AMDGPU] Fix cvt_f32_ubyte combine with shl Shift node is still needed to check if the shift is shr or shl to increment/decrement offset. Do not override the node. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112733	2021-10-28 21:43:06 -07:00
Kazu Hirata	01b4789b62	[AMDGPU] Remove hasDefinedInitializer (NFC) The last use was removed on Sep 16, 2021 in commit `7a62a5b56d`.	2021-10-28 20:33:34 -07:00
Kazu Hirata	dd5d46b009	[AMDGPU] Remove unused BBSelectRegister in AMDGPUMachineCFGStructurizer (NFC) This field seems to be unused for at least one year.	2021-10-28 20:33:32 -07:00
Kazu Hirata	309357c01a	[AMDGPU] Remove unused declaration eliminateDeadBranchOperands (NFC)	2021-10-28 20:33:30 -07:00
Abinav Puthan Purayil	2da6ef3664	[AMDGPU] Add 24-bit mulhi intrinsics in INTRINSIC_WO_CHAIN combine. mul24 intrinsic's operands are simplified by AMDGPUTargetLowering::performIntrinsicWOChainCombine(). This change adds the mul24hi intrinsics in the combine since its operands can be simplified like that of the mul24 intrinsics. Differential Revision: https://reviews.llvm.org/D112702	2021-10-28 16:57:48 +05:30
Sebastian Neubauer	fd1cfc9094	[AMDGPU][GlobalISel] Fix waterfall loops - Move the `s_and exec` to its correct position before the content of the waterfall loop - Use the SI_WATERFALL pseudo instruction, like for sdag, to benefit from optimizations - Add support for indirect function calls To support indirect calls, add a G_SI_CALL instruction without register class restrictions and insert a waterfall loop when applying register banks. Differential Revision: https://reviews.llvm.org/D109052	2021-10-28 10:30:55 +02:00
Kazu Hirata	cee3419d65	[AMDGPU] Remove unused declaration findNumUsedRegistersSI (NFC)	2021-10-27 21:24:02 -07:00
Michael Liao	e6a4ba3aa6	[amdgpu] Handle the case where there is no scavenged register. - When an unconditional branch is expanded into an indirect branch, if there is no scavenged register, an SGPR pair needs spilling to enable the destination PC calculation. In addition, before jumping into the destination, that clobbered SGPR pair need restoring. - As SGPR cannot be spilled to or restored from memory directly, the spilling/restoring of that SGPR pair reuses the regular SGPR spilling support but without spilling it into memory. As that spilling and restoring points are fully controlled, we only need to spill that SGPR into the temporary VGPR, which needs spilling into its emergency slot. - The target-specific hook is revised to take additional restore block, where the restoring code is filled. After that, the relaxation will place that restore block directly before the destination block and insert an unconditional branch in any fall-through block into the destination block. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D106449	2021-10-27 18:37:27 -04:00
Austin Kerbow	02e60f2e77	[AMDGPU] Use max waves for scheduler's initial occupancy target The scheduler should set critical/excess register usage thresholds that are guided by the maximum possible occupancy for the function. This change is focused on setting proper lower bounds on register usage which we would typically only see when a specific number of maximum waves is requested with the "waves-per-eu" attribute, or by setting "amdgpu-num-vgpr\|sgpr" directly. This was broken previously. I have a follow-on patch that will address issues with the scheduler not targeting correct upper bounds on register usage which is typical with launch bounds and min "waves-per-eu". Changes by this patch: Set the initial critical register usage thresholds to minimum values that are determined by the maximum possible occupancy for the function, or the number of allocatable registers, whichever is lower. Avoid unisgned overflow if register limits are lower than the register tracking "ErrorMargin", I.e. when using stress-regalloc=2. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112373	2021-10-26 15:30:26 -07:00
Neubauer, Sebastian	eb16570ab0	[AMDGPU] Remove unused CSR defs CSR_AMDGPU_VGPRs_24_255 and CSR_AMDGPU_VGPRs_32_255 are not used anywhere, so remove them. Differential Revision: https://reviews.llvm.org/D112535	2021-10-26 16:01:49 +02:00
Abinav Puthan Purayil	61e3b9fefe	[AMDGPU] Add constrained shift pattern matches. The motivation for this is due to clang's conformance to https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#operators-shift which makes clang emit (<shift> a, (and b, <width> - 1)) for `a <shift> b` in OpenCL where a is an int of bit width <width>. Differential revision: https://reviews.llvm.org/D110231	2021-10-26 19:07:19 +05:30
Abinav Puthan Purayil	781dd39b7b	[AMDGPU] Enable 48-bit mul in AMDGPUCodeGenPrepare. We were bailing out of creating 24-bit muls for results wider than 32 bits in AMDGPUCodeGenPrepare. With the 24-bit mulhi intrinsic, this change teaches AMDGPUCodeGenPrepare to generate the 48-bit mul correctly. Differential Revision: https://reviews.llvm.org/D112395	2021-10-26 18:53:07 +05:30
Abinav Puthan Purayil	9bd5cfeb1f	[AMDGPU] Implement llvm.amdgcn.mulhi.[i,u]24 intrinsics. These intrinsics maps to the 24-bit v_mul_hi instructions. This change also fixes an incorrect assumption on the associativity of 24-bit mulhi in its SDNode record in tblgen. Differential Revision: https://reviews.llvm.org/D112394	2021-10-26 18:53:07 +05:30
Neubauer, Sebastian	487f15603e	[AMDGPU] Fix setcc combine for i128 The combine asserted if constants could not be represented as uint64_t. Use APInts to fix this. Differential Revision: https://reviews.llvm.org/D112416	2021-10-26 13:39:50 +02:00
Jay Foad	c8e5aef1a0	[AMDGPU] Use standard MachineBasicBlock::getFallThrough method. NFCI. Differential Revision: https://reviews.llvm.org/D101825	2021-10-26 12:07:54 +01:00
Kazu Hirata	d8e4170b0a	Ensure newlines at the end of files (NFC)	2021-10-23 08:45:29 -07:00
Matt Arsenault	ec57b37551	AMDGPU: Use attributor to propagate amdgpu-flat-work-group-size This can merge the acceptable ranges based on the call graph, rather than the simple application of the attribute. Remove the handling from the old pass.	2021-10-22 16:23:50 -04:00
Matt Arsenault	8d4b74ac3f	AMDGPU: Don't consider whether amdgpu-flat-work-group-size was set It should be semantically identical if it was set to the same value as the default. Also improve the documentation.	2021-10-22 16:23:50 -04:00
Jay Foad	58e7ec471c	[AMDGPU] Run SIShrinkInstructions before post-RA scheduling Run post-RA SIShrinkInstructions just before post-RA scheduling, instead of afterwards. After the fixes in D112305 and D112317 this seems to make no difference, but it paves the way for scheduler tweaks that are sensitive to the e32 vs e64 encoding of VALU instructions. Differential Revision: https://reviews.llvm.org/D112341	2021-10-22 20:24:03 +01:00
Jay Foad	3f34f75a68	[AMDGPU] Fix latency for implicit vcc_lo operands on GFX10 wave32 As described in the comment, the way we change vcc to vcc_lo in these operands confuses addPhysRegDataDeps into treating them as implicit pseudo operands. Fix this by setting the correct latency from the SchedModel after addPhysRegDataDeps wrongly set it to 0. Differential Revision: https://reviews.llvm.org/D112317	2021-10-22 20:03:29 +01:00
Kazu Hirata	6fe949c4ed	[Target, Transforms] Use StringRef::contains (NFC)	2021-10-22 08:52:33 -07:00

1 2 3 4 5 ...

6384 Commits