llvm-project

Commit Graph

Author	SHA1	Message	Date
Brendon Cahoon	d45a247998	[AMDGPU] Don't remove VGPR to AGPR dead spills from frame info Removing dead frame indices for VGPR to AGPR spills is incorrect when the frame index is shared by multiple objects, which may occur due to stack slot coloring. The problem is that subsequent code that processes the other object will assert because the stack frame index is marked dead. Removing dead frame indices is needed prior to stack slot coloring, which is what happens with SGPR to VGPR spills. These spills are lowered prior to stack slot coloring, but the VGPR to AGPR spills are processed afterwards during the Prolog/Epilog Inserter pass. This patch marks the VGPR to AGPR spill slot as dead if the slot is not used by another object. Differential Revision: https://reviews.llvm.org/D115996	2021-12-23 11:09:19 -06:00
Petar Avramovic	fd3cde600b	AMDGPU/GlobalISel: Fix attempt to select non-legal instr in mir test Delete inst-select-insert.xfail.mir. G_INSERT instructions in inst-select-insert.xfail.mir are no longer legal after D114198. This breaks build bots, since builds with LLVM_ENABLE_ASSERTIONS=Off don't check for legality and report cannot select while build with LLVM_ENABLE_ASSERTIONS=On reports instruction is not legal.	2021-12-23 16:14:33 +01:00
Petar Avramovic	29f88b93fd	[GlobalISel] Rework more/fewer elements for vectors Artifact combiner is not able to access individual elements after using LCMTy style merge/unmerge, extract and insert to change vector number of elements (pad with undef or split to sub-vector instructions). Use unmerge to individual elements instead and then merge elements into requested types. Change argument lowering for vectors and moreElementsVector to use buildPadVectorWithUndefElements and buildDeleteTrailingVectorElements. FewerElementsVector had a few helpers that had different behavior, introduce new helper for most of the opcodes. FewerElementsVector helper is more flexible since it can create leftover instruction smaller than requested type (useful in case target wants to avoid pad with undef and use fewer registers). If target does not want leftover of different type it should call more elements first. Some helpers were performing more elements first to have split without leftover. Opcodes that used this helper use clampMaxNumElementsStrict (does more elements first) in LegalizerInfo to avoid test changes. Fixes failures caused by failing to combine artifacts created during more/fewer elements vector. Differential Revision: https://reviews.llvm.org/D114198	2021-12-23 14:30:02 +01:00
Petar Avramovic	d2863088ab	GlobalISel: Regen vector mir tests, add tests for vector arg lowering Precommit for D114198 (Rework more/fewer elements for vectors). Regenerate auto-generated mir tests for vectors (use CHECK-NEXT instead of CHECK). Remove -global-isel-abort=0 where it is no longer needed. Add mir tests for different AMDGPU sub-targets and the way they lower function vector arguments (tests for legalization artifact combiner).	2021-12-23 14:30:01 +01:00
alex-t	e4103c91f8	[AMDGPU] Select build_vector DAG nodes according to the divergence This change enables divergence-driven instruction selection for the build_vector DAG nodes. It also enables packed i16 instructions for GFX9. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D116187	2021-12-23 02:27:12 +03:00
Ron Lieberman	09b53296cf	Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range" This reverts commit `9075009d1f`. Failed amdgpu runtime buildbot # 3514	2021-12-22 11:39:28 -05:00
RamNalamothu	9075009d1f	[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls. But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvement of understanding the control flow. This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering. And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`. As an added benefit, this patch simplifies overall return instruction handling. Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D114652	2021-12-22 20:51:12 +05:30
Matt Arsenault	01d97dfde1	AMDGPU/GlobalISel: Regenerate test checks	2021-12-21 10:57:46 -05:00
Jay Foad	17006033f9	[GlobalISel] Verify operand types for G_SHL, G_LSHR, G_ASHR Differential Revision: https://reviews.llvm.org/D115868	2021-12-21 11:59:33 +00:00
Matt Arsenault	c222972442	AMDGPU/GlobalISel: Stop using NarrowScalar/FewerElements for unaligned splitting These actions should only be used for adjusting the register types (and the memory type as needed to satisfy the register type). Unaligned accesses should be split as a type of lowering. This has the effect of improving the code in many cases since now we produce zextloads instead of separate loads with ands. The load/store legality rules still seem far more complicated than necessary though.	2021-12-20 18:07:11 -05:00
alex-t	19727e31fb	[AMDGPU] Enable divergence predicates for ctlz/cttz ctlz/cttz get lowered to the set of target opcodes This change enables the ISel to select SALU or VALU form according to the SDNode divergence. CTLZ - S_FLBIT_I32_B32 if uniform and V_FFBH_U32_e64 if divergent CTTZ - S_FF1_I32_B32 if uniform and V_FFBL_B32_e64 if divergent Also @llvm.amdgcn.sffbh.i32 gets lowered to S_FLBIT_I32 if uniform and V_FFBH_I32_e64 if divergent NOTE: 64bit versions S_FF1_I32_B64 and S_FLBIT_I32_B64 are not currently supported by the DAG ISel. ctlz/cttz with i64 input are split into two 32bit instructions. Nevertheless, they already have the patterns and were equipped with the divergence predicates to make sure they will be selected correctly when enabled. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D116044	2021-12-20 20:53:48 +03:00
alex-t	98d09705e1	[AMDGPU] Re-enabling divergence predicates for min/max This patch enables divergence predicates for min/max nodes. It makes ISD::MIN/MAX selected to S_MIN_I(U)32/S_MAX_I(U)32 or V_MIN_I(U)32_e64/V_MAX_I(U)32_e64 Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D115954	2021-12-20 16:10:55 +03:00
alex-t	1448aa9dbd	[AMDGPU] Expand not pattern according to the XOR node divergence The "not" is defined as XOR $src -1. We need to transform this pattern to either S_NOT_B32 or V_NOT_B32_e32 dependent on the "xor" node divergence. Reviewed By: rampitec, foad Differential Revision: https://reviews.llvm.org/D115884	2021-12-20 14:41:38 +03:00
Matt Arsenault	37a203f63e	AMDGPU: Regenerate more mir test checks with -NEXT	2021-12-18 11:38:30 -05:00
Matt Arsenault	591371f7df	AMDGPU: Regenerate some mir test checks with -NEXT	2021-12-18 10:46:15 -05:00
rkorsa	c680fb69d6	[AMDGPU] Fixes in ISelDAG path and GlobalISel path for 'bias' operand with A16 bit on The LOD bias operand is of type 'half' when the A16 bit is ON for MIMG instructions. 'bias' is only 16-bit but occupies 32-bits with upper 16-bits containing junk. The patch fixes both paths (ISelDAG and GlobalISel) for proper encoding of LOD bias operand. Differential Revision: https://reviews.llvm.org/D111754	2021-12-17 16:11:51 +05:30
Mircea Trofin	09103807e7	[NFC][regalloc] Introduce the RegAllocEvictionAdvisorAnalysis This patch introduces the eviction analysis and the eviction advisor, the default implementation, and the scaffolding for introducing the other implementations of the advisor. Differential Revision: https://reviews.llvm.org/D115707	2021-12-16 17:56:46 -08:00
Ron Lieberman	f4420f5224	Revert "AMDGPU: Update pass pipeline test" needed to match revert of rG2b4876157562: AMDGPU: Remove AMDGPUFixFunctionBitcasts pass This reverts commit `7ca355225d`.	2021-12-16 21:39:04 +00:00
Ron Lieberman	8a85be807b	Revert "AMDGPU: Remove AMDGPUFixFunctionBitcasts pass" Offload abort in Nekbone This reverts commit `2b48761575`.	2021-12-16 21:21:32 +00:00
Jay Foad	cce93b3397	[MachineVerifier] Undef subreg operands do not require subranges D112556 added verification that the live interval for a subreg operand must have subranges. This patch fixes a corner case, where if all subreg operands for a particular register are undef uses then no subranges are required. This matches how LiveIntervalCalc would build the live intervals in the first place, since an undef use is not considered to read the register. Before this patch, CodeGen/AMDGPU/no-remat-indirect-mov.mir would fail with -early-live-intervals: # After Live Interval Analysis ... * Bad machine code: Live interval for subreg operand has no subranges * - function: index_vgpr_waterfall_loop - basic block: %bb.1 (0x6a9a968) [352B;496B) - instruction: 432B %24:vgpr_32 = V_MOV_B32_e32 undef %18.sub0:vreg_512, implicit $exec, implicit %18:vreg_512, implicit $m0 - operand 1: undef %18.sub0:vreg_512 Differential Revision: https://reviews.llvm.org/D115360	2021-12-16 09:49:27 +00:00
Matt Arsenault	7ca355225d	AMDGPU: Update pass pipeline test	2021-12-15 18:37:18 -05:00
Matt Arsenault	f0cc43cc91	AMDGPU: Use v_accvgpr_mov_b32 when copying AGPR tuples on gfx90a This is an optimization, but also fixes a compile failure when no free VGPRs are available. The problem still exists for gfx908 where a scratch register is still required. This also still exists for the SGPR to AGPR case.	2021-12-15 18:20:49 -05:00
Matt Arsenault	20a6cbd220	AMDGPU: Regenerate checks	2021-12-15 18:20:49 -05:00
Matt Arsenault	2b48761575	AMDGPU: Remove AMDGPUFixFunctionBitcasts pass This was a workaround for not supporting indirect calls when instcombine didn't eliminate constant expression casts of the callee at -O0. Indirect calls are supposed to work now, so drop the hack.	2021-12-15 18:20:48 -05:00
Jay Foad	54fc9eb9b3	[AMDGPU] Use v_fma_f16 on GFX10 Teach convertToThreeAddress to use the V_FMA_F16_gfx9 pseudo (i.e. the standard instruction in GFX9 onwards) instead of V_FMA_F16 (the legacy pseudo for GFX8 compatibility, which is no longer supported in GFX10). This follows the example of macToMad in SIFoldOperands. Differential Revision: https://reviews.llvm.org/D115731	2021-12-15 13:14:48 +00:00
Jay Foad	4db7422771	[AMDGPU] Improve zeroesHigh16BitsOfDest for GFX9 legacy opcodes Pseudos like V_MAD_U16 and V_FMA_F16 map down to what GFX9 calls v_mad_legacy_u16 and v_fma_legacy_f16, which are documented to have the same zeroing behaviour as on GFX8. Differential Revision: https://reviews.llvm.org/D115729	2021-12-15 13:14:48 +00:00
Jon Chesterfield	624f12d34f	[amdgpu] Drop lowering of LDS used by global variables Approximately revert D103431. LDS variables are allocated at kernel launch and deallocated at kernel exit. The address is therefore kernel execution dependent. Global variables are initialized by values written to .data, which can't be done for a LDS variable as there is no kernel running, or by a global constructor. Initializing the global to the address of some LDS allocated by a global constructor is possible but indistinguishable from undef. Assigning the address of a LDS variable to a global should be a sema error. It isn't for openmp, haven't checked other languages. Failing that it could be set to undef, perhaps in this pass. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D115413	2021-12-14 21:59:26 +00:00
Jay Foad	819fb457a6	[AMDGPU] Regenerate checks in high-bits-zeroed-16-bit-ops.mir	2021-12-14 15:33:35 +00:00
Stanislav Mekhanoshin	c4aef9c281	Check subrange liveness at rematerialization LiveRangeEdit::allUsesAvailableAt checks that VNI at use is the same as at the original use slot. However, the VNI can be the same while a specific subrange needed for use can be dead at the new index. This patch adds subrange liveness check if there is a subreg use. Fixes: SWDEV-312810 Differential Revision: https://reviews.llvm.org/D115278	2021-12-13 11:11:55 -08:00
Neubauer, Sebastian	26924b57e8	[AMDGPU] Ignore special ABI registers for graphics Fixed ABI arguments are compute specific and should not be added to graphics shaders or functions, so do not try to add them. Differential Revision: https://reviews.llvm.org/D115344	2021-12-13 16:44:37 +01:00
Jon Chesterfield	28345d7f6f	[amdgpu] Add regression test for LDS in metadata	2021-12-13 13:35:38 +00:00
Jon Chesterfield	24b28db8cc	[amdgpu] Increase alignment of all LDS variables Currently the superalign option only increases the alignment of variables that are moved into the module.lds block. Change that to all LDS variables. Also only increase the alignment once, instead of once per function. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D115488	2021-12-12 19:30:32 +00:00
Matt Arsenault	06b90175e7	AMDGPU: Remove fixed function ABI option	2021-12-10 19:41:19 -05:00
Christudasan Devadasan	cf58b9ce98	[AMDGPU] Add AV class spill pseudo instructions While enabling vector superclasses with D109301, the AV spills are converted into VGPR spills by introducing appropriate copies. The whole thing ended up adding two instructions per spill (a copy + vgpr spill pseudo) and caused an incorrect liverange update during inline spiller. This patch adds the pseudo instructions for all AV spills from 32b to 1024b and handles them in the way all other spills are lowered. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D115439	2021-12-10 03:10:34 -05:00
Konstantin Schwarz	a344653725	[GlobalISel] Fix IRTranslator for constexpr fcmp The existing code assumed fcmp to always be an Instruction, but it can also be a ConstExpr. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D115450	2021-12-10 08:49:12 +01:00
Matt Arsenault	017ef78549	AMDGPU: Mark scc defs dead in SGPR to VMEM path for no free SGPRs This introduces verifier errors into this broken situation which we do not handle correctly, which is better than being silently miscompiled. For the emergency stack slot, the scavenger likes to move the restore instruction as late as possible, which ends up separating the SCC def from the conditional branch.	2021-12-08 18:40:49 -05:00
Matt Arsenault	0383872295	AMDGPU: Simplify test for SGPR spilling bug Remove the control flow from the test from `25eb7fa01d`. It's not necessary to reproduce the original assert with the patch reverted. The control flow happens to expose a different issue that calls for a separate test in a future change.	2021-12-08 18:40:44 -05:00
Matt Arsenault	a92cf7cea5	AMDGPU: Mark SCC def as dead when expanding frame indexes This improves liveness queries in a future change.	2021-12-08 18:40:39 -05:00
Sanjay Patel	e9179a6a02	[Support] improve known bits analysis for multiply by power-of-2 (1 set bit) This can be viewed as recognizing that multiply-by-power-of-2 doesn't have a carry into the top bit of an M-bit * N-bit number. Enhancing canonicalization of mul -> select might also handle some of these if we were ok with increasing instruction count with casts in some cases. This doesn't help https://llvm.org/PR49055 , but it's a simpler pattern that we miss. Note: "-sccp" already gets these examples using a constant range analysis. Differential Revision: https://reviews.llvm.org/D114962	2021-12-08 11:50:05 -05:00
Jack Andersen	f108c7f59d	[GlobalISel] Allow DBG_VALUE to use undefined vregs before LiveDebugValues. Expanding on D109750. Since `DBG_VALUE` instructions have final register validity determined in `LDVImpl::handleDebugValue`, there is no apparent reason to immediately prune unused register operands as their defs are erased. Consequently, this renders `MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval` moot; gaining a substantial performance improvement. The only necessary changes involve making relevant passes consider invalid DBG_VALUE vregs uses as valid. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D112852	2021-12-05 15:55:59 -05:00
Matt Arsenault	729bf9b26b	AMDGPU: Enable fixed function ABI by default Code using indirect calls is broken without this, and there isn't really much value in supporting the old attempt to vary the argument placement based on uses. This resulted in more argument shuffling code anyway. Also have the option stop implying all inputs need to be passed. This will now rely on the amdgpu-no-* attributes to avoid passing unnecessary values.	2021-12-04 10:49:18 -05:00
Matt Arsenault	2959e082e1	AMDGPU: Assume all amdhsa kernarg passed implicit arguments by default Previously we would require adding an attribute to kernels to enable the inputs passed in the kernarg segment, accessed by llvm.amdgcn.implicitarg.ptr. This violates the principle of being correct by default. Some OpenMP testcases were broken recently since it wasn't correctly setting this attribute, and no known frontends are setting this to anything other than the maximum. Most of the test changes are from load widening of argument loads since there are now more implied dereferenceable bytes.	2021-12-04 10:38:25 -05:00
Matt Arsenault	ae0ba7dedd	AMDGPU: Optimize out implicit kernarg argument allocation if unused We already annotate whether llvm.amdgcn.implicitarg.ptr is known to be unused. Start using it to avoid allocating the implicit arguments if unneeded.	2021-12-04 10:38:25 -05:00
Jay Foad	2774bad112	[AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args The ray_origin, ray_dir and ray_inv_dir arguments should all be vec3 to match how the hardware instruction works. Don't change the API of the corresponding OpenCL builtins. Differential Revision: https://reviews.llvm.org/D115032	2021-12-04 10:32:11 +00:00
Jay Foad	bc7dacf589	[AMDGPU] Generate checks for llvm.amdgcn.image.bvh.intersect.ray Differential Revision: https://reviews.llvm.org/D114955	2021-12-04 10:32:11 +00:00
Stanislav Mekhanoshin	e1d6306815	[AMDGPU] Fixed incomplete definitions in twoaddr-fma.mir. NFC.	2021-12-03 10:18:03 -08:00
Stanislav Mekhanoshin	3b17cb1506	[AMDGPU] Kill def when folding immediate in two-addr pass Two-address pass works right before RA and if an immediate was folded into an instruction there is nothing to remove the dead def. We end up with something like: v_mov_b32_e32 v14, 0xc1700000 v_mov_b32_e32 v14, 0x41200000 v_fmaak_f32 v51, s67, v19, 0xc1700000 v_fmaak_f32 v38, v51, v19, 0x4120000 The patch kills the dead move instruction right in the folding. Differential Revision: https://reviews.llvm.org/D114999	2021-12-03 09:37:49 -08:00
Jay Foad	b670dcb81b	[AMDGPU] Add some more GFX10 test coverage	2021-12-03 14:03:31 +00:00
Jay Foad	b29b6f92af	[AMDGPU] Add some more GFX10 GlobalISel test coverage	2021-12-03 13:40:27 +00:00
Petar Avramovic	0b34ffe4a6	AMDGPU/GlobalISel: Add clamp combine Add clamp combine. Source is fminnum(fmaxnum(Val, 0.0), 1.0) or fmaxnum(fminnum(Val, 1.0), 0.0) or fmed3 intrinsic with 0.0 and 1.0 as two out of three operands. Differential Revision: https://reviews.llvm.org/D90052	2021-12-03 12:49:39 +01:00

5060 Commits