llvm-project

Commit Graph

Author	SHA1	Message	Date
Austin Kerbow	30f18ed387	[AMDGPU] Handle SMRD signed offset immediate Summary: This fixes a few issues related to SMRD offsets. On gfx9 and gfx10 we have a signed byte offset immediate, however we can overflow into a negative since we treat it as unsigned. Also, the SMRD SOFFSET sgpr is an unsigned offset on all subtargets. We sometimes tried to use negative values here. Third, S_BUFFER instructions should never use a signed offset immediate. Differential Revision: https://reviews.llvm.org/D77082	2020-04-02 17:41:52 -07:00
Simon Pilgrim	be7a233e93	Fix operator precedence warning. NFCI.	2020-04-01 14:36:52 +01:00
Matt Arsenault	d0dd24a381	AMDGPU/GlobalISel: Fix crashing on weird G_INSERT sources No test since these cases shouldn't really be getting through the legalizer.	2020-03-30 18:14:04 -04:00
Matt Arsenault	bcb643c8af	AMDGPU/GlobalISel: Handle image atomics	2020-03-30 17:41:04 -04:00
Matt Arsenault	48eda37282	AMDGPU/GlobalISel: Start selecting image intrinsics Does not handle atomics yet.	2020-03-30 17:33:04 -04:00
Scott Linder	60b1967c39	[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to remove the scratch wave offset register from the calling convention ABI. As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative. Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first. Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention. Tags: #llvm Differential Revision: https://reviews.llvm.org/D75138	2020-03-19 15:35:16 -04:00
Matt Arsenault	ed72bcae34	AMDGPU/GlobalISel: Fix mishandling SGPR v2s16 add/sub/mul We weren't considering the packed case correctly, and this was passing through to the selector. The selector only checked the size, so this would incorrectly compile to a single 32-bit scalar add. As usual, the LegalizerHelper is somewhat awkward to use from applyMappingImpl. I think this is the first place we've needed multi-step legalization here though.	2020-03-09 22:51:54 -04:00
Matt Arsenault	15bf916b54	AMDGPU: Remove VOP3OpSelMods0 complex pattern Use default operand of 0 instead.	2020-03-04 17:18:22 -05:00
Matt Arsenault	0b46b078b6	AMDGPU/GlobalISel: Fix incorrect VOP3P fneg folding We use some s32 values in VOP3P operands, and won't see any intervening casts from a 32-bit fneg. Make sure it's really a packed fneg before folding.	2020-02-24 21:20:35 -05:00
Matt Arsenault	bf4933b4ea	AMDGPU/GlobalISel: Remove dead code	2020-02-21 19:19:32 -05:00
Jay Foad	b72f1448ce	AMDGPU/GlobalISel: Better code for one case of G_SHUFFLE_VECTOR on v2i16 Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74987	2020-02-21 21:16:39 +00:00
Matt Arsenault	dfce5fd50a	AMDGPU/GlobalISel: Select VOP3P instructions This only handles the basic cases. More work is needed to make better use of op_sel.	2020-02-21 13:35:40 -05:00
Matt Arsenault	72eef820d5	AMDGPU/GlobalISel: Select G_SHUFFLE_VECTOR G_SHUFFLE_VECTOR is legal since it theoretically may help match op_sel for VOP3P instructions. Expand it in some other way in case it doesn't fold into the use instructions.	2020-02-21 13:35:40 -05:00
Matt Arsenault	043ed2e22a	AMDGPU/GlobalISel: Fix xnor matching We should try the generated matchers before the manual selection. This means the patterns are now handling the common cases, but the manual selection code is not yet dead. It's still handling the non-s32/s64 cases (like v2s16 and v2s32). Currently tablegen doesn't have a nice way to have a single pattern that covers multiple types.	2020-02-21 11:42:49 -05:00
Matt Arsenault	ac7abe0ba9	AMDGPU/GlobalISel: Manually select G_BUILD_VECTOR_TRUNC We have patterns for s_pack* selection, but they assume the inputs are a build_vector with 16-bit inputs, not a truncating build vector. Since there's still outstanding work for how to handle mismatched result and source element vector operations, and since I'm trying a different packed vector strategy than SelectionDAG, just manually select this for now.	2020-02-21 10:34:11 -05:00
Matt Arsenault	b64aa8c715	AMDGPU/GlobalISel: Fix constant bus violation with source modifiers This looked through copies to find the source modifiers, which may have been SGPR->VGPR copies added to avoid potential constant bus violations. Re-insert a copy to a VGPR if this happens.	2020-02-21 10:30:23 -05:00
Matt Arsenault	ff4639f060	AMDGPU/GlobalISel: Select MUBUF path for global atomic cmpxchg I'm not sure why this isn't a pattern, but the DAG manually selects this.	2020-02-19 06:19:22 -08:00
Matt Arsenault	86813e2768	AMDGPU/GlobalISel: Select llvm.amdgcn.s.buffer.load Doesn't try to fail on the dlc bit pre-gfx10 like the DAG lowering does.	2020-02-17 08:02:40 -08:00
Matt Arsenault	e5805529bf	AMDGPU/GlobalISel: Select v2s32->v2s16 G_TRUNC It would be nice if there was a way to avoid the tied operand, but as far as I can tell there isn't a way to use or with op_sel to achieve this	2020-02-17 09:20:13 -05:00
Matt Arsenault	8d8d46b57a	AMDGPU/GlobalISel: Fix missing impdef of scc on boolean bit ops	2020-02-14 22:35:30 -05:00
Matt Arsenault	dc3e499dd4	AMDGPU/GlobalISel: Fix G_EXTRACT of 96-bit results This would assert on an unhandled size in getRegSplitParts.	2020-02-14 15:57:40 -08:00
Austin Kerbow	3a312c3ee5	[AMDGPU][GlobalISel] Refactor selectDS1Addr1Offset/selectDS64Bit4ByteAligned Differential Revision: https://reviews.llvm.org/D74261	2020-02-11 16:57:13 -08:00
Stanislav Mekhanoshin	453a8f3af7	[AMDGPU] Remove AMDGPURegisterInfo R600 and GCN do not have anything in common in terms of register file organization anymore. Differential Revision: https://reviews.llvm.org/D74426	2020-02-11 11:13:38 -08:00
Matt Arsenault	2126c70e3a	AMDGPU/GlobalISel: Don't mis-select vector index on a constant Vector indexing with a constant index should be folded out in the legalizer, but this was accidentally falling through. This would produce the indexing operation with $noreg. Handle this case as a dynamic index just in case a bug like this happens again in the future.	2020-02-09 18:02:37 -05:00
Matt Arsenault	12fe9b26ec	AMDGPU/GlobalISel: Select G_SEXT_INREG	2020-02-04 13:23:53 -08:00
Matt Arsenault	49e424e08e	AMDGPU/GlobalISel: Select global MUBUF atomicrmw	2020-01-31 06:05:41 -08:00
Matt Arsenault	0426c2d07d	Reapply "AMDGPU: Cleanup and fix SMRD offset handling" This reverts commit `6a4acb9d80`.	2020-01-31 06:01:28 -08:00
Matt Arsenault	6a4acb9d80	Revert "AMDGPU: Cleanup and fix SMRD offset handling" This reverts commit `17dbc6611d`. A test is failing on some bots	2020-01-30 15:39:51 -08:00
Matt Arsenault	17dbc6611d	AMDGPU: Cleanup and fix SMRD offset handling I believe this also fixes bugs with CI 32-bit handling, which was incorrectly skipping offsets that look like signed 32-bit values. Also validate the offsets are dword aligned before folding.	2020-01-30 15:04:21 -08:00
Matt Arsenault	d6b83d6ba5	AMDGPU/GlobalISel: Don't use pointless getConstantVRegVal This is always a G_CONSTANT already	2020-01-30 09:38:43 -05:00
Austin Kerbow	2605adb69c	[AMDGPU][GlobalISel] Select 8-byte LDS Ops with 4-byte alignment Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73585	2020-01-29 10:42:12 -08:00
Matt Arsenault	96352e0a1b	AMDGPU/GlobalISel: Handle LDS with relocations case	2020-01-29 08:18:55 -08:00
Matt Arsenault	94e8ef4d4c	AMDGPU/GlobalISel: Look through copies for source modifiers When all VOP instructions are legalized to VGPRs, any SGPR source modifiers will have a copy in the way.	2020-01-29 08:08:13 -08:00
Matt Arsenault	02adfb5155	AMDGPU/GlobalISel: Manually select scalar f64 G_FNEG This should be no problem to support with a pattern, but it turns out there are just too many yaks to shave. The main problem is in the DAG emitter, which I have no desire to sink effort into fixing. If we had a bit to disable patterns in the DAG importer, fixing the GlobalISelEmitter is more manageable.	2020-01-29 06:49:16 -08:00
Matt Arsenault	d2a9739274	AMDGPU/GlobalISel: Eliminate SelectVOP3Mods_f32 Trivial type predicates should be moved into the tablegen pattern itself, and not checked inside complex patterns. This eliminates a redundant complex pattern, and fixes select source modifiers for GlobalISel. I have further patches which fully handle select in tablegen and remove all of the C++ selection, although it requires the ugliness to support the entire range of legal register types.	2020-01-27 17:53:54 -05:00
Matt Arsenault	533d650e94	AMDGPU/GlobalISel: Move llvm.amdgcn.raw.buffer.store handling Treat this the same way as loads. There's less value to the intermediate nodes, but it's good to be consistent.	2020-01-27 14:59:30 -05:00
Matt Arsenault	09ed0e44d9	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.tbuffer.load	2020-01-27 13:40:37 -05:00
Matt Arsenault	fc90222a91	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.load Use intermediate instructions, unlike with buffer stores. This is necessary because of the need to have an internal way to distinguish between signed and unsigned extloads. This introduces some duplication and near duplication with the buffer store selection path. The store handling should maybe be moved into legalization to match and eliminate the duplication.	2020-01-27 12:49:23 -05:00
Matt Arsenault	e60d658260	AMDGPU/GlobalISel: Handle VOP3NoMods	2020-01-27 09:03:44 -08:00
Matt Arsenault	0968234590	AMDGPU/GlobalISel: Minor refactor of MUBUF complex patterns This will make it easier to support the small variants in the complex patterns for atomics.	2020-01-27 09:00:00 -08:00
Matt Arsenault	ac0b9b4ccf	AMDGPU/GlobalISel: Select more MUBUF global addressing modes The handling of the high bits of the resource descriptor seems weird to me, where the 3rd dword changes based on the instruction.	2020-01-27 07:28:36 -08:00
Matt Arsenault	fdaad485e6	AMDGPU/GlobalISel: Initial selection of MUBUF addr64 load/store Fixes the main reason for compile failures on SI, but doesn't really try to use the addressing modes yet.	2020-01-27 07:13:56 -08:00
Maheaha Shivamallappa	66f93071cd	AMDGPU/GlobalISel: Clean-up code around ISel for Intrinsics. Summary: A minor code clean-up around ISel for intrinsic llvm.amdgcn.end.cf() Reviewers: arsenm, mshivama Reviewed By: arsenm Tags: #llvm Differential Revision: https://reviews.llvm.org/D73358	2020-01-26 14:09:31 +05:30
Matt Arsenault	3b93945587	AMDGPU/GlobalISel: Select wqm, softwqm and wwm intrinsics	2020-01-24 13:06:44 -08:00
Matt Arsenault	1192d7b254	AMDGPU/GlobalISel: Handle 16-bank LDS llvm.amdgcn.interp.p1.f16 The pattern is also mishandled by the generated matcher, so work around this as in the DAG path. The existing DAG tests aren't particularly targeted to just this one intrinsic. These also end up differing in scheduling from SGPR->VGPR operand constraint copies.	2020-01-22 12:10:59 -05:00
Matt Arsenault	52ec7379ad	AMDGPU/GlobalISel: Fold add of constant into G_INSERT_VECTOR_ELT Move the subregister base like in the extract case.	2020-01-22 11:09:15 -05:00
Matt Arsenault	d1dbb5e471	AMDGPU/GlobalISel: Select G_INSERT_VECTOR_ELT	2020-01-22 11:00:49 -05:00
Matt Arsenault	e3d352c541	AMDGPU/GlobalISel: Fold constant offset vector extract indexes Handle dynamic vector extracts that use an index that's an add of a constant offset into moving the base subregister of the indexing operation. Force the add into the loop in regbankselect, which will be recognized when selected.	2020-01-22 10:50:59 -05:00
Matt Arsenault	a722cbf77c	AMDGPU/GlobalISel: Handle atomic_inc/atomic_dec The intermediate instruction drops the extra volatile argument. We are missing an atomic ordering on these.	2020-01-22 09:26:17 -05:00
Matt Arsenault	592de0009f	AMDGPU/GlobalISel: Select llvm.amdgcn.update.dpp The existing test is overly reliant on -mattr=-flat-for-global, and some missing optimizations to re-use.	2020-01-17 20:09:53 -05:00

190 Commits