llvm-project

Commit Graph

Author	SHA1	Message	Date
Petar Avramovic	6db7921b65	AMDGPU: Use tablegen patterns for buffer global and flat atomic fadd Remove manual selection for atomic fadd from global-isel. Stop pre-isel translation to AtomicLoadFAdd/G_ATOMICRMW_FADD which corresponds to llvm-ir's atomicrmw fadd instruction. global and flat atomic fadd patterns changes: Split rtn/no-rtn patterns Add missing patterns or fix predicates Remove atomicrmw patterns for v2f16 (atomic rmw doesn't support vectors). Patterns now check addrspace of pointer, added patterns for flat intrinsic with global addrspace pointer that selects into global atomic instruction. buffer atomic fadd patterns changes: Edit patterns to import into global-isel. Remove gfx6/gfx7 _addr64 and _offset patterns. Remove patterns that can't be reached (same pattern but different feature). Differential Revision: https://reviews.llvm.org/D130579	2022-09-23 17:52:10 +02:00
Petar Avramovic	e03d36d4ae	[AMDGPU] Add FeatureFlatAtomicFaddF32Inst Feature used by targets that have flat_atomic_add_f32 instruction (gfx940 and gfx11). Remove isGFX940GFX11Plus. Add hasFlatAtomicFaddF32Inst Subtarget check for codegen. Differential Revision: https://reviews.llvm.org/D134532	2022-09-23 17:52:10 +02:00
Abinav Puthan Purayil	7504c7a877	[AMDGPU] Use AddedComplexity for ret and noret atomic ops selection This patch removes the predicate for return atomic ops and uses AddedComplexity to distinguish its selection from its no return variant. This will produce better matchers that don't unnecessarily check for the negated predicate if the initial predicate failed. Also, it simplifies the enabling of no return atomic ops selection in GlobalISel. Differential Revision: https://reviews.llvm.org/D128241	2022-07-08 09:47:33 +05:30
Joe Nash	835e09c4c3	[AMDGPU] gfx11 FLAT Instructions MachineCode Support for FLAT type instructions Contributors: Sebastian Neubauer <sebastian.neubauer@amd.com> Patch 12/N for upstreaming of AMDGPU gfx11 architecture. Depends on D125989 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D125992	2022-05-25 15:29:39 -04:00
Stanislav Mekhanoshin	a09af86693	[AMDGPU] Enable FLAT LDS DMA on gfx9/10 before gfx940 We always had global and scratch loads to LDS in the gfx9, but did not handle it. These were available via the 'lds' encoding bit. In gfx940 this bit was reused as 'svs' which resulted in new '_lds' opcodes effectively pushing this bit into the opcode, but functionally it is the same. These instructions are also available on gfx10. Differential Revision: https://reviews.llvm.org/D125126	2022-05-17 12:16:37 -07:00
Joe Nash	7e71a03966	[AMDGPU] Split FeatureAtomicFaddInsts FeatureAtomicFaddInsts is replaced with three more granular features. Contributors: Petar Avramovic <Petar.Avramovic@amd.com> Patch 3/N for upstreaming of AMDGPU gfx11 architecture Depends on D124537 Reviewed By: foad, #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D124538	2022-05-05 13:27:45 -04:00
Stanislav Mekhanoshin	a9ccc7bc54	[AMDGPU] Properly mark MUBUF and FLAT LDS DMA instructions. NFC. Add these bits to the MUBUF and FLAT LDS DMA instructions: - LGKM_CNT - these operate on LDS; - VALU - SPG 3.9.8: This instruction acts as both a MUBUF and VALU instruction; Codegen currently does not produce any of this, so the change is NFC. Differential Revision: https://reviews.llvm.org/D124472	2022-04-26 14:20:26 -07:00
Matt Arsenault	0ecbb683a2	TableGen/GlobalISel: Make address space/align predicates consistent The builtin predicate handling has a strange behavior where the code assumes that a PatFrag is a stack of PatFrags, and each level adds at most one predicate. I don't think this particularly makes sense, especially without a diagnostic to ensure you aren't trying to set multiple at once. This wasn't followed for address spaces and alignment, which could potentially fall through to report no builtin predicate was added. Just switch these to follow the existing convention for now.	2022-04-22 15:48:07 -04:00
Abinav Puthan Purayil	272a876804	[AMDGPU] Rename the FlatSignedIntrPat multiclass to FlatSignedAtomicIntrPat. NFC	2022-04-22 11:47:23 +05:30
Abinav Puthan Purayil	165ae7276c	[AMDGPU] Remove atomic pattern args in FLAT_[Global_]Atomic_Pseudo defs We already have explicit patterns for these. Differential Revision: https://reviews.llvm.org/D124084	2022-04-22 09:37:40 +05:30
Abinav Puthan Purayil	45ca94334e	[AMDGPU] Select no-return atomic intrinsics in tblgen This is to avoid relying on the post-isel hook. This change also enables the saddr pattern selection for atomic intrinsics in GlobalISel. Differential Revision: https://reviews.llvm.org/D123583	2022-04-22 09:37:40 +05:30
Jay Foad	f707e1255e	[AMDGPU] Select d16 stores even when sramecc is enabled The sramecc feature changes the behaviour of d16 loads so they do not preserve the unused 16 bits of the result register, but it has no impact on d16 stores, so we should make use of them even when the feature is enabled. Differential Revision: https://reviews.llvm.org/D104912	2022-04-19 09:34:32 +01:00
Matt Arsenault	df29ec2f54	AMDGPU: Select i8/i16 global and flat atomic load/store As far as I know these should be atomic anyway, as long as the address is aligned. Unaligned atomics hit an ugly error in AtomicExpand.	2022-04-14 20:52:05 -04:00
Jay Foad	fb8d23b8e7	[AMDGPU] Define new feature HasFlatScratchSVSMode. NFC. This is by analogy with HasFlatScratchSTMode and is slightly more informative than using isGFX940Plus. Differential Revision: https://reviews.llvm.org/D121804	2022-03-16 19:54:02 +00:00
Stanislav Mekhanoshin	23499103f7	[AMDGPU] Support for gfx940 flat lds opcodes Differential Revision: https://reviews.llvm.org/D121414	2022-03-14 15:46:19 -07:00
Stanislav Mekhanoshin	36fe3f13a9	[AMDGPU] flat scratch SVS addressing mode for gfx940 Both VADDR and SADDR are used in SVS mode. Differential Revision: https://reviews.llvm.org/D121254	2022-03-14 15:23:36 -07:00
Jay Foad	c7218164c4	[AMDGPU] Remove HasAtomicFaddInstsGFX90X and HasAtomicFaddInstsGFX940 These compound predicates are not required, since we can use a combination of setting the SubtargetPredicate (to a subtarget predicate like isGFX940Plus) and OtherPredicates (to a list of feature predicates like HasAtomicFaddInsts) instead. NFC. Differential Revision: https://reviews.llvm.org/D121289	2022-03-09 18:02:21 +00:00
Stanislav Mekhanoshin	932f628121	[AMDGPU] new gfx940 fp atomics Differential Revision: https://reviews.llvm.org/D121028	2022-03-07 12:32:02 -08:00
Abinav Puthan Purayil	29bd3fadbc	[AMDGPU] Select no-return atomic ops in FLATInstructions.td. This change adds the selection for the no-return global_* and flat_* instructions in tblgen. The motivation for this is to get the no-return atomic isel working without relying on post-isel hooks so that GlobalISel can start selecting them (once GlobalISelEmitter allows no return atomic patterns like how DAGISel does). Differential Revision: https://reviews.llvm.org/D119227	2022-02-10 09:26:37 +05:30
Matt Arsenault	8ff3c9e0be	AMDGPU/GlobalISel: Fix selection of gfx90a FP atomics The struct/raw forms for the buffer atomics now work as expected. However, we're incorrectly handling the legacy form (which we probably shouldn't handle at all). We also are not diagnosing the use of the return value on gfx908. These will be addressed separately.	2022-01-20 12:12:06 -05:00
Jessica Clarke	3ee56eed2f	[AMDGPU][NFC] Alter ComplexPattern types to be consistent with their uses When used as a non-leaf node, TableGen does not currently use the type of a ComplexPattern for type inference, which also means it does not check it doesn't conflict with the use. This differs from when used as a leaf value, where the type is used for inference. Fixing that discrepancy is something I intend to upstream as a subsequent review. AMDGPU currently has several ComplexPatterns that are used in contexts where they're expected to be an iPTR, and where using an iPTR instead of a fixed-width integer type matters. With my locally-patched TableGen, none of these mismatches result in type contradictions, but do change the patterns and cause various failures to select. These changes to the ComplexPatterns' types reflect how they are actually used, result in bit-for-bit identical TableGen output (without my local TableGen patch), and ensure that with improved type inference AMDGPU's backend will continue to work. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D109032	2021-12-03 07:04:59 +00:00
Dmitry Preobrazhensky	91f4650ebb	[AMDGPU][MC][GFX10] Corrected global_atomic_fcmpswap* Corrected src data size of global_atomic_fcmpswap and global_atomic_fcmpswap_x2 opcodes. Differential Revision: https://reviews.llvm.org/D113746	2021-11-15 12:51:12 +03:00
Dmitry Preobrazhensky	b8e7f53208	[AMDGPU][MC][GFX10] Enabled dlc for FLAT and GLOBAL atomics Differential Revision: https://reviews.llvm.org/D109614	2021-09-21 16:23:20 +03:00
Jay Foad	128a49727a	[AMDGPU] Fix upcoming TableGen warnings on unused template arguments. NFC. The warning is implemented by D109359 which is still in review. Differential Revision: https://reviews.llvm.org/D109826	2021-09-16 09:07:18 +01:00
Joe Nash	e381833ba5	[AMDGPU] Support global_atomic_fmin/max on gfx10 Makes patterns added for gfx90a usable with the gfx10 versions of the insts. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D108654 Change-Id: I86167bf6b4823f975f74ccb619bd6190331ba16b	2021-08-25 09:35:10 -04:00
Jay Foad	24ffc343f9	[AMDGPU] Set IsAtomicRet and IsAtomicNoRet on Real instructions This does not affect codegen but might benefit llvm-mca.	2021-06-16 12:23:29 +01:00
Jay Foad	323b3e645d	[AMDGPU] Set mayLoad and mayStore on Real instructions This does not affect codegen but might benefit llvm-mca.	2021-06-16 12:10:23 +01:00
Jay Foad	6f778fed8e	[AMDGPU] Set more flags on Real instructions This does not affect codegen, which only tests these flags on Pseudo instructions, but might help llvm-mca which has to work with Real instructions. In particular setting LGKM_CNT on DS instructions helps with the problem identified in D104149. Differential Revision: https://reviews.llvm.org/D104293	2021-06-16 09:58:50 +01:00
Matt Arsenault	ed633a1daa	AMDGPU: Restore atomic fp feature on FP atomic instruction definitions `9931b1f7a4` switched this to checking for the two specific subtargets, instead of the dedicated feature. This broke supporting functions which force added the feature when emitting targets that do not actually support them. This still does not work for the targets that use the gfx6/7 or gfx10 encodings.	2021-04-22 21:32:01 -04:00
Sebastian Neubauer	cc7add5298	[AMDGPU] Use SIInstrFlags for flat variants. NFC Use SIInstrFlags to differentiate between the different variants of flat instructions (flat, global and scratch). This should make it easier to bundle the immediate offset logic in a single place and implement restrictions and bug workarounds. Fixed version of D99587, which does not rely on the address space. Differential Revision: https://reviews.llvm.org/D99743	2021-04-09 12:28:36 +02:00
Sebastian Neubauer	36138db116	[AMDGPU] IsFlatScratch/Global -> FlatScratch/Global Remove 'Is' from IsFlatScratch/Global. NFC Differential Revision: https://reviews.llvm.org/D100108	2021-04-09 11:20:31 +02:00
Jay Foad	fc7e3e7dd9	[AMDGPU] Set SchedRW on real instructions Copy SchedRW from pseudos to real instructions so that llvm-mca has access to it. This is NFC for normal compiler codegen, which schedules pseudos not real instructions. Add an llvm-mca test for some high latency double-precision instructions as a smoke test. Differential Revision: https://reviews.llvm.org/D99187	2021-03-23 15:38:11 +00:00
Stanislav Mekhanoshin	3bffb1cd0e	[AMDGPU] Use single cache policy operand Replace individual operands GLC, SLC, and DLC with a single cache_policy bitmask operand. This will reduce the number of operands in MIR and I hope the amount of code. These operands are mostly 0 anyway. Additional advantage that parser will accept these flags in any order unlike now. Differential Revision: https://reviews.llvm.org/D96469	2021-03-15 13:00:59 -07:00
Stanislav Mekhanoshin	9931b1f7a4	[AMDGPU] Disable SCC bit on fp atomics Differential Revision: https://reviews.llvm.org/D98221	2021-03-10 12:36:09 -08:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
Stanislav Mekhanoshin	5cf9292ce3	[AMDGPU] Add two TSFlags: IsAtomicNoRtn and IsAtomicRtn We are using AtomicNoRet map in multiple places to determine if an instruction is atomic, rtn or nortn atomic. This method does not always work since we have some instructions which only have an rtn or nortn version. One such instruction is ds_wrxchg_rtn_b32 which does not have a nortn version. This has caused changes in memory legalizer tests. Differential Revision: https://reviews.llvm.org/D96639	2021-02-15 11:27:59 -08:00
Sebastian Neubauer	409a2f0f9e	[AMDGPU] Allow no saddr for global addtid insts I think the global_load/store_dword_addtid instructions support switching off the scalar address. Add assembler and disassembler support for this. Differential Revision: https://reviews.llvm.org/D93288	2020-12-16 10:01:40 +01:00
Stanislav Mekhanoshin	f738aee0bb	[AMDGPU] Add default 1 glc operand to rtn atomics This change adds a real glc operand to the return atomic instead of just string " glc" in the middle of the asm string. Improves asm parser diagnostics. Differential Revision: https://reviews.llvm.org/D90730	2020-11-05 10:41:59 -08:00
Stanislav Mekhanoshin	c9d6fe6f7d	[AMDGPU] Improve FLAT scratch detection We were using too broad a check for isFLATScratch() which also includes FLAT global. Differential Revision: https://reviews.llvm.org/D90505	2020-11-02 11:37:33 -08:00
Stanislav Mekhanoshin	038d884a50	[AMDGPU] Use flat scratch instructions where available The support is disabled by default. So far there is instruction selection, spilling, and frame elimination. It also changes SP from unswizzled to swizzled as used by flat scratch instructions, so it cannot be mixed with MUBUF stack access. At the very least missing: - GlobalISel; - Some optimizations in frame elimination in between vector and scalar ALU; - It shall finally allow to always materialize frame index as an SGPR, but that is not implemented and frame elimination cannot handle it yet; - Unaligned and/or multidword flat scratch shall work, but it is legalized now for MUBUF; - Operand folding cannot optimize FI like with MUBUF yet; - It will need scaling the value of the SP/FP in the DWARF expression to recover the unswizzled scratch address; Differential Revision: https://reviews.llvm.org/D89170	2020-10-26 14:40:42 -07:00
Jay Foad	f6a5699c6c	[AMDGPU][TableGen] Make more use of !ne !not !and !or. NFC.	2020-10-21 09:56:43 +01:00
Stanislav Mekhanoshin	6ddadf9901	[AMDGPU] flat scratch ST addressing mode on gfx10 GFX10 enables third addressing mode for flat scratch instructions, an ST mode. In that mode both register operands are omitted and only swizzled offset is used in addition to flat_scratch base. Differential Revision: https://reviews.llvm.org/D89501	2020-10-19 15:29:52 -07:00
Stanislav Mekhanoshin	45014ce36f	[AMDGPU] Add tied operand to d16 scratch loads This is still no-op because there is no selection for these opcodes. Differential Revision: https://reviews.llvm.org/D88927	2020-10-07 11:13:01 -07:00
Stanislav Mekhanoshin	7361ce73ef	[AMDGPU] Use default zero flag operands in flat scratch This is no-op so far because we do not select these yet. Differential Revision: https://reviews.llvm.org/D88920	2020-10-07 10:56:47 -07:00
Stanislav Mekhanoshin	277de43d88	[AMDGPU] Unify intrinsic ret/nortn interface We have a single noret intrinsic and a lot of special handling around it. Declare it just as any other but do not define rtn instructions itself instead. Differential Revision: https://reviews.llvm.org/D87719	2020-09-15 15:26:42 -07:00
Matt Arsenault	e1a2f4713c	AMDGPU: Match global saddr addressing mode The previous implementation was incorrect, and based off incorrect instruction definitions. Unfortunately we can't match natural addressing in a lot of cases due to the shift/scale applied in getelementptrs. This relies on reducing the 64-bit shift to 32-bits.	2020-08-17 15:28:14 -04:00
Matt Arsenault	e0375dbcb3	AMDGPU: Fix using wrong offsets for global atomic fadd intrinsics Global instructions have the signed offsets.	2020-08-17 09:19:15 -04:00
Matt Arsenault	f0af434b79	AMDGPU: Remove register class params from flat memory patterns	2020-08-15 12:12:33 -04:00
Matt Arsenault	a7455652c0	AMDGPU: Fix global atomic saddr operand class	2020-08-15 12:12:28 -04:00
Matt Arsenault	625db2fe5b	AMDGPU: Remove slc from flat offset complex patterns This was always set to 0. Use a default value of 0 in this context to satisfy the instruction definition patterns. We can't unconditionally use SLC with a default value of 0 due to limitations in TableGen's handling of defaulted operands when followed by non-default operands.	2020-08-15 12:12:24 -04:00

119 Commits