llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	9ad8a1f6fb	AMDGPU: Fix high 16-bit optimization on gfx9 We can do this optimization in the majority of cases, but we currently don't have a way to do it. We do not track/model which instructions have which behavior, the control bit to change the high bit behavior, or making use of preserved bits at all. This is a bit fuzzy since we don't know precisely how the source instruction will be lowered, but that only really matters in one case (for fma_mixlo). We do need to fixup some of these cases after selection, but the pattern helps eliminate many of these zexts.	2021-06-22 13:16:45 -04:00
Matt Arsenault	a7786badb7	AMDGPU: Move zeroed FP high bits optimization to patterns	2021-06-22 12:47:56 -04:00
Sebastian Neubauer	96e1fcb1e0	[AMDGPU] Use s_add_i32 for address additions This allows to convert the add instruction to s_addk_i32 and v_add_nc_u32 instead of needing v_add_co_u32 when converting to a VALU instruction. Differential Revision: https://reviews.llvm.org/D103322	2021-06-07 16:09:48 +02:00
Sebastian Neubauer	690f5b7a01	[AMDGPU] Fix function calls with flat scratch When flat scratch is used, the stack pointer needs to be added when writing arguments to the stack. For buffer instructions, this is done in SelectMUBUFScratchOffen and SelectMUBUFScratchOffset. Move that to call argument lowering, like it is done in GlobalISel. Differential Revision: https://reviews.llvm.org/D103166	2021-05-28 11:22:13 +02:00
Sebastian Neubauer	13c0316239	[AMDGPU] Restrict immediate scratch offsets gfx9 does not work with negative offsets, gfx10 works only with aligned negative offsets, but not with unaligned negative offsets. This is slightly more conservative than needed, gfx9 does support negative offsets when a VGPR address is used and gfx10 supports negative, unaligned offsets when an SGPR address is used, but we do not make use of that with this patch. Differential Revision: https://reviews.llvm.org/D101292	2021-05-07 14:51:32 +02:00
Stanislav Mekhanoshin	909a5ccf3b	[AMDGPU] Improve global SADDR selection An address can be a uniform sum of two i64 bit values. That regularly happens in a loop where index is an induction variable promoted to 64 bit by the LSR. We can materialize zero in a VGPR and still use SADDR form of the load. Differential Revision: https://reviews.llvm.org/D101591	2021-05-05 14:44:21 -07:00
Stanislav Mekhanoshin	b7ebb25e53	[AMDGPU] Factor out SelectSAddrFI() This is a service function generally useful for selection of a FI in an SADDR. NFC for now, needed for future patch. Differential Revision: https://reviews.llvm.org/D100406	2021-04-14 09:40:02 -07:00
Sebastian Neubauer	cc7add5298	[AMDGPU] Use SIInstrFlags for flat variants. NFC Use SIInstrFlags to differentiate between the different variants of flat instructions (flat, global and scratch). This should make it easier to bundle the immediate offset logic in a single place and implement restrictions and bug workarounds. Fixed version of D99587, which does not rely on the address space. Differential Revision: https://reviews.llvm.org/D99743	2021-04-09 12:28:36 +02:00
Stanislav Mekhanoshin	edd6da10d2	[AMDGPU] Remove cpol, tfe, and swz from MUBUF patterns These are always selected as 0 anyway. Differential Revision: https://reviews.llvm.org/D98663	2021-03-18 14:36:04 -07:00
Stanislav Mekhanoshin	3bffb1cd0e	[AMDGPU] Use single cache policy operand Replace individual operands GLC, SLC, and DLC with a single cache_policy bitmask operand. This will reduce the number of operands in MIR and I hope the amount of code. These operands are mostly 0 anyway. Additional advantage that parser will accept these flags in any order unlike now. Differential Revision: https://reviews.llvm.org/D96469	2021-03-15 13:00:59 -07:00
Piotr Sobczak	4672bac177	[AMDGPU] Introduce Strict WQM mode * Add amdgcn_strict_wqm intrinsic. * Add a corresponding STRICT_WQM machine instruction. * The semantic is similar to amdgcn_strict_wwm with a notable difference that not all threads will be forcibly enabled during the computations of the intrinsic's argument, but only all threads in quads that have at least one thread active. * The difference between amdgcn_wqm and amdgcn_strict_wqm is that in the strict mode an inactive lane will always be enabled irrespective of control flow decisions. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D96258	2021-03-03 14:19:16 +01:00
Piotr Sobczak	c3ce7bae80	[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm * Introduce the new intrinsic amdgcn_strict_wwm * Deprecate the old intrinsic amdgcn_wwm The change is done for consistency as the "strict" prefix will become an important, distinguishing factor between amdgcn_wqm and amdgcn_strict_wqm in the future. The "strict" prefix indicates that inactive lanes do not take part in control flow, specifically an inactive lane enabled by a strict mode will always be enabled irrespective of control flow decisions. The amdgcn_wwm will be removed, but doing so in two steps gives users time to switch to the new name at their own pace. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D96257	2021-03-03 09:33:57 +01:00
Simon Pilgrim	25b788716b	[AMDGPU] Fix "initialization is never read" clang-tidy warnings. NFCI.	2021-03-02 12:06:24 +00:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
Christudasan Devadasan	ff8a1cae18	[AMDGPU] Fix the inconsistency in soffset for MUBUF stack accesses. During instruction selection, there is an inconsistency in choosing the initial soffset value. With certain early passes, this value is getting modified and that brought additional fixup during eliminateFrameIndex to work for all cases. This whole transformation looks trivial and can be handled better. This patch clearly defines the initial value for soffset and keeps it unchanged before eliminateFrameIndex. The initial value must be zero for MUBUF with a frame index. The non-frame index MUBUF forms that use a raw offset from SP will have the stack register for soffset. During frame elimination, the soffset remains zero for entry functions with zero dynamic allocas and no callsites, or else is updated to the appropriate frame/stack register. Also, did some code clean up and made all asserts around soffset stricter to match. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D95071	2021-01-22 14:20:59 +05:30
dfukalov	560d7e0411	[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets ... to reduce headers dependency. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D95036	2021-01-20 22:22:45 +03:00
Joe Nash	314e29ed2b	[AMDGPU] Add _e64 suffix to VOP3 Insts Previously, instructions which could be expressed as VOP3 in addition to another encoding had a _e64 suffix on the tablegen record name, while those only available as VOP3 did not. With this patch, all VOP3s will have the _e64 suffix. The assembly does not change, only the mir. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D94341 Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423	2021-01-12 18:33:18 -05:00
dfukalov	6a87e9b08b	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
Sebastian Neubauer	91445979be	[AMDGPU] Unify flat offset logic Move getNumFlatOffsetBits from AMDGPUAsmParser and SIInstrInfo into AMDGPUBaseInfo. Differential Revision: https://reviews.llvm.org/D93287	2020-12-15 14:59:59 +01:00
Matt Arsenault	d2e52eec51	AMDGPU: Select global saddr mode from SGPR pointer Use the 64-bit SGPR base with a 0 offset, since it's 1 fewer instruction to materialize the 0 vs. the 64-bit copy.	2020-11-16 11:51:06 -05:00
Matt Arsenault	a6e353b1d0	AMDGPU: Split large offsets when selecting global saddr mode When the offset doesn't fit in the immediate field, move some to voffset.	2020-11-16 11:36:01 -05:00
Matt Arsenault	e722943e05	AMDGPU: Factor out large flat offset splitting	2020-11-13 11:22:13 -05:00
Matt Arsenault	0fd6a04ba4	AMDGPU: Refactor getBaseWithOffsetUsingSplitOR usage	2020-11-13 10:58:17 -05:00
Stanislav Mekhanoshin	f738aee0bb	[AMDGPU] Add default 1 glc operand to rtn atomics This change adds a real glc operand to the return atomic instead of just string " glc" in the middle of the asm string. Improves asm parser diagnostics. Differential Revision: https://reviews.llvm.org/D90730	2020-11-05 10:41:59 -08:00
Jay Foad	040c50278c	[AMDGPU] Fix ds_read2/write2 with unaligned offsets These instructions use a scaled offset. We were wrongly selecting them even when the required offset was not a multiple of the scale factor. Differential Revision: https://reviews.llvm.org/D90607	2020-11-03 15:16:10 +00:00
Jay Foad	0892d2a311	Revert "Fix ds_read2/write2 unaligned offsets" This reverts commit `2e7e898c8f`. It was committed by mistake.	2020-11-02 14:01:33 +00:00
Jay Foad	2e7e898c8f	Fix ds_read2/write2 unaligned offsets	2020-11-02 13:57:13 +00:00
Jay Foad	7a79921edd	[AMDGPU] Remove gds operand from ds_gws_* MachineInstrs The operand value was always 1 (except in some bad MIR tests) so it was redundant. Differential Revision: https://reviews.llvm.org/D90378	2020-10-29 15:04:23 +00:00
Jay Foad	5b91a6a88b	[AMDGPU] Allow some modifiers on VOP3B instructions V_DIV_SCALE_F32/F64 are VOP3B encoded so they can't use the ABS src modifier, but they can still use NEG and the usual output modifiers. This partially reverts `3b99f12a4e` "AMDGPU: Remove modifiers from v_div_scale_*". Differential Revision: https://reviews.llvm.org/D90296	2020-10-28 21:54:14 +00:00
Stanislav Mekhanoshin	038d884a50	[AMDGPU] Use flat scratch instructions where available The support is disabled by default. So far there is instruction selection, spilling, and frame elimination. It also changes SP from unswizzled to swizzled as used by flat scratch instructions, so it cannot be mixed with MUBUF stack access. At the very least missing: - GlobalISel; - Some optimizations in frame elimination in between vector and scalar ALU; - It shall finally allow to always materialize frame index as an SGPR, but that is not implemented and frame elimination cannot handle it yet; - Unaligned and/or multidword flat scratch shall work, but it is legalized now for MUBUF; - Operand folding cannot optimize FI like with MUBUF yet; - It will need scaling the value of the SP/FP in the DWARF expression to recover the unswizzled scratch address; Differential Revision: https://reviews.llvm.org/D89170	2020-10-26 14:40:42 -07:00
Christudasan Devadasan	5a061041ec	[AMDGPU] Avoid offset register in MUBUF for direct stack object accesses We use an absolute address for stack objects and it would be necessary to have a constant 0 for soffset field. Fixes: SWDEV-228562 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D89234	2020-10-26 11:08:37 +05:30
Jay Foad	e20f459229	[AMDGPU] Simplify getNumFlatOffsetBits. NFC. Remove some checks that have already been done in the only caller.	2020-10-01 15:24:09 +01:00
Jonas Paulsson	714ceefad9	[SelectionDAG] Always intersect SDNode flags during getNode() node memoization. Previously SDNodeFlags::intersectWith(Flags) would do nothing if Flags was in an undefined state, which is very bad given that this is the default when getNode() is called without passing an explicit SDNodeFlags argument. This meant that if an already existing and reused node had a flag which the second caller to getNode() did not set, that flag would remain uncleared. This was exposed by https://bugs.llvm.org/show_bug.cgi?id=47092, where an NSW flag was incorrectly set on an add instruction (which did in fact overflow in one of the two original contexts), so when SystemZElimCompare removed the compare with 0 trusting that flag, wrong-code resulted. There is more that needs to be done in this area as discussed here: Differential Revision: https://reviews.llvm.org/D86871 Review: Ulrich Weigand, Sanjay Patel	2020-09-05 10:30:38 +02:00
Matt Arsenault	f78687df9b	AMDGPU: Don't assert on misaligned DS read2/write2 offsets This would assert with unaligned DS access enabled. The offset may not be aligned. Theoretically the pattern predicate should check the memory alignment, although it is possible to have the memory be aligned but not the immediate offset. In this case I would expect it to use ds_{read|write}_b64 with unaligned access, but am not clear if there's a reason it doesn't.	2020-08-26 14:08:05 -04:00
Mirko Brkusanin	d17ea67b92	[AMDGPU][GlobalISel] Fix 96 and 128 local loads and stores Fix local ds_read/write_b96/b128 so they can be selected if the alignment allows. Otherwise, either pick appropriate ds_read2/write2 instructions or break them down. Differential Revision: https://reviews.llvm.org/D81638	2020-08-21 12:26:31 +02:00
Jay Foad	3497860203	[AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister ... in favour of the isPhysical/isVirtual methods.	2020-08-20 17:59:11 +01:00
Matt Arsenault	e1a2f4713c	AMDGPU: Match global saddr addressing mode The previous implementation was incorrect, and based off incorrect instruction definitions. Unfortunately we can't match natural addressing in a lot of cases due to the shift/scale applied in getelementptrs. This relies on reducing the 64-bit shift to 32-bits.	2020-08-17 15:28:14 -04:00
Matt Arsenault	625db2fe5b	AMDGPU: Remove slc from flat offset complex patterns This was always set to 0. Use a default value of 0 in this context to satisfy the instruction definition patterns. We can't unconditionally use SLC with a default value of 0 due to limitations in TableGen's handling of defaulted operands when followed by non-default operands.	2020-08-15 12:12:24 -04:00
Matt Arsenault	8cb022982a	AMDGPU: Remove redundant FLAT complex patterns These were identical to the non-atomic cases. I'm not sure why these were ever separated.	2020-08-15 12:12:01 -04:00
Matt Arsenault	cdd45d5f9c	AMDGPU/GlobalISel: Select llvm.amdgcn.global.atomic.csub Remove the custom node boilerplate. Not sure why this tried to handle the LDS atomic stuff.	2020-07-29 08:27:31 -04:00
Jay Foad	760af7a074	[AMDGPU] Avoid splitting FLAT offsets in unsafe ways As explained in the comment: // For a FLAT instruction the hardware decides whether to access // global/scratch/shared memory based on the high bits of vaddr, // ignoring the offset field, so we have to ensure that when we add // remainder to vaddr it still points into the same underlying object. // The easiest way to do that is to make sure that we split the offset // into two pieces that are both >= 0 or both <= 0. In particular FLAT (as opposed to SCRATCH and GLOBAL) instructions have an unsigned immediate offset field, so we can't use it to help split a negative offset. Differential Revision: https://reviews.llvm.org/D83394	2020-07-17 11:44:10 +01:00
Matt Arsenault	79f67cae91	AMDGPU: Rename add/sub with carry out instructions The hardware has created a real mess in the naming for add/sub, which have been renamed basically every generation. Switch the carry out pseudos to have the gfx9/gfx10 names. We were using the original SI/CI v_add_i32/v_sub_i32 names. Later targets reintroduced these names as carryless instructions with a saturating clamp bit, which we do not define. Do this rename so we can unambiguously add these missing instructions. The carry-in versions should also be renamed, but at least those had a consistent _u32 name to begin with. The 16-bit instructions were also renamed, but aren't ambiguous. This does regress assembler error message quality in some cases. In mismatched wave32/wave64 situations, this will switch from "unsupported instruction" to "invalid operand", with the error pointing at the wrong position. I couldn't quite follow how the assembler selects these, but the previous behavior seemed accidental to me. It looked like there was a partial attempt to handle this which was never completed (i.e. there is an AMDGPUOperand::isBoolReg but it isn't used for anything).	2020-07-16 13:16:30 -04:00
Michael Liao	b1360caa82	[SDAG] Add new AssertAlign ISD node. Summary: - AssertAlign node records the guaranteed alignment on its source node, where these alignments are retrieved from alignment attributes in LLVM IR. These tracked alignments could help DAG combining and lowering generating efficient code. - In this patch, the basic support of AssertAlign node is added. So far, we only generate AssertAlign nodes on return values from intrinsic calls. - Addressing selection in AMDGPU is revised accordingly to capture the new (base + offset) patterns. Reviewers: arsenm, bogner Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81711	2020-06-23 00:51:11 -04:00
Stanislav Mekhanoshin	9ee272f13d	[AMDGPU] Add gfx1030 target Differential Revision: https://reviews.llvm.org/D81886	2020-06-15 16:18:05 -07:00
Simon Pilgrim	eda13c2420	LegacyDivergenceAnalysis.h - reduce DivergenceAnalysis.h include to forward declaration. NFC. Move implicit include dependencies down to source file.	2020-06-06 13:30:00 +01:00
Matt Arsenault	bb10fa3a53	AMDGPU: Fix wrong null value for private address space I'm guessing this was a holdover from when 0 was an invalid stack pointer, but surprised nobody has discovered this before. Also don't allow offset folding for -1 pointers, since it looks weird to partially fold this.	2020-05-26 16:35:13 -04:00
Matt Arsenault	e09064e97f	AMDGPU: Update store node checks for atomics Prepare to switch to using StoreSDNode for atomic stores.	2020-05-26 15:20:03 -04:00
Stanislav Mekhanoshin	591b029f40	[AMDGPU] Optimized indirect multi-VGPR addressing SelectMOVRELOffset prevents peeling of a constant from an index if final base could be negative. isBaseWithConstantOffset() succeeds if a value is an "add" or "or" operator. In case of "or" it shall be an add-like "or" which never changes a sign of the sum given a non-negative offset. I.e. we can safely allow peeling if operator is an "or". Differential Revision: https://reviews.llvm.org/D79898	2020-05-13 14:53:16 -07:00
alex-t	5b898bddff	[AMDGPU] Enable carry out ADD/SUB operations divergence driven instruction selection. Summary: This change enables all kind of carry out ISD opcodes to be selected according to the node divergence. Reviewers: rampitec, arsenm, vpykhtin Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78091	2020-05-04 16:42:25 +03:00
Jay Foad	658f33dcea	[AMDGPU] Remove selectSGPRVectorRegClassID. NFC. This was yet another function that had to be updated whenever you added a new register class. Remove it by refactoring its only caller to use standard helper functions from SIRegisterInfo. Differential Revision: https://reviews.llvm.org/D78557	2020-04-21 16:29:21 +01:00
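
Commit 760af7a074 above describes splitting a FLAT offset into two pieces that are both >= 0 or both <= 0, so that adding the remainder to vaddr keeps the address inside the same underlying object. The following is a minimal sketch of that idea only, not the code from the commit: the names `SplitOffset` and `splitOffsetSameSign`, and the `numBits`/`isSigned` parameters, are hypothetical and assumed for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: the part placed in the instruction's immediate
// offset field, and the part that must be added to the address register.
struct SplitOffset {
  int64_t imm;
  int64_t remainder;
};

// Illustrative helper (an assumption, not LLVM's API). numBits is the width
// of the immediate field; isSigned says whether that field is signed.
SplitOffset splitOffsetSameSign(int64_t offset, unsigned numBits, bool isSigned) {
  assert(numBits >= 2 && numBits <= 32 && "sketch only handles small fields");
  SplitOffset r{0, offset};
  if (isSigned) {
    // Signed division truncates toward zero, so remainder and imm both take
    // the sign of offset (or are zero), and |imm| < 2^(numBits-1).
    const int64_t d = int64_t{1} << (numBits - 1);
    r.remainder = (offset / d) * d;
    r.imm = offset - r.remainder;
  } else if (offset >= 0) {
    // Unsigned field: keep the low numBits bits in the immediate.
    const int64_t mask = (int64_t{1} << numBits) - 1;
    r.imm = offset & mask;
    r.remainder = offset - r.imm;
  }
  // Unsigned field with a negative offset: the immediate cannot help, so the
  // whole offset stays in the remainder and imm is 0.
  return r;
}
```

Under these assumptions, a 13-bit signed field splits -5000 into an immediate of -904 and a remainder of -4096, while a 12-bit unsigned field leaves -5000 entirely in the remainder.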

286 Commits