llvm-project

Commit Graph

Author	SHA1	Message	Date
Mirko Brkusanin	971f4173f8	[AMDGPU][GlobalISel] Insert an and with exec before s_cbranch_vccnz if necessary While v_cmp will AND inactive lanes with 0, that is not the case for logical operations. This fixes a Vulkan CTS test that would hang otherwise. Differential Revision: https://reviews.llvm.org/D105709	2021-07-29 11:20:49 +02:00
Brendon Cahoon	f9f5d41545	[AMDGPU][GlobalISel] Legalize and select G_SBFX and G_UBFX Adds legalizer, register bank select, and instruction select support for G_SBFX and G_UBFX. These opcodes generate scalar or vector ALU bitfield extract instructions for AMDGPU. The instructions allow both constant or register values for the offset and width operands. The 32-bit scalar version is expanded to a sequence that combines the offset and width into a single register. There are no 64-bit vgpr bitfield extract instructions, so the operations are expanded to a sequence of instructions that implement the operation. If the width is a constant, then the 32-bit bitfield extract instructions are used. Moved the AArch64 specific code for creating G_SBFX to CombinerHelper.cpp so that it can be used by other targets. Only bitfield extracts with constant offset and width values are handled currently. Differential Revision: https://reviews.llvm.org/D100149	2021-06-28 09:06:44 -04:00
Sander de Smalen	d5e14ba88c	[GlobalISel] NFC: Change LLT::vector to take ElementCount. This also adds new interfaces for the fixed- and scalable case: * LLT::fixed_vector * LLT::scalable_vector The strategy for migrating to the new interfaces was as follows: * If the new LLT is a (modified) clone of another LLT, taking the same number of elements, then use LLT::vector(OtherTy.getElementCount()) or if the number of elements is halved/doubled, it uses .divideCoefficientBy(2) or operator*. That is because there is no reason to specifically restrict the types to 'fixed_vector'. * If the algorithm works on the number of elements (as unsigned), then just use fixed_vector. This will need to be fixed up in the future when modifying the algorithm to also work for scalable vectors, and will then need additional tests to confirm the behaviour works the same for scalable vectors. * If the test used the `/*Scalable=*/true` flag of LLT::vector, then this is replaced by LLT::scalable_vector. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D104451	2021-06-24 11:26:12 +01:00
Sebastian Neubauer	96e1fcb1e0	[AMDGPU] Use s_add_i32 for address additions This allows converting the add instruction to s_addk_i32 and v_add_nc_u32 instead of needing v_add_co_u32 when converting to a VALU instruction. Differential Revision: https://reviews.llvm.org/D103322	2021-06-07 16:09:48 +02:00
Stanislav Mekhanoshin	9e2e49328f	[AMDGPU] All GWS instructions need aligned VGPR on gfx90a Fixes: SWDEV-288006 Differential Revision: https://reviews.llvm.org/D103197	2021-06-01 17:08:03 -07:00
Sebastian Neubauer	690f5b7a01	[AMDGPU] Fix function calls with flat scratch When flat scratch is used, the stack pointer needs to be added when writing arguments to the stack. For buffer instructions, this is done in SelectMUBUFScratchOffen and SelectMUBUFScratchOffset. Move that to call argument lowering, like it is done in GlobalISel. Differential Revision: https://reviews.llvm.org/D103166	2021-05-28 11:22:13 +02:00
David Stuttard	31b62aa162	[AMDGPU] Fix codegen of image intrinsics for g16 and a16 For gfx10 gradient (g16) and address (a16) can be independent. Previous implementation assumed that a16 implied g16. There are some other changes that fix the verification (as well as asm/disasm) that are required for the included test to pass - the XFAIL will be removed in those changes. This also includes required fixes for GlobalISel Differential Revision: https://reviews.llvm.org/D102066 Change-Id: I7d171cc90994de05f41669b66a6d0ffa2ed05d09	2021-05-14 09:28:15 +01:00
Stanislav Mekhanoshin	909a5ccf3b	[AMDGPU] Improve global SADDR selection An address can be a uniform sum of two i64 bit values. That regularly happens in a loop where index is an induction variable promoted to 64 bit by the LSR. We can materialize zero in a VGPR and still use SADDR form of the load. Differential Revision: https://reviews.llvm.org/D101591	2021-05-05 14:44:21 -07:00
Petar Avramovic	c34900e133	AMDGPU/GlobalISel: Fix selection of image intrinsics with unused return When atomic image intrinsic return value is unused, register class for destination of a sub-register copy of return value ends up not being set. This copy then hits 'Register class not set' assert later. If return value has uses, register class is determined by use instruction. Fix is to not create sub-register copy when image intrinsic destination has no uses because it would be deleted by dead-mi-elimination later anyway. Differential Revision: https://reviews.llvm.org/D101448	2021-04-29 20:56:03 +02:00
Sebastian Neubauer	cc7add5298	[AMDGPU] Use SIInstrFlags for flat variants. NFC Use SIInstrFlags to differentiate between the different variants of flat instructions (flat, global and scratch). This should make it easier to bundle the immediate offset logic in a single place and implement restrictions and bug workarounds. Fixed version of D99587, which does not rely on the address space. Differential Revision: https://reviews.llvm.org/D99743	2021-04-09 12:28:36 +02:00
Jay Foad	3d07a6d891	[AMDGPU][GlobalISel] Add IMG init in selectImageIntrinsic Doing this during instruction selection avoids the cost of running SIAddIMGInit which is yet another pass over the MIR. Differential Revision: https://reviews.llvm.org/D99670	2021-04-01 18:13:17 +01:00
Jay Foad	5d0e9ddfa5	[AMDGPU][GlobalISel] Add support for global atomicrmw fadd This includes gfx908 which only has a no-return version of the global_atomic_add_f32 instruction, using the same hack that was previously implemented for selecting from the llvm.amdgcn.global.atomic.fadd intrinsic. Differential Revision: https://reviews.llvm.org/D97767	2021-03-31 11:13:00 +01:00
Stanislav Mekhanoshin	3bffb1cd0e	[AMDGPU] Use single cache policy operand Replace individual operands GLC, SLC, and DLC with a single cache_policy bitmask operand. This will reduce the number of operands in MIR and I hope the amount of code. These operands are mostly 0 anyway. An additional advantage is that the parser will accept these flags in any order, unlike now. Differential Revision: https://reviews.llvm.org/D96469	2021-03-15 13:00:59 -07:00
Matt Arsenault	70cb57d7da	AMDGPU/GlobalISel: Improve private addressing mode matching This enables the look-through-copy to hack around not correctly regbankselecting constants to match the use bank.	2021-03-11 10:23:35 -05:00
Piotr Sobczak	4672bac177	[AMDGPU] Introduce Strict WQM mode * Add amdgcn_strict_wqm intrinsic. * Add a corresponding STRICT_WQM machine instruction. * The semantic is similar to amdgcn_strict_wwm with a notable difference that not all threads will be forcibly enabled during the computations of the intrinsic's argument, but only all threads in quads that have at least one thread active. * The difference between amdgcn_wqm and amdgcn_strict_wqm is that in the strict mode an inactive lane will always be enabled irrespective of control flow decisions. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D96258	2021-03-03 14:19:16 +01:00
Piotr Sobczak	c3ce7bae80	[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm * Introduce the new intrinsic amdgcn_strict_wwm * Deprecate the old intrinsic amdgcn_wwm The change is done for consistency as the "strict" prefix will become an important, distinguishing factor between amdgcn_wqm and amdgcn_strict_wqm in the future. The "strict" prefix indicates that inactive lanes do not take part in control flow, specifically an inactive lane enabled by a strict mode will always be enabled irrespective of control flow decisions. The amdgcn_wwm will be removed, but doing so in two steps gives users time to switch to the new name at their own pace. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D96257	2021-03-03 09:33:57 +01:00
Amara Emerson	8a316045ed	[AArch64][GlobalISel] Enable use of the optsize predicate in the selector. To do this while supporting the existing functionality in SelectionDAG of using PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis dependencies to the instruction selector pass. Then, use the predicate to generate constant pool loads for f32 materialization, if we're targeting optsize/minsize. Differential Revision: https://reviews.llvm.org/D97732	2021-03-02 12:55:51 -08:00
Matt Arsenault	78b6d73a93	AMDGPU: Add even aligned VGPR/AGPR register classes gfx90a operations require even aligned registers, but this was previously achieved by reserving registers inside the full class. Ideally this would be captured in the static instruction definitions for the operands, and we would have different instructions per subtarget. The hackiest part of this is we need to manually reassign AGPR register classes after instruction selection (we get away without this for VGPRs since those types are actually registered for legal types).	2021-02-24 14:49:37 -05:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
Mirko Brkusanin	3c979ae9ec	[AMDGPU][GlobalISel] Remove redundant cmp when copying constant to vcc Differential Revision: https://reviews.llvm.org/D95540	2021-01-28 11:20:09 +01:00
Christudasan Devadasan	ff8a1cae18	[AMDGPU] Fix the inconsistency in soffset for MUBUF stack accesses. During instruction selection, there is an inconsistency in choosing the initial soffset value. With certain early passes, this value is getting modified and that brought additional fixup during eliminateFrameIndex to work for all cases. This whole transformation looks trivial and can be handled better. This patch clearly defines the initial value for soffset and keeps it unchanged before eliminateFrameIndex. The initial value must be zero for MUBUF with a frame index. The non-frame index MUBUF forms that use a raw offset from SP will have the stack register for soffset. During frame elimination, the soffset remains zero for entry functions with zero dynamic allocas and no callsites, or else is updated to the appropriate frame/stack register. Also, did some code clean up and made all asserts around soffset stricter to match. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D95071	2021-01-22 14:20:59 +05:30
dfukalov	560d7e0411	[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets ... to reduce headers dependency. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D95036	2021-01-20 22:22:45 +03:00
Mirko Brkusanin	a6a72dfdf2	[AMDGPU][GlobalISel] Avoid selecting S_PACK with constants If constants are hidden behind G_ANYEXT we can treat them same way as G_SEXT. For that purpose we extend getConstantVRegValWithLookThrough with option to handle G_ANYEXT same way as G_SEXT. Differential Revision: https://reviews.llvm.org/D92219	2021-01-20 11:54:53 +01:00
Joe Nash	314e29ed2b	[AMDGPU] Add _e64 suffix to VOP3 Insts Previously, instructions which could be expressed as VOP3 in addition to another encoding had a _e64 suffix on the tablegen record name, while those only available as VOP3 did not. With this patch, all VOP3s will have the _e64 suffix. The assembly does not change, only the mir. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D94341 Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423	2021-01-12 18:33:18 -05:00
dfukalov	6a87e9b08b	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
Sebastian Neubauer	221fdedc69	[AMDGPU][GlobalISel] Fold flat vgpr + constant addresses Use getPtrBaseWithConstantOffset in selectFlatOffsetImpl to fold more vgpr+constant addresses. Differential Revision: https://reviews.llvm.org/D93692	2020-12-23 10:40:30 +01:00
Matt Arsenault	581d13f8ae	GlobalISel: Return APInt from getConstantVRegVal Returning int64_t was arbitrarily limiting for wide integer types, and the functions should handle the full generality of the IR. Also changes the full form which returns the originally defined vreg. Add another wrapper for the common case of just immediately converting to int64_t (arguably this would be useful for the full return value case as well). One possible issue with this change is some of the existing uses did break without conversion to getConstantVRegSExtVal, and it's possible some without adequate test coverage are now broken.	2020-12-22 22:23:58 -05:00
Stanislav Mekhanoshin	d15119a02d	[AMDGPU][GlobalISel] GlobalISel for flat scratch It does not seem to fold offsets but this is not specific to the flat scratch as getPtrBaseWithConstantOffset() does not return the split for these tests unlike its SDag counterpart. Differential Revision: https://reviews.llvm.org/D93670	2020-12-22 16:33:06 -08:00
Austin Kerbow	4aa842a800	[AMDGPU] Add new pseudos for indirect addressing with VGPR Indexing It is possible for copies or spills to be inserted in the middle of indirect addressing sequences which use VGPR indexing. Spills to accvgprs could be affected by the indexing mode. Add new pseudo instructions that are expanded after register allocation to avoid the problematic spill or copy placement. Differential Revision: https://reviews.llvm.org/D91048	2020-12-08 12:24:12 -08:00
Jay Foad	4f87d30a06	[AMDGPU] Introduce and use isGFX10Plus. NFC. It's more future-proof to use isGFX10Plus from the start, on the assumption that future architectures will be based on current architectures. Also make use of the existing isGFX9Plus in a few places. Differential Revision: https://reviews.llvm.org/D92092	2020-11-26 09:02:36 +00:00
Matt Arsenault	d2e52eec51	AMDGPU: Select global saddr mode from SGPR pointer Use the 64-bit SGPR base with a 0 offset, since it's 1 fewer instruction to materialize the 0 vs. the 64-bit copy.	2020-11-16 11:51:06 -05:00
Matt Arsenault	a6e353b1d0	AMDGPU: Split large offsets when selecting global saddr mode When the offset doesn't fit in the immediate field, move some to voffset.	2020-11-16 11:36:01 -05:00
Jessica Paquette	b184a2eccf	[GlobalISel] Add matchers for specific constants and a matcher for negations It's fairly common to need matchers for a specific constant value, or for common idioms like finding a negated register. Add - `m_SpecificICst`, which returns true when matching a specific value. - `m_ZeroInt`, which returns true when an integer 0 is matched. - `m_Neg`, which returns true when a register is negated. Also update a few places which use idioms related to the new matchers. Differential Revision: https://reviews.llvm.org/D91397	2020-11-13 09:24:54 -08:00
Jay Foad	0ad4d04002	[AMDGPU] Remove an unused return value. NFC. Differential Revision: https://reviews.llvm.org/D91063	2020-11-10 09:15:14 +00:00
Stanislav Mekhanoshin	f738aee0bb	[AMDGPU] Add default 1 glc operand to rtn atomics This change adds a real glc operand to the return atomic instead of just string " glc" in the middle of the asm string. Improves asm parser diagnostics. Differential Revision: https://reviews.llvm.org/D90730	2020-11-05 10:41:59 -08:00
Jay Foad	040c50278c	[AMDGPU] Fix ds_read2/write2 with unaligned offsets These instructions use a scaled offset. We were wrongly selecting them even when the required offset was not a multiple of the scale factor. Differential Revision: https://reviews.llvm.org/D90607	2020-11-03 15:16:10 +00:00
Jay Foad	7a79921edd	[AMDGPU] Remove gds operand from ds_gws_* MachineInstrs The operand value was always 1 (except in some bad MIR tests) so it was redundant. Differential Revision: https://reviews.llvm.org/D90378	2020-10-29 15:04:23 +00:00
Jay Foad	5b91a6a88b	[AMDGPU] Allow some modifiers on VOP3B instructions V_DIV_SCALE_F32/F64 are VOP3B encoded so they can't use the ABS src modifier, but they can still use NEG and the usual output modifiers. This partially reverts `3b99f12a4e` "AMDGPU: Remove modifiers from v_div_scale_*". Differential Revision: https://reviews.llvm.org/D90296	2020-10-28 21:54:14 +00:00
Rodrigo Dominguez	f71f5f39f6	[AMDGPU] Implement hardware bug workaround for image instructions Summary: This implements a workaround for a hardware bug in gfx8 and gfx9, where register usage is not estimated correctly for image_store and image_gather4 instructions when D16 is used. Change-Id: I4e30744da6796acac53a9b5ad37ac1c2035c8899 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81172	2020-10-07 07:39:52 -04:00
Sebastian Neubauer	6a089ce0e4	[AMDGPU] Use tablegen for argument indices Use tablegen generic tables to get the index of image intrinsic arguments. Before, the computation of which image intrinsic argument is at which index was scattered in a few places, tablegen, the SDag instruction selection and GlobalISel. This patch changes that, so only tablegen contains code to compute indices and the ImageDimIntrinsicInfo table provides these information. Differential Revision: https://reviews.llvm.org/D86270	2020-10-05 11:50:52 +02:00
Stanislav Mekhanoshin	27a62f6317	[AMDGPU] global-isel support for RT Differential Revision: https://reviews.llvm.org/D87847	2020-09-24 10:29:45 -07:00
Stanislav Mekhanoshin	277de43d88	[AMDGPU] Unify intrinsic ret/nortn interface We have a single noret intrinsic and a lot of special handling around it. Declare it just as any other but do not define rtn instructions itself instead. Differential Revision: https://reviews.llvm.org/D87719	2020-09-15 15:26:42 -07:00
Petar Avramovic	6e2a86ed5a	AMDGPU/GlobalISel Check for NoNaNsFPMath in isKnownNeverSNaN Check for NoNaNsFPMath function attribute in isKnownNeverSNaN. Function attributes are held in 'TargetMachine.Options'. Among other things, this allows selection of some patterns imported in D87351 since G_FCANONICALIZE is not generated when isKnownNeverSNaN returns true in lowerFMinNumMaxNum. However we notice some incorrect results since function attributes are not correctly written in TargetMachine.Options when next function is processed. Take a look at @v_test_no_global_nnans_med3_f32_pat0_srcmod0, it has "no-nans-fp-math"="false" but TargetMachine.Options still has it set to true since first function in test file had this attribute set to true. This will be fixed in D87511. Differential Revision: https://reviews.llvm.org/D87456	2020-09-14 12:11:00 +02:00
Petar Avramovic	09b8871f8d	AMDGPU/GlobalISel/Emitter Support for predicate code that uses operands Predicates with 'let PredicateCodeUsesOperands = 1' want to examine matched operands. When we encounter predicate code that uses operands, analyze its named operand arguments and create a map between argument index and name. Later, when leaf node with name is encountered, emit GIM_RecordNamedOperand that will store that operand at its argument index in operand list. This operand list will be an argument to c++ code of the predicate. Differential Revision: https://reviews.llvm.org/D87285	2020-09-14 10:39:56 +02:00
Jay Foad	831457c6d5	[AMDGPU][GlobalISel] Eliminate barrier if workgroup size is not greater than wavefront size If a workgroup size is known to be not greater than wavefront size the s_barrier instruction is not needed since all threads are guaranteed to come to the same point at the same time. This is the same optimization that was implemented for SelectionDAG in D31731. Differential Revision: https://reviews.llvm.org/D86609	2020-08-26 13:47:51 +01:00
Mirko Brkusanin	d17ea67b92	[AMDGPU][GlobalISel] Fix 96 and 128 local loads and stores Fix local ds_read/write_b96/b128 so they can be selected if the alignment allows. Otherwise, either pick appropriate ds_read2/write2 instructions or break them down. Differential Revision: https://reviews.llvm.org/D81638	2020-08-21 12:26:31 +02:00
Jay Foad	3497860203	[AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister ... in favour of the isPhysical/isVirtual methods.	2020-08-20 17:59:11 +01:00
Matt Arsenault	2f5f5febf3	AMDGPU/GlobalISel: Select llvm.amdgcn.groupstaticsize Previously, it would successfully select and assert if not HSA or PAL when expanding the pseudoinstruction. We don't need the pseudoinstruction anymore since we know the total size after legalization.	2020-08-18 09:28:01 -04:00
Matt Arsenault	3ba7777b94	AMDGPU/GlobalISel: Fix selection of s1/s16 G_[F]CONSTANT The code to determine the value size was overcomplicated and only correct in the case where the result register already had a register class assigned. We can always take the size directly from the register's type.	2020-08-18 09:28:01 -04:00
Matt Arsenault	a9ee0589a8	AMDGPU/GlobalISel: Match global saddr addressing mode	2020-08-17 15:48:06 -04:00


278 Commits