llvm-project

Commit Graph

Author	SHA1	Message	Date
Pierre van Houtryve	9e7febb4f7	[AMDGPU][GISel] Select llvm.amdgcn.fcmp intrinsics Adds FP CCs opcodes/selection logic, including src mods selection Depends on D136591, D136448 Resolves #58326 (https://github.com/llvm/llvm-project/issues/58326) Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D136592	2022-11-22 14:18:58 +00:00
Petar Avramovic	0f3e72e86c	AMDGPU/GlobalISel: Fix crash after mad/fma_mix fails selection When selectVOP3PMadMixModsImpl fails, it can still create new copy instr via selectVOP3ModsImpl. When selectG_FMA_FMAD gives up, new copy instr will remain dead but will not be automatically removed. InstructionSelect does not check if instructions created during selection are dead. Such dead copy doesn't have register class on dst operand and causes crash. Fix is to build copy when operands are being added to selected instruction. Differential Revision: https://reviews.llvm.org/D138044	2022-11-18 18:02:26 +01:00
Pierre van Houtryve	767999fca8	[AMDGPU][GlobalISel] Support mad/fma_mix selection Adds support for selecting the following instructions using GlobalISel: - v_mad_mix/v_fma_mix - v_mad_mixhi/v_fma_mixhi - v_mad_mixlo/v_fma_mixlo To select those instructions properly, some additional changes were needed which impacted other tests as well. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134354	2022-11-08 08:02:34 +00:00
Pierre van Houtryve	c93104073c	[AMDGPU] Always lower SHUFFLE_VECTOR Make it illegal, remove InstructionSelector logic for it Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134967	2022-10-04 14:23:17 +00:00
Pierre van Houtryve	9a67a6b72a	[AMDGPU][GISel] Legalize V2S16 G_BUILD_VECTOR Preparation patch for D134354 to make V2S16 G_BUILD_VECTOR legal. Also removes RegBankInfo's scalarization of small BUILD_VECTORs, replacing it with InstructionSelector logic instead. This allows for V2S16 BUILD_VECTOR instructions to survive all the way to ISel so we can select FMA/MAD_MIX instructions in D134354. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134433	2022-09-30 14:04:53 +00:00
Petar Avramovic	6db7921b65	AMDGPU: Use tablegen patterns for buffer global and flat atomic fadd Remove manual selection for atomic fadd from global-isel. Stop pre-isel translation to AtomicLoadFAdd/G_ATOMICRMW_FADD which corresponds to llvm-ir's atomicrmw fadd instruction. global and flat atomic fadd patterns changes: Split rtn/no-rtn patterns Add missing patterns or fix predicates Remove atomicrmw patterns for v2f16 (atomic rmw doesn't support vectors). Patterns now check addrspace of pointer, added patterns for flat intrinsic with global addrspace pointer that selects into global atomic instruction. buffer atomic fadd patterns changes: Edit patterns to import into global-isel. Remove gfx6/gfx7 _addr64 and _offset patterns. Remove patterns that can't be reached (same pattern but different feature). Differential Revision: https://reviews.llvm.org/D130579	2022-09-23 17:52:10 +02:00
Jay Foad	3822a01e0b	[AMDGPU] Add GFX11 ds_bvh_stack_rtn_b32 instruction Differential Revision: https://reviews.llvm.org/D133928	2022-09-15 16:46:14 +01:00
Ivan Kosarev	f33645301e	[AMDGPU][CodeGen] Support (soffset + offset) s_buffer_load's. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D130263	2022-09-05 12:53:05 +01:00
Ivan Kosarev	432cbd7827	[AMDGPU][CodeGen] Support (register + immediate) SMRD offsets. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D129381	2022-07-18 11:29:31 +01:00
Piotr Sobczak	4874838a63	[AMDGPU] gfx11 WMMA instruction support gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D128756	2022-06-30 11:13:45 -04:00
Joe Nash	20d20156f4	[AMDGPU] gfx11 VINTERP intrinsics and ISel support Depends on D127664 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D127756	2022-06-17 09:16:59 -04:00
Joe Nash	2d43de13df	[AMDGPU] gfx11 new dot instruction codegen support Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D127904	2022-06-16 14:19:34 -04:00
Jay Foad	7b9f620e78	[AMDGPU] Work around GFX11 flat scratch SVS swizzling bug Differential Revision: https://reviews.llvm.org/D127635	2022-06-13 21:00:42 +01:00
Nicolai Hähnle	5df2893a9a	AMDGPU: Add G_AMDGPU_MAD_64_32 instructions These generic instructions are trivially selected to V_MAD_[IU]64_[IU]32 instructions when run on the VALU. When at least both factors are scalar, it is usually better to execute some or all of the instruction on the SALU. To this end, we lower the instruction to simpler instructions that are supported on the SALU when applying the register bank mapping. Differential Revision: https://reviews.llvm.org/D124843	2022-05-27 12:36:17 -05:00
Stanislav Mekhanoshin	dee3190293	[AMDGPU] Add llvm.amdgcn.global.load.lds intrinsic Differential Revision: https://reviews.llvm.org/D125279	2022-05-17 12:35:27 -07:00
Stanislav Mekhanoshin	791ec1c68e	[AMDGPU] Add intrinsics llvm.amdgcn.{raw\|struct}.buffer.load.lds Differential Revision: https://reviews.llvm.org/D124884	2022-05-17 10:32:13 -07:00
Stanislav Mekhanoshin	6e3e14f600	[AMDGPU] Support gfx940 smfmac instructions Differential Revision: https://reviews.llvm.org/D122191	2022-03-24 12:40:42 -07:00
Abinav Puthan Purayil	f59cb41ba1	[AMDGPU] Select buffer_atomic_cmpswap* in tblgen This change replaces the manual selection of buffer_atomic_cmpswap* instructions in SelectionDAG and GlobalISel with a tblgen based selection in BUFInstructions.td. This allows us to select the return and no-return variants in tblgen. Differential Revision: https://reviews.llvm.org/D121770	2022-03-17 10:12:32 +05:30
Stanislav Mekhanoshin	c4500de255	[AMDGPU] gfx940: disable OP_SEL on V_DOT instructions Differential Revision: https://reviews.llvm.org/D121634	2022-03-14 17:02:00 -07:00
Stanislav Mekhanoshin	36fe3f13a9	[AMDGPU] flat scratch SVS addressing mode for gfx940 Both VADDR and SADDR are used in SVS mode. Differential Revision: https://reviews.llvm.org/D121254	2022-03-14 15:23:36 -07:00
Matt Arsenault	7f26a1027f	AMDGPU/GlobalISel: Introduce pseudo to copy sp in call sequences Arbitrary stack pointers are accessed using MUBUF instructions with the voffset field, which is interpreted as the swizzled address. We want to fold into the MUBUF form to use the SP in the SGPR offset, and previously we were special casing the interpretation of the pointer value if the access memory operand said it was relative to the stack pointer. `690f5b7a01` removed this check, and moved the DAG path to special casing copies from SGPRs. This is not an entirely sound approach, since it's still changing the interpretation of pointer values based on the context. Introduce a new pseudo which corresponds to the wave-to-vector address transform. This way the memory instruction has consistent semantics where the incoming pointer is always interpreted as a vector address, and we're not obligated to optimize into the MUBUF offset-only addressing mode. The DAG should probably have an equivalent pseudo. This should fix some correctness issues, and folding this into addressing modes will be a future optimization patch.	2022-01-19 10:13:31 -05:00
Kazu Hirata	41bfac6aed	[Target] Remove unused forward declarations (NFC)	2022-01-02 10:20:15 -08:00
Abinav Puthan Purayil	078da26b1c	[AMDGPU] Check for unneeded shift mask in shift PatFrags. The existing constrained shift PatFrags only dealt with masked shift from OpenCL front-ends. This change copies the X86DAGToDAGISel::isUnneededShiftMask() function to AMDGPU and uses it in the shift PatFrag predicates. Differential Revision: https://reviews.llvm.org/D113448	2021-11-24 10:53:12 +05:30
Kazu Hirata	aee86f9b6c	[AMDGPU] Remove unused declaration selectSMRD (NFC) The function body proper was removed on Feb 20, 2019 in commit `79b5c3842b`.	2021-11-07 09:53:18 -08:00
Daniil Fukalov	48958d02d2	[NFC][AMDGPU] Reduce includes dependencies. 1. Split out some parts of R600 target to separate modules/headers. 2. Reduced some include lists in headers. 3. Found and fixed issue with override `GCNTargetMachine::getSubtargetImpl()` and `R600TargetMachine::getSubtargetImpl()` had different return value type than base class. 4. Minor forward declarations cleanup. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D108596	2021-08-25 12:01:55 +03:00
Brendon Cahoon	f9f5d41545	[AMDGPU][GlobalISel] Legalize and select G_SBFX and G_UBFX Adds legalizer, register bank select, and instruction select support for G_SBFX and G_UBFX. These opcodes generate scalar or vector ALU bitfield extract instructions for AMDGPU. The instructions allow both constant or register values for the offset and width operands. The 32-bit scalar version is expanded to a sequence that combines the offset and width into a single register. There are no 64-bit vgpr bitfield extract instructions, so the operations are expanded to a sequence of instructions that implement the operation. If the width is a constant, then the 32-bit bitfield extract instructions are used. Moved the AArch64 specific code for creating G_SBFX to CombinerHelper.cpp so that it can be used by other targets. Only bitfield extracts with constant offset and width values are handled currently. Differential Revision: https://reviews.llvm.org/D100149	2021-06-28 09:06:44 -04:00
Sebastian Neubauer	cc7add5298	[AMDGPU] Use SIInstrFlags for flat variants. NFC Use SIInstrFlags to differentiate between the different variants of flat instructions (flat, global and scratch). This should make it easier to bundle the immediate offset logic in a single place and implement restrictions and bug workarounds. Fixed version of D99587, which does not rely on the address space. Differential Revision: https://reviews.llvm.org/D99743	2021-04-09 12:28:36 +02:00
Jay Foad	5d0e9ddfa5	[AMDGPU][GlobalISel] Add support for global atomicrmw fadd This includes gfx908 which only has a no-return version of the global_atomic_add_f32 instruction, using the same hack that was previously implemented for selecting from the llvm.amdgcn.global.atomic.fadd intrinsic. Differential Revision: https://reviews.llvm.org/D97767	2021-03-31 11:13:00 +01:00
Stanislav Mekhanoshin	3bffb1cd0e	[AMDGPU] Use single cache policy operand Replace individual operands GLC, SLC, and DLC with a single cache_policy bitmask operand. This will reduce the number of operands in MIR and I hope the amount of code. These operands are mostly 0 anyway. Additional advantage that parser will accept these flags in any order unlike now. Differential Revision: https://reviews.llvm.org/D96469	2021-03-15 13:00:59 -07:00
Amara Emerson	8a316045ed	[AArch64][GlobalISel] Enable use of the optsize predicate in the selector. To do this while supporting the existing functionality in SelectionDAG of using PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis dependencies to the instruction selector pass. Then, use the predicate to generate constant pool loads for f32 materialization, if we're targeting optsize/minsize. Differential Revision: https://reviews.llvm.org/D97732	2021-03-02 12:55:51 -08:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
Jay Foad	7e9ceed9a2	[TableGen][GlobalISel] Allow duplicate RendererFns Allow different GICustomOperandRenderers to use the same RendererFn. This avoids the need for targets to define a bunch of identical C++ renderer functions with different names. Without this fix TableGen would have emitted code that tried to define the GICR enumeration with duplicate enumerators. Differential Revision: https://reviews.llvm.org/D96587	2021-02-12 15:05:32 +00:00
Kazu Hirata	6bde085366	[AMDGPU] Forward-declare TargetRegisterClass (NFC) AMDGPUInstructionSelector.h needs TargetRegisterClass but relies on a forward declaration of TargetRegisterClass in InstructionSelector.h. This patch adds a forward declaration right in AMDGPUInstructionSelector.h. While we are at it, this patch removes the one in InstructionSelector.h, where it is unnecessary.	2021-01-26 20:00:16 -08:00
dfukalov	6a87e9b08b	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
Stanislav Mekhanoshin	d15119a02d	[AMDGPU][GlobalISel] GlobalISel for flat scratch It does not seem to fold offsets but this is not specific to the flat scratch as getPtrBaseWithConstantOffset() does not return the split for these tests unlike its SDag counterpart. Differential Revision: https://reviews.llvm.org/D93670	2020-12-22 16:33:06 -08:00
Matt Arsenault	a6e353b1d0	AMDGPU: Split large offsets when selecting global saddr mode When the offset doesn't fit in the immediate field, move some to voffset.	2020-11-16 11:36:01 -05:00
Jay Foad	040c50278c	[AMDGPU] Fix ds_read2/write2 with unaligned offsets These instructions use a scaled offset. We were wrongly selecting them even when the required offset was not a multiple of the scale factor. Differential Revision: https://reviews.llvm.org/D90607	2020-11-03 15:16:10 +00:00
Jay Foad	5b91a6a88b	[AMDGPU] Allow some modifiers on VOP3B instructions V_DIV_SCALE_F32/F64 are VOP3B encoded so they can't use the ABS src modifier, but they can still use NEG and the usual output modifiers. This partially reverts `3b99f12a4e` "AMDGPU: Remove modifiers from v_div_scale_*". Differential Revision: https://reviews.llvm.org/D90296	2020-10-28 21:54:14 +00:00
Stanislav Mekhanoshin	27a62f6317	[AMDGPU] global-isel support for RT Differential Revision: https://reviews.llvm.org/D87847	2020-09-24 10:29:45 -07:00
Stanislav Mekhanoshin	277de43d88	[AMDGPU] Unify intrinsic ret/nortn interface We have a single noret intrinsic and a lot of special handling around it. Declare it just as any other but do not define rtn instructions itself instead. Differential Revision: https://reviews.llvm.org/D87719	2020-09-15 15:26:42 -07:00
Petar Avramovic	09b8871f8d	AMDGPU/GlobalISel/Emitter Support for predicate code that uses operands Predicates with 'let PredicateCodeUsesOperands = 1' want to examine matched operands. When we encounter predicate code that uses operands, analyze its named operand arguments and create a map between argument index and name. Later, when leaf node with name is encountered, emit GIM_RecordNamedOperand that will store that operand at its argument index in operand list. This operand list will be an argument to c++ code of the predicate. Differential Revision: https://reviews.llvm.org/D87285	2020-09-14 10:39:56 +02:00
Jay Foad	831457c6d5	[AMDGPU][GlobalISel] Eliminate barrier if workgroup size is not greater than wavefront size If a workgroup size is known to be not greater than wavefront size the s_barrier instruction is not needed since all threads are guaranteed to come to the same point at the same time. This is the same optimization that was implemented for SelectionDAG in D31731. Differential Revision: https://reviews.llvm.org/D86609	2020-08-26 13:47:51 +01:00
Mirko Brkusanin	d17ea67b92	[AMDGPU][GlobalISel] Fix 96 and 128 local loads and stores Fix local ds_read/write_b96/b128 so they can be selected if the alignment allows. Otherwise, either pick appropriate ds_read2/write2 instructions or break them down. Differential Revision: https://reviews.llvm.org/D81638	2020-08-21 12:26:31 +02:00
Matt Arsenault	2f5f5febf3	AMDGPU/GlobalISel: Select llvm.amdgcn.groupstaticsize Previously, it would successfully select and assert if not HSA or PAL when expanding the pseudoinstruction. We don't need the pseudoinstruction anymore since we know the total size after legalization.	2020-08-18 09:28:01 -04:00
Matt Arsenault	a9ee0589a8	AMDGPU/GlobalISel: Match global saddr addressing mode	2020-08-17 15:48:06 -04:00
Matt Arsenault	0dc4c36d3a	AMDGPU/GlobalISel: Manually select llvm.amdgcn.writelane Fixup the special case constant bus handling pre-gfx10.	2020-08-11 11:56:16 -04:00
Matt Arsenault	a0ec81f70d	AMDGPU/GlobalISel: Merge load/store select cases	2020-08-10 08:46:26 -04:00
Matt Arsenault	dcf3ffb0a8	AMDGPU/GlobalISel: Move frame index selection to patterns Doesn't really save any code until global value is handled too.	2020-08-06 10:42:15 -04:00
Matt Arsenault	89011fc3c9	AMDGPU/GlobalISel: Select llvm.returnaddress	2020-08-04 17:14:38 -04:00
Matt Arsenault	59fac51ff2	AMDGPU/GlobalISel: Handle llvm.amdgcn.reloc.constant	2020-07-29 14:24:21 -04:00


151 Commits