llvm-project

Commit Graph

Author	SHA1	Message	Date
Stanislav Mekhanoshin	d15119a02d	[AMDGPU][GlobalISel] GlobalISel for flat scratch It does not seem to fold offsets but this is not specific to the flat scratch as getPtrBaseWithConstantOffset() does not return the split for these tests unlike its SDag counterpart. Differential Revision: https://reviews.llvm.org/D93670	2020-12-22 16:33:06 -08:00
Austin Kerbow	4aa842a800	[AMDGPU] Add new pseudos for indirect addressing with VGPR Indexing It is possible for copies or spills to be inserted in the middle of indirect addressing sequences which use VGPR indexing. Spills to accvgprs could be effected by the indexing mode. Add new pseudo instructions that are expanded after register allocation to avoid the problematic spill or copy placement. Differential Revision: https://reviews.llvm.org/D91048	2020-12-08 12:24:12 -08:00
Jay Foad	4f87d30a06	[AMDGPU] Introduce and use isGFX10Plus. NFC. It's more future-proof to use isGFX10Plus from the start, on the assumption that future architectures will be based on current architectures. Also make use of the existing isGFX9Plus in a few places. Differential Revision: https://reviews.llvm.org/D92092	2020-11-26 09:02:36 +00:00
Matt Arsenault	d2e52eec51	AMDGPU: Select global saddr mode from SGPR pointer Use the 64-bit SGPR base with a 0 offset, since it's 1 fewer instruction to materialize the 0 vs. the 64-bit copy.	2020-11-16 11:51:06 -05:00
Matt Arsenault	a6e353b1d0	AMDGPU: Split large offsets when selecting global saddr mode When the offset doesn't fit in the immediate field, move some to voffset.	2020-11-16 11:36:01 -05:00
Jessica Paquette	b184a2eccf	[GlobalISel] Add matchers for specific constants and a matcher for negations It's fairly common to need matchers for a specific constant value, or for common idioms like finding a negated register. Add - `m_SpecificICst`, which returns true when matching a specific value.. - `m_ZeroInt`, which returns true when an integer 0 is matched. - `m_Neg`, which returns when a register is negated. Also update a few places which use idioms related to the new matchers. Differential Revision: https://reviews.llvm.org/D91397	2020-11-13 09:24:54 -08:00
Jay Foad	0ad4d04002	[AMDGPU] Remove an unused return value. NFC. Differential Revision: https://reviews.llvm.org/D91063	2020-11-10 09:15:14 +00:00
Stanislav Mekhanoshin	f738aee0bb	[AMDGPU] Add default 1 glc operand to rtn atomics This change adds a real glc operand to the return atomic instead of just string " glc" in the middle of the asm string. Improves asm parser diagnostics. Differential Revision: https://reviews.llvm.org/D90730	2020-11-05 10:41:59 -08:00
Jay Foad	040c50278c	[AMDGPU] Fix ds_read2/write2 with unaligned offsets These instructions use a scaled offset. We were wrongly selecting them even when the required offset was not a multiple of the scale factor. Differential Revision: https://reviews.llvm.org/D90607	2020-11-03 15:16:10 +00:00
Jay Foad	7a79921edd	[AMDGPU] Remove gds operand from ds_gws_* MachineInstrs The operand value was always 1 (except in some bad MIR tests) so it was redundant. Differential Revision: https://reviews.llvm.org/D90378	2020-10-29 15:04:23 +00:00
Jay Foad	5b91a6a88b	[AMDGPU] Allow some modifiers on VOP3B instructions V_DIV_SCALE_F32/F64 are VOP3B encoded so they can't use the ABS src modifier, but they can still use NEG and the usual output modifiers. This partially reverts `3b99f12a4e` "AMDGPU: Remove modifiers from v_div_scale_*". Differential Revision: https://reviews.llvm.org/D90296	2020-10-28 21:54:14 +00:00
Rodrigo Dominguez	f71f5f39f6	[AMDGPU] Implement hardware bug workaround for image instructions Summary: This implements a workaround for a hardware bug in gfx8 and gfx9, where register usage is not estimated correctly for image_store and image_gather4 instructions when D16 is used. Change-Id: I4e30744da6796acac53a9b5ad37ac1c2035c8899 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81172	2020-10-07 07:39:52 -04:00
Sebastian Neubauer	6a089ce0e4	[AMDGPU] Use tablegen for argument indices Use tablegen generic tables to get the index of image intrinsic arguments. Before, the computation of which image intrinsic argument is at which index was scattered in a few places, tablegen, the SDag instruction selection and GlobalISel. This patch changes that, so only tablegen contains code to compute indices and the ImageDimIntrinsicInfo table provides these information. Differential Revision: https://reviews.llvm.org/D86270	2020-10-05 11:50:52 +02:00
Stanislav Mekhanoshin	27a62f6317	[AMDGPU] global-isel support for RT Differential Revision: https://reviews.llvm.org/D87847	2020-09-24 10:29:45 -07:00
Stanislav Mekhanoshin	277de43d88	[AMDGPU] Unify intrinsic ret/nortn interface We have a single noret intrinsic an a lot of special handling around it. Declare it just as any other but do not define rtn instructions itself instead. Differential Revision: https://reviews.llvm.org/D87719	2020-09-15 15:26:42 -07:00
Petar Avramovic	6e2a86ed5a	AMDGPU/GlobalISel Check for NoNaNsFPMath in isKnownNeverSNaN Check for NoNaNsFPMath function attribute in isKnownNeverSNaN. Function attributes are in held in 'TargetMachine.Options'. Among other things, this allows selection of some patterns imported in D87351 since G_FCANONICALIZE is not generated when isKnownNeverSNaN returns true in lowerFMinNumMaxNum. However we notice some incorrect results since function attributes are not correctly written in TargetMachine.Options when next function is processed. Take a look at @v_test_no_global_nnans_med3_f32_pat0_srcmod0, it has "no-nans-fp-math"="false" but TargetMachine.Options still has it set to true since first function in test file had this attribute set to true. This will be fixed in D87511. Differential Revision: https://reviews.llvm.org/D87456	2020-09-14 12:11:00 +02:00
Petar Avramovic	09b8871f8d	AMDGPU/GlobalISel/Emitter Support for predicate code that uses operands Predicates with 'let PredicateCodeUsesOperands = 1' want to examine matched operands. When we encounter predicate code that uses operands, analyze its named operand arguments and create a map between argument index and name. Later, when leaf node with name is encountered, emit GIM_RecordNamedOperand that will store that operand at its argument index in operand list. This operand list will be an argument to c++ code of the predicate. Differential Revision: https://reviews.llvm.org/D87285	2020-09-14 10:39:56 +02:00
Jay Foad	831457c6d5	[AMDGPU][GlobalISel] Eliminate barrier if workgroup size is not greater than wavefront size If a workgroup size is known to be not greater than wavefront size the s_barrier instruction is not needed since all threads are guaranteed to come to the same point at the same time. This is the same optimization that was implemented for SelectionDAG in D31731. Differential Revision: https://reviews.llvm.org/D86609	2020-08-26 13:47:51 +01:00
Mirko Brkusanin	d17ea67b92	[AMDGPU][GlobalISel] Fix 96 and 128 local loads and stores Fix local ds_read/write_b96/b128 so they can be selected if the alignment allows. Otherwise, either pick appropriate ds_read2/write2 instructions or break them down. Differential Revision: https://reviews.llvm.org/D81638	2020-08-21 12:26:31 +02:00
Jay Foad	3497860203	[AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister ... in favour of the isPhysical/isVirtual methods.	2020-08-20 17:59:11 +01:00
Matt Arsenault	2f5f5febf3	AMDGPU/GlobalISel: Select llvm.amdgcn.groupstaticsize Previously, it would successfully select and assert if not HSA or PAL when expanding the pseudoinstruction. We don't need the pseudoinstruction anymore since we know the total size after legalization.	2020-08-18 09:28:01 -04:00
Matt Arsenault	3ba7777b94	AMDGPU/GlobalISel: Fix selection of s1/s16 G_[F]CONSTANT The code to determine the value size was overcomplicated and only correct in the case where the result register already had a register class assigned. We can always take the size directly from the register's type.	2020-08-18 09:28:01 -04:00
Matt Arsenault	a9ee0589a8	AMDGPU/GlobalISel: Match global saddr addressing mode	2020-08-17 15:48:06 -04:00
Matt Arsenault	c8a9872259	AMDGPU/GlobalISel: Look through copies in getPtrBaseWithConstantOffset We may have an SGPR->VGPR copy if a totally uniform pointer calculation is used for a VGPR pointer operand. Also hack around a bug in MUBUF matching which would incorrectly use MUBUF for global when flat was requested. This should really be a predicate on the parent pattern, but the DAG always checked this manually inside the complex pattern.	2020-08-17 12:31:38 -04:00
Matt Arsenault	625db2fe5b	AMDGPU: Remove slc from flat offset complex patterns This was always set to 0. Use a default value of 0 in this context to satisfy the instruction definition patterns. We can't unconditionally use SLC with a default value of 0 due to limitations in TableGen's handling of defaulted operands when followed by non-default operands.	2020-08-15 12:12:24 -04:00
Matt Arsenault	0dc4c36d3a	AMDGPU/GlobalISel: Manually select llvm.amdgcn.writelane Fixup the special case constant bus handling pre-gfx10.	2020-08-11 11:56:16 -04:00
Matt Arsenault	40188f807d	AMDGPU/GlobalISel: Don't try to handle undef source operand This is now illegal MIR	2020-08-10 08:49:43 -04:00
Matt Arsenault	a0ec81f70d	AMDGPU/GlobalISel: Merge load/store select cases	2020-08-10 08:46:26 -04:00
Matt Arsenault	63cdc9a49f	AMDGPU/GlobalISel: Handle llvm.amdgcn.ds.{fadd\|fmin\|fmax} These intrinsics are missing mangling for both the pointer and data type.	2020-08-06 11:09:08 -04:00
Matt Arsenault	dcf3ffb0a8	AMDGPU/GlobalISel: Move frame index selection to patterns Doesn't really save any code until global value is handled too.	2020-08-06 10:42:15 -04:00
Matt Arsenault	d188a608bd	AMDGPU: Fix code duplication between the selectors Not sure this is the right place for this helper.	2020-08-06 10:42:15 -04:00
Matt Arsenault	5316256709	AMDGPU/GlobalISel: Fix assert on copy to vcc This was trying to constrain a physical register. By the verifier's understanding, it's impossible to have a 1-bit copy to vcc/vcc_lo so don't try to handle physregs.	2020-08-06 09:41:14 -04:00
Matt Arsenault	486e84dfa4	AMDGPU/GlobalISel: Use live in helper function for returnaddress	2020-08-04 17:36:01 -04:00
Matt Arsenault	89011fc3c9	AMDGPU/GlobalISel: Select llvm.returnaddress	2020-08-04 17:14:38 -04:00
Matt Arsenault	0de547ed4a	AMDGPU/GlobalISel: Ensure subreg is valid when selecting G_UNMERGE_VALUES Fixes verifier error with SGPR unmerges with 96-bit result types.	2020-08-04 12:27:34 -04:00
Matt Arsenault	2414bab5d7	AMDGPU/GlobalISel: Remove old hacks for boolean selection There were various hacks used to try to avoid making s1 SGPR vs. s1 VCC ambiguous after constraining the register before we had a strategy to deal with this. This also attempted to handle undef operands, which are now illegal gMIR.	2020-08-03 09:04:14 -04:00
Matt Arsenault	d8ef1d1251	AMDGPU/GlobalISel: Fix selecting broken copies for s32->s64 anyext These should probably not be legal in the first place, but that might also be a pain.	2020-08-03 08:36:41 -04:00
Stanislav Mekhanoshin	5b32518f96	[AMDGPU] Do not use undef on indirect source We are using undef on the indirect move source subreg and then using implicit super-reg. This creates a problem in RA when Greedy decides to split the register. It reassigns the implicit super-reg but does not bother to change undef source because it is really does not matter. The fix is to stop lying to RA and drop undef flag. This has also hit a problem in SIFoldOperands as it can fold immediate into an indirect move since there is no undef flag anymore. That results in multiple test failures, so added the check for this case. Differential Revision: https://reviews.llvm.org/D84899	2020-07-30 10:41:59 -07:00
Matt Arsenault	59fac51ff2	AMDGPU/GlobalISel: Handle llvm.amdgcn.reloc.constant	2020-07-29 14:24:21 -04:00
Matt Arsenault	bb23b5cfe0	AMDGPU/GlobalISel: Merge identical select cases	2020-07-28 11:42:17 -04:00
Matt Arsenault	d35e2c101d	AMDGPU/GlobalISel: Fix not constraining ds_append/consume operands	2020-07-26 10:17:36 -04:00
Matt Arsenault	5819159995	AMDGPU/GlobalISel: Pack constant G_BUILD_VECTOR_TRUNCs when selecting	2020-07-26 09:55:34 -04:00
Matt Arsenault	4033aa1467	AMDGPU/GlobalISel: Sign extend integer constants This matches the DAG behavior and fixes immediate folding	2020-07-26 09:30:14 -04:00
Matt Arsenault	392b969c32	AMDGPU/GlobalISel: Don't assert on G_INSERT > 128-bits Just fallback for now. Really tablegen needs to generate all of the subregister index handling we need.	2020-07-25 10:05:44 -04:00
Matt Arsenault	21ef01b7e3	AMDGPU: Remove outdated fixme	2020-07-20 11:41:41 -04:00
Matt Arsenault	79f67cae91	AMDGPU: Rename add/sub with carry out instructions The hardware has created a real mess in the naming for add/sub, which have been renamed basically every generation. Switch the carry out pseudos to have the gfx9/gfx10 names. We were using the original SI/CI v_add_i32/v_sub_i32 names. Later targets reintroduced these names as carryless instructions with a saturating clamp bit, which we do not define. Do this rename so we can unambiguously add these missing instructions. The carry-in versions should also be renamed, but at least those had a consistent _u32 name to begin with. The 16-bit instructions were also renamed, but aren't ambiguous. This does regress assembler error message quality in some cases. In mismatched wave32/wave64 situations, this will switch from "unsupported instruction" to "invalid operand", with the error pointing at the wrong position. I couldn't quite follow how the assembler selects these, but the previous behavior seemed accidental to me. It looked like there was a partial attempt to handle this which was never completed (i.e. there is an AMDGPUOperand::isBoolReg but it isn't used for anything).	2020-07-16 13:16:30 -04:00
Petar Avramovic	5658002b80	AMDGPU/GlobalISel: Select G_FREEZE Select G_FREEZE in the same way that COPY is selected. Differential Revision: https://reviews.llvm.org/D83031	2020-07-16 11:10:48 +02:00
Mirko Brkusanin	38998cfa9c	[AMDGPU][GlobalISel] Fix subregister index for EXEC register in selectBallot. Temporarily remove subregister for EXEC in selectBallot added in https://reviews.llvm.org/D83214 to fix failures on expensive checks buildbot.	2020-07-13 13:35:34 +02:00
Mirko Brkusanin	ce23e54162	[AMDGPU][GlobalISel] Select llvm.amdgcn.ballot Select ballot intrinsic for GlobalISel. Differential Revision: https://reviews.llvm.org/D83214	2020-07-13 12:14:43 +02:00
Petar Avramovic	d717382633	AMDGPU/GlobalISel: Select icmp intrinsic Select into corresponding V_CMP instruction based on CmpInst predicate, stored as immediate, in last operand. Differential Revision: https://reviews.llvm.org/D82652	2020-06-30 10:57:41 +02:00
Matt Arsenault	fb51d508ee	AMDGPU/GlobalISel: Select general case for G_PTRMASK	2020-06-14 13:12:29 -04:00
Sebastian Neubauer	29a6ad94fd	[AMDGPU] Add G16 support to image instructions Add G16 feature for GFX10 and support A16 and G16 in GlobalISel. Differential Revision: https://reviews.llvm.org/D76836	2020-06-12 11:26:31 +02:00
Matt Arsenault	bb10fa3a53	AMDGPU: Fix wrong null value for private address space I'm guessing this was a holdover from when 0 was an invalid stack pointer, but surprised nobody has discovered this before. Also don't allow offset folding for -1 pointers, since it looks weird to partially fold this.	2020-05-26 16:35:13 -04:00
Matt Arsenault	50d4b22ca0	AMDGPU/GlobalISel: Fix assert on 16-bit G_EXTRACT results I consider this to be a hack, since we probably should not mark any 16-bit extract as legal, and require all extracts to be done on multiples of 32. There are quite a few more battles to fight in the legalizer for sub-dword vectors, so just select this for now so we can pass OpenCL conformance without crashing. Also fix the same assert for G_INSERTs. Unlike G_EXTRACT there's not a trivial way to select this so just fail on it.	2020-05-26 12:14:08 -04:00
Matt Arsenault	8bc03d2168	GlobalISel: Merge G_PTR_MASK with llvm.ptrmask intrinsic Confusingly, these were unrelated and had different semantics. The G_PTR_MASK instruction predates the llvm.ptrmask intrinsic, but has a different format. G_PTR_MASK only allows clearing the low bits of a pointer, and only a constant number of bits. The ptrmask intrinsic allows an arbitrary mask. Replace G_PTR_MASK to match the intrinsic. Only selects the cases that look like the old instruction. More work is needed to select the general case. Also new legalization code is still needed to deal with the case where the incoming mask size does not match the pointer size, which has a specified behavior in the langref.	2020-05-26 11:48:13 -04:00
Matt Arsenault	2dd7714b8d	AMDGPU/GlobalISel: Don't select boolean phi by default This is currently missing most of the hard parts to lower correctly, so disable it for now. This fixes at least one OpenCL conformance test and allows it to pass with fallback. Hide this behind an option for now.	2020-05-26 11:01:21 -04:00
Matt Arsenault	7dece2fde3	AMDGPU: Use Register	2020-04-21 15:19:35 -04:00
Jay Foad	858d8db470	AMDGPU/GlobalISel: Work around another selector crash This does for G_EXTRACT_VECTOR_ELT what `588bd7be36` did for G_TRUNC. Ideally types without a corresponding register class wouldn't reach here, but we're currently missing some (in particular a 192-bit class is missing).	2020-04-17 12:07:54 +01:00
Matt Arsenault	588bd7be36	AMDGPU/GlobalISel: Work around a selector crash Ideally types without a corresponding register class wouldn't reach here, but we're currently missing some (in particular a 192-bit class is missing).	2020-04-15 14:38:50 -04:00
Matt Arsenault	cc149172da	AMDGPU/GlobalISel: Fix selection of scalar f64 G_FABS This wasn't covered by existing tablegen patterns, but also suffers the same issues as G_FNEG. Workaround them by manually selecting, like G_FNEG.	2020-04-14 22:05:22 -04:00
Matt Arsenault	8a5f0dafd4	AMDGPU/GlobalISel: Select llvm.amdgcn.div.scale	2020-04-06 11:50:19 -04:00
Austin Kerbow	30f18ed387	[AMDGPU] Handle SMRD signed offset immediate Summary: This fixes a few issues related to SMRD offsets. On gfx9 and gfx10 we have a signed byte offset immediate, however we can overflow into a negative since we treat it as unsigned. Also, the SMRD SOFFSET sgpr is an unsigned offset on all subtargets. We sometimes tried to use negative values here. Third, S_BUFFER instructions should never use a signed offset immediate. Differential Revision: https://reviews.llvm.org/D77082	2020-04-02 17:41:52 -07:00
Simon Pilgrim	be7a233e93	Fix operator precedence warning. NFCI.	2020-04-01 14:36:52 +01:00
Matt Arsenault	d0dd24a381	AMDGPU/GlobalISel: Fix crashing on weird G_INSERT sources No test since these cases shouldn't really be getting through the legalizer.	2020-03-30 18:14:04 -04:00
Matt Arsenault	bcb643c8af	AMDGPU/GlobalISel: Handle image atomics	2020-03-30 17:41:04 -04:00
Matt Arsenault	48eda37282	AMDGPU/GlobalISel: Start selecting image intrinsics Does not handled atomics yet.	2020-03-30 17:33:04 -04:00
Scott Linder	60b1967c39	[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to removes the scratch wave offset register from the calling convention ABI. As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative. Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first. Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention. Tags: #llvm Differential Revision: https://reviews.llvm.org/D75138	2020-03-19 15:35:16 -04:00
Matt Arsenault	ed72bcae34	AMDGPU/GlobalISel: Fix mishandling SGPR v2s16 add/sub/mul We weren't considering the packed case correctly, and this was passing through to the selector. The selector only checked the size, so this would incorrectly compile to a single 32-bit scalar add. As usual, the LegalizerHelper is somewhat awkward to use from applyMappingImpl. I think this is the first place we've needed multi-step legalization here though.	2020-03-09 22:51:54 -04:00
Matt Arsenault	15bf916b54	AMDGPU: Remove VOP3OpSelMods0 complex pattern Use default operand of 0 instead.	2020-03-04 17:18:22 -05:00
Matt Arsenault	0b46b078b6	AMDGPU/GlobalISel: Fix incorrect VOP3P fneg folding We use some s32 values in VOP3P operands, and won't see any intervening casts from a 32-bit fneg. Make sure it's really a packed fneg before folding.	2020-02-24 21:20:35 -05:00
Matt Arsenault	bf4933b4ea	AMDGPU/GlobalISel: Remove dead code	2020-02-21 19:19:32 -05:00
Jay Foad	b72f1448ce	AMDGPU/GlobalISel: Better code for one case of G_SHUFFLE_VECTOR on v2i16 Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74987	2020-02-21 21:16:39 +00:00
Matt Arsenault	dfce5fd50a	AMDGPU/GlobalISel: Select VOP3P instructions This only handles the basic cases. More work is needed to make better use of op_sel.	2020-02-21 13:35:40 -05:00
Matt Arsenault	72eef820d5	AMDGPU/GlobalISel: Select G_SHUFFLE_VECTOR G_SHUFFLE_VECTOR is legal since it theoretically may help match op_sel for VOP3P instructions. Expand it in some other way in case it doesn't fold into the use instructions.	2020-02-21 13:35:40 -05:00
Matt Arsenault	043ed2e22a	AMDGPU/GlobalISel: Fix xnor matching We should try the generated matchers before the manual selection. This means the patterns are now handling the common cases, but the manual selection code is not yet dead. It's still handling the non-s32/s64 cases (like v2s16 and v2s32). Currently tablegen doesn't have a nice way to have a single pattern that covers multiple types.	2020-02-21 11:42:49 -05:00
Matt Arsenault	ac7abe0ba9	AMDGPU/GlobalISel: Manually select G_BUILD_VECTOR_TRUNC We have patterns for s_pack* selection, but they assume the inputs are a build_vector with 16-bit inputs, not a truncating build vector. Since there's still outstanding work for how to handle mismatched result and source element vector operations, and since I'm trying a different packed vector strategy than SelectionDAG, just manually select this for now.	2020-02-21 10:34:11 -05:00
Matt Arsenault	b64aa8c715	AMDGPU/GlobalISel: Fix constant bus violation with source modifiers This looked through copies to find the source modifiers, which may have been SGPR->VGPR copies added to avoid potential constant bus violations. Re-insert a copy to a VGPR if this happens.	2020-02-21 10:30:23 -05:00
Matt Arsenault	ff4639f060	AMDGPU/GlobalISel: Select MUBUF path for global atomic cmpxchg I'm not sure why this isn't a pattern, but the DAG manually selects this.	2020-02-19 06:19:22 -08:00
Matt Arsenault	86813e2768	AMDGPU/GlobalISel: Select llvm.amdgcn.s.buffer.load Doesn't try to fail on the dlc bit pre-gfx10 like the DAG lowering does.	2020-02-17 08:02:40 -08:00
Matt Arsenault	e5805529bf	AMDGPU/GlobalISel: Select v2s32->v2s16 G_TRUNC It would be nice if there was a way to avoid the tied operand, but as far as I can tell there isn't a way to use or with op_sel to achieve this	2020-02-17 09:20:13 -05:00
Matt Arsenault	8d8d46b57a	AMDGPU/GlobalISel: Fix missing impdef of scc on boolean bit ops	2020-02-14 22:35:30 -05:00
Matt Arsenault	dc3e499dd4	AMDGPU/GlobalISel: Fix G_EXTRACT of 96-bit results This would assert on an unhandled size in getRegSplitParts.	2020-02-14 15:57:40 -08:00
Austin Kerbow	3a312c3ee5	[AMDGPU][GlobalISel] Refactor selectDS1Addr1Offset/selectDS64Bit4ByteAligned Differential Revision: https://reviews.llvm.org/D74261	2020-02-11 16:57:13 -08:00
Stanislav Mekhanoshin	453a8f3af7	[AMDGPU] Remove AMDGPURegisterInfo R600 and GCN do not have anything in common in terms of register file organization anymore. Differential Revision: https://reviews.llvm.org/D74426	2020-02-11 11:13:38 -08:00
Matt Arsenault	2126c70e3a	AMDGPU/GlobalISel: Don't mis-select vector index on a constant Vector indexing with a constant index should be folded out in the legalizer, but this was accidentally falling through. This would produce the indexing operation with $noreg. Handle this case as a dynamic index just in case a bug like this happens again in the future.	2020-02-09 18:02:37 -05:00
Matt Arsenault	12fe9b26ec	AMDGPU/GlobalISel: Select G_SEXT_INREG	2020-02-04 13:23:53 -08:00
Matt Arsenault	49e424e08e	AMDGPU/GlobalISel: Select global MUBUF atomicrmw	2020-01-31 06:05:41 -08:00
Matt Arsenault	0426c2d07d	Reapply "AMDGPU: Cleanup and fix SMRD offset handling" This reverts commit `6a4acb9d80`.	2020-01-31 06:01:28 -08:00
Matt Arsenault	6a4acb9d80	Revert "AMDGPU: Cleanup and fix SMRD offset handling" This reverts commit `17dbc6611d`. A test is failing on some bots	2020-01-30 15:39:51 -08:00
Matt Arsenault	17dbc6611d	AMDGPU: Cleanup and fix SMRD offset handling I believe this also fixes bugs with CI 32-bit handling, which was incorrectly skipping offsets that look like signed 32-bit values. Also validate the offsets are dword aligned before folding.	2020-01-30 15:04:21 -08:00
Matt Arsenault	d6b83d6ba5	AMDGPU/GlobalISel: Don't use pointless getConstantVRegVal This is always a G_CONSTANT already	2020-01-30 09:38:43 -05:00
Austin Kerbow	2605adb69c	[AMDGPU][GlobalISel] Select 8-byte LDS Ops with 4-byte alignment Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73585	2020-01-29 10:42:12 -08:00
Matt Arsenault	96352e0a1b	AMDGPU/GlobalISel: Handle LDS with relocations case	2020-01-29 08:18:55 -08:00
Matt Arsenault	94e8ef4d4c	AMDGPU/GlobalISel: Look through copies for source modifiers When all VOP instructions are legalized to VGPRs, any SGPR source modifiers will have a copy in the way.	2020-01-29 08:08:13 -08:00
Matt Arsenault	02adfb5155	AMDGPU/GlobalISel: Manually select scalar f64 G_FNEG This should be no problem to support with a pattern, but it turns out there are just too many yaks to shave. The main problem is in the DAG emitter, which I have no desire to sink effort into fixing. If we had a bit to disable patterns in the DAG importer, fixing the GlobalISelEmitter is more manageable.	2020-01-29 06:49:16 -08:00
Matt Arsenault	d2a9739274	AMDGPU/GlobalISel: Eliminate SelectVOP3Mods_f32 Trivial type predicates should be moved into the tablegen pattern itself, and not checked inside complex patterns. This eliminates a redundant complex pattern, and fixes select source modifiers for GlobalISel. I have further patches which fully handle select in tablegen and remove all of the C++ selection, although it requires the ugliness to support the entire range of legal register types.	2020-01-27 17:53:54 -05:00
Matt Arsenault	533d650e94	AMDGPU/GlobalISel: Move llvm.amdgcn.raw.buffer.store handling Treat this the same way as loads. There's less value to the intermediate nodes, but it's good to be consistent.	2020-01-27 14:59:30 -05:00
Matt Arsenault	09ed0e44d9	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.tbuffer.load	2020-01-27 13:40:37 -05:00
Matt Arsenault	fc90222a91	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.load Use intermediate instructions, unlike with buffer stores. This is necessary because of the need to have an internal way to distinguish between signed and unsigned extloads. This introduces some duplication and near duplication with the buffer store selection path. The store handling should maybe be moved into legalization to match and eliminate the duplication.	2020-01-27 12:49:23 -05:00
Matt Arsenault	e60d658260	AMDGPU/GlobalISel: Handle VOP3NoMods	2020-01-27 09:03:44 -08:00
Matt Arsenault	0968234590	AMDGPU/GlobalISel: Minor refactor of MUBUF complex patterns This will make it easier to support the small variants in the complex patterns for atomics.	2020-01-27 09:00:00 -08:00
Matt Arsenault	ac0b9b4ccf	AMDPGPU/GlobalISel: Select more MUBUF global addressing modes The handling of the high bits of the resource descriptor seem weird to me, where the 3rd dword changes based on the instruction.	2020-01-27 07:28:36 -08:00
Matt Arsenault	fdaad485e6	AMDGPU/GlobalISel: Initial selection of MUBUF addr64 load/store Fixes the main reason for compile failures on SI, but doesn't really try to use the addressing modes yet.	2020-01-27 07:13:56 -08:00
Maheaha Shivamallappa	66f93071cd	AMDGPU/GlobalISel: Clean-up code around ISel for Intrinsics. Summary: A minor code clean-up around ISel for intrinsic llvm.amdgcn.end.cf() Reviewers: arsenm, mshivama Reviewed By: arsenm Tags: #llvm Differential Revision: https://reviews.llvm.org/D73358	2020-01-26 14:09:31 +05:30
Matt Arsenault	3b93945587	AMDGPU/GlobalISel: Select wqm, softwqm and wwm intrinsics	2020-01-24 13:06:44 -08:00
Matt Arsenault	1192d7b254	AMDGPU/GlobalISel: Handle 16-bank LDS llvm.amdgcn.interp.p1.f16 The pattern is also mishandled by the generated matcher, so workaround this as in the DAG path. The existing DAG tests aren't particularly targeted to just this one intrinsic. These also end up differing in scheduling from SGPR->VGPR operand constraint copies.	2020-01-22 12:10:59 -05:00
Matt Arsenault	52ec7379ad	AMDGPU/GlobalISel: Fold add of constant into G_INSERT_VECTOR_ELT Move the subregister base like in the extract case.	2020-01-22 11:09:15 -05:00
Matt Arsenault	d1dbb5e471	AMDGPU/GlobalISel: Select G_INSERT_VECTOR_ELT	2020-01-22 11:00:49 -05:00
Matt Arsenault	e3d352c541	AMDGPU/GlobalISel: Fold constant offset vector extract indexes Handle dynamic vector extracts that use an index that's an add of a constant offset into moving the base subregister of the indexing operation. Force the add into the loop in regbankselect, which will be recognized when selected.	2020-01-22 10:50:59 -05:00
Matt Arsenault	a722cbf77c	AMDGPU/GlobalISel: Handle atomic_inc/atomic_dec The intermediate instruction drops the extra volatile argument. We are missing an atomic ordering on these.	2020-01-22 09:26:17 -05:00
Matt Arsenault	592de0009f	AMDGPU/GlobalISel: Select llvm.amdgcn.update.dpp The existing test is overly reliant on -mattr=-flat-for-global, and some missing optimizations to re-use.	2020-01-17 20:09:53 -05:00
Matt Arsenault	ec9628318d	AMDGPU/GlobalISel: Select DS append/consume	2020-01-17 20:09:53 -05:00
Matt Arsenault	9b2f3532c7	AMDGPU/GlobalISel: Select DS GWS intrinsics	2020-01-16 11:25:10 -05:00
Matt Arsenault	711a17afaf	AMDGPU/GlobalISel: Select exp with patterns This does produce slightly different code. Now a unique IMPLICIT_DEF is emitted for each of the implicit_def operands, rather than reusing the same one.	2020-01-15 18:33:15 -05:00
Matt Arsenault	203801425d	AMDGPU/GlobalISel: Select llvm.amdgcn.ds.ordered.{add\|swap}	2020-01-13 13:09:38 -05:00
Matt Arsenault	35c3d101ae	AMDGPU/GlobalISel: Select G_EXTRACT_VECTOR_ELT Doesn't try to do the fold into the base register of an add of a constant in the index like the DAG path does.	2020-01-09 19:52:24 -05:00
Matt Arsenault	b4a647449f	TableGen/GlobalISel: Add way for SDNodeXForm to work on timm The current implementation assumes there is an instruction associated with the transform, but this is not the case for timm/TargetConstant/immarg values. These transforms should directly operate on a specific MachineOperand in the source instruction. TableGen would assert if you attempted to define an equivalent GISDNodeXFormEquiv using timm when it failed to find the instruction matcher. Specially recognize SDNodeXForms on timm, and pass the operand index to the render function. Ideally this would be a separate render function type that looks like void renderFoo(MachineInstrBuilder, const MachineOperand&), but this proved to be somewhat mechanically painful. Add an optional operand index which will only be passed if the transform should only look at the one source operand. Theoretically it would also be possible to only ever pass the MachineOperand, and the existing renderers would check the parent. I think that would be somewhat ugly for the standard usage which may want to inspect other operands, and I also think MachineOperand should eventually not carry a pointer to the parent instruction. Use it in one sample pattern. This isn't a great example, since the transform exists to satisfy DAG type constraints. This could also be avoided by just changing the MachineInstr's arbitrary choice of operand type from i16 to i32. Other patterns have nontrivial uses, but this serves as the simplest example. One flaw this still has is if you try to use an SDNodeXForm defined for imm, but the source pattern uses timm, you still see the "Failed to lookup instruction" assert. However, there is now a way to avoid it.	2020-01-09 17:37:52 -05:00
Matt Arsenault	7d67742160	AMDGPU/GlobalISel: Fix import of zext of s16 op patterns	2020-01-09 10:29:32 -05:00
Matt Arsenault	e71af77568	AMDGPU/GlobalISel: Add IMMPopCount xform Partially fixes BFE pattern import.	2020-01-09 10:29:32 -05:00
Matt Arsenault	79450a4ea2	AMDGPU/GlobalISel: Add selectVOP3Mods_nnan This doesn't enable any new imports yet, but moves the fmed patterns from failing on this to hitting the "complex suboperand referenced more than once" limitation in tablegen.	2020-01-09 10:29:32 -05:00
Matt Arsenault	d964086c62	AMDGPU/GlobalISel: Add equiv xform for bitcast_fpimm_to_i32 Only partially fixes one pattern import.	2020-01-09 10:29:31 -05:00
Matt Arsenault	3952748ffd	AMDGPU/GlobalISel: Fix add of neg inline constant pattern	2020-01-09 10:29:31 -05:00
Matt Arsenault	c3a10faadc	AMDGPU: Remove VOP3Mods0Clamp0OMod Now that overridable default operands work, there's no reason to use complex patterns to just produce 0s.	2020-01-07 15:10:08 -05:00
Matt Arsenault	d4c9e13324	AMDGPU/GlobalISel: Select G_UADDE/G_USUBE	2020-01-06 18:27:52 -05:00
Matt Arsenault	4e85ca9562	AMDGPU/GlobalISel: Replace handling of boolean values This solves selection failures with generated selection patterns, which would fail due to inferring the SGPR reg bank for virtual registers with a set register class instead of VCC bank. Use instruction selection would constrain the virtual register to a specific class, so when the def was selected later the bank no longer was set to VCC. Remove the SCC reg bank. SCC isn't directly addressable, so it requires copying from SCC to an allocatable 32-bit register during selection, so these might as well be treated as 32-bit SGPR values. Now any scalar boolean value that will produce an outupt in SCC should be widened during RegBankSelect to s32. Any s1 value should be a vector boolean during selection. This makes the vcc register bank unambiguous with a normal SGPR during selection. Summary of how this should now work: - G_TRUNC is always a no-op, and never should use a vcc bank result. - SALU boolean operations should be promoted to s32 in RegBankSelect apply mapping - An s1 value means vcc bank at selection. The exception is for legalization artifacts that use s1, which are never VCC. All other contexts should infer the VCC register classes for s1 typed registers. The LLT for the register is now needed to infer the correct register class. Extensions with vcc sources should be legalized to a select of constants during RegBankSelect. - Copy from non-vcc to vcc ensures high bits of the input value are cleared during selection. - SALU boolean inputs should ensure the inputs are 0/1. This includes select, conditional branches, and carry-ins. There are a few somewhat dirty details. One is that G_TRUNC/G_*EXT selection ignores the usual register-bank from register class functions, and can't handle truncates with VCC result banks. I think this is OK, since the artifacts are specially treated anyway. This does require some care to avoid producing cases with vcc. There will also be no 100% reliable way to verify this rule is followed in selection in case of register classes, and violations manifests themselves as invalid copy instructions much later. Standard phi handling also only considers the bank of the result register, and doesn't insert copies to make the source banks match. This doesn't work for vcc, so we have to manually correct phi inputs in this case. We should add a verifier check to make sure there are no phis with mixed vcc and non-vcc register bank inputs. There's also some duplication with the LegalizerHelper, and some code which should live in the helper. I don't see a good way to share special knowledge about what types to use for intermediate operations depending on the bank for example. Using the helper to replace extensions with selects also seems somewhat awkward to me. Another issue is there are some contexts calling getRegBankFromRegClass that apparently don't have the LLT type for the register, but I haven't yet run into a real issue from this. This also introduces new unnecessary instructions in most cases, since we don't yet try to optimize out the zext when the source is known to come from a compare.	2020-01-06 18:26:42 -05:00
Matt Arsenault	14d25052a2	AMDGPU: Use ImmLeaf for inline immediate predicates	2020-01-06 17:21:51 -05:00
Matt Arsenault	e4464bf3d4	AMDGPU/GlobalISel: Select scalar v2s16 G_BUILD_VECTOR	2020-01-06 11:19:33 -05:00
Matt Arsenault	f1c85ecdfc	AMDGPU/GlobalISel: Select more G_EXTRACTs correctly This assumed a 32-bit extract size, which would produce invalid copies with 64-bit extracts. Handle the easy case. Ideally we would have a way to get the proper subreg index for any 32-bit offset, but there should probably be a tablegenerated way of getting the subreg index for any size and offset.	2020-01-06 11:10:13 -05:00
Matt Arsenault	9861a8538c	AMDGPU/GlobalISel: Add new utils file There are some things that are shareable between the legalizer, regbankselect, and the selector that don't have an obvious place to go.	2020-01-03 15:25:50 -05:00
Matt Arsenault	53fc484067	AMDGPU/GlobalISel: Fix off by one in operand index This should be looking at the RHS of the add for a constant.	2020-01-03 10:30:30 -05:00
Matt Arsenault	25e7da0c24	AMDGPU/GlobalISel: Remove manual G_FENCE selection The tablegen emitter now handles the immediate operand correctly, so let the generatedd matcher works.	2020-01-02 17:16:10 -05:00
Matt Arsenault	48e0e68edb	AMDGPU/GlobalISel: Re-use MRI available in selector	2019-12-30 13:00:17 -05:00
Matt Arsenault	dff3f8d742	AMDGPU/GlobalISel: Fix missing scc imp-def on scalar and/or/xor	2019-12-21 04:55:36 -05:00
Matt Arsenault	bc276c6379	GlobalISel: Lower s1 source G_SITOFP/G_UITOFP	2019-11-15 13:37:20 +05:30
Daniel Sanders	e74c5b9661	[globalisel] Rename G_GEP to G_PTR_ADD Summary: G_GEP is rather poorly named. It's a simple pointer+scalar addition and doesn't support any of the complexities of getelementptr. I therefore propose that we rename it. There's a G_PTR_MASK so let's follow that convention and go with G_PTR_ADD Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69734	2019-11-05 10:31:17 -08:00
Matt Arsenault	f9a42ed0a7	AMDGPU: Relax 32-bit SGPR register class Mostly use SReg_32 instead of SReg_32_XM0 for arbitrary values. This will allow the register coalescer to do a better job eliminating copies to m0. For GlobalISel, as a terrible hack, use SGPR_32 for things that should use SCC until booleans are solved. llvm-svn: 375267	2019-10-18 18:26:37 +00:00
Matt Arsenault	538b73b797	AMDGPU/GlobalISel: Handle more G_INSERT cases Start manually writing a table to get the subreg index. TableGen should probably generate this, but I'm not sure what it looks like in the arbitrary case where subregisters are allowed to not fully cover the super-registers. llvm-svn: 373947	2019-10-07 19:16:26 +00:00
Matt Arsenault	0b2ea91d6d	AMDGPU/GlobalISel: Use S_MOV_B64 for inline constants This hides some defects in SIFoldOperands when the immediates are split. llvm-svn: 373943	2019-10-07 19:07:19 +00:00
Matt Arsenault	b4cbf9862c	AMDGPU/GlobalISel: Select more G_INSERT cases At minimum handle the s64 insert type, which are emitted in real cases during legalization. We really need TableGen to emit something to emit something like the inverse of composeSubRegIndices do determine the subreg index to use. llvm-svn: 373938	2019-10-07 18:43:31 +00:00
Matt Arsenault	27269054d2	GlobalISel: Add target pre-isel instructions Allows targets to introduce regbankselectable pseudo-instructions. Currently the closet feature to this is an intrinsic. However this requires creating a public intrinsic declaration. This litters the public intrinsic namespace with operations we don't necessarily want to expose to IR producers, and would rather leave as private to the backend. Use a new instruction bit. A previous attempt tried to keep using enum value ranges, but it turned into a mess. llvm-svn: 373937	2019-10-07 18:43:29 +00:00
Matt Arsenault	e59296a051	AMDGPU/GlobalISel: Fall back on weird G_EXTRACT offsets llvm-svn: 373842	2019-10-06 01:41:22 +00:00
Matt Arsenault	412e0bf8f3	AMDGPU/GlobalISel: Select G_PTRTOINT llvm-svn: 373715	2019-10-04 08:35:37 +00:00
Piotr Sobczak	265e94e657	[AMDGPU] Extend buffer intrinsics with swizzling Summary: Extend cachepolicy operand in the new VMEM buffer intrinsics to supply information whether the buffer data is swizzled. Also, propagate this information to MIR. Intrinsics updated: int_amdgcn_raw_buffer_load int_amdgcn_raw_buffer_load_format int_amdgcn_raw_buffer_store int_amdgcn_raw_buffer_store_format int_amdgcn_raw_tbuffer_load int_amdgcn_raw_tbuffer_store int_amdgcn_struct_buffer_load int_amdgcn_struct_buffer_load_format int_amdgcn_struct_buffer_store int_amdgcn_struct_buffer_store_format int_amdgcn_struct_tbuffer_load int_amdgcn_struct_tbuffer_store Furthermore, disable merging of VMEM buffer instructions in SI Load/Store optimizer, if the "swizzled" bit on the instruction is on. The default value of the bit is 0, meaning that data in buffer is linear and buffer instructions can be merged. There is no difference in the generated code with this commit. However, in the future it will be expected that front-ends use buffer intrinsics with correct "swizzled" bit set. Reviewers: arsenm, nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68200 llvm-svn: 373491	2019-10-02 17:22:36 +00:00
Matt Arsenault	86f864dace	AMDGPU/GlobalISel: Use getIntrinsicID helper llvm-svn: 373417	2019-10-02 01:02:27 +00:00
Matt Arsenault	fdea5e02ce	AMDGPU/GlobalISel: Select s1 src G_SITOFP/G_UITOFP llvm-svn: 373298	2019-10-01 02:23:20 +00:00
Matt Arsenault	59b91aa93e	AMDGPU/GlobalISel: Add support for init.exec intrinsics TThe existing wave32 behavior seems broken and incomplete, but this reproduces it. llvm-svn: 373296	2019-10-01 02:07:25 +00:00
Matt Arsenault	54167ea316	AMDGPU/GlobalISel: Select G_UADDO/G_USUBO llvm-svn: 373288	2019-10-01 01:23:13 +00:00
Matt Arsenault	76f44f6b53	AMDGPU/GlobalISel: Avoid getting MRI in every function Store it in AMDGPUInstructionSelector to avoid boilerplate in nearly every select function. llvm-svn: 373139	2019-09-28 03:41:13 +00:00
Bjorn Pettersson	169cb63478	[AMDGPU] Use std::make_tuple to make some toolchains happy again My toolchain stopped working (LLVM 8.0 , libstdc++ 5.4.0) after r372338. The same problem was seen in clang-cuda-build buildbots: clang-cuda-build/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:763:12: error: chosen constructor is explicit in copy-initialization return {Reg, 0, nullptr}; ^~~~~~~~~~~~~~~~~ /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here constexpr tuple(_UElements&&... __elements) ^ This commit adds explicit calls to std::make_tuple to work around the problem. llvm-svn: 372384	2019-09-20 12:13:12 +00:00
Matt Arsenault	3ecab8e455	Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297) This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338	2019-09-19 16:26:14 +00:00

1 2 3 4 5 ...

351 Commits