llvm-project

Commit Graph

Author	SHA1	Message	Date
Joe Nash	ebb258d3b0	[AMDGPU] Make V_SAT_PK_U8_I16 a True16 Instruction The return type is two u8 packed into a 16 bit VGPR, so this instruction should be True16. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D135478	2022-10-10 10:33:49 -04:00
Dmitry Preobrazhensky	f4b1cfa1cb	[AMDGPU][MC][GFX11] Correct e64_dpp variants of v_movreld and v_movrelsd Differential Revision: https://reviews.llvm.org/D135079	2022-10-05 16:47:18 +03:00
Joe Nash	b982ba2a6e	[AMDGPU][GFX11] Use VGPR_32_Lo128 for VOP1,2,C Due to the encoding changes in GFX11, we had a hack in place that disables the use of VGPRs above 128. This patch removes the need for that hack. We introduce a new register class VGPR_32_Lo128 which is used for 16-bit operands of VOP1, VOP2, and VOPC instructions. This register class only has the low 128 VGPRs, but is otherwise identical to VGPR_32. Therefore, 16-bit VOP1, VOP2, and VOPC instructions are correctly limited to use the first 128 VGPRs, while the other instructions can freely use all 256. We introduce new pseudo-instructions used on GFX11 which have the suffix t16 (True 16) to use the VGPR_32_Lo128 register class. Reviewed By: foad, rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D133723	2022-09-20 09:56:28 -04:00
Joe Nash	3e39ab25e6	[AMDGPU][GFX11] Fix dst register class for V_CVT_U32_U16 This instruction was referring to the wrong VOPProfile, likely due to a typo, leading to an incorrect destination register type. The MC layer will care about this change, but is NFC while 16-bit values actually use 32 bit registers. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D132878	2022-08-30 14:01:25 -04:00
Joe Nash	70e7a1257c	[AMDGPU][NFC] Allow separate RC for VOP3 DPP Dst Create a field in VOPProfile called DstRCVOP3DPP to allow the VOP3 versions of DPP instructions to have a different destination register class than the non-VOP3 encoding. NFC for current instructions, but planned to be functional in upcoming ones. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D132673	2022-08-29 11:22:07 -04:00
Stanislav Mekhanoshin	9fa5a6b7e8	[AMDGPU] Support for gfx940 fp8 conversions Differential Revision: https://reviews.llvm.org/D129902	2022-07-18 11:48:43 -07:00
Joe Nash	07b7fada73	[AMDGPU] gfx11 VOPD instructions MC support VOPD is a new encoding for dual-issue instructions for use in wave32. This patch includes MC layer support only. A VOPD instruction is constituted of an X component (for which there are 13 possible opcodes) and a Y component (for which there are the 13 X opcodes plus 3 more). Most of the complexity in defining and parsing a VOPD operation arises from the possible different total numbers of operands and deferred parsing of certain operands depending on the constituent X and Y opcodes. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D128218	2022-06-24 11:08:39 -04:00
Jay Foad	7e681ef35e	[AMDGPU] Add GFX11 codegen for llvm.amdgcn.mov.dpp8 Differential Revision: https://reviews.llvm.org/D127980	2022-06-16 19:44:28 +01:00
Dmitry Preobrazhensky	b26afab9d1	[AMDGPU][MC][GFX11] Correct src0 for dpp variants of v_cvt_*_e64 Differential Revision: https://reviews.llvm.org/D127847	2022-06-16 13:48:43 +03:00
Jay Foad	bfcfd53b92	[AMDGPU] Add GFX11 llvm.amdgcn.permlane64 intrinsic Compared to permlane16, permlane64 has no BC input because it has no boundary conditions, no fi input because the instruction acts as if FI were always enabled, and no OLD input because it always writes to every active lane. Also use the new intrinsic in the atomic optimizer pass. Differential Revision: https://reviews.llvm.org/D127662	2022-06-13 21:12:11 +01:00
Joe Nash	086a9c1062	Reland [AMDGPU] gfx11 VOP1+VOP2 Instruction MC support The reverted dependent commit is now relanded, so reland this. Includes dpp instructions and vop1/vop2 promoted to vop3 Patch 17/N for upstreaming of AMDGPU gfx11 architecture Depends on D126483 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D126917	2022-06-08 11:10:57 -04:00
Joe Nash	f617f89e5b	Revert "[AMDGPU] gfx11 VOP1+VOP2 Instruction MC support" This reverts commit `6079804498`.	2022-06-06 17:11:35 -04:00
Joe Nash	6079804498	[AMDGPU] gfx11 VOP1+VOP2 Instruction MC support Includes dpp instructions and vop1/vop2 promoted to vop3 Patch 17/N for upstreaming of AMDGPU gfx11 architecture Depends on D126483 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D126917	2022-06-06 09:57:59 -04:00
Dmitry Preobrazhensky	5c0bf1303e	[AMDGPU][MC][GFX10] Removed unsupported 64bit DPP opcodes Removed 64bit DPP opcodes from asm matcher tables. Differential Revision: https://reviews.llvm.org/D123611	2022-04-13 14:43:40 +03:00
Stanislav Mekhanoshin	e7b362d75d	[AMDGPU] Add v_mov_b64 gfx940 opcode Differential Revision: https://reviews.llvm.org/D121023	2022-03-07 12:07:12 -08:00
Jay Foad	05d79e3562	[AMDGPU] Divergence-driven instruction selection for bitreverse Differential Revision: https://reviews.llvm.org/D119702	2022-02-24 20:21:59 +00:00
Jay Foad	ff7f2cfa95	[AMDGPU] Add an implicit use of M0 to all V_MOV_B32_indirect_read/write NFCI. Previously the implicit use was added to V_MOV_B32_indirect_read when building the instruction. V_MOV_B32_indirect_write didn't have an implicit use of M0 at all, but apparently it did not cause any problems. Differential Revision: https://reviews.llvm.org/D114239	2021-11-19 19:00:17 +00:00
Jay Foad	30b27ecfc2	[AMDGPU] Use new opcode for indexed vgpr reads Introduce V_MOV_B32_indirect_read for indexed vgpr reads (and rename the old V_MOV_B32_indirect to V_MOV_B32_indirect_write) so they can be unambiguously distinguished from regular V_MOV_B32_e32. Previously they were distinguished by looking for extra implicit operands but this is fragile because regular moves sometimes have extra implicit operands too: - either by accident, when instructions end up with duplicate implicit operands (see e.g. D100939) - or by design, when SIInstrInfo::copyPhysReg breaks a multi-dword copy into individual subreg mov instructions and adds implicit operands for the super-register. The effect of this is that SIInstrInfo::isFoldableCopy can be simplified and identifies more foldable copies. The test diffs show that more immediate 0 values have been folded as inline operands. SIInstrInfo::isReallyTriviallyReMaterializable could probably be simplified too but that is not part of this patch. Differential Revision: https://reviews.llvm.org/D114230	2021-11-19 13:08:11 +00:00
Stanislav Mekhanoshin	4eb24817ec	[AMDGPU] Mark all relevant VOP1 instructions rematerializable Differential Revision: https://reviews.llvm.org/D105919	2021-07-21 14:05:32 -07:00
Stanislav Mekhanoshin	d46d534dbb	[AMDGPU] Make some VOP1 instructions rematerializable This is a pilot change to verify the logic. The rest will be done in a same way, at least the rest of VOP1. Differential Revision: https://reviews.llvm.org/D105742	2021-07-12 23:43:45 -07:00
Jay Foad	7f3ac6714a	[AMDGPU] Set SALU, VALU and other instruction type flags on Real instructions This does not affect codegen but might benefit llvm-mca.	2021-06-16 13:36:02 +01:00
Jay Foad	323b3e645d	[AMDGPU] Set mayLoad and mayStore on Real instructions This does not affect codegen but might benefit llvm-mca.	2021-06-16 12:10:23 +01:00
Joe Nash	a0ed70abde	[AMDGPU] Remove redundant field from DPP8 def These lines set the value to what it already was, so they are redundant. NFC Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D100664 Change-Id: Ibf6f27d50a7fa1f76c127f01b799821378bfd3b3	2021-04-16 16:23:52 -04:00
Dmitry Preobrazhensky	0f5ebbcc7f	[AMDGPU][MC] Added flag to identify VOP instructions which have a single variant By convention, VOP1/2/C instructions which can be promoted to VOP3 have _e32 suffix while promoted instructions have _e64 suffix. Instructions which have a single variant should have no _e32/_e64 suffix. Unfortunately there was no simple way to identify single variant instructions - it was implemented by a hack. See bug https://bugs.llvm.org/show_bug.cgi?id=39086. This fix simplifies handling of single VOP instructions by adding a dedicated flag. Differential Revision: https://reviews.llvm.org/D99408	2021-04-01 13:53:12 +03:00
Jay Foad	fc7e3e7dd9	[AMDGPU] Set SchedRW on real instructions Copy SchedRW from pseudos to real instructions so that llvm-mca has access to it. This is NFC for normal compiler codegen, which schedules pseudos not real instructions. Add an llvm-mca test for some high latency double-precision instructions as a smoke test. Differential Revision: https://reviews.llvm.org/D99187	2021-03-23 15:38:11 +00:00
Joe Nash	5531f24cc2	[AMDGPU] Make OMod explicit for V_CVT_{U,I}* Make OMod explicit instead of implied by HasModifiers in the operand list. Requires explicitly setting HasOMod=1 for irregular OMod usage in instruction V_CVT_{U,I}* Reviewed By: foad Differential Revision: https://reviews.llvm.org/D97587 Change-Id: I230e1476f529e816eec60e242531f23a99e3839f	2021-03-02 13:32:06 -05:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
Mirko Brkusanin	608ac62540	[AMDGPU] Fix use of HasModifiers in VopProfile HasModifiers should be true if at least one modifier is used. This should make the use of this field a bit more consistent. Differential Revision: https://reviews.llvm.org/D94795	2021-01-26 15:21:11 +01:00
Jay Foad	4926eed59c	[AMDGPU] Add a TRANS bit to TSFlags. NFC. This is used to mark transcendental instructions that execute on a separate pipeline from the normal VALU pipeline. Differential Revision: https://reviews.llvm.org/D92042	2020-11-24 17:49:56 +00:00
Dmitry Preobrazhensky	2e87acac9b	[AMDGPU] Removed s_mov_regrd and mov_fed opcodes These opcodes are not intended for public use. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D81659	2020-07-17 19:52:54 +03:00
Matt Arsenault	9e03bdebc1	AMDGPU: Add llvm.amdgcn.sqrt intrinsic I spread the GlobalISel test into the regular one, which I've been avoiding so far.	2020-06-26 15:07:07 -04:00
Matt Arsenault	d259668731	AMDGPU: Set mayRaiseFPException This may be missing a few overrides to set it off still in some special cases. Since the flags set during selection should now be reliably preserved, this should not change codegen for non-strictfp functions.	2020-06-04 17:35:27 -04:00
Jay Foad	9ce0f7eed6	[AMDGPU] Introduce new sched classes for transcendental instructions This is in preparation for scheduling them slightly differently on gfx10. NFC. Differential Revision: https://reviews.llvm.org/D81011	2020-06-04 10:29:32 +01:00
Matt Arsenault	4b4496312e	AMDGPU: Start adding MODE register uses to instructions This is the groundwork required to implement strictfp. For now, this should be NFC for regular instructions (many instructions just gain an extra use of a reserved register). Regalloc won't rematerialize instructions with reads of physical registers, but we were suffering from that anyway with the exec reads. Should add it for all the related FP uses (possibly with some extras). I did not add it to either the gpr index mode instructions (or every single VALU instruction) since it's a ridiculous feature already modeled as an arbitrary side effect. Also work towards marking instructions with FP exceptions. This doesn't actually set the bit yet since this would start to change codegen. It seems nofpexcept is currently not implied from the regular IR FP operations. Add it to some MIR tests where I think it might matter.	2020-05-27 14:47:00 -04:00
Matt Arsenault	b27a538dda	AMDGPU: Fix illegally constant folding from V_MOV_B32_sdwa This was assumed to be a simple move, and interpreting the immediate modifier operand as a materialized immediate. Apparently the SDWA pass never produces these, but GlobalISel does emit these for some vector shuffles.	2020-05-18 15:34:33 -04:00
Kazuaki Ishizaki	0312b9f550	[llvm] NFC: Fix trivial typo in rst and td files Differential Revision: https://reviews.llvm.org/D77469	2020-04-23 14:26:32 +09:00
Matt Arsenault	f463792506	AMDGPU: Remove custom node for RSQ_LEGACY Directly select from the intrinsic. This wasn't getting much value from the custom node.	2020-04-17 19:50:36 -04:00
Matt Arsenault	79b29d6df7	AMDGPU: Remove DisableInst feature I'm not sure why these were bothering to check the instruction profile, since those profiles should only be used with these instruction classes.	2020-04-06 09:27:44 -04:00
Matt Arsenault	9564f46766	AMDGPU: Make use of default operands	2020-03-28 17:33:29 -04:00
Jay Foad	c8f0d27ef3	[AMDGPU] Fix the gfx10 scheduling model for f32 conversions Summary: As far as I can tell on gfx10 conversions to/from f32 (that are not converting f32 to/from f64) are full rate instructions, but they were marked as quarter rate instructions. I have fixed this for gfx10 only. I assume the scheduling model was correct for older architectures, though I don't have any documentation handy to confirm that. Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75392	2020-03-10 19:31:24 +00:00
Matt Arsenault	d1b393d92c	AMDGPU/GlobalISel: Select G_CTTZ_ZERO_UNDEF Directly select this rather than going through the intermediate instruction, which may provide some combine value in the future.	2020-02-12 16:19:46 -08:00
Matt Arsenault	c05f23e409	AMDGPU/GlobalISel: Select llvm.amdgcn.mov.dpp This is deprecated, but easy to support.	2020-01-22 11:43:53 -05:00
Matt Arsenault	dd09ec1208	AMDGPU/GlobalISel: Select llvm.amdgcn.mov.dpp8	2020-01-22 11:43:40 -05:00
Matt Arsenault	9b13b4a0e3	AMDGPU: Prepare to use scalar register indexing Define pseudos mirroring the VGPR indexing ones, and adjust the operands in the s_movrel* instructions to avoid the result def.	2020-01-20 17:19:16 -05:00
Matt Arsenault	8615eeb455	AMDGPU: Partially merge indirect register write handling `a785209bc2` switched to using pseudos instead of manually tying operands on the regular instruction. The VGPR indexing mode path should have the same problems that change attempted to avoid, so these should use the same strategy. Use a single pseudo for the VGPR indexing mode and movreld paths, and expand it based on the subtarget later. These have essentially the same constraints, reading the index from m0. Switch from using an offset to the subregister index directly, instead of computing an offset and re-adding it back. Also add missing pseudos for existing register class sizes.	2020-01-20 17:19:16 -05:00
Matt Arsenault	592de0009f	AMDGPU/GlobalISel: Select llvm.amdgcn.update.dpp The existing test is overly reliant on -mattr=-flat-for-global, and some missing optimizations to re-use.	2020-01-17 20:09:53 -05:00
Matt Arsenault	78b30a54c9	AMDGPU/GlobalISel: Fix readfirstlane pattern import The imm folding optimization pattern failed to import. The instruction pattern was already working, but failing to fail on SGPR inputs.	2020-01-07 11:07:08 -05:00
Dmitry Preobrazhensky	edd9f70163	[AMDGPU][MC][GFX10] Enabled v_movrel*[sdwa|dpp|dpp8] opcodes See https://bugs.llvm.org/show_bug.cgi?id=43712 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D70170	2019-11-18 17:23:40 +03:00
Dmitry Preobrazhensky	e25bc5e024	[AMDGPU][MC] Corrected src0 for v_movrelsd_b32 and v_movrelsd_2_b32 See https://bugs.llvm.org/show_bug.cgi?id=40903 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D69888	2019-11-08 16:38:56 +03:00
Stanislav Mekhanoshin	4312c4afd4	[AMDGPU] deduplicate tablegen predicates We are duplicating predicates if several parts of the combined predicate list contain the same condition. Added code to deduplicate the list. We have AssemblerPredicates and AssemblerPredicate in the PredicateControl, but we never use AssemblerPredicates with an actual list, so this one is dropped. This addresses the first part of the llvm bug 43886: https://bugs.llvm.org/show_bug.cgi?id=43886 Differential Revision: https://reviews.llvm.org/D69815	2019-11-04 12:19:17 -08:00


110 Commits