llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	e2fb8c0f4b	Move instruction predicate verification to emitInstruction D25618 added a method to verify the instruction predicates for an emitted instruction, through verifyInstructionPredicates added into <Target>MCCodeEmitter::encodeInstruction. This is a very useful idea, but the implementation inside MCCodeEmitter made it only fire for object files, not assembly which most of the llvm test suite uses. This patch moves the code into the <Target>_MC::verifyInstructionPredicates method, inside the InstrInfo. This allows it to be called from other places, such as in this patch where it is called from the <Target>AsmPrinter::emitInstruction methods which should trigger for both assembly and object files. It can also be called from other places such as verifyInstruction, but that is not done here (it tends to catch errors earlier, but in reality just shows all the mir tests that have incorrect feature predicates). The interface was also simplified slightly, moving computeAvailableFeatures into the function so that it does not need to be called externally. The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently show errors in the test-suite, so have been disabled with FIXME comments. Differential Revision: https://reviews.llvm.org/D129506	2022-07-13 12:53:32 +01:00
Jay Foad	5d41fe0768	[AMDGPU] SILowerControlFlow uses LiveIntervals The availability of LiveIntervals affects kill flags in the output, so declare the use to avoid strange effects where the output of this pass is different depending on what other passes are scheduled after it. Differential Revision: https://reviews.llvm.org/D129555	2022-07-12 16:53:53 +01:00
Piotr Sobczak	2bd8e74b94	[AMDGPU] Fix bitcast v4i64/v16i16 Fix a regression introduced in D128865. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D129375	2022-07-11 22:27:52 +02:00
NAKAMURA Takumi	393e12bddd	R600ISelLowering.h: Silence a warning. [-Warray-parameter] FIXME: Could it be rewritten with llvm::ArrayRef ?	2022-07-10 18:29:55 +09:00
David Blaikie	9008d0a38e	Fix -Warray-parameter warning Remove the bound in the definition, since it's not guaranteed/could provide a false sense of security (I'd be inclined to go further and change this to a pointer parameter, since that's what it really is - but figured I'd preserve some of the author's intent here)	2022-07-09 17:04:01 +00:00
serge-sans-paille	e1272ab6ec	[AMDGPU][NFC] Harmonize decl&def of R600TargetLowering::OptimizeSwizzle The freshly baked -Warray-parameter warning discovered an inconsistency in argument declaration, use the stricter one. This fixes build issues like https://lab.llvm.org/buildbot#builders/18/builds/5305	2022-07-09 09:07:31 +02:00
Abinav Puthan Purayil	17a81ecf85	[AMDGPU] Use the HasNoUse predicate for no-ret atomic op selection This change replaces the C++ predicates with the HasNoUse builtin predicate that would enable the no-ret atomic op selection in GlobalISel. Differential Revision: https://reviews.llvm.org/D125213	2022-07-08 09:47:33 +05:30
Abinav Puthan Purayil	7504c7a877	[AMDGPU] Use AddedComplexity for ret and noret atomic ops selection This patch removes the predicate for return atomic ops and uses AddedComplexity to distinguish its selection from its no return variant. This will produce better matchers that don't unnecessarily check for the negated predicate if the initial predicate failed. Also, it simplifies the enabling of no return atomic ops selection in GlobalISel. Differential Revision: https://reviews.llvm.org/D128241	2022-07-08 09:47:33 +05:30
Austin Kerbow	6817031d0b	[AMDGPU] Disable FillMFMAShadowMutation by default Disable amdgpu mfma power sched. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D129172	2022-07-07 09:34:45 -07:00
Shilei Tian	1023ddaf77	[LLVM] Add the support for fmax and fmin in atomicrmw instruction This patch adds the support for `fmax` and `fmin` operations in `atomicrmw` instruction. For now (at least in this patch), the instruction will be expanded to CAS loop. There are already a couple of targets supporting the feature. I'll create another patch(es) to enable them accordingly. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D127041	2022-07-06 10:57:53 -04:00
Thomas Symalla	86bd7e2065	[NFC][AMDGPU] Cleanup the SIOptimizeExecMasking pass. This patch removes a bit of code duplication and moves the v_cmpx optimization out of the runOnMachineFunction pass. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D129086	2022-07-06 11:03:03 +02:00
Carl Ritson	8bc5e7ac51	[AMDGPU] Additional liveness tests for si-optimize-exec-masking-pre-ra Merge tests and fixes from D128110 and D128315 on top of already committed D128800. Original author: arsenm Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D128882	2022-07-06 15:05:32 +09:00
Jay Foad	4dbc2876cf	[AMDGPU] GFX11 trivial NFC tweaks A few miscellaneous comment, whitespace and indentation tweaks.	2022-07-05 17:20:17 +01:00
Jay Foad	12fd00ee17	[AMDGPU] Add patterns for GFX11 v_minmax and v_maxmin instructions Differential Revision: https://reviews.llvm.org/D128445	2022-07-05 16:07:47 +01:00
Joe Nash	0483c91eee	[AMDGPU] gfx11 CodeGen for new DPP instructions Modifies the GCNDPPCombine pass to enable DPP formation for the new DPP instruction in gfx11, namely VOP3 encoded instructions with DPP and VOPC with DPP. Depends on D128656 Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D128682	2022-07-05 10:17:59 -04:00
Joe Nash	d1af09ad96	[AMDGPU] gfx11 Generate VOPD Instructions We form VOPD instructions in the GCNCreateVOPD pass by combining back-to-back component instructions. There are strict register constraints for creating a legal VOPD, namely that the matching operands (e.g. src0x and src0y, src1x and src1y) must be in different register banks. We add a PostRA scheduler mutation to put possible VOPD components back-to-back. Depends on D128442, D128270 Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D128656	2022-07-05 09:18:19 -04:00
Ivan Kosarev	4696a33dfa	[AMDGPU][NFC] Refine matching SMRD offsets. Tell the matcher what we are looking for instead of matching everything and then discarding the result if it doesn't fit. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D128171	2022-07-05 14:07:22 +01:00
Ivan Kosarev	8cd79bc12c	[AMDGPU][GlobalISel] Support register offsets for SMRDs. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D128836	2022-07-05 13:41:06 +01:00
Thomas Symalla	04c5fed5e0	[NFC] Fix wrong comment.	2022-07-05 13:37:44 +02:00
Nikita Popov	8e70258b18	[AMDGPUCodeGenPrepare] Check result of ConstantFoldBinaryOpOperands() This function will become fallible once we don't support constant expressions for all binops, so make sure to check the result.	2022-07-04 14:20:23 +02:00
Mirko Brkusanin	2208342c9b	[AMDGPU][GlobalISel] Always use VGPR bank for G_FCMP Differential Revision: https://reviews.llvm.org/D128980	2022-07-01 15:03:37 +02:00
Piotr Sobczak	b6ef36a1c4	[AMDGPU] Update WMMA intrinsics with explicit f16 types Update intrinsics to use n x f16 and n x i16 instead of 32-bit types. This may avoid the need for a bitcast and is probably less confusing. Depends on making v16f16 and v16i16 types legal. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D128951	2022-07-01 08:55:25 +02:00
Piotr Sobczak	bd675af2a2	[AMDGPU] Make v16i16/v16f16 legal There are upcoming intrinsics to use the new types. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D128865	2022-06-30 23:08:40 +02:00
Jay Foad	0f94d2b385	[AMDGPU] GFX11: automatically release VGPRs at the end of the shader GFX11 has a new message type MSG_DEALLOC_VGPRS which can be used to release a shader's VGPRs. Sending this at the end of a shader (just before the s_endpgm) can help overall system performance in cases where the s_endpgm would have to wait for outstanding VMEM stores to complete before releasing the VGPRs. Differential Revision: https://reviews.llvm.org/D128442	2022-06-30 20:55:14 +01:00
Piotr Sobczak	4874838a63	[AMDGPU] gfx11 WMMA instruction support gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D128756	2022-06-30 11:13:45 -04:00
Carl Ritson	d0f6641615	[AMDGPU] Fix liveness for loops in si-optimize-exec-masking-pre-ra Follow up to D127894, new liveness update code needs to handle the case where S_ANDN2 input must be extended through loops when V_CNDMASK_B32 has been hoisted. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D128800	2022-06-30 15:26:50 +09:00
Jay Foad	cfb7ffdec0	[AMDGPU] New AMDGPUInsertDelayAlu pass Differential Revision: https://reviews.llvm.org/D128270	2022-06-29 21:30:20 +01:00
Matt Arsenault	0bdaef38c9	AMDGPU: Add gfx11 feature to force initializing 16 input SGPRs The total user+system SGPR count needs to be padded out to 16 if fewer inputs are enabled.	2022-06-29 14:52:19 -04:00
Matt Arsenault	ffd6aaf5b6	AMDGPU: Make packed 32-bit instructions rematerializable	2022-06-29 11:57:54 -04:00
Matt Arsenault	4c400dc103	AMDGPU: Make 16-bit pk instructions rematerializable	2022-06-29 11:57:53 -04:00
Matt Arsenault	da6d7728d4	AMDGPU: Mark more instructions as rematerializable D106023 excluded 16-bit instructions from rematerialization, with the justification that we can't rematerialize instructions that preserve the high bits (plus the instructions which do are a confusing mess between different subtargets). This doesn't make sense to me as a problem since cases where we would rely on the high bit behavior would still need to be represented as a register value constraint with a tied operand. It's not a hidden side effect and should still be rematerializable.	2022-06-29 11:19:15 -04:00
Matt Arsenault	d342d130da	AMDGPU: Use isMeta flags on pseudoinstructions	2022-06-29 10:31:29 -04:00
Stanislav Mekhanoshin	21895c6b50	[AMDGPU] Relax verification of soffset in scalar stores It must use m0 only on GFX8. Later chips can use any SGPR. Differential Revision: https://reviews.llvm.org/D128765	2022-06-28 16:10:08 -07:00
Jay Foad	3fbc945c3a	[AMDGPU] llvm.amdgcn.exp.compr is not supported on GFX11 Differential Revision: https://reviews.llvm.org/D128259	2022-06-28 14:48:25 +01:00
Joe Nash	f1cfaa956d	[AMDGPU] Use GFX11 S_PACK_HL instruction in more cases Differential Revision: https://reviews.llvm.org/D128527	2022-06-28 14:35:19 +01:00
Jay Foad	b5818e4eb4	[AMDGPU] Cluster stores as well as loads for GFX11 Differential Revision: https://reviews.llvm.org/D128517	2022-06-27 16:41:41 +01:00
Jay Foad	77e63b25f9	[AMDGPU] Fix assertion failure on mad with negative immediate addend Without this, the new test case would fail with: AMDGPUInstPrinter.cpp:545: void llvm::AMDGPUInstPrinter::printImmediate64(uint64_t, const llvm::MCSubtargetInfo &, llvm::raw_ostream &): Assertion `isUInt<32>(Imm) || Imm == 0x3fc45f306dc9c882' failed. Differential Revision: https://reviews.llvm.org/D128435	2022-06-27 09:49:20 +01:00
Kazu Hirata	a7938c74f1	[llvm] Don't use Optional::hasValue (NFC) This patch replaces Optional::hasValue with the implicit cast to bool in conditionals only.	2022-06-25 21:42:52 -07:00
Kazu Hirata	3b7c3a654c	Revert "Don't use Optional::hasValue (NFC)" This reverts commit `aa8feeefd3`.	2022-06-25 11:56:50 -07:00
Kazu Hirata	aa8feeefd3	Don't use Optional::hasValue (NFC)	2022-06-25 11:55:57 -07:00
Min-Yih Hsu	97579dcc6d	[MCA] Introducing incremental SourceMgr and resumable pipeline The new resumable mca::Pipeline capability introduced in this patch allows users to save the current state of the pipeline and resume from that checkpoint. It is better (but not required) to use it with the new IncrementalSourceMgr, where users can add mca::Instruction incrementally rather than having a fixed number of instructions ahead-of-time. Note that we're using unit tests to test these new features, because integrating them into the `llvm-mca` tool would cause too much churn. Differential Revision: https://reviews.llvm.org/D127083	2022-06-24 15:39:51 -07:00
Joe Nash	07b7fada73	[AMDGPU] gfx11 VOPD instructions MC support VOPD is a new encoding for dual-issue instructions for use in wave32. This patch includes MC layer support only. A VOPD instruction is constituted of an X component (for which there are 13 possible opcodes) and a Y component (for which there are the 13 X opcodes plus 3 more). Most of the complexity in defining and parsing a VOPD operation arises from the possible different total numbers of operands and deferred parsing of certain operands depending on the constituent X and Y opcodes. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D128218	2022-06-24 11:08:39 -04:00
Konstantin Zhuravlyov	7736ce1c56	AMDGPU: Clear kill flags when optimizing vcmp save exec sequence It was causing bad machine code for several blender scenes: * Bad machine code: Using an undefined physical register * - function: kernel_holdout_emission_blurring_pathtermination_ao - basic block: %bb.28 if.end40.i (0x7f84861a2320) - instruction: V_CMPX_EQ_U32_nosdst_e64 0, $vgpr3, implicit-def $exec, implicit $exec - operand 1: $vgpr3 Differential Revision: https://reviews.llvm.org/D127768	2022-06-24 11:30:22 -04:00
Joe Nash	ae72fee74e	[AMDGPU] gfx11 Select on Buffer Atomic FAdd Rtn type Reviewed By: #amdgpu, foad, rampitec Differential Revision: https://reviews.llvm.org/D128205	2022-06-23 11:05:32 -04:00
Baptiste Saleil	79e77a9f39	[AMDGPU] Flush the vmcnt counter in loop preheaders when necessary waitcnt vmcnt instructions are currently generated in loop bodies before using values loaded outside of the loop. In some cases, it is better to flush the vmcnt counter in a loop preheader before entering the loop body. This patch detects these cases and generates waitcnt instructions to flush the counter. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D115747	2022-06-23 10:53:21 -04:00
Rodrigo Dominguez	971fa4b196	[AMDGPU] GFX11: remove ShaderType from ds_ordered_count offset field In GFX11 ShaderType is determined by the hardware and should no longer be written into bits[3:2] of the ds_ordered_count offset field. Differential Revision: https://reviews.llvm.org/D128196	2022-06-23 14:20:33 +01:00
Ruiling Song	49b8ca3f7c	AMDGPU: Don't crash on global_ctor/dtor declaration Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D128320	2022-06-23 21:04:54 +08:00
Dmitry Preobrazhensky	dcb24f93af	[AMDGPU][MC][GFX11] Correct disassembly of VOP3.DPP8 opcodes Fix bug #56163. Add W32/W64 tests for all VOP3.DPP opcodes. Differential Revision: https://reviews.llvm.org/D128369	2022-06-23 13:07:45 +03:00
Matt Arsenault	b03d902b61	AMDGPU: Fix invalid liveness after si-optimize-exec-masking-pre-ra This was leaving behind a use at the deleted instruction which the verifier would fail during allocation.	2022-06-22 20:49:03 -04:00
serge-sans-paille	27fd01d3f8	[iwyu] Handle regressions in libLLVM header include Running iwyu-diff on LLVM codebase since `fb67d683db` detected a few regressions, fixing them. The impact on preprocessed output is negligible: -4k lines.	2022-06-22 18:50:39 +02:00
Guillaume Chatelet	cef65864af	[Alignment] Use Align for MaxKernArgAlign Differential Revision: https://reviews.llvm.org/D128118	2022-06-22 13:40:37 +00:00
Ruiling Song	4dcb42fae5	AMDGPU: Skip unexpected CFG in SIOptimizeVGPRLiveRange There are some cases where we use si_if/si_else in an unnatural way. Just skip them. Fixes: https://github.com/llvm/llvm-project/issues/55922 Reviewed by: critson Differential Revision: https://reviews.llvm.org/D128193	2022-06-22 12:49:41 +08:00
Joe Nash	90254d524f	[AMDGPU] gfx11 Remove SDWA from shuffle_vector ISel gfx11 does not have SDWA Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D128208	2022-06-21 14:55:00 -04:00
Jay Foad	929a8ad2b6	[AMDGPU] Update SPI_SHADER_PGM_RSRC2_PS.EXTRA_LDS_SIZE for GFX11 The granularity of SPI_SHADER_PGM_RSRC2_PS.EXTRA_LDS_SIZE changed in GFX11. It is now in units of 256 dwords instead of 128 dwords. COMPUTE_PGM_RSRC2.LDS_SIZE is unaffected. It is still in units of 128 dwords. Differential Revision: https://reviews.llvm.org/D128179	2022-06-21 14:48:12 +01:00
Carl Ritson	62abc8c200	[AMDGPU] Set GFX11 null export target based on export attributes If shader only has depth exports use MRTZ otherwise use MRT0. Differential Revision: https://reviews.llvm.org/D128185	2022-06-21 09:40:31 +01:00
Kazu Hirata	7a47ee51a1	[llvm] Don't use Optional::getValue (NFC)	2022-06-20 22:45:45 -07:00
Kazu Hirata	064a08cd95	Don't use Optional::hasValue (NFC)	2022-06-20 20:05:16 -07:00
Ruiling Song	732eed40fd	[AMDGPU] Mark GFX11 dual source blend export as strict-wqm The instructions that generate the source of dual source blend export should run in strict-wqm. That is if any lane in a quad is active, we need to enable all four lanes of that quad to make the shuffling operation before exporting to dual source blend target work correctly. Differential Revision: https://reviews.llvm.org/D127981	2022-06-20 21:58:12 +01:00
Piotr Sobczak	29621c13ef	[AMDGPU] Tag GFX11 LDS loads as using strict_wqm LDS_PARAM_LOAD and LDS_DIRECT_LOAD use EXEC per quad (if any pixel is enabled in the quad, data is written to all 4 pixels/threads in the quad). Tag LDS_PARAM_LOAD and LDS_DIRECT_LOAD as using strict_wqm to enforce this and avoid lane clobbering issues. Note that only the instruction itself is tagged. The implicit uses of these do not need to be set WQM. This reduces unnecessary WQM calculation of M0. Differential Revision: https://reviews.llvm.org/D127977	2022-06-20 21:58:12 +01:00
Jay Foad	13107c2770	[AMDGPU] Add support for GFX11 LDSDIR hazards Detect LDS direct WAR/WAW hazards and compute values for wait_vdst (va_vdst) parameter. Where appropriate this raises wait_vdst from the default 0 to allow concurrent issue of LDS direct with VALU execution. Also detect LDS direct versus VMEM source VGPR hazards and insert vm_vsrc=0 waits using s_waitcnt_depctr. Differential Revision: https://reviews.llvm.org/D127963	2022-06-20 21:58:12 +01:00
Guillaume Chatelet	d154d0ac06	[NFC] Simplify code	2022-06-20 15:15:52 +00:00
Jay Foad	ba306216d2	[AMDGPU] Reorder cases. NFC.	2022-06-20 14:30:17 +01:00
Jay Foad	d7762a3b36	[AMDGPU] Increase instruction cache line size to 128 bytes for GFX11 Differential Revision: https://reviews.llvm.org/D128189	2022-06-20 14:25:10 +01:00
Jay Foad	b8e32e808d	[AMDGPU] Remove a duplicate atomic fadd pattern This was left over after D124538.	2022-06-20 14:08:57 +01:00
Dmitry Preobrazhensky	485e8b4f63	[AMDGPU][MC][GFX11] Correct disassembly of DPP variants of VOPC64 opcodes Fix bugs https://github.com/llvm/llvm-project/issues/56091, https://github.com/llvm/llvm-project/issues/56065. Differential Revision: https://reviews.llvm.org/D128075	2022-06-20 14:23:07 +03:00
Mirko Brkusanin	6cae753bf4	[AMDGPU][GlobalISel] Legalize G_FSUB for s16 Differential Revision: https://reviews.llvm.org/D128066	2022-06-20 12:25:49 +02:00
Guillaume Chatelet	f1255186c7	[NFC][Alignment] Remove max functions between Align and MaybeAlign `llvm::max(Align, MaybeAlign)` and `llvm::max(MaybeAlign, Align)` are not used often enough to be required. They also make the code more opaque. Differential Revision: https://reviews.llvm.org/D128121	2022-06-20 08:37:48 +00:00
Jay Foad	7050d5b98c	[AMDGPU] Limit GFX11 to using 128 VGPRs This is a temporary measure to avoid generating incorrect code until the compiler understands the new way that GFX11 encodes 16-bit operands in VOP instructions. Differential Revision: https://reviews.llvm.org/D128054	2022-06-20 07:58:27 +01:00
Kazu Hirata	129b531c9c	[llvm] Use value_or instead of getValueOr (NFC)	2022-06-18 23:07:11 -07:00
Kazu Hirata	437f960062	[llvm] Call *set::insert without checking membership first (NFC)	2022-06-18 10:22:05 -07:00
Kazu Hirata	4271a1ff33	[llvm] Call *set::insert without checking membership first (NFC)	2022-06-18 10:17:22 -07:00
Joe Nash	2a68364745	[AMDGPU] gfx11 waitcnt support for VINTERP and LDSDIR instructions Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D127781	2022-06-17 09:30:37 -04:00
Joe Nash	20d20156f4	[AMDGPU] gfx11 VINTERP intrinsics and ISel support Depends on D127664 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D127756	2022-06-17 09:16:59 -04:00
Joe Nash	6d5d8b1313	[AMDGPU] gfx11 ldsdir intrinsics and ISel Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D127664	2022-06-17 09:03:16 -04:00
LiaoChunyu	6181c19283	[AMDGPU][NFC] Remove isConstantAddr fix isConstantAddr defined but not used Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D127959	2022-06-17 08:49:29 +08:00
Joe Nash	2d43de13df	[AMDGPU] gfx11 new dot instruction codegen support Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D127904	2022-06-16 14:19:34 -04:00
Jay Foad	7e681ef35e	[AMDGPU] Add GFX11 codegen for llvm.amdgcn.mov.dpp8 Differential Revision: https://reviews.llvm.org/D127980	2022-06-16 19:44:28 +01:00
Jay Foad	36ec1fcaac	[AMDGPU] Add GFX11 llvm.amdgcn.ds.add.gs.reg.rtn / llvm.amdgcn.ds.sub.gs.reg.rtn intrinsics Differential Revision: https://reviews.llvm.org/D127955	2022-06-16 18:23:14 +01:00
Jay Foad	c155a944fb	[AMDGPU] GFX11 CodeGen support for MIMG instructions This includes: - New llvm.amdgcn.image.msaa.load.* intrinsics - NSA changes, because MIMG-NSA is now limited to 3 dwords - Split CD forms of IMAGE_SAMPLE instructions out into separate test files since they are no longer supported in GFX11 Differential Revision: https://reviews.llvm.org/D127837	2022-06-16 18:23:14 +01:00
Jay Foad	445a483b41	[AMDGPU] Add new GFX11 intrinsic llvm.amdgcn.exp.row Differential Revision: https://reviews.llvm.org/D127671	2022-06-16 18:23:14 +01:00
Dmitry Preobrazhensky	b26afab9d1	[AMDGPU][MC][GFX11] Correct src0 for dpp variants of v_cvt_*_e64 Differential Revision: https://reviews.llvm.org/D127847	2022-06-16 13:48:43 +03:00
David Stuttard	77851cc1cf	[AMDGPU] Change use null for dead sdst to be gfx1030+ Pre gfx1030 null for sdst is different. `c97436f8b6` [AMDGPU] Use null for dead sdst operand - requires a change to make it not apply to pre gfx1030 Differential Revision: https://reviews.llvm.org/D127869	2022-06-16 10:39:06 +01:00
Jay Foad	9dff14be9e	[AMDGPU] Add support for GFX11 hazards Add support for partial stall over EXEC hazard and trans use hazard. Differential Revision: https://reviews.llvm.org/D127872	2022-06-16 08:15:21 +01:00
Austin Kerbow	4bba82116a	[AMDGPU] Fix buildbot failures after `48ebc1af29` Some buildbots (lto, windows) were failing due to some function reference variables being improperly initialized.	2022-06-15 00:23:30 -07:00
Austin Kerbow	48ebc1af29	[AMDGPU] Add more expressive sched_barrier controls The sched_barrier builtin allows the scheduler's behavior to be shaped by users when very specific codegen is needed in order to create highly optimized code. This patch adds more granular control over the types of instructions that are allowed to be reordered with respect to one or multiple sched_barriers. A mask is used to specify groups of instructions that should be allowed to be scheduled around a sched_barrier. Details about how this mask may be used can be found in llvm/include/llvm/IR/IntrinsicsAMDGPU.td. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D127123	2022-06-14 22:03:05 -07:00
Austin Kerbow	bd9eed3aec	[AMDGPU] Add isMFMA helper function. NFC Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D127124	2022-06-14 22:01:49 -07:00
Joe Nash	989bd57f98	[AMDGPU] gfx11 support add_f16 The instruction was skipped in the earlier large patch adding VOP2, https://reviews.llvm.org/D126917. Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D127697	2022-06-14 08:59:45 -04:00
Dmitry Preobrazhensky	365d827f65	[AMDGPU][MC][GFX11] Correct ds_swizzle_b32 Enable offset parsing. Differential Revision: https://reviews.llvm.org/D127404	2022-06-14 12:58:03 +03:00
Stanislav Mekhanoshin	c97436f8b6	[AMDGPU] Use null for dead sdst operand Differential Revision: https://reviews.llvm.org/D127542	2022-06-13 14:41:40 -07:00
Stanislav Mekhanoshin	cb9ae93712	[AMDGPU] Define SGPR_NULL64 register. NFCI. On gfx10+ null register can be used as both 32 and 64 bit operand. Define a 64 bit version of the register to use during codegen. Differential Revision: https://reviews.llvm.org/D127527	2022-06-13 13:23:33 -07:00
Jay Foad	bfcfd53b92	[AMDGPU] Add GFX11 llvm.amdgcn.permlane64 intrinsic Compared to permlane16, permlane64 has no BC input because it has no boundary conditions, no fi input because the instruction acts as if FI were always enabled, and no OLD input because it always writes to every active lane. Also use the new intrinsic in the atomic optimizer pass. Differential Revision: https://reviews.llvm.org/D127662	2022-06-13 21:12:11 +01:00
Jay Foad	7b9f620e78	[AMDGPU] Work around GFX11 flat scratch SVS swizzling bug Differential Revision: https://reviews.llvm.org/D127635	2022-06-13 21:00:42 +01:00
Jay Foad	d943c51465	[AMDGPU] Fix GFX11 codegen for V_MAD_U64_U32 and V_MAD_I64_I32 GFX11 uses different pseudos for these because of a new constraint on which operands' registers can overlap. Differential Revision: https://reviews.llvm.org/D127659	2022-06-13 20:59:18 +01:00
Stanislav Mekhanoshin	0f81830632	[AMDGPU] Make temp vgpr selection stable in indirectCopyToAGPR This uses rotating remainder of division by 3 to select another temp vgpr each next time in a sequence of several agpr copies. Therefore, temp vgpr selection depends on the generated agpr number. This number could change with any unrelated change to the register definitions. Stabilize the selection by using a real agpr number. Differential Revision: https://reviews.llvm.org/D127524	2022-06-13 09:39:46 -07:00
Fangrui Song	adf4142f76	[MC] De-capitalize SwitchSection. NFC Add SwitchSection to return switchSection. The API will be removed soon.	2022-06-10 22:50:55 -07:00
Jay Foad	ff85d61a6e	Update *_TMPRING_SIZE.WAVESIZE for GFX11 The encoding of COMPUTE_TMPRING_SIZE.WAVESIZE and SPI_TMPRING_SIZE.WAVESIZE has changed in GFX11: it is now in units of 64 dwords instead of 256 dwords, and the field has been widened from 13 bits to 15 bits. Depends on D126989 Reviewed By: rampitec, arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D127248	2022-06-10 13:24:00 -04:00
Joe Nash	ea3c9a87d3	[AMDGPU] gfx11 add bits to COMPUTE_PGM_RSRC3 Contributors: Konstantin Zhuravlyov <kzhuravl_dev@outlook.com> Patch 21/N for upstreaming of AMDGPU gfx11 architecture Depends on D127143 Reviewed By: rampitec, #amdgpu, kzhuravl Differential Revision: https://reviews.llvm.org/D127241	2022-06-10 13:07:14 -04:00
Joe Nash	78d8fdb88b	[AMDGPU] NFC. Comment change to GFX10+ in AsmParser	2022-06-10 12:34:07 -04:00
Joe Nash	9175ab7746	[AMDGPU] gfx11 SRC_POPS_EXISTING_WAVE_ID is removed	2022-06-10 12:32:22 -04:00
Joe Nash	fd3304ef85	[AMDGPU] gfx11 EXECZ and VCCZ are no longer allowed to be used as sources to SALU and VALU instructions. Contributors: Baptiste Saleil <baptiste.saleil@amd.com> Patch 20/N for upstreaming of AMDGPU gfx11 architecture Depends on D126989 Reviewed By: rampitec, foad, #amdgpu Differential Revision: https://reviews.llvm.org/D127143	2022-06-10 10:03:43 -04:00
Jay Foad	4b2d70fa5b	[AMDGPU] Basic implementation of isExtractSubvectorCheap Add a basic implementation of isExtractSubvectorCheap that only considers extracts at offset 0. Differential Revision: https://reviews.llvm.org/D127385	2022-06-10 14:43:07 +01:00
Ivan Kosarev	60d6fbb621	[AMDGPU][GFX9][GFX10] Support base+soffset+offset SMEM atomics. Resolves a part of https://github.com/llvm/llvm-project/issues/38652 Reviewed By: dp Differential Revision: https://reviews.llvm.org/D127314	2022-06-10 13:22:41 +01:00
Dmitry Preobrazhensky	f8aba9995a	[AMDGPU][MC][GFX1013] Enable image_msaa_load Differential Revision: https://reviews.llvm.org/D127198	2022-06-10 13:42:05 +03:00
Jay Foad	6c372daa84	[AMDGPU] New GFX11 intrinsic llvm.amdgcn.s.sendmsg.rtn Add new intrinsic and codegen support for the s_sendmsg_rtn_b32 and s_sendmsg_rtn_b64 instructions. Differential Revision: https://reviews.llvm.org/D127315	2022-06-10 08:15:23 +01:00
Jay Foad	b0a3849439	[AMDGPU] Update dlc usage for GFX11 In GFX10 dlc controlled L1 cache bypass. In GFX11 it has been repurposed to control MALL NOALLOC, and glc controls L1 as well as L0 cache bypass. Update the documentation and SIMemoryLegalizer accordingly. Set dlc for nontemporal and volatile accesses. Differential Revision: https://reviews.llvm.org/D127405	2022-06-10 08:10:34 +01:00
Jay Foad	ffe86e3bdd	[AMDGPU] Update SIInsertHardClauses for GFX11 Changes for GFX11: - Clauses may not mix instructions of different types, and there are more types. For example image instructions with and without a sampler are now different types. - The max size of a clause is explicitly documented as 63 instructions. Previously it was implicitly assumed to be 64. This is such a tiny difference that it does not seem worth making it conditional on the subtarget. - It can be beneficial to clause stores as well as loads. Differential Revision: https://reviews.llvm.org/D127391	2022-06-09 21:29:56 +01:00
Joe Nash	be1082c6d5	[AMDGPU] gfx11 VOPC instructions Supports encoding existing instructions on gfx11 and MC support for the new VOPC dpp instructions. Patch 19/N for upstreaming of AMDGPU gfx11 architecture Depends on D126978 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D126989	2022-06-09 15:22:42 -04:00
Stanislav Mekhanoshin	23db8e4b43	[AMDGPU] Use v_mad_u64_u32 for IMAD32 Nic Curtis did the experiments to prove it is faster than a separate mul and add. Fixes: SWDEV-332806 Differential Revision: https://reviews.llvm.org/D127253	2022-06-09 11:39:49 -07:00
Stanislav Mekhanoshin	5c974d086c	[AMDGPU] Fix hazard handling of v_cmpx to permlane - VOP3 and SDWA forms of V_CMPX were not handled - Hazard only exists if the compare defines EXEC (i.e. V_CMPX) forwarded to the permlane. Differential Revision: https://reviews.llvm.org/D127344	2022-06-09 10:33:54 -07:00
Simon Moll	b8c2781ff6	[NFC] format InstructionSimplify & lowerCaseFunctionNames Clang-format InstructionSimplify and convert all "FunctionName"s to "functionName". This patch does touch a lot of files but gets done with the cleanup of InstructionSimplify in one commit. This is the alternative to the less invasive clang-format only patch: D126783 Reviewed By: spatel, rengolin Differential Revision: https://reviews.llvm.org/D126889	2022-06-09 16:10:08 +02:00
Benjamin Kramer	0abb472fff	AMDGPU/GISel: Remove unused variable. NFC.	2022-06-09 13:43:47 +02:00
Nicolai Hähnle	264d1136f9	AMDGPU/GISel: Introduce custom legalization of G_MUL The generic legalizer framework is still used to reduce the problem to scalar multiplication with the bit size a multiple of 32. Generating optimal code sequences for big integer multiplication is somewhat tricky and has a number of target-specific intricacies: - The target has V_MAD_U64_U32 instructions that multiply two 32-bit factors and add a 64-bit accumulator. Most partial products should use this instruction. - The accumulator is mapped to consecutive 32-bit GPRs, and partial-product multiply-adds can feed the accumulator into each other directly. (The register allocator's support for that is somewhat limited, but that only matters for 128-bit integers and larger.) - OTOH, on some hardware, V_MAD_U64_U32 requires the accumulator to be stored in an even-aligned pair of GPRs. To avoid excessive register copies, it makes sense to compute odd partial products separately from even partial products (where a partial product src0[j0] * src1[j1] is "odd" if j0 + j1 is odd) and add both halves together as a final step. - We can combine G_MUL+G_ADD into a single cascade of multiply-adds. - The target can keep many carry-bits in flight simultaneously, so combining carries using G_UADDE is preferable over G_ZEXT + G_ADD. - Not addressed by this patch: When the factors are sign-extended, the V_MAD_I64_I32 instruction (signed version!) can be used. It is difficult to address these points generically: 1) Finding matching pairs of G_MUL and G_UMULH to find a wide multiply is expensive. We could add a G_UMUL_LOHI generic instruction and conditionally use that in the generic legalizer, but by itself this wouldn't allow us to use the accumulation capability of V_MAD_U64_U32. One could attempt to find matching G_ADD + G_UADDE post-legalization, but this is also expensive. 2) Similarly, making sense of the legalization outcome of a wide pre-legalization G_MUL+G_ADD pair is extremely expensive. 
3) How could the generic legalizer possibly deal with the particular idiosyncrasy of "odd" vs. "even" partial products? All this points in the direction of directly emitting an ideal code sequence during legalization, but the generic legalizer should not be burdened with such overly target-specific concerns. Hence, a custom legalization. Note that the implemented approach is different from that used by SelectionDAG because narrowing of scalars works differently in general. SelectionDAG iteratively cuts wide scalars into low and high halves until a legal size is reached. By contrast, GlobalISel does the narrowing in a single shot, which should be better for compile-time and for the quality of the generated code. This patch leaves three gaps open: 1. When the factors are uniform, we should execute the multiplication on the SALU. Register bank mapping already ensures this. However, the resulting code sequence is not optimal because it doesn't fully use the carry-in capabilities of S_ADDC_U32. (V_MAD_U64_U32 doesn't have a carry-in.) It is very difficult to fix this after the fact, so we should really use a different legalization sequence in this case. Unfortunately, we don't have a divergence analysis and so cannot make that choice. (This only matters for 128-bit integers and larger.) 2. Avoid unnecessary multiplies when sources are known to be zero- or sign-extended. The challenge is that the legalizer does not currently have access to GISelKnownBits. 3. When the G_MUL is followed by a G_ADD, we should consider combining the two instructions into a single multiply-add sequence, to utilize the accumulator of V_MAD_U64_U32 fully. (Unless the multiply has multiple uses and the implied duplication of the multiply is an overall negative). However, this is also not true when the factors are uniform: in that case, it is generally better to not combine the two operations, so that the multiply can be done on the SALU. 
Again, we don't have a divergence analysis available and so cannot make an informed choice. Differential Revision: https://reviews.llvm.org/D124844	2022-06-09 13:38:56 +02:00
Joe Nash	40f35cef89	[AMDGPU] gfx11 VOP3P instruction MC support Includes dpp versions of VOP3P instructions. Patch 18/N for upstreaming of AMDGPU gfx11 architecture Depends on D126917 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D126978	2022-06-08 13:32:01 -04:00
Joe Nash	086a9c1062	Reland [AMDGPU] gfx11 VOP1+VOP2 Instruction MC support The reverted dependent commit is now relanded, so reland this. Includes dpp instructions and vop1/vop2 promoted to vop3 Patch 17/N for upstreaming of AMDGPU gfx11 architecture Depends on D126483 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D126917	2022-06-08 11:10:57 -04:00
Joe Nash	e243ead6fc	Reland [AMDGPU] gfx11 vop3dpp instructions There was an issue with encoding wide (>64 bit) instructions on BigEndian hosts, which is fixed in D127195. Therefore reland this. gfx11 adds the ability to use dpp modifiers on vop3 instructions. This patch adds machine code layer support for that. The MCCodeEmitter is changed to use APInt instead of uint64_t to support these wider instructions. Patch 16/N for upstreaming of AMDGPU gfx11 architecture Differential Revision: https://reviews.llvm.org/D126483	2022-06-07 14:49:13 -04:00
Jay Foad	81edc831fb	[AMDGPU] Add support for the .reloc directive Differential Revision: https://reviews.llvm.org/D127117	2022-06-07 15:18:54 +01:00
Matt Arsenault	cc5a1b3dd9	llvm-reduce: Add cloning of target MachineFunctionInfo MIR support is totally unusable for AMDGPU without this, since the set of reserved registers is set from fields here. Add a clone method to MachineFunctionInfo. This is a subtle variant of the copy constructor that is required if there are any MIR constructs that use pointers. Specifically, at minimum fields that reference MachineBasicBlocks or the MachineFunction need to be adjusted to the values in the new function.	2022-06-07 10:14:48 -04:00
Matt Arsenault	cfe5168499	AMDGPU: Make PSV instances static members	2022-06-07 10:14:48 -04:00
Guillaume Chatelet	0788186182	[Alignment][NFC] Remove usage of MemSDNode::getAlignment I can't remove the function just yet as it is used in the generated .inc files. I would also like to provide a way to compare alignment with TypeSize since it came up a few times. Differential Revision: https://reviews.llvm.org/D126910	2022-06-07 13:52:20 +00:00
Fangrui Song	15d82c62dc	[MC] De-capitalize MCStreamer functions Follow-up to `c031378ce0` . The class is mostly consistent now.	2022-06-07 00:31:02 -07:00
Joe Nash	eaed07eb7e	Revert "[AMDGPU] gfx11 vop3dpp instructions" This reverts commit `99a83b1286`.	2022-06-06 17:12:09 -04:00
Joe Nash	f617f89e5b	Revert "[AMDGPU] gfx11 VOP1+VOP2 Instruction MC support" This reverts commit `6079804498`.	2022-06-06 17:11:35 -04:00
Ivan Kosarev	facbfb121a	[AMDGPU][GFX9+] Support base+soffset+offset s_atc_probe's. Resolves part of https://github.com/llvm/llvm-project/issues/38652 Reviewed By: dp Differential Revision: https://reviews.llvm.org/D126791	2022-06-06 16:46:22 +01:00
Ivan Kosarev	79ec1e8fd6	[AMDGPU][GFX9][GFX10] Support base+soffset+offset s_dcache_discard's. Resolves part of https://github.com/llvm/llvm-project/issues/38652 Reviewed By: dp Differential Revision: https://reviews.llvm.org/D126766	2022-06-06 16:32:16 +01:00
Joe Nash	6079804498	[AMDGPU] gfx11 VOP1+VOP2 Instruction MC support Includes dpp instructions and vop1/vop2 promoted to vop3 Patch 17/N for upstreaming of AMDGPU gfx11 architecture Depends on D126483 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D126917	2022-06-06 09:57:59 -04:00
Joe Nash	99a83b1286	[AMDGPU] gfx11 vop3dpp instructions gfx11 adds the ability to use dpp modifiers on vop3 instructions. This patch adds machine code layer support for that. The MCCodeEmitter is changed to use APInt instead of uint64_t to support these wider instructions. Patch 16/N for upstreaming of AMDGPU gfx11 architecture Depends on D126475 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D126483	2022-06-06 09:34:59 -04:00
Fangrui Song	77e300ffdf	[MC] Change EndOfStatement "unexpected tokens in .xxx directive " to "expected newline"	2022-06-05 15:11:01 -07:00
Fangrui Song	95a134254a	Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options	2022-06-05 01:07:51 -07:00
Kazu Hirata	e0039b8d6a	Use llvm::less_second (NFC)	2022-06-04 22:48:32 -07:00
Jacob Weightman	814a0abcce	AMDGPU: allow reordering of functions in AMDGPUResourceUsageAnalysis The AMDGPUResourceUsageAnalysis was previously a CGSCC pass, and assumed that a function's callees were always analyzed prior to their callers. When it was refactored into a module pass, this assumption no longer always holds. This results in calls being erroneously identified as indirect, and reserving private segment space for them. This results in significantly slower kernel launch latency. This patch changes the order in which the module's functions are analyzed from the order in which they occur in the module to a post-order traversal of the call graph. Perhaps Clang always generates the module's functions in such an order, but this is not the case for the Cray Fortran compiler. Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D126025	2022-06-03 15:55:54 -05:00
Matt Arsenault	dd7e407d81	AMDGPU: Move SpilledReg from MFI to SIRegisterInfo This isn't the most natural place for it, but it avoids a circular include dependency in an out of tree patch.	2022-06-02 17:11:24 -04:00
Julien Pages	2dfe419446	[AMDGPU] Improve codegen of extractelement/insertelement in some cases This patch improves the codegen of extractelement and insertelement for vectors containing 8 elements. Before, a dag combine transformation was generating a sequence of 8 select/cmp. This patch changes the upper limit for this transformation and the movrel instruction will eventually be used instead. Extractelement/insertelement for vectors containing fewer than 8 elements are unchanged. Differential Revision: https://reviews.llvm.org/D126389	2022-06-02 17:05:55 -04:00
Joe Nash	3732cd59be	[AMDGPU] gfx11 vop3 and inherited vop instructions This patch includes MC layer support for VOP3 encoded instructions and generic VOP support classes. Some VOP1 and VOP2 instructions which share an encoding with gfx10 and are using the AssemblerPredicate = isGFX10Plus are also enabled. That predicate will be changed to isGFX10Only in a later patch. Patch 15/N for upstreaming of AMDGPU gfx11 architecture. Depends on D126468 Reviewed By: dp Differential Revision: https://reviews.llvm.org/D126475	2022-06-02 14:03:02 -04:00
Joe Nash	e4870c8357	[AMDGPU] gfx11 ds instructions MC layer support for ds instructions Contributors: Piotr Sobczak <Piotr.Sobczak@amd.com> Patch 14/N for upstreaming of AMDGPU gfx11 architecture. Depends on D126463 Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D126468	2022-06-02 13:36:56 -04:00
Matt Arsenault	89b1808a2f	AMDGPU: Fix missing c++ mode comment	2022-06-01 21:14:48 -04:00
Stanislav Mekhanoshin	c9e242f6dd	[AMDGPU] Change GISel error handling for TFE on GFX90A Differential Revision: https://reviews.llvm.org/D126797	2022-06-01 11:07:25 -07:00
Scott Linder	2d43955cec	[AMDGPU][NFC] Refactor AMDGPUCallingConv.td Rename CalleeSavedRegs defs to avoid being overly specific: * CSR_AMDGPU_AGPRs_32_255 => CSR_AMDGPU_AGPRs * CSR_AMDGPU_SGPRs_30_31 + CSR_AMDGPU_SGPRs_32_105 => CSR_AMDGPU_SGPRs * CSR_AMDGPU_SI_Gfx_SGPRs_4_29 + CSR_AMDGPU_SI_Gfx_SGPRs_64_105 => CSR_AMDGPU_SI_Gfx_SGPRs * CSR_AMDGPU_HighRegs => CSR_AMDGPU * CSR_AMDGPU_HighRegs_With_AGPRs => CSR_AMDGPU_GFX90AInsts * CSR_AMDGPU_SI_Gfx_With_AGPRs => CSR_AMDGPU_SI_Gfx_GFX90AInsts Introduce a class RegMask to mark the cases where we use the CalleeSavedRegs class purely as an expedient way to produce a mask. Update the names of these masks to not mention "CSR". Other targets also seem to do this, so a reasonable alternative is to actually update table-gen to include a new class to do this explicitly, but the current approach seems harmless so I opted to just make it more explicit. Reviewed By: arsenm, sebastian-ne Differential Revision: https://reviews.llvm.org/D109008	2022-06-01 16:24:09 +00:00
Matt Arsenault	0e1c71e4a4	CodeGen: Move getAddressSpaceForPseudoSourceKind into TargetMachine Avoid the dependency on TargetInstrInfo, which depends on the subtarget and therefore the individual function. Currently AMDGPU is constructing PseudoSourceValue instances in MachineFunctionInfo. In order to facilitate copying MachineFunctionInfo, we need to stop allocating these there. Alternatively we could allow targets to subclass PseudoSourceValueManager, and allocate them similarly to MachineFunctionInfo.	2022-06-01 09:45:40 -04:00
Stanislav Mekhanoshin	dec1283279	[AMDGPU] Fix image opcodes GlobalISel on gfx90a. - Correct flavor of an instruction was not selected. - GFX90A does not support TFE. Differential Revision: https://reviews.llvm.org/D126312	2022-05-31 14:07:46 -07:00
jeff	2e61dfb124	[AMDGPU] Instruction Type Pipeline This patch implements a DAG mutation which adds edges between different groups of instructions. The purpose is to try to generate code that conforms to a pipeline (groupA instructions occur before groupB, groupB -> groupC, and so on). Currently the pipeline order is hardcoded as VMEM->DSRead->MFMA->DSWrite, but the patch was designed to be easily extensible. Alias analysis is problematic for pipelining as memory instructions will usually not be able to be reordered w.r.t one another. Differential Revision: https://reviews.llvm.org/D125997	2022-05-31 17:48:52 +00:00
Joe Nash	e8860bee28	[AMDGPU] gfx11 Image instructions MC layer support for instructions in the MIMG encoding(Image instructions). Contributors: Carl Ritson <carl.ritson@amd.com> Patch 13/N for upstreaming of AMDGPU gfx11 architecture. Depends on D125992 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D126463	2022-05-31 10:53:35 -04:00
Ivan Kosarev	f199b2b00f	[AMDGPU][NFC] Refine defining the offset field for GFX10+ SMEM instructions. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D126662	2022-05-31 09:54:51 +01:00
Ivan Kosarev	b4dbcba3b7	[AMDGPU][GFX9][NFC] Rename the base class for SMEM stores.	2022-05-30 10:31:59 +01:00
Ivan Kosarev	082822b381	[AMDGPU][GFX9] Support base+soffset+offset SMEM stores. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D126388	2022-05-30 10:27:57 +01:00
Nicolai Hähnle	5df2893a9a	AMDGPU: Add G_AMDGPU_MAD_64_32 instructions These generic instructions are trivially selected to V_MAD_[IU]64_[IU]32 instructions when run on the VALU. When at least both factors are scalar, it is usually better to execute some or all of the instruction on the SALU. To this end, we lower the instruction to simpler instructions that are supported on the SALU when applying the register bank mapping. Differential Revision: https://reviews.llvm.org/D124843	2022-05-27 12:36:17 -05:00
Ivan Kosarev	b0ccf38b01	[AMDGPU][GFX9] Support base+soffset+offset SMEM loads. Resolves part of https://github.com/llvm/llvm-project/issues/38652 Reviewed By: dp Differential Revision: https://reviews.llvm.org/D125700	2022-05-26 12:42:33 +01:00
serge-sans-paille	fb67d683db	[iwyu] Handle regressions in libLLVM header include Running iwyu-diff on LLVM codebase since `7030654296` detected a few regressions, fixing them. Differential Revision: https://reviews.llvm.org/D126417	2022-05-26 08:12:34 +02:00
Maksim Panchenko	bed9efed71	[MCDisassembler] Disambiguate Size parameter in tryAddingSymbolicOperand() MCSymbolizer::tryAddingSymbolicOperand() overloaded the Size parameter to specify either the instruction size or the operand size depending on the architecture. However, for proper symbolic disassembly on X86, we need to know both sizes, as an instruction can have two operands, and the instruction size cannot be reliably calculated based on the operand offset and its size. Hence, split Size into OpSize and InstSize. For X86, the new interface allows to fix a couple of issues: * Correctly adjust the value of PC-relative operands. * Set operand size to zero when the operand is specified implicitly. Differential Revision: https://reviews.llvm.org/D126101	2022-05-25 13:44:32 -07:00
Joe Nash	835e09c4c3	[AMDGPU] gfx11 FLAT Instructions MachineCode Support for FLAT type instructions Contributors: Sebastian Neubauer <sebastian.neubauer@amd.com> Patch 12/N for upstreaming of AMDGPU gfx11 architecture. Depends on D125989 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D125992	2022-05-25 15:29:39 -04:00
Joe Nash	ef1ea5ac01	[AMDGPU] gfx11 vinterp instructions MC support A new instruction encoding. Some of these instructions were previously VOP3 encoded. Contributors: Carl Ritson <carl.ritson@amd.com> Patch 11/N for upstreaming of AMDGPU gfx11 architecture. Depends on D125824 Reviewed By: critson Differential Revision: https://reviews.llvm.org/D125989	2022-05-25 14:59:16 -04:00

7162 Commits