llvm-project

Commit Graph

Author	SHA1	Message	Date
Joe Nash	dc850fbf3b	[AMDGPU] NFC. Assert that mask is full with VOPC DPP VOPC DPP should not be formed when the row_mask and bank_mask are not 0xf (full) because the resulting VOP DPP would have different semantics than the MOV DPP followed by VOP. Existing checks in GCNDPPCombine cover this case but for different reasons, so assert the property for future-proofing. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D130101	2022-07-20 13:23:03 -04:00
Kazu Hirata	0387da6f4f	Use value instead of getValue (NFC)	2022-07-19 21:18:26 -07:00
Kazu Hirata	41ae78ea3a	Use has_value instead of hasValue (NFC)	2022-07-19 20:15:44 -07:00
Johannes Doerfert	bf789b1957	[Attributor] Replace AAValueSimplify with AAPotentialValues For the longest time we used `AAValueSimplify` and `genericValueTraversal` to determine "potential values". This was problematic for many reasons: - We recomputed the result a lot as there was no caching for the 9 locations calling `genericValueTraversal`. - We added the idea of "intra" vs. "inter" procedural simplification only as an afterthought. `genericValueTraversal` did offer an option but `AAValueSimplify` did not. Thus, we might end up with "too much" simplification in certain situations and then gave up on it. - Because `genericValueTraversal` was not a real `AA` we ended up with problems like the infinite recursion bug (#54981) as well as code duplication. This patch introduces `AAPotentialValues` and replaces the `AAValueSimplify` uses with it. `genericValueTraversal` is folded into `AAPotentialValues` as are the instruction simplifications performed in `AAValueSimplify` before. We further distinguish "intra" and "inter" procedural simplification now. `AAValueSimplify` was not deleted as we haven't ported the re-materialization of instructions yet. There are other differences over the former handling, e.g., we may not fold trivially foldable instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2` but if an operand would be simplified to `i32 1` we would fold it still. We are also even more aware of function/SCC boundaries in CGSCC passes, which is good even if some tests look like they regress. Fixes: https://github.com/llvm/llvm-project/issues/54981 Note: A previous version was flawed and consequently reverted in `6555558a80`.	2022-07-19 16:24:42 -05:00
Jon Chesterfield	3a20597776	[amdgpu] Implement lds kernel id intrinsic Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it. There are a number of implicit arguments accessed by intrinsic already so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue. It is necessary in the general case to put variables at different addresses such that they can be compactly allocated and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which an integer can be written by the kernel, in this implementation based on metadata associated with that kernel, which is then passed on to indirect call sites is sufficient to determine the variable address. The intent is to emit a __const array of LDS addresses and index into it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D125060	2022-07-19 17:46:19 +01:00
Jon Chesterfield	2224bbcd74	[nfc][amdgpu] LDS. Move selection logic up the stack.	2022-07-19 17:20:19 +01:00
Joe Nash	b28bb8cc9c	[AMDGPU] Remove old operand from VOPC DPP For most DPP instructions, the old operand stores the value that was in the current lane before the DPP operation, and is tied to the destination. For VOPC DPP, this is unnecessary and incorrect. There appears to have been a latent bug related to D122737 with SIInstrInfo::isOperandLegal. If you checked if a register operand was legal when the InstructionDesc expected an immediate, it reported that it is valid. Its fix is necessary for and tested in this patch. Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D130040	2022-07-19 09:35:05 -04:00
Abinav Puthan Purayil	9fa425c1ab	[AMDGPU] Set amdgpu-memory-bound if a basic block has dense global memory access AMDGPUPerfHintAnalysis doesn't set the memory bound attribute if FuncInfo::InstCost outweighs MemInstCost even if we have a basic block with relatively high global memory access. GCNSchedStrategy could revert optimal scheduling in favour of occupancy which seems to degrade performance for some kernels. This change introduces the HasDenseGlobalMemAcc metric in the heuristic that makes the analysis more conservative in these cases. This fixes SWDEV-334259/SWDEV-343932 Differential Revision: https://reviews.llvm.org/D129759	2022-07-19 15:16:28 +05:30
Matt Arsenault	8d0383eb69	CodeGen: Remove AliasAnalysis from regalloc This was stored in LiveIntervals, but not actually used for anything related to LiveIntervals. It was only used in one check for if a load instruction is rematerializable. I also don't think this was entirely correct, since it was implicitly assuming constant loads are also dereferenceable. Remove this and rely only on the invariant+dereferenceable flags in the memory operand. Set the flag based on the AA query upfront. This should have the same net benefit, but has the possible disadvantage of making this AA query nonlazy. Preserve the behavior of assuming pointsToConstantMemory implying dereferenceable for now, but maybe this should be changed.	2022-07-18 17:23:41 -04:00
Stanislav Mekhanoshin	523a99c0eb	[AMDGPU] Support for gfx940 fp8 smfmac Differential Revision: https://reviews.llvm.org/D129908	2022-07-18 12:12:41 -07:00
Stanislav Mekhanoshin	2695f0a688	[AMDGPU] Support for gfx940 fp8 mfma Differential Revision: https://reviews.llvm.org/D129906	2022-07-18 11:49:56 -07:00
Stanislav Mekhanoshin	9fa5a6b7e8	[AMDGPU] Support for gfx940 fp8 conversions Differential Revision: https://reviews.llvm.org/D129902	2022-07-18 11:48:43 -07:00
Petar Avramovic	c287bc4841	[AMDGPU][MC][GFX11] AsmParser for op_sel for VOP3 dpp opcodes Parse op_sel for *_e64_dpp VOP3 opcodes. Depends on D129637 and setting of VOP3_OPSEL in dpp pseudos. Differential Revision: https://reviews.llvm.org/D129767	2022-07-18 15:08:52 +02:00
Ivan Kosarev	432cbd7827	[AMDGPU][CodeGen] Support (register + immediate) SMRD offsets. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D129381	2022-07-18 11:29:31 +01:00
Ivan Kosarev	9c66c02e2e	[AMDGPU][CodeGen] Match SMRDs with constant bases and register offsets. Saves some add instructions on a couple Rage 2 shaders and is also a prerequisite for a coming-soon change matching (register + immediate) offsets. Reviewed By: foad, arsenm Differential Revision: https://reviews.llvm.org/D129095	2022-07-18 11:18:23 +01:00
Abinav Puthan Purayil	d96361d714	[AMDGPU] Add the uses_dynamic_stack field to the kernel descriptor and the kernel metadata map This change introduces the dynamic stack boolean field to code-object-v3 and above under the code properties of the kernel descriptor and under the kernel metadata map of NT_AMDGPU_METADATA. This field corresponds to the is_dynamic_callstack field of amd_kernel_code_t. Differential Revision: https://reviews.llvm.org/D128344	2022-07-18 10:07:13 +05:30
Kazu Hirata	7094ab4ee7	[llvm] Modernize bool literals (NFC) Identified with modernize-use-bool-literals.	2022-07-17 18:08:51 -07:00
Carl Ritson	547e3cba7d	[AMDGPU] Improve liveness copying in si-optimize-exec-masking-pre-ra Further improve liveness copying for CC register post optimization by mirroring live internal splits. This fixes a bug in register allocation when CC register liveness is extended across branches instead of split. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D129557	2022-07-17 17:34:05 +09:00
Kazu Hirata	deac0ac523	[AMDGPU] Use default member initialization (NFC) Identified with modernize-use-default-member-init.	2022-07-16 12:44:35 -07:00
Kazu Hirata	6cbfffb3a3	[AMDGPU] Declare TableRef in terms of ArrayRef (NFC)	2022-07-16 10:56:20 -07:00
Jon Chesterfield	eda2bcad02	[nfc][amdgpu] Remove dead variable and function	2022-07-15 23:56:43 +01:00
Vang Thao	67357739c6	[AMDGPU] Add remarks to output some resource usage Add analysis remarks to output kernel name, register usage, occupancy, scratch usage, spills, and LDS information. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D123878	2022-07-15 11:01:53 -07:00
Dmitry Preobrazhensky	185c36de73	[AMDGPU][MC][NFC] Remove unnecessary code Differential Revision: https://reviews.llvm.org/D129766	2022-07-15 13:17:36 +03:00
Dmitry Preobrazhensky	2a6532d542	[AMDGPU][MC][GFX11] Correct disassembly of *_e64_dpp opcodes which support op_sel These opcodes cannot be disassembled because op_sel operand is missing - it must be added manually. See https://github.com/llvm/llvm-project/issues/56512 for detailed issue analysis. Differential Revision: https://reviews.llvm.org/D129637	2022-07-15 13:11:59 +03:00
jeff	8a12f20ef7	[AMDGPU] Update the mechanism used to check for cycles and add edges in power-sched mutation	2022-07-14 16:24:13 -07:00
Alexander Timofeev	2e29b0138c	[AMDGPU] Lowering VGPR to SGPR copies to v_readfirstlane_b32 if profitable. Since the divergence-driven instruction selection has been enabled for AMDGPU, all the uniform instructions are expected to be selected to SALU form, except those not having one. VGPR to SGPR copies appear in MIR to connect values producers and consumers. This change implements an algorithm that evolves a reasonable tradeoff between the profit achieved from keeping the uniform instructions in SALU form and overhead introduced by the data transfer between the VGPRs and SGPRs. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D128252	2022-07-14 23:59:02 +02:00
Jay Foad	e45aa230ad	[AMDGPU] Update LiveVariables after killing an immediate def D114999 added code to kill an immediate def if it was folded into its only use by convertToThreeAddress. This patch updates LiveVariables when that happens in order to fix verification failures exposed by D129213. Differential Revision: https://reviews.llvm.org/D129661	2022-07-14 10:49:41 +01:00
David Green	3e0bf1c7a9	[CodeGen] Move instruction predicate verification to emitInstruction D25618 added a method to verify the instruction predicates for an emitted instruction, through verifyInstructionPredicates added into <Target>MCCodeEmitter::encodeInstruction. This is a very useful idea, but the implementation inside MCCodeEmitter made it only fire for object files, not assembly which most of the llvm test suite uses. This patch moves the code into the <Target>_MC::verifyInstructionPredicates method, inside the InstrInfo. This allows it to be called from other places, such as in this patch where it is called from the <Target>AsmPrinter::emitInstruction methods which should trigger for both assembly and object files. It can also be called from other places such as verifyInstruction, but that is not done here (it tends to catch errors earlier, but in reality just shows all the mir tests that have incorrect feature predicates). The interface was also simplified slightly, moving computeAvailableFeatures into the function so that it does not need to be called externally. The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently show errors in the test-suite, so have been disabled with FIXME comments. Recommitted with some fixes for the leftover MCII variables in release builds. Differential Revision: https://reviews.llvm.org/D129506	2022-07-14 09:33:28 +01:00
Jannik Silvanus	e5c4cde451	[AMDGPU] SIMachineScheduler: Add support for several MachineScheduler features The SI machine scheduler inherits from ScheduleDAGMI. This patch adds support for a few features that are implemented in ScheduleDAGMI (or its base classes) that were missing so far because their support is implemented in overridden functions. * Support cl::opt -view-misched-dags This option allows to open a graphical window of the scheduling DAG. * Support cl::opt -misched-print-dags This option allows to print the scheduling DAG in text form. * After constructing the scheduling DAG, call postprocessDAG() to apply any registered DAG mutations. Note that currently there are no mutations defined in AMDGPUTargetMachine.cpp in case SIScheduler is used. Still add this to avoid surprises in the future in case mutations are added. Differential Revision: https://reviews.llvm.org/D128808	2022-07-14 09:45:31 +02:00
Kazu Hirata	611ffcf4e4	[llvm] Use value instead of getValue (NFC)	2022-07-13 23:11:56 -07:00
David Green	95252133e1	Revert "Move instruction predicate verification to emitInstruction" This reverts commit `e2fb8c0f4b` as it does not build for Release builds, and some buildbots are giving more warning than I saw locally. Reverting to fix those issues.	2022-07-13 13:28:11 +01:00
David Green	e2fb8c0f4b	Move instruction predicate verification to emitInstruction D25618 added a method to verify the instruction predicates for an emitted instruction, through verifyInstructionPredicates added into <Target>MCCodeEmitter::encodeInstruction. This is a very useful idea, but the implementation inside MCCodeEmitter made it only fire for object files, not assembly which most of the llvm test suite uses. This patch moves the code into the <Target>_MC::verifyInstructionPredicates method, inside the InstrInfo. This allows it to be called from other places, such as in this patch where it is called from the <Target>AsmPrinter::emitInstruction methods which should trigger for both assembly and object files. It can also be called from other places such as verifyInstruction, but that is not done here (it tends to catch errors earlier, but in reality just shows all the mir tests that have incorrect feature predicates). The interface was also simplified slightly, moving computeAvailableFeatures into the function so that it does not need to be called externally. The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently show errors in the test-suite, so have been disabled with FIXME comments. Differential Revision: https://reviews.llvm.org/D129506	2022-07-13 12:53:32 +01:00
Jay Foad	5d41fe0768	[AMDGPU] SILowerControlFlow uses LiveIntervals The availability of LiveIntervals affects kill flags in the output, so declare the use to avoid strange effects where the output of this pass is different depending on what other passes are scheduled after it. Differential Revision: https://reviews.llvm.org/D129555	2022-07-12 16:53:53 +01:00
Piotr Sobczak	2bd8e74b94	[AMDGPU] Fix bitcast v4i64/v16i16 Fix a regression introduced in D128865. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D129375	2022-07-11 22:27:52 +02:00
NAKAMURA Takumi	393e12bddd	R600ISelLowering.h: Silence a warning. [-Warray-parameter] FIXME: Could it be rewritten with llvm::ArrayRef ?	2022-07-10 18:29:55 +09:00
David Blaikie	9008d0a38e	Fix -Warray-parameter warning Remove the bound in the definition, since it's not guaranteed/could provide a false sense of security (I'd be inclined to go further and change this to a pointer parameter, since that's what it really is - but figured I'd preserve some of the author's intent here)	2022-07-09 17:04:01 +00:00
serge-sans-paille	e1272ab6ec	[AMDGPU][NFC] Harmonize decl&def of R600TargetLowering::OptimizeSwizzle The freshly baked -Warray-parameter warning discovered an inconsistency in argument declaration, use the stricter one. This fixes build issues like https://lab.llvm.org/buildbot#builders/18/builds/5305	2022-07-09 09:07:31 +02:00
Abinav Puthan Purayil	17a81ecf85	[AMDGPU] Use the HasNoUse predicate for no-ret atomic op selection This change replaces the C++ predicates with the HasNoUse builtin predicate that would enable the no-ret atomic op selection in GlobalISel. Differential Revision: https://reviews.llvm.org/D125213	2022-07-08 09:47:33 +05:30
Abinav Puthan Purayil	7504c7a877	[AMDGPU] Use AddedComplexity for ret and noret atomic ops selection This patch removes the predicate for return atomic ops and uses AddedComplexity to distinguish its selection from its no return variant. This will produce better matchers that don't unnecessarily check for the negated predicate if the initial predicate failed. Also, it simplifies the enabling of no return atomic ops selection in GlobalISel. Differential Revision: https://reviews.llvm.org/D128241	2022-07-08 09:47:33 +05:30
Austin Kerbow	6817031d0b	[AMDGPU] Disable FillMFMAShadowMutation by default Disable amdgpu mfma power sched. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D129172	2022-07-07 09:34:45 -07:00
Shilei Tian	1023ddaf77	[LLVM] Add the support for fmax and fmin in atomicrmw instruction This patch adds the support for `fmax` and `fmin` operations in `atomicrmw` instruction. For now (at least in this patch), the instruction will be expanded to CAS loop. There are already a couple of targets supporting the feature. I'll create another patch(es) to enable them accordingly. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D127041	2022-07-06 10:57:53 -04:00
Thomas Symalla	86bd7e2065	[NFC][AMDGPU] Cleanup the SIOptimizeExecMasking pass. This patch removes a bit of code duplication and moves the v_cmpx optimization out of the runOnMachineFunction pass. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D129086	2022-07-06 11:03:03 +02:00
Carl Ritson	8bc5e7ac51	[AMDGPU] Additional liveness tests for si-optimize-exec-masking-pre-ra Merge tests and fixes from D128110 and D128315 on top of already committed D128800. Original author: arsenm Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D128882	2022-07-06 15:05:32 +09:00
Jay Foad	4dbc2876cf	[AMDGPU] GFX11 trivial NFC tweaks A few miscellaneous comment, whitespace and indentation tweaks.	2022-07-05 17:20:17 +01:00
Jay Foad	12fd00ee17	[AMDGPU] Add patterns for GFX11 v_minmax and v_maxmin instructions Differential Revision: https://reviews.llvm.org/D128445	2022-07-05 16:07:47 +01:00
Joe Nash	0483c91eee	[AMDGPU] gfx11 CodeGen for new DPP instructions Modifies the GCNDPPCombine pass to enable DPP formation for the new DPP instruction in gfx11, namely VOP3 encoded instructions with DPP and VOPC with DPP. Depends on D128656 Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D128682	2022-07-05 10:17:59 -04:00
Joe Nash	d1af09ad96	[AMDGPU] gfx11 Generate VOPD Instructions We form VOPD instructions in the GCNCreateVOPD pass by combining back-to-back component instructions. There are strict register constraints for creating a legal VOPD, namely that the matching operands (e.g. src0x and src0y, src1x and src1y) must be in different register banks. We add a PostRA scheduler mutation to put possible VOPD components back-to-back. Depends on D128442, D128270 Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D128656	2022-07-05 09:18:19 -04:00
Ivan Kosarev	4696a33dfa	[AMDGPU][NFC] Refine matching SMRD offsets. Tell the matcher what we are looking for instead of matching everything and then discarding the result if it doesn't fit. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D128171	2022-07-05 14:07:22 +01:00
Ivan Kosarev	8cd79bc12c	[AMDGPU][GlobalISel] Support register offsets for SMRDs. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D128836	2022-07-05 13:41:06 +01:00
Thomas Symalla	04c5fed5e0	[NFC] Fix wrong comment.	2022-07-05 13:37:44 +02:00
Nikita Popov	8e70258b18	[AMDGPUCodeGenPrepare] Check result of ConstantFoldBinaryOpOperands() This function will become fallible once we don't support constant expressions for all binops, so make sure to check the result.	2022-07-04 14:20:23 +02:00
Mirko Brkusanin	2208342c9b	[AMDGPU][GlobalISel] Always use VGPR bank for G_FCMP Differential Revision: https://reviews.llvm.org/D128980	2022-07-01 15:03:37 +02:00
Piotr Sobczak	b6ef36a1c4	[AMDGPU] Update WMMA intrinsics with explicit f16 types Update intrinsics to use n x f16 and n x i16 instead of 32-bit types. This may avoid the need for a bitcast and is probably less confusing. Depends on making v16f16 and v16i16 types legal. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D128951	2022-07-01 08:55:25 +02:00
Piotr Sobczak	bd675af2a2	[AMDGPU] Make v16i16/v16f16 legal There are upcoming intrinsics to use the new types. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D128865	2022-06-30 23:08:40 +02:00
Jay Foad	0f94d2b385	[AMDGPU] GFX11: automatically release VGPRs at the end of the shader GFX11 has a new message type MSG_DEALLOC_VGPRS which can be used to release a shader's VGPRs. Sending this at the end of a shader (just before the s_endpgm) can help overall system performance in cases where the s_endpgm would have to wait for outstanding VMEM stores to complete before releasing the VGPRs. Differential Revision: https://reviews.llvm.org/D128442	2022-06-30 20:55:14 +01:00
Piotr Sobczak	4874838a63	[AMDGPU] gfx11 WMMA instruction support gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate) instructions. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D128756	2022-06-30 11:13:45 -04:00
Carl Ritson	d0f6641615	[AMDGPU] Fix liveness for loops in si-optimize-exec-masking-pre-ra Follow up to D127894, new liveness update code needs to handle the case where S_ANDN2 input must be extended through loops when V_CNDMASK_B32 has been hoisted. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D128800	2022-06-30 15:26:50 +09:00
Jay Foad	cfb7ffdec0	[AMDGPU] New AMDGPUInsertDelayAlu pass Differential Revision: https://reviews.llvm.org/D128270	2022-06-29 21:30:20 +01:00
Matt Arsenault	0bdaef38c9	AMDGPU: Add gfx11 feature to force initializing 16 input SGPRs The total user+system SGPR count needs to be padded out to 16 if fewer inputs are enabled.	2022-06-29 14:52:19 -04:00
Matt Arsenault	ffd6aaf5b6	AMDGPU: Make packed 32-bit instructions rematerializable	2022-06-29 11:57:54 -04:00
Matt Arsenault	4c400dc103	AMDGPU: Make 16-bit pk instructions rematerializable	2022-06-29 11:57:53 -04:00
Matt Arsenault	da6d7728d4	AMDGPU: Mark more instructions as rematerializable D106023 excluded 16-bit instructions from rematerialization, with the justification that we can't rematerialize instructions that preserve the high bits (plus the instructions which do are a confusing mess between different subtargets). This doesn't make sense to me as a problem since cases where we would rely on the high bit behavior would still need to be represented as a register value constraint with a tied operand. It's not a hidden side effect and should still be rematerializable.	2022-06-29 11:19:15 -04:00
Matt Arsenault	d342d130da	AMDGPU: Use isMeta flags on pseudoinstructions	2022-06-29 10:31:29 -04:00
Stanislav Mekhanoshin	21895c6b50	[AMDGPU] Relax verification of soffset in scalar stores It must use m0 only on GFX8. Later chips can use any SGPR. Differential Revision: https://reviews.llvm.org/D128765	2022-06-28 16:10:08 -07:00
Jay Foad	3fbc945c3a	[AMDGPU] llvm.amdgcn.exp.compr is not supported on GFX11 Differential Revision: https://reviews.llvm.org/D128259	2022-06-28 14:48:25 +01:00
Joe Nash	f1cfaa956d	[AMDGPU] Use GFX11 S_PACK_HL instruction in more cases Differential Revision: https://reviews.llvm.org/D128527	2022-06-28 14:35:19 +01:00
Jay Foad	b5818e4eb4	[AMDGPU] Cluster stores as well as loads for GFX11 Differential Revision: https://reviews.llvm.org/D128517	2022-06-27 16:41:41 +01:00
Jay Foad	77e63b25f9	[AMDGPU] Fix assertion failure on mad with negative immediate addend Without this, the new test case would fail with: AMDGPUInstPrinter.cpp:545: void llvm::AMDGPUInstPrinter::printImmediate64(uint64_t, const llvm::MCSubtargetInfo &, llvm::raw_ostream &): Assertion `isUInt<32>(Imm) || Imm == 0x3fc45f306dc9c882' failed. Differential Revision: https://reviews.llvm.org/D128435	2022-06-27 09:49:20 +01:00
Kazu Hirata	a7938c74f1	[llvm] Don't use Optional::hasValue (NFC) This patch replaces Optional::hasValue with the implicit cast to bool in conditionals only.	2022-06-25 21:42:52 -07:00
Kazu Hirata	3b7c3a654c	Revert "Don't use Optional::hasValue (NFC)" This reverts commit `aa8feeefd3`.	2022-06-25 11:56:50 -07:00
Kazu Hirata	aa8feeefd3	Don't use Optional::hasValue (NFC)	2022-06-25 11:55:57 -07:00
Min-Yih Hsu	97579dcc6d	[MCA] Introducing incremental SourceMgr and resumable pipeline The new resumable mca::Pipeline capability introduced in this patch allows users to save the current state of pipeline and resume from the very checkpoint. It is better (but not required) to use with the new IncrementalSourceMgr, where users can add mca::Instruction incrementally rather than having a fixed number of instructions ahead-of-time. Note that we're using unit tests to test these new features, because integrating them into the `llvm-mca` tool would cause too much churn. Differential Revision: https://reviews.llvm.org/D127083	2022-06-24 15:39:51 -07:00
Joe Nash	07b7fada73	[AMDGPU] gfx11 VOPD instructions MC support VOPD is a new encoding for dual-issue instructions for use in wave32. This patch includes MC layer support only. A VOPD instruction is constituted of an X component (for which there are 13 possible opcodes) and a Y component (for which there are the 13 X opcodes plus 3 more). Most of the complexity in defining and parsing a VOPD operation arises from the possible different total numbers of operands and deferred parsing of certain operands depending on the constituent X and Y opcodes. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D128218	2022-06-24 11:08:39 -04:00
Konstantin Zhuravlyov	7736ce1c56	AMDGPU: Clear kill flags when optimizing vcmp save exec sequence It was causing bad machine code for several blender scenes: * Bad machine code: Using an undefined physical register * - function: kernel_holdout_emission_blurring_pathtermination_ao - basic block: %bb.28 if.end40.i (0x7f84861a2320) - instruction: V_CMPX_EQ_U32_nosdst_e64 0, $vgpr3, implicit-def $exec, implicit $exec - operand 1: $vgpr3 Differential Revision: https://reviews.llvm.org/D127768	2022-06-24 11:30:22 -04:00
Joe Nash	ae72fee74e	[AMDGPU] gfx11 Select on Buffer Atomic FAdd Rtn type Reviewed By: #amdgpu, foad, rampitec Differential Revision: https://reviews.llvm.org/D128205	2022-06-23 11:05:32 -04:00
Baptiste Saleil	79e77a9f39	[AMDGPU] Flush the vmcnt counter in loop preheaders when necessary waitcnt vmcnt instructions are currently generated in loop bodies before using values loaded outside of the loop. In some cases, it is better to flush the vmcnt counter in a loop preheader before entering the loop body. This patch detects these cases and generates waitcnt instructions to flush the counter. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D115747	2022-06-23 10:53:21 -04:00
Rodrigo Dominguez	971fa4b196	[AMDGPU] GFX11: remove ShaderType from ds_ordered_count offset field In GFX11 ShaderType is determined by the hardware and should no longer be written into bits[3:2] of the ds_ordered_count offset field. Differential Revision: https://reviews.llvm.org/D128196	2022-06-23 14:20:33 +01:00
Ruiling Song	49b8ca3f7c	AMDGPU: Don't crash on global_ctor/dtor declaration Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D128320	2022-06-23 21:04:54 +08:00
Dmitry Preobrazhensky	dcb24f93af	[AMDGPU][MC][GFX11] Correct disassembly of VOP3.DPP8 opcodes Fix bug #56163. Add W32/W64 tests for all VOP3.DPP opcodes. Differential Revision: https://reviews.llvm.org/D128369	2022-06-23 13:07:45 +03:00
Matt Arsenault	b03d902b61	AMDGPU: Fix invalid liveness after si-optimize-exec-masking-pre-ra This was leaving behind a use at the deleted instruction which the verifier would fail during allocation.	2022-06-22 20:49:03 -04:00
serge-sans-paille	27fd01d3f8	[iwyu] Handle regressions in libLLVM header include Running iwyu-diff on LLVM codebase since `fb67d683db` detected a few regressions, fixing them. The impact on preprocessed output is negligible: -4k lines.	2022-06-22 18:50:39 +02:00
Guillaume Chatelet	cef65864af	[Alignment] Use Align for MaxKernArgAlign Differential Revision: https://reviews.llvm.org/D128118	2022-06-22 13:40:37 +00:00
Ruiling Song	4dcb42fae5	AMDGPU: Skip unexpected CFG in SIOptimizeVGPRLiveRange There are some cases where we use si_if/si_else in an unnatural way. Just skip them. Fixes: https://github.com/llvm/llvm-project/issues/55922 Reviewed by: critson Differential Revision: https://reviews.llvm.org/D128193	2022-06-22 12:49:41 +08:00
Joe Nash	90254d524f	[AMDGPU] gfx11 Remove SDWA from shuffle_vector ISel gfx11 does not have SDWA Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D128208	2022-06-21 14:55:00 -04:00
Jay Foad	929a8ad2b6	[AMDGPU] Update SPI_SHADER_PGM_RSRC2_PS.EXTRA_LDS_SIZE for GFX11 The granularity of SPI_SHADER_PGM_RSRC2_PS.EXTRA_LDS_SIZE changed in GFX11. It is now in units of 256 dwords instead of 128 dwords. COMPUTE_PGM_RSRC2.LDS_SIZE is unaffected. It is still in units of 128 dwords. Differential Revision: https://reviews.llvm.org/D128179	2022-06-21 14:48:12 +01:00
Carl Ritson	62abc8c200	[AMDGPU] Set GFX11 null export target based on export attributes If shader only has depth exports use MRTZ otherwise use MRT0. Differential Revision: https://reviews.llvm.org/D128185	2022-06-21 09:40:31 +01:00
Kazu Hirata	7a47ee51a1	[llvm] Don't use Optional::getValue (NFC)	2022-06-20 22:45:45 -07:00
Kazu Hirata	064a08cd95	Don't use Optional::hasValue (NFC)	2022-06-20 20:05:16 -07:00
Ruiling Song	732eed40fd	[AMDGPU] Mark GFX11 dual source blend export as strict-wqm The instructions that generate the source of dual source blend export should run in strict-wqm. That is if any lane in a quad is active, we need to enable all four lanes of that quad to make the shuffling operation before exporting to dual source blend target work correctly. Differential Revision: https://reviews.llvm.org/D127981	2022-06-20 21:58:12 +01:00
Piotr Sobczak	29621c13ef	[AMDGPU] Tag GFX11 LDS loads as using strict_wqm LDS_PARAM_LOAD and LDS_DIRECT_LOAD use EXEC per quad (if any pixel is enabled in the quad, data is written to all 4 pixels/threads in the quad). Tag LDS_PARAM_LOAD and LDS_DIRECT_LOAD as using strict_wqm to enforce this and avoid lane clobbering issues. Note that only the instruction itself is tagged. The implicit uses of these do not need to be set WQM. This reduces unnecessary WQM calculation of M0. Differential Revision: https://reviews.llvm.org/D127977	2022-06-20 21:58:12 +01:00
Jay Foad	13107c2770	[AMDGPU] Add support for GFX11 LDSDIR hazards Detect LDS direct WAR/WAW hazards and compute values for wait_vdst (va_vdst) parameter. Where appropriate this raises wait_vdst from the default 0 to allow concurrent issue of LDS direct with VALU execution. Also detect LDS direct versus VMEM source VGPR hazards and insert vm_vsrc=0 waits using s_waitcnt_depctr. Differential Revision: https://reviews.llvm.org/D127963	2022-06-20 21:58:12 +01:00
Guillaume Chatelet	d154d0ac06	[NFC] Simplify code	2022-06-20 15:15:52 +00:00
Jay Foad	ba306216d2	[AMDGPU] Reorder cases. NFC.	2022-06-20 14:30:17 +01:00
Jay Foad	d7762a3b36	[AMDGPU] Increase instruction cache line size to 128 bytes for GFX11 Differential Revision: https://reviews.llvm.org/D128189	2022-06-20 14:25:10 +01:00
Jay Foad	b8e32e808d	[AMDGPU] Remove a duplicate atomic fadd pattern This was left over after D124538.	2022-06-20 14:08:57 +01:00
Dmitry Preobrazhensky	485e8b4f63	[AMDGPU][MC][GFX11] Correct disassembly of DPP variants of VOPC64 opcodes Fix bugs https://github.com/llvm/llvm-project/issues/56091, https://github.com/llvm/llvm-project/issues/56065. Differential Revision: https://reviews.llvm.org/D128075	2022-06-20 14:23:07 +03:00
Mirko Brkusanin	6cae753bf4	[AMDGPU][GlobalISel] Legalize G_FSUB for s16 Differential Revision: https://reviews.llvm.org/D128066	2022-06-20 12:25:49 +02:00
Guillaume Chatelet	f1255186c7	[NFC][Alignment] Remove max functions between Align and MaybeAlign `llvm::max(Align, MaybeAlign)` and `llvm::max(MaybeAlign, Align)` are not used often enough to be required. They also make the code more opaque. Differential Revision: https://reviews.llvm.org/D128121	2022-06-20 08:37:48 +00:00
Jay Foad	7050d5b98c	[AMDGPU] Limit GFX11 to using 128 VGPRs This is a temporary measure to avoid generating incorrect code until the compiler understands the new way that GFX11 encodes 16-bit operands in VOP instructions. Differential Revision: https://reviews.llvm.org/D128054	2022-06-20 07:58:27 +01:00
Kazu Hirata	129b531c9c	[llvm] Use value_or instead of getValueOr (NFC)	2022-06-18 23:07:11 -07:00
Kazu Hirata	437f960062	[llvm] Call *set::insert without checking membership first (NFC)	2022-06-18 10:22:05 -07:00
Kazu Hirata	4271a1ff33	[llvm] Call *set::insert without checking membership first (NFC)	2022-06-18 10:17:22 -07:00
Joe Nash	2a68364745	[AMDGPU] gfx11 waitcnt support for VINTERP and LDSDIR instructions Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D127781	2022-06-17 09:30:37 -04:00
Joe Nash	20d20156f4	[AMDGPU] gfx11 VINTERP intrinsics and ISel support Depends on D127664 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D127756	2022-06-17 09:16:59 -04:00
Joe Nash	6d5d8b1313	[AMDGPU] gfx11 ldsdir intrinsics and ISel Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D127664	2022-06-17 09:03:16 -04:00
LiaoChunyu	6181c19283	[AMDGPU][NFC] Remove isConstantAddr fix isConstantAddr defined but not used Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D127959	2022-06-17 08:49:29 +08:00
Joe Nash	2d43de13df	[AMDGPU] gfx11 new dot instruction codegen support Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D127904	2022-06-16 14:19:34 -04:00
Jay Foad	7e681ef35e	[AMDGPU] Add GFX11 codegen for llvm.amdgcn.mov.dpp8 Differential Revision: https://reviews.llvm.org/D127980	2022-06-16 19:44:28 +01:00
Jay Foad	36ec1fcaac	[AMDGPU] Add GFX11 llvm.amdgcn.ds.add.gs.reg.rtn / llvm.amdgcn.ds.sub.gs.reg.rtn intrinsics Differential Revision: https://reviews.llvm.org/D127955	2022-06-16 18:23:14 +01:00
Jay Foad	c155a944fb	[AMDGPU] GFX11 CodeGen support for MIMG instructions This includes: - New llvm.amdgcn.image.msaa.load.* intrinsics - NSA changes, because MIMG-NSA is now limited to 3 dwords - Split CD forms of IMAGE_SAMPLE instructions out into separate test files since they are no longer supported in GFX11 Differential Revision: https://reviews.llvm.org/D127837	2022-06-16 18:23:14 +01:00
Jay Foad	445a483b41	[AMDGPU] Add new GFX11 intrinsic llvm.amdgcn.exp.row Differential Revision: https://reviews.llvm.org/D127671	2022-06-16 18:23:14 +01:00
Dmitry Preobrazhensky	b26afab9d1	[AMDGPU][MC][GFX11] Correct src0 for dpp variants of v_cvt_*_e64 Differential Revision: https://reviews.llvm.org/D127847	2022-06-16 13:48:43 +03:00
David Stuttard	77851cc1cf	[AMDGPU] Change use null for dead sdst to be gfx1030+ Pre gfx1030 null for sdst is different. `c97436f8b6` [AMDGPU] Use null for dead sdst operand - requires a change to make it not apply to pre gfx1030 Differential Revision: https://reviews.llvm.org/D127869	2022-06-16 10:39:06 +01:00
Jay Foad	9dff14be9e	[AMDGPU] Add support for GFX11 hazards Add support for partial stall over EXEC hazard and trans use hazard. Differential Revision: https://reviews.llvm.org/D127872	2022-06-16 08:15:21 +01:00
Austin Kerbow	4bba82116a	[AMDGPU] Fix buildbot failures after `48ebc1af29` Some buildbots (lto, windows) were failing due to some function reference variables being improperly initialized.	2022-06-15 00:23:30 -07:00
Austin Kerbow	48ebc1af29	[AMDGPU] Add more expressive sched_barrier controls The sched_barrier builtin allows the scheduler's behavior to be shaped by users when very specific codegen is needed in order to create highly optimized code. This patch adds more granular control over the types of instructions that are allowed to be reordered with respect to one or multiple sched_barriers. A mask is used to specify groups of instructions that should be allowed to be scheduled around a sched_barrier. Details about how this mask may be used can be found in llvm/include/llvm/IR/IntrinsicsAMDGPU.td. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D127123	2022-06-14 22:03:05 -07:00
Austin Kerbow	bd9eed3aec	[AMDGPU] Add isMFMA helper function. NFC Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D127124	2022-06-14 22:01:49 -07:00
Joe Nash	989bd57f98	[AMDGPU] gfx11 support add_f16 The instruction was skipped in the earlier large patch adding VOP2, https://reviews.llvm.org/D126917. Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D127697	2022-06-14 08:59:45 -04:00
Dmitry Preobrazhensky	365d827f65	[AMDGPU][MC][GFX11] Correct ds_swizzle_b32 Enable offset parsing. Differential Revision: https://reviews.llvm.org/D127404	2022-06-14 12:58:03 +03:00
Stanislav Mekhanoshin	c97436f8b6	[AMDGPU] Use null for dead sdst operand Differential Revision: https://reviews.llvm.org/D127542	2022-06-13 14:41:40 -07:00
Stanislav Mekhanoshin	cb9ae93712	[AMDGPU] Define SGPR_NULL64 register. NFCI. On gfx10+ null register can be used as both 32 and 64 bit operand. Define a 64 bit version of the register to use during codegen. Differential Revision: https://reviews.llvm.org/D127527	2022-06-13 13:23:33 -07:00
Jay Foad	bfcfd53b92	[AMDGPU] Add GFX11 llvm.amdgcn.permlane64 intrinsic Compared to permlane16, permlane64 has no BC input because it has no boundary conditions, no fi input because the instruction acts as if FI were always enabled, and no OLD input because it always writes to every active lane. Also use the new intrinsic in the atomic optimizer pass. Differential Revision: https://reviews.llvm.org/D127662	2022-06-13 21:12:11 +01:00
Jay Foad	7b9f620e78	[AMDGPU] Work around GFX11 flat scratch SVS swizzling bug Differential Revision: https://reviews.llvm.org/D127635	2022-06-13 21:00:42 +01:00
Jay Foad	d943c51465	[AMDGPU] Fix GFX11 codegen for V_MAD_U64_U32 and V_MAD_I64_I32 GFX11 uses different pseudos for these because of a new constraint on which operands' registers can overlap. Differential Revision: https://reviews.llvm.org/D127659	2022-06-13 20:59:18 +01:00
Stanislav Mekhanoshin	0f81830632	[AMDGPU] Make temp vgpr selection stable in indirectCopyToAGPR This uses a rotating remainder of division by 3 to select a different temp vgpr each time in a sequence of several agpr copies. Therefore, temp vgpr selection depends on the generated agpr number. This number could change with any unrelated change to the register definitions. Stabilize the selection by using a real agpr number. Differential Revision: https://reviews.llvm.org/D127524	2022-06-13 09:39:46 -07:00
Fangrui Song	adf4142f76	[MC] De-capitalize SwitchSection. NFC Add SwitchSection to return switchSection. The API will be removed soon.	2022-06-10 22:50:55 -07:00
Jay Foad	ff85d61a6e	Update *_TMPRING_SIZE.WAVESIZE for GFX11 The encoding of COMPUTE_TMPRING_SIZE.WAVESIZE and SPI_TMPRING_SIZE.WAVESIZE has changed in GFX11: it is now in units of 64 dwords instead of 256 dwords, and the field has been widened from 13 bits to 15 bits. Depends on D126989 Reviewed By: rampitec, arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D127248	2022-06-10 13:24:00 -04:00
Joe Nash	ea3c9a87d3	[AMDGPU] gfx11 add bits to COMPUTE_PGM_RSRC3 Contributors: Konstantin Zhuravlyov <kzhuravl_dev@outlook.com> Patch 21/N for upstreaming of AMDGPU gfx11 architecture Depends on D127143 Reviewed By: rampitec, #amdgpu, kzhuravl Differential Revision: https://reviews.llvm.org/D127241	2022-06-10 13:07:14 -04:00
Joe Nash	78d8fdb88b	[AMDGPU] NFC. Comment change to GFX10+ in AsmParser	2022-06-10 12:34:07 -04:00
Joe Nash	9175ab7746	[AMDGPU] gfx11 SRC_POPS_EXISTING_WAVE_ID is removed	2022-06-10 12:32:22 -04:00
Joe Nash	fd3304ef85	[AMDGPU] gfx11 EXECZ and VCCZ are no longer allowed to be used as sources to SALU and VALU instructions. Contributors: Baptiste Saleil <baptiste.saleil@amd.com> Patch 20/N for upstreaming of AMDGPU gfx11 architecture Depends on D126989 Reviewed By: rampitec, foad, #amdgpu Differential Revision: https://reviews.llvm.org/D127143	2022-06-10 10:03:43 -04:00
Jay Foad	4b2d70fa5b	[AMDGPU] Basic implementation of isExtractSubvectorCheap Add a basic implementation of isExtractSubvectorCheap that only considers extracts at offset 0. Differential Revision: https://reviews.llvm.org/D127385	2022-06-10 14:43:07 +01:00
Ivan Kosarev	60d6fbb621	[AMDGPU][GFX9][GFX10] Support base+soffset+offset SMEM atomics. Resolves a part of https://github.com/llvm/llvm-project/issues/38652 Reviewed By: dp Differential Revision: https://reviews.llvm.org/D127314	2022-06-10 13:22:41 +01:00
Dmitry Preobrazhensky	f8aba9995a	[AMDGPU][MC][GFX1013] Enable image_msaa_load Differential Revision: https://reviews.llvm.org/D127198	2022-06-10 13:42:05 +03:00
Jay Foad	6c372daa84	[AMDGPU] New GFX11 intrinsic llvm.amdgcn.s.sendmsg.rtn Add new intrinsic and codegen support for the s_sendmsg_rtn_b32 and s_sendmsg_rtn_b64 instructions. Differential Revision: https://reviews.llvm.org/D127315	2022-06-10 08:15:23 +01:00
Jay Foad	b0a3849439	[AMDGPU] Update dlc usage for GFX11 In GFX10 dlc controlled L1 cache bypass. In GFX11 it has been repurposed to control MALL NOALLOC, and glc controls L1 as well as L0 cache bypass. Update the documentation and SIMemoryLegalizer accordingly. Set dlc for nontemporal and volatile accesses. Differential Revision: https://reviews.llvm.org/D127405	2022-06-10 08:10:34 +01:00
Jay Foad	ffe86e3bdd	[AMDGPU] Update SIInsertHardClauses for GFX11 Changes for GFX11: - Clauses may not mix instructions of different types, and there are more types. For example image instructions with and without a sampler are now different types. - The max size of a clause is explicitly documented as 63 instructions. Previously it was implicitly assumed to be 64. This is such a tiny difference that it does not seem worth making it conditional on the subtarget. - It can be beneficial to clause stores as well as loads. Differential Revision: https://reviews.llvm.org/D127391	2022-06-09 21:29:56 +01:00
Joe Nash	be1082c6d5	[AMDGPU] gfx11 VOPC instructions Supports encoding existing instructions on gfx11 and MC support for the new VOPC dpp instructions. Patch 19/N for upstreaming of AMDGPU gfx11 architecture Depends on D126978 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D126989	2022-06-09 15:22:42 -04:00
Stanislav Mekhanoshin	23db8e4b43	[AMDGPU] Use v_mad_u64_u32 for IMAD32 Nic Curtis has done the experiments to prove it is faster than a separate mul and add. Fixes: SWDEV-332806 Differential Revision: https://reviews.llvm.org/D127253	2022-06-09 11:39:49 -07:00
Stanislav Mekhanoshin	5c974d086c	[AMDGPU] Fix hazard handling of v_cmpx to permlane - VOP3 and SDWA forms of V_CMPX were not handled - Hazard only exists if the compare defines EXEC (i.e. V_CMPX) forwarded to the permlane. Differential Revision: https://reviews.llvm.org/D127344	2022-06-09 10:33:54 -07:00
Simon Moll	b8c2781ff6	[NFC] format InstructionSimplify & lowerCaseFunctionNames Clang-format InstructionSimplify and convert all "FunctionName"s to "functionName". This patch does touch a lot of files but gets done with the cleanup of InstructionSimplify in one commit. This is the alternative to the less invasive clang-format only patch: D126783 Reviewed By: spatel, rengolin Differential Revision: https://reviews.llvm.org/D126889	2022-06-09 16:10:08 +02:00
Benjamin Kramer	0abb472fff	AMDGPU/GISel: Remove unused variable. NFC.	2022-06-09 13:43:47 +02:00
Nicolai Hähnle	264d1136f9	AMDGPU/GISel: Introduce custom legalization of G_MUL The generic legalizer framework is still used to reduce the problem to scalar multiplication with the bit size a multiple of 32. Generating optimal code sequences for big integer multiplication is somewhat tricky and has a number of target-specific intricacies: - The target has V_MAD_U64_U32 instructions that multiply two 32-bit factors and add a 64-bit accumulator. Most partial products should use this instruction. - The accumulator is mapped to consecutive 32-bit GPRs, and partial- product multiply-adds can feed the accumulator into each other directly. (The register allocator's support for that is somewhat limited, but that only matters for 128-bit integers and larger.) - OTOH, on some hardware, V_MAD_U64_U32 requires the accumulator to be stored in an even-aligned pair of GPRs. To avoid excessive register copies, it makes sense to compute odd partial products separately from even partial products (where a partial product src0[j0] * src1[j1] is "odd" if j0 + j1 is odd) and add both halves together as a final step. - We can combine G_MUL+G_ADD into a single cascade of multiply-adds. - The target can keep many carry-bits in flight simultaneously, so combining carries using G_UADDE is preferable over G_ZEXT + G_ADD. - Not addressed by this patch: When the factors are sign-extended, the V_MAD_I64_I32 instruction (signed version!) can be used. It is difficult to address these points generically: 1) Finding matching pairs of G_MUL and G_UMULH to find a wide multiply is expensive. We could add a G_UMUL_LOHI generic instruction and conditionally use that in the generic legalizer, but by itself this wouldn't allow us to use the accumulation capability of V_MAD_U64_U32. One could attempt to find matching G_ADD + G_UADDE post-legalization, but this is also expensive. 2) Similarly, making sense of the legalization outcome of a wide pre-legalization G_MUL+G_ADD pair is extremely expensive. 
3) How could the generic legalizer possibly deal with the particular idiosyncrasy of "odd" vs. "even" partial products? All this points in the direction of directly emitting an ideal code sequence during legalization, but the generic legalizer should not be burdened with such overly target-specific concerns. Hence, a custom legalization. Note that the implemented approach is different from that used by SelectionDAG because narrowing of scalars works differently in general. SelectionDAG iteratively cuts wide scalars into low and high halves until a legal size is reached. By contrast, GlobalISel does the narrowing in a single shot, which should be better for compile-time and for the quality of the generated code. This patch leaves three gaps open: 1. When the factors are uniform, we should execute the multiplication on the SALU. Register bank mapping already ensures this. However, the resulting code sequence is not optimal because it doesn't fully use the carry-in capabilities of S_ADDC_U32. (V_MAD_U64_U32 doesn't have a carry-in.) It is very difficult to fix this after the fact, so we should really use a different legalization sequence in this case. Unfortunately, we don't have a divergence analysis and so cannot make that choice. (This only matters for 128-bit integers and larger.) 2. Avoid unnecessary multiplies when sources are known to be zero- or sign-extended. The challenge is that the legalizer does not currently have access to GISelKnownBits. 3. When the G_MUL is followed by a G_ADD, we should consider combining the two instructions into a single multiply-add sequence, to utilize the accumulator of V_MAD_U64_U32 fully. (Unless the multiply has multiple uses and the implied duplication of the multiply is an overall negative). However, this is also not true when the factors are uniform: in that case, it is generally better to not combine the two operations, so that the multiply can be done on the SALU. 
Again, we don't have a divergence analysis available and so cannot make an informed choice. Differential Revision: https://reviews.llvm.org/D124844	2022-06-09 13:38:56 +02:00
Joe Nash	40f35cef89	[AMDGPU] gfx11 VOP3P instruction MC support Includes dpp versions of VOP3P instructions. Patch 18/N for upstreaming of AMDGPU gfx11 architecture Depends on D126917 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D126978	2022-06-08 13:32:01 -04:00
Joe Nash	086a9c1062	Reland [AMDGPU] gfx11 VOP1+VOP2 Instruction MC support The reverted dependent commit is now relanded, so reland this. Includes dpp instructions and vop1/vop2 promoted to vop3 Patch 17/N for upstreaming of AMDGPU gfx11 architecture Depends on D126483 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D126917	2022-06-08 11:10:57 -04:00
Joe Nash	e243ead6fc	Reland [AMDGPU] gfx11 vop3dpp instructions There was an issue with encoding wide (>64 bit) instructions on BigEndian hosts, which is fixed in D127195. Therefore reland this. gfx11 adds the ability to use dpp modifiers on vop3 instructions. This patch adds machine code layer support for that. The MCCodeEmitter is changed to use APInt instead of uint64_t to support these wider instructions. Patch 16/N for upstreaming of AMDGPU gfx11 architecture Differential Revision: https://reviews.llvm.org/D126483	2022-06-07 14:49:13 -04:00
Jay Foad	81edc831fb	[AMDGPU] Add support for the .reloc directive Differential Revision: https://reviews.llvm.org/D127117	2022-06-07 15:18:54 +01:00
Matt Arsenault	cc5a1b3dd9	llvm-reduce: Add cloning of target MachineFunctionInfo MIR support is totally unusable for AMDGPU without this, since the set of reserved registers is set from fields here. Add a clone method to MachineFunctionInfo. This is a subtle variant of the copy constructor that is required if there are any MIR constructs that use pointers. Specifically, at minimum fields that reference MachineBasicBlocks or the MachineFunction need to be adjusted to the values in the new function.	2022-06-07 10:14:48 -04:00
Matt Arsenault	cfe5168499	AMDGPU: Make PSV instances static members	2022-06-07 10:14:48 -04:00
Guillaume Chatelet	0788186182	[Alignment][NFC] Remove usage of MemSDNode::getAlignment I can't remove the function just yet as it is used in the generated .inc files. I would also like to provide a way to compare alignment with TypeSize since it came up a few times. Differential Revision: https://reviews.llvm.org/D126910	2022-06-07 13:52:20 +00:00
Fangrui Song	15d82c62dc	[MC] De-capitalize MCStreamer functions Follow-up to `c031378ce0` . The class is mostly consistent now.	2022-06-07 00:31:02 -07:00
Joe Nash	eaed07eb7e	Revert "[AMDGPU] gfx11 vop3dpp instructions" This reverts commit `99a83b1286`.	2022-06-06 17:12:09 -04:00
Joe Nash	f617f89e5b	Revert "[AMDGPU] gfx11 VOP1+VOP2 Instruction MC support" This reverts commit `6079804498`.	2022-06-06 17:11:35 -04:00
Ivan Kosarev	facbfb121a	[AMDGPU][GFX9+] Support base+soffset+offset s_atc_probe's. Resolves part of https://github.com/llvm/llvm-project/issues/38652 Reviewed By: dp Differential Revision: https://reviews.llvm.org/D126791	2022-06-06 16:46:22 +01:00
Ivan Kosarev	79ec1e8fd6	[AMDGPU][GFX9][GFX10] Support base+soffset+offset s_dcache_discard's. Resolves part of https://github.com/llvm/llvm-project/issues/38652 Reviewed By: dp Differential Revision: https://reviews.llvm.org/D126766	2022-06-06 16:32:16 +01:00
Joe Nash	6079804498	[AMDGPU] gfx11 VOP1+VOP2 Instruction MC support Includes dpp instructions and vop1/vop2 promoted to vop3 Patch 17/N for upstreaming of AMDGPU gfx11 architecture Depends on D126483 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D126917	2022-06-06 09:57:59 -04:00
Joe Nash	99a83b1286	[AMDGPU] gfx11 vop3dpp instructions gfx11 adds the ability to use dpp modifiers on vop3 instructions. This patch adds machine code layer support for that. The MCCodeEmitter is changed to use APInt instead of uint64_t to support these wider instructions. Patch 16/N for upstreaming of AMDGPU gfx11 architecture Depends on D126475 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D126483	2022-06-06 09:34:59 -04:00
Fangrui Song	77e300ffdf	[MC] Change EndOfStatement "unexpected tokens in .xxx directive " to "expected newline"	2022-06-05 15:11:01 -07:00
Fangrui Song	95a134254a	Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options	2022-06-05 01:07:51 -07:00
Kazu Hirata	e0039b8d6a	Use llvm::less_second (NFC)	2022-06-04 22:48:32 -07:00
Jacob Weightman	814a0abcce	AMDGPU: allow reordering of functions in AMDGPUResourceUsageAnalysis The AMDGPUResourceUsageAnalysis was previously a CGSCC pass, and assumed that a function's callees were always analyzed prior to their callers. When it was refactored into a module pass, this assumption no longer always held. This results in calls being erroneously identified as indirect, and in private segment space being reserved for them, which in turn results in significantly slower kernel launch latency. This patch changes the order in which the module's functions are analyzed from the order in which they occur in the module to a post-order traversal of the call graph. Perhaps Clang always generates the module's functions in such an order, but this is not the case for the Cray Fortran compiler. Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D126025	2022-06-03 15:55:54 -05:00
Matt Arsenault	dd7e407d81	AMDGPU: Move SpilledReg from MFI to SIRegisterInfo This isn't the most natural place for it, but it avoids a circular include dependency in an out of tree patch.	2022-06-02 17:11:24 -04:00
Julien Pages	2dfe419446	[AMDGPU] Improve codegen of extractelement/insertelement in some cases This patch improves the codegen of extractelement and insertelement for vectors containing 8 elements. Before, a dag combine transformation was generating a sequence of 8 select/cmp. This patch changes the upper limit for this transformation and the movrel instruction will eventually be used instead. Extractelement/insertelement for vectors containing fewer than 8 elements are unchanged. Differential Revision: https://reviews.llvm.org/D126389	2022-06-02 17:05:55 -04:00
Joe Nash	3732cd59be	[AMDGPU] gfx11 vop3 and inherited vop instructions This patch includes MC layer support for VOP3 encoded instructions and generic VOP support classes. Some VOP1 and VOP2 instructions which share an encoding with gfx10 and are using the AssemblerPredicate = isGFX10Plus are also enabled. That predicate will be changed to isGFX10Only in a later patch. Patch 15/N for upstreaming of AMDGPU gfx11 architecture. Depends on D126468 Reviewed By: dp Differential Revision: https://reviews.llvm.org/D126475	2022-06-02 14:03:02 -04:00
Joe Nash	e4870c8357	[AMDGPU] gfx11 ds instructions MC layer support for ds instructions Contributors: Piotr Sobczak <Piotr.Sobczak@amd.com> Patch 14/N for upstreaming of AMDGPU gfx11 architecture. Depends on D126463 Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D126468	2022-06-02 13:36:56 -04:00
Matt Arsenault	89b1808a2f	AMDGPU: Fix missing c++ mode comment	2022-06-01 21:14:48 -04:00
Stanislav Mekhanoshin	c9e242f6dd	[AMDGPU] Change GISel error handling for TFE on GFX90A Differential Revision: https://reviews.llvm.org/D126797	2022-06-01 11:07:25 -07:00
Scott Linder	2d43955cec	[AMDGPU][NFC] Refactor AMDGPUCallingConv.td Rename CalleeSavedRegs defs to avoid being overly specific: * CSR_AMDGPU_AGPRs_32_255 => CSR_AMDGPU_AGPRs * CSR_AMDGPU_SGPRs_30_31 + CSR_AMDGPU_SGPRs_32_105 => CSR_AMDGPU_SGPRs * CSR_AMDGPU_SI_Gfx_SGPRs_4_29 + CSR_AMDGPU_SI_Gfx_SGPRs_64_105 => CSR_AMDGPU_SI_Gfx_SGPRs * CSR_AMDGPU_HighRegs => CSR_AMDGPU * CSR_AMDGPU_HighRegs_With_AGPRs => CSR_AMDGPU_GFX90AInsts * CSR_AMDGPU_SI_Gfx_With_AGPRs => CSR_AMDGPU_SI_Gfx_GFX90AInsts Introduce a class RegMask to mark the cases where we use the CalleeSavedRegs class purely as an expedient way to produce a mask. Update the names of these masks to not mention "CSR". Other targets also seem to do this, so a reasonable alternative is to actually update table-gen to include a new class to do this explicitly, but the current approach seems harmless so I opted to just make it more explicit. Reviewed By: arsenm, sebastian-ne Differential Revision: https://reviews.llvm.org/D109008	2022-06-01 16:24:09 +00:00
Matt Arsenault	0e1c71e4a4	CodeGen: Move getAddressSpaceForPseudoSourceKind into TargetMachine Avoid the dependency on TargetInstrInfo, which depends on the subtarget and therefore the individual function. Currently AMDGPU is constructing PseudoSourceValue instances in MachineFunctionInfo. In order to facilitate copying MachineFunctionInfo, we need to stop allocating these there. Alternatively we could allow targets to subclass PseudoSourceValueManager, and allocate them similarly to MachineFunctionInfo.	2022-06-01 09:45:40 -04:00
Stanislav Mekhanoshin	dec1283279	[AMDGPU] Fix image opcodes GlobalISel on gfx90a. - Correct flavor of an instruction was not selected. - GFX90A does not support TFE. Differential Revision: https://reviews.llvm.org/D126312	2022-05-31 14:07:46 -07:00
jeff	2e61dfb124	[AMDGPU] Instruction Type Pipeline This patch implements a DAG mutation which adds edges between different groups of instructions. The purpose is to try to generate code that conforms to a pipeline (groupA instructions occur before groupB, groupB -> groupC, and so on). Currently the pipeline order is hardcoded as VMEM->DSRead->MFMA->DSWrite, but the patch was designed to be easily extensible. Alias analysis is problematic for pipelining as memory instructions will usually not be able to be reordered w.r.t one another. Differential Revision: https://reviews.llvm.org/D125997	2022-05-31 17:48:52 +00:00
Joe Nash	e8860bee28	[AMDGPU] gfx11 Image instructions MC layer support for instructions in the MIMG encoding(Image instructions). Contributors: Carl Ritson <carl.ritson@amd.com> Patch 13/N for upstreaming of AMDGPU gfx11 architecture. Depends on D125992 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D126463	2022-05-31 10:53:35 -04:00
Ivan Kosarev	f199b2b00f	[AMDGPU][NFC] Refine defining the offset field for GFX10+ SMEM instructions. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D126662	2022-05-31 09:54:51 +01:00
Ivan Kosarev	b4dbcba3b7	[AMDGPU][GFX9][NFC] Rename the base class for SMEM stores.	2022-05-30 10:31:59 +01:00
Ivan Kosarev	082822b381	[AMDGPU][GFX9] Support base+soffset+offset SMEM stores. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D126388	2022-05-30 10:27:57 +01:00
Nicolai Hähnle	5df2893a9a	AMDGPU: Add G_AMDGPU_MAD_64_32 instructions These generic instructions are trivially selected to V_MAD_[IU]64_[IU]32 instructions when run on the VALU. When at least both factors are scalar, it is usually better to execute some or all of the instruction on the SALU. To this end, we lower the instruction to simpler instructions that are supported on the SALU when applying the register bank mapping. Differential Revision: https://reviews.llvm.org/D124843	2022-05-27 12:36:17 -05:00
Ivan Kosarev	b0ccf38b01	[AMDGPU][GFX9] Support base+soffset+offset SMEM loads. Resolves part of https://github.com/llvm/llvm-project/issues/38652 Reviewed By: dp Differential Revision: https://reviews.llvm.org/D125700	2022-05-26 12:42:33 +01:00
serge-sans-paille	fb67d683db	[iwyu] Handle regressions in libLLVM header include Running iwyu-diff on LLVM codebase since `7030654296` detected a few regressions, fixing them. Differential Revision: https://reviews.llvm.org/D126417	2022-05-26 08:12:34 +02:00
Maksim Panchenko	bed9efed71	[MCDisassembler] Disambiguate Size parameter in tryAddingSymbolicOperand() MCSymbolizer::tryAddingSymbolicOperand() overloaded the Size parameter to specify either the instruction size or the operand size depending on the architecture. However, for proper symbolic disassembly on X86, we need to know both sizes, as an instruction can have two operands, and the instruction size cannot be reliably calculated based on the operand offset and its size. Hence, split Size into OpSize and InstSize. For X86, the new interface allows fixing a couple of issues: * Correctly adjust the value of PC-relative operands. * Set operand size to zero when the operand is specified implicitly. Differential Revision: https://reviews.llvm.org/D126101	2022-05-25 13:44:32 -07:00
Joe Nash	835e09c4c3	[AMDGPU] gfx11 FLAT Instructions MachineCode Support for FLAT type instructions Contributors: Sebastian Neubauer <sebastian.neubauer@amd.com> Patch 12/N for upstreaming of AMDGPU gfx11 architecture. Depends on D125989 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D125992	2022-05-25 15:29:39 -04:00
Joe Nash	ef1ea5ac01	[AMDGPU] gfx11 vinterp instructions MC support A new instruction encoding. Some of these instructions were previously VOP3 encoded. Contributors: Carl Ritson <carl.ritson@amd.com> Patch 11/N for upstreaming of AMDGPU gfx11 architecture. Depends on D125824 Reviewed By: critson Differential Revision: https://reviews.llvm.org/D125989	2022-05-25 14:59:16 -04:00
Joe Nash	1a51ab766f	[AMDGPU] gfx11 export instructions Contributors: Jay Foad <jay.foad@amd.com> Dmitry Preobrazhensky <d-pre@mail.ru> Patch 10/N for upstreaming of AMDGPU gfx11 architecture. Depends on D125822 Reviewed By: dp Differential Revision: https://reviews.llvm.org/D125824	2022-05-25 14:44:09 -04:00
Nicolai Hähnle	affa1b1cc5	AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane A later change will add a 3rd user, so factoring out the common code seems useful. Reorganizing the executeInWaterfallLoop causes some more COPYs to be generated, but those all fold away during instruction selection. Generating the comparisons uses generic instructions over machine instructions now which admittedly shouldn't make a difference (though it should make it easier to move the waterfall loop generation to another place). (Resubmit with missing test added.) Differential Revision: https://reviews.llvm.org/D125324	2022-05-25 12:14:01 -05:00
Nicolai Hähnle	afc90101a5	Revert "AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane" This reverts commit `2a28467e53`.	2022-05-25 12:03:23 -05:00
Nicolai Hähnle	2a28467e53	AMDGPU/GISel: Factor out AMDGPURegisterBankInfo::buildReadFirstLane A later change will add a 3rd user, so factoring out the common code seems useful. Reorganizing the executeInWaterfallLoop causes some more COPYs to be generated, but those all fold away during instruction selection. Generating the comparisons uses generic instructions over machine instructions now which admittedly shouldn't make a difference (though it should make it easier to move the waterfall loop generation to another place). Differential Revision: https://reviews.llvm.org/D125324	2022-05-25 11:35:02 -05:00
Stanislav Mekhanoshin	5df6669d45	[AMDGPU] Enforce alignment of image vaddr on gfx90a Even though single address image instructions only use a single VGPR, the HW accesses 4 or 5, which creates an alignment requirement. Fixes: SWDEV-316648 Differential Revision: https://reviews.llvm.org/D126009	2022-05-24 10:05:39 -07:00
Ivan Kosarev	1586e1dc95	[AMDGPU][MC][GFX11] Support base+soffset+offset SMEM loads. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D126207	2022-05-24 15:13:14 +01:00
Dmitry Preobrazhensky	818cc9b285	[AMDGPU][MC][GFX940] Disable v_mac_f32_dpp Differential Revision: https://reviews.llvm.org/D126070	2022-05-23 15:49:44 +03:00
Jay Foad	9af56c676e	[AMDGPU] Mark SMEM cache invalidations as not reading memory This brings the MachineInstrs in line with the corresponding intrinsics which have side effects but do not access memory. It also matches how BUF cache invalidation instructions are defined. The lit test changes are just because the machine scheduler previously treated them like loads, and added an artificial scheduling edge from them to the exit SU, which caused them to be scheduled earlier. Differential Revision: https://reviews.llvm.org/D126074	2022-05-20 17:18:03 +01:00
Jay Foad	78ec59e6ae	[AMDGPU] Handle mandatory literals in isOperandLegal Extend SIInstrInfo::isOperandLegal to enforce a limit on the number of literal operands for all VALU instructions, not just VOP3. In particular it now handles VOP2 instructions with a mandatory literal operand like V_FMAAK_F32. Differential Revision: https://reviews.llvm.org/D126064	2022-05-20 16:14:00 +01:00
Jay Foad	5b18ef7256	[AMDGPU] Add verification for mandatory literals Extend the literal operand checking in SIInstrInfo::verifyInstruction to check VOP2 instructions like V_FMAAK_F32 which have a mandatory literal operand. The rule is that src0 can also be a literal, but only if it is the same literal value. AMDGPUAsmParser::validateConstantBusLimitations already handles this correctly. Differential Revision: https://reviews.llvm.org/D126063	2022-05-20 16:14:00 +01:00
Dmitry Preobrazhensky	f598dfb3bf	[AMDGPU][MC][GFX8+] Correct SMEM offset parsing Differential Revision: https://reviews.llvm.org/D125907	2022-05-20 14:00:34 +03:00
Jay Foad	9ece051847	[AMDGPU] Mark s_get_waveid_in_workgroup as not reading memory It is already marked as having side effects, at least in MIR. It does not interact with anything else that is modelled as a memory access either in IR or MachineIR. Differential Revision: https://reviews.llvm.org/D125985	2022-05-19 21:25:46 +01:00
Jay Foad	86b55edab6	[AMDGPU] Mark s_getreg as having side effects instead of reading memory s_getreg does not interact with anything else that is modelled as a memory access either in IR or MachineIR. Differential Revision: https://reviews.llvm.org/D125968	2022-05-19 21:25:46 +01:00
Jay Foad	d14f2a6359	[AMDGPU] Allow multiple uses of the same literal in SOP2/SOPC AMDGPUAsmParser::validateSOPLiteral already knew about this but SIInstrInfo::verifyInstruction did not. Differential Revision: https://reviews.llvm.org/D125976	2022-05-19 16:42:20 +01:00
Joe Nash	ac2ff258d6	[AMDGPU] gfx11 scalar memory instructions Contributors: Mirko Brkusanin <Mirko.Brkusanin@amd.com> Patch 9/N for upstreaming of AMDGPU gfx11 architecture. Depends on D125820 Reviewed By: kosarev, #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D125822	2022-05-19 10:27:47 -04:00
Joe Nash	729467acef	[AMDGPU] gfx11 LDSDIR instructions MC support Contributors: Carl Ritson <carl.ritson@amd.com> Patch 8/N for upstreaming of AMDGPU gfx11 architecture. Depends on D125498 Reviewed By: critson, rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D125820	2022-05-19 10:08:47 -04:00
Dmitry Preobrazhensky	44673278e0	[AMDGPU][MC][GFX940] Add SMFMAC aliases Differential Revision: https://reviews.llvm.org/D125888	2022-05-19 13:40:48 +03:00
Jay Foad	6bec3e9303	[APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf Most clients only used these methods because they wanted to be able to extend or truncate to the same bit width (which is a no-op). Now that the standard zext, sext and trunc allow this, there is no reason to use the OrSelf versions. The OrSelf versions additionally have the strange behaviour of allowing extending to a smaller width, or truncating to a larger width, which are also treated as no-ops. A small amount of client code relied on this (ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and needed rewriting. Differential Revision: https://reviews.llvm.org/D125557	2022-05-19 11:23:13 +01:00
Dmitry Preobrazhensky	32ca9bd7b5	[AMDGPU][MC][GFX940] Correct tied operand decoding for smfmac opcodes Differential Revision: https://reviews.llvm.org/D125790	2022-05-18 15:39:30 +03:00
Dmitry Preobrazhensky	169416c64a	[AMDGPU][MC][GFX7] Disable cache policy modifiers with SMRD Differential Revision: https://reviews.llvm.org/D125799	2022-05-18 15:17:49 +03:00
Dmitry Preobrazhensky	95a8af2750	[AMDGPU][MC][NFC] MUBUF code cleanup Removed code that is no longer used after https://reviews.llvm.org/D124485. Differential Revision: https://reviews.llvm.org/D125811	2022-05-18 15:00:38 +03:00
Jay Foad	e2926501d8	[AMDGPU] Aggressively fold immediates in SIShrinkInstructions Fold immediates regardless of how many uses they have. This is expected to increase overall code size, but decrease register usage. Differential Revision: https://reviews.llvm.org/D114644	2022-05-18 11:04:33 +01:00
Jay Foad	3eb2281bc0	[AMDGPU] Aggressively fold immediates in SIFoldOperands Previously SIFoldOperands::foldInstOperand would only fold a non-inlinable immediate into a single user, so as not to increase code size by adding the same 32-bit literal operand to many instructions. This patch removes that restriction, so that a non-inlinable immediate will be folded into any number of users. The rationale is: - It reduces the number of registers used for holding constant values, which might increase occupancy. (On the other hand, many of these registers are SGPRs which no longer affect occupancy on GFX10+.) - It reduces ALU stalls between the instruction that loads a constant into a register, and the instruction that uses it. - The above benefits are expected to outweigh any increase in code size. Differential Revision: https://reviews.llvm.org/D114643	2022-05-18 10:19:35 +01:00
Jay Foad	dd12c3433e	[AMDGPU] Shrink F16 MAD/FMA to MADAK/MADMK/FMAAK/FMAMK on GFX10 Differential Revision: https://reviews.llvm.org/D125803	2022-05-18 10:00:06 +01:00
Shao-Ce SUN	25af3afa67	[NFC][AMDGPU][CodeGen] Use ArrayRef in TargetLowering functions Based on D123467. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D124508	2022-05-18 10:50:23 +08:00
Stanislav Mekhanoshin	dee3190293	[AMDGPU] Add llvm.amdgcn.global.load.lds intrinsic Differential Revision: https://reviews.llvm.org/D125279	2022-05-17 12:35:27 -07:00
Stanislav Mekhanoshin	a09af86693	[AMDGPU] Enable FLAT LDS DMA on gfx9/10 before gfx940 We always had global and scratch loads to LDS in the gfx9, but did not handle it. These were available via the 'lds' encoding bit. In gfx940 this bit was reused as 'svs' which resulted in new '_lds' opcodes effectively pushing this bit into the opcode, but functionally it is the same. These instructions are also available on gfx10. Differential Revision: https://reviews.llvm.org/D125126	2022-05-17 12:16:37 -07:00
Joe Nash	d21b9b4946	[AMDGPU] gfx11 scalar alu instructions MC layer support for SOP(scalar alu operations) including encoding support for s_delay_alu and s_sendmsg_rtn. Contributors: Jay Foad <jay.foad@amd.com> Patch 7/N for upstreaming of AMDGPU gfx11 architecture. Depends on D125319 Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D125498	2022-05-17 13:35:41 -04:00
Stanislav Mekhanoshin	791ec1c68e	[AMDGPU] Add intrinsics llvm.amdgcn.{raw|struct}.buffer.load.lds Differential Revision: https://reviews.llvm.org/D124884	2022-05-17 10:32:13 -07:00
Stanislav Mekhanoshin	332b73fe12	[AMDGPU] Revert wide LDS DMA support. This reverts `ffbee7acdc`, see also bug 37653 which it was fixing. The bug claims this is an undocumented feature which actually works. In the reality it is documented as not working for a good reason. It likely does something, but it is useless anyway. These instructions write into the LDS. The LDS address is: M0 + inst_offset + (TIDinWave * 4). For a store wider than a DWORD neighboring lanes will overwrite each other. Differential Revision: https://reviews.llvm.org/D125409	2022-05-16 11:23:35 -07:00
Joe Nash	6ef17f20d9	[AMDGPU] Mark sendmsg hasSideEffects. NFC Address the FIXME by marking the sendmsg instructions with hasSideEffects. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D125569	2022-05-16 09:59:27 -04:00
Jay Foad	27fa41583f	[AMDGPU] Shrink MAD/FMA to MADAK/MADMK/FMAAK/FMAMK on GFX10 On GFX10 VOP3 instructions can have a literal operand, so the conversion from VOP3 MAD/FMA to VOP2 MADAK/MADMK/FMAAK/FMAMK will not happen in SIFoldOperands. The only benefit of the VOP2 form is code size, so do it in SIShrinkInstructions instead. Differential Revision: https://reviews.llvm.org/D125567	2022-05-16 15:15:23 +01:00
Joe Nash	c70259405c	[AMDGPU] gfx11 BUF Instructions Includes MachineCode layer support and tests, and MIR tests not requiring CodeGen pass changes. Includes a small change in SMInstructions.td to correct encoded bits. Contributors: Petar Avramovic <Petar.Avramovic@amd.com> Dmitry Preobrazhensky <dmitry.preobrazhensky@amd.com> Depends on D125316 Patch 6/N for upstreaming of AMDGPU gfx11 architecture. Reviewed By: dp, Petar.Avramovic Differential Revision: https://reviews.llvm.org/D125319	2022-05-16 09:41:40 -04:00
Jay Foad	c1af2d329f	[AMDGPU] SIShrinkInstructions: change static functions to methods This is a mechanical change to avoid passing MRI and TII around explicitly. NFC. Differential Revision: https://reviews.llvm.org/D125566	2022-05-16 09:43:41 +01:00
Jay Foad	dfb006c0c9	[AMDGPU] Extract SIInstrInfo::removeModOperands. NFC. Make this an externally callable function for use in a future patch. Differential Revision: https://reviews.llvm.org/D125565	2022-05-16 09:43:41 +01:00
Sheng	c644488a8b	Rename `MCFixedLenDisassembler.h` as `MCDecoderOps.h` The name `MCFixedLenDisassembler.h` is out of date after D120958. Rename it as `MCDecoderOps.h` to reflect the change. Reviewed By: myhsu Differential Revision: https://reviews.llvm.org/D124987	2022-05-15 08:44:58 +08:00
Ivan Kosarev	bf5fc0d603	[AMDGPU][NFC] Remove unused function. Introduced in https://reviews.llvm.org/rG229d5e669bbbe7ca38ad832627a9809405939f1b and then became unused in https://reviews.llvm.org/D19584 Reviewed By: foad, dp Differential Revision: https://reviews.llvm.org/D125385	2022-05-12 08:52:06 +01:00
Ivan Kosarev	cb67b2ccc4	[AMDGPU][GFX10] Support base+soffset+offset SMEM stores. Also makes another step towards resolving https://github.com/llvm/llvm-project/issues/38652 Reviewed By: foad, dp Differential Revision: https://reviews.llvm.org/D125380	2022-05-12 08:48:05 +01:00
Austin Kerbow	2db700215a	[AMDGPU] Add llvm.amdgcn.sched.barrier intrinsic Adds an intrinsic/builtin that can be used to fine tune scheduler behavior. If there is a need to have highly optimized codegen and kernel developers have knowledge of inter-wave runtime behavior which is unknown to the compiler this builtin can be used to tune scheduling. This intrinsic creates a barrier between scheduling regions. The immediate parameter is a mask to determine the types of instructions that should be prevented from crossing the sched_barrier. In this initial patch, there are only two variations. A mask of 0 means that no instructions may be scheduled across the sched_barrier. A mask of 1 means that non-memory, non-side-effect inducing instructions may cross the sched_barrier. Note that this intrinsic is only meant to work with the scheduling passes. Any other transformations that may move code will not be impacted in the ways described above. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D124700	2022-05-11 13:22:51 -07:00
Joe Nash	a0a406b257	[AMDGPU] gfx11 Decode wider instructions. NFC Refactor to pass a templatized size parameter to the decoder to allow wider than 64bit decodes in a later patch. Contributors: Jay Foad <jay.foad@amd.com> Depends on D125261 Patch 5/N for upstreaming of AMDGPU gfx11 architecture. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D125316	2022-05-11 11:05:58 -04:00
Joe Nash	18ed279a3a	[AMDGPU] gfx11 subtarget features & early tests Tablegen definitions for subtarget features and cpp predicate functions to access the features. New Sub-TargetProcessors and common latencies. Simple changes to MIR codegen tests which pass on gfx11 because they have the same output as previous subtargets or operate on pseudo instructions which are reused from previous subtargets. Contributors: Jay Foad <jay.foad@amd.com> Petar Avramovic <Petar.Avramovic@amd.com> Patch 4/N for upstreaming of AMDGPU gfx11 architecture Depends on D124538 Reviewed By: Petar.Avramovic, foad Differential Revision: https://reviews.llvm.org/D125261	2022-05-11 10:31:49 -04:00
Mehdi Amini	3ffb08844c	Remove unused variable (fix -Werror build on MSVC)	2022-05-10 21:04:52 +00:00
jeff	f822db7670	[AMDGPU] Allow for MFMA Inst Clustering This patch adds cluster edges between independent MFMA instructions. Additionally, it propagates all predecessors of cluster insts to the root of the cluster(s), and all successors to the leaf(ves) of the cluster(s) -- this is done to remove the possibility that those insts will be interspersed within the cluster. Reviewed By: kerbowa Differential Revision: https://reviews.llvm.org/D124678	2022-05-10 12:57:40 -07:00
jeff	3ff8ee2447	[NFC] Fix typo Reviewed By: kerbowa Differential Revision: https://reviews.llvm.org/D124647	2022-05-10 12:11:21 -07:00
Ivan Kosarev	88f04bdbd8	[AMDGPU][GFX10] Support base+soffset+offset SMEM loads. Also makes a step towards resolving https://github.com/llvm/llvm-project/issues/38652 Reviewed By: foad, dp Differential Revision: https://reviews.llvm.org/D125117	2022-05-10 16:17:14 +01:00
Nicolai Hähnle	6c2a01ce3a	AMDGPU/SDAG: Refine the fold to v_mad_[iu]64_[iu]32 Only fold for uniform values on pre-GFX9 chips. GFX9+ allow us to keep the calculation entirely on the SALU. For subtargets where integer multiplication isn't full-rate, avoid folding if the multiply has too many uses. Finally, we expand 64x32 and 64x64 multiplies here as well, if they feed into an addition. This results in better code generation than the generic expansion for such multiplies because we end up using the accumulator of the MAD instructions. Differential Revision: https://reviews.llvm.org/D123835	2022-05-10 09:15:51 -05:00
Simon Pilgrim	7e3ef7dcd2	[AMDGPU] lowerEXTRACT_VECTOR_ELT - fold from a SCALAR_TO_VECTOR source As suggested by @foad on D124839 If we're extracting a vector element that originally came from a scalar_to_vector, then avoid the bitcasting of a vector type and perform the shift masking on the (any-extended) scalar source directly, making use of the fact that the upper elements of a scalar_to_vector are all undef. Differential Revision: https://reviews.llvm.org/D125173	2022-05-07 20:23:31 +01:00
Joe Nash	7e71a03966	[AMDGPU] Split FeatureAtomicFaddInsts FeatureAtomicFaddInsts is replaced with three more granular features. Contributors: Petar Avramovic <Petar.Avramovic@amd.com> Patch 3/N for upstreaming of AMDGPU gfx11 architecture Depends on D124537 Reviewed By: foad, #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D124538	2022-05-05 13:27:45 -04:00
Jay Foad	ba6c8d42d4	[AMDGPU] Combine DPP mov even if old reg def is in different BB Given a DPP mov like this: %2:vgpr_32 = V_MOV_B32_e32 0, implicit $exec ... %3:vgpr_32 = V_MOV_B32_dpp %2, %1, 1, 1, 1, 0, implicit $exec this patch just removes a check that %2 (the "old reg") was defined in the same BB as the DPP mov instruction. GCNDPPCombine requires that the MIR is in SSA form so I don't understand why the BB matters. This lets the optimization work in more real world cases when the definition of %2 gets hoisted out of a loop. Differential Revision: https://reviews.llvm.org/D124182	2022-05-05 11:30:31 +01:00
Mariusz Sikora	2417de2758	[AMDGPU] Use d16 flag for image.sample instructions Image.sample instruction can be forced to return half type instead of float when d16 flag is enabled. This patch adds new pattern in InstCombine to detect if output of image.sample is used later only by fptrunc which converts the type from float to half. If pattern is detected then fptrunc and image.sample are combined to single image.sample which is returning half type. Later in Lowering part d16 flag is added to image sample intrinsic. Differential Revision: https://reviews.llvm.org/D124232	2022-05-05 06:29:19 +02:00
Stanislav Mekhanoshin	63f21f4cc7	[AMDGPU] Handle LDS DMA and LDS_DIRECT hazards There shall be 1 wait state between M0 write and LDS DMA/LDS_DIRECT use. Differential Revision: https://reviews.llvm.org/D124550	2022-05-04 14:45:16 -07:00
Jon Chesterfield	bc78c09952	[amdgpu] Elide module lds allocation in kernels with no callees Introduces a string attribute, amdgpu-requires-module-lds, to allow eliding the module.lds block from kernels. Will allocate the block as before if the attribute is missing or has its default value of true. Patch uses the new attribute to detect the simplest possible instance of this, where a kernel makes no calls and thus cannot call any functions that use LDS. Tests updated to match, coverage was already good. Interesting cases is in lower-module-lds-offsets where annotating the kernel allows the backend to pick a different (in this case better) variable ordering than previously. A later patch will avoid moving kernel variables into module.lds when the kernel can have this attribute, allowing optimal ordering and locally unused variable elimination. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D122091	2022-05-04 22:42:07 +01:00
serge-sans-paille	7030654296	[iwyu] Handle regressions in libLLVM header include Running iwyu-diff on LLVM codebase since `fa5a4e1b95` detected a few regressions, fixing them. Differential Revision: https://reviews.llvm.org/D124847	2022-05-04 08:32:38 +02:00
Nicolai Hähnle	8b42e6d057	AMDGPU: Remove redundant call to MachineInstrBuilder::setMBB setInstrAndDebugLoc also sets the basic block automatically. Differential Revision: https://reviews.llvm.org/D124809	2022-05-03 07:49:20 -05:00
hsmahesha	589b9df4e1	[AMDGPU] Fix scalar_to_vector for v8i16/v8f16 so that the stack access is avoided. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D124734	2022-05-03 07:28:15 +05:30
hsmahesha	3175323ce1	[AMDGPU][NFC] Make lowerINSERT_VECTOR_ELT() more readable by moving around the code and by adding more comments, which would later help during any required clean-up. Differential Revision: https://reviews.llvm.org/D124733	2022-05-03 07:28:15 +05:30
Nicolai Hähnle	deaa678137	AMDGPU/SDAG: Factor out the fold (add (mul x, y), y) --> mad_[iu]64_[iu]32 Refactor to simplify a follow-up change. No functional change intended. However, there is a rather subtle logic change: the subsequent combines (e.g. reassociation) are skipped always when one of the operands of the add is a mul, instead of only when additionally mad64_32 etc. are available. This change makes sense because the subsequent combines should never apply when one of the operands is a mul. Differential Revision: https://reviews.llvm.org/D123833	2022-05-02 17:40:03 -05:00
Stanislav Mekhanoshin	51e02409f0	[AMDGPU] Produce waitcounts for LDS DMA MUBUF and FLAT LDS DMA operations need a wait on vmcnt before LDS written can be accessed. A load from LDS to VMEM does not need a wait. Differential Revision: https://reviews.llvm.org/D124626	2022-04-29 11:14:11 -07:00
Joe Nash	813e521e55	[AMDGPU] Add gfx11 subtarget ELF definition This is the first patch of a series to upstream support for the new subtarget. Contributors: Jay Foad <jay.foad@amd.com> Konstantin Zhuravlyov <kzhuravl_dev@outlook.com> Patch 1/N for upstreaming AMDGPU gfx11 architectures. Reviewed By: foad, kzhuravl, #amdgpu Differential Revision: https://reviews.llvm.org/D124536	2022-04-29 12:27:17 -04:00
Ivan Kosarev	6ddf2a824d	[AMDGPU] Adjust wave priority based on VMEM instructions to avoid duty-cycling. As older waves execute long sequences of VALU instructions, this may prevent younger waves from address calculation and then issuing their VMEM loads, which in turn leads the VALU unit to idle. This patch tries to prevent this by temporarily raising the wave's priority. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D124246	2022-04-27 14:37:18 +01:00
Stanislav Mekhanoshin	6a24e37219	[AMDGPU] Remove now unused variable HasLdsModifier. NFC.	2022-04-26 17:49:30 -07:00
Stanislav Mekhanoshin	0274811b5a	[AMDGPU] Add both mayLoad and mayStore to MUBUF LDS opcodes Differential Revision: https://reviews.llvm.org/D124483	2022-04-26 17:30:24 -07:00
Stanislav Mekhanoshin	00d84a9f92	[AMDGPU] Remove vdata from buffer to lds load Differential Revision: https://reviews.llvm.org/D124485	2022-04-26 17:16:26 -07:00
Stanislav Mekhanoshin	a9ccc7bc54	[AMDGPU] Properly mark MUBUF and FLAT LDS DMA instructions. NFC. Add these bits to the MUBUF and FLAT LDS DMA instructions: - LGKM_CNT - these operate on LDS; - VALU - SPG 3.9.8: This instruction acts as both a MUBUF and VALU instruction; Codegen currently does not produce any of this, so the change is NFC. Differential Revision: https://reviews.llvm.org/D124472	2022-04-26 14:20:26 -07:00
Vasileios Porpodas	fa8a9fea47	Recommit "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`" This reverts commit `6a9bbd9f20`. Code review: https://reviews.llvm.org/D124202	2022-04-26 14:02:40 -07:00
Piotr Sobczak	c6afbdb5d2	Revert "[AMDGPU] Use d16 flag for image.sample instructions" This reverts commit `d1762fc454`. Reverting D124232 as the buildbot reported some errors in sanitizers.	2022-04-25 17:18:49 +02:00
Mariusz Sikora	d1762fc454	[AMDGPU] Use d16 flag for image.sample instructions Image.sample instruction can be forced to return half type instead of float when d16 flag is enabled. This patch adds new pattern in InstCombine to detect if output of image.sample is used later only by fptrunc which converts the type from float to half. If pattern is detected then fptrunc and image.sample are combined to single image.sample which is returning half type. Later in Lowering part d16 flag is added to image sample intrinsic. Differential Revision: https://reviews.llvm.org/D124232	2022-04-25 13:05:52 +01:00
Matt Arsenault	0ecbb683a2	TableGen/GlobalISel: Make address space/align predicates consistent The builtin predicate handling has a strange behavior where the code assumes that a PatFrag is a stack of PatFrags, and each level adds at most one predicate. I don't think this particularly makes sense, especially without a diagnostic to ensure you aren't trying to set multiple at once. This wasn't followed for address spaces and alignment, which could potentially fall through to report no builtin predicate was added. Just switch these to follow the existing convention for now.	2022-04-22 15:48:07 -04:00
Matt Arsenault	794a0bb547	AMDGPU: Directly implement computeKnownBits for workitem intrinsics Currently metadata is inserted in a late pass which is lowered to an AssertZext. The metadata would be more useful if it was inserted earlier after inlining, but before codegen. Probably shouldn't change anything now. Just replacing the late metadata annotation needs more work, since we lose out on optimizations after these are lowered to CopyFromReg. Seems to be slightly better than relying on the AssertZext from the metadata. The test change in cvt_f32_ubyte.ll is a quirk from it using -start-before=amdgpu-isel instead of running the usual codegen pipeline.	2022-04-22 10:49:50 -04:00

7293 Commits