If a kernel has uneven dimensions, the value of workitem-id-x divided by
the wavefront size can be non-uniform. For example, dimensions (65, 2)
will have the workitems with addresses (64, 0) and (0, 1) packed into the
same wave, which gives 1 and 0 respectively after the division by 64.
Unfortunately, this limits the optimization to OpenCL only, and only if
the reqd_work_group_size attribute is set. This patch limits it to 1D kernels,
although it should be possible to perform this optimization if the size
of the X dimension is a power of 2; we just do not currently have the
infrastructure to query it.
Note that the presence of the amdgpu-no-workitem-id-y attribute does not help,
as it only hints at the lack of workitem-id-y queries, not the absence
of an actual 2nd dimension, therefore affecting just the SGPR allocation.
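A hedged worked example of the (65, 2) case above (plain C++, wave size 64;
lane packing follows the flattened workitem id):
#include <cstdio>
int main() {
  const unsigned WaveSize = 64;
  // Workitems (64,0) and (0,1) flatten to linear ids 64 and 65, so they
  // occupy lanes 0 and 1 of the same wave despite different x coordinates.
  unsigned X0 = 64, X1 = 0; // workitem-id-x of (64,0) and (0,1)
  std::printf("%u %u\n", X0 / WaveSize, X1 / WaveSize); // prints "1 0"
}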
Differential Revision: https://reviews.llvm.org/D132879
This instruction was referring to the wrong VOPProfile, likely due to a
typo, leading to an incorrect destination register type.
The MC layer will care about this change, but it is NFC while 16-bit values
actually use 32-bit registers.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D132878
Create a field in VOPProfile called DstRCVOP3DPP to allow the VOP3
versions of DPP instructions to have a different destination register
class than the non-VOP3 encoding. NFC for current instructions, but
planned to be functional in upcoming ones.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D132673
This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis.
For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling.
Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU.
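A hedged sketch of the masking trick for i8 (hypothetical helper; the real
lowering operates on promoted i32 SelectionDAG nodes):
#include <cstdint>
unsigned cttz_i8(uint8_t X) {
  // Setting bit 8 makes the promoted 32-bit value provably non-zero, so BSF
  // needs no zero-input CMOV, and a zero input yields 8, matching cttz on i8.
  return __builtin_ctz((unsigned)X | 0x100u);
}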
Differential Revision: https://reviews.llvm.org/D132520
Allows things like `(G_PTR_ADD (G_PTR_ADD a, b), c)` to be
simplified into a single ADD3 instruction instead of two adds.
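A hedged source-level illustration of the pattern (element-size scaling
elided in the comments):
int load_chain(int *Base, int A, int B) {
  int *P = Base + A; // G_PTR_ADD Base, A
  return P[B];       // G_PTR_ADD (G_PTR_ADD Base, A), B -> one v_add3_u32
}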
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D131254
This commit moves the information on whether a register is constant into
the TableGen files to allow generating the implementation of
isConstantPhysReg(). I've marked isConstantPhysReg() as final in this
generated file to ensure that changes are made to tablegen instead of
overriding this function, but if that turns out to be too restrictive,
we can remove the qualifier.
This should be pretty much NFC, but I did notice that e.g. the AMDGPU
generated file also includes the LO16/HI16 registers now.
The new isConstant flag will also be used by D131958 to ensure that
constant registers are marked as call-preserved.
Differential Revision: https://reviews.llvm.org/D131962
This change completes the process of replacing OperandValueKind and OperandValueProperties which were previously passed independently in this API with a single container class which contains both.
This is the change which motivated the whole sequence which preceded it. In an original spike version of this change, I'd noticed a nasty bug: I'd changed the signature without changing names, and as a result, we silently passed additional information through a callsite which previously dropped the power-of-two fact. This might be harmless in most cases, but at least a couple clearly depended for correctness on not passing that property through.
I did my best to split off prior changes which reduced the scope of this one, and which made it possible to use compiler assistance. For instance, every parameter which changes type in this change also changes name. This was intentional to make sure that every call site possibly affected must show up in the diff. This let me audit each one closely.
Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or were just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future.
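A hedged sketch of a call site after the change (arguments abbreviated; the
CostKind parameter is the new addition):
InstructionCost Cost = TTI.getShuffleCost(TTI::SK_Reverse, VecTy, Mask,
                                          TTI::TCK_RecipThroughput);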
Differential Revision: https://reviews.llvm.org/D132287
Adds a builtin that serves as an optimization hint to apply specific optimized
DAG mutations during scheduling. This also disables any other mutations or
clustering that may interfere with the desired pipeline. The first optimization
strategy that is added here is designed to improve the performance of small gemm
kernels on gfx90a.
Reviewed By: jrbyrnes
Differential Revision: https://reviews.llvm.org/D132079
Certain address space dependent optimizations, like SeparateConstOffsetFromGEP, assume agreement between the address space of the recursive uses and the address space of the def. If this assumption is invalid, then optimizations may or may not be correct depending on properties of an address space for a given target, the address spaces of recursive uses, and the optimization being done.
This patch infers the previous address space for flat_atomic ptr arguments. As a result, the address spaces of the uses in flat_atomic cases will agree with the address space in recursive defs. If this results in a non-flat address space, then isel may infer a different intrinsic. For example, if the result is a flat_atomic using the global address space, then it will be lowered to the corresponding global_atomic intrinsic.
Change-Id: Ifcd981709dc2ea94d4acbcb84efe7176593ec8c7
Requested SchedGroup pipelines may be non-trivial to satisfy. A minimal example is if the requested pipeline is {2 VMEM, 2 VALU, 2 VMEM} and the original order of SUnits is {VMEM, VALU, VMEM, VALU, VMEM}. Because of existing dependencies, the choice of which SchedGroup the middle VMEM goes into impacts how closely we are able to match the requested pipeline. Minimizing the degree of misfit (as measured by the number of edges we can't add) w.r.t. the choice we make when mapping an instruction to a SchedGroup appears to be an NP problem. This patch implements the PipelineSolver class, which produces a solution for the defined problem for the sched_group_barrier mutation. The solver has both an exponential-time exact algorithm and a greedy algorithm. The patch includes some controls which allow the user to select the greedy/exact algorithm.
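A hedged sketch of the greedy flavor (types abbreviated; missedEdges is an
illustrative stand-in for the solver's cost function, not a real API):
for (SUnit &SU : llvm::reverse(DAG.SUnits)) { // bottom-up, matching selection
  SchedGroup *Best = nullptr;
  int BestCost = INT_MAX;
  for (SchedGroup &SG : SchedGroups) {
    if (!SG.canAddSU(SU))             // mask/size check
      continue;
    int Cost = missedEdges(SG, SU);   // hypothetical: edges we'd fail to add
    if (Cost < BestCost) { BestCost = Cost; Best = &SG; }
  }
  if (Best)
    Best->add(SU);                    // assign SU to the cheapest fitting group
}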
Differential Revision: https://reviews.llvm.org/D130797
TargetLowering had two last InstructionCost related members, `getTypeLegalizationCost()`
and `getScalingFactorCost()`, but all other costs are processed in TTI.
For example, it is not convenient to use other TTI members in these two functions
when they are overridden in a target.
Minor refactoring: `getTypeLegalizationCost()` no longer needs a DataLayout
parameter - it was always passed from TTI.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D117723
Src1 for mbcnt can be a non-zero literal or register. Take this into account
when calculating known bits.
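A hedged sketch of the idea (bounds illustrative for wave64; KnownSrc1 is a
hypothetical name for the known bits of src1): the lane-count contribution of
mbcnt is at most 63, so its high bits are known zero, and src1 folds in as an add:
KnownBits LaneCount(32);
LaneCount.Zero.setBitsFrom(6); // count of lanes below ours fits in bits 0..5
KnownBits Known = KnownBits::computeForAddSub(/*Add=*/true, /*NSW=*/false,
                                              LaneCount, KnownSrc1);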
Differential Revision: https://reviews.llvm.org/D131478
All the prologue instructions should have unknown source location
coordinates, while the epilogue instructions should have the source
location of the last non-debug instruction after which the epilogue
instructions are inserted.
This ensures the prologue/epilogue markers are generated correctly
in the line table.
Changes are brought in from the downstream CFI patches.
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D131485
This change finalizes the series of patches aiming to replace the old
strategy of lowering VGPR to SGPR copies. Following
https://reviews.llvm.org/D128252 and https://reviews.llvm.org/D130367, code
parts that are no longer used were removed. The pass main loop is no longer
used for the MIR changes but collects information for further analysis. The
actual MIR lowering happens later, according to the analysis results, in a set
of separate functions. Another important change concerns the order of lowering:
VGPR to SGPR copy lowering is done first so that it has priority over the rest
of the MIR changes.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D131246
si-annotate-control-flow does a depth-first traversal of the basic blocks
of a function to insert amdgcn 'if' intrinsics for conditional
branches so that isel can generate correct instructions later.
si-annotate-control-flow checks whether the successor BB for the 'else'
branch of a conditional branch has been visited. If it has been
visited, si-annotate-control-flow assumes the conditional
branch has been handled and will not try to insert the 'if' intrinsic
for it.
This assumption is not correct when the IR contains multiple
unreachable BBs. In that case, 'if' intrinsics are not inserted and incorrect
ISA is generated.
This patch fixes the issue by letting amdgpu-unify-divergent-exit-nodes
unify unreachables even if they are uniformly reached. In this way
the IR will not contain multiple exits, and the structurizer is able to
structurize the IR containing one unified exit.
Reviewed by: Ruiling Song, Matt Arsenault
Differential Revision: https://reviews.llvm.org/D131181
Fixes: SWDEV-343244
There are no AMDGPUSampleVariant versions for _G16; it is treated more like a
modifier for derivatives (_D) (and for intrinsics it is an overloaded type
instead of part of the intrinsic name), so we ended up making more variants for
these instructions than we actually needed.
32-bit derivatives need 6 dwords at most, while 16-bit derivatives need 4 at
most. By using the same AMDGPUSampleVariant for both, we ended up creating 2
extra variants per instruction.
In total this deletes 260 unused tablegen records.
Differential Revision: https://reviews.llvm.org/D131252
1) The overloaded (instruction-based) method is a wrapper around the current (opcode-based) method.
2) This patch also changes a few callsites (VectorCombine.cpp,
SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method.
3) This is a split of D128302.
Differential Revision: https://reviews.llvm.org/D131114
This patch ensures consistency in the construction of FP_ROUND nodes
such that they always use ISD::TargetConstant instead of ISD::Constant.
This additionally fixes a bug in the AArch64 SVE backend where patterns
were matching against TargetConstant nodes and sometimes failing when
passed a Constant node.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D130370
Enable SGPRs for the following operands of these opcodes:
- src operands of VOP3 variant.
- src2 operand of DPP variants.
Differential Revision: https://reviews.llvm.org/D130989
When compiling for multiple targets, the scheduler that is selected via the
-misched option is applied globally. This patch adds a target CL option instead.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D131022
Creates a new scheduling strategy that attempts to maximize ILP for a single
wave.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D130869
In 2e29b0138c we introduced a specific solving algorithm
that analyzes the VGPR to SGPR copy use chains and either lowers
the copy to v_readfirstlane_b32 or converts the whole chain to VALU forms.
At the same time, we still had code that blindly converted REG_SEQUENCE and PHIs
to VALU in case they produce an SGPR but have VGPR input operands. If the
REG_SEQUENCE or PHI is in a VGPR to SGPR copy use chain, and this chain was
considered long enough to convert the copy to v_readfirstlane_b32, further
lowering them to VALU leads to several kinds of issues.
First, we get a v_readfirstlane_b32 which is completely useless because most of
its use chain was moved to VALU forms. Second, we may encounter subtle bugs
related to the EXEC-dependent CF because of the weird mixing of SALU and VALU
instructions.
This change removes the code that moves REG_SEQUENCE and PHIs to VALU. Instead, we use the fact
that both REG_SEQUENCE and PHIs have copy semantics. That is, if they define an SGPR but have VGPR inputs,
we insert VGPR to SGPR copies to make them pure SGPR. Then, the new copies are processed by the common
VGPR to SGPR lowering algorithm.
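A hedged before/after sketch in MIR-flavored comments (virtual register names
illustrative):
// before: %dst:sgpr_64 = REG_SEQUENCE %v0:vgpr_32, sub0, %v1:vgpr_32, sub1
// after:  %s0:sgpr_32  = COPY %v0   ; new VGPR->SGPR copies, handled by the
//         %s1:sgpr_32  = COPY %v1   ; common lowering algorithm
//         %dst:sgpr_64 = REG_SEQUENCE %s0, sub0, %s1, sub1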
This is Part 2 in the series of commits aiming at the massive refactoring of the SIFixSGPRCopies pass.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D130367
This pass seems to have very little effect because all it does is hoist
some instructions, but it is followed later in the codegen pipeline by
the IR CodeSinking pass which does the opposite.
Differential Revision: https://reviews.llvm.org/D130258
This improves a corner case where v_fmac can be converted to v_fma on
GFX10+ even if it has a literal operand.
Differential Revision: https://reviews.llvm.org/D130992
For VALU write and memory (VM, L/DS, FLAT) instructions, SQ would insert
wait-states to avoid data hazard. However when there is a DGEMM instruction
in-between them, SQ incorrectly disables the wait-states thus the data hazard
needs to be handled with this workaround.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D130677
Extend hazard recognizer of ReadM0MovRelInterpHazard with
DS_READ_ADDTID and DS_WRITE_ADDTID, as they also
require a manually inserted S_NOP after SALU writing m0.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D130783
It is not necessary to wait for all outstanding memory operations before
barriers on hardware that can back off of the barrier in the event of an
exception when traps are enabled. Add a new subtarget feature which
tracks which HW has this ability.
Reviewed By: #amdgpu, rampitec
Differential Revision: https://reviews.llvm.org/D130722
Since 814a0abcce, this would break if we
had a function in the module that becomes dead in any codegen IR
pass. The function wasn't deleted since it was initially used in dead
code, but is detached from the call graph and doesn't appear in the PO
traversal. Do a second walk over the module to populate the resources
of any functions which weren't already processed.
Summary:
A flat scratch load of D16 type by default has a tied vdst_in operand (tied with vdst). This should be taken
care of at the time of removeOperand in eliminateFrameIndex; otherwise we will hit an assert saying
"Cannot move tied operands". This patch unties vdst_in before the move and reties it with vdst afterwards.
Reviewers:
arsenm, foad
Differential Revision: https://reviews.llvm.org/D130537
When register pressure tracking is disabled, the scheduler attempts to load
pressures for SReg_32 and VGPR_32, which causes an index out of bounds error.
This patch fixes the issue by disabling the initialization of the RPTracker
when it is not needed. NFC.
Reviewed By: rampitec, kerbowa, arsenm
Differential Revision: https://reviews.llvm.org/D129322
This builtin allows the creation of custom scheduling pipelines on a per-region
basis. Like the sched_barrier builtin this is intended to be used either for
testing, in situations where the default scheduler heuristics cannot be
improved, or in critical kernels where users are trying to get performance that
is close to handwritten assembly. Obviously using these builtins will require
extra work from the kernel writer to maintain the desired behavior.
The builtin can be used to create groups of instructions called "scheduling
groups" where ordering between the groups is enforced by the scheduler.
__builtin_amdgcn_sched_group_barrier takes three parameters. The first parameter
is a mask that determines the types of instructions that you would like to
synchronize around and add to a scheduling group. These instructions will be
selected from the bottom up starting from the sched_group_barrier's location
during instruction scheduling. The second parameter is the number of matching
instructions that will be associated with this sched_group_barrier. The third
parameter is an identifier which is used to describe what other
sched_group_barriers should be synchronized with. Note that multiple
sched_group_barriers must be added in order for them to be useful since they
only synchronize with other sched_group_barriers. Only "scheduling groups" with
a matching third parameter will have any enforced ordering between them.
As an example, the code below tries to create a pipeline of 1 VMEM_READ
instruction followed by 1 VALU instruction followed by 5 MFMA instructions...
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 1 VALU
__builtin_amdgcn_sched_group_barrier(2, 1, 0)
// 5 MFMA
__builtin_amdgcn_sched_group_barrier(8, 5, 0)
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 3 VALU
__builtin_amdgcn_sched_group_barrier(2, 3, 0)
// 2 VMEM_WRITE
__builtin_amdgcn_sched_group_barrier(64, 2, 0)
Reviewed By: jrbyrnes
Differential Revision: https://reviews.llvm.org/D128158
By not clustering loads and adjusting heuristics to more aggressively reduce
register pressure, we may be able to increase occupancy for the function if it
was dropped during first-pass scheduling.
Similarly, try to reduce spilling if register usage exceeds the lower
occupancy bound.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D130329
Clear all kill flags on the source register when folding a COPY.
This is necessary because the kills may now be out of order with respect to the uses.
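A hedged illustration in MIR-flavored comments (registers illustrative):
// %1 = COPY %0
// USE killed %0          ; last direct use of %0
// USE %1                 ; folding rewrites this to USE %0, so the use now
//                        ; follows the old kill point and the flag must go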
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D130622
The instruction is used to modify wave priority with the intent
of affecting VALU execution, but currently we can reschedule VALU
instructions around it since those VALU instructions do not have side effects.
Differential Revision: https://reviews.llvm.org/D130654
I don't have any evidence these particular uses are actually causing any
issues, but we should avoid accidentally truncating immediate values
depending on the host.
It errors out in the Bazel CI:
AMDGPULowerModuleLDSPass.cpp:384:12: error: chosen constructor is
explicit in copy-initialization
return {SGV, std::move(Map)};
Reviewed By: rupprecht
Differential Revision: https://reviews.llvm.org/D130623
Tries to make the different scheduling stages a bit more self-contained and
modifiable. Intended to be NFC. Preface to other changes.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D130147
Set the priorities consistently to the number of registers in the tuple
minus 1. Previously we started at 1, and also tried to give SGPRs higher
values than VGPRs. There's no point in assigning SGPRs higher values
now that those are allocated in a separate regalloc run.
This avoids overflowing the 5 bits used for the class priority in the
allocation heuristic for 32-element tuples, which avoids some cases
where smaller registers unexpectedly got prioritized over larger ones.
This patch merges a consecutive sequence of
s_or_saveexec s_o, s_i
s_xor exec, exec, s_o
into a single
s_andn2_saveexec s_o, s_i instruction.
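A hedged derivation of why the merge is sound (exec0 denotes the incoming
EXEC value):
// s_or_saveexec s_o, s_i     ; s_o = exec0, exec = s_i | exec0
// s_xor exec, exec, s_o      ; exec = (s_i | exec0) ^ exec0 = s_i & ~exec0
// s_andn2_saveexec s_o, s_i  ; s_o = exec0, exec = s_i & ~exec0 (same result)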
This patch also cleans up the SIOptimizeExecMasking pass a bit.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D129073
VOPC DPP should not be formed when the row_mask and bank_mask are not
0xf (full) because the resulting VOP DPP would have different semantics
than the MOV DPP followed by VOP. Existing checks in GCNDPPCombine cover
this case but for different reasons, so assert the property for
future-proofing.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D130101
For the longest time we used `AAValueSimplify` and
`genericValueTraversal` to determine "potential values". This was
problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9
locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification
only as an afterthought. `genericValueTraversal` did offer an option
but `AAValueSimplify` did not. Thus, we might end up with "too much"
simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with
problems like the infinite recursion bug (#54981) as well as code
duplication.
This patch introduces `AAPotentialValues` and replaces the
`AAValueSimplify` uses with it. `genericValueTraversal` is folded into
`AAPotentialValues` as are the instruction simplifications performed in
`AAValueSimplify` before. We further distinguish "intra" and "inter"
procedural simplification now.
`AAValueSimplify` was not deleted as we haven't ported the
re-materialization of instructions yet. There are other differences
from the former handling; e.g., we may not fold trivially foldable
instructions right now: `add i32 1, 1` is not folded to `i32 2`,
but if an operand would be simplified to `i32 1` we would still fold it.
We are also even more aware of function/SCC boundaries in CGSCC passes,
which is good even if some tests look like they regress.
Fixes: https://github.com/llvm/llvm-project/issues/54981
Note: A previous version was flawed and consequently reverted in
6555558a80.
Implement an intrinsic for use when lowering LDS variables to different
addresses from different kernels. This will allow kernels that cannot
reach an LDS variable to avoid wasting space for it.
There are a number of implicit arguments accessed by intrinsics already,
so this implementation closely follows the existing handling. It is slightly
novel in that this SGPR is written by the kernel prologue.
In the general case it is necessary to put variables at different addresses
so that they can be compactly allocated, and thus necessary for an
indirect function call to have some means of determining where a
given variable was allocated. Claiming an arbitrary SGPR into which
an integer can be written by the kernel (in this implementation, based
on metadata associated with that kernel), which is then passed on to
indirect call sites, is sufficient to determine the variable address.
The intent is to emit a __const array of LDS addresses and index into it.
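A hedged sketch of that lookup (table name, layout, and kernel-id plumbing
are all hypothetical):
// The kernel prologue writes its table row index into the claimed SGPR;
// indirectly reachable functions then resolve an LDS variable's address as:
extern const unsigned __lds_table[/*kernels*/][/*variables*/]; // hypothetical
unsigned lds_address(unsigned KernelId, unsigned VarIdx) {
  return __lds_table[KernelId][VarIdx];
}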
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D125060
For most DPP instructions, the old operand stores the value that was in
the current lane before the DPP operation, and is tied to the
destination. For VOPC DPP, this is unnecessary and incorrect.
There appears to have been a latent bug related to D122737 in
SIInstrInfo::isOperandLegal: if you checked whether a register operand was
legal when the InstructionDesc expected an immediate, it reported that it is
valid. Its fix is necessary for, and tested in, this patch.
Reviewed By: foad, rampitec
Differential Revision: https://reviews.llvm.org/D130040
AMDGPUPerfHintAnalysis doesn't set the memory bound attribute if
FuncInfo::InstCost outweighs MemInstCost even if we have a basic block
with relatively high global memory access. GCNSchedStrategy could revert
optimal scheduling in favour of occupancy which seems to degrade
performance for some kernels. This change introduces the
HasDenseGlobalMemAcc metric in the heuristic that makes the analysis
more conservative in these cases.
This fixes SWDEV-334259/SWDEV-343932
Differential Revision: https://reviews.llvm.org/D129759
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for if a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.
Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.
Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
Parse op_sel for *_e64_dpp VOP3 opcodes.
Depends on D129637 and setting of VOP3_OPSEL in dpp pseudos.
Differential Revision: https://reviews.llvm.org/D129767
Saves some add instructions on a couple Rage 2 shaders and is also a
prerequisite for a coming-soon change matching (register + immediate)
offsets.
Reviewed By: foad, arsenm
Differential Revision: https://reviews.llvm.org/D129095
This change introduces the dynamic stack boolean field to code-object-v3
and above under the code properties of the kernel descriptor and under
the kernel metadata map of NT_AMDGPU_METADATA. This field corresponds to
the is_dynamic_callstack field of amd_kernel_code_t.
Differential Revision: https://reviews.llvm.org/D128344
Further improve liveness copying for CC registers post-optimization
by mirroring live internal splits.
This fixes a bug in register allocation when CC register liveness
is extended across branches instead of being split.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D129557
Since divergence-driven instruction selection has been enabled for AMDGPU,
all uniform instructions are expected to be selected to SALU form, except those that do not have one.
VGPR to SGPR copies appear in MIR to connect value producers and consumers. This change implements an algorithm
that strikes a reasonable tradeoff between the profit achieved from keeping uniform instructions in SALU form
and the overhead introduced by the data transfer between VGPRs and SGPRs.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D128252
D114999 added code to kill an immediate def if it was folded into its
only use by convertToThreeAddress. This patch updates LiveVariables when
that happens in order to fix verification failures exposed by D129213.
Differential Revision: https://reviews.llvm.org/D129661
D25618 added a method to verify the instruction predicates for an
emitted instruction, through verifyInstructionPredicates added into
<Target>MCCodeEmitter::encodeInstruction. This is a very useful idea,
but the implementation inside MCCodeEmitter made it only fire for object
files, not assembly which most of the llvm test suite uses.
This patch moves the code into the <Target>_MC::verifyInstructionPredicates
method, inside the InstrInfo. This allows it to be called from other
places, such as in this patch where it is called from the
<Target>AsmPrinter::emitInstruction methods which should trigger for
both assembly and object files. It can also be called from other places
such as verifyInstruction, but that is not done here (it tends to catch
errors earlier, but in reality just shows all the mir tests that have
incorrect feature predicates). The interface was also simplified
slightly, moving computeAvailableFeatures into the function so that it
does not need to be called externally.
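A hedged sketch of the call-site shape described above (function and namespace
names follow the text; exact code may differ):
void AMDGPUAsmPrinter::emitInstruction(const MachineInstr *MI) {
  AMDGPU_MC::verifyInstructionPredicates(
      MI->getOpcode(), getSubtargetInfo().getFeatureBits());
  // ... existing lowering/emission ...
}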
The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently
show errors in the test-suite, so have been disabled with FIXME
comments.
Recommitted with some fixes for the leftover MCII variables in release
builds.
Differential Revision: https://reviews.llvm.org/D129506
The SI machine scheduler inherits from ScheduleDAGMI.
This patch adds support for a few features that are implemented
in ScheduleDAGMI (or its base classes) that were missing so far
because their support is implemented in overridden functions.
* Support cl::opt -view-misched-dags
This option allows opening a graphical window of the scheduling DAG.
* Support cl::opt -misched-print-dags
This option allows printing the scheduling DAG in text form.
* After constructing the scheduling DAG, call postprocessDAG()
to apply any registered DAG mutations.
Note that currently there are no mutations defined in AMDGPUTargetMachine.cpp
in case SIScheduler is used.
We still add this to avoid surprises in the future in case mutations are added.
Differential Revision: https://reviews.llvm.org/D128808
This reverts commit e2fb8c0f4b as it does
not build for Release builds, and some buildbots are giving more warning
than I saw locally. Reverting to fix those issues.
D25618 added a method to verify the instruction predicates for an
emitted instruction, through verifyInstructionPredicates added into
<Target>MCCodeEmitter::encodeInstruction. This is a very useful idea,
but the implementation inside MCCodeEmitter made it only fire for object
files, not assembly which most of the llvm test suite uses.
This patch moves the code into the <Target>_MC::verifyInstructionPredicates
method, inside the InstrInfo. This allows it to be called from other
places, such as in this patch where it is called from the
<Target>AsmPrinter::emitInstruction methods which should trigger for
both assembly and object files. It can also be called from other places
such as verifyInstruction, but that is not done here (it tends to catch
errors earlier, but in reality just shows all the mir tests that have
incorrect feature predicates). The interface was also simplified
slightly, moving computeAvailableFeatures into the function so that it
does not need to be called externally.
The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently
show errors in the test-suite, so have been disabled with FIXME
comments.
Differential Revision: https://reviews.llvm.org/D129506
The availability of LiveIntervals affects kill flags in the output, so
declare the use to avoid strange effects where the output of this pass
is different depending on what other passes are scheduled after it.
Differential Revision: https://reviews.llvm.org/D129555
Remove the bound in the definition, since it's not guaranteed and could
provide a false sense of security. (I'd be inclined to go further and
change this to a pointer parameter, since that's what it really is, but
figured I'd preserve some of the author's intent here.)
This change replaces the C++ predicates with the HasNoUse builtin
predicate that would enable the no-ret atomic op selection in
GlobalISel.
Differential Revision: https://reviews.llvm.org/D125213
This patch removes the predicate for return atomic ops and uses
AddedComplexity to distinguish their selection from the no-return variants.
This will produce better matchers that don't unnecessarily check for
the negated predicate if the initial predicate failed. It also
simplifies the enabling of no-return atomic op selection in GlobalISel.
Differential Revision: https://reviews.llvm.org/D128241
This patch adds support for `fmax` and `fmin` operations in the `atomicrmw`
instruction. For now (at least in this patch), the instruction will be expanded
to a CAS loop. There are already a couple of targets supporting the feature;
I'll create follow-up patches to enable them accordingly.
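A minimal sketch of the CAS-loop shape in C++ (the real expansion is emitted
as IR by the AtomicExpand pass; std::atomic is used here only for illustration):
#include <atomic>
#include <cmath>
float atomic_fmax(std::atomic<float> &A, float V) {
  float Old = A.load();
  // compare_exchange_weak reloads Old on failure, so the max is recomputed
  // each iteration until the exchange succeeds.
  while (!A.compare_exchange_weak(Old, std::fmax(Old, V)))
    ;
  return Old;
}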
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D127041
This patch removes a bit of code duplication and
moves the v_cmpx optimization out of the
runOnMachineFunction pass.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D129086
Merge tests and fixes from D128110 and D128315 on top of already
committed D128800.
Original author: arsenm
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D128882
Modifies the GCNDPPCombine pass to enable DPP formation for the new DPP
instruction in gfx11, namely VOP3 encoded instructions with DPP and VOPC
with DPP.
Depends on D128656
Reviewed By: #amdgpu, rampitec
Differential Revision: https://reviews.llvm.org/D128682
We form VOPD instructions in the GCNCreateVOPD pass by combining
back-to-back component instructions. There are strict register
constraints for creating a legal VOPD, namely that the matching operands
(e.g. src0x and src0y, src1x and src1y) must be in different register
banks. We add a PostRA scheduler
mutation to put possible VOPD components back-to-back.
Depends on D128442, D128270
Reviewed By: #amdgpu, rampitec
Differential Revision: https://reviews.llvm.org/D128656
Tell the matcher what we are looking for instead of matching everything
and then discarding the result if it doesn't fit.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D128171