llvm-project

Commit Graph

Author	SHA1	Message	Date
Kazu Hirata	6d9cd9199a	Use llvm::all_of (NFC)	2022-08-14 16:25:36 -07:00
Kazu Hirata	109df7f9a4	[llvm] Qualify auto in range-based for loops (NFC) Identified with readability-qualified-auto.	2022-08-13 12:55:42 -07:00
David Stuttard	1d1cc05539	AMDGPU: mbcnt allow for non-zero src1 for known-bits Src1 for mbcnt can be a non-zero literal or register. Take this into account when calculating known bits. Differential Revision: https://reviews.llvm.org/D131478	2022-08-11 13:23:43 +01:00
Evgenii Stepanov	8ea1cf3111	Revert "[AMDGPU] SIFixSGPRCopies refactoring" Breaks ASan tests. This reverts commit `3f8ae7efa8`.	2022-08-10 11:32:46 -07:00
Venkata Ramanaiah Nalamothu	486594119d	[AMDGPU] Fix prologue/epilogue markers in .debug_line table for trivial functions All the prologue instructions should have unknown source location co-ordinates while the epilogue instructions should have source location of last non-debug instruction after which epilogue instructions are inserted. This ensures the prologue/epilogue markers are generated correctly in the line table. Changes are brought in from the downstream CFI patches. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D131485	2022-08-10 23:00:19 +05:30
alex-t	3f8ae7efa8	[AMDGPU] SIFixSGPRCopies refactoring This change finalizes the series of patches aiming to replace old strategy of VGPR to SGPR copies lowering. Following the https://reviews.llvm.org/D128252 and https://reviews.llvm.org/D130367 code parts that are no longer used were removed. Pass main loop is no longer used for the MIR changes but collect information for further analysis. Actual MIR lowering happens further according to the analysis result in the set of separate functions. Another important change concerns the order of lowering: VGPR to SGPR copies lowering is done first to have priority on the rest of the MIR changes. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D131246	2022-08-10 00:51:57 +02:00
Yaxun (Sam) Liu	e780648a15	[AMDGPU] Unify unreachable intrinsics si-annotate-control-flow does depth first traversal of BB's of a function to insert amdgcn if intrinsics for conditional branches so that isel can generate correct instructions later. si-annotate-control-flow checks whether the successor BB for the 'else' branch of a conditional branch has been visited. If it has been visited, si-annotate-control-flow assumes the conditional branch has been handled and will not try to insert if intrinsic for it. This assumption is not correct when the IR contains multiple unreachable BB's. Then 'if' intrinsics are not inserted and incorrect ISA is generated. This patch fixes the issue by letting amdgpu-unify-divergent-exit-nodes unify unreachables even if they are uniformly reached. In this way the IR will not contain multiple exits, and structurizer is able to structurize the IR containing one unified exit. Reviewed by: Ruiling Song, Matt Arsenault Differential Revision: https://reviews.llvm.org/D131181 Fixes: SWDEV-343244	2022-08-09 10:23:32 -04:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Kazu Hirata	e20d210eef	[llvm] Qualify auto (NFC) Identified with readability-qualified-auto.	2022-08-07 23:55:27 -07:00
Kazu Hirata	ba0407ba86	[llvm] Use range-based for loops (NFC)	2022-08-07 00:16:21 -07:00
Kazu Hirata	d0ec61c9ff	[Target] Remove unused forward declarations (NFC)	2022-08-07 00:16:16 -07:00
Leon Clark	6a275cd53c	Transform illegal intrinsics to V_ILLEGAL Related tasks: - SWDEV-240194 - SWDEV-309417 - SWDEV-334876 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D123693	2022-08-06 08:59:00 +01:00
Mirko Brkusanin	19bb535ed9	[AMDGPU] Remove unused MIMG tablegen variants There are no AMDGPUSampleVariant versions for _G16, it is treated more like a modifier for derivatives (_D) (also for intrinsics where it is overloaded type instead of part of intrinsic name) so we ended up making more variants for these instructions than we actually needed. 32-bit derivatives need 6 dwords at most, while 16-bit need 4 at most. Using same AMDGPUSampleVariant for both, we ended up creating 2 extra variants per instruction than were necessary. In total this deletes 260 unused tablegen records. Differential Revision: https://reviews.llvm.org/D131252	2022-08-05 15:30:47 +02:00
Mingming Liu	bc8f2f3649	[AArch64][TTI][NFC] Overload method 'getVectorInstrCost' to provide vector instruction itself, as a context information for cost estimation. 1) Overloaded (instruction-based) method is a wrapper around the current (opcode-based) method. 2) This patch also changes a few callsites (VectorCombine.cpp, SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method. 3) This is a split of D128302. Differential Revision: https://reviews.llvm.org/D131114	2022-08-04 12:58:25 -07:00
David Truby	9a976f3661	[llvm] Always use TargetConstant for FP_ROUND ISD Nodes This patch ensures consistency in the construction of FP_ROUND nodes such that they always use ISD::TargetConstant instead of ISD::Constant. This additionally fixes a bug in the AArch64 SVE backend where patterns were matching against TargetConstant nodes and sometimes failing when passed a Constant node. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D130370	2022-08-03 14:02:11 +01:00
Dmitry Preobrazhensky	05b3aadfff	[AMDGPU][MC][GFX11] Correct v_dot2_f16_f16 and v_dot2_bf16_bf16 Enable SGPRs for the following operands of these opcodes: - src operands of VOP3 variant. - src2 operand of DPP variants. Differential Revision: https://reviews.llvm.org/D130989	2022-08-03 15:08:23 +03:00
Dmitry Preobrazhensky	ae553f9e49	[AMDGPU][MC][GFX10] Correct encoding of VOP3 v_cmpx* opcodes Encode dst=EXEC but allow disassembler accept any dst value. Differential Revision: https://reviews.llvm.org/D130978	2022-08-03 15:03:44 +03:00
Austin Kerbow	3dfa562643	[AMDGPU] Add CL option for max-ilp scheduler. When compiling for multiple targets the scheduler that is selected via the -misched option is applied globally. This patch adds a target CL option instead. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D131022	2022-08-02 16:52:14 -07:00
Austin Kerbow	40eec27618	[AMDGPU] Add llvm_unreachable to switch statement added in `d7100b398`.	2022-08-02 13:45:38 -07:00
Austin Kerbow	d7100b398b	[AMDGPU] Add GCNMaxILPSchedStrategy Creates a new scheduling strategy that attempts to maximize ILP for a single wave. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D130869	2022-08-02 13:21:24 -07:00
Alexander Timofeev	a321d95b59	[AMDGPU] avoid blind converting to VALU REG_SEQUENCE and PHIs In the `2e29b0138c` we introduce a specific solving algorithm that analyzes the VGPR to SGPR copies use chains and either lowers the copy to v_readfirstlane_b32 or converts the whole chain to VALU forms. At the same time we still have the code that blindly converts to VALU REG_SEQUENCE and PHIs in case they produce SGPR but have VGPRs input operands. In case the REG_SEQUENCE and PHIs are in the VGPR to SGPR copy use chain, and this chain was considered long enough to convert copy to v_readfirstlane_b32, further lowering them to VALU leads to several kinds of issues. At first, we have v_readfirstlane_b32 which is completely useless because most parts of its use chain were moved to VALU forms. Second, we may encounter subtle bugs related to the EXEC-dependent CF because of the weird mixing of SALU and VALU instructions. This change removes the code that moves REG_SEQUENCE and PHIs to VALU. Instead, we use the fact that both REG_SEQUENCE and PHIs have copy semantics. That is, if they define SGPR but have VGPR inputs, we insert VGPR to SGPR copies to make them pure SGPR. Then, the new copies are processed by the common VGPR to SGPR lowering algorithm. This is Part 2 in the series of commits aiming at the massive refactoring of the SIFixSGPRCopies pass. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D130367	2022-08-02 18:37:57 +02:00
Jay Foad	e301e071ba	[AMDGPU] Remove IR SpeculativeExecution pass from codegen pipeline This pass seems to have very little effect because all it does is hoist some instructions, but it is followed later in the codegen pipeline by the IR CodeSinking pass which does the opposite. Differential Revision: https://reviews.llvm.org/D130258	2022-08-02 17:35:20 +01:00
Jay Foad	c24d68fff1	[AMDGPU] Take advantage of VOP3 literals in convertToThreeAddress This improves a corner case where v_fmac can be converted to v_fma on GFX10+ even if it has a literal operand. Differential Revision: https://reviews.llvm.org/D130992	2022-08-02 17:27:11 +01:00
Vang Thao	7fc52d7c8b	[AMDGPU] Fix DGEMM hazard for GFX90a For VALU write and memory (VM, L/DS, FLAT) instructions, SQ would insert wait-states to avoid data hazard. However when there is a DGEMM instruction in-between them, SQ incorrectly disables the wait-states thus the data hazard needs to be handled with this workaround. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D130677	2022-08-01 11:56:22 -07:00
Piotr Sobczak	f29a19b0b8	[AMDGPU] Extend cases for ReadM0MovRelInterpHazard Extend hazard recognizer of ReadM0MovRelInterpHazard with DS_READ_ADDTID and DS_WRITE_ADDTID, as they also require a manually inserted S_NOP after SALU writing m0. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D130783	2022-08-01 17:59:33 +02:00
Dmitry Preobrazhensky	3aae8cd842	[AMDGPU][MC] Verify selection of LDS MUBUF opcodes Differential Revision: https://reviews.llvm.org/D130761	2022-08-01 16:44:39 +03:00
Dmitry Preobrazhensky	bb901dcc5a	[AMDGPU][MC][GFX940] Correct disassembly of MFMA opcodes Add a decoder table for GFX940 MFMA opcodes. Differential Revision: https://reviews.llvm.org/D130759	2022-08-01 16:00:47 +03:00
Pierre van Houtryve	a847e3dc52	[NFC][AMDGPU] Fix typo in SIRegisterInfo.cpp	2022-08-01 07:01:33 -04:00
Petar Avramovic	e8d260753e	[AMDGPU] gfx11 allow dlc for MUBUF atomics Add MC support for dlc in gfx11 MUBUF atomic instructions. Differential Revision: https://reviews.llvm.org/D129075	2022-08-01 12:18:01 +02:00
Austin Kerbow	7898426a72	[AMDGPU] Remove unused function	2022-07-30 07:47:35 -07:00
Simon Pilgrim	49c0980eac	Fix Wdocumentation warning. NFC. warning: '\returns' command used in a comment that is attached to a function returning void	2022-07-30 15:41:13 +01:00
Simon Pilgrim	276480b1d3	[AMDGPU] Fix \|\| vs && precedence warning. NFC.	2022-07-30 14:02:54 +01:00
Carl Ritson	4c4db81630	[AMDGPU] Extend SILoadStoreOptimizer to s_load instructions Apply merging to s_load as is done for s_buffer_load. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D130742	2022-07-30 11:38:39 +09:00
Austin Kerbow	2c82a126d7	[AMDGPU] Omit unnecessary waitcnt before barriers It is not necessary to wait for all outstanding memory operations before barriers on hardware that can back off of the barrier in the event of an exception when traps are enabled. Add a new subtarget feature which tracks which HW has this ability. Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D130722	2022-07-29 11:12:36 -07:00
Mirko Brkusanin	6a1aa627fa	[AMDGPU] Enable image_gather4h instruction for gfx10 and gfx11 Differential Revision: https://reviews.llvm.org/D130764	2022-07-29 15:42:06 +02:00
Jay Foad	3cfa9b1431	[AMDGPU] user-sgpr-init16-bug does not apply to gfx1103 Differential Revision: https://reviews.llvm.org/D130347	2022-07-29 14:21:13 +01:00
Matt Arsenault	ef906f287e	AMDGPU: Fix assertion when printing unreachable functions Since `814a0abcce`, this would break if we had a function in the module that becomes dead in any codegen IR pass. The function wasn't deleted since it was initially used in dead code, but is detached from the call graph and doesn't appear in the PO traversal. Do a second walk over the module to populate the resources of any functions which weren't already processed.	2022-07-29 08:57:43 -04:00
Alexander Timofeev	d7ae1a9097	Revert "[AMDGPU] avoid blind converting to VALU REG_SEQUENCE and PHIs" This reverts commit `76d9ae924c`. because it causes several VK CTS tests to fail	2022-07-29 14:19:07 +02:00
Changpeng Fang	2b731b30a7	AMDGPU: Take care of "tied" operand when removeOperand Summary: Flat scratch load of D16 type by default has tied vdst_in operand (with vdst). This should be taken care of at the time of "removeOperand" in eliminateFrameIndex. Otherwise we will hit an assert saying "Cannot move tied operands". This patch unties vdst_in before the move, and retie it with vdst afterwards. Reviewers: arsenm, foad Differential Revision: https://reviews.llvm.org/D130537	2022-07-28 17:30:49 -07:00
Anshil Gandhi	5c38056431	[AMDGPU][Scheduler] Avoid initializing Register pressure tracker when tracking is disabled When register pressure tracking is disabled, the scheduler attempts to load pressures at SReg_32 and VGPR_32. This causes an index out of bounds error. This patch fixes this issue by disabling the initialization of RPTracker when not needed. NFC Reviewed By: rampitec, kerbowa, arsenm Differential Revision: https://reviews.llvm.org/D129322	2022-07-28 15:39:28 -06:00
Austin Kerbow	0f93a45b11	[AMDGPU] Add isMeta flag to SCHED_GROUP_BARRIER	2022-07-28 11:04:33 -07:00
Austin Kerbow	f5b21680d1	[AMDGPU] Add amdgcn_sched_group_barrier builtin This builtin allows the creation of custom scheduling pipelines on a per-region basis. Like the sched_barrier builtin this is intended to be used either for testing, in situations where the default scheduler heuristics cannot be improved, or in critical kernels where users are trying to get performance that is close to handwritten assembly. Obviously using these builtins will require extra work from the kernel writer to maintain the desired behavior. The builtin can be used to create groups of instructions called "scheduling groups" where ordering between the groups is enforced by the scheduler. __builtin_amdgcn_sched_group_barrier takes three parameters. The first parameter is a mask that determines the types of instructions that you would like to synchronize around and add to a scheduling group. These instructions will be selected from the bottom up starting from the sched_group_barrier's location during instruction scheduling. The second parameter is the number of matching instructions that will be associated with this sched_group_barrier. The third parameter is an identifier which is used to describe what other sched_group_barriers should be synchronized with. Note that multiple sched_group_barriers must be added in order for them to be useful since they only synchronize with other sched_group_barriers. Only "scheduling groups" with a matching third parameter will have any enforced ordering between them. As an example, the code below tries to create a pipeline of 1 VMEM_READ instruction followed by 1 VALU instruction followed by 5 MFMA instructions... 
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 1 VALU
__builtin_amdgcn_sched_group_barrier(2, 1, 0)
// 5 MFMA
__builtin_amdgcn_sched_group_barrier(8, 5, 0)
// 1 VMEM_READ
__builtin_amdgcn_sched_group_barrier(32, 1, 0)
// 3 VALU
__builtin_amdgcn_sched_group_barrier(2, 3, 0)
// 2 VMEM_WRITE
__builtin_amdgcn_sched_group_barrier(64, 2, 0)
Reviewed By: jrbyrnes Differential Revision: https://reviews.llvm.org/D128158	2022-07-28 10:43:14 -07:00
Alexander Timofeev	76d9ae924c	[AMDGPU] avoid blind converting to VALU REG_SEQUENCE and PHIs In the `2e29b0138c` we introduce a specific solving algorithm that analyzes the VGPR to SGPR copies use chains and either lowers the copy to v_readfirstlane_b32 or converts the whole chain to VALU forms. At the same time we still have the code that blindly converts to VALU REG_SEQUENCE and PHIs in case they produce SGPR but have VGPRs input operands. In case the REG_SEQUENCE and PHIs are in the VGPR to SGPR copy use chain, and this chain was considered long enough to convert copy to v_readfirstlane_b32, further lowering them to VALU leads to several kinds of issues. At first, we have v_readfirstlane_b32 which is completely useless because most parts of its use chain were moved to VALU forms. Second, we may encounter subtle bugs related to the EXEC-dependent CF because of the weird mixing of SALU and VALU instructions. This change removes the code that moves REG_SEQUENCE and PHIs to VALU. Instead, we use the fact that both REG_SEQUENCE and PHIs have copy semantics. That is, if they define SGPR but have VGPR inputs, we insert VGPR to SGPR copies to make them pure SGPR. Then, the new copies are processed by the common VGPR to SGPR lowering algorithm. This is Part 2 in the series of commits aiming at the massive refactoring of the SIFixSGPRCopies pass. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D130367	2022-07-28 14:30:29 +02:00
Dmitry Preobrazhensky	2b230d69ad	[AMDGPU][MC][GFX90A] Correct MIMG dst size validation Correct validator to enable MIMG dst size checks. Differential Revision: https://reviews.llvm.org/D130512	2022-07-28 14:30:08 +03:00
Dmitry Preobrazhensky	fa7fd8ec31	[AMDGPU][MC][GFX11] Disable SGPRs for src1 of v_fma_mix*_dpp opcodes Differential Revision: https://reviews.llvm.org/D130634	2022-07-28 14:20:05 +03:00
Austin Kerbow	ba0d079c7a	[AMDGPU] Aggressively schedule to reduce RP in occupancy limited regions By not clustering loads and adjusting heuristics to more aggressively reduce register pressure we may be able to increase occupancy for the function if it was dropped in a first pass scheduling. Similarly, try to reduce spilling if register usage exceeds lower bound occupancy. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D130329	2022-07-27 22:34:37 -07:00
Carl Ritson	dbda30e294	[AMDGPU][SIFoldOperands] Clear kills when folding COPY Clear all kill flags on source register when folding a COPY. This is necessary because the kills may now be out of order with the uses. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D130622	2022-07-28 11:57:55 +09:00
Stanislav Mekhanoshin	68901fdbeb	[AMDGPU] Consider S_SETPRIO a scheduling boundary The instruction is used to modify wave priority with the intent to affect VALU execution and currently we can reschedule VALU around it since that VALU does not have side effects. Differential Revision: https://reviews.llvm.org/D130654	2022-07-27 11:50:23 -07:00
Eli Friedman	1a6d82b93f	Fix misc uses of "long" variables to use "int64_t". I don't have any evidence these particular uses are actually causing any issues, but we should avoid accidentally truncating immediate values depending on the host.	2022-07-27 09:47:19 -07:00
Dmitri Gribenko	b435da027d	[amdgpu][nfc] Fix build with a certain Clang version It errors out in the Bazel CI: AMDGPULowerModuleLDSPass.cpp:384:12: error: chosen constructor is explicit in copy-initialization return {SGV, std::move(Map)}; Reviewed By: rupprecht Differential Revision: https://reviews.llvm.org/D130623	2022-07-27 17:29:36 +02:00
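
Note on f5b21680d1 (amdgcn_sched_group_barrier): the three parameters described in that commit message (instruction mask, group size, sync ID) can be modelled on the host to check what pipeline a sequence of calls asks for. The following is only a rough sketch, not LLVM code. The mask values 2, 8, 32 and 64 are taken from the comments in the commit's example; the names `SchedMask`, `SchedGroup`, `maskName`, `examplePipeline` and `describePipeline` are hypothetical and exist only for illustration. It does not call the real builtin, which needs an AMDGPU target.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Mask values copied from the comments in the commit message's example.
// The enumerator names are illustrative labels, not LLVM identifiers.
enum SchedMask : std::uint32_t {
  kVALU = 2,
  kMFMA = 8,
  kVMEMRead = 32,
  kVMEMWrite = 64,
};

// One sched_group_barrier call: (mask, number of instructions, sync ID).
struct SchedGroup {
  std::uint32_t mask;
  std::uint32_t size;
  std::uint32_t syncId;
};

// Human-readable label for the four mask values used in the example.
std::string maskName(std::uint32_t mask) {
  switch (mask) {
  case kVALU:
    return "VALU";
  case kMFMA:
    return "MFMA";
  case kVMEMRead:
    return "VMEM_READ";
  case kVMEMWrite:
    return "VMEM_WRITE";
  default:
    return "UNKNOWN";
  }
}

// The six groups from the commit message, in source order.
std::vector<SchedGroup> examplePipeline() {
  return {
      {kVMEMRead, 1, 0}, {kVALU, 1, 0}, {kMFMA, 5, 0},
      {kVMEMRead, 1, 0}, {kVALU, 3, 0}, {kVMEMWrite, 2, 0},
  };
}

// Render the requested pipeline as "<count> <kind> -> <count> <kind> ...".
// Per the commit message, only groups sharing a sync ID are ordered against
// each other, so this helper requires one ID across the whole pipeline.
std::string describePipeline(const std::vector<SchedGroup> &groups) {
  std::string out;
  for (const SchedGroup &g : groups) {
    assert(g.syncId == groups.front().syncId && "mixed sync IDs");
    if (!out.empty())
      out += " -> ";
    out += std::to_string(g.size) + " " + maskName(g.mask);
  }
  return out;
}
```

Calling `describePipeline(examplePipeline())` yields the pipeline string for the six calls in the commit message, which can be compared against the intended ordering before the builtins are placed in a kernel.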


7162 Commits