llvm-project

Commit Graph

Author	SHA1	Message	Date
Stanislav Mekhanoshin	961e4384f4	[AMDGPU] Support SCC on buffer atomics Differential Revision: https://reviews.llvm.org/D98731	2021-03-18 09:56:14 -07:00
Stanislav Mekhanoshin	3f37c28230	[AMDGPU] Remove unused template parameters of MUBUF_Real_AllAddr_vi Differential Revision: https://reviews.llvm.org/D98804	2021-03-18 09:02:38 -07:00
Jon Chesterfield	253f804deb	[amdgpu] Update med3 combine to skip i64 Fixes an assumption that a type which is not i32 will be i16. This asserts when trying to sign/zero extend an i64 to i32. Test case was cut down from an openmp application. Variations on it are hit by other combines before reaching the problematic one, e.g. replacing the immediate values with other function arguments changes the codegen path and misses this combine. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D98872	2021-03-18 15:56:41 +00:00
Matt Arsenault	b9a0384983	GlobalISel: Preserve source value information for outgoing byval args Pass through the original argument IR value in order to preserve the aliasing information in the memcpy memory operands.	2021-03-18 09:16:54 -04:00
Carl Ritson	1a4bc3aba3	[AMDGPU] Avoid unnecessary graph visits during WQM marking Avoid revisiting nodes with the same set of defined lanes by using a unified visited set which integrates lanes into the key. This retains the intent of the original code by still revisiting a subgraph if a different set of lanes is defined and hence marking might progress differently. Note: default size of the visited set has been confirmed to cover >99% of invocations in large array of test shaders. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D98772	2021-03-18 10:00:41 +09:00
David Green	e2935dcfc4	[TTI] Add a Mask to getShuffleCost This adds an Mask ArrayRef to getShuffleCost, so that if an exact mask can be provided a more accurate cost can be provided by the backend. For example VREV costs could be returned by the ARM backend. This should be an NFC until then, laying the groundwork for that to be added. Differential Revision: https://reviews.llvm.org/D98206	2021-03-17 17:46:26 +00:00
Jay Foad	967b64beb4	[AMDGPU] Split dot2-insts feature Split out some of the instructions predicated on the dot2-insts target feature into a new dot7-insts, in preparation for subtargets that have some but not all of these instructions. NFCI. Differential Revision: https://reviews.llvm.org/D98717	2021-03-17 09:42:21 +00:00
RamNalamothu	43f2d269b3	[AMDGPU, NFC] Refactor FP/BP spill index code in emitPrologue/emitEpilogue Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D98617	2021-03-16 19:19:45 +05:30
Dmitry Preobrazhensky	596db9934b	[AMDGPU][MC] Disabled lds_direct for GFX90a Fixed bug 49382. Differential Revision: https://reviews.llvm.org/D98626	2021-03-16 13:52:36 +03:00
Stanislav Mekhanoshin	bc27a31801	[AMDGPU] Fix copyPhysReg to not produce unaligned vgpr access RA can insert something like a sub1_sub2 COPY of a wide VGPR tuple which results in an unaligned access with v_pk_mov_b32 after the copy is expanded. This is a regression after D97316. Differential Revision: https://reviews.llvm.org/D98549	2021-03-15 14:14:30 -07:00
Stanislav Mekhanoshin	c297709ee1	[AMDGPU] Fixed msan failure with uninitialized value	2021-03-15 13:58:19 -07:00
Stanislav Mekhanoshin	3bffb1cd0e	[AMDGPU] Use single cache policy operand Replace individual operands GLC, SLC, and DLC with a single cache_policy bitmask operand. This will reduce the number of operands in MIR and I hope the amount of code. These operands are mostly 0 anyway. Additional advantage that parser will accept these flags in any order unlike now. Differential Revision: https://reviews.llvm.org/D96469	2021-03-15 13:00:59 -07:00
Jon Chesterfield	13e49dcee4	[amdgpu] Implement lower function LDS pass Local variables are allocated at kernel launch. This pass collects global variables that are used from non-kernel functions, moves them into a new struct type, and allocates an instance of that type in every kernel. Uses are then replaced with a constantexpr offset. Prior to this pass, accesses from a function are compiled to trap. With this pass, most such accesses are removed before reaching codegen. The trap logic is left unchanged by this pass. It is still reachable for the cases this pass misses, notably the extern shared construct from hip and variables marked constant which survive the optimizer. This is of interest to the openmp project because the deviceRTL runtime library uses cuda shared variables from functions that cannot be inlined. Trunk llvm therefore cannot compile some openmp kernels for amdgpu. In addition to the unit tests attached, this patch applied to ROCm llvm with fixed-abi enabled and the function pointer hashing scheme deleted passes the openmp suite. This lowering will use more LDS than strictly necessary. It is intended to be a functionally correct fallback for cases that are difficult to target from future optimisation passes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D94648	2021-03-15 15:24:01 +00:00
Carl Ritson	13877db2fa	[AMDGPU] Fix shortfalls in WQM marking When tracking defined lanes through phi nodes in the live range graph each branch of the phi must be handled independently. Also rewrite the marking algorithm to reduce unnecessary operations. Previously a shared set of defined lanes was used which caused marking to stop prematurely. This was observable in existing lit tests, but test patterns did not cover this detail. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D98614	2021-03-15 21:44:15 +09:00
Jay Foad	5d48b45ce3	[AMDGPU] Use depth first iterator instead of recursive DFS. NFCI. The reason for this is to avoid deep recursion in DFS() which can cause stack overflow on large CFGs, especially on Windows. Differential Revision: https://reviews.llvm.org/D98528	2021-03-15 10:32:55 +00:00
Stanislav Mekhanoshin	315ebe0df3	[AMDGPU] Fix getAlignedAGPRClassID Not all register classes were listed. Differential Revision: https://reviews.llvm.org/D98550	2021-03-12 14:41:50 -08:00
Matt Arsenault	6b76d82853	GlobalISel: Fix marking byval arguments as immutable byval arguments need to be assumed writable. Only implicitly stack passed arguments which aren't addressable in the IR can be assumed immutable. Mips is still broken since for some reason its doing its own thing with the ValueHandlers (and x86 doesn't actually handle byval arguments now, although some of the code is there).	2021-03-12 09:01:53 -05:00
Matt Arsenault	3231d2b581	AMDGPU/GlobalISel: Cleanup call lowering sequence Now that handleAssignments is handling all of the argument splitting, we don't have to move the insert point around.	2021-03-12 09:01:52 -05:00
Carl Ritson	f08dadd242	[AMDGPU] Do not annotate an else branch if there is a kill As llvm.amdgcn.kill is lowered to a terminator it can cause else branch annotations to end up in the wrong block. Do not annotate conditionals as else branches where there is a kill to avoid this. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D97427	2021-03-12 11:52:08 +09:00
Carl Ritson	c07f2025e4	[AMDGPU] Restrict image_msaa_load to MSAA dimension types This instruction is only valid on 2D MSAA and 2D MSAA Array surfaces. Remove intrinsic support for other dimension types, and block assembly for unsupported dimensions. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D98397	2021-03-12 09:47:24 +09:00
Ruiling Song	e8e6817d00	[AMDGPU] Don't check hasStackObjects() when reserving VGPR We have amdgpu_gfx functions that have high register pressure. If we do not reserve VGPR for SGPR spill, we will fall into the path to spill the SGPR to memory, which does not only have correctness issue, but also have really bad performance. I don't know why there is the check for hasStackObjects(), in our case, we don't have stack objects at the time of finalizeLowering(). So just remove the check that we always reserve a VGPR for possible SGPR spill in non-entry functions. Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D98345	2021-03-12 08:11:14 +08:00
Ruiling Song	4cee5cad28	[AMDGPU] Free reserved VGPR if no SGPR spill I met some code generation behavior change when I tried to remove the hasStackObject() check when reserving VGPR for SGPR spill. For example, the function `callee_no_stack_no_fp_elim_all` in the lit test file `callee-frame-setup.ll`. The generated code changed from: ``` s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) s_mov_b32 s4, s33 s_mov_b32 s33, s32 s_mov_b32 s33, s4 s_setpc_b64 s[30:31] ``` into something like: ``` s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) v_writelane_b32 v63, s33, 0 s_mov_b32 s33, s32 v_readlane_b32 s33, v63, 0 s_setpc_b64 s[30:31] ``` I think we still prefer the old version where only scalar instructions are needed. The idea here is free the reserved VGPR if no SGPR spills. So we will very likely to use a free SGPR for fp/sp spill. Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D98344	2021-03-12 08:11:14 +08:00
Stanislav Mekhanoshin	6e8a0213a3	[AMDGPU] Remove dead MTBUF patterns These patterns are obviously dead, they are using format operand which is not selected and we have no corresponding SelectMUBUF() function. Differential Revision: https://reviews.llvm.org/D98451	2021-03-11 14:13:00 -08:00
Matt Arsenault	70cb57d7da	AMDGPU/GlobalISel: Improve private addressing mode matching This enables the look-through-copy to hack around not correctly regbankselecting constants to match the use bank.	2021-03-11 10:23:35 -05:00
Nikita Popov	46354bac76	[OpaquePtrs] Remove some uses of type-less CreateLoad APIs (NFC) Explicitly pass loaded type when creating loads, in preparation for the deprecation of these APIs. There are still a couple of uses left.	2021-03-11 14:40:57 +01:00
Ruiling Song	66340846b3	[AMDGPU] Always create Stack Object for reserved VGPR As we may overwrite inactive lanes of a caller-save-vgpr, we should always save/restore the reserved vgpr for sgpr spill. Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D98319	2021-03-11 10:06:07 +08:00
Stanislav Mekhanoshin	9931b1f7a4	[AMDGPU] Disable SCC bit on fp atomics Differential Revision: https://reviews.llvm.org/D98221	2021-03-10 12:36:09 -08:00
Stanislav Mekhanoshin	574a9dabc6	[AMDGPU] Always expand system scope fp atomics on gfx90a FP atomics in system scope cannot be used and shall always be expanded in a CAS loop. Differential Revision: https://reviews.llvm.org/D98085	2021-03-10 12:35:23 -08:00
Jay Foad	70f013fd3b	[AMDGPU] Fix isReallyTriviallyReMaterializable for V_MOV_* D57708 changed SIInstrInfo::isReallyTriviallyReMaterializable to reject V_MOVs with extra implicit operands, but it accidentally rejected all V_MOVs because of their implicit use of exec. Fix it but avoid adding a moderately expensive call to MI.getDesc().getNumImplicitUses(). In real graphics shaders this changes quite a few vgpr copies into move- immediates, which is good for avoiding stalls on GFX10. Differential Revision: https://reviews.llvm.org/D98347	2021-03-10 16:18:12 +00:00
Jay Foad	288ea820cf	[AMDGPU] Refactor AMDGPUTargetStreamer::EmitCodeEnd Refactor and add comments to explain where the magic numbers come from in terms of the instruction cache line size. NFC. Differential Revision: https://reviews.llvm.org/D98266	2021-03-09 19:02:18 +00:00
Christudasan Devadasan	24c0ad7143	[AMDGPU] Fix the dead frame indices during custom spill lowering. AMDGPU target tries to handle the SGPR and VGPR spills in a custom pass before the actual frame lowering pass. Once they are handled and the respective frames are eliminated in the custom pass, certain uses of them still remain. For instance, the DBG_VALUE instructions inserted by the allocator alongside the spill instruction will use the corresponding frame index. They become dead later during PEI and cause a crash while trying to replace the frame indices. We should possibly avoid this custom pass. For now, replacing such dead references with null register value. Reviewed By: arsenm, scott.linder Differential Revision: https://reviews.llvm.org/D98038	2021-03-09 23:22:49 +05:30
Ruiling Song	67a05f4e09	[AMDGPU] Remove unused function opcodeEmitsNoInsts() This was missed in the patch D97545, and cause buildbot failure. Reviewed by: critson Differential Revision: https://reviews.llvm.org/D98229	2021-03-09 10:48:30 +08:00
Ruiling Song	f0ccdde3c9	[AMDGPU] Remove SI_MASK_BRANCH This is already deprecated, so remove code working on this. Also update the tests by using S_CBRANCH_EXECZ instead of SI_MASK_BRANCH. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D97545	2021-03-09 09:13:23 +08:00
Jay Foad	99682bc039	Revert "Revert "[AMDGPU] Restore the s_memtime instruction in gfx1030"" This reverts commit `e58d68fcd0`. This reinstates commit `fc28f600e5` with a fix to initialize HasShaderCyclesRegister. See https://reviews.llvm.org/D97928.	2021-03-06 09:00:01 +00:00
Mitch Phillips	e58d68fcd0	Revert "[AMDGPU] Restore the s_memtime instruction in gfx1030" Broke the ASan/MSan buildbots. See more comments in the original patch, https://reviews.llvm.org/D97928. Build failure at http://lab.llvm.org:8011/#/builders/5/builds/5327 This reverts commit `fc28f600e5`.	2021-03-05 18:24:59 -08:00
Jay Foad	fc28f600e5	[AMDGPU] Restore the s_memtime instruction in gfx1030 gfx1030 added a new way to implement readcyclecounter using the SHADER_CYCLES hardware register, but the s_memtime instruction still exists, so the MC layer should still accept it and the llvm.amdgcn.s.memtime intrinsic should still work. Differential Revision: https://reviews.llvm.org/D97928	2021-03-05 20:19:11 +00:00
RamNalamothu	3998a8e797	[AMDGPU] Do not attempt sgpr spills to vgpr, when it is disabled This covers a path missed in https://reviews.llvm.org/D95768. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D98013	2021-03-05 22:47:21 +05:30
Sebastian Neubauer	e0e73714fb	[AMDGPU] Keep skip branch for ds instructions Same as other memory instructions, ds instructions add latency even if exec is zero. Jumping over them if exec=0 is cheaper than executing them. With this change, the branch instruction that skips over a basic block if exec=0 is not removed when the block contains a ds instruction. Differential Revision: https://reviews.llvm.org/D97922	2021-03-05 12:34:09 +01:00
Petar Avramovic	36beaa3ba3	Reland AMDGPU/GlobalISel: Combine zext(trunc x) to x after RegBankSelect Recommit `bf5a582650`. Depends on `4c8fb7ddd6` which was reverted. RegBankSelect creates zext and trunc when it selects banks for uniform i1. Add zext_trunc_fold from generic combiner to post RegBankSelect combiner. Differential Revision: https://reviews.llvm.org/D95432	2021-03-05 11:05:37 +01:00
Jay Foad	ed7458398a	[AMDGPU] Don't check for VMEM hazards on GFX10 The hazard where a VMEM reads an SGPR written by a VALU counts as a data dependency hazard, so no nops are required on GFX10. Tested with Vulkan CTS on GFX10.1 and GFX10.3. Differential Revision: https://reviews.llvm.org/D97926	2021-03-04 21:44:56 +00:00
Nico Weber	e68de60bc4	Revert "AMDGPU/GlobalISel: Combine zext(trunc x) to x after RegBankSelect" This reverts commit `bf5a582650`. Also depends on now-reverted `4c8fb7ddd6`	2021-03-04 10:16:11 -05:00
Petar Avramovic	bf5a582650	AMDGPU/GlobalISel: Combine zext(trunc x) to x after RegBankSelect RegBankSelect creates zext and trunc when it selects banks for uniform i1. Add zext_trunc_fold from generic combiner to post RegBankSelect combiner. Differential Revision: https://reviews.llvm.org/D95432	2021-03-04 15:05:24 +01:00
Stanislav Mekhanoshin	b70c483e04	[AMDGPU] Exclude always_inline from max bb threshold Honor always_inline attribute when processing -amdgpu-inline-max-bb. It was lost during the ports of the heuristic. There is no reason to honor inline hint, but not always inline. Differential Revision: https://reviews.llvm.org/D97790	2021-03-03 10:21:56 -08:00
Matt Arsenault	78dcff4841	GlobalISel: Add default implementation of assignValueToReg Refactor insertion of the asserting ops. This enables using them for AMDGPU. This code should essentially be the same for every target. Mips, X86 and ARM all have different code there now, but this seems to be an accident. The assignment functions are called with different types than they would be in the DAG, so this is all likely an assortment of hacks to get around that.	2021-03-03 09:29:53 -05:00
Piotr Sobczak	4672bac177	[AMDGPU] Introduce Strict WQM mode * Add amdgcn_strict_wqm intrinsic. * Add a corresponding STRICT_WQM machine instruction. * The semantic is similar to amdgcn_strict_wwm with a notable difference that not all threads will be forcibly enabled during the computations of the intrinsic's argument, but only all threads in quads that have at least one thread active. * The difference between amdgcn_wqm and amdgcn_strict_wqm is that in the strict mode an inactive lane will always be enabled irrespective of control flow decisions. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D96258	2021-03-03 14:19:16 +01:00
Piotr Sobczak	c3ce7bae80	[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm * Introduce the new intrinsic amdgcn_strict_wwm * Deprecate the old intrinsic amdgcn_wwm The change is done for consistency as the "strict" prefix will become an important, distinguishing factor between amdgcn_wqm and amdgcn_strict_wqm in the future. The "strict" prefix indicates that inactive lanes do not take part in control flow, specifically an inactive lane enabled by a strict mode will always be enabled irrespective of control flow decisions. The amdgcn_wwm will be removed, but doing so in two steps gives users time to switch to the new name at their own pace. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D96257	2021-03-03 09:33:57 +01:00
Carl Ritson	2ddac69f98	[AMDGPU] Rename llvm.amdgcn.msaa.load to llvm.amdgcn.msaa.load.x While the underlying instruction is called image_msaa_load, the resource must be x component only. Rename the intrinsic for clarity. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D97829	2021-03-03 17:30:39 +09:00
Matt Arsenault	fd82cbcf7d	GlobalISel: Merge and cleanup more AMDGPU call lowering code This merges more AMDGPU ABI lowering code into the generic call lowering. Start cleaning up by factoring away more of the pack/unpack logic into the buildCopy{To|From}Parts functions. These could use more improvement, and the SelectionDAG versions are significantly more complex, and we'll eventually have to emulate all of those cases too. This is mostly NFC, but does result in some minor instruction reordering. It also removes some of the limitations with mismatched sizes the old code had. However, similarly to the merge on the input, this is forcing gfx6/gfx7 to use the gfx8+ ABI (which is what we actually want, but SelectionDAG is stuck using the weird emergent ABI). This also changes the load/store size for stack passed EVTs for AArch64, which makes it consistent with the DAG behavior.	2021-03-02 17:31:13 -05:00
Amara Emerson	8a316045ed	[AArch64][GlobalISel] Enable use of the optsize predicate in the selector. To do this while supporting the existing functionality in SelectionDAG of using PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis dependencies to the instruction selector pass. Then, use the predicate to generate constant pool loads for f32 materialization, if we're targeting optsize/minsize. Differential Revision: https://reviews.llvm.org/D97732	2021-03-02 12:55:51 -08:00
Joe Nash	5531f24cc2	[AMDGPU] Make OMod explicit for V_CVT_{U,I}* Make OMod explicit instead of implied by HasModifiers in the operand list. Requires explicitly setting HasOMod=1 for irregular OMod usage in instruction V_CVT_{U,I}* Reviewed By: foad Differential Revision: https://reviews.llvm.org/D97587 Change-Id: I230e1476f529e816eec60e242531f23a99e3839f	2021-03-02 13:32:06 -05:00

5803 Commits