llvm-project

Commit Graph

Author	SHA1	Message	Date
Ruiling Song	cf14c7caac	AMDGPU: Add a pass to rewrite certain undef in PHI For the pattern of IR (%if terminates with a divergent branch.), divergence analysis will report %phi as uniform to help optimal code generation. ``` %if \| \ \| %then \| / %endif: %phi = phi [ %uniform, %if ], [ %undef, %then ] ``` In the backend, %phi and %uniform will be assigned a scalar register. But the %undef from %then will make the scalar register dead in %then. This will likely cause the register being over-written in %then. To fix the issue, we will rewrite %undef as %uniform. For details, please refer the comment in AMDGPURewriteUndefForPHI.cpp. Currently there is no test changes shown, but this is mandatory for later changes. Reviewed by: sameerds Differential Revision: https://reviews.llvm.org/D133840	2022-09-26 09:54:47 +08:00
Eric Wang	d8a2d3f7d4	[NFC][Regalloc] Introduce the RegAllocPriorityAdvisorAnalysis This patch introduces the priority analysis and the priority advisor, the default implementation, and the scaffolding for introducing the other implementations of the advisor. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D132835	2022-09-08 07:50:03 -07:00
Matthias Gehre	6cc52d594f	Fix AMDGPU test failures due to "[llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64"	2022-09-06 16:18:14 +01:00
Jay Foad	e301e071ba	[AMDGPU] Remove IR SpeculativeExecution pass from codegen pipeline This pass seems to have very little effect because all it does is hoist some instructions, but it is followed later in the codegen pipeline by the IR CodeSinking pass which does the opposite. Differential Revision: https://reviews.llvm.org/D130258	2022-08-02 17:35:20 +01:00
Matt Arsenault	8d0383eb69	CodeGen: Remove AliasAnalysis from regalloc This was stored in LiveIntervals, but not actually used for anything related to LiveIntervals. It was only used in one check for if a load instruction is rematerializable. I also don't think this was entirely correct, since it was implicitly assuming constant loads are also dereferenceable. Remove this and rely only on the invariant+dereferenceable flags in the memory operand. Set the flag based on the AA query upfront. This should have the same net benefit, but has the possible disadvantage of making this AA query nonlazy. Preserve the behavior of assuming pointsToConstantMemory implying dereferenceable for now, but maybe this should be changed.	2022-07-18 17:23:41 -04:00
Joe Nash	d1af09ad96	[AMDGPU] gfx11 Generate VOPD Instructions We form VOPD instructions in the GCNCreateVOPD pass by combining back-to-back component instructions. There are strict register constraints for creating a legal VOPD, namely that the matching operands (e.g. src0x and src0y, src1x and src1y) must be in different register banks. We add a PostRA scheduler mutation to put possible VOPD components back-to-back. Depends on D128442, D128270 Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D128656	2022-07-05 09:18:19 -04:00
Jay Foad	0f94d2b385	[AMDGPU] GFX11: automatically release VGPRs at the end of the shader GFX11 has a new message type MSG_DEALLOC_VGPRS which can be used to release a shader's VGPRs. Sending this at the end of a shader (just before the s_endpgm) can help overall system performance in cases where the s_endpgm would have to wait for outstanding VMEM stores to complete before releasing the VGPRs. Differential Revision: https://reviews.llvm.org/D128442	2022-06-30 20:55:14 +01:00
Jay Foad	cfb7ffdec0	[AMDGPU] New AMDGPUInsertDelayAlu pass Differential Revision: https://reviews.llvm.org/D128270	2022-06-29 21:30:20 +01:00
Baptiste Saleil	79e77a9f39	[AMDGPU] Flush the vmcnt counter in loop preheaders when necessary waitcnt vmcnt instructions are currently generated in loop bodies before using values loaded outside of the loop. In some cases, it is better to flush the vmcnt counter in a loop preheader before entering the loop body. This patch detects these cases and generates waitcnt instructions to flush the counter. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D115747	2022-06-23 10:53:21 -04:00
Rahman Lavaee	3aa249329f	Revert "[Propeller] Promote functions with propeller profiles to .text.hot." This reverts commit `4d8d2580c5`.	2022-05-26 18:45:40 -07:00
Rahman Lavaee	4d8d2580c5	[Propeller] Promote functions with propeller profiles to .text.hot. Today, text section prefixes (none, .unlikely, .hot, and .unkown) are determined based on PGO profile. However, Propeller may deem a function hot when PGO doesn't. Besides, when `-Wl,-keep-text-section-prefix=true` Propeller cannot enforce a global section ordering as the linker can only reorder sections within each output section (.text, .text.hot, .text.unlikely). This patch promotes all functions with Propeller profiles (functions listed in the basic-block-sections profile) to .text.hot. The feature is hidden behind the flag `--bbsections-guided-section-prefix` which defaults to `true`. The new implementation refactors the parsing of basic block sections profile into a new `BasicBlockSectionsProfileReader` analysis pass. This allows us to use the information earlier in `CodeGenPrepare` in order to set the functions text prefix. `BasicBlockSectionsProfileReader` will be used both by `BasicBlockSections` pass and `CodeGenPrepare`. Differential Revision: https://reviews.llvm.org/D122930	2022-05-26 16:23:21 -07:00
Chen Zheng	d79275238f	[MachineSink] replace MachineLoop with MachineCycle reapply `62a9b36fcf` and fix module build failue: 1: remove MachineCycleInfoWrapperPass in MachinePassRegistry.def MachineCycleInfoWrapperPass is a anylysis pass, should not be there. 2: move the definition for MachineCycleInfoPrinterPass to cpp file. Otherwise, there are module conflicit for MachineCycleInfoWrapperPass in MachinePassRegistry.def and MachineCycleAnalysis.h after `62a9b36fcf`. MachineCycle can handle irreducible loop. Natural loop analysis (MachineLoop) can not return correct loop depth if the loop is irreducible loop. And MachineSink is sensitive to the loop depth, see MachineSinking::isProfitableToSinkTo(). This patch tries to use MachineCycle so that we can handle irreducible loop better. Reviewed By: sameerds, MatzeB Differential Revision: https://reviews.llvm.org/D123995	2022-05-26 06:45:23 -04:00
Chen Zheng	80c4910f3d	Revert "[MachineSink] replace MachineLoop with MachineCycle" This reverts commit `62a9b36fcf`. Cause build failure on lldb incremental buildbot: https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/43994/changes	2022-05-24 22:43:37 -04:00
Chen Zheng	62a9b36fcf	[MachineSink] replace MachineLoop with MachineCycle MachineCycle can handle irreducible loop. Natural loop analysis (MachineLoop) can not return correct loop depth if the loop is irreducible loop. And MachineSink is sensitive to the loop depth, see MachineSinking::isProfitableToSinkTo(). This patch tries to use MachineCycle so that we can handle irreducible loop better. Reviewed By: sameerds, MatzeB Differential Revision: https://reviews.llvm.org/D123995	2022-05-24 01:16:19 -04:00
Matt Arsenault	203a1e36ed	Reapply "AMDGPU: Remove AMDGPUFixFunctionBitcasts pass" This reverts commit `8a85be807b`. The unrelated failure this exposed was fixed.	2022-04-11 19:43:37 -04:00
Xiang1 Zhang	c31014322c	TLS loads opimization (hoist) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120000	2022-03-10 09:29:06 +08:00
Xiang1 Zhang	65588a0776	Revert "TLS loads opimization (hoist)" Revert for more reviews This reverts commit `30e612ebdf`.	2022-03-02 14:10:11 +08:00
Xiang1 Zhang	30e612ebdf	TLS loads opimization (hoist) Reviewed By: Wang Pheobe, Topper Craig Differential Revision: https://reviews.llvm.org/D120000	2022-03-02 10:37:24 +08:00
Matt Arsenault	4622afa94c	AMDGPU: Convert AMDGPUResourceUsageAnalysis to a Module pass This is more precise in the face of indirect calls and aliases, still assuming the call target is defined somewhere in the current module. This sometimes changes the order the functions are printed, and also changes the point where context errors are printed relative to stdout. This also likely has negative consequences for compile time and memory usage.	2022-02-04 15:56:04 -05:00
Nico Weber	9122b5072a	[llvm] Remove an old bot cleanup command	2022-01-20 15:02:35 -05:00
Mircea Trofin	09103807e7	[NFC][regalloc] Introduce the RegAllocEvictionAdvisorAnalysis This patch introduces the eviction analysis and the eviction advisor, the default implementation, and the scaffolding for introducing the other implementations of the advisor. Differential Revision: https://reviews.llvm.org/D115707	2021-12-16 17:56:46 -08:00
Ron Lieberman	f4420f5224	Revert "AMDGPU: Update pass pipeline test" needed to match revert of rG2b4876157562: AMDGPU: Remove AMDGPUFixFunctionBitcasts pass This reverts commit `7ca355225d`.	2021-12-16 21:39:04 +00:00
Matt Arsenault	7ca355225d	AMDGPU: Update pass pipeline test	2021-12-15 18:37:18 -05:00
Jay Foad	58e7ec471c	[AMDGPU] Run SIShrinkInstructions before post-RA scheduling Run post-RA SIShrinkInstructions just before post-RA scheduling, instead of afterwards. After the fixes in D112305 and D112317 this seems to make no difference, but it paves the way for scheduler tweaks that are sensitive to the e32 vs e64 encoding of VALU instructions. Differential Revision: https://reviews.llvm.org/D112341	2021-10-22 20:24:03 +01:00
Jay Foad	e996cf7dce	[AMDGPU] Preserve MachineDominatorTree in SILowerControlFlow Updating the MachineDominatorTree is easy since SILowerControlFlow only splits and removes basic blocks. This should save a bit of compile time because previously we would recompute the dominator tree from scratch after this pass. Another reason for doing this is that SILowerControlFlow preserves LiveIntervals which transitively requires MachineDominatorTree. I think that means that SILowerControlFlow is obliged to preserve MachineDominatorTree too as explained here: https://lists.llvm.org/pipermail/llvm-dev/2020-November/146923.html although it does not seem to have caused any problems in practice yet. Differential Revision: https://reviews.llvm.org/D111313	2021-10-07 21:30:26 +01:00
Joe Nash	3ce1b9631a	[AMDGPU] Switch PostRA sched to MachineSched Use GCNHazardRecognizer in postra sched. Updated tests for the new schedules. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D109536 Change-Id: Ia86ba2ae168f12fb34b4d8efdab491f84d936cde	2021-09-14 15:11:27 -04:00
Matt Arsenault	722b8e0e5a	AMDGPU: Invert ABI attribute handling Previously we assumed all callable functions did not need any implicitly passed inputs, and added attributes to functions to indicate when they were necessary. Requiring attributes for correctness is pretty ugly, and it makes supporting indirect and external calls more complicated. This inverts the direction of the attributes, so an undecorated function is assumed to need all implicit imputs. This enables AMDGPUAttributor by default to mark when functions are proven to not need a given input. This strips the equivalent functionality from the legacy AMDGPUAnnotateKernelFeatures pass. However, AMDGPUAnnotateKernelFeatures is not fully removed at this point although it should be in the future. It is still necessary for the two hacky amdgpu-calls and amdgpu-stack-objects attributes, which would be better served by a trivial analysis on the IR during selection. Additionally, AMDGPUAnnotateKernelFeatures still redundantly handles the uniform-work-group-size attribute to be removed in a future commit. At this point when not using -amdgpu-fixed-function-abi, we are still modifying the ABI based on these newly negated attributes. In the future, this option will be removed and the locations for implicit inputs will always be fixed. We will then use the new attributes to avoid passing the values when unnecessary.	2021-09-09 18:24:28 -04:00
hsmahesha	97688bfd3d	Revert "Revert "Disable ReplaceLDS pass, patch up tests to match"" This reverts commit `5ae6804d17`.	2021-09-01 21:52:50 +05:30
hsmahesha	5ae6804d17	Revert "Disable ReplaceLDS pass, patch up tests to match" This reverts commit `50ad3478bd`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D109062	2021-09-01 21:19:39 +05:30
Dávid Bolvanský	49de6070a2	Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop" This reverts commit `435785214f`. Still same compile time issues for -O0 -g, eg. +1.3% for sqlite3.	2021-08-15 11:44:13 +02:00
Anshil Gandhi	435785214f	[Remarks] Emit optimization remarks for atomics generating CAS loop Implements ORE in AtomicExpand pass to report atomics generating a compare and swap loop. Differential Revision: https://reviews.llvm.org/D106891	2021-08-14 23:37:23 -06:00
Anshil Gandhi	29e11a1aa3	Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop" This reverts commit `c4e5425aa5`.	2021-08-13 23:58:04 -06:00
Anshil Gandhi	c4e5425aa5	[Remarks] Emit optimization remarks for atomics generating CAS loop Implements ORE in AtomicExpandPass to report atomics generating a compare and swap loop. Differential Revision: https://reviews.llvm.org/D106891	2021-08-13 22:44:08 -06:00
Reshabh Sharma	5173854f19	[AMDGPU] Handle functions in llvm's global ctors and dtors list This patch introduces a new code object metadata field, ".kind" which is used to add support for init and fini kernels. HSAStreamer will use function attributes, "device-init" and "device-fini" to distinguish between init and fini kernels from the regular kernels and will emit metadata with ".kind" set to "init" and "fini" respectively. To reduce the number of init and fini kernels, the ctors and dtors present in the llvm's global.ctors and global.dtors lists are called from a single init and fini kernel respectively. Reviewed by: yaxunl Differential Revision: https://reviews.llvm.org/D105682	2021-08-06 15:53:33 +05:30
Reshabh Sharma	dce35ef104	Revert "[AMDGPU] Handle functions in llvm's global ctors and dtors list" This reverts commit `d42e70b3d3`.	2021-08-04 23:33:31 +05:30
Reshabh Sharma	d42e70b3d3	[AMDGPU] Handle functions in llvm's global ctors and dtors list This patch introduces a new code object metadata field, ".kind" which is used to add support for init and fini kernels. HSAStreamer will use function attributes, "device-init" and "device-fini" to distinguish between init and fini kernels from the regular kernels and will emit metadata with ".kind" set to "init" and "fini" respectively. To reduce the number of init and fini kernels, the ctors and dtors present in the llvm's global.ctors and global.dtors lists are called from a single init and fini kernel respectively. Reviewed by: yaxunl Differential Revision: https://reviews.llvm.org/D105682	2021-08-04 19:53:33 +05:30
Stanislav Mekhanoshin	d01b34ed31	[AMDGPU] Move perfhint analysis This is SCC pass, moving it to the end of SCC PM saves one Function PM. This needs the analysis to take into account memory access width since it is now places after the load/store optimizer (D105651). Differential Revision: https://reviews.llvm.org/D105652	2021-07-21 13:06:49 -07:00
Sebastian Neubauer	2b08f6af62	[AMDGPU] Improve register computation for indirect calls First, collect the register usage in each function, then apply the maximum register usage of all functions to functions with indirect calls. This is more accurate than guessing the maximum register usage without looking at the actual usage. As before, assume that indirect calls will hit a function in the current module. Differential Revision: https://reviews.llvm.org/D105839	2021-07-20 13:48:50 +02:00
Matt Arsenault	c9ec807b11	CodeGen: Make MachineOptimizationRemarkEmitterPass a CFG analysis This avoids rerunning it a few times.	2021-07-19 21:08:26 -04:00
Stanislav Mekhanoshin	c46d99e4ba	[AMDGPU] Refine -O0 and -O1 passes. Differential Revision: https://reviews.llvm.org/D105579	2021-07-15 09:51:54 -07:00
Jay Foad	372bb08252	[AMDGPU] Check llc-pipeline.ll with -match-full-lines -strict-whitespace This prevents breaking the indentation that shows the structure of the pass managers. Differential Revision: https://reviews.llvm.org/D105891	2021-07-14 16:33:50 +01:00
Djordje Todorovic	df686842bc	[RemoveRedundantDebugValues] Add a Pass that removes redundant DBG_VALUEs This new MIR pass removes redundant DBG_VALUEs. After the register allocator is done, more precisely, after the Virtual Register Rewriter, we end up having duplicated DBG_VALUEs, since some virtual registers are being rewritten into the same physical register as some of existing DBG_VALUEs. Each DBG_VALUE should indicate (at least before the LiveDebugValues) variables assignment, but it is being clobbered for function parameters during the SelectionDAG since it generates new DBG_VALUEs after COPY instructions, even though the parameter has no assignment. For example, if we had a DBG_VALUE $regX as an entry debug value representing the parameter, and a COPY and after the COPY, DBG_VALUE $virt_reg, and after the virtregrewrite the $virt_reg gets rewritten into $regX, we'd end up having redundant DBG_VALUE. This breaks the definition of the DBG_VALUE since some analysis passes might be built on top of that premise..., and this patch tries to fix the MIR with the respect to that. This first patch performs bacward scan, by trying to detect a sequence of consecutive DBG_VALUEs, and to remove all DBG_VALUEs describing one variable but the last one: For example: (1) DBG_VALUE $edi, !"var1", ... (2) DBG_VALUE $esi, !"var2", ... (3) DBG_VALUE $edi, !"var1", ... ... in this case, we can remove (1). By combining the forward scan that will be introduced in the next patch (from this stack), by inspecting the statistics, the RemoveRedundantDebugValues removes 15032 instructions by using gdb-7.11 as a testbed. Differential Revision: https://reviews.llvm.org/D105279	2021-07-14 04:29:42 -07:00
Matt Arsenault	eebe841a47	RegAlloc: Allow targets to split register allocation AMDGPU normally spills SGPRs to VGPRs. Previously, since all register classes are handled at the same time, this was problematic. We don't know ahead of time how many registers will be needed to be reserved to handle the spilling. If no VGPRs were left for spilling, we would have to try to spill to memory. If the spilled SGPRs were required for exec mask manipulation, it is highly problematic because the lanes active at the point of spill are not necessarily the same as at the restore point. Avoid this problem by fully allocating SGPRs in a separate regalloc run from VGPRs. This way we know the exact number of VGPRs needed, and can reserve them for a second run. This fixes the most serious issues, but it is still possible using inline asm to make all VGPRs unavailable. Start erroring in the case where we ever would require memory for an SGPR spill. This is implemented by giving each regalloc pass a callback which reports if a register class should be handled or not. A few passes need some small changes to deal with leftover virtual registers. In the AMDGPU implementation, a new pass is introduced to take the place of PrologEpilogInserter for SGPR spills emitted during the first run. One disadvantage of this is currently StackSlotColoring is no longer used for SGPR spills. It would need to be run again, which will require more work. Error if the standard -regalloc option is used. Introduce new separate -sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be controlled individually. PBQB is not currently supported, so this also prevents using the unhandled allocator.	2021-07-13 18:49:29 -04:00
Stanislav Mekhanoshin	de5582be26	[AMDGPU] Fix more indention in llc-pipeline test. NFC.	2021-07-08 11:20:00 -07:00
Stanislav Mekhanoshin	9dae86ce56	[AMDGPU] Fix indention in llc-pipeline test. NFC.	2021-07-08 11:08:25 -07:00
Stanislav Mekhanoshin	74a5760d35	[AMDGPU] Set LoopInfo as preserved by SIAnnotateControlFlow The pass does not change loops, it just adds calls. Differential Revision: https://reviews.llvm.org/D105583	2021-07-08 09:34:43 -07:00
Stanislav Mekhanoshin	0fdb25cd95	[AMDGPU] Disable garbage collection passes Differential Revision: https://reviews.llvm.org/D105593	2021-07-07 15:47:57 -07:00
Stanislav Mekhanoshin	a0ab45799b	[AMDGPU] Move atomic expand past infer address spaces There are cases where infer address spaces pass cannot yet infer an address space in the opt pipeline and then in the llc pipeline it runs too late for atomic expand pass to benefit from a specific address space. Move atomic expand pass past the infer address spaces. Fixes: SWDEV-293410 Differential Revision: https://reviews.llvm.org/D105511	2021-07-06 15:53:32 -07:00
Stanislav Mekhanoshin	5915d33874	[AMDGPU] Do not run IR optimizations at -O0 Differential Revision: https://reviews.llvm.org/D105515	2021-07-06 15:29:52 -07:00
Stanislav Mekhanoshin	381ded345b	[AMDGPU] Add S_MOV_B64_IMM_PSEUDO for wide constants This is to allow 64 bit constant rematerialization. If a constant is split into two separate moves initializing sub0 and sub1 like now RA cannot rematerizalize a 64 bit register. This gives 10-20% uplift in a set of huge apps heavily using double precession math. Fixes: SWDEV-292645 Differential Revision: https://reviews.llvm.org/D104874	2021-06-30 11:45:38 -07:00

1 2

61 Commits