llvm-project

Commit Graph

Author	SHA1	Message	Date
Jay Foad	f9b68304a2	[AMDGPU] Enable machine verification after AMDGPUISelDAGToDAG This was introduced in D32628 but it does not seem to be required any more. At least it does not show any problems in check-llvm in an LLVM_ENABLE_EXPENSIVE_CHECKS build. Differential Revision: https://reviews.llvm.org/D110692	2021-09-29 18:47:19 +01:00
hsmahesha	c0735cb9f1	[AMDGPU] Do not internalize ASan device library functions. ASan device library functions (those starts with the prefix __asan_) are at the moment undergoing through undesired optimizations due to internalization. Hence, in order to avoid such undesired optimizations on ASan device library functions, do not internalize them in the first place. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D110468	2021-09-29 07:19:02 +05:30
Jacob Lambert	dc6e8dfdfe	[AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files. Test commit for new contributor.	2021-09-20 14:48:50 -07:00
Joe Nash	3ce1b9631a	[AMDGPU] Switch PostRA sched to MachineSched Use GCNHazardRecognizer in postra sched. Updated tests for the new schedules. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D109536 Change-Id: Ia86ba2ae168f12fb34b4d8efdab491f84d936cde	2021-09-14 15:11:27 -04:00
Matt Arsenault	722b8e0e5a	AMDGPU: Invert ABI attribute handling Previously we assumed all callable functions did not need any implicitly passed inputs, and added attributes to functions to indicate when they were necessary. Requiring attributes for correctness is pretty ugly, and it makes supporting indirect and external calls more complicated. This inverts the direction of the attributes, so an undecorated function is assumed to need all implicit imputs. This enables AMDGPUAttributor by default to mark when functions are proven to not need a given input. This strips the equivalent functionality from the legacy AMDGPUAnnotateKernelFeatures pass. However, AMDGPUAnnotateKernelFeatures is not fully removed at this point although it should be in the future. It is still necessary for the two hacky amdgpu-calls and amdgpu-stack-objects attributes, which would be better served by a trivial analysis on the IR during selection. Additionally, AMDGPUAnnotateKernelFeatures still redundantly handles the uniform-work-group-size attribute to be removed in a future commit. At this point when not using -amdgpu-fixed-function-abi, we are still modifying the ABI based on these newly negated attributes. In the future, this option will be removed and the locations for implicit inputs will always be fixed. We will then use the new attributes to avoid passing the values when unnecessary.	2021-09-09 18:24:28 -04:00
hsmahesha	97688bfd3d	Revert "Revert "Disable ReplaceLDS pass, patch up tests to match"" This reverts commit `5ae6804d17`.	2021-09-01 21:52:50 +05:30
hsmahesha	5ae6804d17	Revert "Disable ReplaceLDS pass, patch up tests to match" This reverts commit `50ad3478bd`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D109062	2021-09-01 21:19:39 +05:30
Daniil Fukalov	48958d02d2	[NFC][AMDGPU] Reduce includes dependencies. 1. Splitted out some parts of R600 target to separate modules/headers. 2. Reduced some include lists in headers. 3. Found and fixed issue with override `GCNTargetMachine::getSubtargetImpl()` and `R600TargetMachine::getSubtargetImpl()` had different return value type than base class. 4. Minor forward declarations cleanup. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D108596	2021-08-25 12:01:55 +03:00
Reshabh Sharma	5173854f19	[AMDGPU] Handle functions in llvm's global ctors and dtors list This patch introduces a new code object metadata field, ".kind" which is used to add support for init and fini kernels. HSAStreamer will use function attributes, "device-init" and "device-fini" to distinguish between init and fini kernels from the regular kernels and will emit metadata with ".kind" set to "init" and "fini" respectively. To reduce the number of init and fini kernels, the ctors and dtors present in the llvm's global.ctors and global.dtors lists are called from a single init and fini kernel respectively. Reviewed by: yaxunl Differential Revision: https://reviews.llvm.org/D105682	2021-08-06 15:53:33 +05:30
Reshabh Sharma	dce35ef104	Revert "[AMDGPU] Handle functions in llvm's global ctors and dtors list" This reverts commit `d42e70b3d3`.	2021-08-04 23:33:31 +05:30
Reshabh Sharma	d42e70b3d3	[AMDGPU] Handle functions in llvm's global ctors and dtors list This patch introduces a new code object metadata field, ".kind" which is used to add support for init and fini kernels. HSAStreamer will use function attributes, "device-init" and "device-fini" to distinguish between init and fini kernels from the regular kernels and will emit metadata with ".kind" set to "init" and "fini" respectively. To reduce the number of init and fini kernels, the ctors and dtors present in the llvm's global.ctors and global.dtors lists are called from a single init and fini kernel respectively. Reviewed by: yaxunl Differential Revision: https://reviews.llvm.org/D105682	2021-08-04 19:53:33 +05:30
Tarindu Jayatilaka	7a797b2902	Take OptimizationLevel class out of Pass Builder Pulled out the OptimizationLevel class from PassBuilder in order to be able to access it from within the PassManager and avoid include conflicts. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D107025	2021-07-29 21:57:23 -07:00
Kuter Dinel	96709823ec	[AMDGPU] Deduce attributes with the Attributor This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes. Reviewed By: jdoerfert, arsenm Differential Revision: https://reviews.llvm.org/D104997	2021-07-24 06:07:15 +03:00
Stanislav Mekhanoshin	d01b34ed31	[AMDGPU] Move perfhint analysis This is SCC pass, moving it to the end of SCC PM saves one Function PM. This needs the analysis to take into account memory access width since it is now places after the load/store optimizer (D105651). Differential Revision: https://reviews.llvm.org/D105652	2021-07-21 13:06:49 -07:00
Sebastian Neubauer	2b08f6af62	[AMDGPU] Improve register computation for indirect calls First, collect the register usage in each function, then apply the maximum register usage of all functions to functions with indirect calls. This is more accurate than guessing the maximum register usage without looking at the actual usage. As before, assume that indirect calls will hit a function in the current module. Differential Revision: https://reviews.llvm.org/D105839	2021-07-20 13:48:50 +02:00
Stanislav Mekhanoshin	c46d99e4ba	[AMDGPU] Refine -O0 and -O1 passes. Differential Revision: https://reviews.llvm.org/D105579	2021-07-15 09:51:54 -07:00
Matt Arsenault	eebe841a47	RegAlloc: Allow targets to split register allocation AMDGPU normally spills SGPRs to VGPRs. Previously, since all register classes are handled at the same time, this was problematic. We don't know ahead of time how many registers will be needed to be reserved to handle the spilling. If no VGPRs were left for spilling, we would have to try to spill to memory. If the spilled SGPRs were required for exec mask manipulation, it is highly problematic because the lanes active at the point of spill are not necessarily the same as at the restore point. Avoid this problem by fully allocating SGPRs in a separate regalloc run from VGPRs. This way we know the exact number of VGPRs needed, and can reserve them for a second run. This fixes the most serious issues, but it is still possible using inline asm to make all VGPRs unavailable. Start erroring in the case where we ever would require memory for an SGPR spill. This is implemented by giving each regalloc pass a callback which reports if a register class should be handled or not. A few passes need some small changes to deal with leftover virtual registers. In the AMDGPU implementation, a new pass is introduced to take the place of PrologEpilogInserter for SGPR spills emitted during the first run. One disadvantage of this is currently StackSlotColoring is no longer used for SGPR spills. It would need to be run again, which will require more work. Error if the standard -regalloc option is used. Introduce new separate -sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be controlled individually. PBQB is not currently supported, so this also prevents using the unhandled allocator.	2021-07-13 18:49:29 -04:00
Michael Liao	cc92833f8a	[amdgpu] Remove the GlobalDCE pass prior to the internalization pass. - In [D98783](https://reviews.llvm.org/D98783), an extra GlobalDCE pass is inserted before the internalization pass to ensure a global variable without users could be internalized even if there are dead users. Instead of inserting a dedicated optimization pass, the dead user checking, i.e. 'use_empty()', should be preceeded with constant dead user removal to ensure an accurate result. Differential Revision: https://reviews.llvm.org/D105590	2021-07-08 10:25:58 -04:00
Stanislav Mekhanoshin	0fdb25cd95	[AMDGPU] Disable garbage collection passes Differential Revision: https://reviews.llvm.org/D105593	2021-07-07 15:47:57 -07:00
Stanislav Mekhanoshin	b16400449f	[AMDGPU] isPassEnabled() helper to check cl::opt and OptLevel We have several checks for both cl::opt and OptLevel over our pass config, although these checks do not properly work if default value of a cl::opt will be false. Create a helper to use instead and properly handle it. NFC for now. Differential Revision: https://reviews.llvm.org/D105517	2021-07-06 21:53:35 -07:00
Stanislav Mekhanoshin	a0ab45799b	[AMDGPU] Move atomic expand past infer address spaces There are cases where infer address spaces pass cannot yet infer an address space in the opt pipeline and then in the llc pipeline it runs too late for atomic expand pass to benefit from a specific address space. Move atomic expand pass past the infer address spaces. Fixes: SWDEV-293410 Differential Revision: https://reviews.llvm.org/D105511	2021-07-06 15:53:32 -07:00
Stanislav Mekhanoshin	5915d33874	[AMDGPU] Do not run IR optimizations at -O0 Differential Revision: https://reviews.llvm.org/D105515	2021-07-06 15:29:52 -07:00
Stanislav Mekhanoshin	381ded345b	[AMDGPU] Add S_MOV_B64_IMM_PSEUDO for wide constants This is to allow 64 bit constant rematerialization. If a constant is split into two separate moves initializing sub0 and sub1 like now RA cannot rematerizalize a 64 bit register. This gives 10-20% uplift in a set of huge apps heavily using double precession math. Fixes: SWDEV-292645 Differential Revision: https://reviews.llvm.org/D104874	2021-06-30 11:45:38 -07:00
Jon Chesterfield	50ad3478bd	Disable ReplaceLDS pass, patch up tests to match Most tests passed with an extra argument to explicitly enable the pass. One does not, deleted it as part of this change. I can't see why the codegen would be different between default on and default off but switched on. It can be retrieved from the project history. This would be a revert, but git revert was not clean. Disabling the pass and leaving it in tree is less likely to cause breakage elsewhere than patching up the git revert conflicts on unfamiliar code. It'll be landed without review, as @hsmhsm is believed unavailable at present. Differential Revision: https://reviews.llvm.org/D104962	2021-06-26 01:36:42 +01:00
Ruiling Song	208332de8a	[AMDGPU] Add Optimize VGPR LiveRange Pass. This pass aims to optimize VGPR live-range in a typical divergent if-else control flow. For example: def(a) if(cond) use(a) ... // A else use(a) As AMDGPU access vgpr with respect to active-mask, we can mark `a` as dead in region A. For details, please refer to the comments in implementation file. The pass is enabled by default, the frontend can disable it through "-amdgpu-opt-vgpr-liverange=false". Differential Revision: https://reviews.llvm.org/D102212	2021-06-21 15:25:55 +08:00
hsmahesha	80fd5fa526	[AMDGPU] Replace non-kernel function uses of LDS globals by pointers. The main motivation behind pointer replacement of LDS use within non-kernel functions is - to avoid subsequent LDS lowering pass from directly packing LDS (assume large LDS) into a struct type which would otherwise cause allocating huge memory for struct instance within every kernel. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103225	2021-06-21 11:51:49 +05:30
Baptiste Saleil	5885f1a4cb	[AMDGPU] Disable the SIFormMemoryClauses pass at -O1 This patch disables the SIFormMemoryClauses pass at -O1. This pass has a significant impact on compilation time, so we only want it to be enabled starting from -O2. Differential Revision: https://reviews.llvm.org/D101939	2021-05-12 11:51:59 -04:00
Piotr Sobczak	09fe84abb4	[AMDGPU] Move code sinking before structurizer Moving code sinking pass before structurizer creates more sinking opportunities. The extra flow edges introduced by the structurizer can have adverse effects on sinking, because the sinking pass prefers moving instructions to blocks with unique predecessors and the structurizer destroys that property in some cases. A notable example is moving high-latency image instructions across kills. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D101115	2021-05-11 14:07:23 +02:00
Arthur Eubanks	34a8a437bf	[NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose Printing pass manager invocations is fairly verbose and not super useful. This allows us to remove DebugLogging from pass managers and PassBuilder since all logging (aside from analysis managers) goes through instrumentation now. This has the downside of never being able to print the top level pass manager via instrumentation, but that seems like a minor downside. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D101797	2021-05-07 21:51:47 -07:00
Sebastian Neubauer	98e5ede604	[AMDGPU] Serialize MFInfo::ScavengeFI Serialize ScavengeFI from SIMachineFunctionInfo into yaml. ScavengeFI is not used outside of the PrologEpilogInserter, so this shouldn't change anything. Differential Revision: https://reviews.llvm.org/D101367	2021-05-07 11:15:25 +02:00
Baptiste Saleil	6a17609157	[AMDGPU] Disable the scalar IR, SDWA and load store vectorizer passes at -O1 This patch disables some of the passes at -O1. These passes have a significant impact on compilation time, so we only want them to be enabled starting from -O2. Differential Revision: https://reviews.llvm.org/D101414	2021-05-04 16:44:39 -04:00
Baptiste Saleil	caf1294d95	[AMDGPU] Experiments show that the GCNRegBankReassign pass significantly impacts the compilation time and there is no case for which we see any improvement in performance. This patch removes this pass and its associated test cases from the tree. Differential Revision: https://reviews.llvm.org/D101313 Change-Id: I0599169a7609c19a887f8d847a71e664030cc141	2021-04-26 17:21:49 -04:00
Jay Foad	ef443390a9	[AMDGPU] Remove MachineDCE after SIFoldOperands Remove the MachineDCE pass after the first SIFoldOperands pass now that SIFoldOperands deletes its own dead instructions. Reapply after fixing dependent change D100188. Differential Revision: https://reviews.llvm.org/D100189	2021-04-19 12:08:02 +01:00
Yaxun (Sam) Liu	3597f02fd5	[AMDGPU] Add GlobalDCE before internalization pass The internalization pass only internalizes global variables with no users. If the global variable has some dead user, the internalization pass will not internalize it. To be able to internalize global variables with dead users, a global dce pass is needed before the internalization pass. This patch adds that. Reviewed by: Artem Belevich, Matt Arsenault Differential Revision: https://reviews.llvm.org/D98783	2021-04-17 11:25:25 -04:00
hsmahesha	4973b0c4e7	[AMDGPU] Disable forceful inline of non-kernel functions which use LDS. Now since LDS uses within non-kernel functions are being handled in the pass - LowerModuleLDS, we NO need to forcefully inline non-kernel functions just because they use LDS. Do forceful inlining only when the pass - LowerModuleLDS is not enabled. It is enabled by default. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D100481	2021-04-15 09:12:56 +05:30
hsmahesha	e3070db0f7	[AMDGPU] Rename "LDS lowering" pass name. Rename the name of "LDS lowering" pass from `amdgpu-disable-lower-module-lds` to `amdgpu-enable-lower-module-lds` as later is consistent and reads better. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D100441	2021-04-14 20:19:53 +05:30
madhur13490	5682ae2fc6	[AMDGPU] Set implicit arg attributes for indirect calls This patch adds attributes corresponding to implicits to functions/kernels if 1. it has an indirect call OR 2. it's address is taken. Once such attributes are set, rest of the codegen would work out-of-box for indirect calls. This patch eliminates the potential overhead -fixed-abi imposes even though indirect functions calls are not used. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D99347	2021-04-13 13:15:13 +00:00
Mitch Phillips	092f288d36	Revert "[AMDGPU] Remove MachineDCE after SIFoldOperands" This reverts commit `5a0117b2d0`. Reason: Dependent change `d19a42eba9` broke the ASan buildbots.	2021-04-09 15:47:44 -07:00
Jay Foad	5a0117b2d0	[AMDGPU] Remove MachineDCE after SIFoldOperands Remove the MachineDCE pass after the first SIFoldOperands pass now that SIFoldOperands deletes its own dead instructions. Differential Revision: https://reviews.llvm.org/D100189	2021-04-09 20:41:09 +01:00
Stanislav Mekhanoshin	d5d412f2ae	[AMDGPU] Split GCNRegBankReassign Allow pass to work separately with SGPR, VGPR registers or both. This is NFC now but will be needed to split RA for separate SGPR and VGPR passes. Differential Revision: https://reviews.llvm.org/D100063	2021-04-07 14:45:13 -07:00
Jay Foad	fdc4f19e2f	[AMDGPU] Remove SIAddIMGInit pass which is now unused Differential Revision: https://reviews.llvm.org/D99748	2021-04-01 18:13:17 +01:00
Jay Foad	3d07a6d891	[AMDGPU][GlobalISel] Add IMG init in selectImageIntrinsic Doing this during instruction selection avoids the cost of running SIAddIMGInit which is yet another pass over the MIR. Differential Revision: https://reviews.llvm.org/D99670	2021-04-01 18:13:17 +01:00
Jay Foad	4af6251cea	[AMDGPU][SDag] Add IMG init in AdjustInstrPostInstrSelection Doing this in a post-isel hook avoids the cost of running SIAddIMGInit which is yet another pass over the MIR. Differential Revision: https://reviews.llvm.org/D99747	2021-04-01 18:13:17 +01:00
Carl Ritson	fe5f4c397f	[AMDGPU] Rename SIInsertSkips Pass Pass no longer handles skips. Pass now removes unnecessary unconditional branches and lowers early termination branches. Hence rename to SILateBranchLowering. Move code to handle returns to epilog from SIPreEmitPeephole into SILateBranchLowering. This means SIPreEmitPeephole only contains optional optimisations, and all required transforms are in SILateBranchLowering. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D98915	2021-03-20 11:48:04 +09:00
Carl Ritson	5df2af8b0e	[AMDGPU] Merge SIRemoveShortExecBranches into SIPreEmitPeephole SIRemoveShortExecBranches is an optimisation so fits well in the context of SIPreEmitPeephole. Test changes relate to early termination from kills which have now been lowered prior to considering branches for removal. As these use s_cbranch the execz skips are now retained instead. Currently either behaviour is valid as kill with EXEC=0 is a nop; however, if early termination is used differently in future then the new behaviour is the correct one. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D98917	2021-03-20 11:26:42 +09:00
Jon Chesterfield	13e49dcee4	[amdgpu] Implement lower function LDS pass [amdgpu] Implement lower function LDS pass Local variables are allocated at kernel launch. This pass collects global variables that are used from non-kernel functions, moves them into a new struct type, and allocates an instance of that type in every kernel. Uses are then replaced with a constantexpr offset. Prior to this pass, accesses from a function are compiled to trap. With this pass, most such accesses are removed before reaching codegen. The trap logic is left unchanged by this pass. It is still reachable for the cases this pass misses, notably the extern shared construct from hip and variables marked constant which survive the optimizer. This is of interest to the openmp project because the deviceRTL runtime library uses cuda shared variables from functions that cannot be inlined. Trunk llvm therefore cannot compile some openmp kernels for amdgpu. In addition to the unit tests attached, this patch applied to ROCm llvm with fixed-abi enabled and the function pointer hashing scheme deleted passes the openmp suite. This lowering will use more LDS than strictly necessary. It is intended to be a functionally correct fallback for cases that are difficult to target from future optimisation passes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D94648	2021-03-15 15:24:01 +00:00
Petar Avramovic	36beaa3ba3	Reland AMDGPU/GlobalISel: Combine zext(trunc x) to x after RegBankSelect Recommit `bf5a582650`. Depends on `4c8fb7ddd6` which was reverted. RegBankSelect creates zext and trunc when it selects banks for uniform i1. Add zext_trunc_fold from generic combiner to post RegBankSelect combiner. Differential Revision: https://reviews.llvm.org/D95432	2021-03-05 11:05:37 +01:00
Nico Weber	e68de60bc4	Revert "AMDGPU/GlobalISel: Combine zext(trunc x) to x after RegBankSelect" This reverts commit `bf5a582650`. Also depends on now-reverted `4c8fb7ddd6`	2021-03-04 10:16:11 -05:00
Petar Avramovic	bf5a582650	AMDGPU/GlobalISel: Combine zext(trunc x) to x after RegBankSelect RegBankSelect creates zext and trunc when it selects banks for uniform i1. Add zext_trunc_fold from generic combiner to post RegBankSelect combiner. Differential Revision: https://reviews.llvm.org/D95432	2021-03-04 15:05:24 +01:00
Amara Emerson	8a316045ed	[AArch64][GlobalISel] Enable use of the optsize predicate in the selector. To do this while supporting the existing functionality in SelectionDAG of using PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis dependencies to the instruction selector pass. Then, use the predicate to generate constant pool loads for f32 materialization, if we're targeting optsize/minsize. Differential Revision: https://reviews.llvm.org/D97732	2021-03-02 12:55:51 -08:00

1 2 3 4 5 ...

376 Commits