llvm-project

Author	SHA1	Message	Date
Jay Foad	c799f873cb	[AMDGPU] Don't cluster stores Clustering loads has caching benefits, but as far as I know there is no advantage to clustering stores on any AMDGPU subtargets. The disadvantage is that it tends to increase register pressure and restricts scheduling freedom. Differential Revision: https://reviews.llvm.org/D85530	2020-09-14 13:40:17 +01:00
Amara Emerson	e5784ef8f6	[GlobalISel] Enable usage of BranchProbabilityInfo in IRTranslator. We weren't using this before, so none of the MachineFunction CFG edges had the branch probability information added. As a result, block placement later in the pipeline was flying blind. This is enabled only with optimizations enabled like SelectionDAG. Differential Revision: https://reviews.llvm.org/D86824	2020-09-09 14:31:12 -07:00
Michael Liao	1f4e7463b5	[amdgpu] Run SROA after loop unrolling. Summary: - There are promotable `alloca`s after loop unrolling. Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, nikic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D84252	2020-09-01 16:09:56 -04:00
Craig Topper	aab90384a3	[Attributes] Add a method to check if an Attribute has AttrKind None. Use instead of hasAttribute(Attribute::None) There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None. So if we just want to check for None it's sufficient to just check that pImpl is null. Which can even be done inline. This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations. Differential Revision: https://reviews.llvm.org/D86744	2020-08-28 13:23:45 -07:00
Matt Arsenault	79298a5067	AMDGPU: Remove SIFixupVectorISel pass This was only used for matching the saddr addressing mode of global instructions, but this was not implemented correctly. The instruction definitions aren't even correct, and are defined as using a 64-bit VGPR component. Eliminate this pass to enable correcting the instruction definitions. A new matching implementation can work in GlobalISel or relying on DAG divergence information for the base address.	2020-08-15 12:11:51 -04:00
Matt Arsenault	57bd64ff84	Support addrspacecast initializers with isNoopAddrSpaceCast Moves isNoopAddrSpaceCast to the TargetMachine. It logically belongs with the DataLayout.	2020-07-31 10:42:43 -04:00
Christudasan Devadasan	d7a05698ef	[AMDGPU] Move LowerSwitch pass to CodeGenPrepare. It is possible that LowerSwitch pass leaves certain blocks unreachable from the entry. If not removed, these dead blocks can cause undefined behavior in the subsequent passes. It caused a crash in the AMDGPU backend after the instruction selection when a PHI node has its incoming values coming from these unreachable blocks. In the AMDGPU pass flow, the last invocation of UnreachableBlockElim precedes where LowerSwitch is currently placed and eventually missed out on the opportunity to get these blocks eliminated. This patch ensures that LowerSwitch pass gets inserted earlier to make use of the existing unreachable block elimination pass. Reviewed By: sameerds, arsenm Differential Revision: https://reviews.llvm.org/D83584	2020-07-11 16:33:38 +05:30
Carl Ritson	a3daa3f75a	[AMDGPU] Unify early PS termination blocks Generate a single early exit block out-of-line and branch to this if all lanes are killed. This avoids branching if lanes are active. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D82641	2020-07-03 09:58:05 +09:00
Jay Foad	def2e4c47f	[AMDGPU] Simplify GCNPassConfig::addOptimizedRegAlloc. NFC.	2020-06-17 15:56:15 +01:00
Sameer Sahasrabuddhe	d8f651d3e8	[AMDGPU] Enable structurizer workarounds by default Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D81211	2020-06-09 13:14:15 +05:30
Stanislav Mekhanoshin	689e616ed0	[AMDGPU] Promote alloca to vector in opt Promote alloca to vector before SROA and loop unroll. If we manage to eliminate allocas before unroll we may choose to unroll less. Differential Revision: https://reviews.llvm.org/D80386	2020-05-21 13:49:51 -07:00
Jay Foad	42a5560503	[AMDGPU] New SIInsertHardClauses pass Enable clausing of memory loads on gfx10 by adding a new pass to insert the s_clause instructions that mark the start of each hard clause. Differential Revision: https://reviews.llvm.org/D79792	2020-05-14 18:54:49 +01:00
Carl Ritson	e3ffe7269b	[AMDGPU] Cluster shader exports Summary: Add DAG scheduling mutation to cluster export instructions. This avoids unnecessary waitcnts being added when computation ends up interspersed with exports. Reviewers: foad, arsenm, rampitec, nhaehnle Reviewed By: foad Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79481	2020-05-07 19:05:38 +09:00
alex-t	5b898bddff	[AMDGPU] Enable carry out ADD/SUB operations divergence driven instruction selection. Summary: This change enables all kind of carry out ISD opcodes to be selected according to the node divergence. Reviewers: rampitec, arsenm, vpykhtin Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78091	2020-05-04 16:42:25 +03:00
Sameer Sahasrabuddhe	8c11bc0cd0	Introduce fix-irreducible pass An irreducible SCC is one which has multiple "header" blocks, i.e., blocks with control-flow edges incident from outside the SCC. This pass converts an irreducible SCC into a natural loop by introducing a single new header block and redirecting all the edges on the original headers to this new block. This is a useful workaround for a limitation in the structurizer, which produces incorrect control flow in the presence of irreducible regions. The AMDGPU backend provides an option to enable this pass before the structurizer, which may eventually be enabled by default. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D77198 This restores commit `2ada8e2525`. Originally reverted with commit `44e09b59b8`.	2020-04-15 15:05:51 +05:30
Sameer Sahasrabuddhe	44e09b59b8	Revert "Introduce fix-irreducible pass" This reverts commit `2ada8e2525`. Buildbots produced compilation errors which I was not able to quickly reproduce locally. Need more time to investigate.	2020-04-15 12:19:50 +05:30
Sameer Sahasrabuddhe	2ada8e2525	Introduce fix-irreducible pass An irreducible SCC is one which has multiple "header" blocks, i.e., blocks with control-flow edges incident from outside the SCC. This pass converts an irreducible SCC into a natural loop by introducing a single new header block and redirecting all the edges on the original headers to this new block. This is a useful workaround for a limitation in the structurizer, which produces incorrect control flow in the presence of irreducible regions. The AMDGPU backend provides an option to enable this pass before the structurizer, which may eventually be enabled by default. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D77198	2020-04-15 11:29:19 +05:30
Jay Foad	4970a1deca	[AMDGPU] Remove outdated comment	2020-04-09 10:36:00 +01:00
Matt Arsenault	0aa0d70067	MIR: Use Register	2020-04-08 22:07:26 -04:00
Konstantin Pyzhov	72e8754916	[AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU. Reviewers: sameerds, dstuttard Differential Revision: https://reviews.llvm.org/D77228	2020-04-06 09:05:58 -04:00
Konstantin Pyzhov	51dc028314	Revert `e1730cfeb3`	2020-04-06 05:56:11 -04:00
Konstantin Pyzhov	e1730cfeb3	[AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU. Reviewers: sameerds, dstuttard Differential Revision: https://reviews.llvm.org/D77228	2020-04-06 05:10:37 -04:00
Sameer Sahasrabuddhe	3cbbded68c	Introduce unify-loop-exits pass. For each natural loop with multiple exit blocks, this pass creates a new block N such that all exiting blocks now branch to N, and then control flow is redistributed to all the original exit blocks. The bulk of the transformation is a new function introduced in BasicBlockUtils that can redirect control flow from a set of incoming blocks to a set of outgoing blocks via a common "hub". This is a useful workaround for a limitation in the structurizer which incorrectly orders blocks when processing a nest of loops. This pass bypasses that issue by ensuring that each natural loop is recognized as a separate region. Since the structurizer is a region pass, it no longer sees a nest of loops in a single region, and instead processes each "level" in the nesting as a separate region. The AMDGPU backend provides a new option to enable this pass before the structurizer, which may eventually be enabled by default. Reviewers: madhur13490, arsenm, nhaehnle Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D75865	2020-03-30 13:23:56 -04:00
Matt Arsenault	348735b723	AMDGPU: Stop setting attributes based on TargetOptions Having arbitrary passes looking at the TargetOptions is pretty messy. This was also disregarding if a function already had an explicit attribute setting on it. opt/llc now add the attributes to functions that don't specify the attribute. clang and lld do not call the function to do this, which they maybe should. This was also treating unsafe-fp-math as implying the others, and setting the other attributes based on it. This is not done anywhere else, and I'm not sure is correct based on the current description of the option bit. Effectively reverts `1d8cf2be89`	2020-03-27 13:13:43 -07:00
Stanislav Mekhanoshin	6e00e3fcb0	[AMDGPU] Preserve original symbol during attribute propagation AMDGPUPropagateAttributes can swap names while cloning a function. Only do it if original symbol was not externally visible. Differential Revision: https://reviews.llvm.org/D76789	2020-03-25 15:26:30 -07:00
cdevadas	ce984129ea	[AMDGPU] Add SIPreEmitPeephole pass. This pass can handle all the optimization opportunities found just before code emission. Presently it includes the handling of vcc branch optimization that was handled earlier in SIInsertSkips. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D76712	2020-03-25 15:35:35 +00:00
Ram Nalamothu	24698e526f	Implement wave32 DWARF register mapping Implement the DWARF register mapping described in llvm/docs/AMDGPUUsage.rst. This enables generating appropriate DWARF register numbers for wave64 and wave32 modes.	2020-03-23 10:24:16 -04:00
Scott Linder	60b1967c39	[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to remove the scratch wave offset register from the calling convention ABI. As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative. Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first. Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention. Tags: #llvm Differential Revision: https://reviews.llvm.org/D75138	2020-03-19 15:35:16 -04:00
Scott Linder	db099f994b	[AMDGPU][NFC] Refactor some uses of unsigned to Register Tags: #llvm Differential Revision: https://reviews.llvm.org/D76035	2020-03-19 15:35:16 -04:00
Matt Arsenault	015b640be4	AMDGPU: Add flag to use fixed function ABI Pass all arguments to every function, rather than only passing the minimum set of inputs needed for the call graph.	2020-03-13 13:27:05 -07:00
Matt Arsenault	fee41517fe	AMDGPU/GlobalISel: Introduce post-legalize combiner The current set of custom combines are only really useful after legalization, so move them there. There is a lot of overlap in the boilerplate here, but I think we do want a pretty different set of combines before and after legalize. I think we will want a lot of overlap between the post-legalize and a post-regbankselect combiner.	2020-02-24 22:12:12 -05:00
Matt Arsenault	54137bbaaf	GlobalISel: Allow running localizer earlier This required legal and regbankselected MIR for seemingly no reason. For AMDGPU this wouldn't see legalized G_GLOBAL_VALUEs.	2020-02-17 11:24:06 -08:00
Matt Arsenault	5fdc9851d0	AMDGPU/GlobalISel: Run the localizer pass While looking at the output on real sized programs, there is a lot of extra SGPR spilling compared to the DAG path. This seems to largely be from all constants being SGPRs in the entry block.	2020-02-17 07:38:12 -08:00
Jay Foad	e9900b1fbf	[AMDGPU] Add one more pass to LLVMInitializeAMDGPUTarget	2020-02-12 11:19:14 +00:00
Matt Arsenault	1024b73ef5	AMDGPU: Split denormal mode tracking bits Prepare to accurately track the future denormal-fp-math attribute changes. The way to actually set these separately is not wired in yet. This is just a mechanical change, and mostly still assumes the input and output mode match. This should be refined for some cases. For example, fcanonicalize lowering should use the flushing variant if either input or output flushing is enabled	2020-02-04 10:44:21 -08:00
Stanislav Mekhanoshin	555d8f4ef5	[AMDGPU] Bundle loads before post-RA scheduler We are relying on artificial DAG edges inserted by the MemOpClusterMutation to keep loads and stores together in the post-RA scheduler. This does not work all the time since it allows a completely independent instruction to be scheduled in the middle of the cluster. Removed the DAG mutation and added pass to bundle already clustered instructions. These bundles are unpacked before the memory legalizer because it does not work with bundles but also because it allows waitcounts to be inserted in the middle of a store cluster. Removing artificial edges also allows a more relaxed scheduling. Differential Revision: https://reviews.llvm.org/D72737	2020-01-24 11:33:38 -08:00
Matt Arsenault	a174f0da62	AMDGPU/GlobalISel: Add pre-legalize combiner pass Just copy the AArch64 pass as-is for now, except for removing the memcpy handling.	2020-01-22 10:16:39 -05:00
cdevadas	e53a9d96e6	Resubmit: [AMDGPU] Invert the handling of skip insertion. The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a side effect and the skip branch is really necessary. This new pass will replace the handling of skip insertion in the existing SIInsertSkip Pass. Differential revision: https://reviews.llvm.org/D68092	2020-01-22 13:18:32 +09:00
Nicolai Hähnle	a80291ce10	Revert "[AMDGPU] Invert the handling of skip insertion." This reverts commit `0dc6c249bf`. The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for Mesa.	2020-01-21 09:17:25 +01:00
cdevadas	0dc6c249bf	[AMDGPU] Invert the handling of skip insertion. The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a side effect and the skip branch is really necessary. This new pass will replace the handling of skip insertion in the existing SIInsertSkip Pass. Differential revision: https://reviews.llvm.org/D68092	2020-01-15 15:18:16 +05:30
Tom Stellard	0dbcb36394	CMake: Make most target symbols hidden by default Summary: For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF this change makes all symbols in the target specific libraries hidden by default. A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these libraries public, which is mainly needed for the definitions of the LLVMInitialize* functions. This patch reduces the number of public symbols in libLLVM.so by about 25%. This should improve load times for the dynamic library and also make abi checker tools, like abidiff require less memory when analyzing libLLVM.so One side-effect of this change is that for builds with LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that access symbols that are no longer public will need to be statically linked. Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1): nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l 36221 nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l 26278 Reviewers: chandlerc, beanz, mgorny, rnk, hans Reviewed By: rnk, hans Subscribers: merge_guards_bot, luismarques, smeenai, ldionne, lenary, s.egerton, pzheng, sameer.abuasal, MaskRay, wuzish, echristo, Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54439	2020-01-14 19:46:52 -08:00
Michael Liao	01a4b83154	[codegen,amdgpu] Enhance MIR DIE and re-arrange it for AMDGPU. Summary: - `dead-mi-elimination` assumes MIR in the SSA form and cannot be arranged after phi elimination or DeSSA. It's enhanced to handle the dead register definition by skipping use check on it. Once a register def is `dead`, all its uses, if any, should be `undef`. - Re-arrange the DIE in RA phase for AMDGPU by placing it directly after `detect-dead-lanes`. - Many relevant tests are refined due to different register assignment. Reviewers: rampitec, qcolombet, sunfish Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72709	2020-01-14 19:26:15 -05:00
Reid Kleckner	05da2fe521	Sink all InitializePasses.h includes This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of recompilation. I found this fact by looking at this table, which is sorted by the number of times a file was changed over the last 100,000 git commits multiplied by the number of object files that depend on it in the current checkout: recompiles touches affected_files header 342380 95 3604 llvm/include/llvm/ADT/STLExtras.h 314730 234 1345 llvm/include/llvm/InitializePasses.h 307036 118 2602 llvm/include/llvm/ADT/APInt.h 213049 59 3611 llvm/include/llvm/Support/MathExtras.h 170422 47 3626 llvm/include/llvm/Support/Compiler.h 162225 45 3605 llvm/include/llvm/ADT/Optional.h 158319 63 2513 llvm/include/llvm/ADT/Triple.h 140322 39 3598 llvm/include/llvm/ADT/StringRef.h 137647 59 2333 llvm/include/llvm/Support/Error.h 131619 73 1803 llvm/include/llvm/Support/FileSystem.h Before this change, touching InitializePasses.h would cause 1345 files to recompile. After this change, touching it only causes 550 compiles in an incremental rebuild. Reviewers: bkramer, asbirlea, bollu, jdoerfert Differential Revision: https://reviews.llvm.org/D70211	2019-11-13 16:34:37 -08:00
Matt Arsenault	19e7f8a21d	AMDGPU: Add default denormal mode to MachineFunctionInfo The default FP mode should really be a property of a specific function, and not a subtarget. Introduce the necessary fields to the SIMachineFunctionInfo to help move towards this goal.	2019-11-01 00:03:39 -07:00
Jay Foad	765055658c	Revert "[AMDGPU] Run `unreachable-mbb-elimination` after isel to clean up PHIs." Summary: This has been superseded by "[AMDGPU]: PHI Elimination hooks added for custom COPY insertion." This reverts the code changes from commit `53f967f2bd` but keeps the test case. Reviewers: hliao, arsenm, tpr, dstuttard Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68769 llvm-svn: 374347	2019-10-10 13:34:31 +00:00
Matt Arsenault	12994a70cf	AMDGPU: Use SGPR_128 instead of SReg_128 for vregs SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating SReg_128 vregs. This shrinks the size of the classes regalloc needs to consider, which is usually good. llvm-svn: 374284	2019-10-10 07:11:33 +00:00
Matt Arsenault	3b95986a32	AMDGPU: Run AMDGPUCodeGenPrepare after scalar opts The mul24 matching could interfere with SLSR and the other addressing mode related passes. This probably is not the optimal placement, but is an intermediate step. This should probably be moved after all the generic IR passes, particularly LSR. Moving this after LSR seems to help in some cases, and hurts others. As-is in this patch, in idiv-licm, it saves 1-2 instructions inside some of the loop bodies, but increases the number in others. Moving this later helps these loops. In the new lsr tests in mul24-pass-ordering, the intrinsic prevents introducing more instructions in the loop preheader, so moving this later ends up hurting them. This shouldn't be any worse than before the intrinsics were introduced in r366094, and LSR should probably be smarter. I think it's because it doesn't know the and inside the loop will be folded away. llvm-svn: 369991	2019-08-27 00:08:31 +00:00
Jonas Devlieghere	0eaee545ee	[llvm] Migrate llvm::make_unique to std::make_unique Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of (hopefully) all the llvm::make_unique instances across the monorepo. llvm-svn: 369013	2019-08-15 15:54:37 +00:00
Stanislav Mekhanoshin	4c9c98f36b	[AMDGPU] Printf runtime binding pass This pass is a port of the corresponding pass from the HSAIL compiler. It parses printf calls and sets up the runtime printf buffer. After that it copies printf arguments to the buffer and fills in module metadata for runtime. Differential Revision: https://reviews.llvm.org/D24035 llvm-svn: 368592	2019-08-12 17:12:29 +00:00
Michael Liao	53f967f2bd	[AMDGPU] Run `unreachable-mbb-elimination` after isel to clean up PHIs. Summary: - As LCSSA is turned on just before isel, it may create PHI of the flow, which is consumed by pseudo structurized CFG instructions. When those PHIs are eliminated in O0, COPY may be placed wrongly as these pseudo structurized CFG instructions are considered the prologue of the MBB. - Run extra `unreachable-mbb-elimination` at the end of isel to clean up PHIs. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64353 llvm-svn: 367023	2019-07-25 14:50:18 +00:00

300 Commits