llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	5f9707b796	AMDGPU: Fix redundant FP spilling/assert in some functions If a function has stack objects, and a call, we require an FP. If we did not initially have any stack objects, and only introduced them during PrologEpilogInserter for CSR VGPR spills, SILowerSGPRSpills would end up spilling the FP register as if it were a normal register. This would result in an assert in a debug build, or redundant handling of the FP register in a release build. Try to predict that we will have an FP later, although this is ugly.	2021-01-26 13:01:45 -05:00
Matt Arsenault	92d1195b5f	AMDGPU: Add assertion to determineCalleeSaves Make sure this isn't getting called multiple times. I was surprised we were modifying the function here, which I think is a bit questionable.	2021-01-26 13:01:45 -05:00
Simon Pilgrim	f82cff31d3	[AMDGPU] HSAMD::fromString - replace std::string arg with StringRef. NFCI. Removes an unnecessary chain of StringRef -> std::string -> StringRef conversions	2021-01-26 16:09:39 +00:00
Simon Pilgrim	ee3da8958a	[AMDGPU] Fix null-dereference static analysis warnings. NFCI. Avoid repeated calls to isZeroValue() and check for a null pointer before dereferencing a dyn_cast<>.	2021-01-26 15:43:59 +00:00
Matt Arsenault	551a69e418	AMDGPU: Clear IsSSA property in SIFormMemoryClauses Fixes verifier error when writing MIR testcases	2021-01-26 10:40:41 -05:00
Mirko Brkusanin	608ac62540	[AMDGPU] Fix use of HasModifiers in VopProfile HasModifiers should be true if at least one modifier is used. This should make the use of this field a bit more consistent. Differential Revision: https://reviews.llvm.org/D94795	2021-01-26 15:21:11 +01:00
Dmitry Preobrazhensky	745064e36b	[AMDGPU][MC] Refactored exp tgt handling Summary: - Separated tgt encoding from parsing; - Separated tgt decoding from printing; - Improved errors handling; - Disabled leading zeroes in index. The following code is no longer accepted: exp pos00 v3, v2, v1, v0 Reviewers: arsenm, rampitec, foad Differential Revision: https://reviews.llvm.org/D95216	2021-01-26 14:54:15 +03:00
Kazu Hirata	c85b6bf33c	[AMDGPU] Forward-declare MachineIRBuilder (NFC) AMDGPULegalizerInfo.h needs MachineIRBuilder but relies on a forward declaration of MachineIRBuilder in LegalizerInfo.h. This patch adds a forward declaration right in AMDGPULegalizerInfo.h. While we are at it, this patch removes the one in LegalizerInfo.h, where it is unnecessary.	2021-01-25 19:24:01 -08:00
Changpeng Fang	5b648df1a8	AMDGPU: Reduce the number of expensive calls in SIFormMemoryClause Summary: RPTracker::reset(MI) is a very expensive call when the number of virtual registers is huge. We observed a long compilation time issue when RPT::reset() is called once for each cluster. In this work, we call RPT.reset() only at the first seen cluster, and use advance() to get the register pressure for the later clusters in the same basic block. This could effectively reduce the number of the expensive calls and thus reduce the compile time. Reviewers: rampitec Fixes: SWDEV-239161 Differential Revision: https://reviews.llvm.org/D95273	2021-01-25 16:08:08 -08:00
Konstantin Zhuravlyov	2cdb34efda	Revert "[IndirectFunctions] Skip propagating attributes to address taken functions" This reverts commit `dd8ae42674`. This commit causes an infinite loop when compiling rocThrust and hipCUB. Differential Revision: https://reviews.llvm.org/D95389	2021-01-25 15:58:06 -05:00
Dmitry Preobrazhensky	558b3bbb5b	[AMDGPU][MC] Improved errors handling for SDWA operands Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D95212	2021-01-25 19:02:53 +03:00
Carl Ritson	a80ebd0179	[AMDGPU] Fix llvm.amdgcn.init.exec and frame materialization Frame-base materialization may insert vector instructions before EXEC is initialised. Fix this by moving lowering of llvm.amdgcn.init.exec later in backend. Also remove SI_INIT_EXEC_LO pseudo as this is not necessary. Reviewed By: ruiling Differential Revision: https://reviews.llvm.org/D94645	2021-01-25 08:31:17 +09:00
Kazu Hirata	054444177b	[Target] Use llvm::append_range (NFC)	2021-01-24 12:18:56 -08:00
Kazu Hirata	e4847a7fcf	Revert "[Target] Use llvm::append_range (NFC)" This reverts commit `cc7a238286`. The X86WinEHState.cpp hunk seems to break certain builds.	2021-01-23 11:25:27 -08:00
Kazu Hirata	cc7a238286	[Target] Use llvm::append_range (NFC)	2021-01-23 10:56:31 -08:00
Kazu Hirata	49231c1f80	[llvm] Use static_assert instead of assert (NFC) Identified with misc-static-assert.	2021-01-22 23:25:05 -08:00
Stanislav Mekhanoshin	ca904b81e6	[AMDGPU] Fix FP materialization/resolve with flat scratch Differential Revision: https://reviews.llvm.org/D95266	2021-01-22 16:06:47 -08:00
Stanislav Mekhanoshin	607bec0bb9	Change materializeFrameBaseRegister() to return register The only caller of this function is in LocalStackSlotAllocation, and it creates a base register of the class returned by the target's getPointerRegClass(). AMDGPU wants to use a different reg class here, so let materializeFrameBaseRegister just create and return whatever it wants. Differential Revision: https://reviews.llvm.org/D95268	2021-01-22 15:51:06 -08:00
Arthur Eubanks	42d682a217	[NewPM][AMDGPU] Skip adding CGSCCOptimizerLate callbacks at O0 The legacy PM's EP_CGSCCOptimizerLate was only used under not-O0. Fixes clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp under the new PM. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D95250	2021-01-22 12:29:39 -08:00
Sebastian Neubauer	8214982b50	[AMDGPU] Implement mir parseCustomPseudoSourceValue Allow parsing generated mir with custom pseudo source value tokens. Also rename pseudo source values to have more meaningful names. Relands `ba7dcd8542`, which had memory leaks. Differential Revision: https://reviews.llvm.org/D95215	2021-01-22 11:24:08 +01:00
Christudasan Devadasan	ff8a1cae18	[AMDGPU] Fix the inconsistency in soffset for MUBUF stack accesses. During instruction selection, there is an inconsistency in choosing the initial soffset value. With certain early passes, this value is getting modified and that brought additional fixup during eliminateFrameIndex to work for all cases. This whole transformation looks trivial and can be handled better. This patch clearly defines the initial value for soffset and keeps it unchanged before eliminateFrameIndex. The initial value must be zero for MUBUF with a frame index. The non-frame index MUBUF forms that use a raw offset from SP will have the stack register for soffset. During frame elimination, the soffset remains zero for entry functions with zero dynamic allocas and no callsites, or else is updated to the appropriate frame/stack register. Also, did some code clean up and made all asserts around soffset stricter to match. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D95071	2021-01-22 14:20:59 +05:30
Arthur Eubanks	a11bf9a7fb	[AMDGPU][Inliner] Remove amdgpu-inline and add a new TTI inline hook Having a custom inliner doesn't really fit in with the new PM's pipeline. It's also extra technical debt. amdgpu-inline only does a couple of custom things compared to the normal inliner: 1) It disables inlining if the number of BBs in a function would exceed some limit 2) It increases the threshold if there are pointers to private arrays(?) These can all be handled as TTI inliner hooks. There already exists a hook for backends to multiply the inlining threshold. This way we can remove the custom amdgpu-inline pass. This caused inline-hint.ll to fail, and after some investigation, it looks like getInliningThresholdMultiplier() was previously getting applied twice in amdgpu-inline (https://reviews.llvm.org/D62707 fixed it not applying at all, so some later inliner change must have fixed something), so I had to change the threshold in the test. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D94153	2021-01-21 20:29:17 -08:00
Sebastian Neubauer	4dbdff66fe	Revert "[AMDGPU] Implement mir parseCustomPseudoSourceValue" This reverts commit `ba7dcd8542`. (caused memory leaks)	2021-01-21 18:11:48 +01:00
Jay Foad	c0b3c5a064	[AMDGPU][GlobalISel] Run SIAddImgInit This pass is required to get correct codegen for image instructions with the tfe or lwe bits set. Differential Revision: https://reviews.llvm.org/D95132	2021-01-21 15:54:54 +00:00
Matt Arsenault	94375d1083	AMDGPU: Remove v_rsq_f64 patterns This isn't accurate enough without correction	2021-01-21 10:51:36 -05:00
Matt Arsenault	2a0db8d70e	AMDGPU: Use more accurate fast f64 fdiv A raw v_rcp_f64 isn't accurate enough, so start applying correction.	2021-01-21 10:51:36 -05:00
Sebastian Neubauer	ba7dcd8542	[AMDGPU] Implement mir parseCustomPseudoSourceValue Allow parsing generated mir with custom pseudo source value tokens. Also rename pseudo source values to have more meaningful names. Differential Revision: https://reviews.llvm.org/D94768	2021-01-21 16:32:17 +01:00
Matt Arsenault	20566a2ed8	AMDGPU: Add occupancy to serialized MachineFunctionInfo Not sure about the default value handling, but also not sure about defaulting to a theoretically subtarget-dependent value.	2021-01-21 09:21:00 -05:00
madhur13490	dd8ae42674	[IndirectFunctions] Skip propagating attributes to address taken functions In case of indirect calls or address taken functions, skip propagating any attributes to them. We just propagate features to such functions. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D94585	2021-01-21 07:04:28 +00:00
dfukalov	560d7e0411	[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets ... to reduce headers dependency. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D95036	2021-01-20 22:22:45 +03:00
Mirko Brkusanin	a6a72dfdf2	[AMDGPU][GlobalISel] Avoid selecting S_PACK with constants If constants are hidden behind G_ANYEXT, we can treat them the same way as G_SEXT. For that purpose we extend getConstantVRegValWithLookThrough with an option to handle G_ANYEXT the same way as G_SEXT. Differential Revision: https://reviews.llvm.org/D92219	2021-01-20 11:54:53 +01:00
Petar Avramovic	4ab704d628	[AMDGPU][MC] Add tfe disassembler support MIMG opcodes With tfe on there can be a vgpr write to vdata+1. Add tablegen support for 5 register vdata store. This is required for 4 register vdata store with tfe. Differential Revision: https://reviews.llvm.org/D94960	2021-01-20 10:37:09 +01:00
Jay Foad	18cb7441b6	[AMDGPU] Simpler names for arch-specific ttmp registers. NFC. Rename the _gfx9_gfx10 ttmp registers to _gfx9plus for simplicity, and use the corresponding isGFX9Plus predicate to decide when to use them instead of the old *_vi versions. Differential Revision: https://reviews.llvm.org/D94975	2021-01-19 18:47:14 +00:00
Jay Foad	49dce85584	[AMDGPU] Simplify AMDGPUInstPrinter::printExpSrcN. NFC. Change-Id: Idd7f47647bc0faa3ad6f61f44728c0f20540ec00	2021-01-19 10:39:56 +00:00
Dmitry Preobrazhensky	30b8f55378	Fix for sanitizer issue in `55c557a`	2021-01-18 18:39:55 +03:00
Dmitry Preobrazhensky	55c557a5d2	[AMDGPU][MC] Refactored parsing of dpp ctrl Summary of changes: - simplified code to improve maintainability; - replaced lex() with higher level parser functions; - improved errors handling. Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D94777	2021-01-18 18:14:19 +03:00
Dmitry Preobrazhensky	911961c9c1	[AMDGPU][MC][GFX10] Improved dpp8 errors handling Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D94756	2021-01-18 15:02:31 +03:00
Kazu Hirata	352fcfc697	[llvm] Use llvm::sort (NFC)	2021-01-17 10:39:45 -08:00
Kazu Hirata	2082b10d10	[llvm] Use *::empty (NFC)	2021-01-16 09:40:55 -08:00
Kazu Hirata	19aacdb715	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-01-16 09:40:53 -08:00
Kazu Hirata	4707b21298	[AMDGPU] Use llvm::is_contained (NFC)	2021-01-15 21:00:54 -08:00
Kazu Hirata	7dc3575ef2	[llvm] Remove redundant return and continue statements (NFC) Identified with readability-redundant-control-flow.	2021-01-14 20:30:34 -08:00
Matt Arsenault	d55d592a92	GlobalISel: Do not set observer of MachineIRBuilder in LegalizerHelper This fixes double printing of insertion debug messages in the legalizer. Try to clean up usage of observers. Currently the use of observers is pretty hard to follow and it's not clear what is responsible for them. Observers are referenced in 3 places: 1. In the MachineFunction 2. In the MachineIRBuilder 3. In the LegalizerHelper The observers in the MachineFunction and MachineIRBuilder are both called only on insertions, and are redundant with each other. The source of the double printing was that the same observer was added to both the MachineFunction and the MachineIRBuilder. One of these references needs to be removed. Arguably observers in general should be fully removed from one or the other, but it may be useful to have a local observer in the MachineIRBuilder that is not added to the function's observers. Alternatively, the wrapper observer could manage a local observer in one place. The LegalizerHelper only ever calls the observer on changing/changed instructions, and never insertions. Logically these are two different types of observers, for changes and for insertions. Additionally, some places used the GISelObserverWrapper when they only needed a single observer they could use directly. Setting the observer in the LegalizerHelper constructor is not flexible enough if the LegalizerHelper is constructed anywhere outside the one used by the legalizer. AMDGPU calls the LegalizerHelper in RegBankSelect, and needs to use a local observer to apply the regbank to newly created instructions. Currently it accomplishes this by constructing a local MachineIRBuilder. I'm trying to move the MachineIRBuilder to be owned/maintained by the RegBankSelect pass itself, but the locally constructed LegalizerHelper would reset the observer. Mips also has a special case use of the LegalizationArtifactCombiner in applyMappingImpl; I think we do need to run the artifact combiner during RegBankSelect, but in a more consistent way outside of applyMappingImpl.	2021-01-13 10:44:31 -05:00
Carl Ritson	790c75c163	[AMDGPU] Add SI_EARLY_TERMINATE_SCC0 for early terminating shader Add pseudo instruction to allow early termination of pixel shader anywhere based on the value of SCC. The intention is to use this when a mask of live lanes is updated, e.g. live lanes in WQM pass. This facilitates early termination of shaders even when EXEC is incomplete, e.g. in non-uniform control flow. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D88777	2021-01-13 13:29:05 +09:00
Hsiangkai Wang	914e2f5a02	[NFC] Use generic name for scalable vector stack ID. Differential Revision: https://reviews.llvm.org/D94471	2021-01-13 10:57:43 +08:00
Joe Nash	314e29ed2b	[AMDGPU] Add _e64 suffix to VOP3 Insts Previously, instructions which could be expressed as VOP3 in addition to another encoding had a _e64 suffix on the tablegen record name, while those only available as VOP3 did not. With this patch, all VOP3s will have the _e64 suffix. The assembly does not change, only the mir. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D94341 Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423	2021-01-12 18:33:18 -05:00
Matt Arsenault	3d39709159	AMDGPU: Remove wrapper only call limitation This seems to only have overridden cold handling, which we probably shouldn't do. As far as I can tell the wrapper library functions are still inlined as appropriate.	2021-01-12 17:12:49 -05:00
Sebastian Neubauer	6a195491b6	[AMDGPU] Fix failing assert with scratch ST mode In ST mode, flat scratch instructions have neither an sgpr nor a vgpr for the address. This led to an assertion when inserting hard clauses. Differential Revision: https://reviews.llvm.org/D94406	2021-01-12 09:54:02 +01:00
Kazu Hirata	8590a3e3ad	[llvm] Use *Set::contains (NFC)	2021-01-11 18:48:07 -08:00
Joe Nash	bcec0f27a2	[AMDGPU] Deduplicate VOP tablegen asm & ins VOP3 and VOP DPP subroutines to generate input operands and asm strings were essentially copy pasted several times. They are deduplicated to reduce the maintenance burden and allow faster development. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D94102 Change-Id: I76225eed3c33239d9573351e0c8a0abfad0146ea	2021-01-11 13:49:26 -05:00


5630 Commits