llvm-project

Commit Graph

Author	SHA1	Message	Date
Thomas Symalla	fce3230be2	Added early exit.	2021-02-02 09:14:52 +01:00
Thomas Symalla	d722924f20	Added comments.	2021-02-02 09:14:52 +01:00
Thomas Symalla	ec043967ec	clang-format	2021-02-02 09:14:52 +01:00
Thomas Symalla	62af0305b7	Added clamp i64 to i16 global isel pattern.	2021-02-02 09:14:52 +01:00
Matt Arsenault	41877b82f0	AMDGPU: Fix dbg_value handling when forming soft clause bundles DBG_VALUES placed between memory instructions would change codegen. Skip over these and re-insert them after the bundle instead of giving up on bundling.	2021-02-01 22:16:35 -05:00
Austin Kerbow	e068e236c3	[AMDGPU] Fix release build after `0397dca0`.	2021-02-01 08:55:14 -08:00
Austin Kerbow	0397dca021	[AMDGPU] Fix crash with sgpr spills to vgpr disabled This would assert with amdgpu-spill-sgpr-to-vgpr disabled when trying to spill the FP. Fixes: SWDEV-262704 Reviewed By: RamNalamothu Differential Revision: https://reviews.llvm.org/D95768	2021-02-01 08:35:25 -08:00
Dmitry Preobrazhensky	99b5631649	[AMDGPU][MC] Corrected error position for invalid operands Generic parser may report an incorrect error position when an offending operand is followed by a comma. See bug 48884 for details: https://bugs.llvm.org/show_bug.cgi?id=48884. Differential Revision: https://reviews.llvm.org/D95674	2021-02-01 14:31:08 +03:00
Matt Arsenault	8f14a08863	AMDGPU: Add missing consts	2021-01-31 10:47:57 -05:00
Kazu Hirata	627b5bda11	[llvm] Add missing header guards (NFC) Identified with llvm-header-guard.	2021-01-30 09:53:42 -08:00
Kazu Hirata	b4e780697d	[AMDGPU] Forward-declare AMDGPUTargetMachine (NFC) AMDGPUTargetTransformInfo.h needs AMDGPUTargetMachine but relies on a forward declaration of AMDGPUTargetMachine in AMDGPU.h. This patch adds a forward declaration right in AMDGPUTargetTransformInfo.h. While we are at it, this patch removes the one in AMDGPU.h, where it is unnecessary.	2021-01-30 09:53:40 -08:00
Stanislav Mekhanoshin	9dbe736cbd	[AMDGPU] Be more specific in needsFrameBaseReg A condition "mayLoadOrStore" is too broad for that function. Differential Revision: https://reviews.llvm.org/D95700	2021-01-29 14:40:25 -08:00
Kazu Hirata	7925aa091d	[llvm] Populate SmallVector at construction time (NFC)	2021-01-28 22:21:14 -08:00
Kazu Hirata	046cfb8565	[llvm] Forward-declare formatted_raw_ostream (NFC) Various TargetStreamer.h need formatted_raw_ostream but rely on a forward declaration of formatted_raw_ostream in MCStreamer.h. This patch adds forward declarations right in TargetStreamer.h. While we are at it, this patch removes the one in MCStreamer.h, where it is unnecessary.	2021-01-28 22:21:13 -08:00
Carl Ritson	0824694d68	[AMDGPU] Fix WWM Entry SCC preservation SCC was not correctly preserved when entering WWM. Current lit test was unable to detect this as entry block is handled differently. Additionally fix an issue where SCC was unnecessarily preserved when exiting from WWM to Exact mode. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D95500	2021-01-29 10:05:36 +09:00
Carl Ritson	0e8f50595e	[AMDGPU] Mark V_SET_INACTIVE as defining SCC V_SET_INACTIVE is implemented with S_NOT which clobbers SCC. Make sure it is marked appropriately. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D95509	2021-01-29 09:46:41 +09:00
Simon Pilgrim	0805e40a94	AMDGPUPrintfRuntimeBinding - don't dereference a dyn_cast<> pointer. NFCI. We dereference the dyn_cast<> in all paths - use cast<> to silence the clang static analyzer warning.	2021-01-28 12:38:44 +00:00
Mirko Brkusanin	3c979ae9ec	[AMDGPU][GlobalISel] Remove redundant cmp when copying constant to vcc Differential Revision: https://reviews.llvm.org/D95540	2021-01-28 11:20:09 +01:00
Mirko Brkusanin	4b422708ba	[AMDGPU][GlobalISel] Handle G_PTR_ADD when looking for constant offset Look through G_PTRTOINT and G_PTR_ADD nodes when looking for constant offset for buffer stores. This also helps with merging of these instructions later on. Differential Revision: https://reviews.llvm.org/D95242	2021-01-28 11:20:09 +01:00
Piotr Sobczak	fc8e741121	[AMDGPU] Avoid an illegal operand in si-shrink-instructions Before the patch it was possible to trigger a constant bus violation when folding immediates into a shrunk instruction. The patch adds a check to enforce the legality of the new operand. Differential Revision: https://reviews.llvm.org/D95527	2021-01-28 08:49:21 +01:00
Stanislav Mekhanoshin	d91ee2f782	[AMDGPU] Do not reassign spilled registers We cannot call LRM::unassign() if LRM::assign() was never called before, these are symmetrical calls. There are two ways of assigning a physical register to virtual, via LRM::assign() and via VRM::assignVirt2Phys(). LRM::assign() will call the VRM to assign the register and then update LiveIntervalUnion. Inline spiller calls VRM directly and thus LiveIntervalUnion never gets updated. A call to LRM::unassign() then asserts about inconsistent liveness. We have to note that not all callers of the InlineSpiller even have LRM to pass, RegAllocPBQP does not have it, so we cannot always pass LRM into the spiller. The only way to get into that spiller LRE_DidCloneVirtReg() call is from LiveRangeEdit::eliminateDeadDefs if we split an LI. This patch refuses to reassign a LiveInterval created by a split to workaround the problem. In fact we cannot reassign a spill anyway as all registers of the needed class are occupied and we are spilling. Fixes: SWDEV-267996 Differential Revision: https://reviews.llvm.org/D95489	2021-01-27 16:29:05 -08:00
Kazu Hirata	6bde085366	[AMDGPU] Forward-declare TargetRegisterClass (NFC) AMDGPUInstructionSelector.h needs TargetRegisterClass but relies on a forward declaration of TargetRegisterClass in InstructionSelector.h. This patch adds a forward declaration right in AMDGPUInstructionSelector.h. While we are at it, this patch removes the one in InstructionSelector.h, where it is unnecessary.	2021-01-26 20:00:16 -08:00
Austin Kerbow	2291bd137d	[AMDGPU] Update subtarget features for new target ID support Support for XNACK and SRAMECC is not static on some GPUs. We must be able to differentiate between different scenarios for these dynamic subtarget features. The possible settings are: - Unsupported: The GPU has no support for XNACK/SRAMECC. - Any: Preference is unspecified. Use conservative settings that can run anywhere. - Off: Request support for XNACK/SRAMECC Off - On: Request support for XNACK/SRAMECC On GCNSubtarget will track the four options based on the following criteria. If the subtarget does not support XNACK/SRAMECC we say the setting is "Unsupported". If no subtarget features for XNACK/SRAMECC are requested we must support "Any" mode. If the subtarget features XNACK/SRAMECC exist in the feature string when initializing the subtarget, the settings are "On/Off". The defaults are updated to be conservatively correct, meaning if no setting for XNACK or SRAMECC is explicitly requested, defaults will be used which generate code that can be run anywhere. This corresponds to the "Any" setting. Differential Revision: https://reviews.llvm.org/D85882	2021-01-26 11:25:51 -08:00
Matt Arsenault	5f9707b796	AMDGPU: Fix redundant FP spilling/assert in some functions If a function has stack objects, and a call, we require an FP. If we did not initially have any stack objects, and only introduced them during PrologEpilogInserter for CSR VGPR spills, SILowerSGPRSpills would end up spilling the FP register as if it were a normal register. This would result in an assert in a debug build, or redundant handling of the FP register in a release build. Try to predict that we will have an FP later, although this is ugly.	2021-01-26 13:01:45 -05:00
Matt Arsenault	92d1195b5f	AMDGPU: Add assertion to determineCalleeSaves Make sure this isn't getting called multiple times. I was surprised we were modifying the function here, which I think is a bit questionable.	2021-01-26 13:01:45 -05:00
Simon Pilgrim	f82cff31d3	[AMDGPU] HSAMD::fromString - replace std::string arg with StringRef. NFCI. Removes an unnecessary chain of StringRef -> std::string -> StringRef conversions	2021-01-26 16:09:39 +00:00
Simon Pilgrim	ee3da8958a	[AMDGPU] Fix null-dereference static analysis warnings. NFCI. Avoid repeated calls to isZeroValue() and check for a null pointer before dereferencing a dyn_cast<>.	2021-01-26 15:43:59 +00:00
Matt Arsenault	551a69e418	AMDGPU: Clear IsSSA property in SIFormMemoryClauses Fixes verifier error when writing MIR testcases	2021-01-26 10:40:41 -05:00
Mirko Brkusanin	608ac62540	[AMDGPU] Fix use of HasModifiers in VopProfile HasModifiers should be true if at least one modifier is used. This should make the use of this field bit more consistent. Differential Revision: https://reviews.llvm.org/D94795	2021-01-26 15:21:11 +01:00
Dmitry Preobrazhensky	745064e36b	[AMDGPU][MC] Refactored exp tgt handling Summary: - Separated tgt encoding from parsing; - Separated tgt decoding from printing; - Improved errors handling; - Disabled leading zeroes in index. The following code is no longer accepted: exp pos00 v3, v2, v1, v0 Reviewers: arsenm, rampitec, foad Differential Revision: https://reviews.llvm.org/D95216	2021-01-26 14:54:15 +03:00
Kazu Hirata	c85b6bf33c	[AMDGPU] Forward-declare MachineIRBuilder (NFC) AMDGPULegalizerInfo.h needs MachineIRBuilder but relies on a forward declaration of MachineIRBuilder in LegalizerInfo.h. This patch adds a forward declaration right in AMDGPULegalizerInfo.h. While we are at it, this patch removes the one in LegalizerInfo.h, where it is unnecessary.	2021-01-25 19:24:01 -08:00
Changpeng Fang	5b648df1a8	AMDGPU: Reduce the number of expensive calls in SIFormMemoryClause Summary: RPTracker::reset(MI) is a very expensive call when the number of virtual registers is huge. We observed a long compilation time issue when RPT::reset() is called once for each cluster. In this work, we call RPT.reset() only at the first seen cluster, and use advance() to get the register pressure for the later clusters in the same basic block. This could effectively reduce the number of the expensive calls and thus reduce the compile time. Reviewers: rampitec Fixes: SWDEV-239161 Differential Revision: https://reviews.llvm.org/D95273	2021-01-25 16:08:08 -08:00
Konstantin Zhuravlyov	2cdb34efda	Revert "[IndirectFunctions] Skip propagating attributes to address taken functions" This reverts commit `dd8ae42674`. This commit causes infinite loop when compiling rocThrust and hipCUB. Differential Revision: https://reviews.llvm.org/D95389	2021-01-25 15:58:06 -05:00
Dmitry Preobrazhensky	558b3bbb5b	[AMDGPU][MC] Improved errors handling for SDWA operands Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D95212	2021-01-25 19:02:53 +03:00
Carl Ritson	a80ebd0179	[AMDGPU] Fix llvm.amdgcn.init.exec and frame materialization Frame-base materialization may insert vector instructions before EXEC is initialised. Fix this by moving lowering of llvm.amdgcn.init.exec later in backend. Also remove SI_INIT_EXEC_LO pseudo as this is not necessary. Reviewed By: ruiling Differential Revision: https://reviews.llvm.org/D94645	2021-01-25 08:31:17 +09:00
Kazu Hirata	054444177b	[Target] Use llvm::append_range (NFC)	2021-01-24 12:18:56 -08:00
Kazu Hirata	e4847a7fcf	Revert "[Target] Use llvm::append_range (NFC)" This reverts commit `cc7a238286`. The X86WinEHState.cpp hunk seems to break certain builds.	2021-01-23 11:25:27 -08:00
Kazu Hirata	cc7a238286	[Target] Use llvm::append_range (NFC)	2021-01-23 10:56:31 -08:00
Kazu Hirata	49231c1f80	[llvm] Use static_assert instead of assert (NFC) Identified with misc-static-assert.	2021-01-22 23:25:05 -08:00
Stanislav Mekhanoshin	ca904b81e6	[AMDGPU] Fix FP materialization/resolve with flat scratch Differential Revision: https://reviews.llvm.org/D95266	2021-01-22 16:06:47 -08:00
Stanislav Mekhanoshin	607bec0bb9	Change materializeFrameBaseRegister() to return register The only caller of this function is in the LocalStackSlotAllocation and it creates base register of class returned by the target's getPointerRegClass(). AMDGPU wants to use a different reg class here so let materializeFrameBaseRegister to just create and return whatever it wants. Differential Revision: https://reviews.llvm.org/D95268	2021-01-22 15:51:06 -08:00
Arthur Eubanks	42d682a217	[NewPM][AMDGPU] Skip adding CGSCCOptimizerLate callbacks at O0 The legacy PM's EP_CGSCCOptimizerLate was only used under not-O0. Fixes clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp under the new PM. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D95250	2021-01-22 12:29:39 -08:00
Sebastian Neubauer	8214982b50	[AMDGPU] Implement mir parseCustomPseudoSourceValue Allow parsing generated mir with custom pseudo source value tokens. Also rename pseudo source values to have more meaningful names. Relands `ba7dcd8542`, which had memory leaks. Differential Revision: https://reviews.llvm.org/D95215	2021-01-22 11:24:08 +01:00
Christudasan Devadasan	ff8a1cae18	[AMDGPU] Fix the inconsistency in soffset for MUBUF stack accesses. During instruction selection, there is an inconsistency in choosing the initial soffset value. With certain early passes, this value is getting modified and that brought additional fixup during eliminateFrameIndex to work for all cases. This whole transformation looks trivial and can be handled better. This patch clearly defines the initial value for soffset and keeps it unchanged before eliminateFrameIndex. The initial value must be zero for MUBUF with a frame index. The non-frame index MUBUF forms that use a raw offset from SP will have the stack register for soffset. During frame elimination, the soffset remains zero for entry functions with zero dynamic allocas and no callsites, or else is updated to the appropriate frame/stack register. Also, did some code clean up and made all asserts around soffset stricter to match. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D95071	2021-01-22 14:20:59 +05:30
Arthur Eubanks	a11bf9a7fb	[AMDGPU][Inliner] Remove amdgpu-inline and add a new TTI inline hook Having a custom inliner doesn't really fit in with the new PM's pipeline. It's also extra technical debt. amdgpu-inline only does a couple of custom things compared to the normal inliner: 1) It disables inlining if the number of BBs in a function would exceed some limit 2) It increases the threshold if there are pointers to private arrays(?) These can all be handled as TTI inliner hooks. There already exists a hook for backends to multiply the inlining threshold. This way we can remove the custom amdgpu-inline pass. This caused inline-hint.ll to fail, and after some investigation, it looks like getInliningThresholdMultiplier() was previously getting applied twice in amdgpu-inline (https://reviews.llvm.org/D62707 fixed it not applying at all, so some later inliner change must have fixed something), so I had to change the threshold in the test. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D94153	2021-01-21 20:29:17 -08:00
Sebastian Neubauer	4dbdff66fe	Revert "[AMDGPU] Implement mir parseCustomPseudoSourceValue" This reverts commit `ba7dcd8542`. (caused memory leaks)	2021-01-21 18:11:48 +01:00
Jay Foad	c0b3c5a064	[AMDGPU][GlobalISel] Run SIAddImgInit This pass is required to get correct codegen for image instructions with the tfe or lwe bits set. Differential Revision: https://reviews.llvm.org/D95132	2021-01-21 15:54:54 +00:00
Matt Arsenault	94375d1083	AMDGPU: Remove v_rsq_f64 patterns This isn't accurate enough without correction	2021-01-21 10:51:36 -05:00
Matt Arsenault	2a0db8d70e	AMDGPU: Use more accurate fast f64 fdiv A raw v_rcp_f64 isn't accurate enough, so start applying correction.	2021-01-21 10:51:36 -05:00
Sebastian Neubauer	ba7dcd8542	[AMDGPU] Implement mir parseCustomPseudoSourceValue Allow parsing generated mir with custom pseudo source value tokens. Also rename pseudo source values to have more meaningful names. Differential Revision: https://reviews.llvm.org/D94768	2021-01-21 16:32:17 +01:00
Matt Arsenault	20566a2ed8	AMDGPU: Add occupancy to serialized MachineFunctionInfo Not sure about the default value handling, but also not sure defaulting to a theoretically subtarget dependent value.	2021-01-21 09:21:00 -05:00
madhur13490	dd8ae42674	[IndirectFunctions] Skip propagating attributes to address taken functions In case of indirect calls or address taken functions, skip propagating any attributes to them. We just propagate features to such functions. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D94585	2021-01-21 07:04:28 +00:00
dfukalov	560d7e0411	[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets ... to reduce headers dependency. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D95036	2021-01-20 22:22:45 +03:00
Mirko Brkusanin	a6a72dfdf2	[AMDGPU][GlobalISel] Avoid selecting S_PACK with constants If constants are hidden behind G_ANYEXT we can treat them same way as G_SEXT. For that purpose we extend getConstantVRegValWithLookThrough with option to handle G_ANYEXT same way as G_SEXT. Differential Revision: https://reviews.llvm.org/D92219	2021-01-20 11:54:53 +01:00
Petar Avramovic	4ab704d628	[AMDGPU][MC] Add tfe disassembler support MIMG opcodes With tfe on there can be a vgpr write to vdata+1. Add tablegen support for 5 register vdata store. This is required for 4 register vdata store with tfe. Differential Revision: https://reviews.llvm.org/D94960	2021-01-20 10:37:09 +01:00
Jay Foad	18cb7441b6	[AMDGPU] Simpler names for arch-specific ttmp registers. NFC. Rename the _gfx9_gfx10 ttmp registers to _gfx9plus for simplicity, and use the corresponding isGFX9Plus predicate to decide when to use them instead of the old *_vi versions. Differential Revision: https://reviews.llvm.org/D94975	2021-01-19 18:47:14 +00:00
Jay Foad	49dce85584	[AMDGPU] Simplify AMDGPUInstPrinter::printExpSrcN. NFC. Change-Id: Idd7f47647bc0faa3ad6f61f44728c0f20540ec00	2021-01-19 10:39:56 +00:00
Dmitry Preobrazhensky	30b8f55378	Fix for sanitizer issue in `55c557a`	2021-01-18 18:39:55 +03:00
Dmitry Preobrazhensky	55c557a5d2	[AMDGPU][MC] Refactored parsing of dpp ctrl Summary of changes: - simplified code to improve maintainability; - replaced lex() with higher level parser functions; - improved errors handling. Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D94777	2021-01-18 18:14:19 +03:00
Dmitry Preobrazhensky	911961c9c1	[AMDGPU][MC][GFX10] Improved dpp8 errors handling Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D94756	2021-01-18 15:02:31 +03:00
Kazu Hirata	352fcfc697	[llvm] Use llvm::sort (NFC)	2021-01-17 10:39:45 -08:00
Kazu Hirata	2082b10d10	[llvm] Use *::empty (NFC)	2021-01-16 09:40:55 -08:00
Kazu Hirata	19aacdb715	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-01-16 09:40:53 -08:00
Kazu Hirata	4707b21298	[AMDGPU] Use llvm::is_contained (NFC)	2021-01-15 21:00:54 -08:00
Kazu Hirata	7dc3575ef2	[llvm] Remove redundant return and continue statements (NFC) Identified with readability-redundant-control-flow.	2021-01-14 20:30:34 -08:00
Matt Arsenault	d55d592a92	GlobalISel: Do not set observer of MachineIRBuilder in LegalizerHelper This fixes double printing of insertion debug messages in the legalizer. Try to cleanup usage of observers. Currently the use of observers is pretty hard to follow and it's not clear what is responsible for them. Observers are referenced in 3 places: 1. In the MachineFunction 2. In the MachineIRBuilder 3. In the LegalizerHelper The observers in the MachineFunction and MachineIRBuilder are both called only on insertions, and are redundant with each other. The source of the double printing was the same observer was added to both the MachineFunction, and the MachineIRBuilder. One of these references needs to be removed. Arguably observers in general should be fully removed from one or the other, but it may be useful to have a local observer in the MachineIRBuilder that is not added to the function's observers. Alternatively, the wrapper observer could manage a local observer in one place. The LegalizerHelper only ever calls the observer on changing/changed instructions, and never insertions. Logically these are two different types of observers, for changes and for insertions. Additionally, some places used the GISelObserverWrapper when they only needed a single observer they could use directly. Setting the observer in the LegalizerHelper constructor is not flexible enough if the LegalizerHelper is constructed anywhere outside the one used by the legalizer. AMDGPU calls the LegalizerHelper in RegBankSelect, and needs to use a local observer to apply the regbank to newly created instructions. Currently it accomplishes this by constructing a local MachineIRBuilder. I'm trying to move the MachineIRBuilder to be owned/maintained by the RegBankSelect pass itself, but the locally constructed LegalizerHelper would reset the observer. Mips also has a special case use of the LegalizationArtifactCombiner in applyMappingImpl; I think we do need to run the artifact combiner during RegBankSelect, but in a more consistent way outside of applyMappingImpl.	2021-01-13 10:44:31 -05:00
Carl Ritson	790c75c163	[AMDGPU] Add SI_EARLY_TERMINATE_SCC0 for early terminating shader Add pseudo instruction to allow early termination of pixel shader anywhere based on the value of SCC. The intention is to use this when a mask of live lanes is updated, e.g. live lanes in WQM pass. This facilitates early termination of shaders even when EXEC is incomplete, e.g. in non-uniform control flow. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D88777	2021-01-13 13:29:05 +09:00
Hsiangkai Wang	914e2f5a02	[NFC] Use generic name for scalable vector stack ID. Differential Revision: https://reviews.llvm.org/D94471	2021-01-13 10:57:43 +08:00
Joe Nash	314e29ed2b	[AMDGPU] Add _e64 suffix to VOP3 Insts Previously, instructions which could be expressed as VOP3 in addition to another encoding had a _e64 suffix on the tablegen record name, while those only available as VOP3 did not. With this patch, all VOP3s will have the _e64 suffix. The assembly does not change, only the mir. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D94341 Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423	2021-01-12 18:33:18 -05:00
Matt Arsenault	3d39709159	AMDGPU: Remove wrapper only call limitation This seems to only have overridden cold handling, which we probably shouldn't do. As far as I can tell the wrapper library functions are still inlined as appropriate.	2021-01-12 17:12:49 -05:00
Sebastian Neubauer	6a195491b6	[AMDGPU] Fix failing assert with scratch ST mode In ST mode, flat scratch instructions have neither an sgpr nor a vgpr for the address. This led to an assertion when inserting hard clauses. Differential Revision: https://reviews.llvm.org/D94406	2021-01-12 09:54:02 +01:00
Kazu Hirata	8590a3e3ad	[llvm] Use *Set::contains (NFC)	2021-01-11 18:48:07 -08:00
Joe Nash	bcec0f27a2	[AMDGPU] Deduplicate VOP tablegen asm & ins VOP3 and VOP DPP subroutines to generate input operands and asm strings were essentially copy pasted several times. They are deduplicated to reduce the maintenance burden and allow faster development. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D94102 Change-Id: I76225eed3c33239d9573351e0c8a0abfad0146ea	2021-01-11 13:49:26 -05:00
QingShan Zhang	7539c75bb4	[DAGCombine] Remove the check for unsafe-fp-math when we are checking the AFN We are checking the unsafe-fp-math for sqrt but not for fpow, which behaves inconsistent. As the direction is to remove this global option, we need to remove the unsafe-fp-math check for sqrt and update the test with afn fast-math flags. Reviewed By: Spatel Differential Revision: https://reviews.llvm.org/D93891	2021-01-11 02:25:53 +00:00
Kazu Hirata	b7c5e0b02c	[Target, Transforms] Use *Set::contains (NFC)	2021-01-08 18:39:54 -08:00
Tony	2f499b9aff	[AMDGPU] Add volatile support to SIMemoryLegalizer Treat a non-atomic volatile load and store as a relaxed atomic at system scope for the address spaces accessed. This will ensure all relevant caches will be bypassed. A volatile atomic is not changed and still only bypasses caches up to the level specified by the SyncScope operand. Differential Revision: https://reviews.llvm.org/D94214	2021-01-09 00:52:33 +00:00
Christudasan Devadasan	ae25a397e9	AMDGPU/GlobalISel: Enable sret demotion	2021-01-08 10:56:35 +05:30
Kazu Hirata	b934160aaa	[Target] Use llvm::find_if (NFC)	2021-01-07 20:29:36 -08:00
Mehdi Amini	467e916d30	Fix gcc5 build failure (NFC) The loop index was shadowing the container name. It seems that we can just not use a for-range loop here since there is an induction variable anyway. Differential Revision: https://reviews.llvm.org/D94254	2021-01-07 20:11:57 +00:00
dfukalov	6a87e9b08b	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
Matt Arsenault	6b7d5a928f	AMDGPU/GlobalISel: Start cleaning up calling convention lowering There are various hacks working around limitations in handleAssignments, and the logical split between different parts isn't correct. Start separating the type legalization to satisfy going through the DAG infrastructure from the code required to split into register types. The type splitting should be moved to generic code.	2021-01-07 10:36:45 -05:00
Matt Arsenault	ab3a3f543b	AMDGPU/GlobalISel: Update fdiv lowering for denormal/ulp interaction Change the GlobalISel fast fdiv handling to match the changes in `2531535984` and `884acbb9e1`	2021-01-06 12:32:01 -05:00
Christudasan Devadasan	d68458bd56	[GlobalISel] Base implementation for sret demotion. If the return values can't be lowered to registers SelectionDAG performs the sret demotion. This patch contains the basic implementation for the same in the GlobalISel pipeline. Furthermore, targets should bring relevant changes during lowerFormalArguments, lowerReturn and lowerCall to make use of this feature. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D92953	2021-01-06 10:30:50 +05:30
Changpeng Fang	cb5b52a06e	AMDGPU: Annotate amdgpu.noclobber for global loads only Summary: This is to avoid unnecessary analysis since amdgpu.noclobber is only used for globals. Reviewers: arsenm Fixes: SWDEV-239161 Differential Revision: https://reviews.llvm.org/D94107	2021-01-05 14:47:19 -08:00
Arthur Eubanks	28a326eba0	[NFC] Rename registerAliasAnalyses -> registerDefaultAliasAnalyses To clarify that this only affects the "default" AA. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D93980	2021-01-05 11:07:58 -08:00
Joe Nash	60466fad2d	[AMDGPU] Remove deprecated V_MUL_LO_I32 from GFX10 It was removed in GFX10 GPUs, but LLVM could generate it. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D94020 Change-Id: Id1c716d71313edcfb768b2b175a6789ef9b01f3c	2021-01-05 11:59:57 -05:00
Jay Foad	3914bebe91	[AMDGPU] Handle v_fmac_legacy_f32 in SIFoldOperands Convert it to v_fma_legacy_f32 if it is profitable to do so, just like other mac instructions that are converted to their mad equivalents. Differential Revision: https://reviews.llvm.org/D94010	2021-01-05 11:55:33 +00:00
Jay Foad	4e6054a86c	[AMDGPU] Split out new helper function macToMad in SIFoldOperands. NFC. Differential Revision: https://reviews.llvm.org/D94009	2021-01-05 11:54:48 +00:00
Arthur Eubanks	8e293fe6ad	[NewPM][AMDGPU] Pass TargetMachine to AMDGPUSimplifyLibCallsPass Missed in https://reviews.llvm.org/D93863.	2021-01-04 13:48:09 -08:00
Arthur Eubanks	191552344b	[NewPM][AMDGPU] Make amdgpu-aa work with NewPM An AMDGPUAA class already existed that was supposed to work with the new PM, but it wasn't tested and was a bit broken. Fix up the existing classes to have the right keys/parameters. Wire up AMDGPUAA inside AMDGPUTargetMachine. Add it to the list of alias analyses for the "default" AAManager since in adjustPassManager() amdgpu-aa is added into the pipeline at the beginning. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D93914	2021-01-04 12:36:27 -08:00
Arthur Eubanks	4e838ba9ea	[NewPM][AMDGPU] Port amdgpu-always-inline And add to AMDGPU opt pipeline. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D94025	2021-01-04 12:27:01 -08:00
Arthur Eubanks	fd323a897c	[NewPM][AMDGPU] Port amdgpu-printf-runtime-binding And add to AMDGPU opt pipeline. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D94026	2021-01-04 12:25:50 -08:00
Arthur Eubanks	e1833e7493	[NewPM][AMDGPU] Port amdgpu-unify-metadata And add to AMDGPU opt pipeline. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D94023	2021-01-04 11:57:46 -08:00
Arthur Eubanks	a5f863e076	[NewPM][AMDGPU] Port amdgpu-propagate-attributes-early/late And add to AMDGPU opt pipeline. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D94022	2021-01-04 11:53:37 -08:00
Arthur Eubanks	b8f22f9d30	[NewPM][AMDGPU] Run InternalizePass when -amdgpu-internalize-symbols The legacy PM doesn't run EP_ModuleOptimizerEarly on -O0, so skip running it here when given O0. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D93886	2021-01-04 11:34:40 -08:00
Kazu Hirata	0e219b6443	[Target] Construct SmallVector with iterator ranges (NFC)	2021-01-03 09:57:45 -08:00
Kazu Hirata	985f899bf2	[Target] Use llvm::append_range (NFC)	2021-01-03 09:57:43 -08:00
Roman Lebedev	7c8b8063b6	[SimplifyCFG][AMDGPU] AMDGPUUnifyDivergentExitNodes: SimplifyCFG isn't ready to preserve PostDomTree There is a number of transforms in SimplifyCFG that take DomTree out of DomTreeUpdater, and do updates manually. Until they are fixed, user passes are unable to claim that PDT is preserved. Note that the default for SimplifyCFG is still not to preserve DomTree, so this is still effectively NFC.	2021-01-03 01:45:46 +03:00
Roman Lebedev	4b80647367	[AMDGPU][SimplifyCFG] Teach AMDGPUUnifyDivergentExitNodes to preserve {,Post}DomTree This is a (last big?) part of the patch series to make SimplifyCFG preserve DomTree. Currently, it still does not actually preserve it, even though it is pretty much fully updated to preserve it. Once the default is flipped, a valid DomTree must be passed into simplifyCFG, which means that whatever pass calls simplifyCFG, should also be smart about DomTree's. As far as i can see from `check-llvm` with default flipped, this is the last LLVM test batch (other than bugpoint tests) that needed fixes to not break with default flipped. The changes here are boringly identical to the ones i did over 42+ times/commits recently already, so while AMDGPU is outside of my normal ecosystem, i'm going to go for post-commit review here, like in all the other 42+ changes. Note that while the pass is taught to preserve {,Post}DomTree, it still doesn't do that by default, because simplifycfg still doesn't do that by default, and flipping default in this pass will implicitly flip the default for simplifycfg. That will happen, but not right now.	2021-01-02 01:01:20 +03:00
Juneyoung Lee	420d046d6b	clang-format, address warnings	2020-12-30 23:05:07 +09:00
Juneyoung Lee	9b29610228	Use unary CreateShuffleVector if possible As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`. Let's update them. Actually, it would have been more natural if the patches were made in this order: (1) let them use unary CreateShuffleVector first (2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793) The order is swapped, but in terms of correctness it is still fine. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93923	2020-12-30 22:36:08 +09:00
Arthur Eubanks	7ecbe0c7a0	[NewPM][AMDGPU] Port amdgpu-lower-kernel-attributes And add it to the AMDGPU opt pipeline. This is a function pass instead of a module pass (like the legacy pass) because it's getting added to a CGSCCPassManager, and you can't put a module pass in a CGSCCPassManager. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D93885	2020-12-29 10:26:06 -08:00
Arthur Eubanks	c2ef06d3dd	[NewPM] Port infer-address-spaces And add it to the AMDGPU opt pipeline. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D93880	2020-12-28 19:58:12 -08:00
Arthur Eubanks	0e9abcfc19	[AMDGPU][NewPM] Port amdgpu-promote-alloca(-to-vector) And add to AMDGPU opt pipeline. Don't pin an opt run to the legacy PM when -enable-new-pm=1 if these passes (or passes introduced in https://reviews.llvm.org/D93863) are in the list of passes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D93875	2020-12-28 17:52:31 -08:00
Arthur Eubanks	9abc457724	[NewPM][AMDGPU] Port amdgpu-simplifylib/amdgpu-usenative And add them to the pipeline via AMDGPUTargetMachine::registerPassBuilderCallbacks(), which mirrors AMDGPUTargetMachine::adjustPassManager(). These passes can't be unconditionally added to PassRegistry.def since they are only present when the AMDGPU backend is enabled. And there are no target-specific headers in llvm/include, so parsing these pass names must occur somewhere in the AMDGPU directory. I decided the best place was inside the TargetMachine, since the PassBuilder invokes TargetMachine::registerPassBuilderCallbacks() anyway. If we come up with a cleaner solution for target-specific passes in the future that's fine, but there aren't too many target-specific IR passes living in target-specific directories so it shouldn't be too bad to change in the future. Reviewed By: ychen, arsenm Differential Revision: https://reviews.llvm.org/D93863	2020-12-28 10:38:51 -08:00
alex-t	644da789e3	[AMDGPU] Split edge to make si_if dominate end_cf Basic block containing "if" not necessarily dominates block that is the "false" target for the if. That "false" target block may have another predecessor besides the "if" block. IR value corresponding to the Exec mask is generated by the si_if intrinsic and then used by the end_cf intrinsic. In this case IR verifier complains that 'Def does not dominate all uses'. This change splits the edge between the "if" block and "false" target block to make it dominated by the "if" block. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D91435	2020-12-28 17:14:02 +03:00
Dmitry Preobrazhensky	8c25bb3d0d	[AMDGPU][MC] Improved errors handling for v_interp* operands See bug 48596 (https://bugs.llvm.org/show_bug.cgi?id=48596) Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D93757	2020-12-28 16:15:48 +03:00
Dmitry Preobrazhensky	5b17263b6b	[AMDGPU][MC][NFC] Parser refactoring See bug 48515 (https://bugs.llvm.org/show_bug.cgi?id=48515) Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D93756	2020-12-28 14:59:49 +03:00
Kazu Hirata	d6ff5cf995	[Target] Use llvm::any_of (NFC)	2020-12-24 19:43:26 -08:00
Praveen Velliengiri	61177943c9	[AMDGPU] Use MUBUF instructions for global address space access Currently, the compiler crashes in instruction selection of global load/stores in gfx600 due to the lack of FLAT instructions. This patch fixes the crash by selecting MUBUF instructions for global load/stores in gfx600. Authored-by: Praveen Velliengiri <Praveen.Velliengiri@amd.com> Reviewed by: t-tye Differential revision: https://reviews.llvm.org/D92483	2020-12-24 10:13:04 +00:00
Stanislav Mekhanoshin	747f67e034	[AMDGPU] Fix adjustWritemask subreg handling If we happen to extract a non-dword subreg that breaks the logic of the function and it may shrink the dmask because it does not recognize the use of a lane(s). This bug is next to impossible to trigger with the current lowering in the BE, but it breaks in one of my future patches. Differential Revision: https://reviews.llvm.org/D93782	2020-12-23 14:43:31 -08:00
Sebastian Neubauer	221fdedc69	[AMDGPU][GlobalISel] Fold flat vgpr + constant addresses Use getPtrBaseWithConstantOffset in selectFlatOffsetImpl to fold more vgpr+constant addresses. Differential Revision: https://reviews.llvm.org/D93692	2020-12-23 10:40:30 +01:00
Matt Arsenault	581d13f8ae	GlobalISel: Return APInt from getConstantVRegVal Returning int64_t was arbitrarily limiting for wide integer types, and the functions should handle the full generality of the IR. Also changes the full form which returns the originally defined vreg. Add another wrapper for the common case of just immediately converting to int64_t (arguably this would be useful for the full return value case as well). One possible issue with this change is some of the existing uses did break without conversion to getConstantVRegSExtVal, and it's possible some without adequate test coverage are now broken.	2020-12-22 22:23:58 -05:00
Matt Arsenault	8bf9cdeaee	AMDGPU: Use Register	2020-12-22 21:55:59 -05:00
Matt Arsenault	bac54639c7	AMDGPU: Add spilled CSR SGPRs to entry block live ins	2020-12-22 21:55:59 -05:00
Matt Arsenault	29ed846d67	AMDGPU: Fix assert when checking for implicit operand legality	2020-12-22 20:56:24 -05:00
Stanislav Mekhanoshin	d15119a02d	[AMDGPU][GlobalISel] GlobalISel for flat scratch It does not seem to fold offsets but this is not specific to the flat scratch as getPtrBaseWithConstantOffset() does not return the split for these tests unlike its SDag counterpart. Differential Revision: https://reviews.llvm.org/D93670	2020-12-22 16:33:06 -08:00
Stanislav Mekhanoshin	ca4bf58e4e	[AMDGPU] Support unaligned flat scratch in TLI Adjust SITargetLowering::allowsMisalignedMemoryAccessesImpl for unaligned flat scratch support. Mostly needed for global isel. Differential Revision: https://reviews.llvm.org/D93669	2020-12-22 16:12:31 -08:00
Stanislav Mekhanoshin	ae8f4b2178	[AMDGPU] Folding of FI operand with flat scratch Differential Revision: https://reviews.llvm.org/D93501	2020-12-22 10:48:04 -08:00
Dmitry Preobrazhensky	f4f49d9d0d	[AMDGPU][MC][NFC] Fix for sanitizer error in `8ab5770` Corrected to fix sanitizer error introduced by `8ab5770`	2020-12-21 20:42:35 +03:00
Dmitry Preobrazhensky	8ab5770a17	[AMDGPU][MC][NFC] Parser refactoring See bug 48515 (https://bugs.llvm.org/show_bug.cgi?id=48515) Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D93548	2020-12-21 20:21:07 +03:00
Carl Ritson	7722494834	[AMDGPU][NFC] Remove unused Hi16Elt definition	2020-12-18 20:38:54 +09:00
dfukalov	9ed8e0caab	[NFC] Reduce include files dependency and AA header cleanup (part 2). Continuing work started in https://reviews.llvm.org/D92489: Removed a bunch of includes from "AliasAnalysis.h" and "LoopPassManager.h". Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92852	2020-12-17 14:04:48 +03:00
Matt Arsenault	f333736757	AMDGPU: Remove SGPRSpillVGPRDefinedSet hack These VGPRs should be reserved and therefore do not need "correct" liveness. They should not have undef uses, which can still cause issues.	2020-12-16 21:33:35 -05:00
Roman Lebedev	49dac4aca0	[SimplifyCFG] MergeBlockIntoPredecessor() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater it into it. Fixes DomTree preservation for a large number of tests, all of which are marked as such so that they do not regress.	2020-12-17 01:03:49 +03:00
Piotr Sobczak	c7afb698ca	[AMDGPU] Avoid calling copyFastMathFlags in wrong context Calling Instruction::copyFastMathFlags() assumes the caller is FPMathOperator. Avoid calling the function for instructions that are not instances of FPMathOperator.	2020-12-16 10:22:51 +01:00
Sebastian Neubauer	409a2f0f9e	[AMDGPU] Allow no saddr for global addtid insts I think the global_load/store_dword_addtid instructions support switching off the scalar address. Add assembler and disassembler support for this. Differential Revision: https://reviews.llvm.org/D93288	2020-12-16 10:01:40 +01:00
Stanislav Mekhanoshin	eb66bf0802	[AMDGPU] Print SCRATCH_EN field after the kernel Differential Revision: https://reviews.llvm.org/D93353	2020-12-15 22:44:30 -08:00
Matt Arsenault	97f51f0489	AMDGPU: Remove redundant CCAction for i1	2020-12-15 17:00:27 -05:00
Tony	d5ea8f7010	[AMDGPU] Clarify scratch initialization - Clarify documentation on initializing scratch. - Rename compute_pgm_rsrc2 field for enabling scratch from ENABLE_SGPR_PRIVATE_SEGMENT_WAVEFRONT_OFFSET to ENABLE_PRIVATE_SEGMENT to match hardware definition. Differential Revision: https://reviews.llvm.org/D93271	2020-12-15 20:14:20 +00:00
Sebastian Neubauer	91445979be	[AMDGPU] Unify flat offset logic Move getNumFlatOffsetBits from AMDGPUAsmParser and SIInstrInfo into AMDGPUBaseInfo. Differential Revision: https://reviews.llvm.org/D93287	2020-12-15 14:59:59 +01:00
Changpeng Fang	ce0c0013d8	AMDGPU: If a store defines (alias) a load, it clobbers the load. Summary: If a store defines (must alias) a load, it clobbers the load. Fixes: SWDEV-258915 Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D92951	2020-12-14 16:34:32 -08:00
Stanislav Mekhanoshin	cf5845d6c4	[AMDGPU] Use multi-dword flat scratch for spilling Differential Revision: https://reviews.llvm.org/D93067	2020-12-14 14:19:29 -08:00
Michael Liao	1fd1f638b6	[amdgpu] Fix a crash case when `V_CNDMASK` could be simplified. - Once an instruction is simplified, foldable candidates from it should be invalidated or skipped as the operand index is no longer valid. Differential Revision: https://reviews.llvm.org/D93174	2020-12-14 13:08:13 -05:00
Stanislav Mekhanoshin	87d7757bbe	[SLP] Control maximum vectorization factor from TTI D82227 has added a proper check to limit PHI vectorization to the maximum vector register size. That unfortunately resulted in at least a couple of regressions on SystemZ and x86. This change reverts PHI handling from D82227 and replaces it with a more general check in SLPVectorizerPass::tryToVectorizeList(). Moved to tryToVectorizeList() it allows to restart vectorization if initial chunk fails. However, this function is more general and handles not only PHI but everything which SLP handles. If vectorization factor would be limited to maximum vector register size it would limit much more vectorization than before leading to further regressions. Therefore a new TTI callback getMaximumVF() is added with the default 0 to preserve current behavior and limit nothing. Then targets can decide what is better for them. The callback gets ElementSize just like a similar getMinimumVF() function and the main opcode of the chain. The latter is to avoid regressions at least on the AMDGPU. We can have loads and stores up to 128 bit wide, and <2 x 16> bit vector math on some subtargets, where the rest shall not be vectorized. I.e. we need to differentiate based on the element size and operation itself. Differential Revision: https://reviews.llvm.org/D92059	2020-12-14 08:49:40 -08:00
Jay Foad	07e92e6b60	[AMDGPU] Make use of HasSMemRealTime predicate. NFC. We have this subtarget feature so it makes sense to use it here. This is NFC because it's always defined by default on GFX8+. Differential Revision: https://reviews.llvm.org/D93202	2020-12-14 16:34:57 +00:00
Carl Ritson	62c246eda2	[AMDGPU][NFC] Rename opsel/opsel_hi/neg_lo/neg_hi with suffix 0 These parameters set a default value of 0, so I believe they should include a 0 suffix. This allows for versions which do not set a default value in future. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D93187	2020-12-14 20:01:56 +09:00
Carl Ritson	af4570cd3a	[AMDGPU][NFC] Remove unused VOP3Mods0Clamp This is unused and the selection function does not exist. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D93188	2020-12-14 20:00:58 +09:00
Sebastian Neubauer	5733167f54	[AMDGPU] Mark amdgpu_gfx functions as module entry function - Allows lds allocations - Writes resource usage into COMPUTE_PGM_RSRC1 registers in PAL metadata Differential Revision: https://reviews.llvm.org/D92946	2020-12-14 10:43:39 +01:00
Jay Foad	4f25e53982	[AMDGPU] Make use of emitRemovedIntrinsicError. NFC. Change-Id: I482bbf528255f2eacd3878ddfe7edb9a8f63d5c2	2020-12-11 14:02:14 +00:00
Mirko Brkusanin	0c7cce54eb	[AMDGPU] Resolve issues when picking between ds_read/write and ds_read2/write2 Both ds_read_b128 and ds_read2_b64 are valid for 128bit 16-byte aligned loads but the one that will be selected is determined either by the order in tablegen or by the AddedComplexity attribute. Currently ds_read_b128 has priority. While ds_read2_b64 has lower alignment requirements, we cannot always restrict ds_read_b128 to 16-byte alignment because of unaligned-access-mode option. This was causing ds_read_b128 to be selected for 8-byte aligned loads regardless of chosen access mode. To resolve this we use two patterns for selecting ds_read_b128. One requires alignment of 16-byte and the other requires unaligned-access-mode option. Same goes for ds_write2_b64 and ds_write_b128. Differential Revision: https://reviews.llvm.org/D92767	2020-12-10 12:40:49 +01:00
Stanislav Mekhanoshin	4617cc68f6	[AMDGPU] Fix expansion of 192 bit spills in PEI Differential Revision: https://reviews.llvm.org/D92979	2020-12-09 16:36:29 -08:00
Scott Linder	9260a99999	[MC][AMDGPU] Consume EndOfStatement in asm parser Avoids spurious newlines showing up in the output when emitting assembly via MC. Reviewed By: MaskRay, arsenm Differential Revision: https://reviews.llvm.org/D92690	2020-12-09 21:45:55 +00:00
Scott Linder	f5f4b8b60f	[AMDGPU][MC] Restore old error position for "too few operands" Revert part of https://reviews.llvm.org/D92084 to make it simpler to start consuming the EndOfStatement token within AMDGPU's ParseInstruction in a future patch. This also brings us back to what every other target currently does. A future change to move the position back to the end of the statement would likely need to audit all of the AMDGPUOperand SMLoc ranges, and determine the SMLoc for the last character of the last operand. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D92960	2020-12-09 21:09:47 +00:00
Austin Kerbow	4aa842a800	[AMDGPU] Add new pseudos for indirect addressing with VGPR Indexing It is possible for copies or spills to be inserted in the middle of indirect addressing sequences which use VGPR indexing. Spills to accvgprs could be affected by the indexing mode. Add new pseudo instructions that are expanded after register allocation to avoid the problematic spill or copy placement. Differential Revision: https://reviews.llvm.org/D91048	2020-12-08 12:24:12 -08:00
Stanislav Mekhanoshin	dd89249498	[AMDGPU] Annotate vgpr<->agpr spills in asm Differential Revision: https://reviews.llvm.org/D92125	2020-12-07 11:25:25 -08:00
Petar Avramovic	3a042dcd2e	[AMDGPU] Fix default value of glc for mubuf rtn atomics Mubuf rtn atomics use GLC_1 thus default value for glc operand should be -1, see https://reviews.llvm.org/D90730. This allows us to report error when rtn atomic requires glc=1 but does not have glc operand in input. Differential Revision: https://reviews.llvm.org/D92654	2020-12-07 14:00:08 +01:00
Dmitry Preobrazhensky	a0b3a9391c	[AMDGPU][MC] Improved diagnostics message for sym/expr operands See bug 48295 (https://bugs.llvm.org/show_bug.cgi?id=48295) Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D92088	2020-12-05 14:05:53 +03:00
Dmitry Preobrazhensky	e97dd11977	[AMDGPU][MC] Corrected error position for invalid MOVREL src See bug 47518 (https://bugs.llvm.org/show_bug.cgi?id=47518) Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D92084	2020-12-05 13:23:14 +03:00
Kazu Hirata	2dc4a14e4d	[AMDGPU] Use llvm::is_contained (NFC)	2020-12-04 21:42:55 -08:00
Paul C. Anagnostopoulos	415fab6f67	[TableGen] Eliminate the 'code' type Update the documentation. Rework various backends that relied on the code type. Differential Revision: https://reviews.llvm.org/D92269	2020-12-03 10:19:11 -05:00
Mircea Trofin	bab72dd5d5	[NFC][MC] TargetRegisterInfo::getSubReg is a MCRegister. Typing the API appropriately. Differential Revision: https://reviews.llvm.org/D92341	2020-12-02 15:46:38 -08:00
Jay Foad	d28624a209	[AMDGPU] Stop adding an implicit def of vcc_hi for wave32 This doesn't seem to be needed for anything. Differential Revision: https://reviews.llvm.org/D92400	2020-12-02 10:11:42 +00:00
Caroline Concatto	4b0ef2b075	[NFC][CostModel]Extend class IntrinsicCostAttributes to use ElementCount Type This patch replaces the attribute `unsigned VF` in the class IntrinsicCostAttributes by `ElementCount VF`. This is a non-functional change to help upcoming patches to compute the cost model for scalable vector inside this class. Differential Revision: https://reviews.llvm.org/D91532	2020-12-01 11:12:51 +00:00
Jay Foad	839c9635ed	[AMDGPU] Simplify some generation checks. NFC.	2020-12-01 10:15:32 +00:00
Nikita Popov	4df8efce80	[AA] Split up LocationSize::unknown() Currently, we have some confusion in the codebase regarding the meaning of LocationSize::unknown(): Some parts (including most of BasicAA) assume that LocationSize::unknown() only allows accesses after the base pointer. Some parts (various callers of AA) assume that LocationSize::unknown() allows accesses both before and after the base pointer (but within the underlying object). This patch splits up LocationSize::unknown() into LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer() to make this completely unambiguous. I tried my best to determine which one is appropriate for all the existing uses. The test changes in cs-cs.ll in particular illustrate a previously clearly incorrect AA result: We were effectively assuming that argmemonly functions were only allowed to access their arguments after the passed pointer, but not before it. I'm pretty sure that this was not intentional, and it's certainly not specified by LangRef that way. Differential Revision: https://reviews.llvm.org/D91649	2020-11-26 18:39:55 +01:00
Jay Foad	4f87d30a06	[AMDGPU] Introduce and use isGFX10Plus. NFC. It's more future-proof to use isGFX10Plus from the start, on the assumption that future architectures will be based on current architectures. Also make use of the existing isGFX9Plus in a few places. Differential Revision: https://reviews.llvm.org/D92092	2020-11-26 09:02:36 +00:00
Sebastian Neubauer	edd675643d	[AMDGPU] Emit stack frame size in metadata Add .shader_functions to pal metadata, which contains the stack frame size for all non-entry-point functions. Differential Revision: https://reviews.llvm.org/D90036	2020-11-25 16:30:02 +01:00
Jay Foad	4926eed59c	[AMDGPU] Add a TRANS bit to TSFlags. NFC. This is used to mark transcendental instructions that execute on a separate pipeline from the normal VALU pipeline. Differential Revision: https://reviews.llvm.org/D92042	2020-11-24 17:49:56 +00:00
Jay Foad	000400ca0a	Fix speling in comments. NFC.	2020-11-23 14:43:24 +00:00
Dmitry Preobrazhensky	ce44bf2cf2	[AMDGPU][MC] Improved diagnostic messages See bug 47518 (https://bugs.llvm.org/show_bug.cgi?id=47518) Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D91794	2020-11-23 16:15:05 +03:00
Dmitry Preobrazhensky	e4effef330	[AMDGPU][MC] Improved diagnostic messages for invalid literals See bug 47518 (https://bugs.llvm.org/show_bug.cgi?id=47518) Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D91793	2020-11-23 15:48:06 +03:00
Matt Arsenault	79f75468b4	AMDGPU: Fix counting kernel arguments towards register usage Also use DataLayout to get type size. Relying on the IR type size is also pretty broken here, since this won't perfectly capture how types are legalized.	2020-11-20 21:23:33 -05:00
Alex Richardson	51e09e1d5a	[AMDGPU] Set the default globals address space to 1 This will ensure that passes that add new global variables will create them in address space 1 once the passes have been updated to no longer default to the implicit address space zero. This also changes AutoUpgrade.cpp to add -G1 to the DataLayout if it wasn't already present to ensure bitcode backwards compatibility. Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D84345	2020-11-20 15:46:53 +00:00
Sebastian Neubauer	7a18bdb350	[AMDGPU] Implement flat scratch init for pal Extract the scratch offset from the scratch buffer descriptor that is stored in the global table. Differential Revision: https://reviews.llvm.org/D91701	2020-11-20 11:14:30 +01:00
Nikita Popov	393b9e9db3	[MemLoc] Require LocationSize argument (NFC) When constructing a MemoryLocation by hand, require that a LocationSize is explicitly specified. D91649 will split up LocationSize::unknown() into two different states, and callers should make an explicit choice regarding the kind of MemoryLocation they want to have.	2020-11-19 21:45:52 +01:00
Duncan P. N. Exon Smith	5abf76fbe3	ADT: Add assertions to SmallVector::insert, etc., for reference invalidation `2c196bbc6b` asserted that `SmallVector::push_back` doesn't invalidate the parameter when it needs to grow. Do the same for `resize`, `append`, `assign`, `insert`, and `emplace_back`. Differential Revision: https://reviews.llvm.org/D91744	2020-11-18 17:36:28 -08:00
Sebastian Neubauer	72ccec1bbc	[AMDGPU] Fix v3f16 interaction with image store workaround In some cases, the wrong amount of registers was reserved. Also enable more v3f16 tests. Differential Revision: https://reviews.llvm.org/D90847	2020-11-18 18:21:04 +01:00
Jay Foad	7ecf19697e	[AMDGPU] Fix and extend vccz workarounds We have workarounds for two different cases where vccz can get out of sync with the value in vcc. This fixes them in two ways: 1. Fix the case where the def of vcc was in a previous basic block, by pessimistically assuming that vccz might be incorrect at a basic block boundary. 2. Fix the handling of pre-existing waitcnt instructions by calling generateWaitcntInstBefore before examining ScoreBrackets to determine whether there's an outstanding smem read operation. Differential Revision: https://reviews.llvm.org/D91636	2020-11-18 15:26:06 +00:00
Jay Foad	9f69c1bc54	[AMDGPU] Rename pseudo S_WAITCNT_IDLE to S_WAIT_IDLE. NFC.	2020-11-18 14:03:43 +00:00
Florian Hahn	b2f4c5fddc	[AsmWriter] Factor out mnemonic generation to accessible getMnemonic. This patch factors out the part of printInstruction that gets the mnemonic string for a given MCInst. This is intended to be used subsequently for the instruction-mix remarks to display the final mnemonic (D90040). Unfortunately making `getMnemonic` available to the AsmPrinter seems to require making it virtual. Not sure if there's a way around that with the current layering of the AsmPrinters. Reviewed By: Paul-C-Anagnostopoulos Differential Revision: https://reviews.llvm.org/D90039	2020-11-17 09:47:38 +00:00
Michael Liao	f375885ab8	[InferAddrSpace] Teach to handle assumed address space. - In certain cases, a generic pointer could be assumed as a pointer to the global memory space or other spaces. With a dedicated target hook to query that address space from a given value, infer-address-space pass could infer and propagate that to all its users. Differential Revision: https://reviews.llvm.org/D91121	2020-11-16 17:06:33 -05:00
Matt Arsenault	d2e52eec51	AMDGPU: Select global saddr mode from SGPR pointer Use the 64-bit SGPR base with a 0 offset, since it's 1 fewer instruction to materialize the 0 vs. the 64-bit copy.	2020-11-16 11:51:06 -05:00
Matt Arsenault	a6e353b1d0	AMDGPU: Split large offsets when selecting global saddr mode When the offset doesn't fit in the immediate field, move some to voffset.	2020-11-16 11:36:01 -05:00
Jay Foad	a6ecb2eb3d	[AMDGPU] Add comments. NFC.	2020-11-16 16:34:13 +00:00
Dmitry Preobrazhensky	65f3e121fe	[AMDGPU][MC] Corrected error position for some operands and modifiers Partially fixes bug 47518 (https://bugs.llvm.org/show_bug.cgi?id=47518) Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D91412	2020-11-16 16:11:23 +03:00
Dmitry Preobrazhensky	0bee8c784b	[AMDGPU][MC] Corrected error position for swizzle() Partially fixes bug 47518 (https://bugs.llvm.org/show_bug.cgi?id=47518) Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D91408	2020-11-16 14:37:57 +03:00
Dmitry Preobrazhensky	89df8fc0d7	[AMDGPU][MC] Corrected error position for hwreg() and sendmsg() Partially fixes bug 47518 (https://bugs.llvm.org/show_bug.cgi?id=47518) Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D91407	2020-11-16 14:25:07 +03:00
Stanislav Mekhanoshin	c9821cec74	[AMDGPU] Mark sin/cos load folding as modifying the function. When the load value is folded into the sin/cos operation, the AMDGPU library calls simplifier could still mark the function as unmodified. Instead ensure if there is an early return, return whether the load was folded into the sin/cos call. Authored by MJDSys Differential Revision: https://reviews.llvm.org/D91401	2020-11-13 14:49:33 -08:00
Jessica Paquette	b184a2eccf	[GlobalISel] Add matchers for specific constants and a matcher for negations It's fairly common to need matchers for a specific constant value, or for common idioms like finding a negated register. Add - `m_SpecificICst`, which returns true when matching a specific value. - `m_ZeroInt`, which returns true when an integer 0 is matched. - `m_Neg`, which returns true when a register is negated. Also update a few places which use idioms related to the new matchers. Differential Revision: https://reviews.llvm.org/D91397	2020-11-13 09:24:54 -08:00
Matt Arsenault	e722943e05	AMDGPU: Factor out large flat offset splitting	2020-11-13 11:22:13 -05:00
Matt Arsenault	0fd6a04ba4	AMDGPU: Refactor getBaseWithOffsetUsingSplitOR usage	2020-11-13 10:58:17 -05:00
Simon Pilgrim	a4d3691d55	Fix MSVC signed/unsigned comparison warning. NFCI.	2020-11-13 10:20:48 +00:00
Jay Foad	ad3ec08955	[AMDGPU] One more use of the new export target names. NFC.	2020-11-13 09:44:09 +00:00
serge-sans-paille	9218ff50f9	llvmbuildectomy - replace llvm-build by plain cmake No longer rely on an external tool to build the llvm component layout. Instead, leverage the existing `add_llvm_componentlibrary` cmake function and introduce `add_llvm_component_group` to accurately describe component behavior. These function store extra properties in the created targets. These properties are processed once all components are defined to resolve library dependencies and produce the header expected by llvm-config. Differential Revision: https://reviews.llvm.org/D90848	2020-11-13 10:35:24 +01:00
Stanislav Mekhanoshin	5ab1702129	[AMDGPU] Remove scratch rsrc from spill pseudos Differential Revision: https://reviews.llvm.org/D91110	2020-11-12 15:23:37 -08:00
Stanislav Mekhanoshin	cf6565f6d0	[AMDGPU] Enable multi-dword flat scratch load/stores Differential Revision: https://reviews.llvm.org/D91384	2020-11-12 13:38:56 -08:00
Jay Foad	6881a82e8c	[AMDGPU] Fix scheduling of exp pos4 Also fix a similar issue in SIInsertWaitcnts, but I don't think that fix has any effect in practice. Differential Revision: https://reviews.llvm.org/D91290	2020-11-12 19:57:14 +00:00
Jay Foad	d7d6ac5624	[AMDGPU] Define and use names for export targets. NFC. Differential Revision: https://reviews.llvm.org/D91289	2020-11-12 19:57:14 +00:00
Jay Foad	f23c4c6f8a	[AMDGPU] Separate out real exp instructions by subtarget. NFC. Differential Revision: https://reviews.llvm.org/D91247	2020-11-11 17:13:40 +00:00
Jay Foad	2b33ea6935	[AMDGPU] Split exp instructions out into their own tablegen file. NFC. Differential Revision: https://reviews.llvm.org/D91246	2020-11-11 17:13:40 +00:00
Jay Foad	f94fd1c8ca	[AMDGPU] Make use of SIInstrInfo::isEXP. NFC.	2020-11-11 17:01:20 +00:00
Jay Foad	830ed64ccd	Revert "Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access"" This reverts commit `8b08fa0103`. The underlying problems were fixed by D90607.	2020-11-11 14:40:14 +00:00
Stanislav Mekhanoshin	544ef42e40	[AMDGPU] Set default op_sel_hi on accvgpr read/write These are opsel opcodes with op_sel actually being ignored. As such, op_sel_hi needs to be set to default 1 even though these bits are ignored. This is a compatibility change. Differential Revision: https://reviews.llvm.org/D91202	2020-11-10 13:07:29 -08:00
Jay Foad	bb8d1437a6	[AMDGPU] Simplify multiclass EXP_m. NFC.	2020-11-10 17:28:36 +00:00
Jay Foad	0ad4d04002	[AMDGPU] Remove an unused return value. NFC. Differential Revision: https://reviews.llvm.org/D91063	2020-11-10 09:15:14 +00:00
Carl Ritson	fde8351743	[AMDGPU] Fix lowering of S_MOV_{B32,B64}_term If the source of S_MOV_{B32,B64}_term is an immediate then it cannot be lowered to a COPY. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D90451	2020-11-10 12:16:31 +09:00
Mircea Trofin	2ac3a7d0c4	[NFC] Use [MC]Register Differential Revision: https://reviews.llvm.org/D90795	2020-11-09 08:37:14 -08:00
Stanislav Mekhanoshin	d5a465866e	[AMDGPU] Omit buffer resource with flat scratch. Differential Revision: https://reviews.llvm.org/D90979	2020-11-09 08:05:20 -08:00
Paul C. Anagnostopoulos	91d2e5c81a	[TableGen] Add the !filter bang operator. Add a test. Update the Programmer's Reference. Use it in some TableGen files. Differential Revision: https://reviews.llvm.org/D91008	2020-11-09 10:56:55 -05:00


5803 Commits