llvm-project

Commit Graph

Author	SHA1	Message	Date
Dmitry Preobrazhensky	a80116efec	[AMDGPU][MC][GFX11] Add a helper function for identification of VOPD instructions Differential Revision: https://reviews.llvm.org/D133608	2022-09-13 12:41:39 +03:00
Dmitry Preobrazhensky	815ba49068	[AMDGPU][MC] Add detection of mandatory literals in parser Differential Revision: https://reviews.llvm.org/D133606	2022-09-13 12:37:30 +03:00
Kazu Hirata	9606608474	[llvm] Use x.empty() instead of llvm::empty(x) (NFC) I'm planning to deprecate and eventually remove llvm::empty. I thought about replacing llvm::empty(x) with std::empty(x), but it turns out that all uses can be converted to x.empty(). That is, no use requires the ability of std::empty to accept C arrays and std::initializer_list. Differential Revision: https://reviews.llvm.org/D133677	2022-09-12 13:34:35 -07:00
Matt Arsenault	7834194837	TableGen: Introduce generated getSubRegisterClass function Currently there isn't a generic way to get a smaller register class that can be produced from a subregister of a larger class. Replaces a manually implemented version for AMDGPU. This will be used to improve subregister support in the allocator.	2022-09-12 09:03:37 -04:00
Johannes Doerfert	c922cac868	Revert "[Attributor] AAPointerInfo should allow "harmless" uses" Revert "[Attributor] Teach AAPointerInfo to look into aggregates" This reverts commit `844f6c5d03` and `4ed0a88cd8` as they broke the buildbots that run openmp/libomptarget/test/offloading/bug49021.cpp.	2022-09-11 21:37:54 -07:00
Johannes Doerfert	4ed0a88cd8	[Attributor] Teach AAPointerInfo to look into aggregates If we have a constant aggregate, e.g., as an initializer, we usually failed to extract the proper value/type from it. This patch provides the size and offset information necessary to extract the right part of the constant.	2022-09-11 20:16:11 -07:00
Jay Foad	8901f7cebc	[AMDGPU] Fix crash legalizing G_EXTRACT_VECTOR_ELT with negative index Fixes https://github.com/llvm/llvm-project/issues/57408 Differential Revision: https://reviews.llvm.org/D132938	2022-09-09 15:53:34 +01:00
Dmitry Preobrazhensky	6b79610fd5	[AMDGPU][MC][GFX11][NFC] Correct VOPD parsing Differential Revision: https://reviews.llvm.org/D133492	2022-09-09 13:03:29 +03:00
Jay Foad	afa0ed33df	[AMDGPU] Fix shrinking of F16 FMA on newer subtargets D125803 introduced shrinking of F16 FMA to FMAAK/FMAMK in SIShrinkInstructions (useful on GFX10+ where VOP3 instructions may have a literal operand) but failed to handle the V_FMA_F16_gfx9_e64 form of the opcode which is used on GFX9+. Differential Revision: https://reviews.llvm.org/D133489	2022-09-08 16:41:04 +01:00
Joe Loser	5e96cea1db	[llvm] Use std::size instead of llvm::array_lengthof LLVM contains a helpful function for getting the size of a C-style array: `llvm::array_lengthof`. This is useful prior to C++17, but not as helpful for C++17 or later: `std::size` already has support for C-style arrays. Change call sites to use `std::size` instead. Differential Revision: https://reviews.llvm.org/D133429	2022-09-08 09:01:53 -06:00
Ivan Kosarev	57c943d581	[AMDGPU] Only raise wave priority if there is a long enough sequence of VALU instructions. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D124671	2022-09-08 15:21:30 +01:00
Justin Bogner	a81c7dbf0d	[AMDGPU] Drop _oneuse checks from med3 patterns We use _oneuse checks to make sure combines won't accidentally increase code size, but this prevents the optimization in cases where we happen to want to clamp multiple values to the same range. It's safe to drop these checks for two reasons: 1. The pattern of max/min operations for med3 is complicated enough that it's unlikely to come up by accident, so this will still only fire when appropriate to do so. 2. Even if every intermediate is used and we don't save a single operation, we still won't end up with more operations since the med3 replaces the final max/min. In pathological cases we could potentially end up with a larger encoding size or possibly slightly increased vgpr pressure, but the risk of that is low, especially considering the upside. Differential Revision: https://reviews.llvm.org/D132621	2022-09-07 16:31:49 -07:00
Stanislav Mekhanoshin	fb28bf3fb4	[AMDGPU] Fix liveness verifier error in hazard recognizer After D133067 we are inserting swaps to use a new physical register. I have noticed verifier errors about undefined physical register uses if we are tracking liveness post RA. We have no access to LIS at this point, so mark new register uses as undef to calm down the verifier. Liveness should not matter at this point anyway. Note the description of the RegState::Undef: "Value of the register doesn't matter." I.e. it does not say it is strictly undefined. In fact that is what we really need: this value does not matter. I also had to modify the test a bit since with tracking enabled it does not pass verification even before the recognizer. Differential Revision: https://reviews.llvm.org/D133459	2022-09-07 16:30:36 -07:00
Stanislav Mekhanoshin	95d497ff2a	[AMDGPU] W/a hazard if 64 bit shift amount is a highest allocated VGPR In this case gfx90a uses v0 instead of the correct register. Swap the value temporarily with a lower register and then swap it back. Unfortunately hazard recognizer works after wait count insertion, so we cannot simply reuse an arbitrary register, hence w/a also includes a full waitcount. This can be avoided if we run it from expandPostRAPseudo, but that is a complete misplacement. Differential Revision: https://reviews.llvm.org/D133067	2022-09-07 14:23:49 -07:00
Jon Chesterfield	23f6c8d635	[amdgpu] Always, instead of mostly, remove unused LDS symbols Currently LDS variables are removed by the lower module pass if they have a use which is caught by the replace with struct control flow. This makes tests brittle to changes to that control flow which induces noise when trying to improve lowering. Some tests already check that variables are removed, while others checked that they are not removed. LDS variables are not (currently) externally accessible, and if that changes the machinery which makes them externally accessible will look like a use. This change therefore breaks no applications. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D133028	2022-09-07 18:28:16 +01:00
Jay Foad	96dfa523c2	[AMDGPU] Refactor SIFoldOperands. NFC. Refactor static functions into class methods so they have access to TII, MRI etc.	2022-09-07 11:05:01 +01:00
Jay Foad	5291c3dd36	[AMDGPU] Simplify mad/mac patterns. NFC. Simplify instruction selection patterns for mad/mac: - Use any_fmad consistently to make it clear that all patterns treat fmad and AMDGPUfmad_ftz identically. - For mad, put the patterns on the instruction definitions. For mac the patterns are still out-of-line because we want to set AddedComplexity and to have special handling of the source modifiers. Differential Revision: https://reviews.llvm.org/D133305	2022-09-07 09:58:28 +01:00
raghavmedicherla	57f01fee1e	[AMDGPU/Metadata] Rename HSAMD::MetadataStreamer classes Renamed all HSAMD::MetadataStreamer classes to improve readability of the code. Differential Revision: https://reviews.llvm.org/D133156	2022-09-06 16:46:37 -04:00
Ivan Kosarev	5db8d6fd2b	[AMDGPU][CodeGen] Support (base | offset) SMEM loads. Prevents generation of unnecessary s_or_b32 instructions. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D132552	2022-09-05 14:22:06 +01:00
Ivan Kosarev	f33645301e	[AMDGPU][CodeGen] Support (soffset + offset) s_buffer_load's. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D130263	2022-09-05 12:53:05 +01:00
Kazu Hirata	7d8c2d17eb	[llvm] Use range-based for loops (NFC) Identified with modernize-loop-convert.	2022-09-03 23:27:25 -07:00
Kazu Hirata	32aa35b504	Drop empty string literals from static_assert (NFC) Identified with modernize-unary-static-assert.	2022-09-03 11:17:47 -07:00
Kazu Hirata	fedc59734a	[llvm] Use range-based for loops (NFC)	2022-09-03 11:17:40 -07:00
Juan Manuel MARTINEZ CAAMAÑO	ee761374f7	[AMDGPU][NFC] Fix typo in comment: replace SiMemOpInfo by SIMemOpInfo	2022-09-02 16:45:10 +02:00
Jon Chesterfield	a28bbd00c6	[amdgpu][nfc] Factor predicate out of findLDSVariablesToLower	2022-08-31 15:44:51 +01:00
Stanislav Mekhanoshin	fd1f8c85f2	[AMDGPU] Limit TID / wavefrontsize uniformness to 1D kernels If a kernel has uneven dimensions, the value of workitem-id-x divided by the wavefrontsize can be non-uniform. For example, dimensions (65, 2) will have workitems with address (64, 0) and (0, 1) packed into the same wave, which gives 1 and 0 after the division by 64 respectively. Unfortunately, this limits the optimization to OpenCL only and only if the reqd_work_group_size attribute is set. This patch limits it to 1D kernels, although it should be possible to perform this optimization if the size of the X dimension is a power of 2; we just do not currently have the infrastructure to query it. Note that the presence of the amdgpu-no-workitem-id-y attribute does not help, as it only hints at the lack of the workitem-id-y query, not the absence of the actual 2nd dimension, and therefore affects just the SGPR allocation. Differential Revision: https://reviews.llvm.org/D132879	2022-08-30 12:22:08 -07:00
Joe Nash	3e39ab25e6	[AMDGPU][GFX11] Fix dst register class for V_CVT_U32_U16 This instruction was referring to the wrong VOPProfile, likely due to a typo, leading to an incorrect destination register type. The MC layer will care about this change, but it is NFC while 16-bit values actually use 32-bit registers. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D132878	2022-08-30 14:01:25 -04:00
Joe Nash	70e7a1257c	[AMDGPU][NFC] Allow separate RC for VOP3 DPP Dst Create a field in VOPProfile called DstRCVOP3DPP to allow the VOP3 versions of DPP instructions to have a different destination register class than the non-VOP3 encoding. NFC for current instructions, but planned to be functional in upcoming ones. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D132673	2022-08-29 11:22:07 -04:00
Kazu Hirata	6ed2cb4ad5	Revert "[llvm] Use llvm::is_contained (NFC)" This reverts commit `ebf574f59a`. This patch seems to cause build failures on Windows.	2022-08-28 18:52:49 -07:00
Kazu Hirata	ebf574f59a	[llvm] Use llvm::is_contained (NFC)	2022-08-28 17:35:03 -07:00
Kazu Hirata	9861a68a7c	[Target] Qualify auto in range-based for loops (NFC)	2022-08-28 10:41:50 -07:00
Kazu Hirata	ce9f007c7c	[llvm] Use llvm::find_if (NFC)	2022-08-28 10:41:48 -07:00
Kazu Hirata	21de2888a4	Use llvm::is_contained (NFC)	2022-08-27 09:53:11 -07:00
Stanislav Mekhanoshin	813ae2871d	[AMDGPU] Detect uniformness of TID / wavefrontsize A value of 'workitemid / wavefrontsize' or 'workitemid & (wavefrontsize - 1)' is wave uniform. Differential Revision: https://reviews.llvm.org/D132511	2022-08-26 23:26:08 -07:00
Simon Pilgrim	f9de13232f	[X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis. For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling. Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU. Differential Revision: https://reviews.llvm.org/D132520	2022-08-24 17:28:18 +01:00
Simon Pilgrim	3cf48963ff	[AMDGPU] Remove old isCheapToSpeculateCttz FIXME As confirmed on D132520 - this should always return true	2022-08-24 15:53:38 +01:00
Pierre van Houtryve	59cf9dd923	[AMDGPU][GISel] Enable Selection of ADD3 for G_PTR_ADD Allows things like `(G_PTR_ADD (G_PTR_ADD a, b), c)` to be simplified into a single ADD3 instruction instead of two adds. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D131254	2022-08-24 14:44:19 +00:00
Alex Richardson	38107171ed	[RegisterInfoEmitter] Generate isConstantPhysReg(). NFCI This commit moves the information on whether a register is constant into the Tablegen files to allow generating the implementation of isConstantPhysReg(). I've marked isConstantPhysReg() as final in this generated file to ensure that changes are made to tablegen instead of overriding this function, but if that turns out to be too restrictive, we can remove the qualifier. This should be pretty much NFC, but I did notice that e.g. the AMDGPU generated file also includes the LO16/HI16 registers now. The new isConstant flag will also be used by D131958 to ensure that constant registers are marked as call-preserved. Differential Revision: https://reviews.llvm.org/D131962	2022-08-24 14:16:20 +00:00
Jay Foad	1bca81c12e	[AMDGPU] Remove unused S_ADD_U64_CO_PSEUDO and S_SUB_U64_CO_PSEUDO	2022-08-24 10:28:35 +01:00
Raghav	79d2529c10	AMDGPU/MetaData: Restrict address space key to only be emitted for "global_buffer" and "dynamic_shared_pointer" This matches .address_space docs at https://llvm.org/docs/AMDGPUUsage.html#amdgpu-amdhsa-code-object-kernel-argument-metadata-map-table-v3 Differential Revision: https://reviews.llvm.org/D132145	2022-08-23 14:01:01 -04:00
Thomas Symalla	5ee0fb7ed2	[NFC][AMDGPU] Some cleanups in the SIOptimizeExecMasking pass. Fix typos and remove an unused argument. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D132292	2022-08-23 18:16:47 +02:00
Philip Reames	104fa367ee	[TTI] Use OperandValueInfo in getArithmeticInstrCost implementation [NFC] This change completes the process of replacing OperandValueKind and OperandValueProperties, which were previously passed independently in this API, with a single container class which contains both. This is the change which motivated the whole sequence which preceded it. In an original spike version of this change, I'd noticed a nasty bug: I'd changed the signature without changing names, and as a result, we silently passed additional information through a callsite which previously dropped the power-of-two fact. This might be harmless in most cases, but at least a couple clearly depended for correctness on not passing that property through. I did my best to split off prior changes which reduced the scope of this one, and which made it possible to use compiler assistance. For instance, every parameter which changes type in this change also changes name. This was intentional to make sure that every call site possibly affected must show up in the diff. This let me audit each one closely.	2022-08-22 15:16:39 -07:00
Simon Pilgrim	5263155d5b	[CostModel] Add CostKind argument to getShuffleCost Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or was just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future. Differential Revision: https://reviews.llvm.org/D132287	2022-08-21 10:54:51 +01:00
Kazu Hirata	8b1b0d1d81	Revert "Use std::is_same_v instead of std::is_same (NFC)" This reverts commit `c5da37e42d`. This patch seems to break builds with some versions of MSVC.	2022-08-20 23:00:39 -07:00
Kazu Hirata	c5da37e42d	Use std::is_same_v instead of std::is_same (NFC)	2022-08-20 22:36:26 -07:00
Thomas	e565e2fa5c	[NFC][AMDGPU] Fix typo.	2022-08-20 08:30:42 +02:00
Austin Kerbow	b0f4678b90	[AMDGPU] Add iglp_opt builtin and MFMA GEMM Opt strategy Adds a builtin that serves as an optimization hint to apply specific optimized DAG mutations during scheduling. This also disables any other mutations or clustering that may interfere with the desired pipeline. The first optimization strategy that is added here is designed to improve the performance of small gemm kernels on gfx90a. Reviewed By: jrbyrnes Differential Revision: https://reviews.llvm.org/D132079	2022-08-19 15:38:36 -07:00
jeff	20cf170e68	[InferAddressSpaces] [AMDGPU] Add inference for flat_atomic intrinsics Certain address space dependent optimizations, like SeparateConstOffsetFromGEP, assume agreement between the address space of the recursive uses and the address space of the def. If this assumption is invalid, then optimizations may or may not be correct depending on properties of an address space for a given target, the address spaces of recursive uses, and the optimization being done. This patch infers the previous address space for flat_atomic ptr arguments. As a result, the address spaces of the uses in flat_atomic cases will agree with the address space in recursive defs. If this results in a non-flat address space, then isel may infer a different intrinsic. For example, if the result is a flat_atomic using global address space, then it will be lowered to the corresponding global_atomic intrinsic. Change-Id: Ifcd981709dc2ea94d4acbcb84efe7176593ec8c7	2022-08-19 11:37:20 -07:00
Joe Nash	063ee26ea3	[AMDGPU] Update comment on shrinking dpp. NFC	2022-08-18 11:29:32 -04:00
Jeffrey Byrnes	1c8d7ea973	[AMDGPU] Implement pipeline solver for non-trivial pipelines Requested SchedGroup pipelines may be non-trivial to satisfy. A minimal example is if the requested pipeline is {2 VMEM, 2 VALU, 2 VMEM} and the original order of SUnits is {VMEM, VALU, VMEM, VALU, VMEM}. Because of existing dependencies, the choice of which SchedGroup the middle VMEM goes into impacts how closely we are able to match the requested pipeline. It seems minimizing the degree of misfit (as measured by the number of edges we can't add) w.r.t. the choice we make when mapping an instruction -> SchedGroup is an NP problem. This patch implements the PipelineSolver class which produces a solution for the defined problem for the sched_group_barrier mutation. The solver has both an exponential time exact algorithm and a greedy algorithm. The patch includes some controls which allow the user to select the greedy/exact algorithm. Differential Revision: https://reviews.llvm.org/D130797	2022-08-17 16:21:59 -07:00

7214 Commits
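
The (65, 2) example in commit fd1f8c85f2 can be checked with a small model. This is only an illustrative sketch, not code from LLVM: it assumes work-items are linearized with x varying fastest and then packed into waves of 64 in that order, and the helper names `wave_quotients` and `is_wave_uniform` are hypothetical.

```python
from itertools import product

WAVE_SIZE = 64  # wavefront size used in the commit's example


def wave_quotients(dim_x, dim_y=1, wave_size=WAVE_SIZE):
    """Map each wave index to the set of (workitem-id-x // wave_size) values in it.

    Assumption: work-items are linearized with x varying fastest and are
    packed into waves consecutively in that order.
    """
    waves = {}
    for y, x in product(range(dim_y), range(dim_x)):
        flat_id = y * dim_x + x
        waves.setdefault(flat_id // wave_size, set()).add(x // wave_size)
    return waves


def is_wave_uniform(dim_x, dim_y=1, wave_size=WAVE_SIZE):
    """True if every wave sees a single value of workitem-id-x // wave_size."""
    quotients = wave_quotients(dim_x, dim_y, wave_size)
    return all(len(values) == 1 for values in quotients.values())


print(is_wave_uniform(130))     # 1D kernel -> True
print(is_wave_uniform(65, 2))   # the (65, 2) case from the commit -> False
print(is_wave_uniform(128, 2))  # X dimension is a power of 2 -> True
```

Under this model, wave 1 of the (65, 2) kernel holds work-item (64, 0) together with (0, 1), so the quotient takes both values 1 and 0 within one wave, which matches the commit message.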