llvm-project

Commit Graph

Author	SHA1	Message	Date
Jay Foad	f0ca946bf9	[AMDGPU] New helper function SIInsertWaitcnts::getVmemWaitEventType This just commons up and simplifies some logic that was repeated in SIInsertWaitcnts::updateEventWaitcntAfter. NFCI. Differential Revision: https://reviews.llvm.org/D136253	2022-10-19 16:22:50 +01:00
Joe Nash	ad6698562c	[AMDGPU] V_LDEXP_F16 encoding fix and doc update. The amdgcn.ldexp.* intrinsics take an i32 value as src1. The V_LDEXP_F16 instruction considers src1 an f16 operand, and therefore src1 is implicitly truncated to 16 bits when lowering to that instruction from the intrinsic. This is unlikely to result in an error in practice because values that large are not useful. The operand class of src1 in the True16 version of the instruction has been corrected to encode correctly on GFX11. Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D136195	2022-10-19 09:52:53 -04:00
Jay Foad	ea09a426a9	[AMDGPU] Assume getDefIgnoringCopies will succeed. NFC. getDefIgnoringCopies and getSrcRegIgnoringCopies should not fail on valid MIR, so don't bother to check for failure. Differential Revision: https://reviews.llvm.org/D136238	2022-10-19 11:10:00 +01:00
Juan Manuel MARTINEZ CAAMAÑO	bb24b2c610	[AMDGPU][Backend] Fix user-after-free in AMDGPUReleaseVGPRs::isLastVGPRUseVMEMStore Reviewed By: jpages, arsenm Differential Revision: https://reviews.llvm.org/D134641	2022-10-19 04:38:16 -05:00
Kazu Hirata	7820a30a1b	[AMDGPU] Use llvm::any_of (NFC)	2022-10-16 09:19:09 -07:00
Dmitry Preobrazhensky	bf96703fb3	[AMDGPU][MC][GFX8+] Correct v_cndmask modifiers Correct v_cndmask_b32 to support abs/neg modifiers in dpp/sdwa/e64 variants. Correct v_cndmask_b16 for proper disassembly of abs/neg modifiers in e64_dpp variants. Differential Revision: https://reviews.llvm.org/D135900	2022-10-14 19:37:27 +03:00
Leon Clark	6370bc2435	Add f16 nearbyint support. Enable lowering of FNEARBYINT for f16 and extend existing tests. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D135124	2022-10-14 08:05:24 +01:00
Matt Arsenault	838fd611b7	AMDGPU: Fix assertion on <1 x i16> vectors Fixes issue 58331.	2022-10-12 17:25:24 -07:00
Matt Arsenault	575eed3dac	AMDGPU: Fix hazard with v_accvgpr_write_b32 and inline asm VGPR defs If inline asm has a VGPR def, it must have come from a VGPR write somewhere inside the asm. This should be further extended to all read after write hazards.	2022-10-12 17:25:24 -07:00
Abinav Puthan Purayil	3d9f011a9c	[AMDGPU] Make the uses_dynamic_stack field in the kernel descriptor and the metadata map specific to code object v5 and later Unfortunately, we have a broken handling of this in the runtime of rocm 5.3. The runtime is expected to handle this correctly when v5 becomes the default. Differential Revision: https://reviews.llvm.org/D134714	2022-10-11 23:28:43 +05:30
Joe Nash	3648fc5b42	[AMDGPU] Make disassembler convertFMAanyK call more generic Make support more generic to support future instructions. Currently NFC. Reviewed By: foad, arsenm Differential Revision: https://reviews.llvm.org/D135678	2022-10-11 11:22:25 -04:00
Dmitry Preobrazhensky	4e62d02db9	[AMDGPU][MC] Correct image_gather4h Correct encoding of image_gather4h for GFX9; disable this instruction for SI, CI and VI. Differential Revision: https://reviews.llvm.org/D135605	2022-10-11 14:41:27 +03:00
Joe Nash	8a7d4993b7	[AMDGPU] Fix True16 patterns for cmp on GFX11 These patterns should have a True16 version and a non-true16 version. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D135609	2022-10-10 16:41:06 -04:00
Joe Nash	ebb258d3b0	[AMDGPU] Make V_SAT_PK_U8_I16 a True16 Instruction The return type is two u8 packed into a 16 bit VGPR, so this instruction should be True16. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D135478	2022-10-10 10:33:49 -04:00
Jessica Paquette	42cb2f8b12	[GlobalISel] Mark mi_match as nodiscard Typically when you match something, you want to check the result. Fix a couple warnings in the AMDGPUPostLegalizerCombiner which appear as a result of this. Differential Revision: https://reviews.llvm.org/D135491	2022-10-07 15:47:05 -07:00
Matt Arsenault	74ef03d38a	AMDGPU: Update SlotIndexes independently of LiveIntervals Apparently StackColoring depends on SlotIndexes, but not LiveIntervals. If regalloc fast were manually requested, LiveIntervals would be dropped before SILowerSGPRSpills but not SlotIndexes. SILowerSGPRSpills preserved SlotIndexes, but only through LiveIntervals. As a result, SILowerSGPRSpills was incorrectly reporting it preserved SlotIndexes. Start updating these directly, instead of depending on LiveIntervals also being available.	2022-10-07 13:15:15 -07:00
Kazu Hirata	7f90597be6	[AMDGPU] Fix a warning This patch fixes: llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp:800:17: error: unused variable 'DST_IDX' [-Werror,-Wunused-variable]	2022-10-07 08:27:02 -07:00
Dmitry Preobrazhensky	8f8e4e3b38	[AMDGPU][MC][GFX11] Correct v_fmac_.*_e64_dpp Differential Revision: https://reviews.llvm.org/D134961	2022-10-07 16:21:55 +03:00
Dmitry Preobrazhensky	1d1c7555e2	[AMDGPU][GFX11][NFC] Refactor VOPD handling in codegen Differential Revision: https://reviews.llvm.org/D135084	2022-10-07 16:13:05 +03:00
Dmitry Preobrazhensky	fd7b0eeaf6	[AMDGPU][MC][GFX11] Add VOPD VGPR bank access validation Differential Revision: https://reviews.llvm.org/D134960	2022-10-07 15:52:59 +03:00
Nikita Popov	3d0b5f019e	[AA] Remove unused template argument from AAResultBase (NFC) After D94363, there is no more need to use CRTP here.	2022-10-06 10:21:17 +02:00
Pierre van Houtryve	bb71079e30	[AMDGPU][GISel] Add missing V2S16 BUILD_VECTOR_TRUNC legalization Previously we would be unable to legalize V2S16 BUILD_VECTOR_TRUNC on GFX8 & below as the custom legalization was missing. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D135149	2022-10-06 06:48:53 +00:00
Joe Nash	203d0b0ee1	[AMDGPU] Fix V_CMP_CLASS_F16_t16_e64 src1 type. For V_CMP_CLASS_F16_t16_e64 and V_CMPX_CLASS_F16_t16_e64, https://reviews.llvm.org/D133723 changed the value type of src1 from i32 to i16. These src1 operands are 16 bits, therefore need to be encoded as true16 operands. So the _e32 type was correctly set to VGPR_32_Lo128. In _e64 form the operand class went from VSrc_b32 to VSrc_b16. For some reason, we cannot encode inline literals for VSrc_b16, see `5f5f566b26`. In this phase of the true16 implementation, VSrc_b16 and VSrc_b32 are still similar, except from that quirk of inlines. So set the operand class to regain that function. Reviewed By: dp, arsenm Differential Revision: https://reviews.llvm.org/D134897	2022-10-05 11:15:40 -04:00
Juan Manuel MARTINEZ CAAMAÑO	fa2b1cb8c9	[NFC][AMDGPULowerKernelAttributes] Factorize repeated code into function Differential Revision: https://reviews.llvm.org/D135266	2022-10-05 09:26:39 -05:00
Dmitry Preobrazhensky	f4b1cfa1cb	[AMDGPU][MC][GFX11] Correct e64_dpp variants of v_movreld and v_movrelsd Differential Revision: https://reviews.llvm.org/D135079	2022-10-05 16:47:18 +03:00
Johannes Doerfert	477e8e10f0	[Attributor] Teach AAPointerInfo to look into aggregates If we have a constant aggregate, e.g., as an initializer, we usually failed to extract the proper value/type from it. This patch provides the size and offset information necessary to extract the right part of the constant.	2022-10-05 06:19:47 -07:00
Pierre van Houtryve	75b292cb14	[AMDGPU][DAG] Fix insert_vector_elt lowering for 8 bit elements The bitmask used to extract the bits assumed 16 bit elements and wasn't taking the size of the elements into account. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D135156	2022-10-04 14:48:15 +00:00
Pierre van Houtryve	c93104073c	[AMDGPU] Always lower SHUFFLE_VECTOR Make it illegal, remove InstructionSelector logic for it Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134967	2022-10-04 14:23:17 +00:00
jeff	f4e6149d82	[AMDGPU] Use V_PERM to match buildvectors when inputs are not canonicalized (i.e. can't use V_PACK) If we can not prove that f16 operands of a buildvector are canonicalized, then we can not lower into a V_PACK. In this scenario, we would previously lower into some combination of and(sdwa), shr, or. This patch allows for matching into V_PERM instead. Change-Id: Ifa4a74fdb81ef44f22ba490c7fdf81ec8aebc945	2022-10-03 12:58:29 -07:00
Amara Emerson	3daf7ddaef	[GlobalISel] Allow prelegalizer combiners to have access to LegalizerInfo. Before, the isPreLegalize() query in CombinerHelper only checked for the presence of a LegalizerInfo object. This is problematic when we want to have a combine actually check for legality in a pre-legalizer combine pass, since if we pass a LegalizerInfo object to the constructor it causes the combines to think that we're running post legalizer, which isn't true. This change fixes it to instead check an explicit bool that passes to signal whether the pass will be run before or after legalization. Doing so exposed a bug in the extending loads combine, which tried to check for legality of candidate extending loads if LegalizerInfo was present. Since we only ran it pre-legalizer and therefore with a null LegalizerInfo, it never actually ran. Also fixes the legality checks to keep the tests passing. Differential Revision: https://reviews.llvm.org/D135044	2022-10-03 07:36:18 +01:00
Carl Ritson	a35013bec6	[AMDGPU][GFX11] Mitigate VALU mask write hazard VALU use of an SGPR (pair) as mask followed by SALU write to the same SGPR can cause incorrect execution of subsequent SALU reads of the SGPR. Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D134151	2022-10-01 16:21:24 +09:00
Pierre van Houtryve	d8258508d4	[AMDGPU][GISel] Update `isCanonicalized` Recognize more opcodes in the function. Fixes some regressions introduced in D134857 for fdiv.f16 too. Depends on D134857 Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D134862	2022-09-30 14:13:35 +00:00
Pierre van Houtryve	9a67a6b72a	[AMDGPU][GISel] Legalize V2S16 G_BUILD_VECTOR Preparation patch for D134354 to make V2S16 G_BUILD_VECTOR legal. Also removes RegBankInfo's scalarization of small BUILD_VECTORs, replacing it with InstructionSelector logic instead. This allows for V2S16 BUILD_VECTOR instructions to survive all the way to ISel so we can select FMA/MAD_MIX instructions in D134354. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134433	2022-09-30 14:04:53 +00:00
Ivan Kosarev	a964099ce5	[AMDGPU][SetWavePriority] Fix dealing with MBBInfo records. Happened earlier than I anticipated. :) Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134726	2022-09-30 14:27:50 +01:00
Dmitry Preobrazhensky	485c539391	[AMDGPU][MC][GFX11] Disable non-null src0 for s_waitcnt_*cnt Differential Revision: https://reviews.llvm.org/D134809	2022-09-29 19:56:03 +03:00
Abinav Puthan Purayil	3759398b4b	[AMDGPU] Report minimum scratch size in code object v5 and later by default This change sets -amdgpu-assume-{external-call-stack-size | dynamic-stack-object-size} options to zero by default for code object v5 and later. The runtime is expected to adjust the scratch size if the amdhsa_uses_dynamic_stack bit in the kernel descriptor is set. Differential Revision: https://reviews.llvm.org/D128346	2022-09-29 09:52:45 +05:30
Stanislav Mekhanoshin	5a3fe9a039	[AMDGPU] Move SIModeRegisterDefaults to SI MFI It does not belong to a general AMDGPU MFI. Differential Revision: https://reviews.llvm.org/D134666	2022-09-28 13:13:40 -07:00
Nico Weber	90f7f24b20	try to fix build yet more after `16544cbe64`	2022-09-28 15:40:52 -04:00
Baptiste	b556726ccc	[AMDGPU] Avoid flushing the vmcnt counter in loop preheaders if not necessary One of the conditions to flush the vmcnt counter in loop preheaders is: The loop contains a use of a vgpr that is defined out of the loop. The code currently checks if a waitcnt is needed by looking at the score of that vgpr in the score brackets. This is not enough and may cause the generation of an unnecessary vmcnt flush. This patch fixes that case. Differential Revision: https://reviews.llvm.org/D130313	2022-09-28 13:05:50 -04:00
Jon Chesterfield	35f2584ef9	[amdgpu] Error, instead of miscompile, anonymous kernels using lds The association between kernel and struct is done by symbol name. This doesn't work robustly for anonymous kernels as shown by the modified test case. An alternative association between function and struct can be constructed if necessary, probably though metadata, but on the basis that we currently miscompile anonymous kernels and that they are difficult to construct from application code and difficult to call from the runtime, this patch makes it a fatal error for now. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134741	2022-09-28 16:30:04 +01:00
Matt Arsenault	7a84624079	AMDGPU: Make various vector undefs legal Surprisingly these were getting legalized to something zero initialized. This fixes an infinite loop when combining some vector types. Also fixes zero initializing some undef values. SimplifyDemandedVectorElts / SimplifyDemandedBits are not checking for the legality of the output undefs they are replacing unused operations with. This resulted in turning vectors into undefs that were later re-legalized back into zero vectors.	2022-09-28 10:48:52 -04:00
Jon Chesterfield	80ba432821	[amdgpu][nfc] Allocate kernel-specific LDS struct deterministically A kernel may have an associated struct for laying out LDS variables. This patch puts that instance, if present, at a deterministic address by allocating it at the same time as the module scope instance. This is relatively likely to be where the instance was allocated anyway (~NFC) but will allow later patches to calculate where a given field can be found, which means a function which is only reachable from a single kernel will be able to access a LDS variable with zero overhead. That will be particularly helpful for applications that instantiate a function template containing LDS variables once per kernel. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D127052	2022-09-28 14:55:16 +01:00
Carl Ritson	266b5dbc5d	[AMDGPU] Add MIMG NSA threshold configuration attribute Make MIMG NSA minimum addresses threshold an attribute that can be set on a function or configured via command line. This enables frontend tuning which allows increased NSA usage where beneficial. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D134780	2022-09-28 20:03:18 +09:00
Vitaly Buka	20a80d60a8	Revert "[AMDGPU] Move SIModeRegisterDefaults to SI MFI" Break msan bots. Details in D134666. This reverts commit `0ce96e06ee`.	2022-09-26 22:22:09 -07:00
Stanislav Mekhanoshin	0ce96e06ee	[AMDGPU] Move SIModeRegisterDefaults to SI MFI It does not belong to a general AMDGPU MFI. Differential Revision: https://reviews.llvm.org/D134666	2022-09-26 13:20:24 -07:00
Changpeng Fang	dee4bc4a4e	AMDGPU: Handle new address pattern in LowerKernelAttributes introduced by opaque pointers Summary: With opaque pointer support, the "ptr" type is introduced and thus BitCast is not necessary in some cases. This work takes care of this change, and recognizes the new address patterns to do appropriate optimizations. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D134596	2022-09-26 09:31:52 -07:00
Ruiling Song	bf25a48985	Add override for runOnFunction() Fix build-bot failure.	2022-09-26 10:19:35 +08:00
Ruiling Song	cf14c7caac	AMDGPU: Add a pass to rewrite certain undef in PHI For the pattern of IR (%if terminates with a divergent branch.), divergence analysis will report %phi as uniform to help optimal code generation. ``` %if | \ | %then | / %endif: %phi = phi [ %uniform, %if ], [ %undef, %then ] ``` In the backend, %phi and %uniform will be assigned a scalar register. But the %undef from %then will make the scalar register dead in %then. This will likely cause the register being over-written in %then. To fix the issue, we will rewrite %undef as %uniform. For details, please refer the comment in AMDGPURewriteUndefForPHI.cpp. Currently there is no test changes shown, but this is mandatory for later changes. Reviewed by: sameerds Differential Revision: https://reviews.llvm.org/D133840	2022-09-26 09:54:47 +08:00
James Y Knight	a8c59bcc01	[AMDGPU] Fix useDeprecatedPositionallyEncodedOperands errors in R600. This is a follow-on to https://reviews.llvm.org/D134073. It renames a couple of fields to match their operands, as well as introducing sub-operand names where required. This change _only_ fixes the 'R600' half of the target, not the 'AMDGPU' half. Fixing the AMDGPU half will be a significantly more difficult change (which I've not yet attempted.) Differential Revision: https://reviews.llvm.org/D134078	2022-09-25 17:55:09 -04:00
Petar Avramovic	dcc756d03e	[AMDGPU] Pattern for flat atomic fadd f64 intrinsic with local addr Fix regression from clang opencl test in builtins-fp-atomics-gfx90a.cl test_flat_add_local_f64 caused by D130579 Revert `a3becb333d`. Differential Revision: https://reviews.llvm.org/D134568	2022-09-25 13:25:41 +02:00

7293 Commits