llvm-project

Commit Graph

Author	SHA1	Message	Date
Stanislav Mekhanoshin	bb1fe36977	[AMDGPU] Make v8i16/v8f16 legal Differential Revision: https://reviews.llvm.org/D117721	2022-01-24 11:51:08 -08:00
Craig Topper	454256ef4f	[AMDGPU] Correct the known bits calculation for MUL_I24. I'm not entirely sure, but based on how ComputeNumSignBits handles ISD::MUL, I believe this code was miscounting the number of sign bits. As an example of an incorrect result let's say that countMinSignBits returned 1 for the left hand side and 24 for the right hand side. LHSValBits would be 23 and RHSValBits would be 0 and the sum would be 23. This would cause the code to set 9 high bits as zero/one. Now suppose the real values for the left side is 0x800000 and the right hand side is 0xffffff. The product is 0x00800000 which has 8 sign bits not 9. The number of valid bits for the left and right operands is now the number of non-sign bits + 1. If the sum of the valid bits of the left and right sides exceeds 32, then the result may overflow and we can't say anything about the sign of the result. If the sum is 32 or less then it won't overflow and we know the result has at least 1 sign bit. For the previous example, the code will now calculate the left side valid bits as 24 and the right side as 1. The sum will be 25 and the sign bits will be 32 - 25 + 1 which is 8, the correct value. Differential Revision: https://reviews.llvm.org/D116469	2022-01-14 08:54:54 -08:00
Jay Foad	ff971873b3	[GlobalISel] Fix legality checks for G_UBFX combines 1. Fix CombinerHelper::matchBitfieldExtractFromAnd to check legality with the correct types for the G_UBFX that it builds. 2. Fix AMDGPUTargetLowering::isConstantUnsignedBitfieldExtractLegal to match the legality rules: result and first operand can be s32 or s64 but the "shift amount" operands are always s32. 3. Add AMDGPU tests where the post-legalizer combiner would create illegal MIR without the above fixes. Differential Revision: https://reviews.llvm.org/D116802	2022-01-08 09:20:44 +00:00
Jay Foad	3f3fe4a5cf	[GlobalISel] Fix typo Extact to Extract in function name. NFC.	2022-01-07 11:13:35 +00:00
Craig Topper	cbcbbd6ac8	[ValueTracking][SelectionDAG] Rename ComputeMinSignedBits->ComputeMaxSignificantBits. NFC This function returns an upper bound on the number of bits needed to represent the signed value. Use "Max" to match similar functions in KnownBits like countMaxActiveBits. Rename APInt::getMinSignedBits->getSignificantBits. Keeping the old name around to keep this patch size down. Will do a bulk rename as follow up. Rename KnownBits::countMaxSignedBits->countMaxSignificantBits. Reviewed By: lebedev.ri, RKSimon, spatel Differential Revision: https://reviews.llvm.org/D116522	2022-01-03 11:33:30 -08:00
Ron Lieberman	09b53296cf	Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range" This reverts commit `9075009d1f`. Failed amdgpu runtime buildbot # 3514	2021-12-22 11:39:28 -05:00
RamNalamothu	9075009d1f	[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls. But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvment of understanding the control flow. This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering. And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`. As an added benefit, this patch simplifies overall return instruction handling. Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D114652	2021-12-22 20:51:12 +05:30
Jay Foad	d7e03df719	[AMDGPU] Implement widening multiplies with v_mad_i64_i32/v_mad_u64_u32 Select SelectionDAG ops smul_lohi/umul_lohi to v_mad_i64_i32/v_mad_u64_u32 respectively, with an addend of 0. v_mul_lo, v_mul_hi and v_mad_i64/u64 are all quarter-rate instructions so it is better to use one instruction than two. Further improvements are possible to make better use of the addend operand, but this is already a strict improvement over what we have now. Differential Revision: https://reviews.llvm.org/D113986	2021-11-24 11:25:02 +00:00
Kazu Hirata	efa896e5f7	[Target] Use SDNode::uses (NFC)	2021-11-12 21:23:04 -08:00
Jay Foad	a70bbb5f7a	[AMDGPU] Simplify 64-bit division/remainder expansion The old expansion open-coded a 64-bit addition in a strange way, by adding the high parts without carry-in from the low part, and then adding the carry back in later on. Fixing this saves a couple of instructions and makes the code much easier to understand. Differential Revision: https://reviews.llvm.org/D113679	2021-11-12 15:48:41 +00:00
Simon Pilgrim	f059b04f7b	[DAG] Add SelectionDAG::ComputeMinSignedBits helper As suggested on D113371, this adds a wrapper to SelectionDAG::ComputeNumSignBits, similar to the llvm::ComputeMinSignedBits wrapper. I've included some usage, its not exhaustive, just the more obvious cases where the intention is obvious. Differential Revision: https://reviews.llvm.org/D113396	2021-11-08 14:12:45 +00:00
Thomas Symalla	76cbe62262	[AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values. This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unneccessary s_mov instructions for arguments the caller would otherwise save and restore in these registers. Reviewed By: sebastian-ne Differential Revision: https://reviews.llvm.org/D111637	2021-11-04 21:50:18 +01:00
Jay Foad	21a1d4cf71	[AMDGPU] Change numBitsSigned for simplicity and document it. NFC. Change numBitsSigned to return the minimum size of a signed integer that can hold the value. This is different by one from the previous result but is more consistent with numBitsUnsigned. Update all callers. All callers are now more consistent between the signed and unsigned cases, and some callers get simpler, especially the ones that deal with quantities like numBitsSigned(LHS) + numBitsSigned(RHS). Differential Revision: https://reviews.llvm.org/D112813	2021-10-29 14:22:06 +01:00
Kazu Hirata	01b4789b62	[AMDGPU] Remove hasDefinedInitializer (NFC) The last use was removed on Sep 16, 2021 in commit `7a62a5b56d`.	2021-10-28 20:33:34 -07:00
Abinav Puthan Purayil	2da6ef3664	[AMDGPU] Add 24-bit mulhi intrinsics in INTRINSIC_WO_CHAIN combine. mul24 intrinsic's operands are simplified by AMDGPUTargetLowering::performIntrinsicWOChainCombine(). This change adds the mul24hi intrinsics in the combine since its operands can be simplified like that of the mul24 intrinsics. Differential Revision: https://reviews.llvm.org/D112702	2021-10-28 16:57:48 +05:30
Roman Lebedev	684cbae89a	[KnownBits] Introduce `countMaxActiveBits()` and use it in a few places	2021-10-11 23:36:06 +03:00
Christudasan Devadasan	7a62a5b56d	[AMDGPU] Legalize initialized LDS variables We don't allow an initializer for LDS variables and there is an early abort during instruction selection. This patch legalizes them by ignoring the init values. During assembly emission, proper error reporting already exists for such instances. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D109901	2021-09-23 22:53:20 -04:00
Jacob Lambert	dc6e8dfdfe	[AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files. Test commit for new contributor.	2021-09-20 14:48:50 -07:00
Craig Topper	9af8f1b18e	[SelectionDAG] Add isZero/isAllOnes methods to ConstantSDNode. Soft deprecrate isNullValue/isAllOnesValue and update in tree callers. This matches the changes to the APInt interface from D109483. Reviewed By: lattner Differential Revision: https://reviews.llvm.org/D109535	2021-09-09 13:28:30 -07:00
Michael Liao	05783e1cfe	[amdgpu] Revise the conversion from i64 to f32. - Replace 'cmp+sel' with 'umin' if possible. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D107507	2021-08-06 17:01:47 -04:00
Jay Foad	2b63933115	[AMDGPU][SDag] Better lowering for 32-bit ctlz/cttz Differential Revision: https://reviews.llvm.org/D107566	2021-08-05 15:57:40 +01:00
Jay Foad	e6c364a624	[AMDGPU][SDag] Better lowering for 64-bit ctlz/cttz Differential Revision: https://reviews.llvm.org/D107546	2021-08-05 15:57:40 +01:00
Jay Foad	e790b2b744	[AMDGPU] Make more use of getHiHalf64 and split64BitValue. NFCI.	2021-08-05 09:36:13 +01:00
Michael Liao	5edc886e90	[amdgpu] Add an enhanced conversion from i64 to f32. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D107187	2021-08-04 15:33:12 -04:00
David Stuttard	b8173c3178	[AMDGPU] Stop mulhi from doing 24 bit mul for uniform values Added support to check if architecture supports s_mulhi which is used as part of the decision whether or not to use valu 24 bit mul (if the mulhi gets transformed to a valu op anyway, then may as well use it). This is an extension of the work in D97063 Differential Revision: https://reviews.llvm.org/D103321 Change-Id: I80b1323de640a52623d69ac005a97d06a5d42a14	2021-07-05 10:33:23 +01:00
Brendon Cahoon	f9f5d41545	[AMDGPU][GlobalISel] Legalize and select G_SBFX and G_UBFX Adds legalizer, register bank select, and instruction select support for G_SBFX and G_UBFX. These opcodes generate scalar or vector ALU bitfield extract instructions for AMDGPU. The instructions allow both constant or register values for the offset and width operands. The 32-bit scalar version is expanded to a sequence that combines the offset and width into a single register. There are no 64-bit vgpr bitfield extract instructions, so the operations are expanded to a sequence of instructions that implement the operation. If the width is a constant, then the 32-bit bitfield extract instructions are used. Moved the AArch64 specific code for creating G_SBFX to CombinerHelper.cpp so that it can be used by other targets. Only bitfield extracts with constant offset and width values are handled currently. Differential Revision: https://reviews.llvm.org/D100149	2021-06-28 09:06:44 -04:00
Carl Ritson	98f48723f2	[AMDGPU] Add 224-bit vector types and link 192-bit types to MVTs Add SReg_224, VReg_224, AReg_224, etc. Link 224-bit types with v7i32/v7f32. Link existing 192-bit types to newly added v3i64/v3f64/v6i32/v6f32. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D104622	2021-06-24 12:41:22 +09:00
Matt Arsenault	a7786badb7	AMDGPU: Move zeroed FP high bits optimization to patterns	2021-06-22 12:47:56 -04:00
Michael Liao	940efa4f69	[amdgpu] Improve the from f32 to i64. - Take the same principle as the conversion from f64 to i64 with extra necessary pre- and post-processing. It helps to reduce that conversion sequence by half compared to legacy one. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D104427	2021-06-19 12:46:48 -04:00
Matt Arsenault	d6467e00df	AMDGPU: Fix infinite loop in DAG combine with fneg + fma We were not reporting isFNegFree for v2f32, although it is effectively free after legalization. The generic combine was pulling fneg out of the fma source operands, and the AMDGPU combine was doing the opposite.	2021-06-18 19:09:03 -04:00
Julien Pagès	37821155c9	[AMDGPU] Fix a crash when selecting a particular case of buffer_load_format_d16 In this particular example, we had a crash when compiling it for several architectures. This patch extends the legalization of extract_subvector to avoid this problem. Differential Revision: https://reviews.llvm.org/D103344	2021-06-03 16:40:18 -04:00
Sebastian Neubauer	690f5b7a01	[AMDGPU] Fix function calls with flat scratch When flat scratch is used, the stack pointer needs to be added when writing arguments to the stack. For buffer instructions, this is done in SelectMUBUFScratchOffen and SelectMUBUFScratchOffset. Move that to call argument lowering, like it is done in GlobalISel. Differential Revision: https://reviews.llvm.org/D103166	2021-05-28 11:22:13 +02:00
Stanislav Mekhanoshin	748db5bfac	[AMDGPU] Fix module LDS selection Accesses to global module LDS variable start from null, but kernel also thinks its variables start address is null. Fixed by not using a null as an address. Differential Revision: https://reviews.llvm.org/D102882	2021-05-20 15:59:01 -07:00
Julien Pagès	a1ed39df96	[AMDGPU] Select V_CVT_*16_F16 more often Improve the code generation of fp_to_sint and fp_to_uint for integer on 16-bits. Differential Revision: https://reviews.llvm.org/D101481 Patch by Julien Pagès!	2021-05-05 08:57:51 +01:00
Nicolai Hähnle	52bc2e7577	[AMDGPU][SelectionDAG] Don't combine uniform multiplies to MUL_[UI]24 Prefer to keep uniform (non-divergent) multiplies on the scalar ALU when possible. This significantly improves some game cases by eliminating v_readfirstlane instructions when the result feeds into a scalar operation, like the address calculation for a scalar load or store. Since isDivergent is only an approximation of whether a value is in SGPRs, it can potentially regress some situations where a uniform value ends up in a VGPR. These should be rare in real code, although the test changes do contain a number of examples. Most of the test changes are just using s_mul instead of v_mul/mad which is generally better for both register pressure and latency (at least on GFX10 where sgpr pressure doesn't affect occupancy and vector ALU instructions have significantly longer latency than scalar ALU). Some R600 tests now use MULLO_INT instead of MUL_UINT24. GlobalISel appears to handle more scenarios in the desirable way, although it can also be thrown off and fails to select the 24-bit multiplies in some cases. Alternative solution considered and rejected was to allow selecting MUL_[UI]24 to S_MUL_I32. I've rejected this because the definition of those SD operations works is don't-care on the most significant 8 bits, and this fact is used in some combines via SimplifyDemandedBits. Based on a patch by Nicolai Hähnle. Differential Revision: https://reviews.llvm.org/D97063	2021-02-23 15:39:19 +00:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
Jay Foad	c8be7e96bb	[AMDGPU] Rename simplifyI24 to simplifyMul24 Also simplify one of its call sites. NFC.	2021-02-17 11:33:49 +00:00
Craig Topper	11ef356d9e	[TargetLowering] Use Align in allowsMisalignedMemoryAccesses. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96097	2021-02-04 19:22:06 -08:00
dfukalov	560d7e0411	[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets ... to reduce headers dependency. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D95036	2021-01-20 22:22:45 +03:00
dfukalov	6a87e9b08b	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
Sebastian Neubauer	5733167f54	[AMDGPU] Mark amdgpu_gfx functions as module entry function - Allows lds allocations - Writes resource usage into COMPUTE_PGM_RSRC1 registers in PAL metadata Differential Revision: https://reviews.llvm.org/D92946	2020-12-14 10:43:39 +01:00
Sebastian Neubauer	a022b1ccd8	[AMDGPU] Add amdgpu_gfx calling convention Add a calling convention called amdgpu_gfx for real function calls within graphics shaders. For the moment, this uses the same calling convention as other calls in amdgpu, with registers excluded for return address, stack pointer and stack buffer descriptor. Differential Revision: https://reviews.llvm.org/D88540	2020-11-09 16:51:44 +01:00
Christudasan Devadasan	d6aa4aa29a	[AMDGPU] Some refactoring after D90404. NFC.	2020-11-01 13:18:53 +05:30
Piotr Sobczak	7ae0033ca8	[AMDGPU] Fix expansion of i16 MULH This commit marks i16 MULH as expand in AMDGPU backend, which is necessary after the refactoring in D80485. Differential Revision: https://reviews.llvm.org/D89965	2020-10-22 17:05:06 +02:00
Jay Foad	56f6bf1a8d	[AMDGPU] Remove MUL_LOHI_U24/MUL_LOHI_I24 These were introduced in r279902 on the grounds that using separate MUL_U24/MUL_I24 and MULHI_U24/MULHI_I24 nodes would introduce multiple uses of the operands, which would prevent SimplifyDemandedBits from simplifying the operands. This has since been fixed by D24672 "AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operations" No functional change intended. At least it has no effect on lit tests. Differential Revision: https://reviews.llvm.org/D89706	2020-10-19 19:15:34 +01:00
Jay Foad	1417abe54c	[AMDGPU] Add new llvm.amdgcn.fma.legacy intrinsic Differential Revision: https://reviews.llvm.org/D89558	2020-10-16 17:10:21 +01:00
David Sherwood	47f2dc7e5f	[SVE][NFC] Replace some TypeSize comparisons in non-AArch64 Targets In most of lib/Target we know that we are not dealing with scalable types so it's perfectly fine to replace TypeSize comparison operators with their fixed width equivalents, making use of getFixedSize() and so on. Differential Revision: https://reviews.llvm.org/D89101	2020-10-15 09:01:21 +01:00
Jay Foad	0e0a0c8d2c	[AMDGPU] Reformat AMDGPUTargetLowering::isSDNodeAlwaysUniform. NFC.	2020-09-28 16:24:16 +01:00
Craig Topper	c193a689b4	[SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore. The versions that take 'unsigned' will be removed in the future. I tried to use getOriginalAlign instead of getAlign in some places. getAlign factors in the minimum alignment implied by the offset in the pointer info. Since we're also passing the pointer info we can use the original alignment. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D87592	2020-09-14 13:54:50 -07:00
Ruiling Song	18b1e67523	[AMDGPU] Fix crash when dag-combining bitcast From the code after the 'break', they are processing 64bit scalar and vector bitcast. So I think the break-condition should be (cond1 \|\| cond2) This means we only execute following code if (64bit and dest-is-vector). Also remove a previous fix which is not needed with this new fix. (introduced in: `1349a04ef5`) Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D85804	2020-08-13 10:23:13 +08:00

1 2 3 4 5 ...

459 Commits