llvm-project

Commit Graph

Author	SHA1	Message	Date
Stanislav Mekhanoshin	8d7d89b081	[AMDGPU] Add alias.scope metadata to lowered LDS struct Alias analysis is unable to disambiguate accesses to the structure fields without it, unlike with distinct variables. As a result we cannot combine ds_read and ds_write operations if there is any store in between, which is always considered clobbering. Differential Revision: https://reviews.llvm.org/D108315	2021-08-19 11:40:30 -07:00
David Green	d10f23a25d	[ISel] Expand saddsat and ssubsat via asr and xor This changes the lowering of saddsat and ssubsat so that instead of using: r,o = saddo x, y; c = setcc r < 0; s = c ? INTMAX : INTMIN; ret o ? s : r into using asr and xor to materialize the INTMAX/INTMIN constants: r,o = saddo x, y; s = ashr r, BW-1; x = xor s, INTMIN; ret o ? x : r https://alive2.llvm.org/ce/z/TYufgD This seems to reduce the instruction count in most testcases across most architectures. X86 has some custom lowering added to compensate for cases where it can increase instruction count. Differential Revision: https://reviews.llvm.org/D105853	2021-08-19 16:08:07 +01:00
Joe Nash	9dbc968ed9	[AMDGPU] Fix atomic float max/min intrinsics Hooked up raw.buffer.atomic.fmin/max.f64 This instruction should be available on GFX6, GFX7, and GFX10. It was implemented for GFX90a with a different name. Added intrinsic def for image_atomic_fmin/fmax; the instruction defs were already there. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D108208 Change-Id: I473f98d28b2afbeeb2c27822d9686b5e86634e2f	2021-08-18 14:12:42 -04:00
Petr Hosek	2d4470ab89	Revert "Allow rematerialization of virtual reg uses" This reverts commit `877572cc19` which introduced PR51516.	2021-08-18 00:12:41 -07:00
Simon Pilgrim	fb81271e8b	[AMDGPU] Fix lowering of AMDGPU::G_CTTZ_ZERO_UNDEF to AMDGPU::G_AMDGPU_FFBL_B32 As mentioned on D107474, there was a copy+paste typo repeating G_CTLZ_ZERO_UNDEF that coverity reported as dead code. Differential Revision: https://reviews.llvm.org/D108210	2021-08-17 18:09:57 +01:00
Sebastian Neubauer	fbae34635d	[GlobalISel] Add combine for PTR_ADD with regbanks Combine two G_PTR_ADDs, but keep the register bank of the constant. That way, the combine can be used in post-regbank-select combines. Introduce two helper methods in CombinerHelper, getRegBank and setRegBank that get and set an optional register bank to a register. That way, they can be used before and after register bank selection. Differential Revision: https://reviews.llvm.org/D103326	2021-08-17 13:58:16 +02:00
Christudasan Devadasan	686607676f	[AMDGPU] Skip pseudo MIs in hazard recognizer Instructions like WAVE_BARRIER and SI_MASKED_UNREACHABLE are only placeholders to prevent certain unwanted transformations and will get discarded during assembly emission. They should not be counted during nop insertion. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D108022	2021-08-16 23:11:14 -04:00
Anshil Gandhi	f22ba51873	[Remarks] Emit optimization remarks for atomics generating CAS loop Implements ORE in AtomicExpand pass to report atomics generating a compare and swap loop. Differential Revision: https://reviews.llvm.org/D106891	2021-08-16 14:56:01 -06:00
Stanislav Mekhanoshin	877572cc19	Allow rematerialization of virtual reg uses Currently isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that it is not a trivial rematerialization and that we do not want to extend liveranges. It appears that LRE logic does not attempt to extend a liverange of a source register for rematerialization so that is not an issue. That is checked in the LiveRangeEdit::allUsesAvailableAt(). The only non-trivial aspect of it is accounting for tied-defs which normally represent a read-modify-write operation and are not rematerializable. The test for a tied-def situation already exists in the /CodeGen/AMDGPU/remat-vop.mir, test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve. The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets where I more or less understand the asm it seems to reduce spilling (as expected) or be neutral. However, it needs a review by all targets' specialists. Differential Revision: https://reviews.llvm.org/D106408	2021-08-16 12:42:42 -07:00
Stanislav Mekhanoshin	b9e433b02a	Prevent machine licm if remattable with a vreg use Check if a rematerializable instruction does not have any virtual register uses. Even though rematerializable, RA might not actually rematerialize it in this scenario. In that case we do not want to hoist such instruction out of the loop in the belief that RA will sink it back if needed. This already has impact on AMDGPU target which does not check for this condition in its isTriviallyReMaterializable implementation and has instructions with virtual register uses enabled. The other targets are not impacted at this point although will be when D106408 lands. Differential Revision: https://reviews.llvm.org/D107677	2021-08-16 12:09:00 -07:00
Dávid Bolvanský	49de6070a2	Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop" This reverts commit `435785214f`. Still same compile time issues for -O0 -g, eg. +1.3% for sqlite3.	2021-08-15 11:44:13 +02:00
Anshil Gandhi	435785214f	[Remarks] Emit optimization remarks for atomics generating CAS loop Implements ORE in AtomicExpand pass to report atomics generating a compare and swap loop. Differential Revision: https://reviews.llvm.org/D106891	2021-08-14 23:37:23 -06:00
Anshil Gandhi	29e11a1aa3	Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop" This reverts commit `c4e5425aa5`.	2021-08-13 23:58:04 -06:00
Anshil Gandhi	c4e5425aa5	[Remarks] Emit optimization remarks for atomics generating CAS loop Implements ORE in AtomicExpandPass to report atomics generating a compare and swap loop. Differential Revision: https://reviews.llvm.org/D106891	2021-08-13 22:44:08 -06:00
Matt Arsenault	a77ae4aa6a	AMDGPU: Stop attributor adding attributes to intrinsic declarations	2021-08-13 20:51:48 -04:00
Matt Arsenault	152ceec1ae	AMDGPU: Add indirect and extern calls to attributor test	2021-08-13 20:45:53 -04:00
Matt Arsenault	5beb9a0e6a	AMDGPU: Respect compute ABI attributes with unknown OS Unfortunately Mesa is still using amdgcn-- as the triple for OpenGL, so we still have the awkward unknown OS case to deal with. Previously if the HSA ABI intrinsics appeared, we would not add the ABI registers to the function. We would emit an error later, but we still need to produce some compile result. Start adding the registers to any compute function, regardless of the OS. This keeps the internal state more consistent, and will help avoid numerous test crashes in a future patch which starts assuming the ABI inputs are present on functions by default.	2021-08-13 20:44:46 -04:00
Ruiling Song	e1beebbac5	SplitKit: Don't further split subrange mask in buildCopy We may use several COPY instructions to copy the needed sub-registers during split. But the way we split the lanes during the COPYs may be different from the subranges of the old register. This would fail when we extend the subranges of the new register because the LaneMasks do not match exactly between subranges of new register and old register. Since we are bundling the COPYs, I think there is no need to further refine the subranges of the new register based on the set of LaneMasks of the inserted COPYs. I am not sure if there will be further breaking cases. But as the subranges of new register are created based on the LaneMasks of the subranges of old register, it will be highly possible we will always find an exact LaneMask match. We can think about how to make the extendPHIKillRanges() work for subrange mask mismatch case if we meet more such cases in the future. The test case was from D105065 by @arsenm. Differential Revision: https://reviews.llvm.org/D107829	2021-08-13 07:36:38 +08:00
Johannes Doerfert	a420f80bf1	[Attributor] Do not delete volatile stores to null/undef See D106309. Differential Revision: https://reviews.llvm.org/D107906	2021-08-12 10:39:52 -05:00
Matt Arsenault	d719f1c3cc	AMDGPU: Add alloc priority to global ranges The requested register class priorities weren't respected globally. Not sure why this is a target option, and not just the expected behavior (recently added in `1a6dc92be7`). This avoids an allocation failure when many wide tuple spills are introduced. I think this is a workaround since I would not expect the allocation priority to be required, and only a performance hint. The allocator should be smarter about when only a subregister needs to be spilled and restored. This does regress a couple of degenerate store stress lit tests which shouldn't be too important.	2021-08-10 13:12:34 -04:00
Matt Arsenault	d84c4e3857	AMDGPU: Add baseline register allocation failure test	2021-08-10 13:12:34 -04:00
Konstantin Schwarz	64bef13f08	[GlobalISel] Look through truncs and extends in narrowScalarShift If a G_SHL is fed by a G_CONSTANT, the lower and upper bits of the source can be shifted individually by the constant shift amount. However in case the shift amount came from a G_TRUNC(G_CONSTANT), the generic shift legalization code was used, producing intermediate shifts that are potentially illegal on some targets. This change teaches narrowScalarShift to look through G_TRUNCs and G_*EXTs. Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D89100	2021-08-10 13:49:22 +02:00
Tony Tye	53eb469195	[AMDGPU] Support non-strictly stronger memory orderings in SIMemoryLegalizer C++20 no longer requires the failure memory ordering to be no stronger than the success memory ordering. Adjust assert in AMD GPU SIMemoryLegalizer, and merge instruction memory orderings. Add common operation to merge memory orders that allows non-strict memory orderings to be combined. Use it in SIMemoryLegalizer and MachineMemOperand::getMergedOrdering. Reviewed By: efriedma, rampitec Differential Revision: https://reviews.llvm.org/D106729	2021-08-10 08:43:03 +00:00
Stanislav Mekhanoshin	1962b33d3f	[AMDGPU] Added test for MachineLICM reg pressure. NFC. The test shows excessive register pressure after the MachineLICM. This is a pre-commit for the patch fixing it. Differential Revision:	2021-08-06 16:00:35 -07:00
Michael Liao	05783e1cfe	[amdgpu] Revise the conversion from i64 to f32. - Replace 'cmp+sel' with 'umin' if possible. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D107507	2021-08-06 17:01:47 -04:00
Jay Foad	57b9107e3f	[GlobalISel] Improve widening of cttz/cttz_zero_undef Differential Revision: https://reviews.llvm.org/D107631	2021-08-06 14:25:56 +01:00
Reshabh Sharma	5173854f19	[AMDGPU] Handle functions in llvm's global ctors and dtors list This patch introduces a new code object metadata field, ".kind" which is used to add support for init and fini kernels. HSAStreamer will use function attributes, "device-init" and "device-fini" to distinguish between init and fini kernels from the regular kernels and will emit metadata with ".kind" set to "init" and "fini" respectively. To reduce the number of init and fini kernels, the ctors and dtors present in the llvm's global.ctors and global.dtors lists are called from a single init and fini kernel respectively. Reviewed by: yaxunl Differential Revision: https://reviews.llvm.org/D105682	2021-08-06 15:53:33 +05:30
Jay Foad	83610d4eb0	[AMDGPU][GlobalISel] Better legalization of 32-bit ctlz/cttz Differential Revision: https://reviews.llvm.org/D107474	2021-08-06 09:40:48 +01:00
Jay Foad	24b67a9024	[AMDGPU][GlobalISel] Improve regbankselect for 64-bit VGPR ctlz_zero_undef/cttz_zero_undef We can improve on the generic splitting by using ffbh/ffbl, which have a defined result when the input is zero. Differential Revision: https://reviews.llvm.org/D107442	2021-08-06 09:40:48 +01:00
Jay Foad	d77b43c385	[AMDGPU][GlobalISel] Add G_AMDGPU_FFBL_B32 This is the counterpart to G_AMDGPU_FFBH_U32 which already exists. These instructions have a defined result of -1 when the input is zero. Differential Revision: https://reviews.llvm.org/D107441	2021-08-06 09:40:48 +01:00
Jay Foad	cd2594e1c6	[GlobalISel] Improve legalization of narrow CTTZ Differential Revision: https://reviews.llvm.org/D107457	2021-08-06 09:40:48 +01:00
Amara Emerson	1577c41090	[GlobalISel] Allow the ArtifactValueFinder to return the best available register on failure. In some cases, like with inserts, we may have a matching size register already, but still decide to try to look further. This change adds a CurrentBest register to the value finder state, and any time a method fails to make progress, returns that register (which may just be an empty Register). To facilitate this, add a new entry point to the findValueFromDef() function which initializes this state. Also fix the build vector finder to return the current build_vector if all sources are being requested. Differential Revision: https://reviews.llvm.org/D107017	2021-08-05 17:37:30 -07:00
Stanislav Mekhanoshin	d71924fbfe	[AMDGPU] Improve v2i32/v2f32 insertelt patterns Using REG_SEQUENCE produces better code than INSERT_SUBREG, we can omit one move instruction in many cases. Fixes: SWDEV-298028 Differential Revision: https://reviews.llvm.org/D107602	2021-08-05 16:13:39 -07:00
Stanislav Mekhanoshin	42b9c2a17a	[AMDGPU] add v2i32 and v2f32 insert_vector_elt tests. NFC.	2021-08-05 14:28:32 -07:00
Craig Topper	f7076cfd3a	[DAGCombiner][RISCV][AMDGPU] Call SimplifyDemandedBits at the end of visitMULHU to enable known bits constant folding. We don't have real demanded bits support for MULHU, but we can still use the known bits based constant folding support at the end of SimplifyDemandedBits to simplify a MULHU. This helps with cases where we know the LHS and RHS have enough leading zeros so that the high multiply result is always 0. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D106471	2021-08-05 08:31:26 -07:00
Jay Foad	2b63933115	[AMDGPU][SDag] Better lowering for 32-bit ctlz/cttz Differential Revision: https://reviews.llvm.org/D107566	2021-08-05 15:57:40 +01:00
Jay Foad	e6c364a624	[AMDGPU][SDag] Better lowering for 64-bit ctlz/cttz Differential Revision: https://reviews.llvm.org/D107546	2021-08-05 15:57:40 +01:00
Petar Avramovic	66de26b1f9	GlobalISel: Fix matchEqualDefs for instructions with multiple defs Instructions that produceSameValue produce same values for operands with same index. matchEqualDefs used to return true for any two values from different instructions that produce same values. Fix this by checking if values are defined by operands with the same index. Differential Revision: https://reviews.llvm.org/D107362	2021-08-05 15:05:45 +02:00
Jay Foad	7217b01481	[AMDGPU] Add globalisel checks for ctlz_zero_undef/cttz_zero_undef	2021-08-05 13:47:54 +01:00
Dominik Montada	cc947e29ea	[GlobalISel] Combine shr(shl x, c1), c2 to G_SBFX/G_UBFX Reviewed By: foad Differential Revision: https://reviews.llvm.org/D107330	2021-08-05 13:52:10 +02:00
Jay Foad	9bd78932c7	[AMDGPU] Generate checks for ctlz_zero_undef/cttz_zero_undef	2021-08-05 10:38:06 +01:00
Michael Liao	5edc886e90	[amdgpu] Add an enhanced conversion from i64 to f32. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D107187	2021-08-04 15:33:12 -04:00
Craig Topper	c23405174a	[DAGCombiner][AMDGPU] Canonicalize constants to the RHS of MULHU/MULHS. This allows special constants like 0 to be recognized. It's also expected by isel patterns if a target had a mulh with immediate instructions. The commuting done by tablegen won't commute patterns with immediates since it expects DAGCombine to have done it. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D107486	2021-08-04 11:39:23 -07:00
Reshabh Sharma	dce35ef104	Revert "[AMDGPU] Handle functions in llvm's global ctors and dtors list" This reverts commit `d42e70b3d3`.	2021-08-04 23:33:31 +05:30
Jay Foad	ba5c4ac600	[AMDGPU] Add cttz tests and globalisel checks for ctlz	2021-08-04 15:57:14 +01:00
Jay Foad	027d3b747e	[AMDGPU] Generate checks for i64 to fp conversions Differential Revision: https://reviews.llvm.org/D107429	2021-08-04 15:39:46 +01:00
Reshabh Sharma	d42e70b3d3	[AMDGPU] Handle functions in llvm's global ctors and dtors list This patch introduces a new code object metadata field, ".kind" which is used to add support for init and fini kernels. HSAStreamer will use function attributes, "device-init" and "device-fini" to distinguish between init and fini kernels from the regular kernels and will emit metadata with ".kind" set to "init" and "fini" respectively. To reduce the number of init and fini kernels, the ctors and dtors present in the llvm's global.ctors and global.dtors lists are called from a single init and fini kernel respectively. Reviewed by: yaxunl Differential Revision: https://reviews.llvm.org/D105682	2021-08-04 19:53:33 +05:30
hsmahesha	596e61c332	[AMDGPU] Ignore call graph node which does not have function info. While collecting reachable callees (from kernels), ignore call graph node which does not have associated function or associated function is not a definition. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D107329	2021-08-04 10:22:33 +05:30
Jay Foad	40202b13b2	[AMDGPU] Legalize operands of V_ADDC_U32_e32 and friends These instructions have an implicit use of vcc which counts towards the constant bus limit. Pre gfx10 this means that the explicit operands cannot be sgprs. Use the custom inserter hook to call legalizeOperands to enforce that restriction. Fixes https://bugs.llvm.org/show_bug.cgi?id=51217 Differential Revision: https://reviews.llvm.org/D106868	2021-08-03 09:04:52 +01:00
Carl Ritson	675c942373	[AMDGPU] Disable NSA for BVH instructions when appropriate Check maximum NSA size when selecting NSA or non-NSA BVH instructions. Differential Revision: https://reviews.llvm.org/D103230	2021-08-02 20:09:26 +09:00
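
The saddsat expansion described in commit d10f23a25d above can be illustrated with a small standalone model. This is only a sketch of the two pseudocode sequences quoted in that commit message, written for 8-bit integers (BW = 8); it is not the SelectionDAG implementation. The names `saddo8`, `saddsatSelect`, `saddsatAshrXor`, and `loweringsAgree` are hypothetical and do not exist in LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit model of "r,o = saddo x, y":
// r is the wrapped sum, o is the signed-overflow flag.
struct AddResult {
  int8_t r;
  bool o;
};

AddResult saddo8(int8_t x, int8_t y) {
  int wide = int(x) + int(y);  // exact sum in a wider type
  int wrapped = wide;
  if (wrapped > 127) wrapped -= 256;        // wrap positive overflow
  else if (wrapped < -128) wrapped += 256;  // wrap negative overflow
  return {static_cast<int8_t>(wrapped), wrapped != wide};
}

// Old sequence: c = setcc r < 0; s = c ? INTMAX : INTMIN; ret o ? s : r
int8_t saddsatSelect(int8_t x, int8_t y) {
  AddResult a = saddo8(x, y);
  bool c = a.r < 0;
  int8_t s = c ? INT8_MAX : INT8_MIN;
  return a.o ? s : a.r;
}

// New sequence: s = ashr r, BW-1; x = xor s, INTMIN; ret o ? x : r
int8_t saddsatAshrXor(int8_t x, int8_t y) {
  AddResult a = saddo8(x, y);
  // Arithmetic shift by BW-1 gives -1 for negative r and 0 otherwise
  // (guaranteed since C++20; mainstream compilers behave this way earlier).
  int s = a.r >> 7;
  // -1 ^ INTMIN == INTMAX, and 0 ^ INTMIN == INTMIN.
  int8_t sat = static_cast<int8_t>(s ^ INT8_MIN);
  return a.o ? sat : a.r;
}

// Exhaustively compare both sequences against a clamped reference.
bool loweringsAgree() {
  for (int x = -128; x <= 127; ++x) {
    for (int y = -128; y <= 127; ++y) {
      int ref = x + y;
      if (ref > 127) ref = 127;
      if (ref < -128) ref = -128;
      int8_t a = saddsatSelect(int8_t(x), int8_t(y));
      int8_t b = saddsatAshrXor(int8_t(x), int8_t(y));
      if (a != ref || b != ref) return false;
    }
  }
  return true;
}
```

On overflow the wrapped result has the opposite sign of the true sum, so its sign bit alone selects between INTMAX and INTMIN; the shift and xor compute that choice without a compare and select.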


4784 Commits