llvm-project

Commit Graph

Author	SHA1	Message	Date
Brendon Cahoon	f9f5d41545	[AMDGPU][GlobalISel] Legalize and select G_SBFX and G_UBFX Adds legalizer, register bank select, and instruction select support for G_SBFX and G_UBFX. These opcodes generate scalar or vector ALU bitfield extract instructions for AMDGPU. The instructions allow both constant or register values for the offset and width operands. The 32-bit scalar version is expanded to a sequence that combines the offset and width into a single register. There are no 64-bit vgpr bitfield extract instructions, so the operations are expanded to a sequence of instructions that implement the operation. If the width is a constant, then the 32-bit bitfield extract instructions are used. Moved the AArch64 specific code for creating G_SBFX to CombinerHelper.cpp so that it can be used by other targets. Only bitfield extracts with constant offset and width values are handled currently. Differential Revision: https://reviews.llvm.org/D100149	2021-06-28 09:06:44 -04:00
Sander de Smalen	c9acd2f32e	[GlobalISel] NFC: Change LLT::changeNumElements to LLT::changeElementCount. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D104453	2021-06-25 15:54:00 +01:00
Sander de Smalen	968980ef08	[GlobalISel] NFC: Change LLT::scalarOrVector to take ElementCount. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D104452	2021-06-25 11:26:16 +01:00
Sander de Smalen	d5e14ba88c	[GlobalISel] NFC: Change LLT::vector to take ElementCount. This also adds new interfaces for the fixed- and scalable case: * LLT::fixed_vector * LLT::scalable_vector The strategy for migrating to the new interfaces was as follows: * If the new LLT is a (modified) clone of another LLT, taking the same number of elements, then use LLT::vector(OtherTy.getElementCount()) or if the number of elements is halfed/doubled, it uses .divideCoefficientBy(2) or operator*. That is because there is no reason to specifically restrict the types to 'fixed_vector'. * If the algorithm works on the number of elements (as unsigned), then just use fixed_vector. This will need to be fixed up in the future when modifying the algorithm to also work for scalable vectors, and will need then need additional tests to confirm the behaviour works the same for scalable vectors. * If the test used the '/*Scalable=*/true` flag of LLT::vector, then this is replaced by LLT::scalable_vector. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D104451	2021-06-24 11:26:12 +01:00
Jay Foad	dfb8c08739	[AMDGPU] Stop using LegacyLegalizerInfo. NFCI. Differential Revision: https://reviews.llvm.org/D103684	2021-06-23 10:50:32 +01:00
Fangrui Song	521d373274	Fix -Wunused-variable and -Wunused-but-set-variable in -DLLVM_ENABLE_ASSERTIONS=off build. NFC	2021-06-20 11:09:07 -07:00
Michael Liao	940efa4f69	[amdgpu] Improve the from f32 to i64. - Take the same principle as the conversion from f64 to i64 with extra necessary pre- and post-processing. It helps to reduce that conversion sequence by half compared to legacy one. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D104427	2021-06-19 12:46:48 -04:00
Matt Arsenault	6dd54dada3	AMDGPU/GlobalISel: Fix indentation	2021-06-11 13:45:25 -04:00
Brendon Cahoon	294efbbd3e	Reland "[AMDGPU] Add gfx1013 target" This reverts commit `211e584fa2`. Fixed a use-after-free error that caused the sanitizers to fail.	2021-06-08 21:15:35 -04:00
Brendon Cahoon	211e584fa2	Revert "[AMDGPU] Add gfx1013 target" This reverts commit `ea10a86984`. A sanitizer buildbot reports an error.	2021-06-08 16:29:41 -04:00
Brendon Cahoon	ea10a86984	[AMDGPU] Add gfx1013 target Differential Revision: https://reviews.llvm.org/D103663	2021-06-08 12:49:49 -04:00
Mirko Brkusanin	35ef4c940b	[AMDGPU][GlobalISel] Legalize G_ABS Legalize and select G_ABS so that we can use llvm.abs intrinsic Differential Revision: https://reviews.llvm.org/D102391	2021-06-04 14:46:43 +02:00
Daniel Sanders	aaac268285	[globalisel][legalizer] Separate the deprecated LegalizerInfo from the current one It's still in use in a few places so we can't delete it yet but there's not many at this point. Differential Revision: https://reviews.llvm.org/D103352	2021-06-01 13:23:48 -07:00
Matt Arsenault	e892705d74	GlobalISel: Do not change register types in lowerLoad Adjusting the load register type is a widenScalar type action, not a lowering. lowerLoad should be reserved for operations that change the memory access size, such as unaligned load decomposition. With this trying to adjust the register type, it was hard to avoid infinite loops in the legalizer. Adds a bandaid to avoid regressing a few AArch64 tests, but I'm not sure what the exact condition is and there's probably a cleaner way to do this. For AMDGPU this regresses handling of some cases for unaligned loads, but the way this is currently working is a pretty ugly hack.	2021-05-27 11:49:37 -04:00
Matt Arsenault	ee35900089	AMDGPU/GlobalISel: Lower constant-32-bit zextload/sextload consistently We were accidentally leaning on code in lowerLoad which expands extending loads which should be removed.	2021-05-27 09:49:13 -04:00
Matt Arsenault	8a203ac6d2	AMDGPU/GlobalISel: Remove redundant parameter from function	2021-05-27 09:49:13 -04:00
Christudasan Devadasan	90d784053f	AMDGPU/GlobalISel: Legalize G_[SU]DIVREM instructions Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D100726	2021-05-25 10:51:07 +05:30
Stanislav Mekhanoshin	748db5bfac	[AMDGPU] Fix module LDS selection Accesses to global module LDS variable start from null, but kernel also thinks its variables start address is null. Fixed by not using a null as an address. Differential Revision: https://reviews.llvm.org/D102882	2021-05-20 15:59:01 -07:00
David Stuttard	31b62aa162	[AMDGPU] Fix codegen of image intrinsics for g16 and a16 For gfx10 gradient (g16) and address (a16) can be independent. Previous implementation assumed that a16 implied g16. There are some other changes that fix the verification (as well as asm/disasm) that are required for the included test to pass - the XFAIL will be removed in those changes. This also includes required fixes for GlobalISel Differential Revision: https://reviews.llvm.org/D102066 Change-Id: I7d171cc90994de05f41669b66a6d0ffa2ed05d09	2021-05-14 09:28:15 +01:00
Matt Arsenault	1cb8a9d595	AMDGPU/GlobalISel: Fix uitofp/sitofp with non-power-of-2 integers	2021-04-20 11:13:29 -04:00
Christudasan Devadasan	97618522dc	[AMDGPU] Remove dead dcode (NFC).	2021-04-16 23:03:31 +05:30
Jay Foad	5d0e9ddfa5	[AMDGPU][GlobalISel] Add support for global atomicrmw fadd This includes gfx908 which only has a no-return version of the global_atomic_add_f32 instruction, using the same hack that was previously implemented for selecting from the llvm.amdgcn.global.atomic.fadd intrinsic. Differential Revision: https://reviews.llvm.org/D97767	2021-03-31 11:13:00 +01:00
Konstantin Zhuravlyov	f4ace63737	AMDGPU: Add target id and code object v4 support - Add target id support (https://clang.llvm.org/docs/ClangOffloadBundler.html#target-id) - Add code object v4 support (https://llvm.org/docs/AMDGPUUsage.html#elf-code-object) - Add kernarg_size to kernel descriptor - Change trap handler ABI to no longer move queue pointer into s[0:1] - Cleanup ELF definitions - Add V2, V3, V4 suffixes to make a clear distinction for code object version - Consolidate note names Differential Revision: https://reviews.llvm.org/D95638	2021-03-24 11:54:05 -04:00
Matt Arsenault	b24436ac96	GlobalISel: Lower funnel shifts	2021-03-23 09:11:17 -04:00
Pushpinder Singh	d0e5422eb8	[GlobalISel][AMDGPU] Lower G_UMULO/G_SMULO Reviewed By: foad Differential Revision: https://reviews.llvm.org/D93963	2021-03-23 05:45:43 +00:00
Jay Foad	3ad5216ed8	[AMDGPU] Better codegen for i64 bitreverse Differential Revision: https://reviews.llvm.org/D97547	2021-02-26 15:51:36 +00:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
Mirko Brkusanin	4b422708ba	[AMDGPU][GlobalISel] Handle G_PTR_ADD when looking for constant offset Look throught G_PTRTOINT and G_PTR_ADD nodes when looking for constant offset for buffer stores. This also helps with merging of these instructions later on. Differential Revision: https://reviews.llvm.org/D95242	2021-01-28 11:20:09 +01:00
Matt Arsenault	2a0db8d70e	AMDGPU: Use more accurate fast f64 fdiv A raw v_rcp_f64 isn't accurate enough, so start applying correction.	2021-01-21 10:51:36 -05:00
dfukalov	6a87e9b08b	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
Matt Arsenault	ab3a3f543b	AMDGPU/GlobalISel: Update fdiv lowering for denormal/ulp interaction Change the GlobalISel fast fdiv handling to match the changes in `2531535984` and `884acbb9e1`	2021-01-06 12:32:01 -05:00
Matt Arsenault	581d13f8ae	GlobalISel: Return APInt from getConstantVRegVal Returning int64_t was arbitrarily limiting for wide integer types, and the functions should handle the full generality of the IR. Also changes the full form which returns the originally defined vreg. Add another wrapper for the common case of just immediately converting to int64_t (arguably this would be useful for the full return value case as well). One possible issue with this change is some of the existing uses did break without conversion to getConstantVRegSExtVal, and it's possible some without adequate test coverage are now broken.	2020-12-22 22:23:58 -05:00
Stanislav Mekhanoshin	d15119a02d	[AMDGPU][GlobalISel] GlobalISel for flat scratch It does not seem to fold offsets but this is not specific to the flat scratch as getPtrBaseWithConstantOffset() does not return the split for these tests unlike its SDag counterpart. Differential Revision: https://reviews.llvm.org/D93670	2020-12-22 16:33:06 -08:00
Sebastian Neubauer	5733167f54	[AMDGPU] Mark amdgpu_gfx functions as module entry function - Allows lds allocations - Writes resource usage into COMPUTE_PGM_RSRC1 registers in PAL metadata Differential Revision: https://reviews.llvm.org/D92946	2020-12-14 10:43:39 +01:00
Sebastian Neubauer	72ccec1bbc	[AMDGPU] Fix v3f16 interaction with image store workaround In some cases, the wrong amount of registers was reserved. Also enable more v3f16 tests. Differential Revision: https://reviews.llvm.org/D90847	2020-11-18 18:21:04 +01:00
Jay Foad	830ed64ccd	Revert "Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access"" This reverts commit `8b08fa0103`. The underlying problems were fixed by D90607.	2020-11-11 14:40:14 +00:00
Jay Foad	0ad4d04002	[AMDGPU] Remove an unused return value. NFC. Differential Revision: https://reviews.llvm.org/D91063	2020-11-10 09:15:14 +00:00
Carl Ritson	be2afbd019	[AMDGPU] Remove fix up operand from SI_ELSE Remove immediate operand from SI_ELSE which indicates if EXEC has been modified. Instead always emit code that handles EXEC and remove unnecessary instructions during pre-RA optimisation. This facilitates passes (i.e. SIWholeQuadMode) adding exec mask manipulation post control flow lowering, and pre control flow lower passes do not need to be aware of SI_ELSE handling. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D89644	2020-10-20 19:15:21 +09:00
Fangrui Song	2c4c2dc2d9	[MCRegister] Simplify isStackSlot & isPhysicalRegister and delete isPhysical. NFC	2020-10-08 22:08:33 -07:00
Rodrigo Dominguez	f71f5f39f6	[AMDGPU] Implement hardware bug workaround for image instructions Summary: This implements a workaround for a hardware bug in gfx8 and gfx9, where register usage is not estimated correctly for image_store and image_gather4 instructions when D16 is used. Change-Id: I4e30744da6796acac53a9b5ad37ac1c2035c8899 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81172	2020-10-07 07:39:52 -04:00
Sebastian Neubauer	9fc535f987	[AMDGPU] Fix gcc warnings uint8_t types are implicitly promoted to int, leading to a unsigned-signed comparison. Thanks for the heads-up @uabelho. Differential Revision: https://reviews.llvm.org/D88876	2020-10-06 10:55:08 +02:00
Sebastian Neubauer	6a089ce0e4	[AMDGPU] Use tablegen for argument indices Use tablegen generic tables to get the index of image intrinsic arguments. Before, the computation of which image intrinsic argument is at which index was scattered in a few places, tablegen, the SDag instruction selection and GlobalISel. This patch changes that, so only tablegen contains code to compute indices and the ImageDimIntrinsicInfo table provides these information. Differential Revision: https://reviews.llvm.org/D86270	2020-10-05 11:50:52 +02:00
Mirko Brkusanin	8b08fa0103	Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access" This reverts commit `f5cd7ec9f3`. Certain rocPRIM/rocThrust/hipCUB tests were failing because of this change.	2020-09-29 15:33:34 +02:00
Stanislav Mekhanoshin	27a62f6317	[AMDGPU] global-isel support for RT Differential Revision: https://reviews.llvm.org/D87847	2020-09-24 10:29:45 -07:00
Pushpinder Singh	41d6669f1f	[GlobalISel][AMDGPU] Lower G_SMULH/G_UMULH Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D85653	2020-09-23 22:25:29 -04:00
Jay Foad	4bdab2e86a	[AMDGPU] Fix offset for REL32_HI relocs The addend in a REL32 reloc needs to be adjusted to account for the offset from the PC value returned by the s_getpc instruction to the point where the reloc is applied. This was being done correctly for (GOTPC)REL32_LO but not for (GOTPC)REL32_HI. This will only make a difference if the target symbol happens to get loaded almost exactly a multiple of 4G away from the relocated instructions. Differential Revision: https://reviews.llvm.org/D86938	2020-09-02 10:55:55 +01:00
Matt Arsenault	21ccedc24f	AMDGPU/GlobalISel: Tolerate negated control flow intrinsic outputs If the condition output is negated, swap the branch targets. This is similar to what SelectionDAG does for when SelectionDAGBuilder decides to invert the condition and swap the branches. This is leaving behind a dead constant def for some reason.	2020-08-26 08:58:54 -04:00
Matt Arsenault	0d2fe90063	AMDGPU/GlobalISel: Use more accurate legality rules for merge/unmerge Most notably, we were incorrectly reporting <3 x s16> as a legal type for these. Make sure these aren't legal to help make progress on fixing the artifact combiner and vector legalizer rules. Unfortunately, this means spreading the -global-isel-abort=0 hack, although this doesn't change the legalizer result in any situation.	2020-08-25 09:40:20 -04:00
Matt Arsenault	ef8f3b5a78	AMDGPU/GlobalISel: Apply bitcast load/store hack to pointer vectors The selection patterns will currently fail on these.	2020-08-25 09:37:41 -04:00
Matt Arsenault	62d1fb828f	AMDGPU/GlobalISel: Use unmerge instead of extract in addrspace queries This is a bit more consistent with regular operation legalization.	2020-08-24 11:07:51 -04:00

473 Commits