llvm-project

Commit Graph

Author	SHA1	Message	Date
Abinav Puthan Purayil	17a81ecf85	[AMDGPU] Use the HasNoUse predicate for no-ret atomic op selection This change replaces the C++ predicates with the HasNoUse builtin predicate that would enable the no-ret atomic op selection in GlobalISel. Differential Revision: https://reviews.llvm.org/D125213	2022-07-08 09:47:33 +05:30
Abinav Puthan Purayil	7504c7a877	[AMDGPU] Use AddedComplexity for ret and noret atomic ops selection This patch removes the predicate for return atomic ops and uses AddedComplexity to distinguish its selection from its no return variant. This will produce better matchers that doesn't unnecessarily check for the negated predicate if the initial predicate failed. Also, it simplifies the enabling of no return atomic ops selection in GlobalISel. Differential Revision: https://reviews.llvm.org/D128241	2022-07-08 09:47:33 +05:30
Joe Nash	07b7fada73	[AMDGPU] gfx11 VOPD instructions MC support VOPD is a new encoding for dual-issue instructions for use in wave32. This patch includes MC layer support only. A VOPD instruction is constituted of an X component (for which there are 13 possible opcodes) and a Y component (for which there are the 13 X opcodes plus 3 more). Most of the complexity in defining and parsing a VOPD operation arises from the possible different total numbers of operands and deferred parsing of certain operands depending on the constituent X and Y opcodes. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D128218	2022-06-24 11:08:39 -04:00
Joe Nash	e243ead6fc	Reland [AMDGPU] gfx11 vop3dpp instructions There was an issue with encoding wide (>64 bit) instructions on BigEndian hosts, which is fixed in D127195. Therefore reland this. gfx11 adds the ability to use dpp modifiers on vop3 instructions. This patch adds machine code layer support for that. The MCCodeEmitter is changed to use APInt instead of uint64_t to support these wider instructions. Patch 16/N for upstreaming of AMDGPU gfx11 architecture Differential Revision: https://reviews.llvm.org/D126483	2022-06-07 14:49:13 -04:00
Joe Nash	eaed07eb7e	Revert "[AMDGPU] gfx11 vop3dpp instructions" This reverts commit `99a83b1286`.	2022-06-06 17:12:09 -04:00
Joe Nash	99a83b1286	[AMDGPU] gfx11 vop3dpp instructions gfx11 adds the ability to use dpp modifiers on vop3 instructions. This patch adds machine code layer support for that. The MCCodeEmitter is changed to use APInt instead of uint64_t to support these wider instructions. Patch 16/N for upstreaming of AMDGPU gfx11 architecture Depends on D126475 Reviewed By: rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D126483	2022-06-06 09:34:59 -04:00
Matt Arsenault	0ecbb683a2	TableGen/GlobalISel: Make address space/align predicates consistent The builtin predicate handling has a strange behavior where the code assumes that a PatFrag is a stack of PatFrags, and each level adds at most one predicate. I don't think this particularly makes sense, especially without a diagnostic to ensure you aren't trying to set multiple at once. This wasn't followed for address spaces and alignment, which could potentially fall through to report no builtin predicate was added. Just switch these to follow the existing convention for now.	2022-04-22 15:48:07 -04:00
Abinav Puthan Purayil	45ca94334e	[AMDGPU] Select no-return atomic intrinsics in tblgen This is to avoid relying on the post-isel hook. This change also enable the saddr pattern selection for atomic intrinsics in GlobalISel. Differential Revision: https://reviews.llvm.org/D123583	2022-04-22 09:37:40 +05:30
Abinav Puthan Purayil	b7df71524e	[AMDGPU][GlobalISel] Force return atomic selection for now	2022-04-20 16:00:08 +05:30
Abinav Puthan Purayil	f4e8cf25af	[AMDGPU] Select no-return ds_* atomic ops in tblgen. SelectionDAG relies on MachineInstr's HasPostISelHook for selecting the no-return atomic ops. GlobalISel, at the moment, doesn't handle HasPostISelHook. This change adds the selection for no-return ds_* atomic ops in tblgen so that it can work with both GlobalISel and SelectionDAG. I couldn't add the predicates for GlobalISel in this change since there's a restriction in GlobalISelEmitter that disallows selecting generic atomics ops that return with instructions that doesn't return. We can't remove the HasPostISelHook code that selects the no return atomic ops in SelectionDAG yet since we still need to cover selections in FLATInstructions.td, BUFInstructions.td. Differential Revision: https://reviews.llvm.org/D115881	2022-02-10 09:26:37 +05:30
Abinav Puthan Purayil	d8b690409d	[AMDGPU] Set MemoryVT for truncstores in tblgen. GlobalISelEmitter was skipping these patterns when its predicates were checked. This patch should allow us to select d16_hi stores in GlobalISel. Differential Revision: https://reviews.llvm.org/D117762	2022-01-20 19:05:12 +05:30
Matt Arsenault	9392b40d4b	AMDGPU/GlobalISel: Fix selection of constant 32-bit addrspace loads Unfortunately the selection patterns still rely on the address space from the memory operand instead of using the pointer type. Add this address space to the list of cases supported by global-like loads. Alternatively we would have to adjust the address space of the memory operand to deviate from the underlying IR value, which looks ugly and is more work in the legalizer. This doesn't come up in the DAG path because it uses a different selection strategy where the cast is inserted during the addressing mode matching.	2022-01-17 10:06:33 -05:00
Mateja Marjanovic	ca57b80cd6	Code quality: Combine V_RSQ Combine V_RCP and V_SQRT into V_RSQ on AMDGPU for GlobalISel. Change-Id: I93c5dcb412483156a6e8b68c4085cbce83ac9703	2021-11-30 17:17:15 +01:00
Abinav Puthan Purayil	14c4051122	[AMDGPU][NFC] Remove unused defvar in AMDGPUInstructions.td.	2021-11-30 17:03:37 +05:30
Abinav Puthan Purayil	078da26b1c	[AMDGPU] Check for unneeded shift mask in shift PatFrags. The existing constrained shift PatFrags only dealt with masked shift from OpenCL front-ends. This change copies the X86DAGToDAGISel::isUnneededShiftMask() function to AMDGPU and uses it in the shift PatFrag predicates. Differential Revision: https://reviews.llvm.org/D113448	2021-11-24 10:53:12 +05:30
Abinav Puthan Purayil	61e3b9fefe	[AMDGPU] Add constrained shift pattern matches. The motivation for this is due to clang's conformance to https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#operators-shift which makes clang emit (<shift> a, (and b, <width> - 1)) for `a <shift> b` in OpenCL where a is an int of bit width <width>. Differential revision: https://reviews.llvm.org/D110231	2021-10-26 19:07:19 +05:30
Piotr Sobczak	d869921004	[AMDGPU] Add patterns for i8/i16 local atomic load/store Add patterns for i8/i16 local atomic load/store. Added tests for new patterns. Copied atomic_[store/load]_local.ll to GlobalISel directory. Differential Revision: https://reviews.llvm.org/D111869	2021-10-18 11:23:10 +02:00
Jay Foad	ce098ccc1c	[AMDGPU] Simplify tablegen files. NFC. There is no need to cast records to strings before comparing them.	2021-07-07 09:19:23 +01:00
Julien Pagès	46adccc5cc	[AMDGPU] Improve Codegen for build_vector Improve the code generation of build_vector. Use the v_pack_b32_f16 instruction instead of v_and_b32 + v_lshl_or_b32 Differential Revision: https://reviews.llvm.org/D98081 Patch by Julien Pagès!	2021-05-12 14:17:44 +01:00
Jay Foad	d92b4956d6	[AMDGPU] Inline FSHRPattern into its only use. NFC.	2021-03-26 09:32:02 +00:00
Paul C. Anagnostopoulos	91d2e5c81a	[TableGen] Add the !filter bang operator. Add a test. Update the Programmer's Reference. Use it in some TableGen files. Differential Revision: https://reviews.llvm.org/D91008	2020-11-09 10:56:55 -05:00
Jay Foad	0d5989bb24	[AMDGPU] Split R600 and GCN bfe patterns This is in preparation for making the GCN patterns divergence-aware. NFC. Differential Revision: https://reviews.llvm.org/D88579	2020-10-05 09:55:10 +01:00
Jay Foad	286d3fc750	[AMDGPU] Split R600 and GCN bfi patterns This is in preparation for making the GCN patterns divergence-aware. NFC. Differential Revision: https://reviews.llvm.org/D88244	2020-09-28 10:16:51 +01:00
Stanislav Mekhanoshin	277de43d88	[AMDGPU] Unify intrinsic ret/nortn interface We have a single noret intrinsic an a lot of special handling around it. Declare it just as any other but do not define rtn instructions itself instead. Differential Revision: https://reviews.llvm.org/D87719	2020-09-15 15:26:42 -07:00
Mirko Brkusanin	d17ea67b92	[AMDGPU][GlobalISel] Fix 96 and 128 local loads and stores Fix local ds_read/write_b96/b128 so they can be selected if the alignment allows. Otherwise, either pick appropriate ds_read2/write2 instructions or break them down. Differential Revision: https://reviews.llvm.org/D81638	2020-08-21 12:26:31 +02:00
Jay Foad	ecac951be9	[AMDGPU] Fix and simplify AMDGPUTargetLowering::LowerUDIVREM Use the algorithm from AMDGPUCodeGenPrepare::expandDivRem32. Differential Revision: https://reviews.llvm.org/D83382	2020-07-08 19:14:49 +01:00
Matt Arsenault	9e03bdebc1	AMDGPU: Add llvm.amdgcn.sqrt intrinsic I spread the GlobalISel test into the regular one, which I've been avoiding so far.	2020-06-26 15:07:07 -04:00
Matt Arsenault	89c8c80bd5	AMDGPU: Change pre-gfx9 implementation of fcanonicalize to mul If f32 denormals were enabled pre-gfx9, we would still try to implement this with v_max_f32. Pre-gfx9, these instructions ignored the denormal mode and did not flush. Switch to the multiply form for f32 as a workaround which should always work in any case. This fixes conformance failures when the library implementation of fmin/fmax were accidentally not inlined, forcing the assumption of no flushing on targets where denormals are not enabled by default. This is a workaround, since really we should not be mixing code with different FP mode expectations, but prefer the lowering that will work in any mode. Now this will always use max to implement canonicalize on gfx9+. This is only really beneficial for f64. For f32/f16 it's a neutral choice (and worse in terms of code size in 1 case), but possibly worse for the compiler since it does add an extra register use operand. Leave this change for later.	2020-04-23 15:24:13 -04:00
Matt Arsenault	5660bb6bc9	AMDGPU: Remove denormal subtarget features Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.	2020-04-02 17:17:12 -04:00
Simon Pilgrim	e91feeed21	[AMDGPU] Add ISD::FSHR -> ALIGNBIT support This patch allows ISD::FSHR(i32) patterns to lower to ALIGNBIT instructions. This improves test coverage of ISD::FSHR matching - x86 has both FSHL/FSHR instructions and we prefer FSHL by default. Differential Revision: https://reviews.llvm.org/D76070	2020-03-12 20:16:57 +00:00
Matt Arsenault	1024b73ef5	AMDGPU: Split denormal mode tracking bits Prepare to accurately track the future denormal-fp-math attribute changes. The way to actually set these separately is not wired in yet. This is just a mechanical change, and mostly still assumes the input and output mode match. This should be refined for some cases. For example, fcanonicalize lowering should use the flushing variant if either input or output flushing is enabled	2020-02-04 10:44:21 -08:00
Matt Arsenault	84e035d8f1	AMDGPU: Don't check constant address space for atomic stores We define a separate list for storable address spaces. This saves entry in the matcher table address space list.	2020-01-24 12:15:09 -08:00
Matt Arsenault	9ffd0ed838	AMDGPU/GlobalISel: Fix import of integer med3 This isn't too useful now, since nothing is currently trying to form min/max from cmp+select.	2020-01-09 10:29:32 -05:00
Matt Arsenault	c66b2e1c87	AMDGPU: Eliminate more legacy codepred address space PatFrags These should now be limited to R600 code.	2020-01-09 10:29:32 -05:00
Matt Arsenault	3766f4bacc	AMDGPU: Use new PatFrag system for d16 stores	2020-01-09 10:29:32 -05:00
Matt Arsenault	22700f68e1	AMDGPU: Annotate EXTRACT_SUBREGs with source register classes This partially fixes GlobalISel import of the patterns, but removes a lot of entriess from the end of the skipped pattern log.	2020-01-07 21:56:16 -05:00
Matt Arsenault	bd8d696c14	AMDGPU: Use ImmLeaf	2020-01-07 15:10:07 -05:00
Matt Arsenault	e4464bf3d4	AMDGPU/GlobalISel: Select scalar v2s16 G_BUILD_VECTOR	2020-01-06 11:19:33 -05:00
Matt Arsenault	db0ed3e429	AMDGPU: Refactor treatment of denormal mode Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget, and be moved into a separate function attribute. This patch is still NFC. The denormal mode remains as a subtarget feature for now, but make the necessary changes to switch to using an attribute.	2019-11-19 19:55:43 +05:30
Stanislav Mekhanoshin	4312c4afd4	[AMDGPU] deduplicate tablegen predicates We are duplicating predicates if several parts of the combined predicate list contain the same condition. Added code to deduplicate the list. We have AssemblerPredicates and AssemblerPredicate in the PredicateControl, but we never use AssemblerPredicates with an actual list, so this one is dropped. This addresses the first part of the llvm bug 43886: https://bugs.llvm.org/show_bug.cgi?id=43886 Differential Revision: https://reviews.llvm.org/D69815	2019-11-04 12:19:17 -08:00
Matt Arsenault	171cf5302f	AMDGPU/GlobalISel: Handle flat/global G_ATOMIC_CMPXCHG Custom lower this to a target instruction with the merge operands. I think it might be better to directly select this and emit a REG_SEQUENCE, but this would be more work since it would require splitting the tablegen patterns for these cases from the other atomics.	2019-10-25 13:11:09 -07:00
Matt Arsenault	ae87b9f2c2	AMDGPU/GlobalISel: Select local atomic cmpxchg llvm-svn: 367511	2019-08-01 03:41:41 +00:00
Matt Arsenault	e6ce48422c	AMDGPU: Start redefining atomic PatFrags Start migrating to a form that will be compatible with the global isel emitter. Also should fix some overly lax checks on the memory type, which allowed mis-selecting some illegal atomics. llvm-svn: 367506	2019-08-01 03:25:52 +00:00
Matt Arsenault	3baf4d3418	AMDGPU/GlobalISel: Select simple local stores llvm-svn: 367504	2019-08-01 03:09:15 +00:00
Matt Arsenault	3594011de0	AMDGPU/GlobalISel: Select local loads llvm-svn: 367498	2019-08-01 00:53:38 +00:00
Matt Arsenault	52c262484f	TableGen: Add MinAlignment predicate AMDGPU uses some custom code predicates for testing alignments. I'm still having trouble comprehending the behavior of predicate bits in the PatFrag hierarchy. Any attempt to abstract these properties unexpectdly fails to apply them. llvm-svn: 367373	2019-07-31 00:14:43 +00:00
Matt Arsenault	57ef94fb06	AMDGPU: Avoid emitting "true" predicates Empty condition strings are considerde always true. This removes a lot of clutter from the generated matcher tables. This shrinks the source size of AMDGPUGenDAGISel.inc from 7.3M to 6.1M. llvm-svn: 367326	2019-07-30 15:56:43 +00:00
Matt Arsenault	e3401a9b86	AMDGPU: Redefine setcc condition PatLeafs Avoid using custom code predicates. llvm-svn: 366609	2019-07-19 20:24:40 +00:00
Matt Arsenault	8f8d07e93b	AMDGPU: Replace store PatFrags Convert the easy cases to formats understood for GlobalISel. llvm-svn: 366240	2019-07-16 18:21:25 +00:00
Matt Arsenault	c6fd5abecc	AMDGPU: Redefine load PatFrags Rewrite PatFrags using the new PatFrag address space matching in tablegen. These will now work with both SelectionDAG and GlobalISel. llvm-svn: 366234	2019-07-16 17:38:50 +00:00

1 2 3

127 Commits