Commit Graph

127 Commits

Author SHA1 Message Date
Abinav Puthan Purayil 17a81ecf85 [AMDGPU] Use the HasNoUse predicate for no-ret atomic op selection
This change replaces the C++ predicates with the HasNoUse builtin
predicate that would enable the no-ret atomic op selection in
GlobalISel.

Differential Revision: https://reviews.llvm.org/D125213
2022-07-08 09:47:33 +05:30
Abinav Puthan Purayil 7504c7a877 [AMDGPU] Use AddedComplexity for ret and noret atomic ops selection
This patch removes the predicate for return atomic ops and uses
AddedComplexity to distinguish its selection from its no return variant.
This will produce better matchers that doesn't unnecessarily check for
the negated predicate if the initial predicate failed. Also, it
simplifies the enabling of no return atomic ops selection in GlobalISel.

Differential Revision: https://reviews.llvm.org/D128241
2022-07-08 09:47:33 +05:30
Joe Nash 07b7fada73 [AMDGPU] gfx11 VOPD instructions MC support
VOPD is a new encoding for dual-issue instructions for use in wave32.
This patch includes MC layer support only.

A VOPD instruction is constituted of an X component (for which there are
13 possible opcodes) and a Y component (for which there are the 13 X
opcodes plus 3 more). Most of the complexity in defining and parsing
a VOPD operation arises from the possible different total numbers of
operands and deferred parsing of certain operands depending on the
constituent X and Y opcodes.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D128218
2022-06-24 11:08:39 -04:00
Joe Nash e243ead6fc Reland [AMDGPU] gfx11 vop3dpp instructions
There was an issue with encoding wide (>64 bit) instructions on
BigEndian hosts, which is fixed in D127195. Therefore reland this.

gfx11 adds the ability to use dpp modifiers on vop3 instructions.
This patch adds machine code layer support for that. The MCCodeEmitter
is changed to use APInt instead of uint64_t to support these wider
instructions.

Patch 16/N for upstreaming of AMDGPU gfx11 architecture

Differential Revision: https://reviews.llvm.org/D126483
2022-06-07 14:49:13 -04:00
Joe Nash eaed07eb7e Revert "[AMDGPU] gfx11 vop3dpp instructions"
This reverts commit 99a83b1286.
2022-06-06 17:12:09 -04:00
Joe Nash 99a83b1286 [AMDGPU] gfx11 vop3dpp instructions
gfx11 adds the ability to use dpp modifiers on vop3 instructions.
This patch adds machine code layer support for that. The MCCodeEmitter
is changed to use APInt instead of uint64_t to support these wider
instructions.

Patch 16/N for upstreaming of AMDGPU gfx11 architecture

Depends on D126475

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D126483
2022-06-06 09:34:59 -04:00
Matt Arsenault 0ecbb683a2 TableGen/GlobalISel: Make address space/align predicates consistent
The builtin predicate handling has a strange behavior where the code
assumes that a PatFrag is a stack of PatFrags, and each level adds at
most one predicate. I don't think this particularly makes sense,
especially without a diagnostic to ensure you aren't trying to set
multiple at once.

This wasn't followed for address spaces and alignment, which could
potentially fall through to report no builtin predicate was
added. Just switch these to follow the existing convention for now.
2022-04-22 15:48:07 -04:00
Abinav Puthan Purayil 45ca94334e [AMDGPU] Select no-return atomic intrinsics in tblgen
This is to avoid relying on the post-isel hook.

This change also enable the saddr pattern selection for atomic
intrinsics in GlobalISel.

Differential Revision: https://reviews.llvm.org/D123583
2022-04-22 09:37:40 +05:30
Abinav Puthan Purayil b7df71524e [AMDGPU][GlobalISel] Force return atomic selection for now 2022-04-20 16:00:08 +05:30
Abinav Puthan Purayil f4e8cf25af [AMDGPU] Select no-return ds_* atomic ops in tblgen.
SelectionDAG relies on MachineInstr's HasPostISelHook for selecting the
no-return atomic ops. GlobalISel, at the moment, doesn't handle
HasPostISelHook.

This change adds the selection for no-return ds_* atomic ops in tblgen
so that it can work with both GlobalISel and SelectionDAG. I couldn't
add the predicates for GlobalISel in this change since there's a
restriction in GlobalISelEmitter that disallows selecting generic
atomics ops that return with instructions that doesn't return.

We can't remove the HasPostISelHook code that selects the no return
atomic ops in SelectionDAG yet since we still need to cover selections
in FLATInstructions.td, BUFInstructions.td.

Differential Revision: https://reviews.llvm.org/D115881
2022-02-10 09:26:37 +05:30
Abinav Puthan Purayil d8b690409d [AMDGPU] Set MemoryVT for truncstores in tblgen.
GlobalISelEmitter was skipping these patterns when its predicates were
checked. This patch should allow us to select d16_hi stores in
GlobalISel.

Differential Revision: https://reviews.llvm.org/D117762
2022-01-20 19:05:12 +05:30
Matt Arsenault 9392b40d4b AMDGPU/GlobalISel: Fix selection of constant 32-bit addrspace loads
Unfortunately the selection patterns still rely on the address space
from the memory operand instead of using the pointer type. Add this
address space to the list of cases supported by global-like loads.

Alternatively we would have to adjust the address space of the memory
operand to deviate from the underlying IR value, which looks ugly and
is more work in the legalizer.

This doesn't come up in the DAG path because it uses a different
selection strategy where the cast is inserted during the addressing
mode matching.
2022-01-17 10:06:33 -05:00
Mateja Marjanovic ca57b80cd6 Code quality: Combine V_RSQ
Combine V_RCP and V_SQRT into V_RSQ on AMDGPU for GlobalISel.

Change-Id: I93c5dcb412483156a6e8b68c4085cbce83ac9703
2021-11-30 17:17:15 +01:00
Abinav Puthan Purayil 14c4051122 [AMDGPU][NFC] Remove unused defvar in AMDGPUInstructions.td. 2021-11-30 17:03:37 +05:30
Abinav Puthan Purayil 078da26b1c [AMDGPU] Check for unneeded shift mask in shift PatFrags.
The existing constrained shift PatFrags only dealt with masked shift
from OpenCL front-ends. This change copies the
X86DAGToDAGISel::isUnneededShiftMask() function to AMDGPU and uses it in
the shift PatFrag predicates.

Differential Revision: https://reviews.llvm.org/D113448
2021-11-24 10:53:12 +05:30
Abinav Puthan Purayil 61e3b9fefe [AMDGPU] Add constrained shift pattern matches.
The motivation for this is due to clang's conformance to
https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#operators-shift
which makes clang emit (<shift> a, (and b, <width> - 1)) for `a <shift> b`
in OpenCL where a is an int of bit width <width>.

Differential revision: https://reviews.llvm.org/D110231
2021-10-26 19:07:19 +05:30
Piotr Sobczak d869921004 [AMDGPU] Add patterns for i8/i16 local atomic load/store
Add patterns for i8/i16 local atomic load/store.

Added tests for new patterns.

Copied atomic_[store/load]_local.ll to GlobalISel directory.

Differential Revision: https://reviews.llvm.org/D111869
2021-10-18 11:23:10 +02:00
Jay Foad ce098ccc1c [AMDGPU] Simplify tablegen files. NFC.
There is no need to cast records to strings before comparing them.
2021-07-07 09:19:23 +01:00
Julien Pagès 46adccc5cc [AMDGPU] Improve Codegen for build_vector
Improve the code generation of build_vector.
Use the v_pack_b32_f16 instruction instead of
v_and_b32 + v_lshl_or_b32

Differential Revision: https://reviews.llvm.org/D98081

Patch by Julien Pagès!
2021-05-12 14:17:44 +01:00
Jay Foad d92b4956d6 [AMDGPU] Inline FSHRPattern into its only use. NFC. 2021-03-26 09:32:02 +00:00
Paul C. Anagnostopoulos 91d2e5c81a [TableGen] Add the !filter bang operator.
Add a test. Update the Programmer's Reference.

Use it in some TableGen files.

Differential Revision: https://reviews.llvm.org/D91008
2020-11-09 10:56:55 -05:00
Jay Foad 0d5989bb24 [AMDGPU] Split R600 and GCN bfe patterns
This is in preparation for making the GCN patterns divergence-aware.
NFC.

Differential Revision: https://reviews.llvm.org/D88579
2020-10-05 09:55:10 +01:00
Jay Foad 286d3fc750 [AMDGPU] Split R600 and GCN bfi patterns
This is in preparation for making the GCN patterns divergence-aware.
NFC.

Differential Revision: https://reviews.llvm.org/D88244
2020-09-28 10:16:51 +01:00
Stanislav Mekhanoshin 277de43d88 [AMDGPU] Unify intrinsic ret/nortn interface
We have a single noret intrinsic an a lot of special handling
around it. Declare it just as any other but do not define rtn
instructions itself instead.

Differential Revision: https://reviews.llvm.org/D87719
2020-09-15 15:26:42 -07:00
Mirko Brkusanin d17ea67b92 [AMDGPU][GlobalISel] Fix 96 and 128 local loads and stores
Fix local ds_read/write_b96/b128 so they can be selected if the alignment
allows. Otherwise, either pick appropriate ds_read2/write2 instructions or break
them down.

Differential Revision: https://reviews.llvm.org/D81638
2020-08-21 12:26:31 +02:00
Jay Foad ecac951be9 [AMDGPU] Fix and simplify AMDGPUTargetLowering::LowerUDIVREM
Use the algorithm from AMDGPUCodeGenPrepare::expandDivRem32.

Differential Revision: https://reviews.llvm.org/D83382
2020-07-08 19:14:49 +01:00
Matt Arsenault 9e03bdebc1 AMDGPU: Add llvm.amdgcn.sqrt intrinsic
I spread the GlobalISel test into the regular one, which I've been
avoiding so far.
2020-06-26 15:07:07 -04:00
Matt Arsenault 89c8c80bd5 AMDGPU: Change pre-gfx9 implementation of fcanonicalize to mul
If f32 denormals were enabled pre-gfx9, we would still try to
implement this with v_max_f32. Pre-gfx9, these instructions ignored
the denormal mode and did not flush. Switch to the multiply form for
f32 as a workaround which should always work in any case.

This fixes conformance failures when the library implementation of
fmin/fmax were accidentally not inlined, forcing the assumption of no
flushing on targets where denormals are not enabled by default. This
is a workaround, since really we should not be mixing code with
different FP mode expectations, but prefer the lowering that will work
in any mode.

Now this will always use max to implement canonicalize on gfx9+. This
is only really beneficial for f64. For f32/f16 it's a neutral choice
(and worse in terms of code size in 1 case), but possibly worse for
the compiler since it does add an extra register use operand. Leave
this change for later.
2020-04-23 15:24:13 -04:00
Matt Arsenault 5660bb6bc9 AMDGPU: Remove denormal subtarget features
Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.
2020-04-02 17:17:12 -04:00
Simon Pilgrim e91feeed21 [AMDGPU] Add ISD::FSHR -> ALIGNBIT support
This patch allows ISD::FSHR(i32) patterns to lower to ALIGNBIT instructions.

This improves test coverage of ISD::FSHR matching - x86 has both FSHL/FSHR instructions and we prefer FSHL by default.

Differential Revision: https://reviews.llvm.org/D76070
2020-03-12 20:16:57 +00:00
Matt Arsenault 1024b73ef5 AMDGPU: Split denormal mode tracking bits
Prepare to accurately track the future denormal-fp-math attribute
changes. The way to actually set these separately is not wired in yet.

This is just a mechanical change, and mostly still assumes the input
and output mode match. This should be refined for some cases. For
example, fcanonicalize lowering should use the flushing variant if
either input or output flushing is enabled
2020-02-04 10:44:21 -08:00
Matt Arsenault 84e035d8f1 AMDGPU: Don't check constant address space for atomic stores
We define a separate list for storable address spaces. This saves
entry in the matcher table address space list.
2020-01-24 12:15:09 -08:00
Matt Arsenault 9ffd0ed838 AMDGPU/GlobalISel: Fix import of integer med3
This isn't too useful now, since nothing is currently trying to form
min/max from cmp+select.
2020-01-09 10:29:32 -05:00
Matt Arsenault c66b2e1c87 AMDGPU: Eliminate more legacy codepred address space PatFrags
These should now be limited to R600 code.
2020-01-09 10:29:32 -05:00
Matt Arsenault 3766f4bacc AMDGPU: Use new PatFrag system for d16 stores 2020-01-09 10:29:32 -05:00
Matt Arsenault 22700f68e1 AMDGPU: Annotate EXTRACT_SUBREGs with source register classes
This partially fixes GlobalISel import of the patterns, but removes a
lot of entriess from the end of the skipped pattern log.
2020-01-07 21:56:16 -05:00
Matt Arsenault bd8d696c14 AMDGPU: Use ImmLeaf 2020-01-07 15:10:07 -05:00
Matt Arsenault e4464bf3d4 AMDGPU/GlobalISel: Select scalar v2s16 G_BUILD_VECTOR 2020-01-06 11:19:33 -05:00
Matt Arsenault db0ed3e429 AMDGPU: Refactor treatment of denormal mode
Start moving towards treating this as a property of the calling
convention, and not the subtarget. The default denormal mode should
not be part of the subtarget, and be moved into a separate function
attribute.

This patch is still NFC. The denormal mode remains as a subtarget
feature for now, but make the necessary changes to switch to using an
attribute.
2019-11-19 19:55:43 +05:30
Stanislav Mekhanoshin 4312c4afd4 [AMDGPU] deduplicate tablegen predicates
We are duplicating predicates if several parts of the combined
predicate list contain the same condition. Added code to deduplicate
the list.

We have AssemblerPredicates and AssemblerPredicate in the
PredicateControl, but we never use AssemblerPredicates with an
actual list, so this one is dropped.

This addresses the first part of the llvm bug 43886:
https://bugs.llvm.org/show_bug.cgi?id=43886

Differential Revision: https://reviews.llvm.org/D69815
2019-11-04 12:19:17 -08:00
Matt Arsenault 171cf5302f AMDGPU/GlobalISel: Handle flat/global G_ATOMIC_CMPXCHG
Custom lower this to a target instruction with the merge operands. I
think it might be better to directly select this and emit a
REG_SEQUENCE, but this would be more work since it would require
splitting the tablegen patterns for these cases from the other
atomics.
2019-10-25 13:11:09 -07:00
Matt Arsenault ae87b9f2c2 AMDGPU/GlobalISel: Select local atomic cmpxchg
llvm-svn: 367511
2019-08-01 03:41:41 +00:00
Matt Arsenault e6ce48422c AMDGPU: Start redefining atomic PatFrags
Start migrating to a form that will be compatible with the global isel
emitter. Also should fix some overly lax checks on the memory type,
which allowed mis-selecting some illegal atomics.

llvm-svn: 367506
2019-08-01 03:25:52 +00:00
Matt Arsenault 3baf4d3418 AMDGPU/GlobalISel: Select simple local stores
llvm-svn: 367504
2019-08-01 03:09:15 +00:00
Matt Arsenault 3594011de0 AMDGPU/GlobalISel: Select local loads
llvm-svn: 367498
2019-08-01 00:53:38 +00:00
Matt Arsenault 52c262484f TableGen: Add MinAlignment predicate
AMDGPU uses some custom code predicates for testing alignments.

I'm still having trouble comprehending the behavior of predicate bits
in the PatFrag hierarchy. Any attempt to abstract these properties
unexpectdly fails to apply them.

llvm-svn: 367373
2019-07-31 00:14:43 +00:00
Matt Arsenault 57ef94fb06 AMDGPU: Avoid emitting "true" predicates
Empty condition strings are considerde always true. This removes a lot
of clutter from the generated matcher tables.

This shrinks the source size of AMDGPUGenDAGISel.inc from 7.3M to
6.1M.

llvm-svn: 367326
2019-07-30 15:56:43 +00:00
Matt Arsenault e3401a9b86 AMDGPU: Redefine setcc condition PatLeafs
Avoid using custom code predicates.

llvm-svn: 366609
2019-07-19 20:24:40 +00:00
Matt Arsenault 8f8d07e93b AMDGPU: Replace store PatFrags
Convert the easy cases to formats understood for GlobalISel.

llvm-svn: 366240
2019-07-16 18:21:25 +00:00
Matt Arsenault c6fd5abecc AMDGPU: Redefine load PatFrags
Rewrite PatFrags using the new PatFrag address space matching in
tablegen. These will now work with both SelectionDAG and GlobalISel.

llvm-svn: 366234
2019-07-16 17:38:50 +00:00