Commit Graph

82 Commits

Author SHA1 Message Date
Dmitry Preobrazhensky bb901dcc5a [AMDGPU][MC][GFX940] Correct disassembly of MFMA opcodes
Add a decoder table for GFX940 MFMA opcodes.

Differential Revision: https://reviews.llvm.org/D130759
2022-08-01 16:00:47 +03:00
Dmitry Preobrazhensky fa7fd8ec31 [AMDGPU][MC][GFX11] Disable SGPRs for src1 of v_fma_mix*_dpp opcodes
Differential Revision: https://reviews.llvm.org/D130634
2022-07-28 14:20:05 +03:00
Stanislav Mekhanoshin 523a99c0eb [AMDGPU] Support for gfx940 fp8 smfmac
Differential Revision: https://reviews.llvm.org/D129908
2022-07-18 12:12:41 -07:00
Stanislav Mekhanoshin 2695f0a688 [AMDGPU] Support for gfx940 fp8 mfma
Differential Revision: https://reviews.llvm.org/D129906
2022-07-18 11:49:56 -07:00
Piotr Sobczak b6ef36a1c4 [AMDGPU] Update WMMA intrinsics with explicit f16 types
Update intrinsics to use n x f16 and n x i16 instead
of 32-bit types. This may avoid the need for a bitcast
and is probably less confusing.

Depends on making v16f16 and v16i16 types legal.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D128951
2022-07-01 08:55:25 +02:00
Piotr Sobczak 4874838a63 [AMDGPU] gfx11 WMMA instruction support
gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate)
instructions.

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D128756
2022-06-30 11:13:45 -04:00
Matt Arsenault ffd6aaf5b6 AMDGPU: Make packed 32-bit instructions rematerializable 2022-06-29 11:57:54 -04:00
Matt Arsenault 4c400dc103 AMDGPU: Make 16-bit pk instructions rematerializable 2022-06-29 11:57:53 -04:00
Joe Nash 2d43de13df [AMDGPU] gfx11 new dot instruction codegen support
Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D127904
2022-06-16 14:19:34 -04:00
Joe Nash 40f35cef89 [AMDGPU] gfx11 VOP3P instruction MC support
Includes dpp versions of VOP3P instructions.

Patch 18/N for upstreaming of AMDGPU gfx11 architecture

Depends on D126917

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D126978
2022-06-08 13:32:01 -04:00
Dmitry Preobrazhensky 44673278e0 [AMDGPU][MC][GFX940] Add SMFMAC aliases
Differential Revision: https://reviews.llvm.org/D125888
2022-05-19 13:40:48 +03:00
Dmitry Preobrazhensky 32ca9bd7b5 [AMDGPU][MC][GFX940] Correct tied operand decoding for smfmac opcodes
Differential Revision: https://reviews.llvm.org/D125790
2022-05-18 15:39:30 +03:00
Stanislav Mekhanoshin 64838ba365 [AMDGPU] Use GenericTable to classify DGEMM
Since there is a table introduced for MAI instructions extend it
to use for DGEMM classification.

Differential Revision: https://reviews.llvm.org/D122337
2022-03-24 13:00:37 -07:00
Stanislav Mekhanoshin cad9de71d7 [AMDGPU] gfx940 MAI hazard recognizer
Differential Revision: https://reviews.llvm.org/D122263
2022-03-24 12:59:52 -07:00
Stanislav Mekhanoshin 6e3e14f600 [AMDGPU] Support gfx940 smfmac instructions
Differential Revision: https://reviews.llvm.org/D122191
2022-03-24 12:40:42 -07:00
Stanislav Mekhanoshin 27439a7642 [AMDGPU] New gfx940 mfma instructions
Differential Revision: https://reviews.llvm.org/D122044
2022-03-24 12:12:52 -07:00
Stanislav Mekhanoshin 4570527e72 [AMDGPU] Disable some MFMA instructions on gfx940
Differential Revision: https://reviews.llvm.org/D121956
2022-03-18 13:19:12 -07:00
Stanislav Mekhanoshin d9ac55fab2 [AMDGPU] New MFMA names for existing instructions
Old names are supported as aliases.
_1k MFMA got new opcodes.

Differential Revision: https://reviews.llvm.org/D121741
2022-03-17 13:05:36 -07:00
Stanislav Mekhanoshin 522b259976 [AMDGPU] Allow v_accvgpr_write to use SGPR src on gfx940
Differential Revision: https://reviews.llvm.org/D121843
2022-03-17 12:12:06 -07:00
Stanislav Mekhanoshin c4500de255 [AMDGPU] gfx940: disable OP_SEL on V_DOT instructions
Differential Revision: https://reviews.llvm.org/D121634
2022-03-14 17:02:00 -07:00
Sebastian Neubauer 6527b2a4d5 [AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.

Differential Revision: https://reviews.llvm.org/D119235
2022-02-18 15:05:21 +01:00
Stanislav Mekhanoshin aeaf85b9c2 [AMDGPU] Select VGPR versions of MFMA if possible
We can select _vgprcd versions of MAI instructions and have no
AGPRs with the whole budget left for VGPRs if:

1. This is a kernel;
2. It has no calls;
3. It runs at least on 2 waves thus having not more that 256 VGPRs.
4. There is no inline asm requesting AGPRs.

Differential Revision: https://reviews.llvm.org/D117253
2022-02-08 10:19:41 -08:00
Stanislav Mekhanoshin dbf278b984 [AMDGPU] Prevent aliasing of SrcC and Dst in MAI
Form the MAI spec: It’s ok that Src_C and vDst are the exact same VGPRs
or Src_C and vDst are completely separated. The case that Src_C and vDst
are overlapping should be avoid as new value could be written to accumulator
input before it gets read.

Note that this inevitably increases register pressure to the point where
some programs will become uncompilable.

This patch separates MAC and FMA versions of MFMA instructions using either
tied dst and src2 or earlyclobber dst.

Fixes: SWDEV-318900

Differential Revision: https://reviews.llvm.org/D117844
2022-01-26 14:48:20 -08:00
Abinav Puthan Purayil 61e3b9fefe [AMDGPU] Add constrained shift pattern matches.
The motivation for this is due to clang's conformance to
https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#operators-shift
which makes clang emit (<shift> a, (and b, <width> - 1)) for `a <shift> b`
in OpenCL where a is an int of bit width <width>.

Differential revision: https://reviews.llvm.org/D110231
2021-10-26 19:07:19 +05:30
Jay Foad 128a49727a [AMDGPU] Fix upcoming TableGen warnings on unused template arguments. NFC.
The warning is implemented by D109359 which is still in review.

Differential Revision: https://reviews.llvm.org/D109826
2021-09-16 09:07:18 +01:00
Jay Foad 66234ce49f [AMDGPU] Set VOP3P flag on Real instructions
This does not affect codegen but might benefit llvm-mca.
2021-06-16 15:00:45 +01:00
Joe Nash 7cc4a02fa2 [AMDGPU] Refactor VOP3P Profile and AsmParser, NFC
Refactors VOP3P tablegen and the AsmParser for VOP3P
for better extensibility. NFC intended

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100602

Change-Id: I038e3a772ac348bb18979cdf3e3ae2e9476dd411
2021-04-16 13:06:50 -04:00
Dmitry Preobrazhensky 0f5ebbcc7f [AMDGPU][MC] Added flag to identify VOP instructions which have a single variant
By convention, VOP1/2/C instructions which can be promoted to VOP3 have _e32 suffix while promoted instructions have _e64 suffix. Instructions which have a single variant should have no _e32/_e64 suffix. Unfortunately there was no simple way to identify single variant instructions - it was implemented by a hack. See bug https://bugs.llvm.org/show_bug.cgi?id=39086.

This fix simplifies handling of single VOP instructions by adding a dedicated flag.

Differential Revision: https://reviews.llvm.org/D99408
2021-04-01 13:53:12 +03:00
Jay Foad 967b64beb4 [AMDGPU] Split dot2-insts feature
Split out some of the instructions predicated on the dot2-insts target
feature into a new dot7-insts, in preparation for subtargets that have
some but not all of these instructions. NFCI.

Differential Revision: https://reviews.llvm.org/D98717
2021-03-17 09:42:21 +00:00
Stanislav Mekhanoshin a8d9d50762 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
Stanislav Mekhanoshin c0d7a8bc62 [AMDGPU] Allow accvgpr_read/write decode with opsel
These two instructions are VOP3P and have op_sel_hi bits,
however do not use op_sel_hi. That is recommended to set
unused op_sel_hi bits to 1. However, we cannot decode
both representations with 1 and 0 if bits are set to
default value 1. If bits are set to be ignored with '?'
initializer then encoding defaults them to 0.

The patch is a hack to force ignored '?' bits to 1 on
encoding for these instructions.

There is still canonicalization happens on disasm print
if incoming values are non-default, so that disasm output
does not match binary input, but this is pre-existing
problem for all instructions with '?' bits.

Fixes: SWDEV-272540

Differential Revision: https://reviews.llvm.org/D96543
2021-02-12 10:04:47 -08:00
Mirko Brkusanin 608ac62540 [AMDGPU] Fix use of HasModifiers in VopProfile
HasModifiers should be true if at least one modifier is used.
This should make the use of this field bit more consistent.

Differential Revision: https://reviews.llvm.org/D94795
2021-01-26 15:21:11 +01:00
Joe Nash 314e29ed2b [AMDGPU] Add _e64 suffix to VOP3 Insts
Previously, instructions which could be
expressed as VOP3 in addition to another
encoding had a _e64 suffix on the tablegen
record name, while those
only available as VOP3 did not. With this
patch, all VOP3s will have the _e64 suffix.
The assembly does not change, only  the mir.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D94341

Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423
2021-01-12 18:33:18 -05:00
Carl Ritson 62c246eda2 [AMDGPU][NFC] Rename opsel/opsel_hi/neg_lo/neg_hi with suffix 0
These parameters set a default value of 0, so I believe they
should include a 0 suffix. This allows for versions which do not
set a default value in future.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D93187
2020-12-14 20:01:56 +09:00
Stanislav Mekhanoshin 544ef42e40 [AMDGPU] Set default op_sel_hi on accvgpr read/write
These are opsel opcodes with op_sel actually being ignored.
As a such op_sel_hi needs to be set to default 1 even though
these bits are ignored. This is compatibility change.

Differential Revision: https://reviews.llvm.org/D91202
2020-11-10 13:07:29 -08:00
Jay Foad f6a5699c6c [AMDGPU][TableGen] Make more use of !ne !not !and !or. NFC. 2020-10-21 09:56:43 +01:00
Stanislav Mekhanoshin 722d792499 [AMDGPU] Reorganize VOP3P encoding
This changes width of encoding and opcode fields to match the
documentation.

Differential Revision: https://reviews.llvm.org/D88619
2020-09-30 15:27:06 -07:00
Matt Arsenault a4edc04693 AMDGPU/GlobalISel: Use clamp modifier for [us]addsat/[us]subsat
We also have never handled this for SelectionDAG, which needs
additional work.
2020-07-28 11:18:05 -04:00
Matt Arsenault d2e74fad20 AMDGPU: Set more mov flags on V_ACCVGPR_{READ|WRITE}_B32
This fixes extra copies when materializing constants in AGPRs. This
made it a lot harder to trigger the spilling in spill-agpr.ll
2020-07-01 18:58:59 -04:00
Matt Arsenault 651c36b508 AMDGPU: Select strict_fmul 2020-06-04 17:49:00 -04:00
Matt Arsenault 483d4daa5e AMDGPU: Select strict_fma
Like with strict_fadd, the legalization is scalarizing the v4f16 when
it should split.
2020-06-04 17:49:00 -04:00
Matt Arsenault ae26c064ce AMDGPU: Select strict_fadd 2020-06-04 17:49:00 -04:00
Matt Arsenault 4b4496312e AMDGPU: Start adding MODE register uses to instructions
This is the groundwork required to implement strictfp. For now, this
should be NFC for regular instructoins (many instructions just gain an
extra use of a reserved register). Regalloc won't rematerialize
instructions with reads of physical registers, but we were suffering
from that anyway with the exec reads.

Should add it for all the related FP uses (possibly with some
extras). I did not add it to either the gpr index mode instructions
(or every single VALU instruction) since it's a ridiculous feature
already modeled as an arbitrary side effect.

Also work towards marking instructions with FP exceptions. This
doesn't actually set the bit yet since this would start to change
codegen. It seems nofpexcept is currently not implied from the regular
IR FP operations. Add it to some MIR tests where I think it might
matter.
2020-05-27 14:47:00 -04:00
Fangrui Song 2cb48d620f [TableGen] Drop deprecated leading # operation (NOP) and replace ## with # 2020-04-25 16:26:45 -07:00
Kazuaki Ishizaki 0312b9f550 [llvm] NFC: Fix trivial typo in rst and td files
Differential Revision: https://reviews.llvm.org/D77469
2020-04-23 14:26:32 +09:00
Matt Arsenault db06870dbd AMDGPU: Move dot intrinsic patterns to instruction def
I tried to use some of the new tablegen features to avoid creating
different operand list permutations, but I still don't see a way to
programmatically build a source pattern dag.

Also add GlobalISel tests, which now all import successfully.

Some of the fneg fold tests are incorrect, which need to be fixed in a
future commit
2020-02-21 13:35:40 -05:00
Matt Arsenault 4c1c9422a3 AMDGPU/GlobalISel: Select llvm.amdgcn.fdot2
I'm slighly worried about the generated checks, since they won't catch
incorrect modifiers being added at the end of the line.
2020-02-21 13:35:40 -05:00
Matt Arsenault 60023e3471 AMDGPU: Use default operand for VOP3P clamp
We don't use this, and matching from the def doesn't make much sense.

There are multiple tablegen bugs with default operand
handling. undef_tied_input should work to handle the vdst_in
correctly, but this breaks the operand register class constraint which
it should be able to infer.
2020-02-21 12:14:18 -05:00
Matt Arsenault 7f3280ecdd AMDGPU/GlobalISel: Select permlane16/permlanex16 2020-01-29 17:55:31 -05:00
Stanislav Mekhanoshin 4312c4afd4 [AMDGPU] deduplicate tablegen predicates
We are duplicating predicates if several parts of the combined
predicate list contain the same condition. Added code to deduplicate
the list.

We have AssemblerPredicates and AssemblerPredicate in the
PredicateControl, but we never use AssemblerPredicates with an
actual list, so this one is dropped.

This addresses the first part of the llvm bug 43886:
https://bugs.llvm.org/show_bug.cgi?id=43886

Differential Revision: https://reviews.llvm.org/D69815
2019-11-04 12:19:17 -08:00