Update intrinsics to use n x f16 and n x i16 instead
of 32-bit types. This may avoid the need for a bitcast
and is probably less confusing.
Depends on making v16f16 and v16i16 types legal.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D128951
Includes dpp versions of VOP3P instructions.
Patch 18/N for upstreaming of AMDGPU gfx11 architecture
Depends on D126917
Reviewed By: rampitec, #amdgpu
Differential Revision: https://reviews.llvm.org/D126978
Since there is a table introduced for MAI instructions extend it
to use for DGEMM classification.
Differential Revision: https://reviews.llvm.org/D122337
We can select _vgprcd versions of MAI instructions and have no
AGPRs with the whole budget left for VGPRs if:
1. This is a kernel;
2. It has no calls;
3. It runs at least on 2 waves thus having not more that 256 VGPRs.
4. There is no inline asm requesting AGPRs.
Differential Revision: https://reviews.llvm.org/D117253
Form the MAI spec: It’s ok that Src_C and vDst are the exact same VGPRs
or Src_C and vDst are completely separated. The case that Src_C and vDst
are overlapping should be avoid as new value could be written to accumulator
input before it gets read.
Note that this inevitably increases register pressure to the point where
some programs will become uncompilable.
This patch separates MAC and FMA versions of MFMA instructions using either
tied dst and src2 or earlyclobber dst.
Fixes: SWDEV-318900
Differential Revision: https://reviews.llvm.org/D117844
By convention, VOP1/2/C instructions which can be promoted to VOP3 have _e32 suffix while promoted instructions have _e64 suffix. Instructions which have a single variant should have no _e32/_e64 suffix. Unfortunately there was no simple way to identify single variant instructions - it was implemented by a hack. See bug https://bugs.llvm.org/show_bug.cgi?id=39086.
This fix simplifies handling of single VOP instructions by adding a dedicated flag.
Differential Revision: https://reviews.llvm.org/D99408
Split out some of the instructions predicated on the dot2-insts target
feature into a new dot7-insts, in preparation for subtargets that have
some but not all of these instructions. NFCI.
Differential Revision: https://reviews.llvm.org/D98717
These two instructions are VOP3P and have op_sel_hi bits,
however do not use op_sel_hi. That is recommended to set
unused op_sel_hi bits to 1. However, we cannot decode
both representations with 1 and 0 if bits are set to
default value 1. If bits are set to be ignored with '?'
initializer then encoding defaults them to 0.
The patch is a hack to force ignored '?' bits to 1 on
encoding for these instructions.
There is still canonicalization happens on disasm print
if incoming values are non-default, so that disasm output
does not match binary input, but this is pre-existing
problem for all instructions with '?' bits.
Fixes: SWDEV-272540
Differential Revision: https://reviews.llvm.org/D96543
HasModifiers should be true if at least one modifier is used.
This should make the use of this field bit more consistent.
Differential Revision: https://reviews.llvm.org/D94795
Previously, instructions which could be
expressed as VOP3 in addition to another
encoding had a _e64 suffix on the tablegen
record name, while those
only available as VOP3 did not. With this
patch, all VOP3s will have the _e64 suffix.
The assembly does not change, only the mir.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D94341
Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423
These parameters set a default value of 0, so I believe they
should include a 0 suffix. This allows for versions which do not
set a default value in future.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D93187
These are opsel opcodes with op_sel actually being ignored.
As a such op_sel_hi needs to be set to default 1 even though
these bits are ignored. This is compatibility change.
Differential Revision: https://reviews.llvm.org/D91202
This is the groundwork required to implement strictfp. For now, this
should be NFC for regular instructoins (many instructions just gain an
extra use of a reserved register). Regalloc won't rematerialize
instructions with reads of physical registers, but we were suffering
from that anyway with the exec reads.
Should add it for all the related FP uses (possibly with some
extras). I did not add it to either the gpr index mode instructions
(or every single VALU instruction) since it's a ridiculous feature
already modeled as an arbitrary side effect.
Also work towards marking instructions with FP exceptions. This
doesn't actually set the bit yet since this would start to change
codegen. It seems nofpexcept is currently not implied from the regular
IR FP operations. Add it to some MIR tests where I think it might
matter.
I tried to use some of the new tablegen features to avoid creating
different operand list permutations, but I still don't see a way to
programmatically build a source pattern dag.
Also add GlobalISel tests, which now all import successfully.
Some of the fneg fold tests are incorrect, which need to be fixed in a
future commit
We don't use this, and matching from the def doesn't make much sense.
There are multiple tablegen bugs with default operand
handling. undef_tied_input should work to handle the vdst_in
correctly, but this breaks the operand register class constraint which
it should be able to infer.
We are duplicating predicates if several parts of the combined
predicate list contain the same condition. Added code to deduplicate
the list.
We have AssemblerPredicates and AssemblerPredicate in the
PredicateControl, but we never use AssemblerPredicates with an
actual list, so this one is dropped.
This addresses the first part of the llvm bug 43886:
https://bugs.llvm.org/show_bug.cgi?id=43886
Differential Revision: https://reviews.llvm.org/D69815