Suffix opcodes with _gfx10.
Remove direct references to architecture specific opcodes.
Add a BVH flag and apply this to diassembly.
Fix a number of disassembly errors on gfx90a target caused by
previous incorrect BVH detection code.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D108117
Allow MIMG instructions to be selected with 6/7 VGPRs for vaddr.
Previously these were rounded up to VReg_256 this saves VGPRs.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D103800
Avoid having to round up to v8f32/VReg_256 when only 5 VGPRs are
required for a MIMG address operand.
Maintain _V8 instruction variants of pseudo instructions allowing
assembly prior to GFX10 to work as-is. Currently the validator
can tell for GFX10 what the correct size is, so will disallow
oversize address registers.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D103672
This looks like a mistake when the tests were committed in r363946.
There were two sets of tests for the f32 variant of these instructions,
instead of one set for f16 and one set for f32.
Differential Revision: https://reviews.llvm.org/D103699
A16 support for image instructions assembly/disassembly (gfx10) was missing
Also refactor MIMG op addr size calcs to common function
We'd got 3 places where the same operation was being done.
One test is now marked XFAIL until a related codegen patch is in place
Differential Revision: https://reviews.llvm.org/D102231
Change-Id: I7e86e730ef8c71901457855cba570581f4f576bb
Update the list of s_sendmsg messages known to the assembler and
disassembler and validate the ones that were added or removed in gfx9
and gfx10.
Differential Revision: https://reviews.llvm.org/D97295
Enabled "bound_ctrl:1" and disabled "bound_ctrl:-1" syntax.
Corrected printer to output "bound_ctrl:1" instead of "bound_ctrl:0".
See bug 35397 for detailed issue description.
Differential Revision: https://reviews.llvm.org/D97048
These two instructions are VOP3P and have op_sel_hi bits,
however do not use op_sel_hi. That is recommended to set
unused op_sel_hi bits to 1. However, we cannot decode
both representations with 1 and 0 if bits are set to
default value 1. If bits are set to be ignored with '?'
initializer then encoding defaults them to 0.
The patch is a hack to force ignored '?' bits to 1 on
encoding for these instructions.
There is still canonicalization happens on disasm print
if incoming values are non-default, so that disasm output
does not match binary input, but this is pre-existing
problem for all instructions with '?' bits.
Fixes: SWDEV-272540
Differential Revision: https://reviews.llvm.org/D96543
Add mimgopc object to represent the opcode allowing different
opcodes for different hardware variants.
This enables image_atomic_fcmpswap, image_atomic_fmin, and
image_atomic_fmax on GFX10
Reviewed By: foad, rampitec
Differential Revision: https://reviews.llvm.org/D96309
With tfe on there can be a vgpr write to vdata+1.
Add tablegen support for 5 register vdata store.
This is required for 4 register vdata store with tfe.
Differential Revision: https://reviews.llvm.org/D94960
I think the global_load/store_dword_addtid instructions support
switching off the scalar address.
Add assembler and disassembler support for this.
Differential Revision: https://reviews.llvm.org/D93288
These are opsel opcodes with op_sel actually being ignored.
As a such op_sel_hi needs to be set to default 1 even though
these bits are ignored. This is compatibility change.
Differential Revision: https://reviews.llvm.org/D91202
By setting up the AsmStrings correctly we can remove some special cases
from AMDGPUInstPrinter::printOffset.
Differential Revision: https://reviews.llvm.org/D90307
GFX10 enables third addressing mode for flat scratch instructions,
an ST mode. In that mode both register operands are omitted and
only swizzled offset is used in addition to flat_scratch base.
Differential Revision: https://reviews.llvm.org/D89501
This instruction was introduced in GFX10.3, reusing the opcode of
v_mac_legacy_f32 from GFX10.1.
Differential Revision: https://reviews.llvm.org/D89247
The VGPR component is a 32-bit offset, not 64-bits.
I'm not sure what the correct syntax is for this. This maintains the
vaddr position and leaves saddr in the end "off" position. This is
particularly terrible for stores, since the operand order is now <vgpr
offset>, <data>, <sgpr base>, splitting the pointer operands. I
suppose this is a logical consequence from the mistake of not putting
the data operand first. I'm not sure what sp3 does.
Currently supported LLVM MTBUF syntax is shown below. It is not compatible with SP3.
op dst, addr, rsrc, FORMAT, soffset
This change adds support for SP3 syntax:
op dst, addr, rsrc, soffset SP3FORMAT
In addition to being compatible with SP3, this syntax allows using symbolic names for data, numeric and unified formats. Below is a list of added syntax variants.
format:<expression>
format:[<numeric-format-name>,<data-format-name>]
format:[<data-format-name>,<numeric-format-name>]
format:[<data-format-name>]
format:[<numeric-format-name>]
format:[<unified-format-name>]
The last syntax variant is supported for GFX10 only.
See llvm bug 37738
Reviewers: arsenm, rampitec, vpykhtin
Differential Revision: https://reviews.llvm.org/D84026
16-bit literals are encoded as 32-bit values. If high 16-bits of the value is 0xFFFF, the decoded instruction cannot be reassembled.
For example, the following code
0xff,0x04,0x04,0x52,0xcd,0xab,0xff,0xff
was decoded as
v_mul_lo_u16_e32 v2, 0xffffabcd, v2
However this literal is actually a 64-bit constant 0x00000000ffffabcd which violates requirements described in the documentation - the truncation is not safe.
This change corrects decoding to make reassembly possible.
Reviewers: arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D84098
MTBUF implementation has many issues and this change addresses most of these:
- refactored duplicated code;
- hardcoded constants moved out of high-level code;
- fixed a decoding error when nfmt or dfmt are zero (bug 36932);
- corrected parsing of operand separators (bug 46403);
- corrected handling of missing operands (bug 46404);
- corrected handling of out-of-range modifiers (bug 46421);
- corrected default value (bug 46467).
Reviewers: arsenm, rampitec, vpykhtin, artem.tamazov, kzhuravl
Differential Revision: https://reviews.llvm.org/D83760
It seems to be a hardware defect that the half inline constants do not
work as expected for the 16-bit integer operations (the inverse does
work correctly). Experimentation seems to show these are really
reading the 32-bit inline constants, which can be observed by writing
inline asm using op_sel to see what's in the high half of the
constant. Theoretically we could fold the high halves of the 32-bit
constants using op_sel.
The *_asm_all.s MC tests are broken, and I don't know where the script
to autogenerate these are. I started manually fixing it, but there's
just too many cases to fix. This also does break the
assembler/disassembler support for these values, and I'm not sure what
to do about it. These are still valid encodings, so it seems like you
should be able to use them in some way. If you wrote assembly using
them, you could have really meant it (perhaps to read the high bits
with op_sel?). The disassembler will print the invalid literal
constant which will fail to re-assemble. The behavior is also
different depending on the use context. Consider this example, which
was previously accepted and encoded using the inline constant:
v_mad_i16 v5, v1, -4.0, v3
; encoding: [0x05,0x00,0xec,0xd1,0x01,0xef,0x0d,0x04]
In contexts where an inline immediate is required (such as on gfx8/9),
this will now be rejected. For gfx10, this will produce the literal
encoding and change the printed format:
v_mad_i16 v5, v1, 0xc400, v3
; encoding: [0x05,0x00,0x5e,0xd7,0x01,0xff,0x0d,0x04,0x00,0xc4,0x00,0x00]
This is just another variation of the issue that we don't perfectly
handle round trip assembly/disassembly due to not tracking how
immediates were encoded. This doesn't matter much in practice, since
compilers don't emit the suboptimal encoding. I doubt any users are
relying on this behavior (although I did make use of the old behavior
to figure out what was wrong).
Fixes bug 46302.