and also update the function indirectCopyToAGPR() so that it is called only on the GFX908 sub-target.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D122286
First, add code to reserve all required special purpose registers,
followed by code to reserve SGPRs, followed by code to reserve
VGPRs/AGPRs.
This patch is a prerequisite for fixing an issue related to
GFX90A hardware.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D122219
In frame index lowering we have to insert shift and add
instructions to adjust stack object accesses. We need to take into account
what kind of instruction uses the stack object, and use scalar shift/add for scalar users.
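Below is a minimal Python sketch of that choice. It assumes, as in the AMDGPU scratch model, that the frame register holds a wave-scaled offset. That offset is shifted right by log2 of the wavefront size before the object offset is added. The helper name and the mnemonic strings are illustrative. This is not LLVM's eliminateFrameIndex code.

```python
def lower_frame_index(frame_reg_value, object_offset, wave_size, user_is_scalar):
    """Hypothetical helper: return (instruction mnemonics, computed address).

    The frame register is assumed to hold a per-wave (wave-scaled) offset, so
    it is shifted right by log2(wave_size) to get the per-lane stack offset.
    """
    shift = wave_size.bit_length() - 1  # log2 for a power-of-two wave size
    if user_is_scalar:
        # Scalar user: stay on the SALU so the result lands in an SGPR.
        insts = ["s_lshr_b32", "s_add_i32"]
    else:
        # Vector user: VALU shift/add produce the value in a VGPR.
        insts = ["v_lshrrev_b32", "v_add_u32"]
    address = (frame_reg_value >> shift) + object_offset
    return insts, address
```

For example, lower_frame_index(0x1000, 16, 64, True) gives (["s_lshr_b32", "s_add_i32"], 80). The arithmetic is the same for both kinds of user. Only the instruction family changes.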
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D121524
A BUILD_VECTOR of i16 and undef gets expanded to a COPY_TO_REGCLASS,
which is further lowered to copy instructions.
We need to provide the correct register class for uniform and divergent BUILD_VECTOR nodes
to avoid VGPR to SGPR copies.
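Below is a minimal sketch of that register class choice, under two assumptions. A uniform result can use the 32-bit SGPR class SReg_32, and a divergent one must use VGPR_32 (a v2i16 value fits in 32 bits). The helper names are hypothetical. This is not the actual ISel code.

```python
def copy_regclass_for_build_vector(is_divergent: bool) -> str:
    # A divergent value may differ per lane, so it needs a VGPR; a uniform
    # value can stay in an SGPR and avoid a VGPR-to-SGPR copy later.
    return "VGPR_32" if is_divergent else "SReg_32"


def expand_build_vector_i16_undef(src: str, is_divergent: bool) -> str:
    # BUILD_VECTOR (i16 src, undef) -> COPY_TO_REGCLASS src, <class>
    return f"COPY_TO_REGCLASS {src}, {copy_regclass_for_build_vector(is_divergent)}"
```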
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D122068
Simplify some for loops. Don't bother checking the src2 operand for
writelane because it doesn't have one. Check all VALU instructions,
not just VOP1/2/3/C/SDWA.
On GFX10.3 targets, the following instruction sequence
v_cmp_* SGPR, ...
s_and_saveexec ..., SGPR
leads to a fairly long stall caused by a VALU write to an SGPR, with the
following SALU instruction having to wait for that SGPR.
An equivalent sequence is to save the exec mask manually instead of letting
s_and_saveexec do the work, and to use a v_cmpx instruction to do the
comparison.
This patch modifies the SIOptimizeExecMasking pass because it is the last point
at which s_and_saveexec instructions are inserted. It performs the transformation
by finding the pattern, extracting the operands and generating the new
instruction sequence.
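Below is a simplified Python sketch of the rewrite on a toy instruction list. It makes two assumptions. First, the v_cmp and the s_and_saveexec are adjacent. Second, the SGPR written by v_cmp has no other uses, since dropping that write would otherwise be wrong. The real pass works on MachineInstrs, and its exact legality checks are not reproduced here.

The rewrite relies on how v_cmpx behaves on GFX10.3. It writes only EXEC, and inactive lanes produce 0 in the compare mask. EXEC therefore ends up as the old EXEC ANDed with the comparison, which is what s_and_saveexec computed.

```python
from typing import Dict, List, Optional, Tuple

# (opcode, destination or None, source operands)
Inst = Tuple[str, Optional[str], Tuple[str, ...]]


def rewrite_saveexec(insts: List[Inst], sgpr_use_count: Dict[str, int]) -> List[Inst]:
    """Replace 'v_cmp_* sX; s_and_saveexec_* sY, sX' with 's_mov_* sY, exec; v_cmpx_*'."""
    out: List[Inst] = []
    i = 0
    while i < len(insts):
        if i + 1 < len(insts):
            cmp_op, cmp_dst, cmp_srcs = insts[i]
            save_op, save_dst, save_srcs = insts[i + 1]
            if (cmp_op.startswith("v_cmp_")
                    and save_op.startswith("s_and_saveexec_")
                    and save_srcs == (cmp_dst,)
                    and sgpr_use_count.get(cmp_dst, 0) == 1):
                size = save_op.rsplit("_", 1)[1]  # "b32" (wave32) or "b64" (wave64)
                exec_reg = "exec" if size == "b64" else "exec_lo"
                # Save the old exec mask manually...
                out.append((f"s_mov_{size}", save_dst, (exec_reg,)))
                # ...and let v_cmpx write the comparison result straight into EXEC.
                out.append(("v_cmpx_" + cmp_op[len("v_cmp_"):], None, cmp_srcs))
                i += 2
                continue
        out.append(insts[i])
        i += 1
    return out
```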
It also changes some existing lit tests and introduces a few new tests to show
the changed behavior on GFX10.3 targets.
Reviewed By: sebastian-ne, critson
Differential Revision: https://reviews.llvm.org/D119696
Summary:
Specifically, for trap handling, for targets that do not support getDoorbellID,
we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1].
To get aperture bases when targets do not have aperture registers, we load
private_base or shared_base directly from the implicit kernarg. In clang, we use
implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}.
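Below is a Python sketch that models these loads as reads from a byte buffer at implicitarg_ptr + offset. The offsets are believed to follow the code object V5 hidden-argument layout in AMDGPUUsage:

- hidden_group_size_x/y/z at 12/14/16;
- hidden_private_base at 192;
- hidden_shared_base at 196;
- hidden_queue_ptr at 200.

Treat these offsets as assumptions and check them against that documentation. The helper names are hypothetical.

```python
import struct

# Assumed code object V5 hidden-argument offsets (verify against AMDGPUUsage).
HIDDEN_GROUP_SIZE_X = 12   # three consecutive u16 values: x, y, z
HIDDEN_PRIVATE_BASE = 192  # u32
HIDDEN_SHARED_BASE = 196   # u32
HIDDEN_QUEUE_PTR = 200     # u64


def workgroup_size(kernarg: bytes, implicitarg_ptr: int, dim: int) -> int:
    """Model of __builtin_amdgcn_workgroup_size_{x,y,z}: a u16 load at implicitarg_ptr + offset."""
    return struct.unpack_from("<H", kernarg, implicitarg_ptr + HIDDEN_GROUP_SIZE_X + 2 * dim)[0]


def aperture_base(kernarg: bytes, implicitarg_ptr: int, shared: bool) -> int:
    """Load shared_base or private_base when the target has no aperture registers."""
    offset = HIDDEN_SHARED_BASE if shared else HIDDEN_PRIVATE_BASE
    return struct.unpack_from("<I", kernarg, implicitarg_ptr + offset)[0]


def queue_ptr(kernarg: bytes, implicitarg_ptr: int) -> int:
    """Load queue_ptr for trap handling when getDoorbellID is unsupported."""
    return struct.unpack_from("<Q", kernarg, implicitarg_ptr + HIDDEN_QUEUE_PTR)[0]
```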
Reviewers: arsenm, sameerds, yaxunl
Differential Revision: https://reviews.llvm.org/D120265
When collecting trivially rematerializable defs, skip any subreg defs. We do not want to sink these.
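Below is a sketch of that filter, using a hypothetical Def record. In this record, subreg == 0 means the def writes the whole register, mirroring LLVM's convention that a subregister index of 0 means "no subregister". Only full-register, trivially rematerializable defs remain candidates for sinking.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Def:
    reg: str
    subreg: int            # 0 = full register, nonzero = subregister index
    trivially_remat: bool


def collect_remat_candidates(defs: List[Def]) -> List[Def]:
    # Skip subregister defs: they write only part of the register and are not sunk.
    return [d for d in defs if d.trivially_remat and d.subreg == 0]
```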
Differential Revision: https://reviews.llvm.org/D121874
NFCI. The motivation for this is to avoid problems in the future if we add new
classes containing only a subset of all VGPRs, or a subset of all SGPRs.
getMinimalPhysRegClass would favour these smaller classes, which is not
what we want here.
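Below is a toy Python model of the problem. The class names are hypothetical, and getMinimalPhysRegClass is approximated as "the smallest class containing the register".

Adding a subset class changes the minimal class of v0. A check that compares the minimal class against VGPR_32 therefore silently breaks. A check that asks whether the register belongs to the full VGPR class does not. The membership check only illustrates the kind of robustness wanted; it is not necessarily what the patch implements.

```python
from typing import Dict, Set

RegClasses = Dict[str, Set[str]]


def minimal_class(reg: str, classes: RegClasses) -> str:
    # Approximation of getMinimalPhysRegClass: the smallest class containing reg.
    candidates = [name for name, regs in classes.items() if reg in regs]
    return min(candidates, key=lambda name: len(classes[name]))


def is_vgpr_fragile(reg: str, classes: RegClasses) -> bool:
    return minimal_class(reg, classes) == "VGPR_32"


def is_vgpr_robust(reg: str, classes: RegClasses) -> bool:
    return reg in classes["VGPR_32"]


base: RegClasses = {
    "VGPR_32": {f"v{i}" for i in range(8)},
    "SGPR_32": {f"s{i}" for i in range(8)},
}
# A hypothetical new class containing only a subset of the VGPRs.
extended: RegClasses = {**base, "VGPR_LO4": {f"v{i}" for i in range(4)}}
```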
Differential Revision: https://reviews.llvm.org/D121914
This change replaces the manual selection of buffer_atomic_cmpswap*
instructions in SelectionDAG and GlobalISel with a tblgen based
selection in BUFInstructions.td. This allows us to select the return and
no-return variants in tblgen.
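Below is a sketch of the decision those patterns encode. The returning (_RTN) variant is needed only when the old memory value produced by the atomic is used. The opcode strings are modeled on AMDGPU's BUFFER_ATOMIC_CMPSWAP naming, and the helper is hypothetical. The real selection is expressed as tblgen patterns, not Python.

```python
def select_buffer_cmpswap(result_used: bool, is_64bit: bool, addr_mode: str = "OFFSET") -> str:
    # 64-bit compare-and-swap uses the _X2 form.
    base = "BUFFER_ATOMIC_CMPSWAP_X2" if is_64bit else "BUFFER_ATOMIC_CMPSWAP"
    opcode = f"{base}_{addr_mode}"
    # Only the returning variant writes the pre-operation memory value back.
    return opcode + "_RTN" if result_used else opcode
```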
Differential Revision: https://reviews.llvm.org/D121770
The fp32 packed math instructions were introduced in gfx90a.
If their vector register operands are not properly aligned, the
verifier should flag them. Previously, the verifier failed to
report this and the compiler ended up emitting broken assembly.
This patch fixes that missed case in TII::verifyInstruction.
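Below is a minimal sketch of the kind of check involved. It assumes the gfx90a rule that multi-register VGPR/AGPR tuples must start at an even register. It treats V_PK_FMA_F32, V_PK_MUL_F32, V_PK_ADD_F32 and V_PK_MOV_B32 as the packed fp32 instructions. For brevity, operands are parsed from assembly-like strings. This is not the actual TII::verifyInstruction code.

```python
import re
from typing import List

PACKED_FP32 = {"v_pk_fma_f32", "v_pk_mul_f32", "v_pk_add_f32", "v_pk_mov_b32"}
TUPLE_RE = re.compile(r"[va]\[(\d+):(\d+)\]")  # e.g. v[2:3] or a[4:5]


def misaligned_tuple_operands(opcode: str, operands: List[str]) -> List[str]:
    """Return register-tuple operands of a packed fp32 instruction that start on an odd register."""
    if opcode not in PACKED_FP32:
        return []
    bad = []
    for op in operands:
        m = TUPLE_RE.fullmatch(op)
        if m and int(m.group(1)) % 2 != 0:
            bad.append(op)
    return bad
```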
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D121794
This is by analogy with HasFlatScratchSTMode and is slightly more
informative than using isGFX940Plus.
Differential Revision: https://reviews.llvm.org/D121804
NFC. Switch from calculations based on dwords to bits, to be more
flexible.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D121730
The original design of custom operands support assumed that most GPUs
have the same or very similar operand names and encodings. This is
no longer the case. As a result, the support code has become over-complicated
and difficult to maintain.
This change implements a different design with the following benefits:
- support of aliases;
- support of operands with overlapped encodings;
- identification of defined but unsupported operands.
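Below is a toy table-driven model of the three properties listed above. All operand names, encodings and GPU predicates are hypothetical. Each entry carries its own availability predicate. This lets two names share an encoding, either as an alias or as different operands on different GPUs. It also lets a name that is defined but unavailable on the current GPU be reported differently from an unknown name.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class OperandInfo:
    name: str
    encoding: int
    is_supported: Callable[[str], bool]  # GPU name -> available?


TABLE: List[OperandInfo] = [
    OperandInfo("OP_COMMON", 1, lambda gpu: True),
    OperandInfo("OP_OLD", 20, lambda gpu: gpu == "gfx10"),          # encoding 20 on gfx10
    OperandInfo("OP_NEW", 20, lambda gpu: gpu == "gfx940"),         # same encoding, other GPU
    OperandInfo("OP_NEW_ALIAS", 20, lambda gpu: gpu == "gfx940"),   # alias of OP_NEW
]


def encode(name: str, gpu: str) -> int:
    defined = False
    for info in TABLE:
        if info.name == name:
            defined = True
            if info.is_supported(gpu):
                return info.encoding
    if defined:
        raise ValueError(f"{name} is not supported on this GPU")
    raise ValueError(f"unknown operand {name}")


def decode(encoding: int, gpu: str) -> Optional[str]:
    # The first supported entry wins, so canonical names must precede aliases.
    for info in TABLE:
        if info.encoding == encoding and info.is_supported(gpu):
            return info.name
    return None
```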
Differential Revision: https://reviews.llvm.org/D121696
I ran into this issue while working on something else.
We have already reserved EXEC, but it looks like
the register coalescer causes a sub-register of
EXEC to appear in LiveIntervals. I have not looked
deeper into why the register coalescer behaves this way,
but removeAllRegUnitsForPhysReg() is the right way to handle it.
Reviewed By: critson, foad, arsenm
Differential Revision: https://reviews.llvm.org/D117014
The HWREG namespaces now overlap with those of gfx10. Thus the
patch is longer than would be needed just to support the new names:
it also needs to issue proper error messages, i.e. a "specified
hardware register is not supported on this GPU" message.
This may need a major refactoring in the future.
Differential Revision: https://reviews.llvm.org/D121418