Before D114230, indirect moves used regular MOV opcodes and were
identified by having an implicit use of M0. Since D114230 they use
dedicated opcodes instead, so remove some old code that checks for
implicit uses of M0. NFCI.
Differential Revision: https://reviews.llvm.org/D138308
isLiteralConstant and isLiteralConstantLike were similar to
!isInlineConstant with slight differences like handling isReg operands.
To avoid a profusion of similar functions with undocumented differences,
this patch removes all the isLiteralConstant* variants. Callers are responsible
for handling the isReg case.
Differential Revision: https://reviews.llvm.org/D125759
In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1.
This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually.
Reviewed By: arsenm, foad
Differential Revision: https://reviews.llvm.org/D137540
I've been trying to understand the backend better and decided to read the code of this pass.
While doing so, I noticed parts that could be refactored to be a tiny bit clearer.
I tried to keep the changes minimal, a non-exhaustive list of changes is:
- Stylistic changes to better fit LLVM's coding style
- Removing dead/useless functions (e.g. FoldCandidate had getters, but it's a public struct!)
- Saving regs/opcodes in variables if they're going to be used multiple times in the same condition
Reviewed By: arsenm, foad
Differential Revision: https://reviews.llvm.org/D137539
There was quite a bit of logic there that was just in the middle of core loop. I think it makes it easier to follow when it's split off in a separate helper like the others.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D137538
Due to the encoding changes in GFX11, we had a hack in place that
disables the use of VGPRs above 128. This patch removes the need for
that hack.
We introduce a new register class VGPR_32_Lo128 which is used for 16-bit
operands of VOP1, VOP2, and VOPC instructions. This register class only has the
low 128 VGPRs, but is otherwise identical to VGPR_32. Therefore, 16-bit VOP1,
VOP2, and VOPC instructions are correctly limited to use the first 128
VGPRs, while the other instructions can freely use all 256.
We introduce new pseduo-instructions used on GFX11 which have the suffix
t16 (True 16) to use the VGPR_32_Lo128 register class.
Reviewed By: foad, rampitec, #amdgpu
Differential Revision: https://reviews.llvm.org/D133723
Currently there isn't a generic way to get a smaller register class
that can be produced from a subregister of a larger class. Replaces a
manually implemented version for AMDGPU. This will be used to improve
subregister support in the allocator.
Clear all kill flags on source register when folding a COPY.
This is necessary because the kills may now be out of order with the uses.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D130622
Previously SIFoldOperands::foldInstOperand would only fold a
non-inlinable immediate into a single user, so as not to increase code
size by adding the same 32-bit literal operand to many instructions.
This patch removes that restriction, so that a non-inlinable immediate
will be folded into any number of users. The rationale is:
- It reduces the number of registers used for holding constant values,
which might increase occupancy. (On the other hand, many of these
registers are SGPRs which no longer affect occupancy on GFX10+.)
- It reduces ALU stalls between the instruction that loads a constant
into a register, and the instruction that uses it.
- The above benefits are expected to outweigh any increase in code size.
Differential Revision: https://reviews.llvm.org/D114643
Use TII::getRegClass to return a valid regclass or a nullptr
if the RC is unknown for a given OpIdx. This fixes a potential
crash occurred while getting the RC from a variadic instruction.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D120813
Form the MAI spec: It’s ok that Src_C and vDst are the exact same VGPRs
or Src_C and vDst are completely separated. The case that Src_C and vDst
are overlapping should be avoid as new value could be written to accumulator
input before it gets read.
Note that this inevitably increases register pressure to the point where
some programs will become uncompilable.
This patch separates MAC and FMA versions of MFMA instructions using either
tied dst and src2 or earlyclobber dst.
Fixes: SWDEV-318900
Differential Revision: https://reviews.llvm.org/D117844
Expanding on D109750.
Since `DBG_VALUE` instructions have final register validity determined in
`LDVImpl::handleDebugValue`, there is no apparent reason to immediately prune
unused register operands as their defs are erased. Consequently, this renders
`MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval` moot; gaining a
substantial performance improvement.
The only necessary changes involve making relevant passes consider invalid
DBG_VALUE vregs uses as valid.
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D112852
The combined vector register classes with both
VGPRs and AGPRs are currently unallocatable.
This patch turns them into allocatable as a
prerequisite to enable copy between VGPR and
AGPR registers during regalloc.
Also, added the missing AV register classes from
192b to 1024b.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D109300
Delegate updating of LiveIntervals to each target's
convertToThreeAddress implementation, instead of repairing LiveIntervals
after the fact in TwoAddressInstruction::convertInstTo3Addr.
Differential Revision: https://reviews.llvm.org/D113493
This simplifies the API and addresses a FIXME in
TwoAddressInstructionPass::convertInstTo3Addr.
Differential Revision: https://reviews.llvm.org/D110229
Use of output modifiers forces VOP3 encoding for a VOP2 mac/fmac
instruction, so we might as well convert it to the more flexible VOP3-
only mad/fma form.
With this change, the only way we should emit VOP3-encoded mac/fmac is
if regalloc chooses registers that require the VOP3 encoding, e.g. sgprs
for both src0 and src1. In all other cases the mac/fmac should either be
converted to mad/fma or shrunk to VOP2 encoding.
Differential Revision: https://reviews.llvm.org/D110156
Prevent SIFoldOperands from creating SALU instructions with a constant
and a frame index. Previously, only one operand was checked to be a
frame index, leading to too many constants when flat scratch is enabled
and stack offsets are large.
Differential Revision: https://reviews.llvm.org/D108368
These used to consistently be zeroed pre-gfx9, but gfx9 made the
situation complicated since now some still do and some don't. This
also manages to pick up a few cases that the pattern fails to optimize
away.
We handle some cases with instruction patterns, but some get
through. In particular this improves the integer cases.
First clean up the strange API of tryConstantFoldOp where it took an
immediate operand value, but no indication of which operand it was the
value for.
Second clean up the loop that calls tryConstantFoldOp so that it does
not have to restart from the beginning every time it folds an
instruction.
This is NFCI but there are some minor changes caused by the order in
which things are folded.
Differential Revision: https://reviews.llvm.org/D100031
This is fairly cheap to implement and means less work for future
passes like MachineDCE.
Reapply with a fix for using InstToErase after it had been erased.
Differential Revision: https://reviews.llvm.org/D100188
This is cheap to implement, means less work for future passes like
MachineDCE, and slightly improves the folding in some cases.
Differential Revision: https://reviews.llvm.org/D100117
Look through copies to find more cases where the two values being
selected are identical. The motivation for this is just to be able to
remove the weird special case where tryFoldCndMask was called from
foldInstOperand, part way through folding a move-immediate into its
users, without regressing any lit tests.