isLiteralConstant and isLiteralConstantLike were similar to
!isInlineConstant with slight differences like handling isReg operands.
To avoid a profusion of similar functions with undocumented differences,
this patch removes all the isLiteralConstant* variants. Callers are responsible
for handling the isReg case.
Differential Revision: https://reviews.llvm.org/D125759
In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1.
This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually.
Reviewed By: arsenm, foad
Differential Revision: https://reviews.llvm.org/D137540
Create new VALU instructions in moveToVALU instead of mutating the
existing SALU instruction. This makes it easier to add extra operands so
we can convert to the VOP3 form of VALU instructions.
NFCI but it does have the minor side effect of removing duplicate
implicit operands that were present on the original SALU if they are
default implicit operands for the VALU.
Differential Revision: https://reviews.llvm.org/D137324
This patch fixes:
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:7383: warning: enumerated
mismatch in conditional expression:
‘llvm::AMDGPU::UfmtGFX11::UnifiedFormat’ vs
‘llvm::AMDGPU::UfmtGFX10::UnifiedFormat’
Strict WQM does not require a WQM transistion if it occurs within
an existing WQM section.
This occurs heavily in GFX11 pixel shaders with LDS_PARAM_LOAD.
Which leads to unnecessary EXEC mask manipulation.
To avoid these transitions, detect WQM -> Strict WQM -> WQM
and substitute new ENTER_PSEUDO_WM/EXIT_PSEUDO_WM markers instead.
These are treat similarly by WWM register pre-allocation pass,
but do not manipulate EXEC or use registers to save EXEC state.
Reviewed By: piotr
Differential Revision: https://reviews.llvm.org/D136813
Due to the encoding changes in GFX11, we had a hack in place that
disables the use of VGPRs above 128. This patch removes the need for
that hack.
We introduce a new register class VGPR_32_Lo128 which is used for 16-bit
operands of VOP1, VOP2, and VOPC instructions. This register class only has the
low 128 VGPRs, but is otherwise identical to VGPR_32. Therefore, 16-bit VOP1,
VOP2, and VOPC instructions are correctly limited to use the first 128
VGPRs, while the other instructions can freely use all 256.
We introduce new pseduo-instructions used on GFX11 which have the suffix
t16 (True 16) to use the VGPR_32_Lo128 register class.
Reviewed By: foad, rampitec, #amdgpu
Differential Revision: https://reviews.llvm.org/D133723
This is helpful for detecting whether a block ends with divergent branch
in passes before lowering the pseudo control flow instructions.
Differential Revision: https://reviews.llvm.org/D133184
Currently there isn't a generic way to get a smaller register class
that can be produced from a subregister of a larger class. Replaces a
manually implemented version for AMDGPU. This will be used to improve
subregister support in the allocator.
D125803 introduced shrinking of F16 FMA to FMAAK/FMAMK in
SIShrinkInstructions (useful on GFX10+ where VOP3 instructions may have
a literal operand) but failed to handle the V_FMA_F16_gfx9_e64 form of
the opcode which is used on GFX9+.
Differential Revision: https://reviews.llvm.org/D133489
This improves a corner case where v_fmac can be converted to v_fma on
GFX10+ even if it has a literal operand.
Differential Revision: https://reviews.llvm.org/D130992
The instruction is used to modify wave priority with the intent
to affect VALU execution and currently we can reschedule VALU
around it since that VALU does not have side effects.
Differential Revision: https://reviews.llvm.org/D130654
For most DPP instructions, the old operand stores the value that was in
the current lane before the DPP operation, and is tied to the
destination. For VOPC DPP, this is unnecessary and incorrect.
There appears to have been a latent bug related to D122737 with
SIInstrInfo::isOperandLegal. If you checked if a register operand was legal
when the InstructionDesc expected an immediate, it reported that is valid.
Its fix is necessary for and tested in this patch.
Reviewed By: foad, rampitec
Differential Revision: https://reviews.llvm.org/D130040
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for if a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.
Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.
Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
D114999 added code to kill an immediate def if it was folded into its
only use by convertToThreeAddress. This patch updates LiveVariables when
that happens in order to fix verification failures exposed by D129213.
Differential Revision: https://reviews.llvm.org/D129661
We form VOPD instructions in the GCNCreateVOPD pass by combining
back-to-back component instructions. There are strict register
constraints for creating a legal VOPD, namely that the matching operands
(e.g. src0x and src0y, src1x and src1y) must be in different register
banks. We add a PostRA scheduler
mutation to put possible VOPD components back-to-back.
Depends on D128442, D128270
Reviewed By: #amdgpu, rampitec
Differential Revision: https://reviews.llvm.org/D128656
On gfx10+ null register can be used as both 32 and 64 bit operand.
Define a 64 bit version of the register to use during codegen.
Differential Revision: https://reviews.llvm.org/D127527
This uses rotating reminder of division by 3 to select another
temp vgpr each next time in a sequence of several agpr copies.
Therefore, temp vgpr selection depends on the generated agpr
number. This number could change with any unrelated change to
the register definitions.
Stabilize the selection by using a real agpr number.
Differential Revision: https://reviews.llvm.org/D127524
Avoid the dependency on TargetInstrInfo, which depends on the subtarget
and therefore the individual function.
Currently AMDGPU is constructing PseudoSourceValue instances in MachineFunctionInfo.
In order to facilitate copying MachineFunctionInfo, we need to stop allocating these
there. Alternatively we could allow targets to subclass PseudoSourceValueManager,
and allocate them similarly to MachineFunctionInfo.
Even though single address image instructions only use a single VGPR
HW accesses 4 or 5 which creates alignment requirement.
Fixes: SWDEV-316648
Differential Revision: https://reviews.llvm.org/D126009
Extend SIInstrInfo::isOperandLegal to enforce a limit on the number of
literal operands for all VALU instructions, not just VOP3. In particular
it now handles VOP2 instructions with a mandatory literal operand like
V_FMAAK_F32.
Differential Revision: https://reviews.llvm.org/D126064
Extend the literal operand checking in SIInstrInfo::verifyInstruction to
check VOP2 instructions like V_FMAAK_F32 which have a mandatory literal
operand. The rule is that src0 can also be a literal, but only if it is
the same literal value.
AMDGPUAsmParser::validateConstantBusLimitations already handles this
correctly.
Differential Revision: https://reviews.llvm.org/D126063
AMDGPUAsmParser::validateSOPLiteral already knew about this but
SIInstrInfo::verifyInstruction did not.
Differential Revision: https://reviews.llvm.org/D125976
Includes MachineCode layer support and tests, and MIR tests not requiring
CodeGen pass changes.
Includes a small change in SMInstructions.td to correct encoded bits.
Contributors:
Petar Avramovic <Petar.Avramovic@amd.com>
Dmitry Preobrazhensky <dmitry.preobrazhensky@amd.com>
Depends on D125316
Patch 6/N for upstreaming of AMDGPU gfx11 architecture.
Reviewed By: dp, Petar.Avramovic
Differential Revision: https://reviews.llvm.org/D125319
Adds an intrinsic/builtin that can be used to fine tune scheduler behavior. If
there is a need to have highly optimized codegen and kernel developers have
knowledge of inter-wave runtime behavior which is unknown to the compiler this
builtin can be used to tune scheduling.
This intrinsic creates a barrier between scheduling regions. The immediate
parameter is a mask to determine the types of instructions that should be
prevented from crossing the sched_barrier. In this initial patch, there are only
two variations. A mask of 0 means that no instructions may be scheduled across
the sched_barrier. A mask of 1 means that non-memory, non-side-effect inducing
instructions may cross the sched_barrier.
Note that this intrinsic is only meant to work with the scheduling passes. Any
other transformations that may move code will not be impacted in the ways
described above.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D124700
Tablegen definitions for subtarget features and cpp predicate functions to
access the features.
New Sub-TargetProcessors and common latencies.
Simple changes to MIR codegen tests which pass on gfx11 because they have the
same output as previous subtargets or operate on pseudo instructions which
are reused from previous subtargets.
Contributors:
Jay Foad <jay.foad@amd.com>
Petar Avramovic <Petar.Avramovic@amd.com>
Patch 4/N for upstreaming of AMDGPU gfx11 architecture
Depends on D124538
Reviewed By: Petar.Avramovic, foad
Differential Revision: https://reviews.llvm.org/D125261
When folding a COPY of exec into another COPY, the call to
TII->isOperandLegal would crash because COPYs don't have defined
register classes for their operands.
Differential Revision: https://reviews.llvm.org/D122737
This is an NFC patch in preparation to fix a bug related to always
reserving VGPR32 for AGPR copy.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D123651
Summary:
To compute the size of a VALU/SALU instruction, we need to check whether an operand
could ever be literal. Previously isLiteralConstant was used, which missed cases
like global variables or external symbols. These misses lead to under-estimation of
the instruction size and branch offset, and thus incorrectly skip the necessary branch
relaxation when the branch offset is actually greater than what the branch bits can hold.
In this work, we use isLiteralConstantLike to check the operands. It maybe conservative,
but it is safe.
Reviewers: arsenm
Differential Revision: https://reviews.llvm.org/D122778
Split waterfall loops into multiple blocks so that exec mask
manipulation (s_and_saveexec) does not occur in the middle of
a block.
VGPR live range optimizer is updated to handle waterfall loops
spanning multiple blocks.
Reviewed By: ruiling
Differential Revision: https://reviews.llvm.org/D122200
On GFX10.3 targets, the following instruction sequence
v_cmp_* SGPR, ...
s_and_saveexec ..., SGPR
leads to a fairly long stall caused by a VALU write to a SGPR and having the
following SALU wait for the SGPR.
An equivalent sequence is to save the exec mask manually instead of letting
s_and_saveexec do the work and use a v_cmpx instruction instead to do the
comparison.
This patch modifies the SIOptimizeExecMasking pass as this is the last position
where s_and_saveexec instructions are inserted. It does the transformation by
trying to find the pattern, extracting the operands and generating the new
instruction sequence.
It also changes some existing lit tests and introduces a few new tests to show
the changed behavior on GFX10.3 targets.
Same as D119696 including a buildbot and MIR test fix.
Reviewed By: critson
Differential Revision: https://reviews.llvm.org/D122332
and, also update the function indirectCopyToAGPR() to ensure that it is called only on GFX908 sub-target.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D122286