VOPD is a new encoding for dual-issue instructions for use in wave32.
This patch includes MC layer support only.
A VOPD instruction is constituted of an X component (for which there are
13 possible opcodes) and a Y component (for which there are the 13 X
opcodes plus 3 more). Most of the complexity in defining and parsing
a VOPD operation arises from the possible different total numbers of
operands and deferred parsing of certain operands depending on the
constituent X and Y opcodes.
Reviewed By: dp
Differential Revision: https://reviews.llvm.org/D128218
Compared to permlane16, permlane64 has no BC input because it has no
boundary conditions, no fi input because the instruction acts as if FI
were always enabled, and no OLD input because it always writes to every
active lane.
Also use the new intrinsic in the atomic optimizer pass.
Differential Revision: https://reviews.llvm.org/D127662
The reverted dependent commit is now relanded, so reland this.
Includes dpp instructions and vop1/vop2 promoted to vop3
Patch 17/N for upstreaming of AMDGPU gfx11 architecture
Depends on D126483
Reviewed By: rampitec, #amdgpu
Differential Revision: https://reviews.llvm.org/D126917
Includes dpp instructions and vop1/vop2 promoted to vop3
Patch 17/N for upstreaming of AMDGPU gfx11 architecture
Depends on D126483
Reviewed By: rampitec, #amdgpu
Differential Revision: https://reviews.llvm.org/D126917
NFCI. Previously the implicit use was added to V_MOV_B32_indirect_read
when building the instruction. V_MOV_B32_indirect_write didn't have an
implicit use of M0 at all, but apparently it did not cause any problems.
Differential Revision: https://reviews.llvm.org/D114239
Introduce V_MOV_B32_indirect_read for indexed vgpr reads
(and rename the old V_MOV_B32_indirect to
V_MOV_B32_indirect_write) so they can be unambiguously
distinguished from regular V_MOV_B32_e32. Previously they
were distinguished by looking for extra implicit operands
but this is fragile because regular moves sometimes have
extra implicit operands too:
- either by accident, when instructions end up with
duplicate implicit operands (see e.g. D100939)
- or by design, when SIInstrInfo::copyPhysReg breaks a
multi-dword copy into individual subreg mov instructions
and adds implicit operands for the super-register.
The effect of this is that SIInstrInfo::isFoldableCopy can
be simplified and identifies more foldable copies. The test
diffs show that more immediate 0 values have been folded as
inline operands.
SIInstrInfo::isReallyTriviallyReMaterializable could
probably be simplified too but that is not part of this
patch.
Differential Revision: https://reviews.llvm.org/D114230
This is a pilot change to verify the logic. The rest will be
done in a same way, at least the rest of VOP1.
Differential Revision: https://reviews.llvm.org/D105742
These lines set the value to what it already was,
so they are redundant. NFC
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D100664
Change-Id: Ibf6f27d50a7fa1f76c127f01b799821378bfd3b3
By convention, VOP1/2/C instructions which can be promoted to VOP3 have _e32 suffix while promoted instructions have _e64 suffix. Instructions which have a single variant should have no _e32/_e64 suffix. Unfortunately there was no simple way to identify single variant instructions - it was implemented by a hack. See bug https://bugs.llvm.org/show_bug.cgi?id=39086.
This fix simplifies handling of single VOP instructions by adding a dedicated flag.
Differential Revision: https://reviews.llvm.org/D99408
Coyp SchedRW from pseudos to real instructions so that llvm-mca has
access to it. This is NFC for normal compiler codegen, which schedules
pseudos not real instructions.
Add an llvm-mca test for some high latency double-precision instructions
as a smoke test.
Differential Revision: https://reviews.llvm.org/D99187
Make OMod explicit instead of implied by HasModifiers in the
operand list. Requires explicitly setting HasOMod=1 for
irregular OMod usage in instruction V_CVT_{U,I}*
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D97587
Change-Id: I230e1476f529e816eec60e242531f23a99e3839f
HasModifiers should be true if at least one modifier is used.
This should make the use of this field bit more consistent.
Differential Revision: https://reviews.llvm.org/D94795
This is used to mark transcendental instructions that execute on a
separate pipeline from the normal VALU pipeline.
Differential Revision: https://reviews.llvm.org/D92042
This may be missing a few overrides to set it off still in some
special cases. Since the flags set during selection should now be
reliably preserved, this should not change codegen for non-strictfp
functions.
This is the groundwork required to implement strictfp. For now, this
should be NFC for regular instructoins (many instructions just gain an
extra use of a reserved register). Regalloc won't rematerialize
instructions with reads of physical registers, but we were suffering
from that anyway with the exec reads.
Should add it for all the related FP uses (possibly with some
extras). I did not add it to either the gpr index mode instructions
(or every single VALU instruction) since it's a ridiculous feature
already modeled as an arbitrary side effect.
Also work towards marking instructions with FP exceptions. This
doesn't actually set the bit yet since this would start to change
codegen. It seems nofpexcept is currently not implied from the regular
IR FP operations. Add it to some MIR tests where I think it might
matter.
This was assumed to be a simple move, and interpreting the immediate
modifier operand as a materialized immediate. Apparently the SDWA pass
never produces these, but GlobalISel does emit these for some vector
shuffles.
Summary:
As far as I can tell on gfx10 conversions to/from f32 (that are not
converting f32 to/from f64) are full rate instructions, but they were
marked as quarter rate instructions.
I have fixed this for gfx10 only. I assume the scheduling model was
correct for older architectures, though I don't have any documentation
handy to confirm that.
Reviewers: rampitec, arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75392
a785209bc2 switched to using a pseudos instead of manually tying
operands on the regular instruction. The VGPR indexing mode path
should have the same problems that change attempted to avoid, so these
should use the same strategy.
Use a single pseudo for the VGPR indexing mode and movreld paths, and
expand it based on the subtarget later. These have essentially the
same constraints, reading the index from m0.
Switch from using an offset to the subregister index directly, instead
of computing an offset and re-adding it back. Also add missing pseudos
for existing register class sizes.
We are duplicating predicates if several parts of the combined
predicate list contain the same condition. Added code to deduplicate
the list.
We have AssemblerPredicates and AssemblerPredicate in the
PredicateControl, but we never use AssemblerPredicates with an
actual list, so this one is dropped.
This addresses the first part of the llvm bug 43886:
https://bugs.llvm.org/show_bug.cgi?id=43886
Differential Revision: https://reviews.llvm.org/D69815
Do not generate non-existing sdwa instructions. It reduces the
number of generated instructions by 185.
Differential Revision: https://reviews.llvm.org/D69010
llvm-svn: 375016
This defaults to zero fi operand, but we do not expose it
anyway. Should we expose it later it needs to be added to
the pseudo.
This enables dpp combining on gfx10.
Differential Revision: https://reviews.llvm.org/D68888
llvm-svn: 374604
Inhibit generation of unused real dpp instructions on gfx10 just
like it is done on other subtargets. This does not change anything
because these are illegal anyway and not accepted, but it does
reduce the number of instruction definitions generated.
Differential Revision: https://reviews.llvm.org/D68607
llvm-svn: 374083
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)
This was missing one switch to getTargetConstant in an untested case.
llvm-svn: 372338