110 Commits

Author SHA1 Message Date
Joe Nash ebb258d3b0 [AMDGPU] Make V_SAT_PK_U8_I16 a True16 Instruction
The return type is two u8 values packed into a 16-bit VGPR, so this
instruction should be True16.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D135478
2022-10-10 10:33:49 -04:00
Dmitry Preobrazhensky f4b1cfa1cb [AMDGPU][MC][GFX11] Correct e64_dpp variants of v_movreld and v_movrelsd
Differential Revision: https://reviews.llvm.org/D135079
2022-10-05 16:47:18 +03:00
Joe Nash b982ba2a6e [AMDGPU][GFX11] Use VGPR_32_Lo128 for VOP1,2,C
Due to the encoding changes in GFX11, we had a hack in place that
disables the use of VGPRs above 128. This patch removes the need for
that hack.

We introduce a new register class VGPR_32_Lo128 which is used for 16-bit
operands of VOP1, VOP2, and VOPC instructions. This register class only has the
low 128 VGPRs, but is otherwise identical to VGPR_32. Therefore, 16-bit VOP1,
VOP2, and VOPC instructions are correctly limited to use the first 128
VGPRs, while the other instructions can freely use all 256.

We introduce new pseudo-instructions used on GFX11 which have the suffix
t16 (True 16) to use the VGPR_32_Lo128 register class.

Reviewed By: foad, rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D133723
2022-09-20 09:56:28 -04:00
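
The register-class split described in b982ba2a6e can be illustrated with a small model. This is only a sketch in Python, not the TableGen from the patch; the helper name `allowed_vgprs` and the string encoding names are assumptions made for the example.

```python
# Sketch only: a toy model of the VGPR_32 / VGPR_32_Lo128 split.
# allowed_vgprs() is a hypothetical helper, not an LLVM API.
NUM_VGPRS = 256      # VGPR_32 covers v0..v255
LO128_LIMIT = 128    # VGPR_32_Lo128 covers only v0..v127

# Encodings whose 16-bit operands are restricted, per the commit message.
LO128_ENCODINGS = {"VOP1", "VOP2", "VOPC"}


def allowed_vgprs(encoding: str, is_16bit: bool) -> range:
    """Return the VGPR indices an operand may use in this model."""
    if is_16bit and encoding in LO128_ENCODINGS:
        return range(LO128_LIMIT)
    return range(NUM_VGPRS)
```

In this model a 16-bit VOP1 operand can name v127 but not v128, while any other instruction keeps the full range.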
Joe Nash 3e39ab25e6 [AMDGPU][GFX11] Fix dst register class for V_CVT_U32_U16
This instruction was referring to the wrong VOPProfile, likely due to a
typo, leading to an incorrect destination register type.

This change matters to the MC layer, but it is NFC for codegen while
16-bit values still occupy 32-bit registers.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D132878
2022-08-30 14:01:25 -04:00
Joe Nash 70e7a1257c [AMDGPU][NFC] Allow separate RC for VOP3 DPP Dst
Create a field in VOPProfile called DstRCVOP3DPP to allow the VOP3
versions of DPP instructions to have a different destination register
class than the non-VOP3 encoding. NFC for current instructions, but
planned to be functional in upcoming ones.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D132673
2022-08-29 11:22:07 -04:00
Stanislav Mekhanoshin 9fa5a6b7e8 [AMDGPU] Support for gfx940 fp8 conversions
Differential Revision: https://reviews.llvm.org/D129902
2022-07-18 11:48:43 -07:00
Joe Nash 07b7fada73 [AMDGPU] gfx11 VOPD instructions MC support
VOPD is a new encoding for dual-issue instructions for use in wave32.
This patch includes MC layer support only.

A VOPD instruction consists of an X component (for which there are
13 possible opcodes) and a Y component (for which there are the 13 X
opcodes plus 3 more). Most of the complexity in defining and parsing
a VOPD operation arises from the varying total number of operands and
the deferred parsing of certain operands, both of which depend on the
constituent X and Y opcodes.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D128218
2022-06-24 11:08:39 -04:00
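
The opcode counts given in 07b7fada73 imply an upper bound on the number of X/Y pairings. The sketch below works that out in Python. The opcode names are placeholders (the commit message does not list them), and the count ignores any pairing restrictions the hardware or assembler may impose.

```python
from itertools import product

# Placeholder opcode names; only the counts come from the commit message.
X_OPCODES = [f"x_op{i}" for i in range(13)]
Y_ONLY_OPCODES = [f"y_only_op{i}" for i in range(3)]

# The Y component accepts every X opcode plus three more.
Y_OPCODES = X_OPCODES + Y_ONLY_OPCODES


def vopd_pairs():
    """Enumerate every (X, Y) combination, ignoring pairing constraints."""
    return list(product(X_OPCODES, Y_OPCODES))
```

With 13 X opcodes and 16 Y opcodes this yields at most 13 * 16 = 208 combinations, each with its own operand list, which gives a sense of why the parsing is involved.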
Jay Foad 7e681ef35e [AMDGPU] Add GFX11 codegen for llvm.amdgcn.mov.dpp8
Differential Revision: https://reviews.llvm.org/D127980
2022-06-16 19:44:28 +01:00
Dmitry Preobrazhensky b26afab9d1 [AMDGPU][MC][GFX11] Correct src0 for dpp variants of v_cvt_*_e64
Differential Revision: https://reviews.llvm.org/D127847
2022-06-16 13:48:43 +03:00
Jay Foad bfcfd53b92 [AMDGPU] Add GFX11 llvm.amdgcn.permlane64 intrinsic
Compared to permlane16, permlane64 has no BC input because it has no
boundary conditions, no fi input because the instruction acts as if FI
were always enabled, and no OLD input because it always writes to every
active lane.

Also use the new intrinsic in the atomic optimizer pass.

Differential Revision: https://reviews.llvm.org/D127662
2022-06-13 21:12:11 +01:00
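
A rough lane-level model of the behaviour described in bfcfd53b92 follows. It assumes that permlane64 swaps the low and high 32-lane halves of a wave64; that detail is not stated in the commit message, and the function below is an illustration rather than the intrinsic's definition.

```python
WAVE_SIZE = 64
HALF = WAVE_SIZE // 2


def permlane64(src, exec_mask):
    """Toy model: each active lane reads from the lane 32 positions away.

    Assumption: the permutation is a swap of the two 32-lane halves.
    Inactive lanes are returned as None, meaning "destination untouched".
    """
    out = [None] * WAVE_SIZE
    for lane in range(WAVE_SIZE):
        if exec_mask[lane]:
            # No OLD input: every active lane is written.
            # FI always on: the source lane is read even if it is inactive.
            out[lane] = src[lane ^ HALF]
    return out
```

Because every lane has exactly one partner lane inside the wave, there is no out-of-range case, which matches the absence of a BC input.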
Joe Nash 086a9c1062 Reland [AMDGPU] gfx11 VOP1+VOP2 Instruction MC support
The reverted dependent commit is now relanded, so reland this.
Includes dpp instructions and vop1/vop2 promoted to vop3

Patch 17/N for upstreaming of AMDGPU gfx11 architecture

Depends on D126483

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D126917
2022-06-08 11:10:57 -04:00
Joe Nash f617f89e5b Revert "[AMDGPU] gfx11 VOP1+VOP2 Instruction MC support"
This reverts commit 6079804498.
2022-06-06 17:11:35 -04:00
Joe Nash 6079804498 [AMDGPU] gfx11 VOP1+VOP2 Instruction MC support
Includes dpp instructions and vop1/vop2 promoted to vop3

Patch 17/N for upstreaming of AMDGPU gfx11 architecture

Depends on D126483

Reviewed By: rampitec, #amdgpu

Differential Revision: https://reviews.llvm.org/D126917
2022-06-06 09:57:59 -04:00
Dmitry Preobrazhensky 5c0bf1303e [AMDGPU][MC][GFX10] Removed unsupported 64bit DPP opcodes
Removed 64bit DPP opcodes from asm matcher tables.

Differential Revision: https://reviews.llvm.org/D123611
2022-04-13 14:43:40 +03:00
Stanislav Mekhanoshin e7b362d75d [AMDGPU] Add v_mov_b64 gfx940 opcode
Differential Revision: https://reviews.llvm.org/D121023
2022-03-07 12:07:12 -08:00
Jay Foad 05d79e3562 [AMDGPU] Divergence-driven instruction selection for bitreverse
Differential Revision: https://reviews.llvm.org/D119702
2022-02-24 20:21:59 +00:00
Jay Foad ff7f2cfa95 [AMDGPU] Add an implicit use of M0 to all V_MOV_B32_indirect_read/write
NFCI. Previously the implicit use was added to V_MOV_B32_indirect_read
when building the instruction. V_MOV_B32_indirect_write didn't have an
implicit use of M0 at all, but apparently it did not cause any problems.

Differential Revision: https://reviews.llvm.org/D114239
2021-11-19 19:00:17 +00:00
Jay Foad 30b27ecfc2 [AMDGPU] Use new opcode for indexed vgpr reads
Introduce V_MOV_B32_indirect_read for indexed vgpr reads
(and rename the old V_MOV_B32_indirect to
V_MOV_B32_indirect_write) so they can be unambiguously
distinguished from regular V_MOV_B32_e32. Previously they
were distinguished by looking for extra implicit operands
but this is fragile because regular moves sometimes have
extra implicit operands too:
- either by accident, when instructions end up with
  duplicate implicit operands (see e.g. D100939)
- or by design, when SIInstrInfo::copyPhysReg breaks a
  multi-dword copy into individual subreg mov instructions
  and adds implicit operands for the super-register.

The effect of this is that SIInstrInfo::isFoldableCopy can
be simplified and identifies more foldable copies. The test
diffs show that more immediate 0 values have been folded as
inline operands.

SIInstrInfo::isReallyTriviallyReMaterializable could
probably be simplified too but that is not part of this
patch.

Differential Revision: https://reviews.llvm.org/D114230
2021-11-19 13:08:11 +00:00
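
The fragility that 30b27ecfc2 describes can be shown with a simplified model. The `MachineInstr` dataclass and both predicate functions below are hypothetical stand-ins written for this example; they are not the code in SIInstrInfo.

```python
from dataclasses import dataclass, field


@dataclass
class MachineInstr:
    opcode: str
    implicit_operands: list = field(default_factory=list)


def is_indexed_old(mi: MachineInstr) -> bool:
    """Old-style heuristic (simplified): extra implicit operands beyond
    the usual exec use mark a move as an indexed access."""
    return mi.opcode == "V_MOV_B32_e32" and len(mi.implicit_operands) > 1


INDIRECT_OPCODES = {"V_MOV_B32_indirect_read", "V_MOV_B32_indirect_write"}


def is_indexed_new(mi: MachineInstr) -> bool:
    """New-style check: the opcode alone identifies an indexed access."""
    return mi.opcode in INDIRECT_OPCODES


# A plain subregister move produced while splitting a multi-dword copy:
# it carries an implicit super-register operand but is not indexed.
subreg_copy = MachineInstr("V_MOV_B32_e32", ["exec", "vgpr0_vgpr1"])
indexed_read = MachineInstr("V_MOV_B32_indirect_read", ["exec", "m0"])
```

In this model the old heuristic misclassifies `subreg_copy` as indexed, while the opcode-based check does not, so the copy stays eligible for folding.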
Stanislav Mekhanoshin 4eb24817ec [AMDGPU] Mark all relevant VOP1 instructions rematerializable
Differential Revision: https://reviews.llvm.org/D105919
2021-07-21 14:05:32 -07:00
Stanislav Mekhanoshin d46d534dbb [AMDGPU] Make some VOP1 instructions rematerializable
This is a pilot change to verify the logic. The rest will be
done the same way, at least the rest of VOP1.

Differential Revision: https://reviews.llvm.org/D105742
2021-07-12 23:43:45 -07:00
Jay Foad 7f3ac6714a [AMDGPU] Set SALU, VALU and other instruction type flags on Real instructions
This does not affect codegen but might benefit llvm-mca.
2021-06-16 13:36:02 +01:00
Jay Foad 323b3e645d [AMDGPU] Set mayLoad and mayStore on Real instructions
This does not affect codegen but might benefit llvm-mca.
2021-06-16 12:10:23 +01:00
Joe Nash a0ed70abde [AMDGPU] Remove redundant field from DPP8 def
These lines set the value to what it already was,
so they are redundant. NFC

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100664

Change-Id: Ibf6f27d50a7fa1f76c127f01b799821378bfd3b3
2021-04-16 16:23:52 -04:00
Dmitry Preobrazhensky 0f5ebbcc7f [AMDGPU][MC] Added flag to identify VOP instructions which have a single variant
By convention, VOP1/2/C instructions which can be promoted to VOP3 have _e32 suffix while promoted instructions have _e64 suffix. Instructions which have a single variant should have no _e32/_e64 suffix. Unfortunately there was no simple way to identify single variant instructions - it was implemented by a hack. See bug https://bugs.llvm.org/show_bug.cgi?id=39086.

This fix simplifies handling of single VOP instructions by adding a dedicated flag.

Differential Revision: https://reviews.llvm.org/D99408
2021-04-01 13:53:12 +03:00
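
The suffix convention described in 0f5ebbcc7f can be written down as a small function. This is a sketch of the naming rule only; `vop_mnemonic` and its parameters are hypothetical and do not correspond to a field or helper in the backend.

```python
def vop_mnemonic(base: str, is_vop3: bool, single_variant: bool) -> str:
    """Apply the naming convention from the commit message.

    - single-variant instructions carry no suffix
    - otherwise the VOP1/2/C form is _e32 and the promoted VOP3 form is _e64
    """
    if single_variant:
        return base
    return base + ("_e64" if is_vop3 else "_e32")
```

With a dedicated flag, the single-variant case becomes an explicit input instead of something inferred from the instruction's other properties.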
Jay Foad fc7e3e7dd9 [AMDGPU] Set SchedRW on real instructions
Copy SchedRW from pseudos to real instructions so that llvm-mca has
access to it. This is NFC for normal compiler codegen, which schedules
pseudos not real instructions.

Add an llvm-mca test for some high latency double-precision instructions
as a smoke test.

Differential Revision: https://reviews.llvm.org/D99187
2021-03-23 15:38:11 +00:00
Joe Nash 5531f24cc2 [AMDGPU] Make OMod explicit for V_CVT_{U,I}*
Make OMod explicit instead of implied by HasModifiers in the
operand list. Requires explicitly setting HasOMod=1 for
irregular OMod usage in instruction V_CVT_{U,I}*

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D97587

Change-Id: I230e1476f529e816eec60e242531f23a99e3839f
2021-03-02 13:32:06 -05:00
Stanislav Mekhanoshin a8d9d50762 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
Mirko Brkusanin 608ac62540 [AMDGPU] Fix use of HasModifiers in VopProfile
HasModifiers should be true if at least one modifier is used.
This should make the use of this field a bit more consistent.

Differential Revision: https://reviews.llvm.org/D94795
2021-01-26 15:21:11 +01:00
Jay Foad 4926eed59c [AMDGPU] Add a TRANS bit to TSFlags. NFC.
This is used to mark transcendental instructions that execute on a
separate pipeline from the normal VALU pipeline.

Differential Revision: https://reviews.llvm.org/D92042
2020-11-24 17:49:56 +00:00
Dmitry Preobrazhensky 2e87acac9b [AMDGPU] Removed s_mov_regrd and mov_fed opcodes
These opcodes are not intended for public use.

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D81659
2020-07-17 19:52:54 +03:00
Matt Arsenault 9e03bdebc1 AMDGPU: Add llvm.amdgcn.sqrt intrinsic
I spread the GlobalISel test into the regular one, which I've been
avoiding so far.
2020-06-26 15:07:07 -04:00
Matt Arsenault d259668731 AMDGPU: Set mayRaiseFPException
This may still be missing a few overrides to turn it off in some
special cases. Since the flags set during selection should now be
reliably preserved, this should not change codegen for non-strictfp
functions.
2020-06-04 17:35:27 -04:00
Jay Foad 9ce0f7eed6 [AMDGPU] Introduce new sched classes for transcendental instructions
This is in preparation for scheduling them slightly differently on
gfx10. NFC.

Differential Revision: https://reviews.llvm.org/D81011
2020-06-04 10:29:32 +01:00
Matt Arsenault 4b4496312e AMDGPU: Start adding MODE register uses to instructions
This is the groundwork required to implement strictfp. For now, this
should be NFC for regular instructions (many instructions just gain an
extra use of a reserved register). Regalloc won't rematerialize
instructions with reads of physical registers, but we were suffering
from that anyway with the exec reads.

Should add it for all the related FP uses (possibly with some
extras). I did not add it to either the gpr index mode instructions
(or every single VALU instruction) since it's a ridiculous feature
already modeled as an arbitrary side effect.

Also work towards marking instructions with FP exceptions. This
doesn't actually set the bit yet since this would start to change
codegen. It seems nofpexcept is currently not implied from the regular
IR FP operations. Add it to some MIR tests where I think it might
matter.
2020-05-27 14:47:00 -04:00
Matt Arsenault b27a538dda AMDGPU: Fix illegally constant folding from V_MOV_B32_sdwa
This was assumed to be a simple move, and the immediate modifier
operand was interpreted as a materialized immediate. Apparently the SDWA pass
never produces these, but GlobalISel does emit these for some vector
shuffles.
2020-05-18 15:34:33 -04:00
Kazuaki Ishizaki 0312b9f550 [llvm] NFC: Fix trivial typo in rst and td files
Differential Revision: https://reviews.llvm.org/D77469
2020-04-23 14:26:32 +09:00
Matt Arsenault f463792506 AMDGPU: Remove custom node for RSQ_LEGACY
Directly select from the intrinsic. This wasn't getting much value
from the custom node.
2020-04-17 19:50:36 -04:00
Matt Arsenault 79b29d6df7 AMDGPU: Remove DisableInst feature
I'm not sure why these were bothering to check the instruction
profile, since those profiles should only be used with these
instruction classes.
2020-04-06 09:27:44 -04:00
Matt Arsenault 9564f46766 AMDGPU: Make use of default operands
2020-03-28 17:33:29 -04:00
Jay Foad c8f0d27ef3 [AMDGPU] Fix the gfx10 scheduling model for f32 conversions
Summary:
As far as I can tell on gfx10 conversions to/from f32 (that are not
converting f32 to/from f64) are full rate instructions, but they were
marked as quarter rate instructions.

I have fixed this for gfx10 only. I assume the scheduling model was
correct for older architectures, though I don't have any documentation
handy to confirm that.

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75392
2020-03-10 19:31:24 +00:00
Matt Arsenault d1b393d92c AMDGPU/GlobalISel: Select G_CTTZ_ZERO_UNDEF
Directly select this rather than going through the intermediate
instruction, which may provide some combine value in the future.
2020-02-12 16:19:46 -08:00
Matt Arsenault c05f23e409 AMDGPU/GlobalISel: Select llvm.amdgcn.mov.dpp
This is deprecated, but easy to support.
2020-01-22 11:43:53 -05:00
Matt Arsenault dd09ec1208 AMDGPU/GlobalISel: Select llvm.amdgcn.mov.dpp8
2020-01-22 11:43:40 -05:00
Matt Arsenault 9b13b4a0e3 AMDGPU: Prepare to use scalar register indexing
Define pseudos mirroring the VGPR indexing ones, and adjust the
operands in the s_movrel* instructions to avoid the result def.
2020-01-20 17:19:16 -05:00
Matt Arsenault 8615eeb455 AMDGPU: Partially merge indirect register write handling
a785209bc2 switched to using a pseudo instead of manually tying
operands on the regular instruction. The VGPR indexing mode path
should have the same problems that change attempted to avoid, so these
should use the same strategy.

Use a single pseudo for the VGPR indexing mode and movreld paths, and
expand it based on the subtarget later. These have essentially the
same constraints, reading the index from m0.

Switch from using an offset to the subregister index directly, instead
of computing an offset and re-adding it back. Also add missing pseudos
for existing register class sizes.
2020-01-20 17:19:16 -05:00
Matt Arsenault 592de0009f AMDGPU/GlobalISel: Select llvm.amdgcn.update.dpp
The existing test is overly reliant on -mattr=-flat-for-global, and
some missing optimizations to re-use.
2020-01-17 20:09:53 -05:00
Matt Arsenault 78b30a54c9 AMDGPU/GlobalISel: Fix readfirstlane pattern import
The imm folding optimization pattern failed to import. The instruction
pattern was already working, but failing to fail on SGPR inputs.
2020-01-07 11:07:08 -05:00
Dmitry Preobrazhensky edd9f70163 [AMDGPU][MC][GFX10] Enabled v_movrel*[sdwa|dpp|dpp8] opcodes
See https://bugs.llvm.org/show_bug.cgi?id=43712

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D70170
2019-11-18 17:23:40 +03:00
Dmitry Preobrazhensky e25bc5e024 [AMDGPU][MC] Corrected src0 for v_movrelsd_b32 and v_movrelsd_2_b32
See https://bugs.llvm.org/show_bug.cgi?id=40903

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D69888
2019-11-08 16:38:56 +03:00
Stanislav Mekhanoshin 4312c4afd4 [AMDGPU] deduplicate tablegen predicates
We are duplicating predicates if several parts of the combined
predicate list contain the same condition. Added code to deduplicate
the list.

We have AssemblerPredicates and AssemblerPredicate in the
PredicateControl, but we never use AssemblerPredicates with an
actual list, so this one is dropped.

This addresses the first part of the llvm bug 43886:
https://bugs.llvm.org/show_bug.cgi?id=43886

Differential Revision: https://reviews.llvm.org/D69815
2019-11-04 12:19:17 -08:00
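
The deduplication described in 4312c4afd4 amounts to an order-preserving removal of repeated entries. The Python sketch below shows that idea; the real change is in TableGen, and the predicate names in the usage note are illustrative only.

```python
def dedup_predicates(predicates):
    """Drop repeated predicates while keeping first-seen order.

    dict.fromkeys preserves insertion order, so the first occurrence of
    each predicate determines its position in the result.
    """
    return list(dict.fromkeys(predicates))
```

For example, combining two predicate lists that both contain the same subtarget check, such as `["isGFX9", "HasDPP"]` and `["isGFX9"]`, leaves that check in the result once.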