Commit Graph

574 Commits

Author SHA1 Message Date
Christudasan Devadasan 4f5ba46e16 [AMDGPU] Set wait state for meta instructions to zero
It looked more reasonable to set the wait state to
zero for all non-instructions. With that we can avoid
the special handling for them in `getWaitStatesSince`
and `AdvanceCycle`. This NFC patch makes the handling
more generic.
2021-08-18 01:46:59 -04:00
Christudasan Devadasan 686607676f [AMDGPU] Skip pseudo MIs in hazard recognizer
Instructions like WAVE_BARRIER and SI_MASKED_UNREACHABLE
are only placeholders to prevent certain unwanted
transformations and will get discarded during assembly
emission. They should not be counted during nop insertion.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108022
2021-08-16 23:11:14 -04:00
Michael Liao b0402a35fc [amdgpu] Add 64-bit PC support when expanding unconditional branches.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D106445
2021-07-26 14:50:30 -04:00
Carl Ritson 6efb3220b4 [AMDGPU] Add VReg_192/VReg_224 support for MIMG instructions
Allow MIMG instructions to be selected with 6/7 VGPRs for vaddr.
Previously these were rounded up to VReg_256 this saves VGPRs.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D103800
2021-07-22 10:42:15 +09:00
Stanislav Mekhanoshin 9625ca5b60 [AMDGPU] Mark relevant rematerializable VOP2 instructions
Differential Revision: https://reviews.llvm.org/D106023
2021-07-21 14:24:59 -07:00
Stanislav Mekhanoshin 4eb24817ec [AMDGPU] Mark all relevant VOP1 instructions rematerializable
Differential Revision: https://reviews.llvm.org/D105919
2021-07-21 14:05:32 -07:00
Sebastian Neubauer afd895709d [AMDGPU] Use isMetaInstruction for instruction size
Meta instructions have a size of 0. Use isMetaInstruction instead of
listing them explicitly.

Differential Revision: https://reviews.llvm.org/D106043
2021-07-15 12:23:11 +02:00
Stanislav Mekhanoshin 76b7d3432e [AMDGPU] Add TII::isIgnorableUse() to allow VOP rematerialization
Any def of EXEC prevents rematerialization of any VOP instruction
because of the physreg use. Create a callback to check if the
physreg use can be ingored to allow rematerialization.

Differential Revision: https://reviews.llvm.org/D105836
2021-07-14 13:03:58 -07:00
Sebastian Neubauer 9d72c0ad43 [AMDGPU] Mark waterfall loops as SI_WATERFALL_LOOP
This way, they can be detected later, e.g. by the
SIOptimizeVGPRLiveRange pass.

Differential Revision: https://reviews.llvm.org/D105467
2021-07-13 12:15:08 +02:00
Stanislav Mekhanoshin d46d534dbb [AMDGPU] Make some VOP1 instructions rematerializable
This is a pilot change to verify the logic. The rest will be
done in a same way, at least the rest of VOP1.

Differential Revision: https://reviews.llvm.org/D105742
2021-07-12 23:43:45 -07:00
Stanislav Mekhanoshin 661577e698 [AMDGPU] Fix immediate sign during V_MOV_B64_PSEUDO expansion
Creating a V_MOV_B32 with zero extended immediate source
prevented conversion to V_BFREV_B32.

Differential Revision: https://reviews.llvm.org/D105235
2021-07-01 09:00:29 -07:00
Stanislav Mekhanoshin 381ded345b [AMDGPU] Add S_MOV_B64_IMM_PSEUDO for wide constants
This is to allow 64 bit constant rematerialization. If a constant
is split into two separate moves initializing sub0 and sub1 like
now RA cannot rematerizalize a 64 bit register.

This gives 10-20% uplift in a set of huge apps heavily using double
precession math.

Fixes: SWDEV-292645

Differential Revision: https://reviews.llvm.org/D104874
2021-06-30 11:45:38 -07:00
Piotr Sobczak f38a8b54ea [AMDGPU] Fix 224-bit spills
Related to D104622.

Differential Revision: https://reviews.llvm.org/D105109
2021-06-29 17:52:16 +02:00
Carl Ritson 98f48723f2 [AMDGPU] Add 224-bit vector types and link 192-bit types to MVTs
Add SReg_224, VReg_224, AReg_224, etc.
Link 224-bit types with v7i32/v7f32.
Link existing 192-bit types to newly added v3i64/v3f64/v6i32/v6f32.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D104622
2021-06-24 12:41:22 +09:00
Carl Ritson f8816c7400 [AMDGPU] Add v5f32/VReg_160 support for MIMG instructions
Avoid having to round up to v8f32/VReg_256 when only 5 VGPRs are
required for a MIMG address operand.

Maintain _V8 instruction variants of pseudo instructions allowing
assembly prior to GFX10 to work as-is.  Currently the validator
can tell for GFX10 what the correct size is, so will disallow
oversize address registers.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D103672
2021-06-08 11:11:40 +09:00
Stanislav Mekhanoshin 9e2e49328f [AMDGPU] All GWS instructions need aligned VGPR on gfx90a
Fixes: SWDEV-288006

Differential Revision: https://reviews.llvm.org/D103197
2021-06-01 17:08:03 -07:00
Brendon Cahoon 3f7b7e7393 [AMDGPU] Update SCC defs to VCC when uses are changed to VCC
The FixSGPRCopies pass converts instructions to VALU when
removing illegal VGPR to SGPR copies. Instructions that use SCC
are changed to use VCC instead. When that happens, the pass must
also change instructions that define SCC to define VCC.

The pass was not changing the SCC definition when an ADDC is
converted due to a input that is a VGPR to SGPR copy. But, the
initial ADD insruction, which define SCC, is not converted.
This causes a compilation failure due to a use of an undefined
physical register.

This patch adds code that inserts the SCC definition in the
MoveToVALU worklist when a SCC use is converted to a VCC use.

Differential Revision: https://reviews.llvm.org/D102111
2021-05-14 18:05:05 -04:00
Matt Arsenault c7cff08f79 AMDGPU: Fix assert when rewriting saddr d16 loads
moveOperands does not handle moving tied operands since it would
generally have to fixup the tied operand references. Avoid the assert
by untying and retying after the modification. These in place
modifications really aren't managable.
2021-05-14 13:24:19 -04:00
Jay Foad 7f81c5a5ba [AMDGPU] getMemOperandsWithOffset: add vaddr operand for stack access BUF instructions
A consequence is that checkInstOffsetsDoNotOverlap can now distinguish
sp+offset from fp+offset, so it knows that it shouldn't try to work out
whether the accesses overlap just by comparing the offsets. For example
in these two instructions:

MIR:
BUFFER_STORE_DWORD_OFFSET %0:vgpr_32(s32), $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 4 into stack + 4, addrspace 5)
%4:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0.alloca, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4 from `i8 addrspace(5)* undef`, addrspace 5)

ISA:
buffer_store_dword v0, off, s[0:3], s32 offset:4
buffer_load_dword v0, off, s[0:3], s34

Differential Revision: https://reviews.llvm.org/D73957
2021-05-14 10:10:43 +01:00
David Stuttard 72d570ca08 [AMDGPU][AsmParser/Disassembler] Correct A16 and G16 handling
A16 support for image instructions assembly/disassembly (gfx10) was missing

Also refactor MIMG op addr size calcs to common function

We'd got 3 places where the same operation was being done.

One test is now marked XFAIL until a related codegen patch is in place

Differential Revision: https://reviews.llvm.org/D102231

Change-Id: I7e86e730ef8c71901457855cba570581f4f576bb
2021-05-14 09:25:44 +01:00
Sebastian Neubauer 13c0316239 [AMDGPU] Restrict immediate scratch offsets
gfx9 does not work with negative offsets, gfx10 works only with
aligned negative offsets, but not with unaligned negative offsets.

This is slightly more conservative than needed, gfx9 does support
negative offsets when a VGPR address is used and gfx10 supports
negative, unaligned offsets when an SGPR address is used, but we
do not make use of that with this patch.

Differential Revision: https://reviews.llvm.org/D101292
2021-05-07 14:51:32 +02:00
Stanislav Mekhanoshin 4d6ebe8ac0 [AMDGPU] Change FLAT Scratch SADDR to VADDR form in moveToVALU
Extend the legalization of global SADDR loads and stores
with changing to VADDR to the FLAT scratch instructions.

Differential Revision: https://reviews.llvm.org/D101408
2021-05-03 10:57:14 -07:00
Stanislav Mekhanoshin 89a94be16b [AMDGPU] Change FLAT SADDR to VADDR form in moveToVALU
Instead of legalizing saddr operand with a readfirstlane
when address is moved from SGPR to VGPR we can just
change the opcode.

Differential Revision: https://reviews.llvm.org/D101405
2021-05-03 10:36:26 -07:00
David Stuttard a67a377014 [AMDGPU] Tidy up some simple expressions for clarity NFC
Slight refactor for clarity.

Change-Id: Ib25e7f4582c67a7c57f066cfd5382c1405d7d4c5

Differential Revision: https://reviews.llvm.org/D101610
2021-04-30 11:13:54 +01:00
Jay Foad ec8c61efdf [AMDGPU] Allow multiple uses of the same literal
In GFX10 VOP3 can have a literal, which opens up the possibility of two
operands using the same literal value, which is allowed and only counts
as one use of the constant bus.

AMDGPUAsmParser::validateConstantBusLimitations already knew about this
but SIInstrInfo::verifyInstruction did not.

Differential Revision: https://reviews.llvm.org/D100770
2021-04-20 16:44:01 +01:00
Sebastian Neubauer cc7add5298 [AMDGPU] Use SIInstrFlags for flat variants. NFC
Use SIInstrFlags to differentiate between the different
variants of flat instructions (flat, global and scratch).
This should make it easier to bundle the immediate offset logic in a
single place and implement restrictions and bug workarounds.

Fixed version of D99587, which does not rely on the address space.

Differential Revision: https://reviews.llvm.org/D99743
2021-04-09 12:28:36 +02:00
Stanislav Mekhanoshin bc27a31801 [AMDGPU] Fix copyPhysReg to not produce unalined vgpr access
RA can insert something like a sub1_sub2 COPY of a wide VGPR
tuple which results in the unaligned acces with v_pk_mov_b32
after the copy is expanded. This is regression after D97316.

Differential Revision: https://reviews.llvm.org/D98549
2021-03-15 14:14:30 -07:00
Stanislav Mekhanoshin 3bffb1cd0e [AMDGPU] Use single cache policy operand
Replace individual operands GLC, SLC, and DLC with a single cache_policy
bitmask operand. This will reduce the number of operands in MIR and I hope
the amount of code. These operands are mostly 0 anyway.

Additional advantage that parser will accept these flags in any order unlike
now.

Differential Revision: https://reviews.llvm.org/D96469
2021-03-15 13:00:59 -07:00
Jay Foad 70f013fd3b [AMDGPU] Fix isReallyTriviallyReMaterializable for V_MOV_*
D57708 changed SIInstrInfo::isReallyTriviallyReMaterializable to reject
V_MOVs with extra implicit operands, but it accidentally rejected all
V_MOVs because of their implicit use of exec. Fix it but avoid adding a
moderately expensive call to MI.getDesc().getNumImplicitUses().

In real graphics shaders this changes quite a few vgpr copies into move-
immediates, which is good for avoiding stalls on GFX10.

Differential Revision: https://reviews.llvm.org/D98347
2021-03-10 16:18:12 +00:00
Ruiling Song f0ccdde3c9 [AMDGPU] Remove SI_MASK_BRANCH
This is already deprecated, so remove code working on this.
Also update the tests by using S_CBRANCH_EXECZ instead of SI_MASK_BRANCH.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D97545
2021-03-09 09:13:23 +08:00
Piotr Sobczak 4672bac177 [AMDGPU] Introduce Strict WQM mode
* Add amdgcn_strict_wqm intrinsic.
* Add a corresponding STRICT_WQM machine instruction.
* The semantic is similar to amdgcn_strict_wwm with a notable difference that not all threads will be forcibly enabled during the computations of the intrinsic's argument, but only all threads in quads that have at least one thread active.
* The difference between amdgc_wqm and amdgcn_strict_wqm, is that in the strict mode an inactive lane will always be enabled irrespective of control flow decisions.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D96258
2021-03-03 14:19:16 +01:00
Piotr Sobczak c3ce7bae80 [AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm
* Introduce the new intrinsic amdgcn_strict_wwm
 * Deprecate the old intrinsic amdgcn_wwm

The change is done for consistency as the "strict"
prefix will become an important, distinguishing factor
between amdgcn_wqm and amdgcn_strictwqm in the future.

The "strict" prefix indicates that inactive lanes do not
take part in control flow, specifically an inactive lane
enabled by a strict mode will always be enabled irrespective
of control flow decisions.

The amdgcn_wwm will be removed, but doing so in two steps
gives users time to switch to the new name at their own pace.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D96257
2021-03-03 09:33:57 +01:00
Jay Foad 3ad5216ed8 [AMDGPU] Better codegen for i64 bitreverse
Differential Revision: https://reviews.llvm.org/D97547
2021-02-26 15:51:36 +00:00
Matt Arsenault 78b6d73a93 AMDGPU: Add even aligned VGPR/AGPR register classes
gfx90a operations require even aligned registers, but this was
previously achieved by reserving registers inside the full class.

Ideally this would be captured in the static instruction definitions
for the operands, and we would have different instructions per
subtarget. The hackiest part of this is we need to manually reassign
AGPR register classes after instruction selection (we get away without
this for VGPRs since those types are actually registered for legal
types).
2021-02-24 14:49:37 -05:00
Stanislav Mekhanoshin a8d9d50762 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
Piotr Sobczak c72a63b4b0 [AMDGPU] Add implicit vcc_lo on S_CBRANCH_VCCNZ in wave32
* Update skip-if-dead.ll with tests for wave32.
* Fix the crash in verifier in one newly enabled test by adding
  missing fixImplicitOperands in branch insertion code.

```
*** Bad machine code: Using an undefined physical register ***
- function:    test_kill_divergent_loop
- basic block: %bb.2 bb (0xad96308)
- instruction: S_CBRANCH_VCCNZ %bb.1, implicit $vcc_lo
- operand 1:   implicit $vcc_lo
LLVM ERROR: Found 1 machine code errors.
```

* Simplify "cbranch_kill" to not use interp instructions.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96793
2021-02-17 15:14:57 +01:00
Carl Ritson c16f776028 [AMDGPU] Move kill lowering to WQM pass and add live mask tracking
Move implementation of kill intrinsics to WQM pass. Add live lane
tracking by updating a stored exec mask when lanes are killed.
Use live lane tracking to enable early termination of shader
at any point in control flow.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D94746
2021-02-11 20:31:29 +09:00
Jay Foad a4b1df8af3 [AMDGPU] Use named unified buffer format constant. NFC. 2021-02-08 17:34:36 +00:00
Carl Ritson 0e8f50595e [AMDGPU] Mark V_SET_INACTIVE as defining SCC
V_SET_INACTIVE is implemented with S_NOT which clobbers SCC.
Mark sure it is marked appropriately.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D95509
2021-01-29 09:46:41 +09:00
dfukalov 560d7e0411 [NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036
2021-01-20 22:22:45 +03:00
Joe Nash 314e29ed2b [AMDGPU] Add _e64 suffix to VOP3 Insts
Previously, instructions which could be
expressed as VOP3 in addition to another
encoding had a _e64 suffix on the tablegen
record name, while those
only available as VOP3 did not. With this
patch, all VOP3s will have the _e64 suffix.
The assembly does not change, only  the mir.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D94341

Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423
2021-01-12 18:33:18 -05:00
Sebastian Neubauer 6a195491b6 [AMDGPU] Fix failing assert with scratch ST mode
In ST mode, flat scratch instructions have neither an sgpr nor a vgpr
for the address. This lead to an assertion when inserting hard clauses.

Differential Revision: https://reviews.llvm.org/D94406
2021-01-12 09:54:02 +01:00
dfukalov 6a87e9b08b [NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813
2021-01-07 22:22:05 +03:00
dfukalov 9ed8e0caab [NFC] Reduce include files dependency and AA header cleanup (part 2).
Continuing work started in https://reviews.llvm.org/D92489:

Removed a bunch of includes from "AliasAnalysis.h" and "LoopPassManager.h".

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92852
2020-12-17 14:04:48 +03:00
Sebastian Neubauer 91445979be [AMDGPU] Unify flat offset logic
Move getNumFlatOffsetBits from AMDGPUAsmParser and SIInstrInfo into
AMDGPUBaseInfo.

Differential Revision: https://reviews.llvm.org/D93287
2020-12-15 14:59:59 +01:00
Austin Kerbow 4aa842a800 [AMDGPU] Add new pseudos for indirect addressing with VGPR Indexing
It is possible for copies or spills to be inserted in the middle of indirect
addressing sequences which use VGPR indexing. Spills to accvgprs could be
effected by the indexing mode.

Add new pseudo instructions that are expanded after register allocation to avoid
the problematic spill or copy placement.

Differential Revision: https://reviews.llvm.org/D91048
2020-12-08 12:24:12 -08:00
Matt Arsenault e722943e05 AMDGPU: Factor out large flat offset splitting 2020-11-13 11:22:13 -05:00
Stanislav Mekhanoshin 5ab1702129 [AMDGPU] Remove scratch rsrc from spill pseudos
Differential Revision: https://reviews.llvm.org/D91110
2020-11-12 15:23:37 -08:00
Stanislav Mekhanoshin cf6565f6d0 [AMDGPU] Enable multi-dword flat scratch load/stores
Differential Revision: https://reviews.llvm.org/D91384
2020-11-12 13:38:56 -08:00
Jay Foad f94fd1c8ca [AMDGPU] Make use of SIInstrInfo::isEXP. NFC. 2020-11-11 17:01:20 +00:00