Commit Graph

550 Commits

Author SHA1 Message Date
Jay Foad ec8c61efdf [AMDGPU] Allow multiple uses of the same literal
In GFX10 VOP3 can have a literal, which opens up the possibility of two
operands using the same literal value, which is allowed and only counts
as one use of the constant bus.

AMDGPUAsmParser::validateConstantBusLimitations already knew about this
but SIInstrInfo::verifyInstruction did not.

Differential Revision: https://reviews.llvm.org/D100770
2021-04-20 16:44:01 +01:00
Sebastian Neubauer cc7add5298 [AMDGPU] Use SIInstrFlags for flat variants. NFC
Use SIInstrFlags to differentiate between the different
variants of flat instructions (flat, global and scratch).
This should make it easier to bundle the immediate offset logic in a
single place and implement restrictions and bug workarounds.

Fixed version of D99587, which does not rely on the address space.

Differential Revision: https://reviews.llvm.org/D99743
2021-04-09 12:28:36 +02:00
Stanislav Mekhanoshin bc27a31801 [AMDGPU] Fix copyPhysReg to not produce unalined vgpr access
RA can insert something like a sub1_sub2 COPY of a wide VGPR
tuple which results in the unaligned acces with v_pk_mov_b32
after the copy is expanded. This is regression after D97316.

Differential Revision: https://reviews.llvm.org/D98549
2021-03-15 14:14:30 -07:00
Stanislav Mekhanoshin 3bffb1cd0e [AMDGPU] Use single cache policy operand
Replace individual operands GLC, SLC, and DLC with a single cache_policy
bitmask operand. This will reduce the number of operands in MIR and I hope
the amount of code. These operands are mostly 0 anyway.

Additional advantage that parser will accept these flags in any order unlike
now.

Differential Revision: https://reviews.llvm.org/D96469
2021-03-15 13:00:59 -07:00
Jay Foad 70f013fd3b [AMDGPU] Fix isReallyTriviallyReMaterializable for V_MOV_*
D57708 changed SIInstrInfo::isReallyTriviallyReMaterializable to reject
V_MOVs with extra implicit operands, but it accidentally rejected all
V_MOVs because of their implicit use of exec. Fix it but avoid adding a
moderately expensive call to MI.getDesc().getNumImplicitUses().

In real graphics shaders this changes quite a few vgpr copies into move-
immediates, which is good for avoiding stalls on GFX10.

Differential Revision: https://reviews.llvm.org/D98347
2021-03-10 16:18:12 +00:00
Ruiling Song f0ccdde3c9 [AMDGPU] Remove SI_MASK_BRANCH
This is already deprecated, so remove code working on this.
Also update the tests by using S_CBRANCH_EXECZ instead of SI_MASK_BRANCH.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D97545
2021-03-09 09:13:23 +08:00
Piotr Sobczak 4672bac177 [AMDGPU] Introduce Strict WQM mode
* Add amdgcn_strict_wqm intrinsic.
* Add a corresponding STRICT_WQM machine instruction.
* The semantic is similar to amdgcn_strict_wwm with a notable difference that not all threads will be forcibly enabled during the computations of the intrinsic's argument, but only all threads in quads that have at least one thread active.
* The difference between amdgc_wqm and amdgcn_strict_wqm, is that in the strict mode an inactive lane will always be enabled irrespective of control flow decisions.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D96258
2021-03-03 14:19:16 +01:00
Piotr Sobczak c3ce7bae80 [AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm
* Introduce the new intrinsic amdgcn_strict_wwm
 * Deprecate the old intrinsic amdgcn_wwm

The change is done for consistency as the "strict"
prefix will become an important, distinguishing factor
between amdgcn_wqm and amdgcn_strictwqm in the future.

The "strict" prefix indicates that inactive lanes do not
take part in control flow, specifically an inactive lane
enabled by a strict mode will always be enabled irrespective
of control flow decisions.

The amdgcn_wwm will be removed, but doing so in two steps
gives users time to switch to the new name at their own pace.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D96257
2021-03-03 09:33:57 +01:00
Jay Foad 3ad5216ed8 [AMDGPU] Better codegen for i64 bitreverse
Differential Revision: https://reviews.llvm.org/D97547
2021-02-26 15:51:36 +00:00
Matt Arsenault 78b6d73a93 AMDGPU: Add even aligned VGPR/AGPR register classes
gfx90a operations require even aligned registers, but this was
previously achieved by reserving registers inside the full class.

Ideally this would be captured in the static instruction definitions
for the operands, and we would have different instructions per
subtarget. The hackiest part of this is we need to manually reassign
AGPR register classes after instruction selection (we get away without
this for VGPRs since those types are actually registered for legal
types).
2021-02-24 14:49:37 -05:00
Stanislav Mekhanoshin a8d9d50762 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
Piotr Sobczak c72a63b4b0 [AMDGPU] Add implicit vcc_lo on S_CBRANCH_VCCNZ in wave32
* Update skip-if-dead.ll with tests for wave32.
* Fix the crash in verifier in one newly enabled test by adding
  missing fixImplicitOperands in branch insertion code.

```
*** Bad machine code: Using an undefined physical register ***
- function:    test_kill_divergent_loop
- basic block: %bb.2 bb (0xad96308)
- instruction: S_CBRANCH_VCCNZ %bb.1, implicit $vcc_lo
- operand 1:   implicit $vcc_lo
LLVM ERROR: Found 1 machine code errors.
```

* Simplify "cbranch_kill" to not use interp instructions.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96793
2021-02-17 15:14:57 +01:00
Carl Ritson c16f776028 [AMDGPU] Move kill lowering to WQM pass and add live mask tracking
Move implementation of kill intrinsics to WQM pass. Add live lane
tracking by updating a stored exec mask when lanes are killed.
Use live lane tracking to enable early termination of shader
at any point in control flow.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D94746
2021-02-11 20:31:29 +09:00
Jay Foad a4b1df8af3 [AMDGPU] Use named unified buffer format constant. NFC. 2021-02-08 17:34:36 +00:00
Carl Ritson 0e8f50595e [AMDGPU] Mark V_SET_INACTIVE as defining SCC
V_SET_INACTIVE is implemented with S_NOT which clobbers SCC.
Mark sure it is marked appropriately.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D95509
2021-01-29 09:46:41 +09:00
dfukalov 560d7e0411 [NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036
2021-01-20 22:22:45 +03:00
Joe Nash 314e29ed2b [AMDGPU] Add _e64 suffix to VOP3 Insts
Previously, instructions which could be
expressed as VOP3 in addition to another
encoding had a _e64 suffix on the tablegen
record name, while those
only available as VOP3 did not. With this
patch, all VOP3s will have the _e64 suffix.
The assembly does not change, only  the mir.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D94341

Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423
2021-01-12 18:33:18 -05:00
Sebastian Neubauer 6a195491b6 [AMDGPU] Fix failing assert with scratch ST mode
In ST mode, flat scratch instructions have neither an sgpr nor a vgpr
for the address. This lead to an assertion when inserting hard clauses.

Differential Revision: https://reviews.llvm.org/D94406
2021-01-12 09:54:02 +01:00
dfukalov 6a87e9b08b [NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813
2021-01-07 22:22:05 +03:00
dfukalov 9ed8e0caab [NFC] Reduce include files dependency and AA header cleanup (part 2).
Continuing work started in https://reviews.llvm.org/D92489:

Removed a bunch of includes from "AliasAnalysis.h" and "LoopPassManager.h".

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92852
2020-12-17 14:04:48 +03:00
Sebastian Neubauer 91445979be [AMDGPU] Unify flat offset logic
Move getNumFlatOffsetBits from AMDGPUAsmParser and SIInstrInfo into
AMDGPUBaseInfo.

Differential Revision: https://reviews.llvm.org/D93287
2020-12-15 14:59:59 +01:00
Austin Kerbow 4aa842a800 [AMDGPU] Add new pseudos for indirect addressing with VGPR Indexing
It is possible for copies or spills to be inserted in the middle of indirect
addressing sequences which use VGPR indexing. Spills to accvgprs could be
effected by the indexing mode.

Add new pseudo instructions that are expanded after register allocation to avoid
the problematic spill or copy placement.

Differential Revision: https://reviews.llvm.org/D91048
2020-12-08 12:24:12 -08:00
Matt Arsenault e722943e05 AMDGPU: Factor out large flat offset splitting 2020-11-13 11:22:13 -05:00
Stanislav Mekhanoshin 5ab1702129 [AMDGPU] Remove scratch rsrc from spill pseudos
Differential Revision: https://reviews.llvm.org/D91110
2020-11-12 15:23:37 -08:00
Stanislav Mekhanoshin cf6565f6d0 [AMDGPU] Enable multi-dword flat scratch load/stores
Differential Revision: https://reviews.llvm.org/D91384
2020-11-12 13:38:56 -08:00
Jay Foad f94fd1c8ca [AMDGPU] Make use of SIInstrInfo::isEXP. NFC. 2020-11-11 17:01:20 +00:00
Stanislav Mekhanoshin d5a465866e [AMDGPU] Omit buffer resource with flat scratch.
Differential Revision: https://reviews.llvm.org/D90979
2020-11-09 08:05:20 -08:00
Sebastian Neubauer a022b1ccd8 [AMDGPU] Add amdgpu_gfx calling convention
Add a calling convention called amdgpu_gfx for real function calls
within graphics shaders. For the moment, this uses the same calling
convention as other calls in amdgpu, with registers excluded for return
address, stack pointer and stack buffer descriptor.

Differential Revision: https://reviews.llvm.org/D88540
2020-11-09 16:51:44 +01:00
Sebastian Neubauer 31a0b2834f [AMDGPU] Fix iterating in SIFixSGPRCopies
The insertion of waterfall loops splits the current basic block into
three blocks. So the basic block that we iterate over must be updated.

This failed assert(!NodePtr->isKnownSentinel()) in ilist_iterator for
divergent calls in branches before.

Differential Revision: https://reviews.llvm.org/D90596
2020-11-04 18:43:19 +01:00
Matt Arsenault 790f5771fd AMDGPU: Fix missing writelane cases to skip with exec=0 2020-10-30 11:15:11 -04:00
Jay Foad 58de4b2053 [AMDGPU] Use pseudo instructions for readlane/writelane
This reverts r227987 "R600/SI: Determine target-specific encoding of READLANE and WRITELANE early v2".

All the codegen changes are caused by the post-RA scheduler no longer
treating readlane/writelane as scheduling barriers due to having
unmodelled side effects. (The pseudos are hasSideEffects = 0, but the
real instructions are hasSideEffects = ? which TableGen conservatively
treats as 1.)

Differential Revision: https://reviews.llvm.org/D90401
2020-10-29 16:00:53 +00:00
Jay Foad 69f5105f5c [AMDGPU] Simplify insertNoops functions. NFC. 2020-10-29 10:55:20 +00:00
Jay Foad 5b91a6a88b [AMDGPU] Allow some modifiers on VOP3B instructions
V_DIV_SCALE_F32/F64 are VOP3B encoded so they can't use the ABS src
modifier, but they can still use NEG and the usual output modifiers.

This partially reverts 3b99f12a4e "AMDGPU: Remove modifiers from v_div_scale_*".

Differential Revision: https://reviews.llvm.org/D90296
2020-10-28 21:54:14 +00:00
Stanislav Mekhanoshin 2e64ad9494 [AMDGPU] Fixed isLegalRegOperand() with physregs
This does not change anything at the moment, but needed for
D89170. In that change I am probing a physical SGPR to see if
it is legal. RC is SReg_32, but DRC for scratch instructions
is SReg_32_XEXEC_HI and test fails.

That is sufficient just to check if DRC contains a register
here in case of physreg. Physregs also do not use subregs
so the subreg handling below is irrelevant for these.

Differential Revision: https://reviews.llvm.org/D90064
2020-10-23 11:33:34 -07:00
vpykhtin 00255f4192 [AMDGPU] Fix access beyond the end of the basic block in execMayBeModifiedBeforeAnyUse.
I was wrong in thinking that MRI.use_instructions return unique instructions and mislead Jay in his previous patch D64393.

First loop counted more instructions than it was in reality and the second loop went beyond the basic block with that counter.

I used Jay's previous code that relied on MRI.use_operands to constrain the number of instructions to check among.
modifiesRegister is inlined to reduce the number of passes over instruction operands and added assert on BB end boundary.

Differential Revision: https://reviews.llvm.org/D89386
2020-10-23 19:17:48 +03:00
Matt Arsenault 8a59d4b654 AMDGPU: Don't query for TII in TII 2020-10-23 10:34:24 -04:00
Matt Arsenault d61996473d AMDGPU: Increase branch size estimate with offset bug
This will be relaxed to insert a nop if the offset hits the bad value,
so over estimate branch instruction sizes.
2020-10-23 10:34:24 -04:00
Austin Kerbow 37d907899f [HazardRec] Allow inserting multiple wait-states simultaneously
If a target can encode multiple wait-states into a noop allow emitting such
instructions directly.

Reviewed By: rampitec, dmgreen

Differential Revision: https://reviews.llvm.org/D89753
2020-10-20 17:03:47 -07:00
Matt Arsenault ce16b6835b AMDGPU: Don't kill super-register with overlapping copy
This would end up killing part of the result super-register, resulting
in a verifier error on a later use of the overlapping registers.  We
could add kills of any non-aliasing registers, but we should be moving
away from relying on kill flags.
2020-10-16 09:34:35 -04:00
Carl Ritson b70cb50204 [AMDGPU] Minimize number of s_mov generated by copyPhysReg
Generate the minimal set of s_mov instructions required when
expanding a SGPR copy operation in copyPhysReg.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D89187
2020-10-15 22:35:02 +09:00
Ruiling Song b215a26628 [AMDGPU] Update LiveVariables in convertToThreeAddress()
This can fix an asan failure like below.
==15856==ERROR: AddressSanitizer: use-after-poison on address ...
READ of size 8 at 0x6210001a3cb0 thread T0
    #0 llvm::MachineInstr::getParent()
    #1 llvm::LiveVariables::VarInfo::findKill()
    #2 TwoAddressInstructionPass::rescheduleMIBelowKill()
    #3 TwoAddressInstructionPass::tryInstructionTransform()
    #4 TwoAddressInstructionPass::runOnMachineFunction()

We need to update the Kills if we replace instructions. The Kills
may be later accessed within TwoAddressInstruction pass.

Differential Revision: https://reviews.llvm.org/D89092
2020-10-13 08:12:20 +08:00
Sebastian Neubauer 7f2a641aad [AMDGPU] Insert waterfall loops for divergent calls
Extend loadSRsrcFromVGPR to allow moving a range of instructions into
the loop. The call instruction is surrounded by copies into physical
registers which should be part of the waterfall loop.

Differential Revision: https://reviews.llvm.org/D88291
2020-10-12 17:16:11 +02:00
Sebastian Neubauer b4264210f2 [AMDGPU] Remove SIInstrInfo::calculateLDSSpillAddress
This function does not seem to be used anymore.

Differential Revision: https://reviews.llvm.org/D88904
2020-10-06 18:45:22 +02:00
Jay Foad e20f459229 [AMDGPU] Simplify getNumFlatOffsetBits. NFC.
Remove some checks that have already been done in the only caller.
2020-10-01 15:24:09 +01:00
Jay Foad 866d9b03f2 [AMDGPU] Tiny cleanup in isLegalFLATOffset. NFC. 2020-10-01 14:26:03 +01:00
Carl Ritson 1e0500d4f7 [AMDGPU] Consider all SGPR uses as unique in constant bus verify
Fix the verifier so that overlapping SGPR operands are counted
independently.  We cannot assume that overlapping SGPR accesses
only count as a single constant bus use.
The exception is implicit uses which do not add to constant bus
usage (only) when overlapping.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D87748
2020-09-24 10:52:40 +09:00
Matt Arsenault 0576f436e5 AMDGPU: Don't sometimes allow instructions before lowered si_end_cf
Since 6524a7a2b9, this would sometimes
not emit the or to exec at the beginning of the block, where it really
has to be. If there is an instruction that defines one of the source
operands, split the block and turn the si_end_cf into a terminator.

This avoids regressions when regalloc fast is switched to inserting
reloads at the beginning of the block, instead of spills at the end of
the block.

In a future change, this should always split the block.
2020-09-18 13:43:01 -04:00
Jay Foad 90777e2924 [AMDGPU] Enable scheduling around FP MODE-setting instructions
Pre-gfx10 all MODE-setting instructions were S_SETREG_B32 which is
marked as having unmodeled side effects, which makes the machine
scheduler treat it as a barrier. Now that we have proper implicit $mode
operands we can use a no-side-effects S_SETREG_B32_mode pseudo instead
for setregs that only touch the FP MODE bits, to give the scheduler more
freedom.

Differential Revision: https://reviews.llvm.org/D87446
2020-09-16 16:10:47 +01:00
Michael Liao bf41c4d29e [codegen] Ensure target flags are cleared/set properly. NFC.
- When an operand is changed into an immediate value or like, ensure their
  target flags being cleared or set properly.

Differential Revision: https://reviews.llvm.org/D87109
2020-09-03 18:37:39 -04:00
Piotr Sobczak 4e9d207117 [AMDGPU] Preserve vcc_lo when shrinking V_CNDMASK
There is no justification for changing vcc_lo to vcc
when shrinking V_CNDMASK, and such a change could
later confuse live variable analysis.

Make sure the original register is preserved.

Differential Revision: https://reviews.llvm.org/D86541
2020-08-27 10:22:50 +02:00