Commit Graph

6124 Commits

Author SHA1 Message Date
Sebastian Neubauer 13c0316239 [AMDGPU] Restrict immediate scratch offsets
gfx9 does not work with negative offsets, gfx10 works only with
aligned negative offsets, but not with unaligned negative offsets.

This is slightly more conservative than needed, gfx9 does support
negative offsets when a VGPR address is used and gfx10 supports
negative, unaligned offsets when an SGPR address is used, but we
do not make use of that with this patch.

Differential Revision: https://reviews.llvm.org/D101292
2021-05-07 14:51:32 +02:00
David Stuttard 606d4e8061 AMDGPU: Correct const_index_stride for wave 32 for PAL ABI
Retrying after revert and fix (removed implicit def flag from operand). Now
passes with expensive_checks enabled.

Since there is a single scratch resource descriptor for all shaders, if there is
a wave32 and a wave64 shader (for instance for VsFs pairs)
then the const_index_stride will be incorrect for wave32 shaders.

Differential Revision: https://reviews.llvm.org/D101830

Change-Id: Ie3b8b2921237968caca91527dd0c97b1b0cc0360
2021-05-07 13:42:57 +01:00
Simon Pilgrim 280aa3415e [DAG] Add a generic expansion for SHIFT_PARTS opcodes using funnel shifts
Based off a discussion on D89281 - where the AARCH64 implementations were being replaced to use funnel shifts.

Any target that has efficient funnel shift lowering can handle the shift parts expansion using the same expansion, avoiding a lot of duplication.

I've generalized the X86 implementation and moved it to TargetLowering - so far I've found that AARCH64 and AMDGPU benefit, but many other targets (ARM, PowerPC + RISCV in particular) could easily use this with a few minor improvements to their funnel shift lowering (or the folding of their target ops that funnel shifts lower to).

NOTE: I'm trying to avoid adding full SHIFT_PARTS legalizer handling as I think it might actually be possible to remove these opcodes in the medium-term and use funnel shift / libcall expansion directly.

Differential Revision: https://reviews.llvm.org/D101987
2021-05-07 13:12:30 +01:00
David Stuttard 793b4b2603 Revert "AMDGPU: Correct const_index_stride for wave 32 for PAL ABI"
This reverts commit 442de0c1ad.
2021-05-07 12:49:17 +01:00
David Stuttard 442de0c1ad AMDGPU: Correct const_index_stride for wave 32 for PAL ABI
Since there is a single scratch resource descriptor for all shaders, if there is
a wave32 and a wave64 shader (for instance for VsFs pairs)
then the const_index_stride will be incorrect for wave32 shaders.

Differential Revision: https://reviews.llvm.org/D101830

Change-Id: Id8de5566b0d1a07a814e2e7db016df9d20bf6d2c
2021-05-07 12:19:49 +01:00
Sebastian Neubauer 98e5ede604 [AMDGPU] Serialize MFInfo::ScavengeFI
Serialize ScavengeFI from SIMachineFunctionInfo into yaml.

ScavengeFI is not used outside of the PrologEpilogInserter,
so this shouldn't change anything.

Differential Revision: https://reviews.llvm.org/D101367
2021-05-07 11:15:25 +02:00
Stanislav Mekhanoshin c714d03785 [AMDGPU] Expose __builtin_amdgcn_perm for v_perm_b32
Differential Revision: https://reviews.llvm.org/D102022
2021-05-06 16:17:33 -07:00
Stanislav Mekhanoshin 28f1d018b1 [AMDGPU] Fix 64 bit DPP validation
AMDGPUAsmParser::isSupportedDPPCtrl() was failing to correctly
find a DPP register operand, regadless of the position it is
always src0. Moved this check into a new validateDPP() method
where we have full instruction already. In particular it was
failing to reject this case:

v_cvt_u32_f64 v5, v[0:1] quad_perm:[0,2,1,1] row_mask:0xf bank_mask:0xf

Essentially it was broken for any case where size of dst and
src0 differ.

It also improves the diagnostics with a proper error message.

The check in the InstPrinter also drops verification of the dst
register as it does not have anything to do with the dpp operand.

Differential Revision: https://reviews.llvm.org/D101930
2021-05-06 08:40:26 -07:00
Austin Kerbow 172d746e16 [AMDGPU][NFC] Fix typos in SIFormMemoryClauses description
NFC.
2021-05-06 07:47:39 -07:00
Jay Foad 9e026273b0 [AMDGPU] SIInsertHardClauses: move more stuff into the class. NFC. 2021-05-06 14:47:54 +01:00
Carl Ritson 67cfefebbb [AMDGPU] Fix WQM failure with single block inactive demote
Instruction test for inactive kill/demote needs to be based on
actual opcode not whether instruction would be lowered to demote.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D101966
2021-05-06 21:02:26 +09:00
Jay Foad 7c706af03b [AMDGPU] SIFoldOperands: clean up tryConstantFoldOp
First clean up the strange API of tryConstantFoldOp where it took an
immediate operand value, but no indication of which operand it was the
value for.

Second clean up the loop that calls tryConstantFoldOp so that it does
not have to restart from the beginning every time it folds an
instruction.

This is NFCI but there are some minor changes caused by the order in
which things are folded.

Differential Revision: https://reviews.llvm.org/D100031
2021-05-06 09:55:22 +01:00
Stanislav Mekhanoshin ab90ae6f47 [AMDGPU] Switch AnnotateUniformValues to MemorySSA
This shall speedup compilation and also remove threshold
limitations used by memory dependency analysis.

It also seem to fix the bug in the coalescer_remat.ll
where an SMRD load was used in presence of a potentially
clobbering store.

Fixes: SWDEV-272132

Differential Revision: https://reviews.llvm.org/D101962
2021-05-05 18:34:41 -07:00
Austin Kerbow 6617a5a5ea [AMDGPU] Move insertion of function entry waitcnt later
This allows tracking these as preexisting waitcnt.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D101380
2021-05-05 17:58:38 -07:00
Austin Kerbow f5199d7ae0 [AMDGPU] Revise handling of preexisting waitcnt
Preexisting waitcnt may not update the scoreboard if the instruction
being examined needed to wait on fewer counters than what was encoded in
the old waitcnt instruction. Fixing this results in the elimination of
some redudnat waitcnt.

These changes also enable combining consecutive waitcnt into a single
S_WAITCNT or S_WAITCNT_VSCNT instruction.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100281
2021-05-05 17:21:33 -07:00
RamNalamothu 41f8b8e807 [MCAsmInfo] Support UsesCFIForDebug for targets with no exception handling
This change enables emitting CFI unwind information for debugging purpose
for targets with MCAsmInfo::ExceptionsType == ExceptionHandling::None.

Currently generating CFI unwind information is entangled with supporting
the exceptions, even when AsmPrinter explicitly recognizes that the unwind
tables are being generated as debug information.

In fact, the unwind information is not generated even if we specify
--force-dwarf-frame-section, unless exceptions are enabled. The LIT test
llvm/test/CodeGen/AMDGPU/debug_frame.ll demonstrates this behavior.

Enable this option for AMDGPU to prepare for future patches which add
complete CFI support.

Reviewed By: dblaikie, MaskRay

Differential Revision: https://reviews.llvm.org/D78778
2021-05-06 04:53:45 +05:30
Vang Thao 7a41639c60 [AMDGPU][GlobalISel] Widen 1 and 2 byte scalar loads
Widen 1 and 2 byte scalar loads to 4 bytes when sufficiently
aligned to avoid using a global load.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D100430
2021-05-05 15:18:19 -07:00
Stanislav Mekhanoshin 909a5ccf3b [AMDGPU] Improve global SADDR selection
An address can be a uniform sum of two i64 bit values.
That regularly happens in a loop where index is an induction
variable promoted to 64 bit by the LSR. We can materialize
zero in a VGPR and still use SADDR form of the load.

Differential Revision: https://reviews.llvm.org/D101591
2021-05-05 14:44:21 -07:00
Matt Arsenault 8fc4eb9e73 AMDGPU/GlobalISel: Remove unnecessary override
This is the same as the default implementation
2021-05-05 17:35:02 -04:00
Matt Arsenault fa0b93b5a0 GlobalISel: Use DAG call lowering infrastructure in a more compatible way
Unfortunately the current call lowering code is built on top of the
legacy MVT/DAG based code. However, GlobalISel was not using it the
same way. In short, the DAG passes legalized types to the assignment
function, and GlobalISel was passing the original raw type if it was
simple.

I do believe the DAG lowering is conceptually broken since it requires
picking a type up front before knowing how/where the value will be
passed. This ends up being a problem for AArch64, which wants to pass
i1/i8/i16 values as a different size if passed on the stack or in
registers.

The argument type decision is split across 3 different places which is
hard to follow. SelectionDAG builder uses
getRegisterTypeForCallingConv to pick a legal type, tablegen gives the
illusion of controlling the type, and the target may have additional
hacks in the C++ part of the call lowering. AArch64 hacks around this
by not using the standard AnalyzeFormalArguments and special casing
i1/i8/i16 by looking at the underlying type of the original IR
argument.

I believe people have generally assumed the calling convention code is
processing the original types, and I've discovered a number of dead
paths in several targets.

x86 actually relies on the opposite behavior from AArch64, and relies
on x86_32 and x86_64 sharing calling convention code where the 64-bit
cases implicitly do not work on x86_32 due to using the pre-legalized
types.

AMDGPU targets without legal i16/f16 have always used a broken ABI
that promotes to i32/f32. GlobalISel accidentally fixed this to be the
ABI we should have, but this fixes it so we're using the worse ABI
that is compatible with the DAG. Ideally we would fix the DAG to match
the old GlobalISel behavior, but I don't wish to fight that battle.

A new native GlobalISel call lowering framework should let the target
process the incoming types directly.

CCValAssigns select a "ValVT" and "LocVT" but the meanings of these
aren't entirely clear. Different targets don't use them consistently,
even within their own call lowering code. My current belief is the
intent was "ValVT" is supposed to be the legalized value type to use
in the end, and and LocVT was supposed to be the ABI passed type
(which is also legalized).

With the default CCState::Analyze functions always passing the same
type for these arguments, these only differ when the TableGen part of
the lowering decide to promote the type from one legal type to
another. AArch64's i1/i8/i16 hack ends up inverting the meanings of
these values, so I had to add an additional hack to let the target
interpret how large the argument memory is.

Since targets don't consistently interpret ValVT and LocVT, this
doesn't produce quite equivalent code to the initial DAG
lowerings. I've opted to consistently interpret LocVT as the in-memory
size for stack passed values, and ValVT as the register type to assign
from that memory. We therefore produce extending loads directly out of
the IRTranslator, whereas the DAG would emit regular loads of smaller
values. This will also produce loads/stores that are wider than the
argument value if the allocated stack slot is larger (and there will
be undef padding bytes). If we had the optimizations to reduce
load/stores based on truncated values, this wouldn't produce a
different end result.

Since ValVT/LocVT are more consistently interpreted, we now will emit
more G_BITCASTS as requested by the CCAssignFn. For example AArch64
was directly assigning types to some physical vector registers which
according to the tablegen spec should have been casted to a vector
with a different element type.

This also moves the responsibility for inserting
G_ASSERT_SEXT/G_ASSERT_ZEXT from the target ValueHandlers into the
generic code, which is closer to how SelectionDAGBuilder works.

I had to xfail an x86 test since I don't see a quick way to fix it
right now (I filed bug 50035 for this). It's broken independently of
this change, and only triggers since now we end up with more ands
which hit the improperly handled selection pattern.

I also observed that FP arguments that need promotion (e.g. f16 passed
as f32) are broken, and use regular G_TRUNC and G_ANYEXT.

TLDR; the current call lowering infrastructure is bad and nobody has
ever understood how it chooses types.
2021-05-05 17:35:02 -04:00
Julien Pagès a1ed39df96 [AMDGPU] Select V_CVT_*16_F16 more often
Improve the code generation of fp_to_sint
and fp_to_uint for integer on 16-bits.

Differential Revision: https://reviews.llvm.org/D101481

Patch by Julien Pagès!
2021-05-05 08:57:51 +01:00
Baptiste Saleil 6a17609157 [AMDGPU] Disable the scalar IR, SDWA and load store vectorizer passes at -O1
This patch disables some of the passes at -O1. These passes have a significant
impact on compilation time, so we only want them to be enabled starting from -O2.

Differential Revision: https://reviews.llvm.org/D101414
2021-05-04 16:44:39 -04:00
David Stuttard 8f2948731e [AMDGPU][AsmParser] Correct the order of optional operands to mimg
Ordering of operands was incorrect meaning that a16 operand was treated as tfe

Differential Revision: https://reviews.llvm.org/D101618

Change-Id: I3b15e71ef5ff625f19f52823414ab684d76aca33
2021-05-04 13:14:48 +01:00
Stanislav Mekhanoshin 4d6ebe8ac0 [AMDGPU] Change FLAT Scratch SADDR to VADDR form in moveToVALU
Extend the legalization of global SADDR loads and stores
with changing to VADDR to the FLAT scratch instructions.

Differential Revision: https://reviews.llvm.org/D101408
2021-05-03 10:57:14 -07:00
Stanislav Mekhanoshin 89a94be16b [AMDGPU] Change FLAT SADDR to VADDR form in moveToVALU
Instead of legalizing saddr operand with a readfirstlane
when address is moved from SGPR to VGPR we can just
change the opcode.

Differential Revision: https://reviews.llvm.org/D101405
2021-05-03 10:36:26 -07:00
Sebastian Neubauer c0c8548b70 [AMDGPU] Do not annotate features for graphics
SITargetLowering::LowerFormalArguments asserts that none of these
features are used for graphics calling conventions, so
AnnotateKernelFeatures should not add them.

Differential Revision: https://reviews.llvm.org/D101534
2021-05-03 10:33:11 +02:00
Jay Foad 7e43483dd1 [AMDGPU] Remove set_gpr_idx instructions in conditional blocks
SIPreEmitPeephole did not try to remove redundant s_set_gpr_idx_*
instructions in blocks that end with a conditional branch instruction.
This seems like a simple oversight.

Differential Revision: https://reviews.llvm.org/D101629
2021-04-30 22:15:45 +01:00
Daniil Fukalov 3489c2d7b1 [TTI] NFC: Change getTypeLegalizationCost to return InstructionCost.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: sdesmalen, kparzysz

Differential Revision: https://reviews.llvm.org/D101533
2021-04-30 22:51:51 +03:00
David Stuttard a67a377014 [AMDGPU] Tidy up some simple expressions for clarity NFC
Slight refactor for clarity.

Change-Id: Ib25e7f4582c67a7c57f066cfd5382c1405d7d4c5

Differential Revision: https://reviews.llvm.org/D101610
2021-04-30 11:13:54 +01:00
Jay Foad f251379a91 [AMDGPU] Simplify getWaitStatesSince. NFC. 2021-04-30 08:58:24 +01:00
Christudasan Devadasan 544be70864 [AMDGPU] Skip promote-alloca for insertelement/insertvalue users
It is difficult to track the users of vector and aggregate types.

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D101562
2021-04-30 08:37:26 +05:30
Carl Ritson 424f1f6f96 [AMDGPU][NFC] Refactor hazard recognition IsHazardFn and IsExpiredFn
Refactor IsHazardFn and IsExpiredFn to use constant references as these should not be mutating the instructions visited and the instruction can never be null.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D101430
2021-04-30 09:18:56 +09:00
Carl Ritson 749702fc6b [AMDGPU] Remove dead early-out in GCNHazardRecognizer
Remove an early-out in wait state counting which can never be
taken.

Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D101520
2021-04-30 08:55:49 +09:00
Jay Foad 16d707e656 [AMDGPU] Fix v_swap_b32 formation on physical registers
As explained in the comments, matchSwap matches:

// mov t, x
// mov x, y
// mov y, t

and turns it into:

// mov t, x (t is potentially dead and move eliminated)
// v_swap_b32 x, y

On physical registers we don't have full use-def chains so the check
for T being live-out was not working properly with subregs/superregs.

Differential Revision: https://reviews.llvm.org/D101546
2021-04-29 20:53:40 +01:00
Alexey Bataev 12c51f2358 [COST] Improve shuffle kind detection if shuffle mask is provided.
Added an extra analysis for better choosing of shuffle kind in
getShuffleCost functions for better cost estimation if mask was
provided.

Differential Revision: https://reviews.llvm.org/D100865
2021-04-29 12:48:00 -07:00
Alexey Bataev 6e859f3cd4 Revert "[COST] Improve shuffle kind detection if shuffle mask is provided."
This reverts commit 9239932221 to fix
a compiler crash on mask checks.
2021-04-29 12:40:33 -07:00
Petar Avramovic c34900e133 AMDGPU/GlobalISel: Fix selection of image intrinsics with unused return
When atomic image intrinsic return value is unused, register class for
destination of a sub-register copy of return value ends up not being set.
This copy then hits 'Register class not set' assert later.
If return value has uses, register class is determined by use instruction.
Fix is to not create sub-register copy when image intrinsic destination has
no uses because it would be deleted by dead-mi-elimination later anyway.

Differential Revision: https://reviews.llvm.org/D101448
2021-04-29 20:56:03 +02:00
Alexey Bataev 9239932221 [COST] Improve shuffle kind detection if shuffle mask is provided.
Added an extra analysis for better choosing of shuffle kind in
getShuffleCost functions for better cost estimation if mask was
provided.

Differential Revision: https://reviews.llvm.org/D100865
2021-04-29 09:42:56 -07:00
Sebastian Neubauer 9569d5ba02 [AMDGPU] Allow buildSpillLoadStore in empty bb
This allows calling buildSpillLoadStore for an empty basic block, where
MI points at the end of the block instead of to an instruction.

This only happens with downstream CFI changes, so I was not able to
create a testcase that works with upstream LLVM.

Differential Revision: https://reviews.llvm.org/D101356
2021-04-29 12:53:20 +02:00
Joe Nash 168228d76a [AMDGPU] Make some VOP3 insts commutable
Note, only src0 and src1 will be commuted if the isCommutable flag
is set. This patch does not change that, it just makes it possible
to commute src0 and src1 of some U/I/B vop3 instructions.

This patch revises d35d8da7d6.
It contains the commute opportunities excluding float insts

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D101474

Change-Id: I62938173d750453839f2457a3851661a29135faf
2021-04-28 13:59:08 -04:00
Jay Foad 12011b5217 [AMDGPU] GCNHazardRecognizer: ignore all meta instructions
This is hopefully NFC, but should be more robust in ignoring all
instructions that should be ignored, instead of just some of them.

Differential Revision: https://reviews.llvm.org/D101372
2021-04-27 20:17:15 +01:00
Jay Foad dc2f6bf566 [AMDGPU] Minor refactoring in AMDGPUUnifyDivergentExitNodes. NFC.
Make unifyReturnBlockSet a member function so we don't have to pass TTI
around as an argument.
2021-04-27 14:21:51 +01:00
Petar Avramovic 8110fcc8fc AMDGPU/GlobalISel: Fix negative offset folding for buffer_load
Buffer_load does unsigned offset calculations. Don't fold
operands of 32-bit add that are likely to cause unsigned add
overflow (common case is when one of the operands is negative).

Differential Revision: https://reviews.llvm.org/D91336
2021-04-27 14:45:22 +02:00
Petar Avramovic fb7be0d912 AMDGPU/GlobalISel: Remove redundant G_FCANONICALIZE
Add basic version of isCanonicalized for global-isel. Copied from sdag.
Add post legalizer combine that deletes G_FCANONICALIZE when its input
is already Canonicalized.

Differential Revision: https://reviews.llvm.org/D96605
2021-04-27 12:26:37 +02:00
Petar Avramovic 4a9bc59867 AMDGPU/GlobalISel: Add integer med3 combines
Add signed and unsigned integer version of med3 combine.
Source pattern is min(max(Val, K0), K1) or max(min(Val, K1), K0)
where K0 and K1 are constants and K0 <= K1. Destination is med3
that corresponds to signedness of min/max in source.

Differential Revision: https://reviews.llvm.org/D90050
2021-04-27 11:52:23 +02:00
Baptiste Saleil caf1294d95 [AMDGPU] Experiments show that the GCNRegBankReassign pass significantly impacts
the compilation time and there is no case for which we see any improvement in
performance. This patch removes this pass and its associated test cases from
the tree.

Differential Revision: https://reviews.llvm.org/D101313

Change-Id: I0599169a7609c19a887f8d847a71e664030cc141
2021-04-26 17:21:49 -04:00
Sebastian Neubauer fcc40d9c17 [AMDGPU] Use MapVector for WWMReservedRegs
Use MapVector instead of SmallDenseMap because it has a deterministic
iteration order.

Differential Revision: https://reviews.llvm.org/D101299
2021-04-26 17:43:00 +02:00
Tim Renouf 8710eff6c3 [MC][AMDGPU][llvm-objdump] Synthesized local labels in disassembly
1. Add an accessor function to MCSymbolizer to retrieve addresses
   referenced by a symbolizable operand, but not resolved to a symbol.
   That way, the caller can synthesize labels at those addresses and
   then retry disassembling the section.

2. Implement that in AMDGPU -- a failed symbol lookup results in the
   address being added to a vector returned by the new function.

3. Use that in llvm-objdump when using MCSymbolizer (which only happens
   on AMDGPU) and SymbolizeOperands is on.

Differential Revision: https://reviews.llvm.org/D101145

Change-Id: I19087c3bbfece64bad5a56ee88bcc9110d83989e
2021-04-26 13:56:36 +01:00
Sebastian Neubauer 3366d81153 [AMDGPU] Save WWM registers in functions
The values of registers in inactive lanes needs to be saved during
function calls.

Save all registers used for whole wave mode, similar to how it is done
for VGPRs that are used for SGPR spilling.

Differential Revision: https://reviews.llvm.org/D99429

Reapply with fixed tests on window.
2021-04-23 18:09:24 +02:00
dfukalov 9ab17a60eb [TTI] NFC: Use InstructionCost to store ScalarizationCost in IntrinsicCostAttributes.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D101151
2021-04-23 18:02:00 +03:00
Sebastian Neubauer 22d99cb63f Revert "[AMDGPU] Save WWM registers in functions"
This reverts commit 91464c30bf.

Seems to break tests on windows.
2021-04-23 16:38:50 +02:00
Sebastian Neubauer 91464c30bf [AMDGPU] Save WWM registers in functions
The values of registers in inactive lanes needs to be saved during
function calls.

Save all registers used for whole wave mode, similar to how it is done
for VGPRs that are used for SGPR spilling.

Differential Revision: https://reviews.llvm.org/D99429
2021-04-23 16:09:31 +02:00
Matt Arsenault b58332774f AMDGPU: Fix assert on inline asm on gfx90a
This was assuming all mayLoad instructions have one def.
2021-04-23 09:00:25 -04:00
Matt Arsenault ed633a1daa AMDGPU: Restore atomic fp feature on FP atomic instruction definitions
9931b1f7a4 switched this to checking for
the two specific subtargets, instead of the dedicated feature. This
broke supporting functions which force added the feature when emitting
targets that do not actually support them. This stil does not work for
the targets that use the gfx6/7 or gfx10 encodings.
2021-04-22 21:32:01 -04:00
Jay Foad 79cb3ba08f [AMDGPU] SIWholeQuadMode: don't add duplicate implicit $exec operands
STRICT_WWM and STRICT_WQM are already defined with Uses = [EXEC], so
there is no need to add another implicit use of $exec when lowering them
to V_MOV_B32 instructions.

Differential Revision: https://reviews.llvm.org/D100969
2021-04-22 09:19:47 +01:00
Matt Arsenault 987e52851e AMDGPU: Fix assert when trying to fold reg_sequence of physreg copies 2021-04-21 21:58:18 -04:00
Stanislav Mekhanoshin f9d0d0d7e0 [AMDGPU] Lower regbanks reassign threshold to 15000
Let it work on a very small kernels only. Measurements showed
the performance benefit is not worth the compile time.

Differential Revision: https://reviews.llvm.org/D100904
2021-04-21 08:34:11 -07:00
dfukalov a8b35e0f52 [TTI] NFC: Change getVectorSplitCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D100952
2021-04-21 17:32:02 +03:00
Matt Arsenault 70ab76a81b AMDGPU: Fix indirect tail calls
Fix a selection error on uniform callees, and use a regular call if
divergent.
2021-04-21 09:15:24 -04:00
Jay Foad ec8c61efdf [AMDGPU] Allow multiple uses of the same literal
In GFX10 VOP3 can have a literal, which opens up the possibility of two
operands using the same literal value, which is allowed and only counts
as one use of the constant bus.

AMDGPUAsmParser::validateConstantBusLimitations already knew about this
but SIInstrInfo::verifyInstruction did not.

Differential Revision: https://reviews.llvm.org/D100770
2021-04-20 16:44:01 +01:00
Matt Arsenault 1cb8a9d595 AMDGPU/GlobalISel: Fix uitofp/sitofp with non-power-of-2 integers 2021-04-20 11:13:29 -04:00
Sebastian Neubauer 4897effb14 [AMDGPU] Add TransVALU to gfx10
Instructions on the transcendental unit are executed in parallel to the
normal VALU, so add this as an extra resource.

This doesn't seem to have any effect, but it should be more correct.

Differential Revision: https://reviews.llvm.org/D100123
2021-04-20 15:34:43 +02:00
Jay Foad 2aea830ec4 [AMDGPU] Use if instead of foreach in a few places. NFC. 2021-04-20 14:20:30 +01:00
Jay Foad edea476142 [AMDGPU] Use simpler alternatives to !foldl. NFC. 2021-04-20 12:59:04 +01:00
hsmahesha 840c4e4e90 [AMDGPU] Re-arrange ds_read/ds_write ISel pattern for better readability.
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D100773
2021-04-20 16:17:15 +05:30
Jay Foad b22721f01a [AMDGPU] GCNDPPCombine: don't shrink V_ADD_CO_U32 if carry out is used
Don't shrink VOP3 instructions if there are any uses of a carry-out
operand, because the shrunken form of the instruction would write the
carry-out to vcc instead of to a virtual register.

Differential Revision: https://reviews.llvm.org/D100760
2021-04-20 09:17:52 +01:00
madhur13490 6a4d9cb7e0 [AMDGPU] Remove error check for indirect calls and add missing queue-ptr
This patch removes -fixed-abi check for indirect calls
and also adds queue-ptr which is required for indirect calls to work.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D100633
2021-04-20 00:35:17 +05:30
Jay Foad a02aa91313 [AMDGPU] GCNDPPCombine: simplify API of isShrinkable. NFC. 2021-04-19 14:20:46 +01:00
Jay Foad ef443390a9 [AMDGPU] Remove MachineDCE after SIFoldOperands
Remove the MachineDCE pass after the first SIFoldOperands pass now
that SIFoldOperands deletes its own dead instructions.

Reapply after fixing dependent change D100188.

Differential Revision: https://reviews.llvm.org/D100189
2021-04-19 12:08:02 +01:00
Jay Foad 323ef0eb45 [AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs
This is fairly cheap to implement and means less work for future
passes like MachineDCE.

Reapply with a fix for using InstToErase after it had been erased.

Differential Revision: https://reviews.llvm.org/D100188
2021-04-19 12:05:41 +01:00
Dmitry Preobrazhensky bcc29e0fcf [AMDGPU][MC] Corrected parsing of carry in/out operands in VOP3
Disabled constants as carry in/out operands. See bug 48711.

Differential Revision: https://reviews.llvm.org/D100642
2021-04-19 13:42:31 +03:00
Yaxun (Sam) Liu 3597f02fd5 [AMDGPU] Add GlobalDCE before internalization pass
The internalization pass only internalizes global variables
with no users. If the global variable has some dead user,
the internalization pass will not internalize it.

To be able to internalize global variables with dead
users, a global dce pass is needed before the
internalization pass.

This patch adds that.

Reviewed by: Artem Belevich, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D98783
2021-04-17 11:25:25 -04:00
Serge Guelton d6de1e1a71 Normalize interaction with boolean attributes
Such attributes can either be unset, or set to "true" or "false" (as string).
throughout the codebase, this led to inelegant checks ranging from

        if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

to

        if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

Introduce a getValueAsBool that normalize the check, with the following
behavior:

no attributes or attribute set to "false" => return false
attribute set to "true" => return true

Differential Revision: https://reviews.llvm.org/D99299
2021-04-17 08:17:33 +02:00
Joe Nash a0ed70abde [AMDGPU] Remove redundant field from DPP8 def
These lines set the value to what it already was,
so they are redundant. NFC

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100664

Change-Id: Ibf6f27d50a7fa1f76c127f01b799821378bfd3b3
2021-04-16 16:23:52 -04:00
Joe Nash 919236e608 [AMDGPU] NFC, Comment in disassembler for dpp8
Gives reasoning for convertDPP8.
Also corrects typo in Operand type comment.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100665

Change-Id: I33ff269db8072d83e5e0ecdbfb731d6000fc26c4
2021-04-16 16:21:47 -04:00
Christudasan Devadasan 97618522dc [AMDGPU] Remove dead dcode (NFC). 2021-04-16 23:03:31 +05:30
Joe Nash 7cc4a02fa2 [AMDGPU] Refactor VOP3P Profile and AsmParser, NFC
Refactors VOP3P tablegen and the AsmParser for VOP3P
for better extensibility. NFC intended

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100602

Change-Id: I038e3a772ac348bb18979cdf3e3ae2e9476dd411
2021-04-16 13:06:50 -04:00
hsmahesha 099dcb68a6 [AMDGPU] Refactor ds_read/ds_write related select code for better readability.
Part of the code related to ds_read/ds_write ISel is refactored, and the
corresponding comment is re-written for better readability, which would help
while implementing any future ds_read/ds_write ISel related modifications.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100300
2021-04-16 08:24:00 +05:30
Stanislav Mekhanoshin 13015ebd6f [AMDGPU] Factor out predicate FmaakFmamkF32Insts
Differential Revision: https://reviews.llvm.org/D100409
2021-04-15 12:29:16 -07:00
Stanislav Mekhanoshin d4385e483d [AMDGPU] Add new EmitDstSel field to VOPPofile. NFC.
Differential Revision: https://reviews.llvm.org/D100589
2021-04-15 12:07:08 -07:00
hsmahesha 82787eb228 [AMDGPU] Move LDS lowering related utility functions to a separate utils file.
Move some utility functions which are used within LDS lowering pass to a separate utils
file so that other LDS related passes can make use of them when required.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D100526
2021-04-16 00:15:48 +05:30
Arthur Eubanks c8f0a7c215 [NewPM] Cleanup IR printing instrumentation
Being lazy with printing the banner seems hard to reason with, we should print it
unconditionally first (it could also lead to duplicate banners if we
have multiple functions in -filter-print-funcs).

The printIR() functions were doing too many things. I separated out the
call from PrintPassInstrumentation since we were essentially doing two
completely separate things in printIR() from different callers.

There were multiple ways to generate the name of some IR. That's all
been moved to getIRName(). The printing of the IR name was also
inconsistent, now it's always "IR Dump on $foo" where "$foo" is the
name. For a function, it's the function name. For a loop, it's what's
printed by Loop::print(), which is more detailed. For an SCC, it's the
list of functions in parentheses. For a module it's "[module]", to
differentiate between a possible SCC with a function called "module".

To preserve D74814, we have to check if we're going to print anything at
all first. This is unfortunate, but I would consider this a special
case that shouldn't be handled in the core logic.

Reviewed By: jamieschmeiser

Differential Revision: https://reviews.llvm.org/D100231
2021-04-15 09:50:55 -07:00
Sebastian Neubauer 7842e1725e [AMDGPU] Fix large return values with amdgpu_gfx
Returning in memory is not supported, so fall back to sret.
Also, extend i1 and i16 to i32. Otherwise, they would be passed through
memory.

Differential Revision: https://reviews.llvm.org/D100543
2021-04-15 14:57:56 +02:00
hsmahesha 4973b0c4e7 [AMDGPU] Disable forceful inline of non-kernel functions which use LDS.
Now since LDS uses within non-kernel functions are being handled in the
pass - LowerModuleLDS, we *NO* need to *forcefully* inline non-kernel
functions just because they use LDS. Do forceful inlining only when the
pass - LowerModuleLDS is not enabled. It is enabled by default.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D100481
2021-04-15 09:12:56 +05:30
Stanislav Mekhanoshin b7ebb25e53 [AMDGPU] Factor out SelectSAddrFI()
This is a service function generally useful for selection
of a FI in an SADDR. NFC for now, needed for future patch.

Differential Revision: https://reviews.llvm.org/D100406
2021-04-14 09:40:02 -07:00
Sander de Smalen 4f42d873c2 [TTI] NFC: Change getArithmeticInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100317
2021-04-14 17:20:36 +01:00
Sander de Smalen 1af35e77f4 [TTI] NFC: Change getVectorInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100315
2021-04-14 17:20:35 +01:00
Sander de Smalen 174e8f6c5e [TTI] NFC: Change getShuffleCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100314
2021-04-14 17:20:35 +01:00
Sander de Smalen 14b934f8a6 [TTI] NFC: Change getCFInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D100313
2021-04-14 17:20:34 +01:00
hsmahesha e3070db0f7 [AMDGPU] Rename "LDS lowering" pass name.
Rename the name of "LDS lowering" pass from `amdgpu-disable-lower-module-lds` to
`amdgpu-enable-lower-module-lds` as later is consistent and reads better.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D100441
2021-04-14 20:19:53 +05:30
Sebastian Neubauer 929edd4375 [AMDGPU] Mark scavenged SGPR as used
Otherwise it reuses the same register for storing the stack slot
offset if the stack slot offset is big.

Differential Revision: https://reviews.llvm.org/D100461
2021-04-14 14:55:01 +02:00
Sander de Smalen 2285dfb73f [TTI] NFC: Change getMinMaxReductionCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100202
2021-04-13 14:21:00 +01:00
Sander de Smalen bd86824d98 [TTI] NFC: Change getArithmeticReductionCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

This patch is practically NFC, with the exception of an AArch64 SVE related
cost-model change, where we can now return an Invalid cost instead of some
bogus number.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100201
2021-04-13 14:20:59 +01:00
madhur13490 5682ae2fc6 [AMDGPU] Set implicit arg attributes for indirect calls
This patch adds attributes corresponding to
implicits to functions/kernels if
1. it has an indirect call OR
2. it's address is taken.

Once such attributes are set, rest of the codegen would work
out-of-box for indirect calls. This patch eliminates
the potential overhead -fixed-abi imposes even though indirect functions
calls are not used.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D99347
2021-04-13 13:15:13 +00:00
Sebastian Neubauer 6cc91adf1e [AMDGPU] Kill temporary register after restoring
Not a correctness issue, but the temporary register is not used
afterwards and should be dead.

Differential Revision: https://reviews.llvm.org/D100295
2021-04-12 14:20:03 +02:00
Dmitry Preobrazhensky 67b39661c8 [AMDGPU][MC][NFC] Removed extra spaces
Fixed bugs 49646, 49647.

Differential Revision: https://reviews.llvm.org/D100173
2021-04-12 13:33:19 +03:00
Sebastian Neubauer 7a8e65dd3d [AMDGPU] Fix ubsan error
The RegScavenger can be null sometimes, so a pointer is needed.

Fixes UBSan error introduced in f9a8c6a0e5.
2021-04-12 12:14:00 +02:00
Sebastian Neubauer b76c2a6c2b [AMDGPU] Fix saving fp and bp
Spilling the fp or bp to scratch could overwrite VGPRs of inactive
lanes. Fix that by using only the active lanes of the scavenged VGPR.

This builds on the assumptions that
1. a function is never called with exec=0
2. lanes do not die in a function, i.e. exec!=0 in the function epilog
3. no new lanes are active when exiting the function, i.e. exec in the
   epilog is a subset of exec in the prolog.

Differential Revision: https://reviews.llvm.org/D96869
2021-04-12 11:52:55 +02:00
Sebastian Neubauer 32bc9a9bc3 [AMDGPU] Unify spill code
Instead of reimplementing spilling in prolog and epilog, reuse
buildSpillLoadStore.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D99269
2021-04-12 11:19:08 +02:00
Sebastian Neubauer f9a8c6a0e5 [AMDGPU] Save VGPR of whole wave when spilling
Spilling SGPRs to scratch uses a temporary VGPR. LLVM currently cannot
determine if a VGPR is used in other lanes or not, so we need to save
all lanes of the VGPR. We even need to save the VGPR if it is marked as
dead.

The generated code depends on two things:
- Can we scavenge an SGPR to save EXEC?
- And can we scavenge a VGPR?

If we can scavenge an SGPR, we
- save EXEC into the SGPR
- set the needed lane mask
- save the temporary VGPR
- write the spilled SGPR into VGPR lanes
- save the VGPR again to the target stack slot
- restore the VGPR
- restore EXEC

If we were not able to scavenge an SGPR, we do the same operations, but
everytime the temporary VGPR is written to memory, we
- write VGPR to memory
- flip exec (s_not exec, exec)
- write VGPR again (previously inactive lanes)

Surprisingly often, we are able to scavenge an SGPR, even though we are
at the brink of running out of SGPRs.
Scavenging a VGPR does not have a great effect (saves three instructions
if no SGPR was scavenged), but we need to know if the VGPR we use is
live before or not, otherwise the machine verifier complains.

Differential Revision: https://reviews.llvm.org/D96336
2021-04-12 11:01:38 +02:00
dfukalov 8f4b7e94a2 [AMDGPU][CostModel] Refine cost model for control-flow instructions.
Added cost estimation for switch instruction, updated costs of branches, fixed
phi cost.
Had to increase `-amdgpu-unroll-threshold-if` default value since conditional
branch cost (size) was corrected to higher value.
Test renamed to "control-flow.ll".

Removed redundant code in `X86TTIImpl::getCFInstrCost()` and
`PPCTTIImpl::getCFInstrCost()`.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D96805
2021-04-10 09:20:24 +03:00
Mitch Phillips 092f288d36 Revert "[AMDGPU] Remove MachineDCE after SIFoldOperands"
This reverts commit 5a0117b2d0.

Reason: Dependent change d19a42eba9 broke
the ASan buildbots.
2021-04-09 15:47:44 -07:00
Mitch Phillips 3d4730a73f Revert "[AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs"
This reverts commit d19a42eba9.

Reason: Broke the ASan buildbots. See the original phabricator review
for more details: https://reviews.llvm.org/D100188
2021-04-09 15:47:44 -07:00
Jay Foad 5a0117b2d0 [AMDGPU] Remove MachineDCE after SIFoldOperands
Remove the MachineDCE pass after the first SIFoldOperands pass now
that SIFoldOperands deletes its own dead instructions.

Differential Revision: https://reviews.llvm.org/D100189
2021-04-09 20:41:09 +01:00
Jay Foad d19a42eba9 [AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs
This is fairly cheap to implement and means less work for future
passes like MachineDCE.

Differential Revision: https://reviews.llvm.org/D100188
2021-04-09 20:41:09 +01:00
Jay Foad a4ced03d34 [AMDGPU] SIFoldOperands: eagerly delete dead copies
This is cheap to implement, means less work for future passes like
MachineDCE, and slightly improves the folding in some cases.

Differential Revision: https://reviews.llvm.org/D100117
2021-04-09 13:52:54 +01:00
Sebastian Neubauer cc7add5298 [AMDGPU] Use SIInstrFlags for flat variants. NFC
Use SIInstrFlags to differentiate between the different
variants of flat instructions (flat, global and scratch).
This should make it easier to bundle the immediate offset logic in a
single place and implement restrictions and bug workarounds.

Fixed version of D99587, which does not rely on the address space.

Differential Revision: https://reviews.llvm.org/D99743
2021-04-09 12:28:36 +02:00
dfukalov d066079728 [NFC][AA] Prepare to convert AliasResult to class with PartialAlias offset.
Main reason is preparation to transform AliasResult to class that contains
offset for PartialAlias case.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98027
2021-04-09 12:54:22 +03:00
Sebastian Neubauer 36138db116 [AMDGPU] IsFlatScratch/Global -> FlatScratch/Global
Remove 'Is' from IsFlatScratch/Global. NFC

Differential Revision: https://reviews.llvm.org/D100108
2021-04-09 11:20:31 +02:00
Konstantin Zhuravlyov 4fae63c612 AMDGPU: Add gfx90c support to code object v2 for backwards compatibility
Differential Revision: https://reviews.llvm.org/D100126
2021-04-08 16:42:43 -04:00
Stanislav Mekhanoshin 627dab3dbf [AMDGPU] Check for all meta instrs in GCNRegBankReassign
It used to work correctly even with a KILL, but there is
no reason to consider meta instructions since they do not
create real HW uses.

Differential Revision: https://reviews.llvm.org/D100135
2021-04-08 13:41:10 -07:00
Stanislav Mekhanoshin 189310a140 [AMDGPU] Allow -amdgpu-unsafe-fp-atomics to ignore denorm mode
Fixes: SWDEV-274276

Differential Revision: https://reviews.llvm.org/D100072
2021-04-08 12:46:36 -07:00
Jay Foad a1a372dfb5 [AMDGPU] SIFoldOperands: remove an unneeded isReg check. NFC. 2021-04-08 16:37:43 +01:00
Jay Foad a250e91d10 [AMDGPU] SIFoldOperands: make use of emplace_back. NFC. 2021-04-08 14:34:10 +01:00
Jay Foad 2724b57ecd [AMDGPU] SIFoldOperands: remove an unneeded make_early_inc_range. NFC. 2021-04-08 14:32:36 +01:00
Jay Foad c28f79a0e3 [AMDGPU] SIFoldOperands: try harder to fold cndmask instructions
Look through copies to find more cases where the two values being
selected are identical. The motivation for this is just to be able to
remove the weird special case where tryFoldCndMask was called from
foldInstOperand, part way through folding a move-immediate into its
users, without regressing any lit tests.
2021-04-08 14:26:12 +01:00
Jay Foad 3344cd3a14 [AMDGPU] SIFoldOperands: make tryFoldCndMask a member function. NFC. 2021-04-08 14:05:29 +01:00
Sebastian Neubauer c10cc4ea27 [AMDGPU] Fix computing live registers in prolog
ScratchExecCopy needs to be marked as live, we cannot use that register
while EXEC is stored in there.

Marking SGPRForFPSaveRestoreCopy and SGPRForBPSaveRestoreCopy as
available is unnecessary, they should not be live at that point anway.

Differential Revision: https://reviews.llvm.org/D100098
2021-04-08 14:52:50 +02:00
Jay Foad 94a6fe43de [AMDGPU] SIFoldOperands: refactor tryFoldCndMask with early-outs. NFC. 2021-04-08 13:16:07 +01:00
hsmahesha ac64995ceb [AMDGPU] Only use ds_read/write_b128 for alignment >= 16
PS: Submitting on behalf of Jay.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100008
2021-04-08 08:12:05 +05:30
Stanislav Mekhanoshin 37878de503 Disable use of SCC bit from asm
Differential Revision: https://reviews.llvm.org/D100069
2021-04-07 15:32:17 -07:00
Tony Tye 4658cd4c18 [AMDGPU] Update gfx90a memory model support
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100070
2021-04-07 22:17:58 +00:00
Stanislav Mekhanoshin d5d412f2ae [AMDGPU] Split GCNRegBankReassign
Allow pass to work separately with SGPR, VGPR registers or both.
This is NFC now but will be needed to split RA for separate
SGPR and VGPR passes.

Differential Revision: https://reviews.llvm.org/D100063
2021-04-07 14:45:13 -07:00
Sebastian Neubauer 2dc6be5209 [AMDGPU] Update SGPRSpillVGPRCSR name. NFC
The struct is used for both, callee and caller-save registers now.
The frame index is not set for entrypoints, as we do not need to save
the registers then.
Update the struct name to reflect that.

Differential Revision: https://reviews.llvm.org/D99722
2021-04-07 16:30:40 +02:00
Jay Foad bf6cab6f07 [AMDGPU] SIFoldOperands: don't dump extra '\n' after MachineInstr. NFC. 2021-04-07 14:13:00 +01:00
Jay Foad 8f798566a3 [AMDGPU] SIFoldOperands: use isUseMIInFoldList. NFC. 2021-04-06 17:53:48 +01:00
Konstantin Zhuravlyov 844012940e AMDGPU: Add isBranch=1 to SOPP branch instructions
Differential Revision: https://reviews.llvm.org/D99955
2021-04-06 10:59:30 -04:00
Jay Foad efc7bf27f5 [AMDGPU] SIFoldOperands: use MachineRegisterInfo::hasOneNonDBGUser
NFC.
2021-04-06 15:23:58 +01:00
Jay Foad 005dcd196e [AMDGPU] SIFoldOperands: use range-based loops and make_early_inc_range
NFC.
2021-04-06 15:23:58 +01:00
Jay Foad ce9cca6c3a [AMDGPU] SIFoldOperands: rename tryFoldInst to tryFoldCndMask
This follows the pattern of the other tryFold* functions. NFC.
2021-04-06 15:23:58 +01:00
Jay Foad cf4f5292f6 [AMDGPU] SIFoldOperands: use getVRegDef instead of getUniqueVRegDef
We are in SSA so getVRegDef is equivalent but simpler. NFC.
2021-04-06 15:23:58 +01:00
Jay Foad e9608a84d8 [AMDGPU][SDag] Add IMG init also for image_gather4 instructions
This fixes an oversight in D99747 which moved the IMG init code from
SIAddIMGInit to AdjustInstrPostInstrSelection, but did not set the
hasPostISelHook flag on gather4 instructions.

Differential Revision: https://reviews.llvm.org/D99953
2021-04-06 14:47:20 +01:00
Dmitry Preobrazhensky 3eadcb86ab [AMDGPU][MC][GFX9] Corrected SMEM decoding
Corrected SMEM decoding when IMM=0 and OFFSET>127

Fixed bug 49819 (https://bugs.llvm.org/show_bug.cgi?id=49819)

Differential Revision: https://reviews.llvm.org/D99804
2021-04-06 14:10:46 +03:00
Jay Foad fdc4f19e2f [AMDGPU] Remove SIAddIMGInit pass which is now unused
Differential Revision: https://reviews.llvm.org/D99748
2021-04-01 18:13:17 +01:00
Jay Foad 3d07a6d891 [AMDGPU][GlobalISel] Add IMG init in selectImageIntrinsic
Doing this during instruction selection avoids the cost of running
SIAddIMGInit which is yet another pass over the MIR.

Differential Revision: https://reviews.llvm.org/D99670
2021-04-01 18:13:17 +01:00
Jay Foad 4af6251cea [AMDGPU][SDag] Add IMG init in AdjustInstrPostInstrSelection
Doing this in a post-isel hook avoids the cost of running SIAddIMGInit
which is yet another pass over the MIR.

Differential Revision: https://reviews.llvm.org/D99747
2021-04-01 18:13:17 +01:00
Jay Foad b1fbfd9e4c [AMDGPU] Small cleanup to constructRetValue and its caller. NFC. 2021-04-01 16:36:16 +01:00
Brendon Cahoon 65c8bfb509 [AMDGPU] Enable output modifiers for double precision instructions
Update SIFoldOperands pass to recognize v_add_f64 and v_mul_f64
instructions for folding output modifiers.

Differential Revision: https://reviews.llvm.org/D99505
2021-04-01 10:08:17 -04:00
Dmitry Preobrazhensky cd953434f2 [AMDGPU][MC][GFX10][GFX90A] Corrected _e32/_e64 suffices
Fixed bugs https://bugs.llvm.org//show_bug.cgi?id=49643, https://bugs.llvm.org//show_bug.cgi?id=49644, https://bugs.llvm.org//show_bug.cgi?id=49645.

Differential Revision: https://reviews.llvm.org/D99413
2021-04-01 14:21:00 +03:00
Dmitry Preobrazhensky 0f5ebbcc7f [AMDGPU][MC] Added flag to identify VOP instructions which have a single variant
By convention, VOP1/2/C instructions which can be promoted to VOP3 have _e32 suffix while promoted instructions have _e64 suffix. Instructions which have a single variant should have no _e32/_e64 suffix. Unfortunately there was no simple way to identify single variant instructions - it was implemented by a hack. See bug https://bugs.llvm.org/show_bug.cgi?id=39086.

This fix simplifies handling of single VOP instructions by adding a dedicated flag.

Differential Revision: https://reviews.llvm.org/D99408
2021-04-01 13:53:12 +03:00
Sander de Smalen 2f6f249a49 NFC: Change getIntrinsicInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Depends on D97468

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D97469
2021-03-31 14:04:41 +01:00
Sander de Smalen 2f56e1c6b1 NFC: Change getTypeBasedIntrinsicCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Depends on D97466

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D97468
2021-03-31 14:04:41 +01:00
Jay Foad 5d0e9ddfa5 [AMDGPU][GlobalISel] Add support for global atomicrmw fadd
This includes gfx908 which only has a no-return version of the
global_atomic_add_f32 instruction, using the same hack that was
previously implemented for selecting from the
llvm.amdgcn.global.atomic.fadd intrinsic.

Differential Revision: https://reviews.llvm.org/D97767
2021-03-31 11:13:00 +01:00
Tomas Matheson a9968c0a33 [NFC][CodeGen] Tidy up TargetRegisterInfo stack realignment functions
Currently needsStackRealignment returns false if canRealignStack returns false.
This means that the behavior of needsStackRealignment does not correspond to
it's name and description; a function might need stack realignment, but if it
is not possible then this function returns false. Furthermore,
needsStackRealignment is not virtual and therefore some backends have made use
of canRealignStack to indicate whether a function needs stack realignment.

This patch attempts to clarify the situation by separating them and introducing
new names:

 - shouldRealignStack - true if there is any reason the stack should be
   realigned

 - canRealignStack - true if we are still able to realign the stack (e.g. we
   can still reserve/have reserved a frame pointer)

 - hasStackRealignment = shouldRealignStack && canRealignStack (not target
   customisable)

Targets can now override shouldRealignStack to indicate that stack realignment
is required.

This change will make it easier in a future change to handle the case where we
need to realign the stack but can't do so (for example when the register
allocator creates an aligned spill after the frame pointer has been
eliminated).

Differential Revision: https://reviews.llvm.org/D98716

Change-Id: Ib9a4d21728bf9d08a545b4365418d3ffe1af4d87
2021-03-30 17:31:39 +01:00
Sebastian Neubauer 1c3b74f0ab [AMDGPU] Remove outdated TODOs. NFC
spillSGPRToVGPR is already respected in these places since D95768.

Differential Revision: https://reviews.llvm.org/D99570
2021-03-30 15:18:49 +02:00
Stanislav Mekhanoshin 619b88849e [AMDGPU] Fix "Sequence" spelling. NFC. 2021-03-29 12:11:36 -07:00
Joe Nash 45fd7c02af Revert "[AMDGPU] Mark additional VOP3 as commutable"
This reverts commit d35d8da7d6.
2021-03-29 14:48:11 -04:00
Joe Nash d35d8da7d6 [AMDGPU] Mark additional VOP3 as commutable
Note, only src0 and src1 will be commuted if the isCommutable flag
is set. This patch does not change that, it just makes it possible
to commute src0 and src1 of more instructions.

Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D99376

Change-Id: I61e20490962d95ea429beb355c55f55c024dafdc
2021-03-29 14:22:20 -04:00
Jay Foad 9d08f276d7 [AMDGPU] Use reductions instead of scans in the atomic optimizer
If the result of an atomic operation is not used then it can be more
efficient to build a reduction across all lanes instead of a scan. Do
this for GFX10, where the permlanex16 instruction makes it viable. For
wave64 this saves a couple of dpp operations. For wave32 it saves one
readlane (which are generally bad for performance) and one dpp
operation.

Differential Revision: https://reviews.llvm.org/D98953
2021-03-26 15:38:14 +00:00
Jay Foad d92b4956d6 [AMDGPU] Inline FSHRPattern into its only use. NFC. 2021-03-26 09:32:02 +00:00
Konstantin Zhuravlyov f4ace63737 AMDGPU: Add target id and code object v4 support
- Add target id support (https://clang.llvm.org/docs/ClangOffloadBundler.html#target-id)
  - Add code object v4 support (https://llvm.org/docs/AMDGPUUsage.html#elf-code-object)
    - Add kernarg_size to kernel descriptor
    - Change trap handler ABI to no longer move queue pointer into s[0:1]
  - Cleanup ELF definitions
    - Add V2, V3, V4 suffixes to make a clear distinction for code object version
    - Consolidate note names

Differential Revision: https://reviews.llvm.org/D95638
2021-03-24 11:54:05 -04:00
Sander de Smalen 55d18b3cc2 [TTI] Return a TypeSize from getRegisterBitWidth.
This patch changes the interface to take a RegisterKind, to indicate
whether the register bitwidth of a scalar register, fixed-width vector
register, or scalable vector register must be returned.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D98874
2021-03-24 14:45:13 +00:00
alex-t dccf83acf9 [AMDGPU] SIOptimizeExecMaskingPreRA should check constant bus constraint when folds EXEC copy
Folding EXEC copy into it's single use may lead to constant bus constraint violation as it adds one more SGPR operand.
         This change makes it validate the user instruction with the new SGPR operand and only fold it if it is legal.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D98888
2021-03-24 14:14:13 +03:00
Jay Foad fd142e6c18 [AMDGPU] Simplify AMDGPUAnnotateUniformValues::visitBranchInst. NFC.
A BranchInst is always the terminator of its containing BasicBlock.
2021-03-23 16:54:43 +00:00
Joe Nash 538bda0b80 [AMDGPU] Refactor DPPCombine
NFC. Extract IsShrinkable into a helper function, and
make Subtarget a member variable.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D99099

Change-Id: If4bc97a88a9ae4eb1df47e717345d46a6ed515bf
2021-03-23 11:53:53 -04:00
Jay Foad fc7e3e7dd9 [AMDGPU] Set SchedRW on real instructions
Coyp SchedRW from pseudos to real instructions so that llvm-mca has
access to it. This is NFC for normal compiler codegen, which schedules
pseudos not real instructions.

Add an llvm-mca test for some high latency double-precision instructions
as a smoke test.

Differential Revision: https://reviews.llvm.org/D99187
2021-03-23 15:38:11 +00:00
Matt Arsenault b24436ac96 GlobalISel: Lower funnel shifts 2021-03-23 09:11:17 -04:00
Pushpinder Singh d0e5422eb8 [GlobalISel][AMDGPU] Lower G_UMULO/G_SMULO
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D93963
2021-03-23 05:45:43 +00:00
Carl Ritson 64db6b8d37 [AMDGPU] Only unbundle memory accesses in SIMemoryLegalizer
This restores previous behaviour and is a step toward removing
unbundling entirely.

Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D99061
2021-03-23 11:30:36 +09:00
Matt Arsenault 1dd23c6d53 AMDGPU: Allow tail calls for amdgpu_gfx functions 2021-03-22 10:55:19 -04:00
Matt Arsenault a0f5aad6d7 AMDGPU: Fix allowing immediates for tail call pseudo.
The pseudo was using SSrc_b64, so it allowed folding immediates into
the destination operand for a tail call to null. However, this is not
a valid operand for the s_setpc_b64 this will be lowered to. Avoids
printing the operand as an invalid immediate.

Avoids a regression when tail calls are enabled in GlobalISel (somehow
tail calls to null get deleted in the DAG).
2021-03-21 13:14:04 -04:00
Matt Arsenault 6314a72730 AMDGPU/GlobalISel: Enable CSE in pre-legalizer combiner 2021-03-21 10:07:37 -04:00
Carl Ritson 6c9cac5da1 [AMDGPU] Add MDT update missing from D98915 2021-03-20 13:38:58 +09:00
Carl Ritson fe5f4c397f [AMDGPU] Rename SIInsertSkips Pass
Pass no longer handles skips.  Pass now removes unnecessary
unconditional branches and lowers early termination branches.
Hence rename to SILateBranchLowering.

Move code to handle returns to epilog from SIPreEmitPeephole
into SILateBranchLowering. This means SIPreEmitPeephole only
contains optional optimisations, and all required transforms
are in SILateBranchLowering.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D98915
2021-03-20 11:48:04 +09:00
Carl Ritson 5df2af8b0e [AMDGPU] Merge SIRemoveShortExecBranches into SIPreEmitPeephole
SIRemoveShortExecBranches is an optimisation so fits well in the
context of SIPreEmitPeephole.

Test changes relate to early termination from kills which have now
been lowered prior to considering branches for removal.
As these use s_cbranch the execz skips are now retained instead.
Currently either behaviour is valid as kill with EXEC=0 is a nop;
however, if early termination is used differently in future then
the new behaviour is the correct one.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D98917
2021-03-20 11:26:42 +09:00
Carl Ritson b76c09023d [AMDGPU] Allow index optimisation in SIPreEmitPeephole for bundles
Add code so duplication index register changes can be removed from
inside bundles.

Reviewed By: rampitec, foad

Differential Revision: https://reviews.llvm.org/D98940
2021-03-20 10:26:23 +09:00
Stanislav Mekhanoshin 57effe2205 [AMDGPU] Remove dead glc1 handing in asm parser. NFC. 2021-03-19 08:37:47 -07:00
Jay Foad 5a5a531214 [AMDGPU] Remove some redundant code. NFC.
This is redundant because we have already checked that we can't handle
divergent 64-bit atomic operands.
2021-03-19 11:36:15 +00:00
Jay Foad 5dd5ddcb41 [AMDGPU] Skip building some IR if it won't be used. NFC. 2021-03-19 11:36:14 +00:00
Jay Foad c96dfe0d8b [AMDGPU] Sink Intrinsic::getDeclaration calls to where they are used. NFC. 2021-03-19 11:36:14 +00:00
Stanislav Mekhanoshin edd6da10d2 [AMDGPU] Remove cpol, tfe, and swz from MUBUF patterns
These are always selected as 0 anyway.

Differential Revision: https://reviews.llvm.org/D98663
2021-03-18 14:36:04 -07:00
Stanislav Mekhanoshin 961e4384f4 [AMDGPU] Support SCC on buffer atomics
Differential Revision: https://reviews.llvm.org/D98731
2021-03-18 09:56:14 -07:00
Stanislav Mekhanoshin 3f37c28230 [AMDGPU] Remove unused template parameters of MUBUF_Real_AllAddr_vi
Differential Revision: https://reviews.llvm.org/D98804
2021-03-18 09:02:38 -07:00
Jon Chesterfield 253f804deb [amdgpu] Update med3 combine to skip i64
[amdgpu] Update med3 combine to skip i64

Fixes an assumption that a type which is not i32 will be i16. This asserts
when trying to sign/zero extend an i64 to i32.

Test case was cut down from an openmp application. Variations on it are hit by
other combines before reaching the problematic one, e.g. replacing the
immediate values with other function arguments changes the codegen path and
misses this combine.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D98872
2021-03-18 15:56:41 +00:00
Matt Arsenault b9a0384983 GlobalISel: Preserve source value information for outgoing byval args
Pass through the original argument IR value in order to preserve the
aliasing information in the memcpy memory operands.
2021-03-18 09:16:54 -04:00
Carl Ritson 1a4bc3aba3 [AMDGPU] Avoid unnecessary graph visits during WQM marking
Avoid revisiting nodes with the same set of defined lanes by
using a unified visited set which integrates lanes into the key.
This retains the intent of the original code by still revisiting
a subgraph if a different set of lanes is defined and hence
marking might progress differently.

Note: default size of the visited set has been confirmed to
cover >99% of invocations in large array of test shaders.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D98772
2021-03-18 10:00:41 +09:00
David Green e2935dcfc4 [TTI] Add a Mask to getShuffleCost
This adds an Mask ArrayRef to getShuffleCost, so that if an exact mask
can be provided a more accurate cost can be provided by the backend.
For example VREV costs could be returned by the ARM backend. This should
be an NFC until then, laying the groundwork for that to be added.

Differential Revision: https://reviews.llvm.org/D98206
2021-03-17 17:46:26 +00:00
Jay Foad 967b64beb4 [AMDGPU] Split dot2-insts feature
Split out some of the instructions predicated on the dot2-insts target
feature into a new dot7-insts, in preparation for subtargets that have
some but not all of these instructions. NFCI.

Differential Revision: https://reviews.llvm.org/D98717
2021-03-17 09:42:21 +00:00
RamNalamothu 43f2d269b3 [AMDGPU, NFC] Refactor FP/BP spill index code in emitPrologue/emitEpilogue
Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D98617
2021-03-16 19:19:45 +05:30
Dmitry Preobrazhensky 596db9934b [AMDGPU][MC] Disabled lds_direct for GFX90a
Fixed bug 49382.

Differential Revision: https://reviews.llvm.org/D98626
2021-03-16 13:52:36 +03:00
Stanislav Mekhanoshin bc27a31801 [AMDGPU] Fix copyPhysReg to not produce unalined vgpr access
RA can insert something like a sub1_sub2 COPY of a wide VGPR
tuple which results in the unaligned acces with v_pk_mov_b32
after the copy is expanded. This is regression after D97316.

Differential Revision: https://reviews.llvm.org/D98549
2021-03-15 14:14:30 -07:00
Stanislav Mekhanoshin c297709ee1 [AMDGPU] Fixed msan failure with uninitialized value 2021-03-15 13:58:19 -07:00
Stanislav Mekhanoshin 3bffb1cd0e [AMDGPU] Use single cache policy operand
Replace individual operands GLC, SLC, and DLC with a single cache_policy
bitmask operand. This will reduce the number of operands in MIR and I hope
the amount of code. These operands are mostly 0 anyway.

Additional advantage that parser will accept these flags in any order unlike
now.

Differential Revision: https://reviews.llvm.org/D96469
2021-03-15 13:00:59 -07:00
Jon Chesterfield 13e49dcee4 [amdgpu] Implement lower function LDS pass
[amdgpu] Implement lower function LDS pass

Local variables are allocated at kernel launch. This pass collects global
variables that are used from non-kernel functions, moves them into a new struct
type, and allocates an instance of that type in every kernel. Uses are then
replaced with a constantexpr offset.

Prior to this pass, accesses from a function are compiled to trap. With this
pass, most such accesses are removed before reaching codegen. The trap logic
is left unchanged by this pass. It is still reachable for the cases this pass
misses, notably the extern shared construct from hip and variables marked
constant which survive the optimizer.

This is of interest to the openmp project because the deviceRTL runtime library
uses cuda shared variables from functions that cannot be inlined. Trunk llvm
therefore cannot compile some openmp kernels for amdgpu. In addition to the
unit tests attached, this patch applied to ROCm llvm with fixed-abi enabled
and the function pointer hashing scheme deleted passes the openmp suite.

This lowering will use more LDS than strictly necessary. It is intended to be
a functionally correct fallback for cases that are difficult to target from
future optimisation passes.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D94648
2021-03-15 15:24:01 +00:00
Carl Ritson 13877db2fa [AMDGPU] Fix shortfalls in WQM marking
When tracking defined lanes through phi nodes in the live range
graph each branch of the phi must be handled independently.
Also rewrite the marking algorithm to reduce unnecessary
operations.

Previously a shared set of defined lanes was used which caused
marking to stop prematurely. This was observable in existing lit
tests, but test patterns did not cover this detail.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D98614
2021-03-15 21:44:15 +09:00
Jay Foad 5d48b45ce3 [AMDGPU] Use depth first iterator instead of recursive DFS. NFCI.
The reason for this is to avoid deep recursion in DFS() which can cause
stack overflow on large CFGs, especially on Windows.

Differential Revision: https://reviews.llvm.org/D98528
2021-03-15 10:32:55 +00:00
Stanislav Mekhanoshin 315ebe0df3 [AMDGPU] Fix getAlignedAGPRClassID
Not all register classes were listed.

Differential Revision: https://reviews.llvm.org/D98550
2021-03-12 14:41:50 -08:00
Matt Arsenault 6b76d82853 GlobalISel: Fix marking byval arguments as immutable
byval arguments need to be assumed writable. Only implicitly stack
passed arguments which aren't addressable in the IR can be assumed
immutable.

Mips is still broken since for some reason its doing its own thing
with the ValueHandlers (and x86 doesn't actually handle byval
arguments now, although some of the code is there).
2021-03-12 09:01:53 -05:00
Matt Arsenault 3231d2b581 AMDGPU/GlobalISel: Cleanup call lowering sequence
Now that handleAssignments is handling all of the argument splitting,
we don't have to move the insert point around.
2021-03-12 09:01:52 -05:00
Carl Ritson f08dadd242 [AMDGPU] Do not annotate an else branch if there is a kill
As llvm.amdgcn.kill is lowered to a terminator it can cause
else branch annotations to end up in the wrong block.
Do not annotate conditionals as else branches where there is
a kill to avoid this.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D97427
2021-03-12 11:52:08 +09:00
Carl Ritson c07f2025e4 [AMDGPU] Restrict image_msaa_load to MSAA dimension types
This instruction is only valid on 2D MSAA and 2D MSAA Array
surfaces.  Remove intrinsic support for other dimension types,
and block assembly for unsupported dimensions.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D98397
2021-03-12 09:47:24 +09:00
Ruiling Song e8e6817d00 [AMDGPU] Don't check hasStackObjects() when reserving VGPR
We have amdgpu_gfx functions that have high register pressure. If
we do not reserve VGPR for SGPR spill, we will fall into the path
to spill the SGPR to memory, which does not only have correctness issue,
but also have really bad performance.

I don't know why there is the check for hasStackObjects(), in our case,
we don't have stack objects at the time of finalizeLowering(). So just
remove the check that we always reserve a VGPR for possible SGPR spill
in non-entry functions.

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D98345
2021-03-12 08:11:14 +08:00
Ruiling Song 4cee5cad28 [AMDGPU] Free reserved VGPR if no SGPR spill
I met some code generation behavior change when I tried to remove
the hasStackObject() check when reserving VGPR for SGPR spill.
For example, the function `callee_no_stack_no_fp_elim_all` in the lit
test file `callee-frame-setup.ll`.
The generated code changed from:
```
  s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
  s_mov_b32 s4, s33
  s_mov_b32 s33, s32
  s_mov_b32 s33, s4
  s_setpc_b64 s[30:31]
```

into something like:
```
  s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
  v_writelane_b32 v63, s33, 0
  s_mov_b32 s33, s32
  v_readlane_b32 s33, v63, 0
  s_setpc_b64 s[30:31]
```

I think we still prefer the old version where only scalar instructions are needed.
The idea here is free the reserved VGPR if no SGPR spills. So we will very likely
to use a free SGPR for fp/sp spill.

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D98344
2021-03-12 08:11:14 +08:00
Stanislav Mekhanoshin 6e8a0213a3 [AMDGPU] Remove dead MTBUF patterns
These patterns are obviously dead, they are using format
operand which is not selected and we have no corresponding
SelectMUBUF() function.

Differential Revision: https://reviews.llvm.org/D98451
2021-03-11 14:13:00 -08:00
Matt Arsenault 70cb57d7da AMDGPU/GlobalISel: Improve private addressing mode matching
This enables the look-through-copy to hack around not correctly
regbankselecting constants to match the use bank.
2021-03-11 10:23:35 -05:00
Nikita Popov 46354bac76 [OpaquePtrs] Remove some uses of type-less CreateLoad APIs (NFC)
Explicitly pass loaded type when creating loads, in preparation
for the deprecation of these APIs.

There are still a couple of uses left.
2021-03-11 14:40:57 +01:00
Ruiling Song 66340846b3 [AMDGPU] Always create Stack Object for reserved VGPR
As we may overwrite inactive lanes of a caller-save-vgpr, we should
always save/restore the reserved vgpr for sgpr spill.

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D98319
2021-03-11 10:06:07 +08:00
Stanislav Mekhanoshin 9931b1f7a4 [AMDGPU] Disable SCC bit on fp atomics
Differential Revision: https://reviews.llvm.org/D98221
2021-03-10 12:36:09 -08:00
Stanislav Mekhanoshin 574a9dabc6 [AMDGPU] Always expand system scope fp atomics on gfx90a
FP atomics in system scope cannot be used and shall always
be expanded in a CAS loop.

Differential Revision: https://reviews.llvm.org/D98085
2021-03-10 12:35:23 -08:00
Jay Foad 70f013fd3b [AMDGPU] Fix isReallyTriviallyReMaterializable for V_MOV_*
D57708 changed SIInstrInfo::isReallyTriviallyReMaterializable to reject
V_MOVs with extra implicit operands, but it accidentally rejected all
V_MOVs because of their implicit use of exec. Fix it but avoid adding a
moderately expensive call to MI.getDesc().getNumImplicitUses().

In real graphics shaders this changes quite a few vgpr copies into move-
immediates, which is good for avoiding stalls on GFX10.

Differential Revision: https://reviews.llvm.org/D98347
2021-03-10 16:18:12 +00:00