5948 Commits

Author SHA1 Message Date
Jay Foad 7e43483dd1 [AMDGPU] Remove set_gpr_idx instructions in conditional blocks
SIPreEmitPeephole did not try to remove redundant s_set_gpr_idx_*
instructions in blocks that end with a conditional branch instruction.
This seems like a simple oversight.

Differential Revision: https://reviews.llvm.org/D101629
2021-04-30 22:15:45 +01:00
Daniil Fukalov 3489c2d7b1 [TTI] NFC: Change getTypeLegalizationCost to return InstructionCost.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: sdesmalen, kparzysz

Differential Revision: https://reviews.llvm.org/D101533
2021-04-30 22:51:51 +03:00
David Stuttard a67a377014 [AMDGPU] Tidy up some simple expressions for clarity NFC
Slight refactor for clarity.

Change-Id: Ib25e7f4582c67a7c57f066cfd5382c1405d7d4c5

Differential Revision: https://reviews.llvm.org/D101610
2021-04-30 11:13:54 +01:00
Jay Foad f251379a91 [AMDGPU] Simplify getWaitStatesSince. NFC. 2021-04-30 08:58:24 +01:00
Christudasan Devadasan 544be70864 [AMDGPU] Skip promote-alloca for insertelement/insertvalue users
It is difficult to track the users of vector and aggregate types.

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D101562
2021-04-30 08:37:26 +05:30
Carl Ritson 424f1f6f96 [AMDGPU][NFC] Refactor hazard recognition IsHazardFn and IsExpiredFn
Refactor IsHazardFn and IsExpiredFn to use constant references as these should not be mutating the instructions visited and the instruction can never be null.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D101430
2021-04-30 09:18:56 +09:00
Carl Ritson 749702fc6b [AMDGPU] Remove dead early-out in GCNHazardRecognizer
Remove an early-out in wait state counting which can never be
taken.

Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D101520
2021-04-30 08:55:49 +09:00
Jay Foad 16d707e656 [AMDGPU] Fix v_swap_b32 formation on physical registers
As explained in the comments, matchSwap matches:

// mov t, x
// mov x, y
// mov y, t

and turns it into:

// mov t, x (t is potentially dead and move eliminated)
// v_swap_b32 x, y

On physical registers we don't have full use-def chains so the check
for T being live-out was not working properly with subregs/superregs.

Differential Revision: https://reviews.llvm.org/D101546
2021-04-29 20:53:40 +01:00
Alexey Bataev 12c51f2358 [COST] Improve shuffle kind detection if shuffle mask is provided.
Added an extra analysis for better choosing of shuffle kind in
getShuffleCost functions for better cost estimation if mask was
provided.

Differential Revision: https://reviews.llvm.org/D100865
2021-04-29 12:48:00 -07:00
Alexey Bataev 6e859f3cd4 Revert "[COST] Improve shuffle kind detection if shuffle mask is provided."
This reverts commit 9239932221 to fix
a compiler crash on mask checks.
2021-04-29 12:40:33 -07:00
Petar Avramovic c34900e133 AMDGPU/GlobalISel: Fix selection of image intrinsics with unused return
When atomic image intrinsic return value is unused, register class for
destination of a sub-register copy of return value ends up not being set.
This copy then hits 'Register class not set' assert later.
If return value has uses, register class is determined by use instruction.
Fix is to not create sub-register copy when image intrinsic destination has
no uses because it would be deleted by dead-mi-elimination later anyway.

Differential Revision: https://reviews.llvm.org/D101448
2021-04-29 20:56:03 +02:00
Alexey Bataev 9239932221 [COST] Improve shuffle kind detection if shuffle mask is provided.
Added an extra analysis for better choosing of shuffle kind in
getShuffleCost functions for better cost estimation if mask was
provided.

Differential Revision: https://reviews.llvm.org/D100865
2021-04-29 09:42:56 -07:00
Sebastian Neubauer 9569d5ba02 [AMDGPU] Allow buildSpillLoadStore in empty bb
This allows calling buildSpillLoadStore for an empty basic block, where
MI points at the end of the block instead of to an instruction.

This only happens with downstream CFI changes, so I was not able to
create a testcase that works with upstream LLVM.

Differential Revision: https://reviews.llvm.org/D101356
2021-04-29 12:53:20 +02:00
Joe Nash 168228d76a [AMDGPU] Make some VOP3 insts commutable
Note, only src0 and src1 will be commuted if the isCommutable flag
is set. This patch does not change that, it just makes it possible
to commute src0 and src1 of some U/I/B vop3 instructions.

This patch revises d35d8da7d6.
It contains the commute opportunities excluding float insts

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D101474

Change-Id: I62938173d750453839f2457a3851661a29135faf
2021-04-28 13:59:08 -04:00
Jay Foad 12011b5217 [AMDGPU] GCNHazardRecognizer: ignore all meta instructions
This is hopefully NFC, but should be more robust in ignoring all
instructions that should be ignored, instead of just some of them.

Differential Revision: https://reviews.llvm.org/D101372
2021-04-27 20:17:15 +01:00
Jay Foad dc2f6bf566 [AMDGPU] Minor refactoring in AMDGPUUnifyDivergentExitNodes. NFC.
Make unifyReturnBlockSet a member function so we don't have to pass TTI
around as an argument.
2021-04-27 14:21:51 +01:00
Petar Avramovic 8110fcc8fc AMDGPU/GlobalISel: Fix negative offset folding for buffer_load
Buffer_load does unsigned offset calculations. Don't fold
operands of 32-bit add that are likely to cause unsigned add
overflow (common case is when one of the operands is negative).

Differential Revision: https://reviews.llvm.org/D91336
2021-04-27 14:45:22 +02:00
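
The overflow concern in the buffer_load commit above can be illustrated with plain 32-bit arithmetic. This is only a sketch of the arithmetic, not the heuristic the patch uses; the helper name `unsignedAddOverflows` is hypothetical. A negative operand reinterpreted as unsigned is a very large value, so adding it to a small base wraps around in 32 bits.

```cpp
#include <cstdint>

// Hypothetical helper: returns true when base + offset wraps around in
// 32-bit unsigned arithmetic. A wrapped sum is smaller than either operand.
bool unsignedAddOverflows(uint32_t base, uint32_t offset) {
  uint32_t sum = static_cast<uint32_t>(base + offset);
  return sum < base;
}
```

For example, 16 + (-4) is 12 as a signed add, but as unsigned operands it is 16 + 0xFFFFFFFC, which only yields 12 because the sum wraps. Splitting such an add into separate unsigned offset fields is safe only if the split computation wraps the same way.
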
Petar Avramovic fb7be0d912 AMDGPU/GlobalISel: Remove redundant G_FCANONICALIZE
Add basic version of isCanonicalized for global-isel. Copied from sdag.
Add post legalizer combine that deletes G_FCANONICALIZE when its input
is already canonicalized.

Differential Revision: https://reviews.llvm.org/D96605
2021-04-27 12:26:37 +02:00
Petar Avramovic 4a9bc59867 AMDGPU/GlobalISel: Add integer med3 combines
Add signed and unsigned integer version of med3 combine.
Source pattern is min(max(Val, K0), K1) or max(min(Val, K1), K0)
where K0 and K1 are constants and K0 <= K1. Destination is med3
that corresponds to signedness of min/max in source.

Differential Revision: https://reviews.llvm.org/D90050
2021-04-27 11:52:23 +02:00
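
The equivalence behind the med3 combine above can be sketched in plain C++. The function names here are hypothetical and `med3` is modeled simply as the median of three values; this shows the arithmetic identity only, not the GlobalISel combine itself. When K0 <= K1, both source patterns clamp Val to [K0, K1], which is the median of Val, K0 and K1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Median of three values: the semantics assumed here for med3.
template <typename T>
T med3(T a, T b, T c) {
  return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

// Source pattern 1: min(max(Val, K0), K1).
template <typename T>
T clampMinMax(T val, T k0, T k1) {
  assert(k0 <= k1 && "the combine requires K0 <= K1");
  return std::min(std::max(val, k0), k1);
}

// Source pattern 2: max(min(Val, K1), K0).
template <typename T>
T clampMaxMin(T val, T k0, T k1) {
  assert(k0 <= k1 && "the combine requires K0 <= K1");
  return std::max(std::min(val, k1), k0);
}
```

Instantiating the templates with `int32_t` or `uint32_t` shows why the destination must match the signedness of the source min/max: the same bit pattern can land on different sides of the clamp range depending on how it is compared.
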
Baptiste Saleil caf1294d95 [AMDGPU] Experiments show that the GCNRegBankReassign pass significantly impacts
the compilation time and there is no case for which we see any improvement in
performance. This patch removes this pass and its associated test cases from
the tree.

Differential Revision: https://reviews.llvm.org/D101313

Change-Id: I0599169a7609c19a887f8d847a71e664030cc141
2021-04-26 17:21:49 -04:00
Sebastian Neubauer fcc40d9c17 [AMDGPU] Use MapVector for WWMReservedRegs
Use MapVector instead of SmallDenseMap because it has a deterministic
iteration order.

Differential Revision: https://reviews.llvm.org/D101299
2021-04-26 17:43:00 +02:00
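
The idea behind the MapVector change above is a map whose iteration follows insertion order rather than hash order. The class below is a simplified, hypothetical stand-in written for illustration (`InsertionOrderedMap` is not LLVM's `MapVector` and omits most of its interface); it pairs a hash index with a vector of entries.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplified illustration: a hash map gives fast lookup, while a vector
// records entries in insertion order so iteration is deterministic.
template <typename K, typename V>
class InsertionOrderedMap {
  std::unordered_map<K, std::size_t> index_;  // key -> position in entries_
  std::vector<std::pair<K, V>> entries_;      // entries in insertion order

public:
  // Inserts the pair if the key is new; returns false if it already exists.
  bool insert(const K &key, const V &value) {
    if (!index_.emplace(key, entries_.size()).second)
      return false;
    entries_.emplace_back(key, value);
    return true;
  }

  // Returns the keys in the order they were first inserted.
  std::vector<K> keys() const {
    std::vector<K> result;
    for (const auto &entry : entries_)
      result.push_back(entry.first);
    return result;
  }

  std::size_t size() const { return entries_.size(); }
};
```

Iterating a container like this produces the same order on every run and platform, which matters when the iteration order influences generated code.
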
Tim Renouf 8710eff6c3 [MC][AMDGPU][llvm-objdump] Synthesized local labels in disassembly
1. Add an accessor function to MCSymbolizer to retrieve addresses
   referenced by a symbolizable operand, but not resolved to a symbol.
   That way, the caller can synthesize labels at those addresses and
   then retry disassembling the section.

2. Implement that in AMDGPU -- a failed symbol lookup results in the
   address being added to a vector returned by the new function.

3. Use that in llvm-objdump when using MCSymbolizer (which only happens
   on AMDGPU) and SymbolizeOperands is on.

Differential Revision: https://reviews.llvm.org/D101145

Change-Id: I19087c3bbfece64bad5a56ee88bcc9110d83989e
2021-04-26 13:56:36 +01:00
Sebastian Neubauer 3366d81153 [AMDGPU] Save WWM registers in functions
The values of registers in inactive lanes need to be saved during
function calls.

Save all registers used for whole wave mode, similar to how it is done
for VGPRs that are used for SGPR spilling.

Differential Revision: https://reviews.llvm.org/D99429

Reapply with fixed tests on Windows.
2021-04-23 18:09:24 +02:00
dfukalov 9ab17a60eb [TTI] NFC: Use InstructionCost to store ScalarizationCost in IntrinsicCostAttributes.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D101151
2021-04-23 18:02:00 +03:00
Sebastian Neubauer 22d99cb63f Revert "[AMDGPU] Save WWM registers in functions"
This reverts commit 91464c30bf.

Seems to break tests on windows.
2021-04-23 16:38:50 +02:00
Sebastian Neubauer 91464c30bf [AMDGPU] Save WWM registers in functions
The values of registers in inactive lanes need to be saved during
function calls.

Save all registers used for whole wave mode, similar to how it is done
for VGPRs that are used for SGPR spilling.

Differential Revision: https://reviews.llvm.org/D99429
2021-04-23 16:09:31 +02:00
Matt Arsenault b58332774f AMDGPU: Fix assert on inline asm on gfx90a
This was assuming all mayLoad instructions have one def.
2021-04-23 09:00:25 -04:00
Matt Arsenault ed633a1daa AMDGPU: Restore atomic fp feature on FP atomic instruction definitions
9931b1f7a4 switched this to checking for
the two specific subtargets, instead of the dedicated feature. This
broke supporting functions which force added the feature when emitting
targets that do not actually support them. This still does not work for
the targets that use the gfx6/7 or gfx10 encodings.
2021-04-22 21:32:01 -04:00
Jay Foad 79cb3ba08f [AMDGPU] SIWholeQuadMode: don't add duplicate implicit $exec operands
STRICT_WWM and STRICT_WQM are already defined with Uses = [EXEC], so
there is no need to add another implicit use of $exec when lowering them
to V_MOV_B32 instructions.

Differential Revision: https://reviews.llvm.org/D100969
2021-04-22 09:19:47 +01:00
Matt Arsenault 987e52851e AMDGPU: Fix assert when trying to fold reg_sequence of physreg copies 2021-04-21 21:58:18 -04:00
Stanislav Mekhanoshin f9d0d0d7e0 [AMDGPU] Lower regbanks reassign threshold to 15000
Let it work on very small kernels only. Measurements showed
the performance benefit is not worth the compile time.

Differential Revision: https://reviews.llvm.org/D100904
2021-04-21 08:34:11 -07:00
dfukalov a8b35e0f52 [TTI] NFC: Change getVectorSplitCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D100952
2021-04-21 17:32:02 +03:00
Matt Arsenault 70ab76a81b AMDGPU: Fix indirect tail calls
Fix a selection error on uniform callees, and use a regular call if
divergent.
2021-04-21 09:15:24 -04:00
Jay Foad ec8c61efdf [AMDGPU] Allow multiple uses of the same literal
In GFX10 VOP3 can have a literal, which opens up the possibility of two
operands using the same literal value, which is allowed and only counts
as one use of the constant bus.

AMDGPUAsmParser::validateConstantBusLimitations already knew about this
but SIInstrInfo::verifyInstruction did not.

Differential Revision: https://reviews.llvm.org/D100770
2021-04-20 16:44:01 +01:00
Matt Arsenault 1cb8a9d595 AMDGPU/GlobalISel: Fix uitofp/sitofp with non-power-of-2 integers 2021-04-20 11:13:29 -04:00
Sebastian Neubauer 4897effb14 [AMDGPU] Add TransVALU to gfx10
Instructions on the transcendental unit are executed in parallel to the
normal VALU, so add this as an extra resource.

This doesn't seem to have any effect, but it should be more correct.

Differential Revision: https://reviews.llvm.org/D100123
2021-04-20 15:34:43 +02:00
Jay Foad 2aea830ec4 [AMDGPU] Use if instead of foreach in a few places. NFC. 2021-04-20 14:20:30 +01:00
Jay Foad edea476142 [AMDGPU] Use simpler alternatives to !foldl. NFC. 2021-04-20 12:59:04 +01:00
hsmahesha 840c4e4e90 [AMDGPU] Re-arrange ds_read/ds_write ISel pattern for better readability.
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D100773
2021-04-20 16:17:15 +05:30
Jay Foad b22721f01a [AMDGPU] GCNDPPCombine: don't shrink V_ADD_CO_U32 if carry out is used
Don't shrink VOP3 instructions if there are any uses of a carry-out
operand, because the shrunken form of the instruction would write the
carry-out to vcc instead of to a virtual register.

Differential Revision: https://reviews.llvm.org/D100760
2021-04-20 09:17:52 +01:00
madhur13490 6a4d9cb7e0 [AMDGPU] Remove error check for indirect calls and add missing queue-ptr
This patch removes -fixed-abi check for indirect calls
and also adds queue-ptr which is required for indirect calls to work.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D100633
2021-04-20 00:35:17 +05:30
Jay Foad a02aa91313 [AMDGPU] GCNDPPCombine: simplify API of isShrinkable. NFC. 2021-04-19 14:20:46 +01:00
Jay Foad ef443390a9 [AMDGPU] Remove MachineDCE after SIFoldOperands
Remove the MachineDCE pass after the first SIFoldOperands pass now
that SIFoldOperands deletes its own dead instructions.

Reapply after fixing dependent change D100188.

Differential Revision: https://reviews.llvm.org/D100189
2021-04-19 12:08:02 +01:00
Jay Foad 323ef0eb45 [AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs
This is fairly cheap to implement and means less work for future
passes like MachineDCE.

Reapply with a fix for using InstToErase after it had been erased.

Differential Revision: https://reviews.llvm.org/D100188
2021-04-19 12:05:41 +01:00
Dmitry Preobrazhensky bcc29e0fcf [AMDGPU][MC] Corrected parsing of carry in/out operands in VOP3
Disabled constants as carry in/out operands. See bug 48711.

Differential Revision: https://reviews.llvm.org/D100642
2021-04-19 13:42:31 +03:00
Yaxun (Sam) Liu 3597f02fd5 [AMDGPU] Add GlobalDCE before internalization pass
The internalization pass only internalizes global variables
with no users. If the global variable has some dead user,
the internalization pass will not internalize it.

To be able to internalize global variables with dead
users, a global dce pass is needed before the
internalization pass.

This patch adds that.

Reviewed by: Artem Belevich, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D98783
2021-04-17 11:25:25 -04:00
Serge Guelton d6de1e1a71 Normalize interaction with boolean attributes
Such attributes can either be unset, or set to "true" or "false" (as string).
Throughout the codebase, this led to inelegant checks ranging from

        if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

to

        if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

Introduce a getValueAsBool that normalizes the check, with the following
behavior:

no attributes or attribute set to "false" => return false
attribute set to "true" => return true

Differential Revision: https://reviews.llvm.org/D99299
2021-04-17 08:17:33 +02:00
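
The normalized behavior described in the commit above can be sketched as follows. This assumes a plain `std::map` as a hypothetical stand-in for a function's string attributes; `AttrMap` and the free-function signature are illustrative and are not LLVM's actual attribute API.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a function's string attributes (not LLVM's API).
using AttrMap = std::map<std::string, std::string>;

// Returns true only when the attribute is present and set to "true".
// A missing attribute and an attribute set to "false" both yield false.
bool getValueAsBool(const AttrMap &attrs, const std::string &name) {
  auto it = attrs.find(name);
  return it != attrs.end() && it->second == "true";
}
```

With a helper of this shape, call sites no longer need a separate presence check before comparing the string value.
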
Joe Nash a0ed70abde [AMDGPU] Remove redundant field from DPP8 def
These lines set the value to what it already was,
so they are redundant. NFC

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100664

Change-Id: Ibf6f27d50a7fa1f76c127f01b799821378bfd3b3
2021-04-16 16:23:52 -04:00
Joe Nash 919236e608 [AMDGPU] NFC, Comment in disassembler for dpp8
Gives reasoning for convertDPP8.
Also corrects typo in Operand type comment.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100665

Change-Id: I33ff269db8072d83e5e0ecdbfb731d6000fc26c4
2021-04-16 16:21:47 -04:00
Christudasan Devadasan 97618522dc [AMDGPU] Remove dead code (NFC). 2021-04-16 23:03:31 +05:30