5915 Commits

Author SHA1 Message Date
Jay Foad ec8c61efdf [AMDGPU] Allow multiple uses of the same literal
In GFX10 VOP3 can have a literal, which opens up the possibility of two
operands using the same literal value, which is allowed and only counts
as one use of the constant bus.

AMDGPUAsmParser::validateConstantBusLimitations already knew about this
but SIInstrInfo::verifyInstruction did not.
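
As an illustration only, the counting rule described above can be sketched as follows. The operand model and the function name `countLiteralBusUses` are hypothetical and are not the code in SIInstrInfo or AMDGPUAsmParser; the sketch only shows that repeated uses of one literal value count once.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical operand model for illustration; not LLVM's MachineOperand.
enum class OperandKind { VGPR, Literal };

struct Operand {
  OperandKind Kind;
  uint32_t Value; // register number for VGPR, immediate value for Literal
};

// Count constant bus uses contributed by literals: operands that share the
// same literal value together count as a single use.
unsigned countLiteralBusUses(const std::vector<Operand> &Ops) {
  std::vector<uint32_t> SeenLiterals;
  for (const Operand &Op : Ops) {
    if (Op.Kind != OperandKind::Literal)
      continue; // VGPR operands do not use the constant bus
    bool AlreadySeen = false;
    for (uint32_t Seen : SeenLiterals)
      if (Seen == Op.Value)
        AlreadySeen = true;
    if (!AlreadySeen)
      SeenLiterals.push_back(Op.Value);
  }
  return static_cast<unsigned>(SeenLiterals.size());
}
```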

Differential Revision: https://reviews.llvm.org/D100770
2021-04-20 16:44:01 +01:00
Matt Arsenault 1cb8a9d595 AMDGPU/GlobalISel: Fix uitofp/sitofp with non-power-of-2 integers 2021-04-20 11:13:29 -04:00
Sebastian Neubauer 4897effb14 [AMDGPU] Add TransVALU to gfx10
Instructions on the transcendental unit are executed in parallel to the
normal VALU, so add this as an extra resource.

This doesn't seem to have any effect, but it should be more correct.

Differential Revision: https://reviews.llvm.org/D100123
2021-04-20 15:34:43 +02:00
Jay Foad 2aea830ec4 [AMDGPU] Use if instead of foreach in a few places. NFC. 2021-04-20 14:20:30 +01:00
Jay Foad edea476142 [AMDGPU] Use simpler alternatives to !foldl. NFC. 2021-04-20 12:59:04 +01:00
hsmahesha 840c4e4e90 [AMDGPU] Re-arrange ds_read/ds_write ISel pattern for better readability.
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D100773
2021-04-20 16:17:15 +05:30
Jay Foad b22721f01a [AMDGPU] GCNDPPCombine: don't shrink V_ADD_CO_U32 if carry out is used
Don't shrink VOP3 instructions if there are any uses of a carry-out
operand, because the shrunken form of the instruction would write the
carry-out to vcc instead of to a virtual register.

Differential Revision: https://reviews.llvm.org/D100760
2021-04-20 09:17:52 +01:00
madhur13490 6a4d9cb7e0 [AMDGPU] Remove error check for indirect calls and add missing queue-ptr
This patch removes the -fixed-abi check for indirect calls
and also adds queue-ptr, which is required for indirect calls to work.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D100633
2021-04-20 00:35:17 +05:30
Jay Foad a02aa91313 [AMDGPU] GCNDPPCombine: simplify API of isShrinkable. NFC. 2021-04-19 14:20:46 +01:00
Jay Foad ef443390a9 [AMDGPU] Remove MachineDCE after SIFoldOperands
Remove the MachineDCE pass after the first SIFoldOperands pass now
that SIFoldOperands deletes its own dead instructions.

Reapply after fixing dependent change D100188.

Differential Revision: https://reviews.llvm.org/D100189
2021-04-19 12:08:02 +01:00
Jay Foad 323ef0eb45 [AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs
This is fairly cheap to implement and means less work for future
passes like MachineDCE.

Reapply with a fix for using InstToErase after it had been erased.

Differential Revision: https://reviews.llvm.org/D100188
2021-04-19 12:05:41 +01:00
Dmitry Preobrazhensky bcc29e0fcf [AMDGPU][MC] Corrected parsing of carry in/out operands in VOP3
Disabled constants as carry in/out operands. See bug 48711.

Differential Revision: https://reviews.llvm.org/D100642
2021-04-19 13:42:31 +03:00
Yaxun (Sam) Liu 3597f02fd5 [AMDGPU] Add GlobalDCE before internalization pass
The internalization pass only internalizes global variables
with no users. If the global variable has some dead user,
the internalization pass will not internalize it.

To be able to internalize global variables with dead
users, a global dce pass is needed before the
internalization pass.

This patch adds that.

Reviewed by: Artem Belevich, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D98783
2021-04-17 11:25:25 -04:00
Serge Guelton d6de1e1a71 Normalize interaction with boolean attributes
Such attributes can either be unset, or set to "true" or "false" (as string).
Throughout the codebase, this led to inelegant checks ranging from

        if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

to

        if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

Introduce a getValueAsBool that normalizes the check, with the following
behavior:

no attributes or attribute set to "false" => return false
attribute set to "true" => return true
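
A minimal standalone sketch of that behavior, assuming attributes are modeled as a plain string map. `AttrMap` and `getBoolAttr` are hypothetical names for illustration and are not the LLVM API; treating any value other than "true" as false is also an assumption, since the text above only specifies the unset, "true", and "false" cases.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a function's string attributes.
using AttrMap = std::map<std::string, std::string>;

// Returns true only when the attribute is present and set to "true".
// A missing attribute or one set to "false" yields false.
bool getBoolAttr(const AttrMap &Attrs, const std::string &Name) {
  auto It = Attrs.find(Name);
  if (It == Attrs.end())
    return false; // no attribute => false
  return It->second == "true";
}
```

With a helper of this shape, both checks quoted above collapse to a single call.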

Differential Revision: https://reviews.llvm.org/D99299
2021-04-17 08:17:33 +02:00
Joe Nash a0ed70abde [AMDGPU] Remove redundant field from DPP8 def
These lines set the value to what it already was,
so they are redundant. NFC

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100664

Change-Id: Ibf6f27d50a7fa1f76c127f01b799821378bfd3b3
2021-04-16 16:23:52 -04:00
Joe Nash 919236e608 [AMDGPU] NFC, Comment in disassembler for dpp8
Gives reasoning for convertDPP8.
Also corrects typo in Operand type comment.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100665

Change-Id: I33ff269db8072d83e5e0ecdbfb731d6000fc26c4
2021-04-16 16:21:47 -04:00
Christudasan Devadasan 97618522dc [AMDGPU] Remove dead code (NFC). 2021-04-16 23:03:31 +05:30
Joe Nash 7cc4a02fa2 [AMDGPU] Refactor VOP3P Profile and AsmParser, NFC
Refactors VOP3P tablegen and the AsmParser for VOP3P
for better extensibility. NFC intended

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100602

Change-Id: I038e3a772ac348bb18979cdf3e3ae2e9476dd411
2021-04-16 13:06:50 -04:00
hsmahesha 099dcb68a6 [AMDGPU] Refactor ds_read/ds_write related select code for better readability.
Part of the code related to ds_read/ds_write ISel is refactored, and the
corresponding comment is re-written for better readability, which would help
while implementing any future ds_read/ds_write ISel related modifications.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100300
2021-04-16 08:24:00 +05:30
Stanislav Mekhanoshin 13015ebd6f [AMDGPU] Factor out predicate FmaakFmamkF32Insts
Differential Revision: https://reviews.llvm.org/D100409
2021-04-15 12:29:16 -07:00
Stanislav Mekhanoshin d4385e483d [AMDGPU] Add new EmitDstSel field to VOPProfile. NFC.
Differential Revision: https://reviews.llvm.org/D100589
2021-04-15 12:07:08 -07:00
hsmahesha 82787eb228 [AMDGPU] Move LDS lowering related utility functions to a separate utils file.
Move some utility functions which are used within LDS lowering pass to a separate utils
file so that other LDS related passes can make use of them when required.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D100526
2021-04-16 00:15:48 +05:30
Arthur Eubanks c8f0a7c215 [NewPM] Cleanup IR printing instrumentation
Being lazy with printing the banner seems hard to reason about; we should print it
unconditionally first (lazy printing could also lead to duplicate banners if we
have multiple functions in -filter-print-funcs).

The printIR() functions were doing too many things. I separated out the
call from PrintPassInstrumentation since we were essentially doing two
completely separate things in printIR() from different callers.

There were multiple ways to generate the name of some IR. That's all
been moved to getIRName(). The printing of the IR name was also
inconsistent, now it's always "IR Dump on $foo" where "$foo" is the
name. For a function, it's the function name. For a loop, it's what's
printed by Loop::print(), which is more detailed. For an SCC, it's the
list of functions in parentheses. For a module it's "[module]", to
differentiate between a possible SCC with a function called "module".
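
The naming scheme above could look roughly like this. It is a sketch only (C++17): the `IRUnit` variant, `irName`, and `banner` are hypothetical stand-ins rather than the actual getIRName() implementation, loops are omitted, and the ", " separator between SCC functions is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical IR unit kinds for illustration; loops are left out.
struct ModuleUnit {};
struct FunctionUnit { std::string Name; };
struct SCCUnit { std::vector<std::string> Functions; };
using IRUnit = std::variant<ModuleUnit, FunctionUnit, SCCUnit>;

// One place that decides how each kind of IR unit is named.
std::string irName(const IRUnit &Unit) {
  if (std::holds_alternative<ModuleUnit>(Unit))
    return "[module]"; // brackets avoid clashing with a function named "module"
  if (const auto *F = std::get_if<FunctionUnit>(&Unit))
    return F->Name;
  const auto &SCC = std::get<SCCUnit>(Unit);
  std::string Out = "(";
  for (std::size_t I = 0; I < SCC.Functions.size(); ++I) {
    if (I != 0)
      Out += ", "; // separator is an assumption
    Out += SCC.Functions[I];
  }
  return Out + ")";
}

// The banner text always has the same shape regardless of the unit kind.
std::string banner(const IRUnit &Unit) { return "IR Dump on " + irName(Unit); }
```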

To preserve D74814, we have to check if we're going to print anything at
all first. This is unfortunate, but I would consider this a special
case that shouldn't be handled in the core logic.

Reviewed By: jamieschmeiser

Differential Revision: https://reviews.llvm.org/D100231
2021-04-15 09:50:55 -07:00
Sebastian Neubauer 7842e1725e [AMDGPU] Fix large return values with amdgpu_gfx
Returning in memory is not supported, so fall back to sret.
Also, extend i1 and i16 to i32. Otherwise, they would be passed through
memory.

Differential Revision: https://reviews.llvm.org/D100543
2021-04-15 14:57:56 +02:00
hsmahesha 4973b0c4e7 [AMDGPU] Disable forceful inline of non-kernel functions which use LDS.
Now that LDS uses within non-kernel functions are handled in the
LowerModuleLDS pass, we do *NOT* need to *forcefully* inline non-kernel
functions just because they use LDS. Do forceful inlining only when the
LowerModuleLDS pass is not enabled. It is enabled by default.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D100481
2021-04-15 09:12:56 +05:30
Stanislav Mekhanoshin b7ebb25e53 [AMDGPU] Factor out SelectSAddrFI()
This is a service function generally useful for selection
of a FI in an SADDR. NFC for now, needed for future patch.

Differential Revision: https://reviews.llvm.org/D100406
2021-04-14 09:40:02 -07:00
Sander de Smalen 4f42d873c2 [TTI] NFC: Change getArithmeticInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100317
2021-04-14 17:20:36 +01:00
Sander de Smalen 1af35e77f4 [TTI] NFC: Change getVectorInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100315
2021-04-14 17:20:35 +01:00
Sander de Smalen 174e8f6c5e [TTI] NFC: Change getShuffleCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100314
2021-04-14 17:20:35 +01:00
Sander de Smalen 14b934f8a6 [TTI] NFC: Change getCFInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D100313
2021-04-14 17:20:34 +01:00
hsmahesha e3070db0f7 [AMDGPU] Rename "LDS lowering" pass name.
Rename the "LDS lowering" pass from `amdgpu-disable-lower-module-lds` to
`amdgpu-enable-lower-module-lds`, as the latter is consistent and reads better.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D100441
2021-04-14 20:19:53 +05:30
Sebastian Neubauer 929edd4375 [AMDGPU] Mark scavenged SGPR as used
Otherwise it reuses the same register for storing the stack slot
offset if the stack slot offset is big.

Differential Revision: https://reviews.llvm.org/D100461
2021-04-14 14:55:01 +02:00
Sander de Smalen 2285dfb73f [TTI] NFC: Change getMinMaxReductionCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100202
2021-04-13 14:21:00 +01:00
Sander de Smalen bd86824d98 [TTI] NFC: Change getArithmeticReductionCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

This patch is practically NFC, with the exception of an AArch64 SVE related
cost-model change, where we can now return an Invalid cost instead of some
bogus number.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100201
2021-04-13 14:20:59 +01:00
madhur13490 5682ae2fc6 [AMDGPU] Set implicit arg attributes for indirect calls
This patch adds attributes corresponding to
implicit arguments to functions/kernels if
1. it has an indirect call OR
2. its address is taken.

Once such attributes are set, the rest of the codegen works
out of the box for indirect calls. This patch eliminates
the potential overhead that -fixed-abi imposes even when indirect function
calls are not used.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D99347
2021-04-13 13:15:13 +00:00
Sebastian Neubauer 6cc91adf1e [AMDGPU] Kill temporary register after restoring
Not a correctness issue, but the temporary register is not used
afterwards and should be dead.

Differential Revision: https://reviews.llvm.org/D100295
2021-04-12 14:20:03 +02:00
Dmitry Preobrazhensky 67b39661c8 [AMDGPU][MC][NFC] Removed extra spaces
Fixed bugs 49646, 49647.

Differential Revision: https://reviews.llvm.org/D100173
2021-04-12 13:33:19 +03:00
Sebastian Neubauer 7a8e65dd3d [AMDGPU] Fix ubsan error
The RegScavenger can be null sometimes, so a pointer is needed.

Fixes UBSan error introduced in f9a8c6a0e5.
2021-04-12 12:14:00 +02:00
Sebastian Neubauer b76c2a6c2b [AMDGPU] Fix saving fp and bp
Spilling the fp or bp to scratch could overwrite VGPRs of inactive
lanes. Fix that by using only the active lanes of the scavenged VGPR.

This builds on the assumptions that
1. a function is never called with exec=0
2. lanes do not die in a function, i.e. exec!=0 in the function epilog
3. no new lanes are active when exiting the function, i.e. exec in the
   epilog is a subset of exec in the prolog.
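
Expressed over 64-bit lane masks, the three assumptions amount to the check below. This is only an illustration: `execAssumptionsHold` is a hypothetical helper, not part of the patch, and a wave64 EXEC mask is assumed.

```cpp
#include <cstdint>

// Checks the three assumptions on the EXEC mask at function entry and exit.
bool execAssumptionsHold(uint64_t PrologExec, uint64_t EpilogExec) {
  bool CalledWithLanes = PrologExec != 0;            // 1. never called with exec=0
  bool LanesSurvive = EpilogExec != 0;               // 2. exec!=0 in the epilog
  bool NoNewLanes = (EpilogExec & ~PrologExec) == 0; // 3. epilog exec is a subset
  return CalledWithLanes && LanesSurvive && NoNewLanes;
}
```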

Differential Revision: https://reviews.llvm.org/D96869
2021-04-12 11:52:55 +02:00
Sebastian Neubauer 32bc9a9bc3 [AMDGPU] Unify spill code
Instead of reimplementing spilling in prolog and epilog, reuse
buildSpillLoadStore.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D99269
2021-04-12 11:19:08 +02:00
Sebastian Neubauer f9a8c6a0e5 [AMDGPU] Save VGPR of whole wave when spilling
Spilling SGPRs to scratch uses a temporary VGPR. LLVM currently cannot
determine if a VGPR is used in other lanes or not, so we need to save
all lanes of the VGPR. We even need to save the VGPR if it is marked as
dead.

The generated code depends on two things:
- Can we scavenge an SGPR to save EXEC?
- And can we scavenge a VGPR?

If we can scavenge an SGPR, we
- save EXEC into the SGPR
- set the needed lane mask
- save the temporary VGPR
- write the spilled SGPR into VGPR lanes
- save the VGPR again to the target stack slot
- restore the VGPR
- restore EXEC

If we were not able to scavenge an SGPR, we do the same operations, but
every time the temporary VGPR is written to memory, we
- write VGPR to memory
- flip exec (s_not exec, exec)
- write VGPR again (previously inactive lanes)
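
The exec-flip sequence can be simulated on the host to see why two masked writes cover every lane. This is a sketch under stated assumptions, not the generated machine code: it assumes wave64, models memory as one slot per lane, and the final flip back to the original EXEC is an assumption that is not listed in the steps above.

```cpp
#include <array>
#include <cstdint>

constexpr unsigned WaveSize = 64; // assumption: wave64
using VGPR = std::array<uint32_t, WaveSize>;

// Store only the lanes enabled in Exec, like a VGPR store under an EXEC mask.
void storeActiveLanes(const VGPR &Reg, uint64_t Exec, VGPR &Memory) {
  for (unsigned Lane = 0; Lane < WaveSize; ++Lane)
    if ((Exec >> Lane) & 1)
      Memory[Lane] = Reg[Lane];
}

// Save every lane of Reg without a spare SGPR to hold a copy of EXEC.
void saveAllLanes(const VGPR &Reg, uint64_t &Exec, VGPR &Memory) {
  storeActiveLanes(Reg, Exec, Memory); // write VGPR to memory
  Exec = ~Exec;                        // s_not exec, exec
  storeActiveLanes(Reg, Exec, Memory); // write VGPR again (previously inactive lanes)
  Exec = ~Exec;                        // assumption: flip back to the original EXEC
}
```

Because the two masks are complements of each other, every lane is written exactly once.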

Surprisingly often, we are able to scavenge an SGPR, even though we are
at the brink of running out of SGPRs.
Scavenging a VGPR does not have a great effect (saves three instructions
if no SGPR was scavenged), but we need to know if the VGPR we use is
live before or not, otherwise the machine verifier complains.

Differential Revision: https://reviews.llvm.org/D96336
2021-04-12 11:01:38 +02:00
dfukalov 8f4b7e94a2 [AMDGPU][CostModel] Refine cost model for control-flow instructions.
Added cost estimation for switch instruction, updated costs of branches, fixed
phi cost.
Had to increase the `-amdgpu-unroll-threshold-if` default value since the conditional
branch cost (size) was corrected to a higher value.
Test renamed to "control-flow.ll".

Removed redundant code in `X86TTIImpl::getCFInstrCost()` and
`PPCTTIImpl::getCFInstrCost()`.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D96805
2021-04-10 09:20:24 +03:00
Mitch Phillips 092f288d36 Revert "[AMDGPU] Remove MachineDCE after SIFoldOperands"
This reverts commit 5a0117b2d0.

Reason: Dependent change d19a42eba9 broke
the ASan buildbots.
2021-04-09 15:47:44 -07:00
Mitch Phillips 3d4730a73f Revert "[AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs"
This reverts commit d19a42eba9.

Reason: Broke the ASan buildbots. See the original phabricator review
for more details: https://reviews.llvm.org/D100188
2021-04-09 15:47:44 -07:00
Jay Foad 5a0117b2d0 [AMDGPU] Remove MachineDCE after SIFoldOperands
Remove the MachineDCE pass after the first SIFoldOperands pass now
that SIFoldOperands deletes its own dead instructions.

Differential Revision: https://reviews.llvm.org/D100189
2021-04-09 20:41:09 +01:00
Jay Foad d19a42eba9 [AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs
This is fairly cheap to implement and means less work for future
passes like MachineDCE.

Differential Revision: https://reviews.llvm.org/D100188
2021-04-09 20:41:09 +01:00
Jay Foad a4ced03d34 [AMDGPU] SIFoldOperands: eagerly delete dead copies
This is cheap to implement, means less work for future passes like
MachineDCE, and slightly improves the folding in some cases.

Differential Revision: https://reviews.llvm.org/D100117
2021-04-09 13:52:54 +01:00
Sebastian Neubauer cc7add5298 [AMDGPU] Use SIInstrFlags for flat variants. NFC
Use SIInstrFlags to differentiate between the different
variants of flat instructions (flat, global and scratch).
This should make it easier to bundle the immediate offset logic in a
single place and implement restrictions and bug workarounds.

Fixed version of D99587, which does not rely on the address space.

Differential Revision: https://reviews.llvm.org/D99743
2021-04-09 12:28:36 +02:00
dfukalov d066079728 [NFC][AA] Prepare to convert AliasResult to class with PartialAlias offset.
The main reason is to prepare for transforming AliasResult into a class that
contains an offset for the PartialAlias case.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98027
2021-04-09 12:54:22 +03:00
Sebastian Neubauer 36138db116 [AMDGPU] IsFlatScratch/Global -> FlatScratch/Global
Remove 'Is' from IsFlatScratch/Global. NFC

Differential Revision: https://reviews.llvm.org/D100108
2021-04-09 11:20:31 +02:00