Commit Graph

5630 Commits

Author SHA1 Message Date
Matt Arsenault 5f9707b796 AMDGPU: Fix redundant FP spilling/assert in some functions
If a function has stack objects, and a call, we require an FP. If we
did not initially have any stack objects, and only introduced them
during PrologEpilogInserter for CSR VGPR spills, SILowerSGPRSpills
would end up spilling the FP register as if it were a normal
register. This would result in an assert in a debug build, or
redundant handling of the FP register in a release build.

Try to predict that we will have an FP later, although this is ugly.
2021-01-26 13:01:45 -05:00
Matt Arsenault 92d1195b5f AMDGPU: Add assertion to determineCalleeSaves
Make sure this isn't getting called multiple times. I was surprised we
were modifying the function here, which I think is a bit questionable.
2021-01-26 13:01:45 -05:00
Simon Pilgrim f82cff31d3 [AMDGPU] HSAMD::fromString - replace std::string arg with StringRef. NFCI.
Removes an unnecessary chain of StringRef -> std::string -> StringRef conversions
2021-01-26 16:09:39 +00:00
Simon Pilgrim ee3da8958a [AMDGPU] Fix null-dereference static analysis warnings. NFCI.
Avoid repeated calls to isZeroValue() and check for a null pointer before dereferencing a dyn_cast<>.
2021-01-26 15:43:59 +00:00
Matt Arsenault 551a69e418 AMDGPU: Clear IsSSA property in SIFormMemoryClauses
Fixes verifier error when writing MIR testcases
2021-01-26 10:40:41 -05:00
Mirko Brkusanin 608ac62540 [AMDGPU] Fix use of HasModifiers in VopProfile
HasModifiers should be true if at least one modifier is used.
This should make the use of this field bit more consistent.

Differential Revision: https://reviews.llvm.org/D94795
2021-01-26 15:21:11 +01:00
Dmitry Preobrazhensky 745064e36b [AMDGPU][MC] Refactored exp tgt handling
Summary:
- Separated tgt encoding from parsing;
- Separated tgt decoding from printing;
- Improved errors handling;
- Disabled leading zeroes in index. The following code is no longer accepted: exp pos00 v3, v2, v1, v0

Reviewers: arsenm, rampitec, foad

Differential Revision: https://reviews.llvm.org/D95216
2021-01-26 14:54:15 +03:00
Kazu Hirata c85b6bf33c [AMDGPU] Forward-declare MachineIRBuilder (NFC)
AMDGPULegalizerInfo.h needs MachineIRBuilder but relies on a forward
declaration of MachineIRBuilder in LegalizerInfo.h.  This patch adds a
forward declaration right in AMDGPULegalizerInfo.h.

While we are at it, this patch removes the one in LegalizerInfo.h,
where it is unnecessary.
2021-01-25 19:24:01 -08:00
Changpeng Fang 5b648df1a8 AMDGPU: Reduce the number of expensive calls in SIFormMemoryClause
Summary:
  RPTracker::reset(MI) is a very expensive call when the number of virtual registers is huge.
We observed a long compilation time issue when RPT::reset() is called once for each cluster.

In this work, we call RPT.reset() only at the first seen cluster, and use advance() to get
the register pressure for the later clusters in the same basic block. This could effectively reduce the number
of the expensive calls and thus reduce the compile time.

Reviewers:
  rampitec

Fixes:
  SWDEV-239161

Differential Revision:
  https://reviews.llvm.org/D95273
2021-01-25 16:08:08 -08:00
Konstantin Zhuravlyov 2cdb34efda Revert "[IndirectFunctions] Skip propagating attributes to address taken functions"
This reverts commit dd8ae42674.

This commit causes infinite loop when compiling rocThrust and hipCUB.

Differential Revision: https://reviews.llvm.org/D95389
2021-01-25 15:58:06 -05:00
Dmitry Preobrazhensky 558b3bbb5b [AMDGPU][MC] Improved errors handling for SDWA operands
Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D95212
2021-01-25 19:02:53 +03:00
Carl Ritson a80ebd0179 [AMDGPU] Fix llvm.amdgcn.init.exec and frame materialization
Frame-base materialization may insert vector instructions before EXEC is initialised.
Fix this by moving lowering of llvm.amdgcn.init.exec later in backend.
Also remove SI_INIT_EXEC_LO pseudo as this is not necessary.

Reviewed By: ruiling

Differential Revision: https://reviews.llvm.org/D94645
2021-01-25 08:31:17 +09:00
Kazu Hirata 054444177b [Target] Use llvm::append_range (NFC) 2021-01-24 12:18:56 -08:00
Kazu Hirata e4847a7fcf Revert "[Target] Use llvm::append_range (NFC)"
This reverts commit cc7a238286.

The X86WinEHState.cpp hunk seems to break certain builds.
2021-01-23 11:25:27 -08:00
Kazu Hirata cc7a238286 [Target] Use llvm::append_range (NFC) 2021-01-23 10:56:31 -08:00
Kazu Hirata 49231c1f80 [llvm] Use static_assert instead of assert (NFC)
Identified with misc-static-assert.
2021-01-22 23:25:05 -08:00
Stanislav Mekhanoshin ca904b81e6 [AMDGPU] Fix FP materialization/resolve with flat scratch
Differential Revision: https://reviews.llvm.org/D95266
2021-01-22 16:06:47 -08:00
Stanislav Mekhanoshin 607bec0bb9 Change materializeFrameBaseRegister() to return register
The only caller of this function is in the LocalStackSlotAllocation
and it creates base register of class returned by the target's
getPointerRegClass(). AMDGPU wants to use a different reg class
here so let materializeFrameBaseRegister to just create and return
whatever it wants.

Differential Revision: https://reviews.llvm.org/D95268
2021-01-22 15:51:06 -08:00
Arthur Eubanks 42d682a217 [NewPM][AMDGPU] Skip adding CGSCCOptimizerLate callbacks at O0
The legacy PM's EP_CGSCCOptimizerLate was only used under not-O0.

Fixes clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp under the new PM.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D95250
2021-01-22 12:29:39 -08:00
Sebastian Neubauer 8214982b50 [AMDGPU] Implement mir parseCustomPseudoSourceValue
Allow parsing generated mir with custom pseudo source value tokens.
Also rename pseudo source values to have more meaningful names.

Relands ba7dcd8542, which had memory leaks.

Differential Revision: https://reviews.llvm.org/D95215
2021-01-22 11:24:08 +01:00
Christudasan Devadasan ff8a1cae18 [AMDGPU] Fix the inconsistency in soffset for MUBUF stack accesses.
During instruction selection, there is an inconsistency in choosing
the initial soffset value. With certain early passes, this value is
getting modified and that brought additional fixup during
eliminateFrameIndex to work for all cases. This whole transformation
looks trivial and can be handled better.

This patch clearly defines the initial value for soffset and keeps it
unchanged before eliminateFrameIndex. The initial value must be zero
for MUBUF with a frame index. The non-frame index MUBUF forms that
use a raw offset from SP will have the stack register for soffset.
During frame elimination, the soffset remains zero for entry functions
with zero dynamic allocas and no callsites, or else is updated to the
appropriate frame/stack register.

Also, did some code clean up and made all asserts around soffset
stricter to match.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D95071
2021-01-22 14:20:59 +05:30
Arthur Eubanks a11bf9a7fb [AMDGPU][Inliner] Remove amdgpu-inline and add a new TTI inline hook
Having a custom inliner doesn't really fit in with the new PM's
pipeline. It's also extra technical debt.

amdgpu-inline only does a couple of custom things compared to the normal
inliner:
1) It disables inlining if the number of BBs in a function would exceed
   some limit
2) It increases the threshold if there are pointers to private arrays(?)

These can all be handled as TTI inliner hooks.
There already exists a hook for backends to multiply the inlining
threshold.

This way we can remove the custom amdgpu-inline pass.

This caused inline-hint.ll to fail, and after some investigation, it
looks like getInliningThresholdMultiplier() was previously getting
applied twice in amdgpu-inline (https://reviews.llvm.org/D62707 fixed it
not applying at all, so some later inliner change must have fixed
something), so I had to change the threshold in the test.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D94153
2021-01-21 20:29:17 -08:00
Sebastian Neubauer 4dbdff66fe Revert "[AMDGPU] Implement mir parseCustomPseudoSourceValue"
This reverts commit ba7dcd8542.

(caused memory leaks)
2021-01-21 18:11:48 +01:00
Jay Foad c0b3c5a064 [AMDGPU][GlobalISel] Run SIAddImgInit
This pass is required to get correct codegen for image instructions with
the tfe or lwe bits set.

Differential Revision: https://reviews.llvm.org/D95132
2021-01-21 15:54:54 +00:00
Matt Arsenault 94375d1083 AMDGPU: Remove v_rsq_f64 patterns
This isn't accurate enough without correction
2021-01-21 10:51:36 -05:00
Matt Arsenault 2a0db8d70e AMDGPU: Use more accurate fast f64 fdiv
A raw v_rcp_f64 isn't accurate enough, so start applying correction.
2021-01-21 10:51:36 -05:00
Sebastian Neubauer ba7dcd8542 [AMDGPU] Implement mir parseCustomPseudoSourceValue
Allow parsing generated mir with custom pseudo source value tokens.
Also rename pseudo source values to have more meaningful names.

Differential Revision: https://reviews.llvm.org/D94768
2021-01-21 16:32:17 +01:00
Matt Arsenault 20566a2ed8 AMDGPU: Add occupancy to serialized MachineFunctionInfo
Not sure about the default value handling, but also not sure
defaulting to a theoretically subtarget dependent value.
2021-01-21 09:21:00 -05:00
madhur13490 dd8ae42674 [IndirectFunctions] Skip propagating attributes to address taken functions
In case of indirect calls or address taken functions,
skip propagating any attributes to them. We just
propagate features to such functions.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D94585
2021-01-21 07:04:28 +00:00
dfukalov 560d7e0411 [NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036
2021-01-20 22:22:45 +03:00
Mirko Brkusanin a6a72dfdf2 [AMDGPU][GlobalISel] Avoid selecting S_PACK with constants
If constants are hidden behind G_ANYEXT we can treat them same way as G_SEXT.
For that purpose we extend getConstantVRegValWithLookThrough with option
to handle G_ANYEXT same way as G_SEXT.

Differential Revision: https://reviews.llvm.org/D92219
2021-01-20 11:54:53 +01:00
Petar Avramovic 4ab704d628 [AMDGPU][MC] Add tfe disassembler support MIMG opcodes
With tfe on there can be a vgpr write to vdata+1.
Add tablegen support for 5 register vdata store.
This is required for 4 register vdata store with tfe.

Differential Revision: https://reviews.llvm.org/D94960
2021-01-20 10:37:09 +01:00
Jay Foad 18cb7441b6 [AMDGPU] Simpler names for arch-specific ttmp registers. NFC.
Rename the *_gfx9_gfx10 ttmp registers to *_gfx9plus for simplicity,
and use the corresponding isGFX9Plus predicate to decide when to use
them instead of the old *_vi versions.

Differential Revision: https://reviews.llvm.org/D94975
2021-01-19 18:47:14 +00:00
Jay Foad 49dce85584 [AMDGPU] Simplify AMDGPUInstPrinter::printExpSrcN. NFC.
Change-Id: Idd7f47647bc0faa3ad6f61f44728c0f20540ec00
2021-01-19 10:39:56 +00:00
Dmitry Preobrazhensky 30b8f55378 Fix for sanitizer issue in 55c557a 2021-01-18 18:39:55 +03:00
Dmitry Preobrazhensky 55c557a5d2 [AMDGPU][MC] Refactored parsing of dpp ctrl
Summary of changes:
- simplified code to improve maintainability;
- replaced lex() with higher level parser functions;
- improved errors handling.

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D94777
2021-01-18 18:14:19 +03:00
Dmitry Preobrazhensky 911961c9c1 [AMDGPU][MC][GFX10] Improved dpp8 errors handling
Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D94756
2021-01-18 15:02:31 +03:00
Kazu Hirata 352fcfc697 [llvm] Use llvm::sort (NFC) 2021-01-17 10:39:45 -08:00
Kazu Hirata 2082b10d10 [llvm] Use *::empty (NFC) 2021-01-16 09:40:55 -08:00
Kazu Hirata 19aacdb715 [llvm] Construct SmallVector with iterator ranges (NFC) 2021-01-16 09:40:53 -08:00
Kazu Hirata 4707b21298 [AMDGPU] Use llvm::is_contained (NFC) 2021-01-15 21:00:54 -08:00
Kazu Hirata 7dc3575ef2 [llvm] Remove redundant return and continue statements (NFC)
Identified with readability-redundant-control-flow.
2021-01-14 20:30:34 -08:00
Matt Arsenault d55d592a92 GlobalISel: Do not set observer of MachineIRBuilder in LegalizerHelper
This fixes double printing of insertion debug messages in the
legalizer.

Try to cleanup usage of observers. Currently the use of observers is
pretty hard to follow and it's not clear what is responsible for
them. Observers are referenced in 3 places:

1. In the MachineFunction
2. In the MachineIRBuilder
3. In the LegalizerHelper

The observers in the MachineFunction and MachineIRBuilder are both
called only on insertions, and are redundant with each other. The
source of the double printing was the same observer was added to both
the MachineFunction, and the MachineIRBuilder. One of these references
needs to be removed. Arguably observers in general should be fully
removed from one or the other, but it may be useful to have a local
observer in the MachineIRBuilder that is not added to the function's
observers. Alternatively, the wrapper observer could manage a local
observer in one place.

The LegalizerHelper only ever calls the observer on changing/changed
instructions, and never insertions. Logically these are two different
types of observers, for changes and for insertions.

Additionally, some places used the GISelObserverWrapper when they only
needed a single observer they could use directly.

Setting the observer in the LegalizerHelper constructor is not
flexible enough if the LegalizerHelper is constructed anywhere outside
the one used by the legalizer. AMDGPU calls the LegalizerHelper in
RegBankSelect, and needs to use a local observer to apply the regbank
to newly created instructions. Currently it accomplishes this by
constructing a local MachineIRBuilder. I'm trying to move the
MachineIRBuilder to be owned/maintained by the RegBankSelect pass
itself, but the locally constructed LegalizerHelper would reset the
observer.

Mips also has a special case use of the LegalizationArtifactCombiner
in applyMappingImpl; I think we do need to run the artifact combiner
during RegBankSelect, but in a more consistent way outside of
applyMappingImpl.
2021-01-13 10:44:31 -05:00
Carl Ritson 790c75c163 [AMDGPU] Add SI_EARLY_TERMINATE_SCC0 for early terminating shader
Add pseudo instruction to allow early termination of pixel shader
anywhere based on the value of SCC.  The intention is to use this
when a mask of live lanes is updated, e.g. live lanes in WQM pass.
This facilitates early termination of shaders even when EXEC is
incomplete, e.g. in non-uniform control flow.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D88777
2021-01-13 13:29:05 +09:00
Hsiangkai Wang 914e2f5a02 [NFC] Use generic name for scalable vector stack ID.
Differential Revision: https://reviews.llvm.org/D94471
2021-01-13 10:57:43 +08:00
Joe Nash 314e29ed2b [AMDGPU] Add _e64 suffix to VOP3 Insts
Previously, instructions which could be
expressed as VOP3 in addition to another
encoding had a _e64 suffix on the tablegen
record name, while those
only available as VOP3 did not. With this
patch, all VOP3s will have the _e64 suffix.
The assembly does not change, only  the mir.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D94341

Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423
2021-01-12 18:33:18 -05:00
Matt Arsenault 3d39709159 AMDGPU: Remove wrapper only call limitation
This seems to only have overridden cold handling, which we probably
shouldn't do. As far as I can tell the wrapper library functions are
still inlined as appropriate.
2021-01-12 17:12:49 -05:00
Sebastian Neubauer 6a195491b6 [AMDGPU] Fix failing assert with scratch ST mode
In ST mode, flat scratch instructions have neither an sgpr nor a vgpr
for the address. This lead to an assertion when inserting hard clauses.

Differential Revision: https://reviews.llvm.org/D94406
2021-01-12 09:54:02 +01:00
Kazu Hirata 8590a3e3ad [llvm] Use *Set::contains (NFC) 2021-01-11 18:48:07 -08:00
Joe Nash bcec0f27a2 [AMDGPU] Deduplicate VOP tablegen asm & ins
VOP3 and VOP DPP subroutines to generate input
operands and asm strings were essentially copy
pasted several times. They are deduplicated to
reduce the maintenance burden and allow faster
development.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D94102

Change-Id: I76225eed3c33239d9573351e0c8a0abfad0146ea
2021-01-11 13:49:26 -05:00