Commit Graph

5803 Commits

Author SHA1 Message Date
Thomas Symalla fce3230be2 Added early exit. 2021-02-02 09:14:52 +01:00
Thomas Symalla d722924f20 Added comments. 2021-02-02 09:14:52 +01:00
Thomas Symalla ec043967ec clang-format 2021-02-02 09:14:52 +01:00
Thomas Symalla 62af0305b7 Added clamp i64 to i16 global isel pattern. 2021-02-02 09:14:52 +01:00
Matt Arsenault 41877b82f0 AMDGPU: Fix dbg_value handling when forming soft clause bundles
DBG_VALUES placed between memory instructions would change
codegen. Skip over these and re-insert them after the bundle instead
of giving up on bundling.
2021-02-01 22:16:35 -05:00
Austin Kerbow e068e236c3 [AMDGPU] Fix release build after 0397dca0. 2021-02-01 08:55:14 -08:00
Austin Kerbow 0397dca021 [AMDGPU] Fix crash with sgpr spills to vgpr disabled
This would assert with amdgpu-spill-sgpr-to-vgpr disabled when trying to
spill the FP.

Fixes: SWDEV-262704

Reviewed By: RamNalamothu

Differential Revision: https://reviews.llvm.org/D95768
2021-02-01 08:35:25 -08:00
Dmitry Preobrazhensky 99b5631649 [AMDGPU][MC] Corrected error position for invalid operands
Generic parser may report an incorrect error position when an offending operand is followed by a comma.
See bug 48884 for details: https://bugs.llvm.org/show_bug.cgi?id=48884.

Differential Revision: https://reviews.llvm.org/D95674
2021-02-01 14:31:08 +03:00
Matt Arsenault 8f14a08863 AMDGPU: Add missing consts 2021-01-31 10:47:57 -05:00
Kazu Hirata 627b5bda11 [llvm] Add missing header guards (NFC)
Identified with llvm-header-guard.
2021-01-30 09:53:42 -08:00
Kazu Hirata b4e780697d [AMDGPU] Forward-declare AMDGPUTargetMachine (NFC)
AMDGPUTargetTransformInfo.h needs AMDGPUTargetMachine but relies on a
forward declaration of AMDGPUTargetMachine in AMDGPU.h.  This patch
adds a forward declaration right in AMDGPUTargetTransformInfo.h.

While we are at it, this patch removes the one in
AMDGPU.h, where it is unnecessary.
2021-01-30 09:53:40 -08:00
Stanislav Mekhanoshin 9dbe736cbd [AMDGPU] Be more specific in needsFrameBaseReg
A condition "mayLoadOrStore" is too broad for that function.

Differential Revision: https://reviews.llvm.org/D95700
2021-01-29 14:40:25 -08:00
Kazu Hirata 7925aa091d [llvm] Populate SmallVector at construction time (NFC) 2021-01-28 22:21:14 -08:00
Kazu Hirata 046cfb8565 [llvm] Forward-declare formatted_raw_ostream (NFC)
Various *TargetStreamer.h need formatted_raw_ostream but rely on a
forward declaration of formatted_raw_ostream in MCStreamer.h.  This
patch adds forward declarations right in *TargetStreamer.h.

While we are at it, this patch removes the one in MCStreamer.h, where
it is unnecessary.
2021-01-28 22:21:13 -08:00
Carl Ritson 0824694d68 [AMDGPU] Fix WMM Entry SCC preservation
SCC was not correctly preserved when entering WWM.
Current lit test was unable to detect this as entry block is
handled differently.
Additionally fix an issue where SCC was unnecessarily preserved
when exiting from WWM to Exact mode.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D95500
2021-01-29 10:05:36 +09:00
Carl Ritson 0e8f50595e [AMDGPU] Mark V_SET_INACTIVE as defining SCC
V_SET_INACTIVE is implemented with S_NOT which clobbers SCC.
Mark sure it is marked appropriately.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D95509
2021-01-29 09:46:41 +09:00
Simon Pilgrim 0805e40a94 AMDGPUPrintfRuntimeBinding - don't dereference a dyn_cast<> pointer. NFCI.
We dereference the dyn_cast<> in all paths - use cast<> to silence the clang static analyzer warning.
2021-01-28 12:38:44 +00:00
Mirko Brkusanin 3c979ae9ec [AMDGPU][GlobalISel] Remove redundant cmp when copying constant to vcc
Differential Revision: https://reviews.llvm.org/D95540
2021-01-28 11:20:09 +01:00
Mirko Brkusanin 4b422708ba [AMDGPU][GlobalISel] Handle G_PTR_ADD when looking for constant offset
Look throught G_PTRTOINT and G_PTR_ADD nodes when looking for constant
offset for buffer stores. This also helps with merging of these instructions
later on.

Differential Revision: https://reviews.llvm.org/D95242
2021-01-28 11:20:09 +01:00
Piotr Sobczak fc8e741121 [AMDGPU] Avoid an illegal operand in si-shrink-instructions
Before the patch it was possible to trigger a constant bus
violation when folding immediates into a shrunk instruction.

The patch adds a check to enforce the legality of the new operand.

Differential Revision: https://reviews.llvm.org/D95527
2021-01-28 08:49:21 +01:00
Stanislav Mekhanoshin d91ee2f782 [AMDGPU] Do not reassign spilled registers
We cannot call LRM::unassign() if LRM::assign() was never called
before, these are symmetrical calls. There are two ways of
assigning a physical register to virtual, via LRM::assign() and
via VRM::assignVirt2Phys(). LRM::assign() will call the VRM to
assign the register and then update LiveIntervalUnion. Inline
spiller calls VRM directly and thus LiveIntervalUnion never gets
updated. A call to LRM::unassign() then asserts about inconsistent
liveness.

We have to note that not all callers of the InlineSpiller even
have LRM to pass, RegAllocPBQP does not have it, so we cannot
always pass LRM into the spiller.

The only way to get into that spiller LRE_DidCloneVirtReg() call
is from LiveRangeEdit::eliminateDeadDefs if we split an LI.

This patch refuses to reassign a LiveInterval created by a split
to workaround the problem. In fact we cannot reassign a spill
anyway as all registers of the needed class are occupied and we
are spilling.

Fixes: SWDEV-267996

Differential Revision: https://reviews.llvm.org/D95489
2021-01-27 16:29:05 -08:00
Kazu Hirata 6bde085366 [AMDGPU] Forward-declare TargetRegisterClass (NFC)
AMDGPUInstructionSelector.h needs TargetRegisterClass but relies on a
forward declaration of TargetRegisterClass in InstructionSelector.h.
This patch adds a forward declaration right in
AMDGPUInstructionSelector.h.

While we are at it, this patch removes the one in
InstructionSelector.h, where it is unnecessary.
2021-01-26 20:00:16 -08:00
Austin Kerbow 2291bd137d [AMDGPU] Update subtarget features for new target ID support
Support for XNACK and SRAMECC is not static on some GPUs. We must be able
to differentiate between different scenarios for these dynamic subtarget
features.

The possible settings are:

- Unsupported: The GPU has no support for XNACK/SRAMECC.
- Any: Preference is unspecified. Use conservative settings that can run anywhere.
- Off: Request support for XNACK/SRAMECC Off
- On: Request support for XNACK/SRAMECC On

GCNSubtarget will track the four options based on the following criteria. If
the subtarget does not support XNACK/SRAMECC we say the setting is
"Unsupported". If no subtarget features for XNACK/SRAMECC are requested we
must support "Any" mode. If the subtarget features XNACK/SRAMECC exist in the
feature string when initializing the subtarget, the settings are "On/Off".

The defaults are updated to be conservatively correct, meaning if no setting
for XNACK or SRAMECC is explicitly requested, defaults will be used which
generate code that can be run anywhere. This corresponds to the "Any" setting.

Differential Revision: https://reviews.llvm.org/D85882
2021-01-26 11:25:51 -08:00
Matt Arsenault 5f9707b796 AMDGPU: Fix redundant FP spilling/assert in some functions
If a function has stack objects, and a call, we require an FP. If we
did not initially have any stack objects, and only introduced them
during PrologEpilogInserter for CSR VGPR spills, SILowerSGPRSpills
would end up spilling the FP register as if it were a normal
register. This would result in an assert in a debug build, or
redundant handling of the FP register in a release build.

Try to predict that we will have an FP later, although this is ugly.
2021-01-26 13:01:45 -05:00
Matt Arsenault 92d1195b5f AMDGPU: Add assertion to determineCalleeSaves
Make sure this isn't getting called multiple times. I was surprised we
were modifying the function here, which I think is a bit questionable.
2021-01-26 13:01:45 -05:00
Simon Pilgrim f82cff31d3 [AMDGPU] HSAMD::fromString - replace std::string arg with StringRef. NFCI.
Removes an unnecessary chain of StringRef -> std::string -> StringRef conversions
2021-01-26 16:09:39 +00:00
Simon Pilgrim ee3da8958a [AMDGPU] Fix null-dereference static analysis warnings. NFCI.
Avoid repeated calls to isZeroValue() and check for a null pointer before dereferencing a dyn_cast<>.
2021-01-26 15:43:59 +00:00
Matt Arsenault 551a69e418 AMDGPU: Clear IsSSA property in SIFormMemoryClauses
Fixes verifier error when writing MIR testcases
2021-01-26 10:40:41 -05:00
Mirko Brkusanin 608ac62540 [AMDGPU] Fix use of HasModifiers in VopProfile
HasModifiers should be true if at least one modifier is used.
This should make the use of this field bit more consistent.

Differential Revision: https://reviews.llvm.org/D94795
2021-01-26 15:21:11 +01:00
Dmitry Preobrazhensky 745064e36b [AMDGPU][MC] Refactored exp tgt handling
Summary:
- Separated tgt encoding from parsing;
- Separated tgt decoding from printing;
- Improved errors handling;
- Disabled leading zeroes in index. The following code is no longer accepted: exp pos00 v3, v2, v1, v0

Reviewers: arsenm, rampitec, foad

Differential Revision: https://reviews.llvm.org/D95216
2021-01-26 14:54:15 +03:00
Kazu Hirata c85b6bf33c [AMDGPU] Forward-declare MachineIRBuilder (NFC)
AMDGPULegalizerInfo.h needs MachineIRBuilder but relies on a forward
declaration of MachineIRBuilder in LegalizerInfo.h.  This patch adds a
forward declaration right in AMDGPULegalizerInfo.h.

While we are at it, this patch removes the one in LegalizerInfo.h,
where it is unnecessary.
2021-01-25 19:24:01 -08:00
Changpeng Fang 5b648df1a8 AMDGPU: Reduce the number of expensive calls in SIFormMemoryClause
Summary:
  RPTracker::reset(MI) is a very expensive call when the number of virtual registers is huge.
We observed a long compilation time issue when RPT::reset() is called once for each cluster.

In this work, we call RPT.reset() only at the first seen cluster, and use advance() to get
the register pressure for the later clusters in the same basic block. This could effectively reduce the number
of the expensive calls and thus reduce the compile time.

Reviewers:
  rampitec

Fixes:
  SWDEV-239161

Differential Revision:
  https://reviews.llvm.org/D95273
2021-01-25 16:08:08 -08:00
Konstantin Zhuravlyov 2cdb34efda Revert "[IndirectFunctions] Skip propagating attributes to address taken functions"
This reverts commit dd8ae42674.

This commit causes infinite loop when compiling rocThrust and hipCUB.

Differential Revision: https://reviews.llvm.org/D95389
2021-01-25 15:58:06 -05:00
Dmitry Preobrazhensky 558b3bbb5b [AMDGPU][MC] Improved errors handling for SDWA operands
Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D95212
2021-01-25 19:02:53 +03:00
Carl Ritson a80ebd0179 [AMDGPU] Fix llvm.amdgcn.init.exec and frame materialization
Frame-base materialization may insert vector instructions before EXEC is initialised.
Fix this by moving lowering of llvm.amdgcn.init.exec later in backend.
Also remove SI_INIT_EXEC_LO pseudo as this is not necessary.

Reviewed By: ruiling

Differential Revision: https://reviews.llvm.org/D94645
2021-01-25 08:31:17 +09:00
Kazu Hirata 054444177b [Target] Use llvm::append_range (NFC) 2021-01-24 12:18:56 -08:00
Kazu Hirata e4847a7fcf Revert "[Target] Use llvm::append_range (NFC)"
This reverts commit cc7a238286.

The X86WinEHState.cpp hunk seems to break certain builds.
2021-01-23 11:25:27 -08:00
Kazu Hirata cc7a238286 [Target] Use llvm::append_range (NFC) 2021-01-23 10:56:31 -08:00
Kazu Hirata 49231c1f80 [llvm] Use static_assert instead of assert (NFC)
Identified with misc-static-assert.
2021-01-22 23:25:05 -08:00
Stanislav Mekhanoshin ca904b81e6 [AMDGPU] Fix FP materialization/resolve with flat scratch
Differential Revision: https://reviews.llvm.org/D95266
2021-01-22 16:06:47 -08:00
Stanislav Mekhanoshin 607bec0bb9 Change materializeFrameBaseRegister() to return register
The only caller of this function is in the LocalStackSlotAllocation
and it creates base register of class returned by the target's
getPointerRegClass(). AMDGPU wants to use a different reg class
here so let materializeFrameBaseRegister to just create and return
whatever it wants.

Differential Revision: https://reviews.llvm.org/D95268
2021-01-22 15:51:06 -08:00
Arthur Eubanks 42d682a217 [NewPM][AMDGPU] Skip adding CGSCCOptimizerLate callbacks at O0
The legacy PM's EP_CGSCCOptimizerLate was only used under not-O0.

Fixes clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp under the new PM.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D95250
2021-01-22 12:29:39 -08:00
Sebastian Neubauer 8214982b50 [AMDGPU] Implement mir parseCustomPseudoSourceValue
Allow parsing generated mir with custom pseudo source value tokens.
Also rename pseudo source values to have more meaningful names.

Relands ba7dcd8542, which had memory leaks.

Differential Revision: https://reviews.llvm.org/D95215
2021-01-22 11:24:08 +01:00
Christudasan Devadasan ff8a1cae18 [AMDGPU] Fix the inconsistency in soffset for MUBUF stack accesses.
During instruction selection, there is an inconsistency in choosing
the initial soffset value. With certain early passes, this value is
getting modified and that brought additional fixup during
eliminateFrameIndex to work for all cases. This whole transformation
looks trivial and can be handled better.

This patch clearly defines the initial value for soffset and keeps it
unchanged before eliminateFrameIndex. The initial value must be zero
for MUBUF with a frame index. The non-frame index MUBUF forms that
use a raw offset from SP will have the stack register for soffset.
During frame elimination, the soffset remains zero for entry functions
with zero dynamic allocas and no callsites, or else is updated to the
appropriate frame/stack register.

Also, did some code clean up and made all asserts around soffset
stricter to match.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D95071
2021-01-22 14:20:59 +05:30
Arthur Eubanks a11bf9a7fb [AMDGPU][Inliner] Remove amdgpu-inline and add a new TTI inline hook
Having a custom inliner doesn't really fit in with the new PM's
pipeline. It's also extra technical debt.

amdgpu-inline only does a couple of custom things compared to the normal
inliner:
1) It disables inlining if the number of BBs in a function would exceed
   some limit
2) It increases the threshold if there are pointers to private arrays(?)

These can all be handled as TTI inliner hooks.
There already exists a hook for backends to multiply the inlining
threshold.

This way we can remove the custom amdgpu-inline pass.

This caused inline-hint.ll to fail, and after some investigation, it
looks like getInliningThresholdMultiplier() was previously getting
applied twice in amdgpu-inline (https://reviews.llvm.org/D62707 fixed it
not applying at all, so some later inliner change must have fixed
something), so I had to change the threshold in the test.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D94153
2021-01-21 20:29:17 -08:00
Sebastian Neubauer 4dbdff66fe Revert "[AMDGPU] Implement mir parseCustomPseudoSourceValue"
This reverts commit ba7dcd8542.

(caused memory leaks)
2021-01-21 18:11:48 +01:00
Jay Foad c0b3c5a064 [AMDGPU][GlobalISel] Run SIAddImgInit
This pass is required to get correct codegen for image instructions with
the tfe or lwe bits set.

Differential Revision: https://reviews.llvm.org/D95132
2021-01-21 15:54:54 +00:00
Matt Arsenault 94375d1083 AMDGPU: Remove v_rsq_f64 patterns
This isn't accurate enough without correction
2021-01-21 10:51:36 -05:00
Matt Arsenault 2a0db8d70e AMDGPU: Use more accurate fast f64 fdiv
A raw v_rcp_f64 isn't accurate enough, so start applying correction.
2021-01-21 10:51:36 -05:00
Sebastian Neubauer ba7dcd8542 [AMDGPU] Implement mir parseCustomPseudoSourceValue
Allow parsing generated mir with custom pseudo source value tokens.
Also rename pseudo source values to have more meaningful names.

Differential Revision: https://reviews.llvm.org/D94768
2021-01-21 16:32:17 +01:00
Matt Arsenault 20566a2ed8 AMDGPU: Add occupancy to serialized MachineFunctionInfo
Not sure about the default value handling, but also not sure
defaulting to a theoretically subtarget dependent value.
2021-01-21 09:21:00 -05:00
madhur13490 dd8ae42674 [IndirectFunctions] Skip propagating attributes to address taken functions
In case of indirect calls or address taken functions,
skip propagating any attributes to them. We just
propagate features to such functions.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D94585
2021-01-21 07:04:28 +00:00
dfukalov 560d7e0411 [NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036
2021-01-20 22:22:45 +03:00
Mirko Brkusanin a6a72dfdf2 [AMDGPU][GlobalISel] Avoid selecting S_PACK with constants
If constants are hidden behind G_ANYEXT we can treat them same way as G_SEXT.
For that purpose we extend getConstantVRegValWithLookThrough with option
to handle G_ANYEXT same way as G_SEXT.

Differential Revision: https://reviews.llvm.org/D92219
2021-01-20 11:54:53 +01:00
Petar Avramovic 4ab704d628 [AMDGPU][MC] Add tfe disassembler support MIMG opcodes
With tfe on there can be a vgpr write to vdata+1.
Add tablegen support for 5 register vdata store.
This is required for 4 register vdata store with tfe.

Differential Revision: https://reviews.llvm.org/D94960
2021-01-20 10:37:09 +01:00
Jay Foad 18cb7441b6 [AMDGPU] Simpler names for arch-specific ttmp registers. NFC.
Rename the *_gfx9_gfx10 ttmp registers to *_gfx9plus for simplicity,
and use the corresponding isGFX9Plus predicate to decide when to use
them instead of the old *_vi versions.

Differential Revision: https://reviews.llvm.org/D94975
2021-01-19 18:47:14 +00:00
Jay Foad 49dce85584 [AMDGPU] Simplify AMDGPUInstPrinter::printExpSrcN. NFC.
Change-Id: Idd7f47647bc0faa3ad6f61f44728c0f20540ec00
2021-01-19 10:39:56 +00:00
Dmitry Preobrazhensky 30b8f55378 Fix for sanitizer issue in 55c557a 2021-01-18 18:39:55 +03:00
Dmitry Preobrazhensky 55c557a5d2 [AMDGPU][MC] Refactored parsing of dpp ctrl
Summary of changes:
- simplified code to improve maintainability;
- replaced lex() with higher level parser functions;
- improved errors handling.

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D94777
2021-01-18 18:14:19 +03:00
Dmitry Preobrazhensky 911961c9c1 [AMDGPU][MC][GFX10] Improved dpp8 errors handling
Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D94756
2021-01-18 15:02:31 +03:00
Kazu Hirata 352fcfc697 [llvm] Use llvm::sort (NFC) 2021-01-17 10:39:45 -08:00
Kazu Hirata 2082b10d10 [llvm] Use *::empty (NFC) 2021-01-16 09:40:55 -08:00
Kazu Hirata 19aacdb715 [llvm] Construct SmallVector with iterator ranges (NFC) 2021-01-16 09:40:53 -08:00
Kazu Hirata 4707b21298 [AMDGPU] Use llvm::is_contained (NFC) 2021-01-15 21:00:54 -08:00
Kazu Hirata 7dc3575ef2 [llvm] Remove redundant return and continue statements (NFC)
Identified with readability-redundant-control-flow.
2021-01-14 20:30:34 -08:00
Matt Arsenault d55d592a92 GlobalISel: Do not set observer of MachineIRBuilder in LegalizerHelper
This fixes double printing of insertion debug messages in the
legalizer.

Try to cleanup usage of observers. Currently the use of observers is
pretty hard to follow and it's not clear what is responsible for
them. Observers are referenced in 3 places:

1. In the MachineFunction
2. In the MachineIRBuilder
3. In the LegalizerHelper

The observers in the MachineFunction and MachineIRBuilder are both
called only on insertions, and are redundant with each other. The
source of the double printing was the same observer was added to both
the MachineFunction, and the MachineIRBuilder. One of these references
needs to be removed. Arguably observers in general should be fully
removed from one or the other, but it may be useful to have a local
observer in the MachineIRBuilder that is not added to the function's
observers. Alternatively, the wrapper observer could manage a local
observer in one place.

The LegalizerHelper only ever calls the observer on changing/changed
instructions, and never insertions. Logically these are two different
types of observers, for changes and for insertions.

Additionally, some places used the GISelObserverWrapper when they only
needed a single observer they could use directly.

Setting the observer in the LegalizerHelper constructor is not
flexible enough if the LegalizerHelper is constructed anywhere outside
the one used by the legalizer. AMDGPU calls the LegalizerHelper in
RegBankSelect, and needs to use a local observer to apply the regbank
to newly created instructions. Currently it accomplishes this by
constructing a local MachineIRBuilder. I'm trying to move the
MachineIRBuilder to be owned/maintained by the RegBankSelect pass
itself, but the locally constructed LegalizerHelper would reset the
observer.

Mips also has a special case use of the LegalizationArtifactCombiner
in applyMappingImpl; I think we do need to run the artifact combiner
during RegBankSelect, but in a more consistent way outside of
applyMappingImpl.
2021-01-13 10:44:31 -05:00
Carl Ritson 790c75c163 [AMDGPU] Add SI_EARLY_TERMINATE_SCC0 for early terminating shader
Add pseudo instruction to allow early termination of pixel shader
anywhere based on the value of SCC.  The intention is to use this
when a mask of live lanes is updated, e.g. live lanes in WQM pass.
This facilitates early termination of shaders even when EXEC is
incomplete, e.g. in non-uniform control flow.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D88777
2021-01-13 13:29:05 +09:00
Hsiangkai Wang 914e2f5a02 [NFC] Use generic name for scalable vector stack ID.
Differential Revision: https://reviews.llvm.org/D94471
2021-01-13 10:57:43 +08:00
Joe Nash 314e29ed2b [AMDGPU] Add _e64 suffix to VOP3 Insts
Previously, instructions which could be
expressed as VOP3 in addition to another
encoding had a _e64 suffix on the tablegen
record name, while those
only available as VOP3 did not. With this
patch, all VOP3s will have the _e64 suffix.
The assembly does not change, only  the mir.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D94341

Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423
2021-01-12 18:33:18 -05:00
Matt Arsenault 3d39709159 AMDGPU: Remove wrapper only call limitation
This seems to only have overridden cold handling, which we probably
shouldn't do. As far as I can tell the wrapper library functions are
still inlined as appropriate.
2021-01-12 17:12:49 -05:00
Sebastian Neubauer 6a195491b6 [AMDGPU] Fix failing assert with scratch ST mode
In ST mode, flat scratch instructions have neither an sgpr nor a vgpr
for the address. This lead to an assertion when inserting hard clauses.

Differential Revision: https://reviews.llvm.org/D94406
2021-01-12 09:54:02 +01:00
Kazu Hirata 8590a3e3ad [llvm] Use *Set::contains (NFC) 2021-01-11 18:48:07 -08:00
Joe Nash bcec0f27a2 [AMDGPU] Deduplicate VOP tablegen asm & ins
VOP3 and VOP DPP subroutines to generate input
operands and asm strings were essentially copy
pasted several times. They are deduplicated to
reduce the maintenance burden and allow faster
development.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D94102

Change-Id: I76225eed3c33239d9573351e0c8a0abfad0146ea
2021-01-11 13:49:26 -05:00
QingShan Zhang 7539c75bb4 [DAGCombine] Remove the check for unsafe-fp-math when we are checking the AFN
We are checking the unsafe-fp-math for sqrt but not for fpow, which behaves inconsistent.
As the direction is to remove this global option, we need to remove the unsafe-fp-math
check for sqrt and update the test with afn fast-math flags.

Reviewed By: Spatel

Differential Revision: https://reviews.llvm.org/D93891
2021-01-11 02:25:53 +00:00
Kazu Hirata b7c5e0b02c [Target, Transforms] Use *Set::contains (NFC) 2021-01-08 18:39:54 -08:00
Tony 2f499b9aff [AMDGPU] Add volatile support to SIMemoryLegalizer
Treat a non-atomic volatile load and store as a relaxed atomic at
system scope for the address spaces accessed. This will ensure all
relevant caches will be bypassed.

A volatile atomic is not changed and still only bypasses caches upto
the level specified by the SyncScope operand.

Differential Revision: https://reviews.llvm.org/D94214
2021-01-09 00:52:33 +00:00
Christudasan Devadasan ae25a397e9 AMDGPU/GlobalISel: Enable sret demotion 2021-01-08 10:56:35 +05:30
Kazu Hirata b934160aaa [Target] Use llvm::find_if (NFC) 2021-01-07 20:29:36 -08:00
Mehdi Amini 467e916d30 Fix gcc5 build failure (NFC)
The loop index was shadowing the container name.
It seems that we can just not use a for-range loop here since there is
an induction variable anyway.

Differential Revision: https://reviews.llvm.org/D94254
2021-01-07 20:11:57 +00:00
dfukalov 6a87e9b08b [NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813
2021-01-07 22:22:05 +03:00
Matt Arsenault 6b7d5a928f AMDGPU/GlobalISel: Start cleaning up calling convention lowering
There are various hacks working around limitations in
handleAssignments, and the logical split between different parts isn't
correct. Start separating the type legalization to satisfy going
through the DAG infrastructure from the code required to split into
register types. The type splitting should be moved to generic code.
2021-01-07 10:36:45 -05:00
Matt Arsenault ab3a3f543b AMDGPU/GlobalISel: Update fdiv lowering for denormal/ulp interaction
Change the GlobalISel fast fdiv handling to match the changes in
2531535984 and
884acbb9e1
2021-01-06 12:32:01 -05:00
Christudasan Devadasan d68458bd56 [GlobalISel] Base implementation for sret demotion.
If the return values can't be lowered to registers
SelectionDAG performs the sret demotion. This patch
contains the basic implementation for the same in
the GlobalISel pipeline.

Furthermore, targets should bring relevant changes
during lowerFormalArguments, lowerReturn and
lowerCall to make use of this feature.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D92953
2021-01-06 10:30:50 +05:30
Changpeng Fang cb5b52a06e AMDGPU: Annotate amdgpu.noclobber for global loads only
Summary:
  This is to avoid unnecessary analysis since amdgpu.noclobber is only used for globals.

Reviewers:
  arsenm

Fixes:
   SWDEV-239161

Differential Revision:
  https://reviews.llvm.org/D94107
2021-01-05 14:47:19 -08:00
Arthur Eubanks 28a326eba0 [NFC] Rename registerAliasAnalyses -> registerDefaultAliasAnalyses
To clarify that this only affects the "default" AA.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D93980
2021-01-05 11:07:58 -08:00
Joe Nash 60466fad2d [AMDGPU] Remove deprecated V_MUL_LO_I32 from GFX10
It was removed in GFX10 GPUs, but LLVM could
generate it.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D94020

Change-Id: Id1c716d71313edcfb768b2b175a6789ef9b01f3c
2021-01-05 11:59:57 -05:00
Jay Foad 3914bebe91 [AMDGPU] Handle v_fmac_legacy_f32 in SIFoldOperands
Convert it to v_fma_legacy_f32 if it is profitable to do so, just like
other mac instructions that are converted to their mad equivalents.

Differential Revision: https://reviews.llvm.org/D94010
2021-01-05 11:55:33 +00:00
Jay Foad 4e6054a86c [AMDGPU] Split out new helper function macToMad in SIFoldOperands. NFC.
Differential Revision: https://reviews.llvm.org/D94009
2021-01-05 11:54:48 +00:00
Arthur Eubanks 8e293fe6ad [NewPM][AMDGPU] Pass TargetMachine to AMDGPUSimplifyLibCallsPass
Missed in https://reviews.llvm.org/D93863.
2021-01-04 13:48:09 -08:00
Arthur Eubanks 191552344b [NewPM][AMDGPU] Make amdgpu-aa work with NewPM
An AMDGPUAA class already existed that was supposed to work with the new
PM, but it wasn't tested and was a bit broken.

Fix up the existing classes to have the right keys/parameters.
Wire up AMDGPUAA inside AMDGPUTargetMachine.

Add it to the list of alias analyses for the "default" AAManager since
in adjustPassManager() amdgpu-aa is added into the pipeline at the
beginning.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D93914
2021-01-04 12:36:27 -08:00
Arthur Eubanks 4e838ba9ea [NewPM][AMDGPU] Port amdgpu-always-inline
And add to AMDGPU opt pipeline.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D94025
2021-01-04 12:27:01 -08:00
Arthur Eubanks fd323a897c [NewPM][AMDGPU] Port amdgpu-printf-runtime-binding
And add to AMDGPU opt pipeline.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D94026
2021-01-04 12:25:50 -08:00
Arthur Eubanks e1833e7493 [NewPM][AMDGPU] Port amdgpu-unify-metadata
And add to AMDGPU opt pipeline.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D94023
2021-01-04 11:57:46 -08:00
Arthur Eubanks a5f863e076 [NewPM][AMDGPU] Port amdgpu-propagate-attributes-early/late
And add to AMDGPU opt pipeline.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D94022
2021-01-04 11:53:37 -08:00
Arthur Eubanks b8f22f9d30 [NewPM][AMDGPU] Run InternalizePass when -amdgpu-internalize-symbols
The legacy PM doesn't run EP_ModuleOptimizerEarly on -O0, so skip
running it here when given O0.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D93886
2021-01-04 11:34:40 -08:00
Kazu Hirata 0e219b6443 [Target] Construct SmallVector with iterator ranges (NFC) 2021-01-03 09:57:45 -08:00
Kazu Hirata 985f899bf2 [Target] Use llvm::append_range (NFC) 2021-01-03 09:57:43 -08:00
Roman Lebedev 7c8b8063b6
[SimplifyCFG][AMDGPU] AMDGPUUnifyDivergentExitNodes: SimplifyCFG isn't ready to preserve PostDomTree
There is a number of transforms in SimplifyCFG that take DomTree out of
DomTreeUpdater, and do updates manually. Until they are fixed,
user passes are unable to claim that PDT is preserved.

Note that the default for SimplifyCFG is still not to preserve DomTree,
so this is still effectively NFC.
2021-01-03 01:45:46 +03:00
Roman Lebedev 4b80647367
[AMDGPU][SimplifyCFG] Teach AMDGPUUnifyDivergentExitNodes to preserve {,Post}DomTree
This is a (last big?) part of the patch series to make SimplifyCFG
preserve DomTree. Currently, it still does not actually preserve it,
even thought it is pretty much fully updated to preserve it.

Once the default is flipped, a valid DomTree must be passed into
simplifyCFG, which means that whatever pass calls simplifyCFG,
should also be smart about DomTree's.

As far as i can see from `check-llvm` with default flipped,
this is the last LLVM test batch (other than bugpoint tests)
that needed fixes to not break with default flipped.

The changes here are boringly identical to the ones i did
over 42+ times/commits recently already,
so while AMDGPU is outside of my normal ecosystem,
i'm going to go for post-commit review here,
like in all the other 42+ changes.

Note that while the pass is taught to preserve {,Post}DomTree,
it still doesn't do that by default, because simplifycfg
still doesn't do that by default, and flipping default
in this pass will implicitly flip the default for simplifycfg.
That will happen, but not right now.
2021-01-02 01:01:20 +03:00
Juneyoung Lee 420d046d6b clang-format, address warnings 2020-12-30 23:05:07 +09:00
Juneyoung Lee 9b29610228 Use unary CreateShuffleVector if possible
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.

Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)

The order is swapped, but in terms of correctness it is still fine.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D93923
2020-12-30 22:36:08 +09:00
Arthur Eubanks 7ecbe0c7a0 [NewPM][AMDGPU] Port amdgpu-lower-kernel-attributes
And add it to the AMDGPU opt pipeline.

This is a function pass instead of a module pass (like the legacy pass)
because it's getting added to a CGSCCPassManager, and you can't put a
module pass in a CGSCCPassManager.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D93885
2020-12-29 10:26:06 -08:00
Arthur Eubanks c2ef06d3dd [NewPM] Port infer-address-spaces
And add it to the AMDGPU opt pipeline.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D93880
2020-12-28 19:58:12 -08:00
Arthur Eubanks 0e9abcfc19 [AMDGPU][NewPM] Port amdgpu-promote-alloca(-to-vector)
And add to AMDGPU opt pipeline.

Don't pin an opt run to the legacy PM when -enable-new-pm=1 if these
passes (or passes introduced in https://reviews.llvm.org/D93863) are in
the list of passes.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D93875
2020-12-28 17:52:31 -08:00
Arthur Eubanks 9abc457724 [NewPM][AMDGPU] Port amdgpu-simplifylib/amdgpu-usenative
And add them to the pipeline via
AMDGPUTargetMachine::registerPassBuilderCallbacks(), which mirrors
AMDGPUTargetMachine::adjustPassManager().

These passes can't be unconditionally added to PassRegistry.def since
they are only present when the AMDGPU backend is enabled. And there are
no target-specific headers in llvm/include, so parsing these pass names
must occur somewhere in the AMDGPU directory. I decided the best place
was inside the TargetMachine, since the PassBuilder invokes
TargetMachine::registerPassBuilderCallbacks() anyway. If we come up with
a cleaner solution for target-specific passes in the future that's fine,
but there aren't too many target-specific IR passes living in
target-specific directories so it shouldn't be too bad to change in the
future.

Reviewed By: ychen, arsenm

Differential Revision: https://reviews.llvm.org/D93863
2020-12-28 10:38:51 -08:00
alex-t 644da789e3 [AMDGPU] Split edge to make si_if dominate end_cf
Basic block containing "if" not necessarily dominates block that is the "false" target for the if.

That "false" target block may have another predecessor besides the "if" block. IR value corresponding to the Exec mask is generated by the

si_if intrinsic and then used by the end_cf intrinsic. In this case IR verifier complains that 'Def does not dominate all uses'.

This change split the edge between the "if" block and "false" target block to make it dominated by the "if" block.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D91435
2020-12-28 17:14:02 +03:00
Dmitry Preobrazhensky 8c25bb3d0d [AMDGPU][MC] Improved errors handling for v_interp* operands
See bug 48596 (https://bugs.llvm.org/show_bug.cgi?id=48596)

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D93757
2020-12-28 16:15:48 +03:00
Dmitry Preobrazhensky 5b17263b6b [AMDGPU][MC][NFC] Parser refactoring
See bug 48515 (https://bugs.llvm.org/show_bug.cgi?id=48515)

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D93756
2020-12-28 14:59:49 +03:00
Kazu Hirata d6ff5cf995 [Target] Use llvm::any_of (NFC) 2020-12-24 19:43:26 -08:00
Praveen Velliengiri 61177943c9 [AMDGPU] Use MUBUF instructions for global address space access
Currently, the compiler crashes in instruction selection of global
load/stores in gfx600 due to the lack of FLAT instructions. This patch
fix the crash by selecting MUBUF instructions for global load/stores
in gfx600.

Authored-by: Praveen Velliengiri <Praveen.Velliengiri@amd.com>

Reviewed by: t-tye

Differential revision: https://reviews.llvm.org/D92483
2020-12-24 10:13:04 +00:00
Stanislav Mekhanoshin 747f67e034 [AMDGPU] Fix adjustWritemask subreg handling
If we happen to extract a non-dword subreg that breaks the
logic of the function and it may shrink the dmask because
it does not recognize the use of a lane(s).

This bug is next to impossible to trigger with the current
lowering in the BE, but it breaks in one of my future patches.

Differential Revision: https://reviews.llvm.org/D93782
2020-12-23 14:43:31 -08:00
Sebastian Neubauer 221fdedc69 [AMDGPU][GlobalISel] Fold flat vgpr + constant addresses
Use getPtrBaseWithConstantOffset in selectFlatOffsetImpl to fold more
vgpr+constant addresses.

Differential Revision: https://reviews.llvm.org/D93692
2020-12-23 10:40:30 +01:00
Matt Arsenault 581d13f8ae GlobalISel: Return APInt from getConstantVRegVal
Returning int64_t was arbitrarily limiting for wide integer types, and
the functions should handle the full generality of the IR.

Also changes the full form which returns the originally defined
vreg. Add another wrapper for the common case of just immediately
converting to int64_t (arguably this would be useful for the full
return value case as well).

One possible issue with this change is some of the existing uses did
break without conversion to getConstantVRegSExtVal, and it's possible
some without adequate test coverage are now broken.
2020-12-22 22:23:58 -05:00
Matt Arsenault 8bf9cdeaee AMDGPU: Use Register 2020-12-22 21:55:59 -05:00
Matt Arsenault bac54639c7 AMDGPU: Add spilled CSR SGPRs to entry block live ins 2020-12-22 21:55:59 -05:00
Matt Arsenault 29ed846d67 AMDGPU: Fix assert when checking for implicit operand legality 2020-12-22 20:56:24 -05:00
Stanislav Mekhanoshin d15119a02d [AMDGPU][GlobalISel] GlobalISel for flat scratch
It does not seem to fold offsets but this is not specific
to the flat scratch as getPtrBaseWithConstantOffset() does
not return the split for these tests unlike its SDag
counterpart.

Differential Revision: https://reviews.llvm.org/D93670
2020-12-22 16:33:06 -08:00
Stanislav Mekhanoshin ca4bf58e4e [AMDGPU] Support unaligned flat scratch in TLI
Adjust SITargetLowering::allowsMisalignedMemoryAccessesImpl for
unaligned flat scratch support. Mostly needed for global isel.

Differential Revision: https://reviews.llvm.org/D93669
2020-12-22 16:12:31 -08:00
Stanislav Mekhanoshin ae8f4b2178 [AMDGPU] Folding of FI operand with flat scratch
Differential Revision: https://reviews.llvm.org/D93501
2020-12-22 10:48:04 -08:00
Dmitry Preobrazhensky f4f49d9d0d [AMDGPU][MC][NFC] Fix for sanitizer error in 8ab5770
Corrected to fix sanitizer error introduced by 8ab5770
2020-12-21 20:42:35 +03:00
Dmitry Preobrazhensky 8ab5770a17 [AMDGPU][MC][NFC] Parser refactoring
See bug 48515 (https://bugs.llvm.org/show_bug.cgi?id=48515)

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D93548
2020-12-21 20:21:07 +03:00
Carl Ritson 7722494834 [AMDGPU][NFC] Remove unused Hi16Elt definition 2020-12-18 20:38:54 +09:00
dfukalov 9ed8e0caab [NFC] Reduce include files dependency and AA header cleanup (part 2).
Continuing work started in https://reviews.llvm.org/D92489:

Removed a bunch of includes from "AliasAnalysis.h" and "LoopPassManager.h".

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92852
2020-12-17 14:04:48 +03:00
Matt Arsenault f333736757 AMDGPU: Remove SGPRSpillVGPRDefinedSet hack
These VGPRs should be reserved and therefore do not need "correct"
liveness. They should not have undef uses, which can still cause
issues.
2020-12-16 21:33:35 -05:00
Roman Lebedev 49dac4aca0
[SimplifyCFG] MergeBlockIntoPredecessor() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
2020-12-17 01:03:49 +03:00
Piotr Sobczak c7afb698ca [AMDGPU] Avoid calling copyFastMathFlags in wrong context
Calling Instruction::copyFastMathFlags() assumes the caller is
FPMathOperator. Avoid calling the function for instructions
that are not instances of FPMathOperator.
2020-12-16 10:22:51 +01:00
Sebastian Neubauer 409a2f0f9e [AMDGPU] Allow no saddr for global addtid insts
I think the global_load/store_dword_addtid instructions support
switching off the scalar address.
Add assembler and disassembler support for this.

Differential Revision: https://reviews.llvm.org/D93288
2020-12-16 10:01:40 +01:00
Stanislav Mekhanoshin eb66bf0802 [AMDGPU] Print SCRATCH_EN field after the kernel
Differential Revision: https://reviews.llvm.org/D93353
2020-12-15 22:44:30 -08:00
Matt Arsenault 97f51f0489 AMDGPU: Remove redundant CCAction for i1 2020-12-15 17:00:27 -05:00
Tony d5ea8f7010 [AMDGPU] Clarify scratch initialization
- Clarify documentation on initializing scratch.
- Rename compute_pgm_rsrc2 field for enabling scratch from
  ENABLE_SGPR_PRIVATE_SEGMENT_WAVEFRONT_OFFSET to
  ENABLE_PRIVATE_SEGMENT to match hardware definition.

Differential Revision: https://reviews.llvm.org/D93271
2020-12-15 20:14:20 +00:00
Sebastian Neubauer 91445979be [AMDGPU] Unify flat offset logic
Move getNumFlatOffsetBits from AMDGPUAsmParser and SIInstrInfo into
AMDGPUBaseInfo.

Differential Revision: https://reviews.llvm.org/D93287
2020-12-15 14:59:59 +01:00
Changpeng Fang ce0c0013d8 AMDGPU: If a store defines (alias) a load, it clobbers the load.
Summary:
 If a store defines (must alias) a load, it clobbers the load.

Fixes: SWDEV-258915

Reviewers:
  arsenm

Differential Revision:
  https://reviews.llvm.org/D92951
2020-12-14 16:34:32 -08:00
Stanislav Mekhanoshin cf5845d6c4 [AMDGPU] Use multi-dword flat scratch for spilling
Differential Revision: https://reviews.llvm.org/D93067
2020-12-14 14:19:29 -08:00
Michael Liao 1fd1f638b6 [amdgpu] Fix a crash case when `V_CNDMASK` could be simplified.
- Once an instruction is simplified, foldable candidates from it should
  be invalidated or skipped as the operand index is no longer valid.

Differential Revision: https://reviews.llvm.org/D93174
2020-12-14 13:08:13 -05:00
Stanislav Mekhanoshin 87d7757bbe [SLP] Control maximum vectorization factor from TTI
D82227 has added a proper check to limit PHI vectorization to the
maximum vector register size. That unfortunately resulted in at
least a couple of regressions on SystemZ and x86.

This change reverts PHI handling from D82227 and replaces it with
a more general check in SLPVectorizerPass::tryToVectorizeList().
Moved to tryToVectorizeList() it allows to restart vectorization
if initial chunk fails.

However, this function is more general and handles not only PHI
but everything which SLP handles. If vectorization factor would
be limited to maximum vector register size it would limit much
more vectorization than before leading to further regressions.
Therefore a new TTI callback getMaximumVF() is added with the
default 0 to preserve current behavior and limit nothing. Then
targets can decide what is better for them.

The callback gets ElementSize just like a similar getMinimumVF()
function and the main opcode of the chain. The latter is to avoid
regressions at least on the AMDGPU. We can have loads and stores
up to 128 bit wide, and <2 x 16> bit vector math on some
subtargets, where the rest shall not be vectorized. I.e. we need
to differentiate based on the element size and operation itself.

Differential Revision: https://reviews.llvm.org/D92059
2020-12-14 08:49:40 -08:00
Jay Foad 07e92e6b60 [AMDGPU] Make use of HasSMemRealTime predicate. NFC.
We have this subtarget feature so it makes sense to use it here. This is
NFC because it's always defined by default on GFX8+.

Differential Revision: https://reviews.llvm.org/D93202
2020-12-14 16:34:57 +00:00
Carl Ritson 62c246eda2 [AMDGPU][NFC] Rename opsel/opsel_hi/neg_lo/neg_hi with suffix 0
These parameters set a default value of 0, so I believe they
should include a 0 suffix. This allows for versions which do not
set a default value in future.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D93187
2020-12-14 20:01:56 +09:00
Carl Ritson af4570cd3a [AMDGPU][NFC] Remove unused VOP3Mods0Clamp
This is unused and the selection function does not exist.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D93188
2020-12-14 20:00:58 +09:00
Sebastian Neubauer 5733167f54 [AMDGPU] Mark amdgpu_gfx functions as module entry function
- Allows lds allocations
- Writes resource usage into COMPUTE_PGM_RSRC1 registers in PAL metadata

Differential Revision: https://reviews.llvm.org/D92946
2020-12-14 10:43:39 +01:00
Jay Foad 4f25e53982 [AMDGPU] Make use of emitRemovedIntrinsicError. NFC.
Change-Id: I482bbf528255f2eacd3878ddfe7edb9a8f63d5c2
2020-12-11 14:02:14 +00:00
Mirko Brkusanin 0c7cce54eb [AMDGPU] Resolve issues when picking between ds_read/write and ds_read2/write2
Both ds_read_b128 and ds_read2_b64 are valid for 128bit 16-byte aligned
loads but the one that will be selected is determined either by the order in
tablegen or by the AddedComplexity attribute. Currently ds_read_b128 has
priority.

While ds_read2_b64 has lower alignment requirements, we cannot always
restrict ds_read_b128 to 16-byte alignment because of unaligned-access-mode
option. This was causing ds_read_b128 to be selected for 8-byte aligned
loads regardles of chosen access mode.

To resolve this we use two patterns for selecting ds_read_b128. One
requires alignment of 16-byte and the other requires
unaligned-access-mode option.

Same goes for ds_write2_b64 and ds_write_b128.

Differential Revision: https://reviews.llvm.org/D92767
2020-12-10 12:40:49 +01:00
Stanislav Mekhanoshin 4617cc68f6 [AMDGPU] Fix expansion of 192 bit spills in PEI
Differential Revision: https://reviews.llvm.org/D92979
2020-12-09 16:36:29 -08:00
Scott Linder 9260a99999 [MC][AMDGPU] Consume EndOfStatement in asm parser
Avoids spurious newlines showing up in the output when emitting assembly
via MC.

Reviewed By: MaskRay, arsenm

Differential Revision: https://reviews.llvm.org/D92690
2020-12-09 21:45:55 +00:00
Scott Linder f5f4b8b60f [AMDGPU][MC] Restore old error position for "too few operands"
Revert part of https://reviews.llvm.org/D92084 to make it simpler to
start consuming the EndOfStatement token within AMDGPU's
ParseInstruction in a future patch. This also brings us back to what
every other target currently does.

A future change to move the position back to the end of the statement
would likely need to audit all of the AMDGPUOperand SMLoc ranges, and
determine the SMLoc for the last character of the last operand.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D92960
2020-12-09 21:09:47 +00:00
Austin Kerbow 4aa842a800 [AMDGPU] Add new pseudos for indirect addressing with VGPR Indexing
It is possible for copies or spills to be inserted in the middle of indirect
addressing sequences which use VGPR indexing. Spills to accvgprs could be
effected by the indexing mode.

Add new pseudo instructions that are expanded after register allocation to avoid
the problematic spill or copy placement.

Differential Revision: https://reviews.llvm.org/D91048
2020-12-08 12:24:12 -08:00
Stanislav Mekhanoshin dd89249498 [AMDGPU] Annotate vgpr<->agpr spills in asm
Differential Revision: https://reviews.llvm.org/D92125
2020-12-07 11:25:25 -08:00
Petar Avramovic 3a042dcd2e [AMDGPU] Fix default value of glc for mubuf rtn atomics
Mubuf rtn atomics use GLC_1 thus default value for glc operand
should be -1, see https://reviews.llvm.org/D90730.
This allows us to report error when rtn atomic requires glc=1
but does not have glc operand in input.

Differential Revision: https://reviews.llvm.org/D92654
2020-12-07 14:00:08 +01:00
Dmitry Preobrazhensky a0b3a9391c [AMDGPU][MC] Improved diagnostics message for sym/expr operands
See bug 48295 (https://bugs.llvm.org/show_bug.cgi?id=48295)

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D92088
2020-12-05 14:05:53 +03:00
Dmitry Preobrazhensky e97dd11977 [AMDGPU][MC] Corrected error position for invalid MOVREL src
See bug 47518 (https://bugs.llvm.org/show_bug.cgi?id=47518)

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D92084
2020-12-05 13:23:14 +03:00
Kazu Hirata 2dc4a14e4d [AMDGPU] Use llvm::is_contained (NFC) 2020-12-04 21:42:55 -08:00
Paul C. Anagnostopoulos 415fab6f67 [TableGen] Eliminate the 'code' type
Update the documentation.

Rework various backends that relied on the code type.

Differential Revision: https://reviews.llvm.org/D92269
2020-12-03 10:19:11 -05:00
Mircea Trofin bab72dd5d5 [NFC][MC] TargetRegisterInfo::getSubReg is a MCRegister.
Typing the API appropriately.

Differential Revision: https://reviews.llvm.org/D92341
2020-12-02 15:46:38 -08:00
Jay Foad d28624a209 [AMDGPU] Stop adding an implicit def of vcc_hi for wave32
This doesn't seem to be needed for anything.

Differential Revision: https://reviews.llvm.org/D92400
2020-12-02 10:11:42 +00:00
Caroline Concatto 4b0ef2b075 [NFC][CostModel]Extend class IntrinsicCostAttributes to use ElementCount Type
This patch replaces the attribute  `unsigned VF`  in the class
IntrinsicCostAttributes by `ElementCount VF`.
This is a non-functional change to help upcoming patches to compute the cost
model for scalable vector inside this class.

Differential Revision: https://reviews.llvm.org/D91532
2020-12-01 11:12:51 +00:00
Jay Foad 839c9635ed [AMDGPU] Simplify some generation checks. NFC. 2020-12-01 10:15:32 +00:00
Nikita Popov 4df8efce80 [AA] Split up LocationSize::unknown()
Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).

This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.

The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.

Differential Revision: https://reviews.llvm.org/D91649
2020-11-26 18:39:55 +01:00
Jay Foad 4f87d30a06 [AMDGPU] Introduce and use isGFX10Plus. NFC.
It's more future-proof to use isGFX10Plus from the start, on the
assumption that future architectures will be based on current
architectures.

Also make use of the existing isGFX9Plus in a few places.

Differential Revision: https://reviews.llvm.org/D92092
2020-11-26 09:02:36 +00:00
Sebastian Neubauer edd675643d [AMDGPU] Emit stack frame size in metadata
Add .shader_functions to pal metadata, which contains the stack frame
size for all non-entry-point functions.

Differential Revision: https://reviews.llvm.org/D90036
2020-11-25 16:30:02 +01:00
Jay Foad 4926eed59c [AMDGPU] Add a TRANS bit to TSFlags. NFC.
This is used to mark transcendental instructions that execute on a
separate pipeline from the normal VALU pipeline.

Differential Revision: https://reviews.llvm.org/D92042
2020-11-24 17:49:56 +00:00
Jay Foad 000400ca0a Fix speling in comments. NFC. 2020-11-23 14:43:24 +00:00
Dmitry Preobrazhensky ce44bf2cf2 [AMDGPU][MC] Improved diagnostic messages
See bug 47518 (https://bugs.llvm.org/show_bug.cgi?id=47518)

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D91794
2020-11-23 16:15:05 +03:00
Dmitry Preobrazhensky e4effef330 [AMDGPU][MC] Improved diagnostic messages for invalid literals
See bug 47518 (https://bugs.llvm.org/show_bug.cgi?id=47518)

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D91793
2020-11-23 15:48:06 +03:00
Matt Arsenault 79f75468b4 AMDGPU: Fix counting kernel arguments towards register usage
Also use DataLayout to get type size. Relying on the IR type size is
also pretty broken here, since this won't perfectly capture how types
are legalized.
2020-11-20 21:23:33 -05:00
Alex Richardson 51e09e1d5a [AMDGPU] Set the default globals address space to 1
This will ensure that passes that add new global variables will create them
in address space 1 once the passes have been updated to no longer default
to the implicit address space zero.
This also changes AutoUpgrade.cpp to add -G1 to the DataLayout if it wasn't
already to present to ensure bitcode backwards compatibility.

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D84345
2020-11-20 15:46:53 +00:00
Sebastian Neubauer 7a18bdb350 [AMDGPU] Implement flat scratch init for pal
Extract the scratch offset from the scratch buffer descriptor that is
stored in the global table.

Differential Revision: https://reviews.llvm.org/D91701
2020-11-20 11:14:30 +01:00
Nikita Popov 393b9e9db3 [MemLoc] Require LocationSize argument (NFC)
When constructing a MemoryLocation by hand, require that a
LocationSize is explicitly specified. D91649 will split up
LocationSize::unknown() into two different states, and callers
should make an explicit choice regarding the kind of MemoryLocation
they want to have.
2020-11-19 21:45:52 +01:00
Duncan P. N. Exon Smith 5abf76fbe3 ADT: Add assertions to SmallVector::insert, etc., for reference invalidation
2c196bbc6b asserted that
`SmallVector::push_back` doesn't invalidate the parameter when it needs
to grow. Do the same for `resize`, `append`, `assign`, `insert`, and
`emplace_back`.

Differential Revision: https://reviews.llvm.org/D91744
2020-11-18 17:36:28 -08:00
Sebastian Neubauer 72ccec1bbc [AMDGPU] Fix v3f16 interaction with image store workaround
In some cases, the wrong amount of registers was reserved.

Also enable more v3f16 tests.

Differential Revision: https://reviews.llvm.org/D90847
2020-11-18 18:21:04 +01:00
Jay Foad 7ecf19697e [AMDGPU] Fix and extend vccz workarounds
We have workarounds for two different cases where vccz can get out of
sync with the value in vcc. This fixes them in two ways:

1. Fix the case where the def of vcc was in a previous basic block, by
pessimistically assuming that vccz might be incorrect at a basic block
boundary.

2. Fix the handling of pre-existing waitcnt instructions by calling
generateWaitcntInstBefore before examining ScoreBrackets to determine
whether there's an outstanding smem read operation.

Differential Revision: https://reviews.llvm.org/D91636
2020-11-18 15:26:06 +00:00
Jay Foad 9f69c1bc54 [AMDGPU] Rename pseudo S_WAITCNT_IDLE to S_WAIT_IDLE. NFC. 2020-11-18 14:03:43 +00:00
Florian Hahn b2f4c5fddc
[AsmWriter] Factor out mnemonic generation to accessible getMnemonic.
This patch factors out the part of printInstruction that gets the
mnemonic string for a given MCInst. This is intended to be used
subsequently for the instruction-mix remarks to display the final
mnemonic (D90040).

Unfortunately making `getMnemonic` available to the AsmPrinter
seems to require making it virtual. Not sure if there's a way around
that with the current layering of the AsmPrinters.

Reviewed By: Paul-C-Anagnostopoulos

Differential Revision: https://reviews.llvm.org/D90039
2020-11-17 09:47:38 +00:00
Michael Liao f375885ab8 [InferAddrSpace] Teach to handle assumed address space.
- In certain cases, a generic pointer could be assumed as a pointer to
  the global memory space or other spaces. With a dedicated target hook
  to query that address space from a given value, infer-address-space
  pass could infer and propagate that to all its users.

Differential Revision: https://reviews.llvm.org/D91121
2020-11-16 17:06:33 -05:00
Matt Arsenault d2e52eec51 AMDGPU: Select global saddr mode from SGPR pointer
Use the 64-bit SGPR base with a 0 offset, since it's 1 fewer
instruction to materialize the 0 vs. the 64-bit copy.
2020-11-16 11:51:06 -05:00
Matt Arsenault a6e353b1d0 AMDGPU: Split large offsets when selecting global saddr mode
When the offset doesn't fit in the immediate field, move some to
voffset.
2020-11-16 11:36:01 -05:00
Jay Foad a6ecb2eb3d [AMDGPU] Add comments. NFC. 2020-11-16 16:34:13 +00:00
Dmitry Preobrazhensky 65f3e121fe [AMDGPU][MC] Corrected error position for some operands and modifiers
Partially fixes bug 47518 (https://bugs.llvm.org/show_bug.cgi?id=47518)

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D91412
2020-11-16 16:11:23 +03:00
Dmitry Preobrazhensky 0bee8c784b [AMDGPU][MC] Corrected error position for swizzle()
Partially fixes bug 47518 (https://bugs.llvm.org/show_bug.cgi?id=47518)

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D91408
2020-11-16 14:37:57 +03:00
Dmitry Preobrazhensky 89df8fc0d7 [AMDGPU][MC] Corrected error position for hwreg() and sendmsg()
Partially fixes bug 47518 (https://bugs.llvm.org/show_bug.cgi?id=47518)

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D91407
2020-11-16 14:25:07 +03:00
Stanislav Mekhanoshin c9821cec74 [AMDGPU] Mark sin/cos load folding as modifying the function.
When the load value is folded into the sin/cos operation, the
AMDGPU library calls simplifier could still mark the function
as unmodified. Instead ensure if there is an early return,
return whether the load was folded into the sin/cos call.

Authored by MJDSys

Differential Revision: https://reviews.llvm.org/D91401
2020-11-13 14:49:33 -08:00
Jessica Paquette b184a2eccf [GlobalISel] Add matchers for specific constants and a matcher for negations
It's fairly common to need matchers for a specific constant value, or for
common idioms like finding a negated register.

Add

- `m_SpecificICst`, which returns true when matching a specific value..
- `m_ZeroInt`, which returns true when an integer 0 is matched.
- `m_Neg`, which returns when a register is negated.

Also update a few places which use idioms related to the new matchers.

Differential Revision: https://reviews.llvm.org/D91397
2020-11-13 09:24:54 -08:00
Matt Arsenault e722943e05 AMDGPU: Factor out large flat offset splitting 2020-11-13 11:22:13 -05:00
Matt Arsenault 0fd6a04ba4 AMDGPU: Refactor getBaseWithOffsetUsingSplitOR usage 2020-11-13 10:58:17 -05:00
Simon Pilgrim a4d3691d55 Fix MSVC signed/unsigned comparison warning. NFCI. 2020-11-13 10:20:48 +00:00
Jay Foad ad3ec08955 [AMDGPU] One more use of the new export target names. NFC. 2020-11-13 09:44:09 +00:00
serge-sans-paille 9218ff50f9 llvmbuildectomy - replace llvm-build by plain cmake
No longer rely on an external tool to build the llvm component layout.

Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.

These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.

Differential Revision: https://reviews.llvm.org/D90848
2020-11-13 10:35:24 +01:00
Stanislav Mekhanoshin 5ab1702129 [AMDGPU] Remove scratch rsrc from spill pseudos
Differential Revision: https://reviews.llvm.org/D91110
2020-11-12 15:23:37 -08:00
Stanislav Mekhanoshin cf6565f6d0 [AMDGPU] Enable multi-dword flat scratch load/stores
Differential Revision: https://reviews.llvm.org/D91384
2020-11-12 13:38:56 -08:00
Jay Foad 6881a82e8c [AMDGPU] Fix scheduling of exp pos4
Also fix a similar issue in SIInsertWaitcnts, but I don't think that fix
has any effect in practice.

Differential Revision: https://reviews.llvm.org/D91290
2020-11-12 19:57:14 +00:00
Jay Foad d7d6ac5624 [AMDGPU] Define and use names for export targets. NFC.
Differential Revision: https://reviews.llvm.org/D91289
2020-11-12 19:57:14 +00:00
Jay Foad f23c4c6f8a [AMDGPU] Separate out real exp instructions by subtarget. NFC.
Differential Revision: https://reviews.llvm.org/D91247
2020-11-11 17:13:40 +00:00
Jay Foad 2b33ea6935 [AMDGPU] Split exp instructions out into their own tablegen file. NFC.
Differential Revision: https://reviews.llvm.org/D91246
2020-11-11 17:13:40 +00:00
Jay Foad f94fd1c8ca [AMDGPU] Make use of SIInstrInfo::isEXP. NFC. 2020-11-11 17:01:20 +00:00
Jay Foad 830ed64ccd Revert "Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access""
This reverts commit 8b08fa0103.

The underlying problems were fixed by D90607.
2020-11-11 14:40:14 +00:00
Stanislav Mekhanoshin 544ef42e40 [AMDGPU] Set default op_sel_hi on accvgpr read/write
These are opsel opcodes with op_sel actually being ignored.
As a such op_sel_hi needs to be set to default 1 even though
these bits are ignored. This is compatibility change.

Differential Revision: https://reviews.llvm.org/D91202
2020-11-10 13:07:29 -08:00
Jay Foad bb8d1437a6 [AMDGPU] Simplify multiclass EXP_m. NFC. 2020-11-10 17:28:36 +00:00
Jay Foad 0ad4d04002 [AMDGPU] Remove an unused return value. NFC.
Differential Revision: https://reviews.llvm.org/D91063
2020-11-10 09:15:14 +00:00
Carl Ritson fde8351743 [AMDGPU] Fix lowering of S_MOV_{B32,B64}_term
If the source of S_MOV_{B32,B64}_term is an immediate then it
cannot be lowered to a COPY.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D90451
2020-11-10 12:16:31 +09:00
Mircea Trofin 2ac3a7d0c4 [NFC] Use [MC]Register
Differential Revision: https://reviews.llvm.org/D90795
2020-11-09 08:37:14 -08:00
Stanislav Mekhanoshin d5a465866e [AMDGPU] Omit buffer resource with flat scratch.
Differential Revision: https://reviews.llvm.org/D90979
2020-11-09 08:05:20 -08:00
Paul C. Anagnostopoulos 91d2e5c81a [TableGen] Add the !filter bang operator.
Add a test. Update the Programmer's Reference.

Use it in some TableGen files.

Differential Revision: https://reviews.llvm.org/D91008
2020-11-09 10:56:55 -05:00