Commit Graph

6124 Commits

Author SHA1 Message Date
Dmitry Preobrazhensky 586df38478 [AMDGPU][MC] Corrected parsing of optional modifiers
Fixed bugs in parsing of "no*" modifiers and improved errors handling.
See https://bugs.llvm.org/show_bug.cgi?id=41282.

Differential Revision: https://reviews.llvm.org/D95675
2021-02-02 14:52:29 +03:00
Benjamin Kramer 679ef22f2e Fold one-use variable into assert. NFCI.
Avoids a warning in Release builds.
2021-02-02 10:50:48 +01:00
Sebastian Neubauer b91afa474e [AMDGPU] Mark epilog restores as frame-destroy
I guess instructions were marked as frame-setup by accident, they are
restores as part of the epilog.

Differential Revision: https://reviews.llvm.org/D95783
2021-02-02 10:24:37 +01:00
Thomas Symalla faeed774d1 Fixed includes.
Differential Revision: https://reviews.llvm.org/D93708
2021-02-02 09:14:54 +01:00
Thomas Symalla 6c85e98f06 Fixed includes. 2021-02-02 09:14:54 +01:00
Thomas Symalla 09508d2849 Reverted whitespace changes.
Differential Revision: https://reviews.llvm.org/D90968
2021-02-02 09:14:54 +01:00
Thomas Symalla e630dd476c Added missing includes. 2021-02-02 09:14:54 +01:00
Thomas Symalla 602896b9d2 Renamed med3 opcode, removed superfluous copy. 2021-02-02 09:14:54 +01:00
Thomas Symalla fa3e840d3d Removed the generic virtual register creations. Reworked the tests. 2021-02-02 09:14:54 +01:00
Thomas Symalla c781c25412 Implemented a MED3_S32 GIR opcode. 2021-02-02 09:14:53 +01:00
Thomas Symalla 6604d81e1b Added and used new target pseudo for v_cvt_pk_i16_i32, changes due to code review. 2021-02-02 09:14:53 +01:00
Thomas Symalla 52bfb50145 Formatting changes 2021-02-02 09:14:53 +01:00
Thomas Symalla 7d24026ed2 Formatting changes. 2021-02-02 09:14:53 +01:00
Thomas Symalla bcd6c2d203 Updating formatting changes. 2021-02-02 09:14:53 +01:00
Thomas Symalla ecbed4e0ab Resolve formatting changes. 2021-02-02 09:14:53 +01:00
Thomas Symalla 7b2e701906 Code changes yielded from review. 2021-02-02 09:14:53 +01:00
Thomas Symalla 3a46502264 Move step to PreLegalizer 2021-02-02 09:14:53 +01:00
Thomas Symalla cdfd9b3bf5 Move Combiner to PreLegalize step 2021-02-02 09:14:53 +01:00
Thomas Symalla 9a8da909f1 Reverted unintended git-format change. 2021-02-02 09:14:52 +01:00
Thomas Symalla dae85e4671 Fixed the lit tests and a bug in the implementation. 2021-02-02 09:14:52 +01:00
Thomas Symalla 88a832aef1 Refactored the pattern matching. 2021-02-02 09:14:52 +01:00
Thomas Symalla fce3230be2 Added early exit. 2021-02-02 09:14:52 +01:00
Thomas Symalla d722924f20 Added comments. 2021-02-02 09:14:52 +01:00
Thomas Symalla ec043967ec clang-format 2021-02-02 09:14:52 +01:00
Thomas Symalla 62af0305b7 Added clamp i64 to i16 global isel pattern. 2021-02-02 09:14:52 +01:00
Matt Arsenault 41877b82f0 AMDGPU: Fix dbg_value handling when forming soft clause bundles
DBG_VALUES placed between memory instructions would change
codegen. Skip over these and re-insert them after the bundle instead
of giving up on bundling.
2021-02-01 22:16:35 -05:00
Austin Kerbow e068e236c3 [AMDGPU] Fix release build after 0397dca0. 2021-02-01 08:55:14 -08:00
Austin Kerbow 0397dca021 [AMDGPU] Fix crash with sgpr spills to vgpr disabled
This would assert with amdgpu-spill-sgpr-to-vgpr disabled when trying to
spill the FP.

Fixes: SWDEV-262704

Reviewed By: RamNalamothu

Differential Revision: https://reviews.llvm.org/D95768
2021-02-01 08:35:25 -08:00
Dmitry Preobrazhensky 99b5631649 [AMDGPU][MC] Corrected error position for invalid operands
Generic parser may report an incorrect error position when an offending operand is followed by a comma.
See bug 48884 for details: https://bugs.llvm.org/show_bug.cgi?id=48884.

Differential Revision: https://reviews.llvm.org/D95674
2021-02-01 14:31:08 +03:00
Matt Arsenault 8f14a08863 AMDGPU: Add missing consts 2021-01-31 10:47:57 -05:00
Kazu Hirata 627b5bda11 [llvm] Add missing header guards (NFC)
Identified with llvm-header-guard.
2021-01-30 09:53:42 -08:00
Kazu Hirata b4e780697d [AMDGPU] Forward-declare AMDGPUTargetMachine (NFC)
AMDGPUTargetTransformInfo.h needs AMDGPUTargetMachine but relies on a
forward declaration of AMDGPUTargetMachine in AMDGPU.h.  This patch
adds a forward declaration right in AMDGPUTargetTransformInfo.h.

While we are at it, this patch removes the one in
AMDGPU.h, where it is unnecessary.
2021-01-30 09:53:40 -08:00
Stanislav Mekhanoshin 9dbe736cbd [AMDGPU] Be more specific in needsFrameBaseReg
A condition "mayLoadOrStore" is too broad for that function.

Differential Revision: https://reviews.llvm.org/D95700
2021-01-29 14:40:25 -08:00
Kazu Hirata 7925aa091d [llvm] Populate SmallVector at construction time (NFC) 2021-01-28 22:21:14 -08:00
Kazu Hirata 046cfb8565 [llvm] Forward-declare formatted_raw_ostream (NFC)
Various *TargetStreamer.h need formatted_raw_ostream but rely on a
forward declaration of formatted_raw_ostream in MCStreamer.h.  This
patch adds forward declarations right in *TargetStreamer.h.

While we are at it, this patch removes the one in MCStreamer.h, where
it is unnecessary.
2021-01-28 22:21:13 -08:00
Carl Ritson 0824694d68 [AMDGPU] Fix WMM Entry SCC preservation
SCC was not correctly preserved when entering WWM.
Current lit test was unable to detect this as entry block is
handled differently.
Additionally fix an issue where SCC was unnecessarily preserved
when exiting from WWM to Exact mode.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D95500
2021-01-29 10:05:36 +09:00
Carl Ritson 0e8f50595e [AMDGPU] Mark V_SET_INACTIVE as defining SCC
V_SET_INACTIVE is implemented with S_NOT which clobbers SCC.
Mark sure it is marked appropriately.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D95509
2021-01-29 09:46:41 +09:00
Simon Pilgrim 0805e40a94 AMDGPUPrintfRuntimeBinding - don't dereference a dyn_cast<> pointer. NFCI.
We dereference the dyn_cast<> in all paths - use cast<> to silence the clang static analyzer warning.
2021-01-28 12:38:44 +00:00
Mirko Brkusanin 3c979ae9ec [AMDGPU][GlobalISel] Remove redundant cmp when copying constant to vcc
Differential Revision: https://reviews.llvm.org/D95540
2021-01-28 11:20:09 +01:00
Mirko Brkusanin 4b422708ba [AMDGPU][GlobalISel] Handle G_PTR_ADD when looking for constant offset
Look throught G_PTRTOINT and G_PTR_ADD nodes when looking for constant
offset for buffer stores. This also helps with merging of these instructions
later on.

Differential Revision: https://reviews.llvm.org/D95242
2021-01-28 11:20:09 +01:00
Piotr Sobczak fc8e741121 [AMDGPU] Avoid an illegal operand in si-shrink-instructions
Before the patch it was possible to trigger a constant bus
violation when folding immediates into a shrunk instruction.

The patch adds a check to enforce the legality of the new operand.

Differential Revision: https://reviews.llvm.org/D95527
2021-01-28 08:49:21 +01:00
Stanislav Mekhanoshin d91ee2f782 [AMDGPU] Do not reassign spilled registers
We cannot call LRM::unassign() if LRM::assign() was never called
before, these are symmetrical calls. There are two ways of
assigning a physical register to virtual, via LRM::assign() and
via VRM::assignVirt2Phys(). LRM::assign() will call the VRM to
assign the register and then update LiveIntervalUnion. Inline
spiller calls VRM directly and thus LiveIntervalUnion never gets
updated. A call to LRM::unassign() then asserts about inconsistent
liveness.

We have to note that not all callers of the InlineSpiller even
have LRM to pass, RegAllocPBQP does not have it, so we cannot
always pass LRM into the spiller.

The only way to get into that spiller LRE_DidCloneVirtReg() call
is from LiveRangeEdit::eliminateDeadDefs if we split an LI.

This patch refuses to reassign a LiveInterval created by a split
to workaround the problem. In fact we cannot reassign a spill
anyway as all registers of the needed class are occupied and we
are spilling.

Fixes: SWDEV-267996

Differential Revision: https://reviews.llvm.org/D95489
2021-01-27 16:29:05 -08:00
Kazu Hirata 6bde085366 [AMDGPU] Forward-declare TargetRegisterClass (NFC)
AMDGPUInstructionSelector.h needs TargetRegisterClass but relies on a
forward declaration of TargetRegisterClass in InstructionSelector.h.
This patch adds a forward declaration right in
AMDGPUInstructionSelector.h.

While we are at it, this patch removes the one in
InstructionSelector.h, where it is unnecessary.
2021-01-26 20:00:16 -08:00
Austin Kerbow 2291bd137d [AMDGPU] Update subtarget features for new target ID support
Support for XNACK and SRAMECC is not static on some GPUs. We must be able
to differentiate between different scenarios for these dynamic subtarget
features.

The possible settings are:

- Unsupported: The GPU has no support for XNACK/SRAMECC.
- Any: Preference is unspecified. Use conservative settings that can run anywhere.
- Off: Request support for XNACK/SRAMECC Off
- On: Request support for XNACK/SRAMECC On

GCNSubtarget will track the four options based on the following criteria. If
the subtarget does not support XNACK/SRAMECC we say the setting is
"Unsupported". If no subtarget features for XNACK/SRAMECC are requested we
must support "Any" mode. If the subtarget features XNACK/SRAMECC exist in the
feature string when initializing the subtarget, the settings are "On/Off".

The defaults are updated to be conservatively correct, meaning if no setting
for XNACK or SRAMECC is explicitly requested, defaults will be used which
generate code that can be run anywhere. This corresponds to the "Any" setting.

Differential Revision: https://reviews.llvm.org/D85882
2021-01-26 11:25:51 -08:00
Matt Arsenault 5f9707b796 AMDGPU: Fix redundant FP spilling/assert in some functions
If a function has stack objects, and a call, we require an FP. If we
did not initially have any stack objects, and only introduced them
during PrologEpilogInserter for CSR VGPR spills, SILowerSGPRSpills
would end up spilling the FP register as if it were a normal
register. This would result in an assert in a debug build, or
redundant handling of the FP register in a release build.

Try to predict that we will have an FP later, although this is ugly.
2021-01-26 13:01:45 -05:00
Matt Arsenault 92d1195b5f AMDGPU: Add assertion to determineCalleeSaves
Make sure this isn't getting called multiple times. I was surprised we
were modifying the function here, which I think is a bit questionable.
2021-01-26 13:01:45 -05:00
Simon Pilgrim f82cff31d3 [AMDGPU] HSAMD::fromString - replace std::string arg with StringRef. NFCI.
Removes an unnecessary chain of StringRef -> std::string -> StringRef conversions
2021-01-26 16:09:39 +00:00
Simon Pilgrim ee3da8958a [AMDGPU] Fix null-dereference static analysis warnings. NFCI.
Avoid repeated calls to isZeroValue() and check for a null pointer before dereferencing a dyn_cast<>.
2021-01-26 15:43:59 +00:00
Matt Arsenault 551a69e418 AMDGPU: Clear IsSSA property in SIFormMemoryClauses
Fixes verifier error when writing MIR testcases
2021-01-26 10:40:41 -05:00
Mirko Brkusanin 608ac62540 [AMDGPU] Fix use of HasModifiers in VopProfile
HasModifiers should be true if at least one modifier is used.
This should make the use of this field bit more consistent.

Differential Revision: https://reviews.llvm.org/D94795
2021-01-26 15:21:11 +01:00
Dmitry Preobrazhensky 745064e36b [AMDGPU][MC] Refactored exp tgt handling
Summary:
- Separated tgt encoding from parsing;
- Separated tgt decoding from printing;
- Improved errors handling;
- Disabled leading zeroes in index. The following code is no longer accepted: exp pos00 v3, v2, v1, v0

Reviewers: arsenm, rampitec, foad

Differential Revision: https://reviews.llvm.org/D95216
2021-01-26 14:54:15 +03:00
Kazu Hirata c85b6bf33c [AMDGPU] Forward-declare MachineIRBuilder (NFC)
AMDGPULegalizerInfo.h needs MachineIRBuilder but relies on a forward
declaration of MachineIRBuilder in LegalizerInfo.h.  This patch adds a
forward declaration right in AMDGPULegalizerInfo.h.

While we are at it, this patch removes the one in LegalizerInfo.h,
where it is unnecessary.
2021-01-25 19:24:01 -08:00
Changpeng Fang 5b648df1a8 AMDGPU: Reduce the number of expensive calls in SIFormMemoryClause
Summary:
  RPTracker::reset(MI) is a very expensive call when the number of virtual registers is huge.
We observed a long compilation time issue when RPT::reset() is called once for each cluster.

In this work, we call RPT.reset() only at the first seen cluster, and use advance() to get
the register pressure for the later clusters in the same basic block. This could effectively reduce the number
of the expensive calls and thus reduce the compile time.

Reviewers:
  rampitec

Fixes:
  SWDEV-239161

Differential Revision:
  https://reviews.llvm.org/D95273
2021-01-25 16:08:08 -08:00
Konstantin Zhuravlyov 2cdb34efda Revert "[IndirectFunctions] Skip propagating attributes to address taken functions"
This reverts commit dd8ae42674.

This commit causes infinite loop when compiling rocThrust and hipCUB.

Differential Revision: https://reviews.llvm.org/D95389
2021-01-25 15:58:06 -05:00
Dmitry Preobrazhensky 558b3bbb5b [AMDGPU][MC] Improved errors handling for SDWA operands
Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D95212
2021-01-25 19:02:53 +03:00
Carl Ritson a80ebd0179 [AMDGPU] Fix llvm.amdgcn.init.exec and frame materialization
Frame-base materialization may insert vector instructions before EXEC is initialised.
Fix this by moving lowering of llvm.amdgcn.init.exec later in backend.
Also remove SI_INIT_EXEC_LO pseudo as this is not necessary.

Reviewed By: ruiling

Differential Revision: https://reviews.llvm.org/D94645
2021-01-25 08:31:17 +09:00
Kazu Hirata 054444177b [Target] Use llvm::append_range (NFC) 2021-01-24 12:18:56 -08:00
Kazu Hirata e4847a7fcf Revert "[Target] Use llvm::append_range (NFC)"
This reverts commit cc7a238286.

The X86WinEHState.cpp hunk seems to break certain builds.
2021-01-23 11:25:27 -08:00
Kazu Hirata cc7a238286 [Target] Use llvm::append_range (NFC) 2021-01-23 10:56:31 -08:00
Kazu Hirata 49231c1f80 [llvm] Use static_assert instead of assert (NFC)
Identified with misc-static-assert.
2021-01-22 23:25:05 -08:00
Stanislav Mekhanoshin ca904b81e6 [AMDGPU] Fix FP materialization/resolve with flat scratch
Differential Revision: https://reviews.llvm.org/D95266
2021-01-22 16:06:47 -08:00
Stanislav Mekhanoshin 607bec0bb9 Change materializeFrameBaseRegister() to return register
The only caller of this function is in the LocalStackSlotAllocation
and it creates base register of class returned by the target's
getPointerRegClass(). AMDGPU wants to use a different reg class
here so let materializeFrameBaseRegister to just create and return
whatever it wants.

Differential Revision: https://reviews.llvm.org/D95268
2021-01-22 15:51:06 -08:00
Arthur Eubanks 42d682a217 [NewPM][AMDGPU] Skip adding CGSCCOptimizerLate callbacks at O0
The legacy PM's EP_CGSCCOptimizerLate was only used under not-O0.

Fixes clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp under the new PM.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D95250
2021-01-22 12:29:39 -08:00
Sebastian Neubauer 8214982b50 [AMDGPU] Implement mir parseCustomPseudoSourceValue
Allow parsing generated mir with custom pseudo source value tokens.
Also rename pseudo source values to have more meaningful names.

Relands ba7dcd8542, which had memory leaks.

Differential Revision: https://reviews.llvm.org/D95215
2021-01-22 11:24:08 +01:00
Christudasan Devadasan ff8a1cae18 [AMDGPU] Fix the inconsistency in soffset for MUBUF stack accesses.
During instruction selection, there is an inconsistency in choosing
the initial soffset value. With certain early passes, this value is
getting modified and that brought additional fixup during
eliminateFrameIndex to work for all cases. This whole transformation
looks trivial and can be handled better.

This patch clearly defines the initial value for soffset and keeps it
unchanged before eliminateFrameIndex. The initial value must be zero
for MUBUF with a frame index. The non-frame index MUBUF forms that
use a raw offset from SP will have the stack register for soffset.
During frame elimination, the soffset remains zero for entry functions
with zero dynamic allocas and no callsites, or else is updated to the
appropriate frame/stack register.

Also, did some code clean up and made all asserts around soffset
stricter to match.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D95071
2021-01-22 14:20:59 +05:30
Arthur Eubanks a11bf9a7fb [AMDGPU][Inliner] Remove amdgpu-inline and add a new TTI inline hook
Having a custom inliner doesn't really fit in with the new PM's
pipeline. It's also extra technical debt.

amdgpu-inline only does a couple of custom things compared to the normal
inliner:
1) It disables inlining if the number of BBs in a function would exceed
   some limit
2) It increases the threshold if there are pointers to private arrays(?)

These can all be handled as TTI inliner hooks.
There already exists a hook for backends to multiply the inlining
threshold.

This way we can remove the custom amdgpu-inline pass.

This caused inline-hint.ll to fail, and after some investigation, it
looks like getInliningThresholdMultiplier() was previously getting
applied twice in amdgpu-inline (https://reviews.llvm.org/D62707 fixed it
not applying at all, so some later inliner change must have fixed
something), so I had to change the threshold in the test.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D94153
2021-01-21 20:29:17 -08:00
Sebastian Neubauer 4dbdff66fe Revert "[AMDGPU] Implement mir parseCustomPseudoSourceValue"
This reverts commit ba7dcd8542.

(caused memory leaks)
2021-01-21 18:11:48 +01:00
Jay Foad c0b3c5a064 [AMDGPU][GlobalISel] Run SIAddImgInit
This pass is required to get correct codegen for image instructions with
the tfe or lwe bits set.

Differential Revision: https://reviews.llvm.org/D95132
2021-01-21 15:54:54 +00:00
Matt Arsenault 94375d1083 AMDGPU: Remove v_rsq_f64 patterns
This isn't accurate enough without correction
2021-01-21 10:51:36 -05:00
Matt Arsenault 2a0db8d70e AMDGPU: Use more accurate fast f64 fdiv
A raw v_rcp_f64 isn't accurate enough, so start applying correction.
2021-01-21 10:51:36 -05:00
Sebastian Neubauer ba7dcd8542 [AMDGPU] Implement mir parseCustomPseudoSourceValue
Allow parsing generated mir with custom pseudo source value tokens.
Also rename pseudo source values to have more meaningful names.

Differential Revision: https://reviews.llvm.org/D94768
2021-01-21 16:32:17 +01:00
Matt Arsenault 20566a2ed8 AMDGPU: Add occupancy to serialized MachineFunctionInfo
Not sure about the default value handling, but also not sure
defaulting to a theoretically subtarget dependent value.
2021-01-21 09:21:00 -05:00
madhur13490 dd8ae42674 [IndirectFunctions] Skip propagating attributes to address taken functions
In case of indirect calls or address taken functions,
skip propagating any attributes to them. We just
propagate features to such functions.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D94585
2021-01-21 07:04:28 +00:00
dfukalov 560d7e0411 [NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036
2021-01-20 22:22:45 +03:00
Mirko Brkusanin a6a72dfdf2 [AMDGPU][GlobalISel] Avoid selecting S_PACK with constants
If constants are hidden behind G_ANYEXT we can treat them same way as G_SEXT.
For that purpose we extend getConstantVRegValWithLookThrough with option
to handle G_ANYEXT same way as G_SEXT.

Differential Revision: https://reviews.llvm.org/D92219
2021-01-20 11:54:53 +01:00
Petar Avramovic 4ab704d628 [AMDGPU][MC] Add tfe disassembler support MIMG opcodes
With tfe on there can be a vgpr write to vdata+1.
Add tablegen support for 5 register vdata store.
This is required for 4 register vdata store with tfe.

Differential Revision: https://reviews.llvm.org/D94960
2021-01-20 10:37:09 +01:00
Jay Foad 18cb7441b6 [AMDGPU] Simpler names for arch-specific ttmp registers. NFC.
Rename the *_gfx9_gfx10 ttmp registers to *_gfx9plus for simplicity,
and use the corresponding isGFX9Plus predicate to decide when to use
them instead of the old *_vi versions.

Differential Revision: https://reviews.llvm.org/D94975
2021-01-19 18:47:14 +00:00
Jay Foad 49dce85584 [AMDGPU] Simplify AMDGPUInstPrinter::printExpSrcN. NFC.
Change-Id: Idd7f47647bc0faa3ad6f61f44728c0f20540ec00
2021-01-19 10:39:56 +00:00
Dmitry Preobrazhensky 30b8f55378 Fix for sanitizer issue in 55c557a 2021-01-18 18:39:55 +03:00
Dmitry Preobrazhensky 55c557a5d2 [AMDGPU][MC] Refactored parsing of dpp ctrl
Summary of changes:
- simplified code to improve maintainability;
- replaced lex() with higher level parser functions;
- improved errors handling.

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D94777
2021-01-18 18:14:19 +03:00
Dmitry Preobrazhensky 911961c9c1 [AMDGPU][MC][GFX10] Improved dpp8 errors handling
Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D94756
2021-01-18 15:02:31 +03:00
Kazu Hirata 352fcfc697 [llvm] Use llvm::sort (NFC) 2021-01-17 10:39:45 -08:00
Kazu Hirata 2082b10d10 [llvm] Use *::empty (NFC) 2021-01-16 09:40:55 -08:00
Kazu Hirata 19aacdb715 [llvm] Construct SmallVector with iterator ranges (NFC) 2021-01-16 09:40:53 -08:00
Kazu Hirata 4707b21298 [AMDGPU] Use llvm::is_contained (NFC) 2021-01-15 21:00:54 -08:00
Kazu Hirata 7dc3575ef2 [llvm] Remove redundant return and continue statements (NFC)
Identified with readability-redundant-control-flow.
2021-01-14 20:30:34 -08:00
Matt Arsenault d55d592a92 GlobalISel: Do not set observer of MachineIRBuilder in LegalizerHelper
This fixes double printing of insertion debug messages in the
legalizer.

Try to cleanup usage of observers. Currently the use of observers is
pretty hard to follow and it's not clear what is responsible for
them. Observers are referenced in 3 places:

1. In the MachineFunction
2. In the MachineIRBuilder
3. In the LegalizerHelper

The observers in the MachineFunction and MachineIRBuilder are both
called only on insertions, and are redundant with each other. The
source of the double printing was the same observer was added to both
the MachineFunction, and the MachineIRBuilder. One of these references
needs to be removed. Arguably observers in general should be fully
removed from one or the other, but it may be useful to have a local
observer in the MachineIRBuilder that is not added to the function's
observers. Alternatively, the wrapper observer could manage a local
observer in one place.

The LegalizerHelper only ever calls the observer on changing/changed
instructions, and never insertions. Logically these are two different
types of observers, for changes and for insertions.

Additionally, some places used the GISelObserverWrapper when they only
needed a single observer they could use directly.

Setting the observer in the LegalizerHelper constructor is not
flexible enough if the LegalizerHelper is constructed anywhere outside
the one used by the legalizer. AMDGPU calls the LegalizerHelper in
RegBankSelect, and needs to use a local observer to apply the regbank
to newly created instructions. Currently it accomplishes this by
constructing a local MachineIRBuilder. I'm trying to move the
MachineIRBuilder to be owned/maintained by the RegBankSelect pass
itself, but the locally constructed LegalizerHelper would reset the
observer.

Mips also has a special case use of the LegalizationArtifactCombiner
in applyMappingImpl; I think we do need to run the artifact combiner
during RegBankSelect, but in a more consistent way outside of
applyMappingImpl.
2021-01-13 10:44:31 -05:00
Carl Ritson 790c75c163 [AMDGPU] Add SI_EARLY_TERMINATE_SCC0 for early terminating shader
Add pseudo instruction to allow early termination of pixel shader
anywhere based on the value of SCC.  The intention is to use this
when a mask of live lanes is updated, e.g. live lanes in WQM pass.
This facilitates early termination of shaders even when EXEC is
incomplete, e.g. in non-uniform control flow.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D88777
2021-01-13 13:29:05 +09:00
Hsiangkai Wang 914e2f5a02 [NFC] Use generic name for scalable vector stack ID.
Differential Revision: https://reviews.llvm.org/D94471
2021-01-13 10:57:43 +08:00
Joe Nash 314e29ed2b [AMDGPU] Add _e64 suffix to VOP3 Insts
Previously, instructions which could be
expressed as VOP3 in addition to another
encoding had a _e64 suffix on the tablegen
record name, while those
only available as VOP3 did not. With this
patch, all VOP3s will have the _e64 suffix.
The assembly does not change, only  the mir.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D94341

Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423
2021-01-12 18:33:18 -05:00
Matt Arsenault 3d39709159 AMDGPU: Remove wrapper only call limitation
This seems to only have overridden cold handling, which we probably
shouldn't do. As far as I can tell the wrapper library functions are
still inlined as appropriate.
2021-01-12 17:12:49 -05:00
Sebastian Neubauer 6a195491b6 [AMDGPU] Fix failing assert with scratch ST mode
In ST mode, flat scratch instructions have neither an sgpr nor a vgpr
for the address. This lead to an assertion when inserting hard clauses.

Differential Revision: https://reviews.llvm.org/D94406
2021-01-12 09:54:02 +01:00
Kazu Hirata 8590a3e3ad [llvm] Use *Set::contains (NFC) 2021-01-11 18:48:07 -08:00
Joe Nash bcec0f27a2 [AMDGPU] Deduplicate VOP tablegen asm & ins
VOP3 and VOP DPP subroutines to generate input
operands and asm strings were essentially copy
pasted several times. They are deduplicated to
reduce the maintenance burden and allow faster
development.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D94102

Change-Id: I76225eed3c33239d9573351e0c8a0abfad0146ea
2021-01-11 13:49:26 -05:00
QingShan Zhang 7539c75bb4 [DAGCombine] Remove the check for unsafe-fp-math when we are checking the AFN
We are checking the unsafe-fp-math for sqrt but not for fpow, which behaves inconsistent.
As the direction is to remove this global option, we need to remove the unsafe-fp-math
check for sqrt and update the test with afn fast-math flags.

Reviewed By: Spatel

Differential Revision: https://reviews.llvm.org/D93891
2021-01-11 02:25:53 +00:00
Kazu Hirata b7c5e0b02c [Target, Transforms] Use *Set::contains (NFC) 2021-01-08 18:39:54 -08:00
Tony 2f499b9aff [AMDGPU] Add volatile support to SIMemoryLegalizer
Treat a non-atomic volatile load and store as a relaxed atomic at
system scope for the address spaces accessed. This will ensure all
relevant caches will be bypassed.

A volatile atomic is not changed and still only bypasses caches upto
the level specified by the SyncScope operand.

Differential Revision: https://reviews.llvm.org/D94214
2021-01-09 00:52:33 +00:00
Christudasan Devadasan ae25a397e9 AMDGPU/GlobalISel: Enable sret demotion 2021-01-08 10:56:35 +05:30
Kazu Hirata b934160aaa [Target] Use llvm::find_if (NFC) 2021-01-07 20:29:36 -08:00
Mehdi Amini 467e916d30 Fix gcc5 build failure (NFC)
The loop index was shadowing the container name.
It seems that we can just not use a for-range loop here since there is
an induction variable anyway.

Differential Revision: https://reviews.llvm.org/D94254
2021-01-07 20:11:57 +00:00
dfukalov 6a87e9b08b [NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813
2021-01-07 22:22:05 +03:00
Matt Arsenault 6b7d5a928f AMDGPU/GlobalISel: Start cleaning up calling convention lowering
There are various hacks working around limitations in
handleAssignments, and the logical split between different parts isn't
correct. Start separating the type legalization to satisfy going
through the DAG infrastructure from the code required to split into
register types. The type splitting should be moved to generic code.
2021-01-07 10:36:45 -05:00
Matt Arsenault ab3a3f543b AMDGPU/GlobalISel: Update fdiv lowering for denormal/ulp interaction
Change the GlobalISel fast fdiv handling to match the changes in
2531535984 and
884acbb9e1
2021-01-06 12:32:01 -05:00
Christudasan Devadasan d68458bd56 [GlobalISel] Base implementation for sret demotion.
If the return values can't be lowered to registers
SelectionDAG performs the sret demotion. This patch
contains the basic implementation for the same in
the GlobalISel pipeline.

Furthermore, targets should bring relevant changes
during lowerFormalArguments, lowerReturn and
lowerCall to make use of this feature.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D92953
2021-01-06 10:30:50 +05:30
Changpeng Fang cb5b52a06e AMDGPU: Annotate amdgpu.noclobber for global loads only
Summary:
  This is to avoid unnecessary analysis since amdgpu.noclobber is only used for globals.

Reviewers:
  arsenm

Fixes:
   SWDEV-239161

Differential Revision:
  https://reviews.llvm.org/D94107
2021-01-05 14:47:19 -08:00
Arthur Eubanks 28a326eba0 [NFC] Rename registerAliasAnalyses -> registerDefaultAliasAnalyses
To clarify that this only affects the "default" AA.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D93980
2021-01-05 11:07:58 -08:00
Joe Nash 60466fad2d [AMDGPU] Remove deprecated V_MUL_LO_I32 from GFX10
It was removed in GFX10 GPUs, but LLVM could
generate it.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D94020

Change-Id: Id1c716d71313edcfb768b2b175a6789ef9b01f3c
2021-01-05 11:59:57 -05:00
Jay Foad 3914bebe91 [AMDGPU] Handle v_fmac_legacy_f32 in SIFoldOperands
Convert it to v_fma_legacy_f32 if it is profitable to do so, just like
other mac instructions that are converted to their mad equivalents.

Differential Revision: https://reviews.llvm.org/D94010
2021-01-05 11:55:33 +00:00
Jay Foad 4e6054a86c [AMDGPU] Split out new helper function macToMad in SIFoldOperands. NFC.
Differential Revision: https://reviews.llvm.org/D94009
2021-01-05 11:54:48 +00:00
Arthur Eubanks 8e293fe6ad [NewPM][AMDGPU] Pass TargetMachine to AMDGPUSimplifyLibCallsPass
Missed in https://reviews.llvm.org/D93863.
2021-01-04 13:48:09 -08:00
Arthur Eubanks 191552344b [NewPM][AMDGPU] Make amdgpu-aa work with NewPM
An AMDGPUAA class already existed that was supposed to work with the new
PM, but it wasn't tested and was a bit broken.

Fix up the existing classes to have the right keys/parameters.
Wire up AMDGPUAA inside AMDGPUTargetMachine.

Add it to the list of alias analyses for the "default" AAManager since
in adjustPassManager() amdgpu-aa is added into the pipeline at the
beginning.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D93914
2021-01-04 12:36:27 -08:00
Arthur Eubanks 4e838ba9ea [NewPM][AMDGPU] Port amdgpu-always-inline
And add to AMDGPU opt pipeline.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D94025
2021-01-04 12:27:01 -08:00
Arthur Eubanks fd323a897c [NewPM][AMDGPU] Port amdgpu-printf-runtime-binding
And add to AMDGPU opt pipeline.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D94026
2021-01-04 12:25:50 -08:00
Arthur Eubanks e1833e7493 [NewPM][AMDGPU] Port amdgpu-unify-metadata
And add to AMDGPU opt pipeline.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D94023
2021-01-04 11:57:46 -08:00
Arthur Eubanks a5f863e076 [NewPM][AMDGPU] Port amdgpu-propagate-attributes-early/late
And add to AMDGPU opt pipeline.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D94022
2021-01-04 11:53:37 -08:00
Arthur Eubanks b8f22f9d30 [NewPM][AMDGPU] Run InternalizePass when -amdgpu-internalize-symbols
The legacy PM doesn't run EP_ModuleOptimizerEarly on -O0, so skip
running it here when given O0.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D93886
2021-01-04 11:34:40 -08:00
Kazu Hirata 0e219b6443 [Target] Construct SmallVector with iterator ranges (NFC) 2021-01-03 09:57:45 -08:00
Kazu Hirata 985f899bf2 [Target] Use llvm::append_range (NFC) 2021-01-03 09:57:43 -08:00
Roman Lebedev 7c8b8063b6
[SimplifyCFG][AMDGPU] AMDGPUUnifyDivergentExitNodes: SimplifyCFG isn't ready to preserve PostDomTree
There is a number of transforms in SimplifyCFG that take DomTree out of
DomTreeUpdater, and do updates manually. Until they are fixed,
user passes are unable to claim that PDT is preserved.

Note that the default for SimplifyCFG is still not to preserve DomTree,
so this is still effectively NFC.
2021-01-03 01:45:46 +03:00
Roman Lebedev 4b80647367
[AMDGPU][SimplifyCFG] Teach AMDGPUUnifyDivergentExitNodes to preserve {,Post}DomTree
This is a (last big?) part of the patch series to make SimplifyCFG
preserve DomTree. Currently, it still does not actually preserve it,
even thought it is pretty much fully updated to preserve it.

Once the default is flipped, a valid DomTree must be passed into
simplifyCFG, which means that whatever pass calls simplifyCFG,
should also be smart about DomTree's.

As far as i can see from `check-llvm` with default flipped,
this is the last LLVM test batch (other than bugpoint tests)
that needed fixes to not break with default flipped.

The changes here are boringly identical to the ones i did
over 42+ times/commits recently already,
so while AMDGPU is outside of my normal ecosystem,
i'm going to go for post-commit review here,
like in all the other 42+ changes.

Note that while the pass is taught to preserve {,Post}DomTree,
it still doesn't do that by default, because simplifycfg
still doesn't do that by default, and flipping default
in this pass will implicitly flip the default for simplifycfg.
That will happen, but not right now.
2021-01-02 01:01:20 +03:00
Juneyoung Lee 420d046d6b clang-format, address warnings 2020-12-30 23:05:07 +09:00
Juneyoung Lee 9b29610228 Use unary CreateShuffleVector if possible
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.

Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)

The order is swapped, but in terms of correctness it is still fine.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D93923
2020-12-30 22:36:08 +09:00
Arthur Eubanks 7ecbe0c7a0 [NewPM][AMDGPU] Port amdgpu-lower-kernel-attributes
And add it to the AMDGPU opt pipeline.

This is a function pass instead of a module pass (like the legacy pass)
because it's getting added to a CGSCCPassManager, and you can't put a
module pass in a CGSCCPassManager.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D93885
2020-12-29 10:26:06 -08:00
Arthur Eubanks c2ef06d3dd [NewPM] Port infer-address-spaces
And add it to the AMDGPU opt pipeline.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D93880
2020-12-28 19:58:12 -08:00
Arthur Eubanks 0e9abcfc19 [AMDGPU][NewPM] Port amdgpu-promote-alloca(-to-vector)
And add to AMDGPU opt pipeline.

Don't pin an opt run to the legacy PM when -enable-new-pm=1 if these
passes (or passes introduced in https://reviews.llvm.org/D93863) are in
the list of passes.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D93875
2020-12-28 17:52:31 -08:00
Arthur Eubanks 9abc457724 [NewPM][AMDGPU] Port amdgpu-simplifylib/amdgpu-usenative
And add them to the pipeline via
AMDGPUTargetMachine::registerPassBuilderCallbacks(), which mirrors
AMDGPUTargetMachine::adjustPassManager().

These passes can't be unconditionally added to PassRegistry.def since
they are only present when the AMDGPU backend is enabled. And there are
no target-specific headers in llvm/include, so parsing these pass names
must occur somewhere in the AMDGPU directory. I decided the best place
was inside the TargetMachine, since the PassBuilder invokes
TargetMachine::registerPassBuilderCallbacks() anyway. If we come up with
a cleaner solution for target-specific passes in the future that's fine,
but there aren't too many target-specific IR passes living in
target-specific directories so it shouldn't be too bad to change in the
future.

Reviewed By: ychen, arsenm

Differential Revision: https://reviews.llvm.org/D93863
2020-12-28 10:38:51 -08:00
alex-t 644da789e3 [AMDGPU] Split edge to make si_if dominate end_cf
Basic block containing "if" not necessarily dominates block that is the "false" target for the if.

That "false" target block may have another predecessor besides the "if" block. IR value corresponding to the Exec mask is generated by the

si_if intrinsic and then used by the end_cf intrinsic. In this case IR verifier complains that 'Def does not dominate all uses'.

This change split the edge between the "if" block and "false" target block to make it dominated by the "if" block.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D91435
2020-12-28 17:14:02 +03:00
Dmitry Preobrazhensky 8c25bb3d0d [AMDGPU][MC] Improved errors handling for v_interp* operands
See bug 48596 (https://bugs.llvm.org/show_bug.cgi?id=48596)

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D93757
2020-12-28 16:15:48 +03:00
Dmitry Preobrazhensky 5b17263b6b [AMDGPU][MC][NFC] Parser refactoring
See bug 48515 (https://bugs.llvm.org/show_bug.cgi?id=48515)

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D93756
2020-12-28 14:59:49 +03:00
Kazu Hirata d6ff5cf995 [Target] Use llvm::any_of (NFC) 2020-12-24 19:43:26 -08:00
Praveen Velliengiri 61177943c9 [AMDGPU] Use MUBUF instructions for global address space access
Currently, the compiler crashes in instruction selection of global
load/stores in gfx600 due to the lack of FLAT instructions. This patch
fix the crash by selecting MUBUF instructions for global load/stores
in gfx600.

Authored-by: Praveen Velliengiri <Praveen.Velliengiri@amd.com>

Reviewed by: t-tye

Differential revision: https://reviews.llvm.org/D92483
2020-12-24 10:13:04 +00:00
Stanislav Mekhanoshin 747f67e034 [AMDGPU] Fix adjustWritemask subreg handling
If we happen to extract a non-dword subreg that breaks the
logic of the function and it may shrink the dmask because
it does not recognize the use of a lane(s).

This bug is next to impossible to trigger with the current
lowering in the BE, but it breaks in one of my future patches.

Differential Revision: https://reviews.llvm.org/D93782
2020-12-23 14:43:31 -08:00
Sebastian Neubauer 221fdedc69 [AMDGPU][GlobalISel] Fold flat vgpr + constant addresses
Use getPtrBaseWithConstantOffset in selectFlatOffsetImpl to fold more
vgpr+constant addresses.

Differential Revision: https://reviews.llvm.org/D93692
2020-12-23 10:40:30 +01:00
Matt Arsenault 581d13f8ae GlobalISel: Return APInt from getConstantVRegVal
Returning int64_t was arbitrarily limiting for wide integer types, and
the functions should handle the full generality of the IR.

Also changes the full form which returns the originally defined
vreg. Add another wrapper for the common case of just immediately
converting to int64_t (arguably this would be useful for the full
return value case as well).

One possible issue with this change is some of the existing uses did
break without conversion to getConstantVRegSExtVal, and it's possible
some without adequate test coverage are now broken.
2020-12-22 22:23:58 -05:00
Matt Arsenault 8bf9cdeaee AMDGPU: Use Register 2020-12-22 21:55:59 -05:00
Matt Arsenault bac54639c7 AMDGPU: Add spilled CSR SGPRs to entry block live ins 2020-12-22 21:55:59 -05:00
Matt Arsenault 29ed846d67 AMDGPU: Fix assert when checking for implicit operand legality 2020-12-22 20:56:24 -05:00
Stanislav Mekhanoshin d15119a02d [AMDGPU][GlobalISel] GlobalISel for flat scratch
It does not seem to fold offsets but this is not specific
to the flat scratch as getPtrBaseWithConstantOffset() does
not return the split for these tests unlike its SDag
counterpart.

Differential Revision: https://reviews.llvm.org/D93670
2020-12-22 16:33:06 -08:00
Stanislav Mekhanoshin ca4bf58e4e [AMDGPU] Support unaligned flat scratch in TLI
Adjust SITargetLowering::allowsMisalignedMemoryAccessesImpl for
unaligned flat scratch support. Mostly needed for global isel.

Differential Revision: https://reviews.llvm.org/D93669
2020-12-22 16:12:31 -08:00
Stanislav Mekhanoshin ae8f4b2178 [AMDGPU] Folding of FI operand with flat scratch
Differential Revision: https://reviews.llvm.org/D93501
2020-12-22 10:48:04 -08:00
Dmitry Preobrazhensky f4f49d9d0d [AMDGPU][MC][NFC] Fix for sanitizer error in 8ab5770
Corrected to fix sanitizer error introduced by 8ab5770
2020-12-21 20:42:35 +03:00
Dmitry Preobrazhensky 8ab5770a17 [AMDGPU][MC][NFC] Parser refactoring
See bug 48515 (https://bugs.llvm.org/show_bug.cgi?id=48515)

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D93548
2020-12-21 20:21:07 +03:00
Carl Ritson 7722494834 [AMDGPU][NFC] Remove unused Hi16Elt definition 2020-12-18 20:38:54 +09:00
dfukalov 9ed8e0caab [NFC] Reduce include files dependency and AA header cleanup (part 2).
Continuing work started in https://reviews.llvm.org/D92489:

Removed a bunch of includes from "AliasAnalysis.h" and "LoopPassManager.h".

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92852
2020-12-17 14:04:48 +03:00
Matt Arsenault f333736757 AMDGPU: Remove SGPRSpillVGPRDefinedSet hack
These VGPRs should be reserved and therefore do not need "correct"
liveness. They should not have undef uses, which can still cause
issues.
2020-12-16 21:33:35 -05:00
Roman Lebedev 49dac4aca0
[SimplifyCFG] MergeBlockIntoPredecessor() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
2020-12-17 01:03:49 +03:00
Piotr Sobczak c7afb698ca [AMDGPU] Avoid calling copyFastMathFlags in wrong context
Calling Instruction::copyFastMathFlags() assumes the caller is
FPMathOperator. Avoid calling the function for instructions
that are not instances of FPMathOperator.
2020-12-16 10:22:51 +01:00
Sebastian Neubauer 409a2f0f9e [AMDGPU] Allow no saddr for global addtid insts
I think the global_load/store_dword_addtid instructions support
switching off the scalar address.
Add assembler and disassembler support for this.

Differential Revision: https://reviews.llvm.org/D93288
2020-12-16 10:01:40 +01:00
Stanislav Mekhanoshin eb66bf0802 [AMDGPU] Print SCRATCH_EN field after the kernel
Differential Revision: https://reviews.llvm.org/D93353
2020-12-15 22:44:30 -08:00
Matt Arsenault 97f51f0489 AMDGPU: Remove redundant CCAction for i1 2020-12-15 17:00:27 -05:00
Tony d5ea8f7010 [AMDGPU] Clarify scratch initialization
- Clarify documentation on initializing scratch.
- Rename compute_pgm_rsrc2 field for enabling scratch from
  ENABLE_SGPR_PRIVATE_SEGMENT_WAVEFRONT_OFFSET to
  ENABLE_PRIVATE_SEGMENT to match hardware definition.

Differential Revision: https://reviews.llvm.org/D93271
2020-12-15 20:14:20 +00:00
Sebastian Neubauer 91445979be [AMDGPU] Unify flat offset logic
Move getNumFlatOffsetBits from AMDGPUAsmParser and SIInstrInfo into
AMDGPUBaseInfo.

Differential Revision: https://reviews.llvm.org/D93287
2020-12-15 14:59:59 +01:00
Changpeng Fang ce0c0013d8 AMDGPU: If a store defines (alias) a load, it clobbers the load.
Summary:
 If a store defines (must alias) a load, it clobbers the load.

Fixes: SWDEV-258915

Reviewers:
  arsenm

Differential Revision:
  https://reviews.llvm.org/D92951
2020-12-14 16:34:32 -08:00
Stanislav Mekhanoshin cf5845d6c4 [AMDGPU] Use multi-dword flat scratch for spilling
Differential Revision: https://reviews.llvm.org/D93067
2020-12-14 14:19:29 -08:00
Michael Liao 1fd1f638b6 [amdgpu] Fix a crash case when `V_CNDMASK` could be simplified.
- Once an instruction is simplified, foldable candidates from it should
  be invalidated or skipped as the operand index is no longer valid.

Differential Revision: https://reviews.llvm.org/D93174
2020-12-14 13:08:13 -05:00
Stanislav Mekhanoshin 87d7757bbe [SLP] Control maximum vectorization factor from TTI
D82227 has added a proper check to limit PHI vectorization to the
maximum vector register size. That unfortunately resulted in at
least a couple of regressions on SystemZ and x86.

This change reverts PHI handling from D82227 and replaces it with
a more general check in SLPVectorizerPass::tryToVectorizeList().
Moved to tryToVectorizeList() it allows to restart vectorization
if initial chunk fails.

However, this function is more general and handles not only PHI
but everything which SLP handles. If vectorization factor would
be limited to maximum vector register size it would limit much
more vectorization than before leading to further regressions.
Therefore a new TTI callback getMaximumVF() is added with the
default 0 to preserve current behavior and limit nothing. Then
targets can decide what is better for them.

The callback gets ElementSize just like a similar getMinimumVF()
function and the main opcode of the chain. The latter is to avoid
regressions at least on the AMDGPU. We can have loads and stores
up to 128 bit wide, and <2 x 16> bit vector math on some
subtargets, where the rest shall not be vectorized. I.e. we need
to differentiate based on the element size and operation itself.

Differential Revision: https://reviews.llvm.org/D92059
2020-12-14 08:49:40 -08:00
Jay Foad 07e92e6b60 [AMDGPU] Make use of HasSMemRealTime predicate. NFC.
We have this subtarget feature so it makes sense to use it here. This is
NFC because it's always defined by default on GFX8+.

Differential Revision: https://reviews.llvm.org/D93202
2020-12-14 16:34:57 +00:00
Carl Ritson 62c246eda2 [AMDGPU][NFC] Rename opsel/opsel_hi/neg_lo/neg_hi with suffix 0
These parameters set a default value of 0, so I believe they
should include a 0 suffix. This allows for versions which do not
set a default value in future.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D93187
2020-12-14 20:01:56 +09:00
Carl Ritson af4570cd3a [AMDGPU][NFC] Remove unused VOP3Mods0Clamp
This is unused and the selection function does not exist.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D93188
2020-12-14 20:00:58 +09:00
Sebastian Neubauer 5733167f54 [AMDGPU] Mark amdgpu_gfx functions as module entry function
- Allows lds allocations
- Writes resource usage into COMPUTE_PGM_RSRC1 registers in PAL metadata

Differential Revision: https://reviews.llvm.org/D92946
2020-12-14 10:43:39 +01:00
Jay Foad 4f25e53982 [AMDGPU] Make use of emitRemovedIntrinsicError. NFC.
Change-Id: I482bbf528255f2eacd3878ddfe7edb9a8f63d5c2
2020-12-11 14:02:14 +00:00
Mirko Brkusanin 0c7cce54eb [AMDGPU] Resolve issues when picking between ds_read/write and ds_read2/write2
Both ds_read_b128 and ds_read2_b64 are valid for 128bit 16-byte aligned
loads but the one that will be selected is determined either by the order in
tablegen or by the AddedComplexity attribute. Currently ds_read_b128 has
priority.

While ds_read2_b64 has lower alignment requirements, we cannot always
restrict ds_read_b128 to 16-byte alignment because of unaligned-access-mode
option. This was causing ds_read_b128 to be selected for 8-byte aligned
loads regardles of chosen access mode.

To resolve this we use two patterns for selecting ds_read_b128. One
requires alignment of 16-byte and the other requires
unaligned-access-mode option.

Same goes for ds_write2_b64 and ds_write_b128.

Differential Revision: https://reviews.llvm.org/D92767
2020-12-10 12:40:49 +01:00
Stanislav Mekhanoshin 4617cc68f6 [AMDGPU] Fix expansion of 192 bit spills in PEI
Differential Revision: https://reviews.llvm.org/D92979
2020-12-09 16:36:29 -08:00
Scott Linder 9260a99999 [MC][AMDGPU] Consume EndOfStatement in asm parser
Avoids spurious newlines showing up in the output when emitting assembly
via MC.

Reviewed By: MaskRay, arsenm

Differential Revision: https://reviews.llvm.org/D92690
2020-12-09 21:45:55 +00:00
Scott Linder f5f4b8b60f [AMDGPU][MC] Restore old error position for "too few operands"
Revert part of https://reviews.llvm.org/D92084 to make it simpler to
start consuming the EndOfStatement token within AMDGPU's
ParseInstruction in a future patch. This also brings us back to what
every other target currently does.

A future change to move the position back to the end of the statement
would likely need to audit all of the AMDGPUOperand SMLoc ranges, and
determine the SMLoc for the last character of the last operand.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D92960
2020-12-09 21:09:47 +00:00
Austin Kerbow 4aa842a800 [AMDGPU] Add new pseudos for indirect addressing with VGPR Indexing
It is possible for copies or spills to be inserted in the middle of indirect
addressing sequences which use VGPR indexing. Spills to accvgprs could be
effected by the indexing mode.

Add new pseudo instructions that are expanded after register allocation to avoid
the problematic spill or copy placement.

Differential Revision: https://reviews.llvm.org/D91048
2020-12-08 12:24:12 -08:00
Stanislav Mekhanoshin dd89249498 [AMDGPU] Annotate vgpr<->agpr spills in asm
Differential Revision: https://reviews.llvm.org/D92125
2020-12-07 11:25:25 -08:00
Petar Avramovic 3a042dcd2e [AMDGPU] Fix default value of glc for mubuf rtn atomics
Mubuf rtn atomics use GLC_1 thus default value for glc operand
should be -1, see https://reviews.llvm.org/D90730.
This allows us to report error when rtn atomic requires glc=1
but does not have glc operand in input.

Differential Revision: https://reviews.llvm.org/D92654
2020-12-07 14:00:08 +01:00
Dmitry Preobrazhensky a0b3a9391c [AMDGPU][MC] Improved diagnostics message for sym/expr operands
See bug 48295 (https://bugs.llvm.org/show_bug.cgi?id=48295)

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D92088
2020-12-05 14:05:53 +03:00
Dmitry Preobrazhensky e97dd11977 [AMDGPU][MC] Corrected error position for invalid MOVREL src
See bug 47518 (https://bugs.llvm.org/show_bug.cgi?id=47518)

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D92084
2020-12-05 13:23:14 +03:00
Kazu Hirata 2dc4a14e4d [AMDGPU] Use llvm::is_contained (NFC) 2020-12-04 21:42:55 -08:00
Paul C. Anagnostopoulos 415fab6f67 [TableGen] Eliminate the 'code' type
Update the documentation.

Rework various backends that relied on the code type.

Differential Revision: https://reviews.llvm.org/D92269
2020-12-03 10:19:11 -05:00
Mircea Trofin bab72dd5d5 [NFC][MC] TargetRegisterInfo::getSubReg is a MCRegister.
Typing the API appropriately.

Differential Revision: https://reviews.llvm.org/D92341
2020-12-02 15:46:38 -08:00
Jay Foad d28624a209 [AMDGPU] Stop adding an implicit def of vcc_hi for wave32
This doesn't seem to be needed for anything.

Differential Revision: https://reviews.llvm.org/D92400
2020-12-02 10:11:42 +00:00
Caroline Concatto 4b0ef2b075 [NFC][CostModel]Extend class IntrinsicCostAttributes to use ElementCount Type
This patch replaces the attribute  `unsigned VF`  in the class
IntrinsicCostAttributes by `ElementCount VF`.
This is a non-functional change to help upcoming patches to compute the cost
model for scalable vector inside this class.

Differential Revision: https://reviews.llvm.org/D91532
2020-12-01 11:12:51 +00:00
Jay Foad 839c9635ed [AMDGPU] Simplify some generation checks. NFC. 2020-12-01 10:15:32 +00:00
Nikita Popov 4df8efce80 [AA] Split up LocationSize::unknown()
Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).

This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.

The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.

Differential Revision: https://reviews.llvm.org/D91649
2020-11-26 18:39:55 +01:00
Jay Foad 4f87d30a06 [AMDGPU] Introduce and use isGFX10Plus. NFC.
It's more future-proof to use isGFX10Plus from the start, on the
assumption that future architectures will be based on current
architectures.

Also make use of the existing isGFX9Plus in a few places.

Differential Revision: https://reviews.llvm.org/D92092
2020-11-26 09:02:36 +00:00
Sebastian Neubauer edd675643d [AMDGPU] Emit stack frame size in metadata
Add .shader_functions to pal metadata, which contains the stack frame
size for all non-entry-point functions.

Differential Revision: https://reviews.llvm.org/D90036
2020-11-25 16:30:02 +01:00
Jay Foad 4926eed59c [AMDGPU] Add a TRANS bit to TSFlags. NFC.
This is used to mark transcendental instructions that execute on a
separate pipeline from the normal VALU pipeline.

Differential Revision: https://reviews.llvm.org/D92042
2020-11-24 17:49:56 +00:00
Jay Foad 000400ca0a Fix speling in comments. NFC. 2020-11-23 14:43:24 +00:00
Dmitry Preobrazhensky ce44bf2cf2 [AMDGPU][MC] Improved diagnostic messages
See bug 47518 (https://bugs.llvm.org/show_bug.cgi?id=47518)

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D91794
2020-11-23 16:15:05 +03:00
Dmitry Preobrazhensky e4effef330 [AMDGPU][MC] Improved diagnostic messages for invalid literals
See bug 47518 (https://bugs.llvm.org/show_bug.cgi?id=47518)

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D91793
2020-11-23 15:48:06 +03:00
Matt Arsenault 79f75468b4 AMDGPU: Fix counting kernel arguments towards register usage
Also use DataLayout to get type size. Relying on the IR type size is
also pretty broken here, since this won't perfectly capture how types
are legalized.
2020-11-20 21:23:33 -05:00
Alex Richardson 51e09e1d5a [AMDGPU] Set the default globals address space to 1
This will ensure that passes that add new global variables will create them
in address space 1 once the passes have been updated to no longer default
to the implicit address space zero.
This also changes AutoUpgrade.cpp to add -G1 to the DataLayout if it wasn't
already to present to ensure bitcode backwards compatibility.

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D84345
2020-11-20 15:46:53 +00:00
Sebastian Neubauer 7a18bdb350 [AMDGPU] Implement flat scratch init for pal
Extract the scratch offset from the scratch buffer descriptor that is
stored in the global table.

Differential Revision: https://reviews.llvm.org/D91701
2020-11-20 11:14:30 +01:00
Nikita Popov 393b9e9db3 [MemLoc] Require LocationSize argument (NFC)
When constructing a MemoryLocation by hand, require that a
LocationSize is explicitly specified. D91649 will split up
LocationSize::unknown() into two different states, and callers
should make an explicit choice regarding the kind of MemoryLocation
they want to have.
2020-11-19 21:45:52 +01:00
Duncan P. N. Exon Smith 5abf76fbe3 ADT: Add assertions to SmallVector::insert, etc., for reference invalidation
2c196bbc6b asserted that
`SmallVector::push_back` doesn't invalidate the parameter when it needs
to grow. Do the same for `resize`, `append`, `assign`, `insert`, and
`emplace_back`.

Differential Revision: https://reviews.llvm.org/D91744
2020-11-18 17:36:28 -08:00
Sebastian Neubauer 72ccec1bbc [AMDGPU] Fix v3f16 interaction with image store workaround
In some cases, the wrong amount of registers was reserved.

Also enable more v3f16 tests.

Differential Revision: https://reviews.llvm.org/D90847
2020-11-18 18:21:04 +01:00
Jay Foad 7ecf19697e [AMDGPU] Fix and extend vccz workarounds
We have workarounds for two different cases where vccz can get out of
sync with the value in vcc. This fixes them in two ways:

1. Fix the case where the def of vcc was in a previous basic block, by
pessimistically assuming that vccz might be incorrect at a basic block
boundary.

2. Fix the handling of pre-existing waitcnt instructions by calling
generateWaitcntInstBefore before examining ScoreBrackets to determine
whether there's an outstanding smem read operation.

Differential Revision: https://reviews.llvm.org/D91636
2020-11-18 15:26:06 +00:00
Jay Foad 9f69c1bc54 [AMDGPU] Rename pseudo S_WAITCNT_IDLE to S_WAIT_IDLE. NFC. 2020-11-18 14:03:43 +00:00
Florian Hahn b2f4c5fddc
[AsmWriter] Factor out mnemonic generation to accessible getMnemonic.
This patch factors out the part of printInstruction that gets the
mnemonic string for a given MCInst. This is intended to be used
subsequently for the instruction-mix remarks to display the final
mnemonic (D90040).

Unfortunately making `getMnemonic` available to the AsmPrinter
seems to require making it virtual. Not sure if there's a way around
that with the current layering of the AsmPrinters.

Reviewed By: Paul-C-Anagnostopoulos

Differential Revision: https://reviews.llvm.org/D90039
2020-11-17 09:47:38 +00:00
Michael Liao f375885ab8 [InferAddrSpace] Teach to handle assumed address space.
- In certain cases, a generic pointer could be assumed as a pointer to
  the global memory space or other spaces. With a dedicated target hook
  to query that address space from a given value, infer-address-space
  pass could infer and propagate that to all its users.

Differential Revision: https://reviews.llvm.org/D91121
2020-11-16 17:06:33 -05:00
Matt Arsenault d2e52eec51 AMDGPU: Select global saddr mode from SGPR pointer
Use the 64-bit SGPR base with a 0 offset, since it's 1 fewer
instruction to materialize the 0 vs. the 64-bit copy.
2020-11-16 11:51:06 -05:00
Matt Arsenault a6e353b1d0 AMDGPU: Split large offsets when selecting global saddr mode
When the offset doesn't fit in the immediate field, move some to
voffset.
2020-11-16 11:36:01 -05:00
Jay Foad a6ecb2eb3d [AMDGPU] Add comments. NFC. 2020-11-16 16:34:13 +00:00
Dmitry Preobrazhensky 65f3e121fe [AMDGPU][MC] Corrected error position for some operands and modifiers
Partially fixes bug 47518 (https://bugs.llvm.org/show_bug.cgi?id=47518)

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D91412
2020-11-16 16:11:23 +03:00
Dmitry Preobrazhensky 0bee8c784b [AMDGPU][MC] Corrected error position for swizzle()
Partially fixes bug 47518 (https://bugs.llvm.org/show_bug.cgi?id=47518)

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D91408
2020-11-16 14:37:57 +03:00
Dmitry Preobrazhensky 89df8fc0d7 [AMDGPU][MC] Corrected error position for hwreg() and sendmsg()
Partially fixes bug 47518 (https://bugs.llvm.org/show_bug.cgi?id=47518)

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D91407
2020-11-16 14:25:07 +03:00
Stanislav Mekhanoshin c9821cec74 [AMDGPU] Mark sin/cos load folding as modifying the function.
When the load value is folded into the sin/cos operation, the
AMDGPU library calls simplifier could still mark the function
as unmodified. Instead ensure if there is an early return,
return whether the load was folded into the sin/cos call.

Authored by MJDSys

Differential Revision: https://reviews.llvm.org/D91401
2020-11-13 14:49:33 -08:00
Jessica Paquette b184a2eccf [GlobalISel] Add matchers for specific constants and a matcher for negations
It's fairly common to need matchers for a specific constant value, or for
common idioms like finding a negated register.

Add

- `m_SpecificICst`, which returns true when matching a specific value..
- `m_ZeroInt`, which returns true when an integer 0 is matched.
- `m_Neg`, which returns when a register is negated.

Also update a few places which use idioms related to the new matchers.

Differential Revision: https://reviews.llvm.org/D91397
2020-11-13 09:24:54 -08:00
Matt Arsenault e722943e05 AMDGPU: Factor out large flat offset splitting 2020-11-13 11:22:13 -05:00
Matt Arsenault 0fd6a04ba4 AMDGPU: Refactor getBaseWithOffsetUsingSplitOR usage 2020-11-13 10:58:17 -05:00
Simon Pilgrim a4d3691d55 Fix MSVC signed/unsigned comparison warning. NFCI. 2020-11-13 10:20:48 +00:00
Jay Foad ad3ec08955 [AMDGPU] One more use of the new export target names. NFC. 2020-11-13 09:44:09 +00:00
serge-sans-paille 9218ff50f9 llvmbuildectomy - replace llvm-build by plain cmake
No longer rely on an external tool to build the llvm component layout.

Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.

These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.

Differential Revision: https://reviews.llvm.org/D90848
2020-11-13 10:35:24 +01:00
Stanislav Mekhanoshin 5ab1702129 [AMDGPU] Remove scratch rsrc from spill pseudos
Differential Revision: https://reviews.llvm.org/D91110
2020-11-12 15:23:37 -08:00
Stanislav Mekhanoshin cf6565f6d0 [AMDGPU] Enable multi-dword flat scratch load/stores
Differential Revision: https://reviews.llvm.org/D91384
2020-11-12 13:38:56 -08:00
Jay Foad 6881a82e8c [AMDGPU] Fix scheduling of exp pos4
Also fix a similar issue in SIInsertWaitcnts, but I don't think that fix
has any effect in practice.

Differential Revision: https://reviews.llvm.org/D91290
2020-11-12 19:57:14 +00:00
Jay Foad d7d6ac5624 [AMDGPU] Define and use names for export targets. NFC.
Differential Revision: https://reviews.llvm.org/D91289
2020-11-12 19:57:14 +00:00
Jay Foad f23c4c6f8a [AMDGPU] Separate out real exp instructions by subtarget. NFC.
Differential Revision: https://reviews.llvm.org/D91247
2020-11-11 17:13:40 +00:00
Jay Foad 2b33ea6935 [AMDGPU] Split exp instructions out into their own tablegen file. NFC.
Differential Revision: https://reviews.llvm.org/D91246
2020-11-11 17:13:40 +00:00
Jay Foad f94fd1c8ca [AMDGPU] Make use of SIInstrInfo::isEXP. NFC. 2020-11-11 17:01:20 +00:00
Jay Foad 830ed64ccd Revert "Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access""
This reverts commit 8b08fa0103.

The underlying problems were fixed by D90607.
2020-11-11 14:40:14 +00:00
Stanislav Mekhanoshin 544ef42e40 [AMDGPU] Set default op_sel_hi on accvgpr read/write
These are opsel opcodes with op_sel actually being ignored.
As a such op_sel_hi needs to be set to default 1 even though
these bits are ignored. This is compatibility change.

Differential Revision: https://reviews.llvm.org/D91202
2020-11-10 13:07:29 -08:00
Jay Foad bb8d1437a6 [AMDGPU] Simplify multiclass EXP_m. NFC. 2020-11-10 17:28:36 +00:00
Jay Foad 0ad4d04002 [AMDGPU] Remove an unused return value. NFC.
Differential Revision: https://reviews.llvm.org/D91063
2020-11-10 09:15:14 +00:00
Carl Ritson fde8351743 [AMDGPU] Fix lowering of S_MOV_{B32,B64}_term
If the source of S_MOV_{B32,B64}_term is an immediate then it
cannot be lowered to a COPY.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D90451
2020-11-10 12:16:31 +09:00
Mircea Trofin 2ac3a7d0c4 [NFC] Use [MC]Register
Differential Revision: https://reviews.llvm.org/D90795
2020-11-09 08:37:14 -08:00
Stanislav Mekhanoshin d5a465866e [AMDGPU] Omit buffer resource with flat scratch.
Differential Revision: https://reviews.llvm.org/D90979
2020-11-09 08:05:20 -08:00
Paul C. Anagnostopoulos 91d2e5c81a [TableGen] Add the !filter bang operator.
Add a test. Update the Programmer's Reference.

Use it in some TableGen files.

Differential Revision: https://reviews.llvm.org/D91008
2020-11-09 10:56:55 -05:00
Sebastian Neubauer a022b1ccd8 [AMDGPU] Add amdgpu_gfx calling convention
Add a calling convention called amdgpu_gfx for real function calls
within graphics shaders. For the moment, this uses the same calling
convention as other calls in amdgpu, with registers excluded for return
address, stack pointer and stack buffer descriptor.

Differential Revision: https://reviews.llvm.org/D88540
2020-11-09 16:51:44 +01:00
Jay Foad 55ea017759 [AMDGPU] Remove unused DisableDecoder machinery. NFC.
This has been unused since D24738.
2020-11-09 13:53:27 +00:00
Carl Ritson 8e8a54c7e9 [AMDGPU] SIWholeQuadMode fix mode insertion when SCC always defined
Fix a crash when SCC is defined until end of block and mode change
must be inserted in SCC live region.

Reviewed By: mceier

Differential Revision: https://reviews.llvm.org/D90997
2020-11-08 11:14:57 +09:00
Jay Foad d61f2cfb9f [AMDGPU] Simplify exp target parsing
Treat any identifier as a potential exp target and diagnose them all the
same way as "invalid exp target"s.

Differential Revision: https://reviews.llvm.org/D90947
2020-11-06 16:09:34 +00:00
Konstantin Pyzhov 41e74e400d [AMDGPU] Corrected declaration of VOPC instructions with SDWA addressing mode.
Removed "implicit def VCC" from declarations of AMDGPU VOPC instructions since they do not implicitly write to VCC in SDWA mode.

Differential Revision: https://reviews.llvm.org/D89168
2020-11-05 11:15:50 -05:00
Michael Liao 23c6d1501d [amdgpu] Add `llvm.amdgcn.endpgm` support.
- `llvm.amdgcn.endpgm` is added to enable "abort" support.

Differential Revision: https://reviews.llvm.org/D90809
2020-11-05 19:06:50 -05:00
Stanislav Mekhanoshin f738aee0bb [AMDGPU] Add default 1 glc operand to rtn atomics
This change adds a real glc operand to the return atomic
instead of just string " glc" in the middle of the asm
string.

Improves asm parser diagnostics.

Differential Revision: https://reviews.llvm.org/D90730
2020-11-05 10:41:59 -08:00
Sander de Smalen d57bba7cf8 [SVE] Return StackOffset for TargetFrameLowering::getFrameIndexReference.
To accommodate frame layouts that have both fixed and scalable objects
on the stack, describing a stack location or offset using a pointer + uint64_t
is not sufficient. For this reason, we've introduced the StackOffset class,
which models both the fixed- and scalable sized offsets.

The TargetFrameLowering::getFrameIndexReference is made to return a StackOffset,
so that this can be used in other interfaces, such as to eliminate frame indices
in PEI or to emit Debug locations for variables on the stack.

This patch is purely mechanical and doesn't change the behaviour of how
the result of this function is used for fixed-sized offsets. The patch adds
various checks to assert that the offset has no scalable component, as frame
offsets with a scalable component are not yet supported in various places.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D90018
2020-11-05 11:02:18 +00:00
Mircea Trofin 5dc47541f9 [NFC] Use Register/MCRegister
Differential Revision: https://reviews.llvm.org/D90724
2020-11-04 12:20:17 -08:00
Joe Nash 58adab34c4 [AMDGPU] Resolve pseudo registers at encoding uses
Pseudo-registers allow different register encodings
between gpu generations. Make sure we resolve the
pseudo regs to real regs whenever we get their
hardware encoding.
Using the correct encodings revealed a register
bank conflict and an unnecessary write dependency.
Tests have been updated to match.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D90721

Change-Id: I73c154cd24aecc820993b50bebaf4df97a5710ca
2020-11-04 12:52:32 -05:00
Sebastian Neubauer 31a0b2834f [AMDGPU] Fix iterating in SIFixSGPRCopies
The insertion of waterfall loops splits the current basic block into
three blocks. So the basic block that we iterate over must be updated.

This failed assert(!NodePtr->isKnownSentinel()) in ilist_iterator for
divergent calls in branches before.

Differential Revision: https://reviews.llvm.org/D90596
2020-11-04 18:43:19 +01:00
Paul C. Anagnostopoulos d56cd4291e [TableGen] Add !interleave operator to concatenate a list of values with delimiters
Add a test. Use it in some TableGen files.

Differential Revision: https://reviews.llvm.org/D90469
2020-11-04 09:23:54 -05:00
Sebastian Neubauer 1124bf4ab7 [AMDGPU] Set rsrc1 flags for graphics shaders
Before they were only set for compute kernels and compute shaders but
not for other shaders.

Differential Revision: https://reviews.llvm.org/D89399
2020-11-04 12:25:41 +01:00
Sebastian Neubauer 76313288cd [AMDGPU] Fix ieee mode default value
Previously, the default value for ieee mode was
- on for compute kernels and compute shaders,
- off for all shaders except compute shaders.

This commit changes the default to be
- on for compute kernels,
- off for shaders.

This aligns the default value with the settings that are actually in
use.  To my knowledge, all users of shader calling conventions (mesa and
llpc) disable the ieee mode by default.

Differential Revision: https://reviews.llvm.org/D89388
2020-11-04 12:25:38 +01:00
Tim Renouf 89d41f3a2b [AMDGPU] Add gfx1033 target
Differential Revision: https://reviews.llvm.org/D90447

Change-Id: If2650fc7f31bbdd49c76e74a9ca8e3734d769761
2020-11-03 16:27:48 +00:00
Tim Renouf ee3e642627 [AMDGPU] Add gfx90c target
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.

Differential Revision: https://reviews.llvm.org/D90419

Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
2020-11-03 16:27:43 +00:00
Jay Foad 040c50278c [AMDGPU] Fix ds_read2/write2 with unaligned offsets
These instructions use a scaled offset. We were wrongly selecting them
even when the required offset was not a multiple of the scale factor.

Differential Revision: https://reviews.llvm.org/D90607
2020-11-03 15:16:10 +00:00
Petar Avramovic 0031418dce AMDGPU/GlobalISel: Use same builder/observer in post-legalizer-combiner
Change match/apply functions into methods of new target specific combiner
helper class. Use reference to MachineIRBuilder from helper instead of
constructing new MachineIRBuilder each time new instruction needs to made.
Allows correct tracking of newly created instructions.

Differential Revision: https://reviews.llvm.org/D90623
2020-11-03 09:24:50 +01:00
Stanislav Mekhanoshin c9d6fe6f7d [AMDGPU] Improve FLAT scratch detection
We were useing too broad check for isFLATScratch() which also
includes FLAT global.

Differential Revision: https://reviews.llvm.org/D90505
2020-11-02 11:37:33 -08:00
Matt Arsenault 86b8f6919b AMDGPU: Reorder checks 2020-11-02 10:21:48 -05:00
Jay Foad 0892d2a311 Revert "Fix ds_read2/write2 unaligned offsets"
This reverts commit 2e7e898c8f.

It was committed by mistake.
2020-11-02 14:01:33 +00:00
Jay Foad 2e7e898c8f Fix ds_read2/write2 unaligned offsets 2020-11-02 13:57:13 +00:00
Christudasan Devadasan d6aa4aa29a [AMDGPU] Some refactoring after D90404. NFC. 2020-11-01 13:18:53 +05:30
Christudasan Devadasan 9bb2b4f0aa [AMDGPU] Add alignment check for v3 to v4 load type promotion
It should be enabled only when the load alignment is at least 8-byte.

Fixes: SWDEV-256824

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D90404
2020-11-01 12:05:34 +05:30
Matt Arsenault 790f5771fd AMDGPU: Fix missing writelane cases to skip with exec=0 2020-10-30 11:15:11 -04:00
alex-t a4f7e4264c [AMDGPU] SILowerControlFlow::removeMBBifRedundant. Refactoring plus fix for the null MBB pointer in MF->splice
Detailed description: This change addresses the refactoring adviced by foad. It also contain the fix for the case when getNextNode is null if the successor block is the last in MachineFunction.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D90314
2020-10-30 14:46:08 +03:00
Jay Foad 9cee87d72a [AMDGPU] Fix double space in disassembly of ds_gws_sema_* with gds
By setting up the AsmStrings correctly we can remove some special cases
from AMDGPUInstPrinter::printOffset.

Differential Revision: https://reviews.llvm.org/D90307
2020-10-29 17:31:59 +00:00
Jay Foad 58de4b2053 [AMDGPU] Use pseudo instructions for readlane/writelane
This reverts r227987 "R600/SI: Determine target-specific encoding of READLANE and WRITELANE early v2".

All the codegen changes are caused by the post-RA scheduler no longer
treating readlane/writelane as scheduling barriers due to having
unmodelled side effects. (The pseudos are hasSideEffects = 0, but the
real instructions are hasSideEffects = ? which TableGen conservatively
treats as 1.)

Differential Revision: https://reviews.llvm.org/D90401
2020-10-29 16:00:53 +00:00
Jay Foad 7a79921edd [AMDGPU] Remove gds operand from ds_gws_* MachineInstrs
The operand value was always 1 (except in some bad MIR tests) so it was
redundant.

Differential Revision: https://reviews.llvm.org/D90378
2020-10-29 15:04:23 +00:00
Jay Foad a442fad911 [AMDGPU] Fix double space in disassembly of s_set_gpr_idx_mode
Differential Revision: https://reviews.llvm.org/D90374
2020-10-29 14:54:33 +00:00
Jay Foad e9dd2c4fe2 [AMDGPU] Fix double space in disassembly of some DPP instructions
Differential Revision: https://reviews.llvm.org/D90373
2020-10-29 14:54:33 +00:00
Jay Foad 69f5105f5c [AMDGPU] Simplify insertNoops functions. NFC. 2020-10-29 10:55:20 +00:00
Austin Kerbow de51867343 [AMDGPU] Add Reset function to GCNHazardRecognizer
Reset the tracked emitted instructions when starting scheduling on a new
region.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D90347
2020-10-28 16:32:32 -07:00
Jay Foad 5b91a6a88b [AMDGPU] Allow some modifiers on VOP3B instructions
V_DIV_SCALE_F32/F64 are VOP3B encoded so they can't use the ABS src
modifier, but they can still use NEG and the usual output modifiers.

This partially reverts 3b99f12a4e "AMDGPU: Remove modifiers from v_div_scale_*".

Differential Revision: https://reviews.llvm.org/D90296
2020-10-28 21:54:14 +00:00
Jay Foad 50ee22d791 [AMDGPU] Fix double space in disassembly of SDWA instructions with vcc
Differential Revision: https://reviews.llvm.org/D90317
2020-10-28 21:39:39 +00:00
Austin Kerbow 8b127a8661 [AMDGPU] Fix inserting combined s_nop in bundles
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D90334
2020-10-28 14:34:04 -07:00
Paul C. Anagnostopoulos 9d72065cf6 [TableGen] [AMDGPU] Add !sub operator for subtraction
Use it in the AMDGPU target to eliminate !add(value1, !mul(value2, -1))

Differential Revision: https://reviews.llvm.org/D90107
2020-10-28 12:27:53 -04:00
Jay Foad 9e634bc22f [AMDGPU] Omit needless string concatenations. NFC. 2020-10-28 12:56:52 +00:00
Carl Ritson 057934a6d7 [AMDGPU] Fix insert of SIPreAllocateWWMRegs in FastRegAlloc
SIPreAllocateWWMRegs was being inserted after RegisterCoalescer
but this pass does not exist during FastAlloc so pre-allocation
pass was never being run.
Insert pre-allocation after TwoAddressInstructionPass instead.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D90236
2020-10-28 12:15:15 +09:00
Stanislav Mekhanoshin 78ae1f6c90 [AMDGPU] Change predicate for fma/fmac legacy
I do not exactly like the use of a negative predicate to
enable instructions' support. Change HasNoMadMacF32Insts
with HasFmaLegacy32.

Differential Revision: https://reviews.llvm.org/D90250
2020-10-27 12:03:52 -07:00
Michael Liao 46c3d5cb05 [amdgpu] Add the late codegen preparation pass.
Summary:
- Teach that pass to widen naturally aligned but not DWORD aligned
  sub-DWORD loads.

Reviewers: rampitec, arsenm

Subscribers:

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80364
2020-10-27 14:07:59 -04:00
Michael Liao 0d092303b4 [amdgpu] Enable use of AA during codegen.
- Add an internal option `-amdgpu-use-aa-in-codegen` to enable or
  disable this feature. By Default, it's enabled.

Differential Revision: https://reviews.llvm.org/D89320
2020-10-27 09:46:23 -04:00
Jay Foad 6539ebe97d [AMDGPU] Use DPP instead of Ext in a couple of class names. NFC. 2020-10-27 10:22:30 +00:00
Carl Ritson 7a880ab388 [AMDGPU] Move WQM Pass after MI Scheduler
Exec mask manipulation inserted by SIWholeQuadMode barriers to
instruction scheduling.  Move the entire pass after the machine
instruction scheduler and make changes so pass is correct for
non-SSA operation.  These changes should leave the pass still
usable pre-scheduler, although tests have be updated to reflect
post-scheduler results.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D88081
2020-10-27 10:25:53 +09:00
Stanislav Mekhanoshin d176e13ca5 Fixed release build after D89170 2020-10-26 16:00:57 -07:00
Stanislav Mekhanoshin 038d884a50 [AMDGPU] Use flat scratch instructions where available
The support is disabled by default. So far there is instruction
selection, spilling, and frame elimination. It also changes SP
from unswizzled to swizzled as used by flat scratch instructions,
so it cannot be mixed with MUBUF stack access.

At the very least missing:

- GlobalISel;
- Some optimizations in frame elimination in between vector
  and scalar ALU;
- It shall finally allow to always materialize frame index
  as an SGPR, but that is not implemented and frame elimination
  cannot handle it yet;
- Unaligned and/or multidword flat scratch shall work, but it
  is legalized now for MUBUF;
- Operand folding cannot optimize FI like with MUBUF yet;
- It will need scaling the value of the SP/FP in the DWARF
  expression to recover the unswizzled scratch address;

Differential Revision: https://reviews.llvm.org/D89170
2020-10-26 14:40:42 -07:00
Stanislav Mekhanoshin ad8131bb03 [AMDGPU] Fix VC warning about singed/unsigned comparison. NFC.
This is the warning reported in https://reviews.llvm.org/D89599
2020-10-26 11:55:57 -07:00
Benjamin Kramer b777d30496 [AMDGPU] Avoid unused variable warning in Release builds. NFC.
SIRegisterInfo.cpp:480:19: error: unused variable 'SOffset'
2020-10-26 18:11:57 +01:00
Jay Foad 0ca4124798 [AMDGPU] Make more use of printNamedBit in AMDGPUInstPrinter. NFC. 2020-10-26 14:03:35 +00:00
Sebastian Neubauer a094b4fa4b [AMDGPU] Emit new pal metadata by default
If no pal metadata is given, default to the msgpack format instead of
the legacy metadata. This makes tests better readable.

Differential Revision: https://reviews.llvm.org/D90035
2020-10-26 10:16:17 +01:00
Christudasan Devadasan 5a061041ec [AMDGPU] Avoid offset register in MUBUF for direct stack object accesses
We use an absolute address for stack objects and
it would be necessary to have a constant 0 for soffset field.

Fixes: SWDEV-228562

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D89234
2020-10-26 11:08:37 +05:30
dfukalov 9068c20965 [AMDGPU][CostModel] Refine cost model for half- and quarter-rate instructions.
1. Throughput and codesize costs estimations was separated and updated.
2. Updated fdiv cost estimation for different cases.
3. Added scalarization processing for types that are treated as !isSimple() to
improve codesize estimation in getArithmeticInstrCost() and
getArithmeticInstrCost(). The code was borrowed from TCK_RecipThroughput path
of base implementation.

Next step is unify scalarization part in base class that is currently works for
TCK_RecipThroughput path only.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D89973
2020-10-24 19:53:08 +03:00
Stanislav Mekhanoshin 2e64ad9494 [AMDGPU] Fixed isLegalRegOperand() with physregs
This does not change anything at the moment, but needed for
D89170. In that change I am probing a physical SGPR to see if
it is legal. RC is SReg_32, but DRC for scratch instructions
is SReg_32_XEXEC_HI and test fails.

That is sufficient just to check if DRC contains a register
here in case of physreg. Physregs also do not use subregs
so the subreg handling below is irrelevant for these.

Differential Revision: https://reviews.llvm.org/D90064
2020-10-23 11:33:34 -07:00
vpykhtin 00255f4192 [AMDGPU] Fix access beyond the end of the basic block in execMayBeModifiedBeforeAnyUse.
I was wrong in thinking that MRI.use_instructions return unique instructions and mislead Jay in his previous patch D64393.

First loop counted more instructions than it was in reality and the second loop went beyond the basic block with that counter.

I used Jay's previous code that relied on MRI.use_operands to constrain the number of instructions to check among.
modifiesRegister is inlined to reduce the number of passes over instruction operands and added assert on BB end boundary.

Differential Revision: https://reviews.llvm.org/D89386
2020-10-23 19:17:48 +03:00
Jay Foad 958130dfda [AMDGPU] Add simplification/combines for llvm.amdgcn.fma.legacy
This follows on from D89558 which added the new intrinsic and D88955
which added similar combines for llvm.amdgcn.fmul.legacy.

Differential Revision: https://reviews.llvm.org/D90028
2020-10-23 16:16:13 +01:00
Matt Arsenault 8a59d4b654 AMDGPU: Don't query for TII in TII 2020-10-23 10:34:24 -04:00
Matt Arsenault d61996473d AMDGPU: Increase branch size estimate with offset bug
This will be relaxed to insert a nop if the offset hits the bad value,
so over estimate branch instruction sizes.
2020-10-23 10:34:24 -04:00
Jay Foad 86a480e9ce [AMDGPU] Add simplification/combines for llvm.amdgcn.fmul.legacy
Differential Revision: https://reviews.llvm.org/D88955
2020-10-23 09:31:00 +01:00
Tim Corringham 3c1273d737 [AMDGPU] Add amdgpu specific loop threshold metadata
Add new loop metadata amdgpu.loop.unroll.threshold to allow the initial AMDGPU
specific unroll threshold value to be specified on a loop by loop basis.

The intention is to be able to to allow more nuanced hints, e.g. specifying a
low threshold value to indicate that a loop may be unrolled if cheap enough
rather than using the all or nothing llvm.loop.unroll.disable metadata.

Differential Revision: https://reviews.llvm.org/D84779
2020-10-22 17:21:32 +01:00
Piotr Sobczak 7ae0033ca8 [AMDGPU] Fix expansion of i16 MULH
This commit marks i16 MULH as expand in AMDGPU backend,
which is necessary after the refactoring in D80485.

Differential Revision: https://reviews.llvm.org/D89965
2020-10-22 17:05:06 +02:00
Matt Arsenault d5c0561667 AMDGPU: Fix not always reserving VGPRs used for SGPR spilling
The VGPRs used for SGPR spills need to be reserved, even if we aren't
speculatively reserving one.

This was broken by 117e5609e9.
2020-10-22 10:19:19 -04:00
Matt Arsenault d3bcfe2a36 AMDGPU: Implement getNoPreservedMask
We don't support funclets for exception handling and I hit this when
manually reducing MIR.
2020-10-22 10:17:31 -04:00
Tony 8e8cc587a5 [NFC][AMDGPU] Reorder SIMemoryLegalizer functions to be consistent
- Make the SIMemoryLegalizer insertAcquire function be in the same
  order for each target to be consistent.

Differential Revision: https://reviews.llvm.org/D89880
2020-10-22 05:39:18 +00:00
Stanislav Mekhanoshin 611959f004 [AMDGPU] Fixed v_swap_b32 match
1. Fixed liveness issue with implicit kills.
2. Fixed potential problem with an indirect mov.

Fixes: SWDEV-256848

Differential Revision: https://reviews.llvm.org/D89599
2020-10-21 10:14:24 -07:00
Joe Nash f6d7832f4c [AMDGPU] Refactor SOPC & SOPP .td for extension
We use the Real vs Pseudo instruction abstraction for other
types of instructions to facilitate changes in opcode
between gpu generations.
This patch introduces that abstraction to SOPC and SOPP.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D89738

Change-Id: I59d53c2c7058b49d05b60350f4062a9b542d3138
2020-10-21 12:35:52 -04:00
Matt Arsenault 1ed4caff1d AMDGPU: Lower the threshold reported for maximum stack size exceeded
Check the actual maximum supported stack size for a kernel.
2020-10-21 12:06:27 -04:00
Matt Arsenault 53c43431bc AMDGPU: Propagate amdgpu-flat-work-group-size attributes
Fixes being overly conservative with the register counts in called
functions. This should try to do a conservative range merge, but for
now just clone.

Also fix not being able to functionally run the pass standalone.
2020-10-21 12:06:24 -04:00
Nicholas Guy 9a2d2bedb7 Add "SkipDead" parameter to TargetInstrInfo::DefinesPredicate
Some instructions may be removable through processes such as IfConversion,
however DefinesPredicate can not be made aware of when this should be considered.
This parameter allows DefinesPredicate to distinguish these removable instructions
on a per-call basis, allowing for more fine-grained control from processes like
ifConversion.

Renames DefinesPredicate to ClobbersPredicate, to better reflect it's purpose

Differential Revision: https://reviews.llvm.org/D88494
2020-10-21 11:52:47 +01:00
Sebastian Neubauer 5290f50e44 [AMDGPU] Fix off by one in assert
D89217 did not subtract one when accessing SubRegFromChannelTable in one
place.

Differential Revision: https://reviews.llvm.org/D89804
2020-10-21 12:37:52 +02:00
Jay Foad f6a5699c6c [AMDGPU][TableGen] Make more use of !ne !not !and !or. NFC. 2020-10-21 09:56:43 +01:00
Carl Ritson 324a15cead [AMDGPU][NFC] Fix missing size in comment 2020-10-21 11:38:21 +09:00
Austin Kerbow ebdcef20ce [AMDGPU] Avoid inserting noops during scheduling
Passes that are run after the post-RA scheduler may insert instructions like
waitcnt which eliminate the need for certain noops. After this patch the
scheduler is still aware of possible latency from hazards but noops will
not be inserted until the dedicated hazard recognizer pass is run.

Depends on D89753.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D89754
2020-10-20 17:11:36 -07:00
Austin Kerbow 37d907899f [HazardRec] Allow inserting multiple wait-states simultaneously
If a target can encode multiple wait-states into a noop allow emitting such
instructions directly.

Reviewed By: rampitec, dmgreen

Differential Revision: https://reviews.llvm.org/D89753
2020-10-20 17:03:47 -07:00
Tony 1bc7bfffdb [AMDGPU] Optimize waitcnt insertion for flat memory operations
Change waitcnt insertion to check the memory operand tokens to see if
flat memory operations access VMEM in the same way it does to check if
accessing LDS. This avoids adding waitcnt for counters for address
spaces that are not accessed.

In addition, only generate the pessimistic waitcnt 0 if a flat memory
operation appears to access both VMEM and LDS.

This benefits flat memory operations that explicitly specify the
address space as GLOBAL or LOCAL.

Differential Revision: https://reviews.llvm.org/D89618
2020-10-20 22:55:12 +00:00
Paul C. Anagnostopoulos 332ff48dee [AMDGPU] [TableGen] Clean up !if(!eq(boolean, 1) and related booleans
Differential Revision: https://reviews.llvm.org/D89796
2020-10-20 16:23:15 -04:00
vnalamot 89f7ccea6f [AMDGPU] Remove getAllVGPR32() which cannot handle Accum VGPRs properly
Remove getAllVGPR32() interface and update the SGPR spill code to use
a proper method to get the relevant VGPR registers list.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D89806
2020-10-20 23:15:24 +05:30
Jay Foad 4913e3627c [AMDGPU] Remove unused declaration. NFC.
The implementation of this method was removed in D89706.
2020-10-20 16:31:42 +01:00
Michael Liao 2a0e4d1c01 [amdgpu] Enhance AMDGPU AA.
- In general, a generic point may alias to pointers in all other address
  spaces. However, for certain cases enforced by the programming model,
  we may found a generic point won't alias to pointers to local objects.
  * When a generic pointer is loaded from the constant address space, it
    could only be a pointer to the GLOBAL or CONSTANT address space.
    Thus, it won't alias to pointers to the PRIVATE or LOCAL address
    space.
  * When a generic pointer is passed as a kernel argument, it also could
    only be a pointer to the GLOBAL or CONSTANT address space. Thus, it
    also won't alias to pointers to the PRIVATE or LOCAL address space.

Differential Revision: https://reviews.llvm.org/D89525
2020-10-20 09:54:12 -04:00
Carl Ritson be2afbd019 [AMDGPU] Remove fix up operand from SI_ELSE
Remove immediate operand from SI_ELSE which indicates if EXEC has
been modified.  Instead always emit code that handles EXEC and
remove unnecessary instructions during pre-RA optimisation.

This facilitates passes (i.e. SIWholeQuadMode) adding exec mask
manipulation post control flow lowering, and pre control flow
lower passes do not need to be aware of SI_ELSE handling.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D89644
2020-10-20 19:15:21 +09:00
Carl Ritson 6aabbeadae [AMDGPU][NFC] Tidy SIOptimizeExecMaskingPreRA for extensibility
Remove duplicate code and move things around to make it easier to
add additional optimisations to the pass.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D89619
2020-10-20 17:22:43 +09:00
Stanislav Mekhanoshin 6ddadf9901 [AMDGPU] flat scratch ST addressing mode on gfx10
GFX10 enables third addressing mode for flat scratch instructions,
an ST mode. In that mode both register operands are omitted and
only swizzled offset is used in addition to flat_scratch base.

Differential Revision: https://reviews.llvm.org/D89501
2020-10-19 15:29:52 -07:00
Tony 6be9c7d2dc [AMDGPU] Correct comment typo in SIMemoryLegaliizer.cpp 2020-10-19 18:50:28 +00:00
Jay Foad 56f6bf1a8d [AMDGPU] Remove MUL_LOHI_U24/MUL_LOHI_I24
These were introduced in r279902 on the grounds that using separate
MUL_U24/MUL_I24 and MULHI_U24/MULHI_I24 nodes would introduce multiple
uses of the operands, which would prevent SimplifyDemandedBits from
simplifying the operands.

This has since been fixed by D24672 "AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operations"

No functional change intended. At least it has no effect on lit tests.

Differential Revision: https://reviews.llvm.org/D89706
2020-10-19 19:15:34 +01:00
Tony 151e297034 [AMDGPU] Simplify cumode handling in SIMemoryLegalizer
Differential Revision: https://reviews.llvm.org/D89663
2020-10-19 17:13:45 +00:00
Piotr Sobczak c872faf6e0 [AMDGPU] Do not generate S_CMP_LG_U64 on gfx7
S_CMP_LG_U64 was added in gfx8 and is guarded by hasScalarCompareEq64().

Rewrite S_CMP_LG_U64 to S_OR_B32 + S_CMP_LG_U32 for targets that
do not support 64-bit scalar compare.

Differential Revision: https://reviews.llvm.org/D89536
2020-10-19 14:44:31 +02:00
Austin Kerbow 978fbd8268 [AMDGPU] Run hazard recognizer pass later
If instructions were removed in peephole passes after the hazard recognizer was
run it is possible that new hazards could be introduced.

Fixes: SWDEV-253090

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D89077
2020-10-16 12:15:51 -07:00
Jay Foad 1417abe54c [AMDGPU] Add new llvm.amdgcn.fma.legacy intrinsic
Differential Revision: https://reviews.llvm.org/D89558
2020-10-16 17:10:21 +01:00
Matt Arsenault ce16b6835b AMDGPU: Don't kill super-register with overlapping copy
This would end up killing part of the result super-register, resulting
in a verifier error on a later use of the overlapping registers.  We
could add kills of any non-aliasing registers, but we should be moving
away from relying on kill flags.
2020-10-16 09:34:35 -04:00
Sebastian Neubauer dd3f7a494a [AMDGPU] Add a message to an assert 2020-10-16 13:03:20 +02:00
Tony e2af9bd611 [AMDGPU] Correct comment typo in AMDGPUSubtarget.h 2020-10-16 08:49:02 +00:00
alex-t 42ed388120 [AMDGPU] SILowerControlFlow::removeMBBifRedundant should not try to change MBB layout if it can fallthrough
removeMBBifRedundant normally tries to keep predecessors fallthrough when removing redundant MBB.
         It has to change MBBs layout to keep the new successor to immediately follow the predecessor of removed MBB.
         It only may be allowed in case the new successor itself has no successors to which it fall through.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D89397
2020-10-15 23:20:54 +03:00
Stanislav Mekhanoshin d1beb95d12 [AMDGPU] gfx1032 target
Differential Revision: https://reviews.llvm.org/D89487
2020-10-15 12:41:18 -07:00
Matt Arsenault 663f16684d AMDGPU: Fix verifier error on killed spill of partially undef register
This does unfortunately end up with extra waitcnts getting inserted
that were avoided before. Ideally we would avoid the spills of these
undef components in the first place.
2020-10-15 09:45:44 -04:00
Carl Ritson b70cb50204 [AMDGPU] Minimize number of s_mov generated by copyPhysReg
Generate the minimal set of s_mov instructions required when
expanding a SGPR copy operation in copyPhysReg.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D89187
2020-10-15 22:35:02 +09:00
David Sherwood 47f2dc7e5f [SVE][NFC] Replace some TypeSize comparisons in non-AArch64 Targets
In most of lib/Target we know that we are not dealing with scalable
types so it's perfectly fine to replace TypeSize comparison operators
with their fixed width equivalents, making use of getFixedSize()
and so on.

Differential Revision: https://reviews.llvm.org/D89101
2020-10-15 09:01:21 +01:00
Tony b3a38bc2dc [AMDGPU] Correct typos in SIMemoryLegalizer.cpp comments 2020-10-15 02:07:56 +00:00
Konstantin Zhuravlyov 3fdf3b1539 AMDGPU: Update AMDHSA code object version handling
Differential Revision: https://reviews.llvm.org/D89076
2020-10-14 13:04:27 -04:00
Carl Ritson 01549dd976 [AMDGPU] Base getSubRegFromChannel on TableGen data
Generate (at runtime) the table used to drive getSubRegFromChannel,
base on AMDGPUSubRegIdxRanges from TableGen data.
The is a step closer to it being staticly generated by TableGen and
allows getSubRegFromChannel handle all bitwidths in the mean time.

Reviewed By: rampitec, arsenm, foad

Differential Revision: https://reviews.llvm.org/D89217
2020-10-14 20:25:09 +09:00
Tony 907d799070 [AMDGPU] Cleanup memory legalizer interfaces
- Rename interfaces to be in terms of acquire and release.
- Improve comments.

Differential Revision: https://reviews.llvm.org/D89355
2020-10-14 06:07:51 +00:00
Jay Foad edc37baca6 [AMDGPU] Add MC layer support for v_fmac_legacy_f32
This instruction was introduced in GFX10.3, reusing the opcode of
v_mac_legacy_f32 from GFX10.1.

Differential Revision: https://reviews.llvm.org/D89247
2020-10-13 21:57:33 +01:00
Jay Foad b59d8d7c72 [AMDGPU][GlobalISel] Compute known bits for zero-extending loads
Implement computeKnownBitsForTargetInstr for G_AMDGPU_BUFFER_LOAD_UBYTE
and G_AMDGPU_BUFFER_LOAD_USHORT. This allows generic combines to remove
some unnecessary G_ANDs.

Differential Revision: https://reviews.llvm.org/D89316
2020-10-13 16:22:00 +01:00
Jay Foad cdf0214845 [AMDGPU] v_mac_legacy_f32 does not support DPP
Differential Revision: https://reviews.llvm.org/D89245
2020-10-13 10:03:00 +01:00
Ruiling Song b215a26628 [AMDGPU] Update LiveVariables in convertToThreeAddress()
This can fix an asan failure like below.
==15856==ERROR: AddressSanitizer: use-after-poison on address ...
READ of size 8 at 0x6210001a3cb0 thread T0
    #0 llvm::MachineInstr::getParent()
    #1 llvm::LiveVariables::VarInfo::findKill()
    #2 TwoAddressInstructionPass::rescheduleMIBelowKill()
    #3 TwoAddressInstructionPass::tryInstructionTransform()
    #4 TwoAddressInstructionPass::runOnMachineFunction()

We need to update the Kills if we replace instructions. The Kills
may be later accessed within TwoAddressInstruction pass.

Differential Revision: https://reviews.llvm.org/D89092
2020-10-13 08:12:20 +08:00
Mircea Trofin 43d347995c [NFC][MC] Use MCRegister in LiveRangeMatrix
The change starts from LiveRangeMatrix and also checks the users of the
APIs are typed accordingly.

Differential Revision: https://reviews.llvm.org/D89145
2020-10-12 08:54:36 -07:00
Sebastian Neubauer 7f2a641aad [AMDGPU] Insert waterfall loops for divergent calls
Extend loadSRsrcFromVGPR to allow moving a range of instructions into
the loop. The call instruction is surrounded by copies into physical
registers which should be part of the waterfall loop.

Differential Revision: https://reviews.llvm.org/D88291
2020-10-12 17:16:11 +02:00
Tim Renouf 666ef0db20 [AMDGPU] Add gfx602, gfx705, gfx805 targets
At AMD, in an internal audit of our code, we found some corner cases
where we were not quite differentiating targets enough for some old
hardware. This commit is part of fixing that by adding three new
targets:

* The "Oland" and "Hainan" variants of gfx601 are now split out into
  gfx602. LLPC (in the GPUOpen driver) and other front-ends could use
  that to avoid using the shaderZExport workaround on gfx602.

* One variant of gfx703 is now split out into gfx705. LLPC and other
  front-ends could use that to avoid using the
  shaderSpiCsRegAllocFragmentation workaround on gfx705.

* The "TongaPro" variant of gfx802 is now split out into gfx805.
  TongaPro has a faster 64-bit shift than its former friends in gfx802,
  and a subtarget feature could be set up for that to take advantage of
  it. This commit does not make that change; it just adds the target.

V2: Add clang changes. Put TargetParser list in order.
V3: AMDGCNGPUs table in TargetParser.cpp needs to be in GPUKind order,
    so fix the GPUKind order.

Differential Revision: https://reviews.llvm.org/D88916

Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
2020-10-10 17:22:22 +01:00
Jay Foad 1dfbc2ea14 [AMDGPU] Only enable mad/mac legacy f32 patterns if denormals may be flushed
Following on from D88890, this makes the newly added patterns
conditional on NoFP32Denormals. mad/mac f32 instructions always flush
denormals regardless of the MODE register setting, and I believe the
legacy variants do the same.

Differential Revision: https://reviews.llvm.org/D89123
2020-10-09 17:08:38 +01:00
Fangrui Song 2c4c2dc2d9 [MCRegister] Simplify isStackSlot & isPhysicalRegister and delete isPhysical. NFC 2020-10-08 22:08:33 -07:00
Austin Kerbow a4f35ab232 [AMDGPU] Fix mai hazard VALU to LD/ST
Fixes: SWDEV-251863

Differential Revision: https://reviews.llvm.org/D89079
2020-10-08 17:13:02 -07:00
Jay Foad 7238faa4ae [AMDGPU] Add patterns for mad/mac legacy f32 instructions
Note that all subtargets up to GFX10.1 have v_mad_legacy_f32, but GFX8/9
lack v_mac_legacy_f32. GFX10.3 has no mad/mac f32 instructions at all.

Differential Revision: https://reviews.llvm.org/D88890
2020-10-08 15:20:06 +01:00
Sebastian Neubauer f53b43c00a [AMDGPU] Use isLegalMUBUFImmOffset more
Instead of hardcoding isUInt<12>.

Differential Revision: https://reviews.llvm.org/D88961
2020-10-08 14:31:44 +02:00
Dmitry Preobrazhensky 1e75668821 [AMDGPU][MC][GFX1030] Disabled v_mac_f32
See bug 47741 <https://bugs.llvm.org/show_bug.cgi?id=47741>

Reviewers: nhaehnle, rampitec

Differential Revision: https://reviews.llvm.org/D89000
2020-10-08 14:00:52 +03:00
Mirko Brkusanin 7c88d13fd1 [AMDGPU] Prefer SplitVectorLoad/Store over expandUnalignedLoad/Store
ExpandUnalignedLoad/Store can sometimes produce unnecessary copies to
temporary stack slot. We should prefer splitting vectors if possible.

Differential Revision: https://reviews.llvm.org/D88882
2020-10-08 10:17:15 +02:00
Stanislav Mekhanoshin 45014ce36f [AMDGPU] Add tied operand to d16 scratch loads
This is still no-op because there is no selection for these
opcodes.

Differential Revision: https://reviews.llvm.org/D88927
2020-10-07 11:13:01 -07:00
Stanislav Mekhanoshin 7361ce73ef [AMDGPU] Use default zero flag operands in flat scratch
This is no-op so far because we do not select these yet.

Differential Revision: https://reviews.llvm.org/D88920
2020-10-07 10:56:47 -07:00
Ronak Chauhan 528057c197 [AMDGPU] Support disassembly for AMDGPU kernel descriptors
Decode AMDGPU Kernel descriptors as assembler directives.

Reviewed By: scott.linder, jhenderson, kzhuravl

Differential Revision: https://reviews.llvm.org/D80713
2020-10-07 20:39:43 +05:30
Dmitry Preobrazhensky 4a7e7620d6 [AMDGPU][MC] Improved diagnostics for instructions with missing features
Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D88887
2020-10-07 16:31:29 +03:00
Rodrigo Dominguez f71f5f39f6 [AMDGPU] Implement hardware bug workaround for image instructions
Summary:
This implements a workaround for a hardware bug in gfx8 and gfx9,
where register usage is not estimated correctly for image_store and
image_gather4 instructions when D16 is used.

Change-Id: I4e30744da6796acac53a9b5ad37ac1c2035c8899

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81172
2020-10-07 07:39:52 -04:00
Scott Linder e4a9e4ef55 [AMDGPU] Emit correct kernel descriptor on big-endian hosts
Previously we wrote multi-byte values out as-is from host memory. Use
the `emitIntN` helpers in `MCStreamer` to produce a valid descriptor
irrespective of the host endianness.

Reviewed By: arsenm, rochauha

Differential Revision: https://reviews.llvm.org/D88858
2020-10-06 17:29:38 +00:00
Stanislav Mekhanoshin acce6b6082 [AMDGPU] Create isGFX9Plus utility function
Introduce a utility function to make it more
convenient to write code that is the same on
the GFX9 and GFX10 subtargets.

Use isGFX9Plus in the AsmParser for AMDGPU.

Authored By: Joe_Nash

Differential Revision: https://reviews.llvm.org/D88908
2020-10-06 10:18:43 -07:00
Sebastian Neubauer b4264210f2 [AMDGPU] Remove SIInstrInfo::calculateLDSSpillAddress
This function does not seem to be used anymore.

Differential Revision: https://reviews.llvm.org/D88904
2020-10-06 18:45:22 +02:00
Dmitry Preobrazhensky e2452f57fa [AMDGPU][MC] Added detection of unsupported instructions
Implemented identification of unsupported instructions; improved errors reporting.

See bug 42590.

Reviewers: rampitec

Differential Revision: https://reviews.llvm.org/D88211
2020-10-06 16:44:27 +03:00
Sebastian Neubauer 9fc535f987 [AMDGPU] Fix gcc warnings
uint8_t types are implicitly promoted to int, leading to a
unsigned-signed comparison.

Thanks for the heads-up @uabelho.

Differential Revision: https://reviews.llvm.org/D88876
2020-10-06 10:55:08 +02:00
Carl Ritson c3e07a0018 [AMDGPU] SIInsertSkips: Refactor early exit block creation
Refactor exit block creation to a single call ensureEarlyExitBlock.
Add support for generating an early exit block which clears the
exec mask, but only add this instruction when required.
These changes are to facilitate adding more forms of early
termination for PS shaders in the near future.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D88775
2020-10-06 09:44:55 +09:00
Sebastian Neubauer 6a089ce0e4 [AMDGPU] Use tablegen for argument indices
Use tablegen generic tables to get the index of image intrinsic
arguments.
Before, the computation of which image intrinsic argument is at which
index was scattered in a few places, tablegen, the SDag instruction
selection and GlobalISel. This patch changes that, so only tablegen
contains code to compute indices and the ImageDimIntrinsicInfo table
provides these information.

Differential Revision: https://reviews.llvm.org/D86270
2020-10-05 11:50:52 +02:00
Jay Foad 16778b19f2 [AMDGPU] Make bfe patterns divergence-aware
This tends to increase code size but more importantly it reduces vgpr
usage, and could avoid costly readfirstlanes if the result needs to be
in an sgpr.

Differential Revision: https://reviews.llvm.org/D88580
2020-10-05 09:55:10 +01:00
Jay Foad 0d5989bb24 [AMDGPU] Split R600 and GCN bfe patterns
This is in preparation for making the GCN patterns divergence-aware.
NFC.

Differential Revision: https://reviews.llvm.org/D88579
2020-10-05 09:55:10 +01:00
Roman Lebedev 03bd5198b6
[OldPM] Pass manager: run SROA after (simple) loop unrolling
I have stumbled into this pretty accidentally, when rewriting
some spaghetti-like code into something more structured,
which involved using some `std::array<>`s. And to my surprise,
the `alloca`s remained, causing about `+160%` perf regression.

https://llvm-compile-time-tracker.com/compare.php?from=bb6f4d32aac3eecb51909f4facc625219307ee68&to=d563e66f40f9d4d145cb2050e41cb961e2b37785&stat=instructions
suggests that this has geomean compile-time cost of `+0.08%`.

Note that D68593 / cecc0d27ad
already did this chage for NewPM, but left OldPM in a pessimized state.

This fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40011 | PR40011 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=42794 | PR42794 ]] and probably some other reports.

Reviewed By: nikic, xbolva00

Differential Revision: https://reviews.llvm.org/D87972
2020-10-04 11:53:50 +03:00
Tres Popp bfd7ee92cc Handle unused variable without asserts 2020-10-02 10:22:55 +02:00
Carl Ritson 2ef9d21e1a [AMDGPU] SIInsertSkips: Tidy block splitting to use splitAt
Convert to use new MachineBasicBlock splitAt function.
Place code in splitBlock function for reuse in future changes.
Should yield no functional change.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D88537
2020-10-02 11:10:55 +09:00
Stanislav Mekhanoshin caeb13aba8 [AMDGPU] Allow SOP asm mnemonic to differ
Allows the creation of real SOP1 instructions with
assembler mnemonics that differ from their
pseudo-instruction mnemonics. The default behavior
keeps the mnemonics matching.

Corrects a subtarget label typo in a comment.

Authored By: Joe_Nash

Differential Revision: https://reviews.llvm.org/D88708
2020-10-01 16:00:04 -07:00
Jay Foad e20f459229 [AMDGPU] Simplify getNumFlatOffsetBits. NFC.
Remove some checks that have already been done in the only caller.
2020-10-01 15:24:09 +01:00
Jay Foad 866d9b03f2 [AMDGPU] Tiny cleanup in isLegalFLATOffset. NFC. 2020-10-01 14:26:03 +01:00
Stanislav Mekhanoshin 722d792499 [AMDGPU] Reorganize VOP3P encoding
This changes width of encoding and opcode fields to match the
documentation.

Differential Revision: https://reviews.llvm.org/D88619
2020-09-30 15:27:06 -07:00
Mirko Brkusanin 0249df33fe [AMDGPU] Do not generate mul with 1 in AMDGPU Atomic Optimizer
Check if operand of mul is constant value of one for certain atomic
instructions in order to avoid making unnecessary instructions when
-amdgpu-atomic-optimizer is present.

Differential Revision: https://reviews.llvm.org/D88315
2020-09-30 11:09:18 +02:00
Stanislav Mekhanoshin 61b3106965 [AMDGPU] Remove SIEncodingFamily.GFX10_B
It turns to be not needed anymore.

Differential Revision: https://reviews.llvm.org/D88520
2020-09-29 15:59:49 -07:00
Mirko Brkusanin 8b08fa0103 Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access"
This reverts commit f5cd7ec9f3.

Certain rocPRIM/rocThrust/hipCUB tests were failing because of this change.
2020-09-29 15:33:34 +02:00
Jay Foad 0e0a0c8d2c [AMDGPU] Reformat AMDGPUTargetLowering::isSDNodeAlwaysUniform. NFC. 2020-09-28 16:24:16 +01:00
Jay Foad d3a8e333ec [AMDGPU] Reformat SITargetLowering::isSDNodeSourceOfDivergence. NFC. 2020-09-28 14:42:05 +01:00
Jay Foad bab1a17ad7 [AMDGPU] Add bfi immediate pattern
Differential Revision: https://reviews.llvm.org/D88246
2020-09-28 10:16:51 +01:00
Jay Foad 2806f586dc [AMDGPU] Make bfi patterns divergence-aware
This tends to increase code size but more importantly it reduces vgpr
usage, and could avoid costly readfirstlanes if the result needs to be
in an sgpr.

Differential Revision: https://reviews.llvm.org/D88245
2020-09-28 10:16:51 +01:00
Jay Foad 286d3fc750 [AMDGPU] Split R600 and GCN bfi patterns
This is in preparation for making the GCN patterns divergence-aware.
NFC.

Differential Revision: https://reviews.llvm.org/D88244
2020-09-28 10:16:51 +01:00
Fangrui Song 20e9c36c01 Internalize functions from various tools. NFC
And internalize some classes if I noticed them:)
2020-09-26 15:57:13 -07:00
Jay Foad f11f382523 [AMDGPU] Fix declaration parameter names to match definition
This fixes the declaration of AMDGPULegalizerInfo::legalizeBufferLoad to
match the definition. It is still confusing that that parameter order is
different from legalizeBufferStore.

https://bugs.llvm.org/show_bug.cgi?id=47535
2020-09-25 11:38:17 +01:00
Stanislav Mekhanoshin 27a62f6317 [AMDGPU] global-isel support for RT
Differential Revision: https://reviews.llvm.org/D87847
2020-09-24 10:29:45 -07:00
Jay Foad c05cf1ca3c [AMDGPU] Use cast instead of dyn_cast 2020-09-24 15:20:49 +01:00
Sebastian Neubauer 6f7cd16d29 [AMDGPU] Fix v3f16 handling for getresinfo
v3f32 should not be expanded to v4f32. getresinfo with a dmask of 7
created an image sample with a v3f32 return value, which was bitcasted
to a v4f32 in constructRetValue.

Differential Revision: https://reviews.llvm.org/D88206
2020-09-24 16:03:02 +02:00
Pushpinder Singh 41d6669f1f [GlobalISel][AMDGPU] Lower G_SMULH/G_UMULH
Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D85653
2020-09-23 22:25:29 -04:00
Carl Ritson 1e0500d4f7 [AMDGPU] Consider all SGPR uses as unique in constant bus verify
Fix the verifier so that overlapping SGPR operands are counted
independently.  We cannot assume that overlapping SGPR accesses
only count as a single constant bus use.
The exception is implicit uses which do not add to constant bus
usage (only) when overlapping.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D87748
2020-09-24 10:52:40 +09:00
Sebastian Neubauer a343b9b032 Revert "[AMDGPU] Insert waitcnt after returning from call"
This reverts commit ca907bfb57.

According to michel.daenzer,
> This completely broke the Mesa radeonsi driver on Navi 14. Xorg +
> xterm come up with major corruption & psychedelic colours.
2020-09-23 17:16:39 +02:00
Matt Arsenault af0207f2ba AMDGPU: Check global FP atomics match default FP mode
We would always select global FP atomics from atomicrmw fadd, although
they have a hardcoded FP mode.
2020-09-23 09:07:50 -04:00
Sebastian Neubauer ca907bfb57 [AMDGPU] Insert waitcnt after returning from call
When memory operations are outstanding on function calls, either the
caller or the callee can insert a waitcnt to ensure that all reads are
finished.
Calls need some time to be executed, so if the callee inserts the
waitcnt, filling the instruction buffer and waiting for memory will be
interleaved, hiding some latency. This comes at the cost of having a
waitcnt inside functions that may not be needed as no memory operations
are outstanding.

For function calls, this is already implemented. The same principal
applies to returns: If the caller inserts a waitcnt after the call, the
callee does not have to wait and the return and memory operation can be
run in parallel.

This commit implements waiting in the caller after returning from a
function call.

Differential Revision: https://reviews.llvm.org/D87674
2020-09-23 12:17:59 +02:00
Piotr Sobczak 8d7fd73c3a [AMDGPU] Fix merging m0 inits
Fix incorrect merges of m0 inits in loops.

It was assumed that if a clobbering instruction appears in
the same block as an init and the clobbering instruction
does not dominate the init then it does not interfere with
init.

This does not work in the presence of loops, where in this
scenario, the clobbering instruction does interfere with
the init in another iteration.

To fix this, do not check for block equality and defer the
decision to the predecessor check.

Differential Revision: https://reviews.llvm.org/D87882
2020-09-23 09:13:43 +02:00
Fangrui Song 49f2744931 Change LoopInfo::empty to isInnermost after D82895 2020-09-22 14:07:40 -07:00
Jay Foad 892ef2e3c0 [AMDGPU] More codegen patterns for v2i16/v2f16 build_vector
It's simpler to do this at codegen time than to do ad-hoc constant
folding of machine instructions in SIFoldOperands.

Differential Revision: https://reviews.llvm.org/D88028
2020-09-22 10:41:38 +01:00
Matt Arsenault 6daddc213f AMDGPU: Don't add frame register to frame pseudos
We no longer treat the frame register like a function argument, so the
problem this avoided is no longer relevant.
2020-09-21 16:18:47 -04:00
Alexander Belyaev 17dc729bd4 Revert "[NFC][ScheduleDAG] Remove unused EntrySU SUnit"
This reverts commit 0345d88de6.

Google internal backend uses EntrySU, we are looking into removing
dependency on it.

Differential Revision: https://reviews.llvm.org/D88018
2020-09-21 13:33:05 +02:00
Matt Arsenault 3105d0f84b CodeGen: Move split block utility to MachineBasicBlock
AMDGPU needs this in several places, so consolidate them here.
2020-09-18 14:05:18 -04:00
Matt Arsenault 0576f436e5 AMDGPU: Don't sometimes allow instructions before lowered si_end_cf
Since 6524a7a2b9, this would sometimes
not emit the or to exec at the beginning of the block, where it really
has to be. If there is an instruction that defines one of the source
operands, split the block and turn the si_end_cf into a terminator.

This avoids regressions when regalloc fast is switched to inserting
reloads at the beginning of the block, instead of spills at the end of
the block.

In a future change, this should always split the block.
2020-09-18 13:43:01 -04:00
Francis Visoiu Mistrih 0345d88de6 [NFC][ScheduleDAG] Remove unused EntrySU SUnit
EntrySU doesn't seem to be used at all when building the ScheduleDAG.

Differential Revision: https://reviews.llvm.org/D87867
2020-09-18 09:50:47 -07:00
Matt Arsenault 27df165270 Revert "[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel."
This reverts commit c3492a1aa1.

I think this is the wrong strategy and wrong place to do this
transform anyway. Also reverts follow up commit
7d593d0d69.
2020-09-18 09:48:33 -04:00
Mirko Brkusanin ae36c02ad0 [AMDGPU] Set DS alignment requirements to be more strict
Alignment requirements for ds_read/write_b96/b128 for gfx9 and onward are
now the same as for other GCN subtargets. This way we can avoid any
unintentional use of these instructions on systems that do not support dword
alignment and instead require natural alignment.
This also makes 'SH_MEM_CONFIG.alignment_mode == STRICT' the default.

Differential Revision: https://reviews.llvm.org/D87821
2020-09-18 15:26:24 +02:00
Bogdan Graur 7d593d0d69 [amdgpu] Compilation fix for Release
Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D87838
2020-09-17 18:04:53 +02:00
Michael Liao c3492a1aa1 [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel.
- Need to lower COPY from SGPR to VGPR to a real instruction as the
  standard COPY is used where the source and destination are from the
  same register bank so that we potentially coalesc them together and
  save one COPY. Considering that, backend optimizations, such as CSE,
  won't handle them. However, the copy from SGPR to VGPR always needs
  materializing to a native instruction, it should be lowered into a
  real one before other backend optimizations.

Differential Revision: https://reviews.llvm.org/D87556
2020-09-17 11:04:17 -04:00
alex-t 0efbb70b71 [AMDGPU] should expand ROTL i16 to shifts.
Instruction combining pass turns library rotl implementation to llvm.fshl.i16.
In the selection dag the intrinsic is turned to ISD::ROTL node that cannot be selected.
Need to expand it to shifts again.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D87618
2020-09-17 17:34:33 +03:00
Simon Pilgrim f026812110 InstCombiner.h - remove unnecessary KnownBits.h include. NFCI.
Move the include down to cpp files with an implicit dependency.
2020-09-17 14:28:42 +01:00
Simon Pilgrim 8adf92e2d1 [AMDGPU] Remove orphan SITargetLowering::LowerINT_TO_FP declaration. NFCI.
Method implementation no longer exists.
2020-09-17 10:45:53 +01:00
Stanislav Mekhanoshin 91f503c3af [AMDGPU] gfx1030 RT support
Differential Revision: https://reviews.llvm.org/D87782
2020-09-16 11:40:58 -07:00
Matt Arsenault 367248956e AMDGPU: Clear offset register when using local stack area
eliminateFrameIndex won't fix up the offset register when the direct
frame index reference is moved to a separate move instruction. Switch
the offset to a base 0 (which it probably should be to begin with).
2020-09-16 12:56:40 -04:00
Jay Foad cb64455faa [AMDGPU] Remove obsolete comment
Obsoleted by e4464bf3d4 "AMDGPU/GlobalISel: Select scalar v2s16 G_BUILD_VECTOR"
2020-09-16 17:03:55 +01:00
Dmitry Preobrazhensky 06d058afec [AMDGPU] Corrected directive to use for ELF weak refs
WeakRefDirective should specify a directive to declare "a global as being a weak undefined symbol".
The directive used by AMDGPU was incorrect - ".weakref" was intended for other purposes.
The correct directive is ".weak" and it is already defined as default for ELF.
So the redefinition was removed.

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D87762
2020-09-16 18:51:26 +03:00
Mircea Trofin 6e85c3d5c7 [NFC][Regalloc] accessors for 'reg' and 'weight'
Also renamed the fields to follow style guidelines.

Accessors help with readability - weight mutation, in particular,
is easier to follow this way.

Differential Revision: https://reviews.llvm.org/D87725
2020-09-16 08:28:57 -07:00
Matt Arsenault 71131db689 AMDGPU: Improve <2 x i24> arguments and return value handling
This was asserting for GlobalISel. For SelectionDAG, this was
passing this on the stack. Instead, scalarize this as if it were a
32-bit vector.
2020-09-16 11:21:56 -04:00
Sebastian Neubauer 833b3b0d3a [AMDGPU] Add v3f16/v3i16 support to SDag
Fix lowering and instruction selection for v3x16 types
and enable InstCombine to emit them.

This patch only implements it for the selection dag.
GlobalISel tests in GlobalISel/llvm.amdgcn.image.load.1d.d16.ll and
GlobalISel/llvm.amdgcn.image.store.2d.d16.ll still don't work.

Differential Revision: https://reviews.llvm.org/D84420
2020-09-16 17:20:27 +02:00
Jay Foad 90777e2924 [AMDGPU] Enable scheduling around FP MODE-setting instructions
Pre-gfx10 all MODE-setting instructions were S_SETREG_B32 which is
marked as having unmodeled side effects, which makes the machine
scheduler treat it as a barrier. Now that we have proper implicit $mode
operands we can use a no-side-effects S_SETREG_B32_mode pseudo instead
for setregs that only touch the FP MODE bits, to give the scheduler more
freedom.

Differential Revision: https://reviews.llvm.org/D87446
2020-09-16 16:10:47 +01:00
Stanislav Mekhanoshin 277de43d88 [AMDGPU] Unify intrinsic ret/nortn interface
We have a single noret intrinsic an a lot of special handling
around it. Declare it just as any other but do not define rtn
instructions itself instead.

Differential Revision: https://reviews.llvm.org/D87719
2020-09-15 15:26:42 -07:00
Simon Pilgrim 97a23ab28a AMDGPUPrintfRuntimeBinding.cpp - drop unnecessary casts/dyn_casts. NFCI.
GetElementPtrInst::Create returns a GetElementPtrInst* so we don't need to cast. Similarly IntegerType inherits from the Type base class.

Also, I've used auto* in a few places to cleanup the code.

Helps fix some clang-tidy warnings which saw the dyn_casts and warned that these can return null.
2020-09-15 14:49:04 +01:00
Craig Topper c193a689b4 [SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore.
The versions that take 'unsigned' will be removed in the future.

I tried to use getOriginalAlign instead of getAlign in some
places. getAlign factors in the minimum alignment implied by
the offset in the pointer info. Since we're also passing the
pointer info we can use the original alignment.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D87592
2020-09-14 13:54:50 -07:00
Austin Kerbow f859c30ecb [AMDGPU] Add XDL resource to scheduling model
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D87621
2020-09-14 13:48:54 -07:00
Eric Astor 20201dc76a [ms] [llvm-ml] Add support for size queries in MASM
Add support for size inference, sizeof, typeof, and lengthof.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D86947
2020-09-14 14:27:06 -04:00
Jay Foad c799f873cb [AMDGPU] Don't cluster stores
Clustering loads has caching benefits, but as far as I know there is no
advantage to clustering stores on any AMDGPU subtargets.

The disadvantage is that it tends to increase register pressure and
restricts scheduling freedom.

Differential Revision: https://reviews.llvm.org/D85530
2020-09-14 13:40:17 +01:00
Petar Avramovic 6e2a86ed5a AMDGPU/GlobalISel Check for NoNaNsFPMath in isKnownNeverSNaN
Check for NoNaNsFPMath function attribute in isKnownNeverSNaN.
Function attributes are in held in 'TargetMachine.Options'.
Among other things, this allows selection of some patterns imported
in D87351 since G_FCANONICALIZE is not generated when isKnownNeverSNaN
returns true in lowerFMinNumMaxNum.

However we notice some incorrect results since function attributes are
not correctly written in TargetMachine.Options when next function is
processed. Take a look at @v_test_no_global_nnans_med3_f32_pat0_srcmod0,
it has "no-nans-fp-math"="false" but TargetMachine.Options still has it
set to true since first function in test file had this attribute set to
true. This will be fixed in D87511.

Differential Revision: https://reviews.llvm.org/D87456
2020-09-14 12:11:00 +02:00
Petar Avramovic 09b8871f8d AMDGPU/GlobalISel/Emitter Support for predicate code that uses operands
Predicates with 'let PredicateCodeUsesOperands = 1' want to examine
matched operands. When we encounter predicate code that uses operands,
analyze its named operand arguments and create a map between argument
index and name. Later, when leaf node with name is encountered, emit
GIM_RecordNamedOperand that will store that operand at its argument
index in operand list. This operand list will be an argument to c++
code of the predicate.

Differential Revision: https://reviews.llvm.org/D87285
2020-09-14 10:39:56 +02:00
Matt Arsenault e15215e041 AMDGPU: Hoist check for VGPRs 2020-09-09 19:45:40 -04:00
Matt Arsenault 85490874b2 AMDGPU: Skip all meta instructions in hazard recognizer
This was not adding a necessary nop due to thinking the kill counted.
2020-09-09 19:45:40 -04:00
Matt Arsenault 82cbc9330a AMDGPU: Fix inserting waitcnts before kill uses 2020-09-09 19:45:40 -04:00
dfukalov c259d3a061 [AMDGPU] Fix for folding v2.16 literals.
It was found some packed immediate operands (e.g. `<half 1.0, half 2.0>`) are
incorrectly processed so one of two packed values were lost.

Introduced new function to check immediate 32-bit operand can be folded.
Converted condition about current op_sel flags value to fall-through.

Fixes: SWDEV-247595

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D87158
2020-09-10 01:39:25 +03:00
Amara Emerson e5784ef8f6 [GlobalISel] Enable usage of BranchProbabilityInfo in IRTranslator.
We weren't using this before, so none of the MachineFunction CFG edges had the
branch probability information added. As a result, block placement later in the
pipeline was flying blind.

This is enabled only with optimizations enabled like SelectionDAG.

Differential Revision: https://reviews.llvm.org/D86824
2020-09-09 14:31:12 -07:00
Amara Emerson cc76da7ada [GlobalISel] Rewrite the elide-br-by-swapping-icmp-ops combine to do less.
This combine previously tried to take sequences like:
  %cond = G_ICMP pred, a, b
  G_BRCOND %cond, %truebb
  G_BR %falsebb
%truebb:
  ...
%falsebb:
  ...

and by inverting the compare predicate and swapping branch targets, delete the
G_BR and instead have a single conditional branch to the falsebb. Since in an
earlier patch we have a combine to fold not(icmp) into just an inverted icmp,
we don't need this combine to do as much. This patch instead generalizes the
combine by just looking for:
  G_BRCOND %cond, %truebb
  G_BR %falsebb
%truebb:
  ...
%falsebb:
  ...

and then inverting the condition using a not (xor). The xor can be folded away
in a separate combine. This change also lets us avoid some optimization code
in the IRTranslator.

I also think that deleting G_BRs in the combiner is unnecessary. That's
something that targets can decide to do at selection time and could simplify
generic code in future.

Differential Revision: https://reviews.llvm.org/D86664
2020-09-09 13:08:16 -07:00
Jay Foad 649bde488c [AMDGPU] Simplify S_SETREG_B32 case in EmitInstrWithCustomInserter
NFC.
2020-09-09 15:18:31 +01:00
Dmitry Preobrazhensky 95b7040e43 [AMDGPU][MC] Improved diagnostic messages for invalid registers
Corrected parser to issue meaningful error messages for invalid and malformed registers.

See bug 41303: https://bugs.llvm.org/show_bug.cgi?id=41303

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D87234
2020-09-09 16:44:03 +03:00
Ronak Chauhan f078577f31 Revert "[AMDGPU] Support disassembly for AMDGPU kernel descriptors"
This reverts commit 487a805310.

Tests fail on big endian machines.
2020-09-09 18:01:28 +05:30
Mirko Brkusanin 43af2a6faa [AMDGPU] Workaround for LDS Misalignment bug on GFX10
Add subtarget feature check to avoid using ds_read/write_b96/128 with too
low alignment if a bug is present on that specific hardware.
Add this "feature" to GFX 10.1.1 as it is also affected.
Add global-isel test.
2020-09-09 11:46:09 +02:00
Ronak Chauhan 487a805310 [AMDGPU] Support disassembly for AMDGPU kernel descriptors
Decode AMDGPU Kernel descriptors as assembler directives.

Reviewed By: scott.linder, jhenderson, kzhuravl

Differential Revision: https://reviews.llvm.org/D80713
2020-09-08 21:26:11 +05:30
alex-t 2480a31e5d [AMDGPU] SILowerControlFlow::optimizeEndCF should remove empty basic block
optimizeEndCF removes EXEC restoring instruction case this instruction is the only one except the branch to the single successor and that successor contains EXEC mask restoring instruction that was lowered from END_CF belonging to IF_ELSE.
As a result of such optimization we get the basic block with the only one instruction that is a branch to the single successor.
In case the control flow can reach such an empty block from S_CBRANCH_EXEZ/EXECNZ it might happen that spill/reload instructions that were inserted later by register allocator are placed under exec == 0 condition and never execute.
Removing empty block solves the problem.

This change require further work to re-implement LIS updates. Recently, LIS is always nullptr in this pass. To enable it we need another patch to fix many places across the codegen.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D86634
2020-09-07 19:37:27 +03:00
vnalamot aff94ec0f4 [AMDGPU] Remove the dead spill slots while spilling FP/BP to memory
During the PEI pass, the dead TargetStackID::SGPRSpill spill slots
are not being removed while spilling the FP/BP to memory.

Fixes: SWDEV-250393

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D87032
2020-09-06 07:04:25 +05:30
Jonas Paulsson 714ceefad9 [SelectionDAG] Always intersect SDNode flags during getNode() node memoization.
Previously SDNodeFlags::instersectWith(Flags) would do nothing if Flags was
in an undefined state, which is very bad given that this is the default when
getNode() is called without passing an explicit SDNodeFlags argument.

This meant that if an already existing and reused node had a flag which the
second caller to getNode() did not set, that flag would remain uncleared.

This was exposed by https://bugs.llvm.org/show_bug.cgi?id=47092, where an NSW
flag was incorrectly set on an add instruction (which did in fact overflow in
one of the two original contexts), so when SystemZElimCompare removed the
compare with 0 trusting that flag, wrong-code resulted.

There is more that needs to be done in this area as discussed here:

Differential Revision: https://reviews.llvm.org/D86871

Review: Ulrich Weigand, Sanjay Patel
2020-09-05 10:30:38 +02:00
Matt Arsenault 3c2a7bd286 AMDGPU: Remove code to handle tied si_else operands
This has not used tied operands for a long time.
2020-09-03 19:46:05 -04:00
Michael Liao bf41c4d29e [codegen] Ensure target flags are cleared/set properly. NFC.
- When an operand is changed into an immediate value or like, ensure their
  target flags being cleared or set properly.

Differential Revision: https://reviews.llvm.org/D87109
2020-09-03 18:37:39 -04:00
Simon Pilgrim 1673a08044 SelectionDAG.h - remove unnecessary FunctionLoweringInfo.h include. NFCI.
Use forward declarations and move the include down to dependent files that actually use it.

This also exposes a number of implicit dependencies on KnownBits.h
2020-09-03 18:33:25 +01:00
Dmitry Preobrazhensky ecde200209 [AMDGPU][MC] Corrected parser to avoid generation of excessive error messages
Summary of changes:
- Changed parser to eliminate generation of excessive error messages;
- Corrected lit tests to match all expected error messages;
- Corrected lit tests to guard against unwanted extra messages (added option "--implicit-check-not=error:");
- Added missing checks and fixed some typos in tests.

See bug 46907: https://bugs.llvm.org/show_bug.cgi?id=46907

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D86940
2020-09-02 19:42:18 +03:00
Jay Foad 4bdab2e86a [AMDGPU] Fix offset for REL32_HI relocs
The addend in a REL32 reloc needs to be adjusted to account for the
offset from the PC value returned by the s_getpc instruction to the
point where the reloc is applied. This was being done correctly for
(GOTPC)REL32_LO but not for (GOTPC)REL32_HI. This will only make a
difference if the target symbol happens to get loaded almost exactly
a multiple of 4G away from the relocated instructions.

Differential Revision: https://reviews.llvm.org/D86938
2020-09-02 10:55:55 +01:00
Michael Liao 1f4e7463b5 [amdgpu] Run SROA after loop unrolling.
Summary: - There are promotable `alloca`s after loop unrolling.

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, nikic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84252
2020-09-01 16:09:56 -04:00
Matt Arsenault 9145d75226 AMDGPU: Fix incorrectly deleting copies after spilling SGPR tuples
The implicit def of the super register would appear to kill any live
uses of components before the spill, and would be deleted by
MachineCopyPropagation. We need to add implicit uses of the super
register, similarly to what copyPhysReg does. VGPR tuples appear to be
correctly handled already. I need to double check the SGPR->memory
path.
2020-08-28 17:50:37 -04:00
Craig Topper aab90384a3 [Attributes] Add a method to check if an Attribute has AttrKind None. Use instead of hasAttribute(Attribute::None)
There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None.

So if we just want to check for None its sufficient to just check that pImpl is null. Which can even be done inline.

This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations.

Differential Revision: https://reviews.llvm.org/D86744
2020-08-28 13:23:45 -07:00
Matt Arsenault af1c1e20f4 AMDGPU/GlobalISel: Implement computeKnownBits for groupstaticsize 2020-08-27 19:39:44 -04:00
Matt Arsenault 9d3dc276a6 AMDGPU: Fix broken switch braces 2020-08-27 19:39:39 -04:00
Matt Arsenault a1bc37c9e5 AMDGPU: Use caller subtarget, not intrinsic declaration
Intrinsic declarations use the default subtarget, but this should be
using the subtarget for the calling function. I haven't been able to
come up with a case where it matters though.
2020-08-27 16:42:09 -04:00
Matt Arsenault 6c770a09be AMDGPU: Hoist subtarget lookup 2020-08-27 10:27:56 -04:00
Jay Foad 45eeb8c2a9 [AMDGPU] Remove unused variable introduced in r251860 2020-08-27 13:28:32 +01:00
Piotr Sobczak 4e9d207117 [AMDGPU] Preserve vcc_lo when shrinking V_CNDMASK
There is no justification for changing vcc_lo to vcc
when shrinking V_CNDMASK, and such a change could
later confuse live variable analysis.

Make sure the original register is preserved.

Differential Revision: https://reviews.llvm.org/D86541
2020-08-27 10:22:50 +02:00
Matt Arsenault f78687df9b AMDGPU: Don't assert on misaligned DS read2/write2 offsets
This would assert with unaligned DS access enabled. The offset may not
be aligned. Theoretically the pattern predicate should check the
memory alignment, although it is possible to have the memory be
aligned but not the immediate offset.

In this case I would expect it to use ds_{read|write}_b64 with
unaligned access, but am not clear if there's a reason it doesn't.
2020-08-26 14:08:05 -04:00
Jay Foad a75e67b3b4 [AMDGPU] Make more use of Subtarget reference in SIInstrInfo 2020-08-26 15:04:00 +01:00
Matt Arsenault ff34116cf0 AMDGPU: Use Subtarget reference in SIInstrInfo 2020-08-26 09:18:41 -04:00
Matt Arsenault 21ccedc24f AMDGPU/GlobalISel: Tolerate negated control flow intrinsic outputs
If the condition output is negated, swap the branch targets. This is
similar to what SelectionDAG does for when SelectionDAGBuilder
decides to invert the condition and swap the branches.

This is leaving behind a dead constant def for some reason.
2020-08-26 08:58:54 -04:00
Jay Foad 831457c6d5 [AMDGPU][GlobalISel] Eliminate barrier if workgroup size is not greater than wavefront size
If a workgroup size is known to be not greater than wavefront size
the s_barrier instruction is not needed since all threads are guaranteed
to come to the same point at the same time.

This is the same optimization that was implemented for SelectionDAG in
D31731.

Differential Revision: https://reviews.llvm.org/D86609
2020-08-26 13:47:51 +01:00
Stanislav Mekhanoshin b7760c3e5d [AMDGPU] Remove unsound dependency on ISA version in waitcnt
Differential Revision: https://reviews.llvm.org/D86566
2020-08-25 14:01:42 -07:00
Stanislav Mekhanoshin 817c831f02 [AMDGPU] Switch to named simm16 in vscnt insertion
Differential Revision: https://reviews.llvm.org/D86568
2020-08-25 13:05:27 -07:00
Matt Arsenault 0d2fe90063 AMDGPU/GlobalISel: Use more accurate legality rules for merge/unmerge
Most notably, we were incorrectly reporting <3 x s16> as a legal type
for these. Make sure these aren't legal to help make progress on
fixing the artifact combiner and vector legalizer
rules. Unfortunately, this means spreading the -global-isel-abort=0
hack, although this doesn't change the legalizer result in any
situation.
2020-08-25 09:40:20 -04:00
Matt Arsenault ef8f3b5a78 AMDGPU/GlobalISel: Apply bitcast load/store hack to pointer vectors
The selection patterns will currently fail on these.
2020-08-25 09:37:41 -04:00
Matt Arsenault 77e5a195f8 AMDGPU/GlobalISel: Handle AGPRs used for SGPR operands.
We would still need to waterfall if the value were somehow an AGPR,
and also need to explicitly copy to a VGPR.
2020-08-24 17:54:34 -04:00
Matt Arsenault 75e6f0b3d4 AMDGPU: Add flag to disable promotion of uniform i16 ops
This interferes with GlobalISel's much better handling of the
situation.

This should really be disable for GlobalISel. However, the fallback
only re-runs the selection passes, and doesn't go back and rerun any
codegen IR passes. I haven't come up with a good solution to this
problem.
2020-08-24 14:39:27 -04:00
Matt Arsenault 62d1fb828f AMDGPU/GlobalISel: Use unmerge instead of extract in addrspace queries
This is a bit more consistent with regular operation legalization.
2020-08-24 11:07:51 -04:00
Matt Arsenault 70cd9f5b77 AMDGPU/GlobalISel: Start implementing computeKnownBitsForTargetInstr
Handle workitem intrinsics. There isn't really away to adequately test
this right now, since none of the known bits users are fine grained
enough to test the edge conditions. This triggers a number of
instances of the new 64-bit to 32-bit shift combine in the existing
tests.
2020-08-24 09:53:27 -04:00
Matt Arsenault e1644a3779 GlobalISel: Reduce G_SHL width if source is extension
shl ([sza]ext x, y) => zext (shl x, y).

Turns expensive 64 bit shifts into 32 bit if it does not overflow the
source type:

This is a port of an AMDGPU DAG combine added in
5fa289f0d8. InstCombine does this
already, but we need to do it again here to apply it to shifts
introduced for lowered getelementptrs. This will help matching
addressing modes that use 32-bit offsets in a future patch.

TableGen annoyingly assumes only a single match data operand, so
introduce a reusable struct. However, this still requires defining a
separate GIMatchData for every combine which is still annoying.

Adds a morally equivalent function to the existing
getShiftAmountTy. Without this, we would have to do try to repeatedly
query the legalizer info and guess at what type to use for the shift.
2020-08-24 09:42:40 -04:00
Stanislav Mekhanoshin 9a9a092e61 [AMDGPU] Avoid sorting stalls in regbank-reassign
This is the slowest operation in the already slow pass.
Instead of sorting just put a stall list into an ordered
map.

Differential Revision: https://reviews.llvm.org/D86253
2020-08-21 11:49:41 -07:00
Mirko Brkusanin 0654ff703d [AMDGPU] Use ds_read/write_b96/b128 when possible for SDag
Do not break down local loads and stores so ds_read/write_b96/b128 in
ISelLowering can be selected on subtargets that support them and if align
requirements allow them.

Differential Revision: https://reviews.llvm.org/D84403
2020-08-21 12:26:31 +02:00
Mirko Brkusanin d17ea67b92 [AMDGPU][GlobalISel] Fix 96 and 128 local loads and stores
Fix local ds_read/write_b96/b128 so they can be selected if the alignment
allows. Otherwise, either pick appropriate ds_read2/write2 instructions or break
them down.

Differential Revision: https://reviews.llvm.org/D81638
2020-08-21 12:26:31 +02:00
Mirko Brkusanin f5cd7ec9f3 [AMDGPU] Reorganize GCN subtarget features for unaligned access
Features UnalignedBufferAccess and UnalignedDSAccess are now used to determine
whether hardware supports such access.
UnalignedAccessMode should be used to enable them.
hasUnalignedBufferAccessEnabled() and hasUnalignedDSAccessEnabled() can be
now used to quickly check both.

Differential Revision: https://reviews.llvm.org/D84522
2020-08-21 12:26:31 +02:00
Mirko Brkusanin 5bd1febe21 [AMDGPU] Fix alignment requirements for 96bit and 128bit local loads and stores
Adjust alignment requirements for ds_read/write_b96/b128.
GFX9 and onwards allow misaligned access for reads and writes but only if
SH_MEM_CONFIG.alignment_mode allows it.
UnalignedDSAccess is set on GCN subtargets from GFX9 onward to let us know if we
can relax alignment requirements.
UnalignedAccessMode acts similary to UnalignedBufferAccess for DS instructions
but only from GFX9 onward and is supposed to match alignment_mode. By default
alignment of 4 is required.

Differential Revision: https://reviews.llvm.org/D82788
2020-08-21 12:26:31 +02:00
Jay Foad 98de0d22f5 [AMDGPU] Apply llvm-prefer-register-over-unsigned from clang-tidy 2020-08-21 10:14:35 +01:00
Michael Liao 5257a60ee0 [amdgpu] Add codegen support for HIP dynamic shared memory.
Summary:
- HIP uses an unsized extern array `extern __shared__ T s[]` to declare
  the dynamic shared memory, which size is not known at the
  compile time.

Reviewers: arsenm, yaxunl, kpyzhov, b-sumner

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82496
2020-08-20 21:29:18 -04:00
Matt Arsenault 79ce9bb380 CodeGen: Don't drop AA metadata when splitting MachineMemOperands
Assuming this is used to split a memory access into smaller pieces,
the new access should still have the same aliasing properties as the
original memory access. As far as I can tell, this wasn't
intentionally dropped. It may be necessary to drop this if you are
moving the operand outside of the bounds of the original object in
such a way that it may alias another IR object, but I don't think any
of the existing users are doing this. Some of the uses widen into
unused alignment padding, which I think is OK.
2020-08-20 16:17:30 -04:00
Matt Arsenault 18b218007d AMDGPU/GlobalISel: Legalize odd sized loads with widening
Custom lower and widen odd sized loads up to the alignment. The
default set of legalization actions doesn't have a way to represent
this. This fixes naturally aligned <3 x s8> and <3 x s16> loads.

This also starts moving towards eliminating the buggy and
overcomplicated legalization rules for narrowing. All the memory size
changes should be done in the lower or custom action, not NarrowScalar
/ FewerElements. These currently have redundant and ambiguous code
with the lower action.
2020-08-20 16:15:53 -04:00
vnalamot 54d8ded4b1 allSGPRSpillsAreDead() should use actual FP/BP frame indices
The SGPR spills happen in SILowerSGPRSpills() and allSGPRSpillsAreDead()
make sure there are no SGPR spills pending during PEI. But the FP/BP
spills happen during PEI and are exceptions.

Use actual frame indices of FP/BP in allSGPRSpillsAreDead() to
accommodate the exceptions.

Differential Revision: https://reviews.llvm.org/D86291
2020-08-20 16:15:53 -04:00
Jay Foad 3497860203 [AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister
... in favour of the isPhysical/isVirtual methods.
2020-08-20 17:59:11 +01:00
Sebastian Neubauer b8d1994778 [AMDGPU] Add A16/G16 to InstCombine
When sampling from images with coordinates that only have 16 bit
accuracy, convert the image intrinsic call to use a16 or g16.
This does only happen if the target hardware supports it.

An alternative would be to always apply this combination, independent of
the target hardware and extend 16 bit arguments to 32 bit arguments
during legalization. To me, this sounds like an unnecessary roundtrip
that could prevent some further InstCombine optimizations.

Differential Revision: https://reviews.llvm.org/D85887
2020-08-20 10:51:49 +02:00
dfukalov 33e2f69a24 [AMDGPU][LoopUnroll] Increase BB size to analyze for complete unroll.
The `UnrollMaxBlockToAnalyze` parameter is used at the stage when we have no
information about a loop body BB cost. In some cases, e.g. for simple loop
```
for(int i=0; i<32; ++i){
  D = Arr2[i*8 + C1];
  Arr1[i*64 + C2] += C3 * D;
  Arr1[i*64 + C2 + 2048] += C4 * D;
}

```
current default parameter value is not enough to run deeper cost analyze so the
loop is not completely unrolled.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D86248
2020-08-20 10:41:47 +03:00
Matt Arsenault 31adc28d24 GlobalISel: Implement fewerElementsVector for G_CONCAT_VECTORS sources
This fixes <6 x s16> = G_CONCAT_VECTORS from <3 x s16> handling.
2020-08-19 18:53:24 -04:00
Matt Arsenault 9e8d59a9b8 AMDGPU/GlobalISel: Remove hack for combines forming illegal extloads
Previously we weren't adding the LegalizerInfo to the post-legalizer
combiner. Since that's fixed, we don't need to try to filter out the
one case that was breaking.
2020-08-19 14:15:38 -04:00
Ronak Chauhan fdf71d486c Revert "[AMDGPU] Support disassembly for AMDGPU kernel descriptors"
This reverts commit cacfb02d28.

Reverting due to buildbot failures.
2020-08-19 13:12:29 +05:30
Ronak Chauhan cacfb02d28 [AMDGPU] Support disassembly for AMDGPU kernel descriptors
Decode AMDGPU Kernel descriptors as assembler directives.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D80713
2020-08-19 08:49:07 +05:30
Changpeng Fang e7081d117a AMDGPU: Implement waterfall loop for MIMG instructions with 256-bit SRsrc
Summary:
  When the resource descriptor is of vgpr, we need a waterfall loop
to read into a sgpr. In this patchm we generalized the  implementation
to work for any regster class sizes, and extend the work to MIMG
instructions.

Fixes: SWDEV-223405

Reviewers:
  arsenm, nhaehnle

Differential Revision:
  https://reviews.llvm.org/D82603
2020-08-18 16:27:36 -07:00
Matt Arsenault 5a15f6628e GlobalISel: Implement fewerElementsVector for G_INSERT_VECTOR_ELT
Add unit tests since AMDGPU will only trigger this for gigantic
vectors, and won't use the annoying odd sized breakdown case.
2020-08-18 13:51:19 -04:00
Matt Arsenault 2f5f5febf3 AMDGPU/GlobalISel: Select llvm.amdgcn.groupstaticsize
Previously, it would successfully select and assert if not HSA or PAL
when expanding the pseudoinstruction. We don't need the
pseudoinstruction anymore since we know the total size after
legalization.
2020-08-18 09:28:01 -04:00
Matt Arsenault 3ba7777b94 AMDGPU/GlobalISel: Fix selection of s1/s16 G_[F]CONSTANT
The code to determine the value size was overcomplicated and only
correct in the case where the result register already had a register
class assigned. We can always take the size directly from the
register's type.
2020-08-18 09:28:01 -04:00
Matt Arsenault a9ee0589a8 AMDGPU/GlobalISel: Match global saddr addressing mode 2020-08-17 15:48:06 -04:00
Matt Arsenault e1a2f4713c AMDGPU: Match global saddr addressing mode
The previous implementation was incorrect, and based off incorrect
instruction definitions. Unfortunately we can't match natural
addressing in a lot of cases due to the shift/scale applied in
getelementptrs. This relies on reducing the 64-bit shift to 32-bits.
2020-08-17 15:28:14 -04:00
Stanislav Mekhanoshin 24182f14b6 [AMDGPU] Define spill opcodes for all AGPR sizes
Since we have defined all these sizes I believe we shall be
able to spill these as well.

Differential Revision: https://reviews.llvm.org/D86098
2020-08-17 12:17:23 -07:00
Matt Arsenault c8a9872259 AMDGPU/GlobalISel: Look through copies in getPtrBaseWithConstantOffset
We may have an SGPR->VGPR copy if a totally uniform pointer
calculation is used for a VGPR pointer operand.

Also hack around a bug in MUBUF matching which would incorrectly use
MUBUF for global when flat was requested. This should really be a
predicate on the parent pattern, but the DAG always checked this
manually inside the complex pattern.
2020-08-17 12:31:38 -04:00
Steven Perron eed6476a87 Reset PAL metadata when AMDGPU traget stream finishes
If the same stream object is used for multiple compiles, the PAL metadata from eariler compilations will leak into later one.  See https://github.com/GPUOpen-Drivers/llpc/issues/882 for how this is happening in LLPC.

No tests were added because multiple compiles will have to happen using the same pass manager, and I do not see a setup for that on the LLVM side.  Let me know if there is a good way to test this.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D85667
2020-08-17 10:56:11 -04:00
Matt Arsenault c7b9cd31bf AMDGPU/GlobalISel: Fix missing 256-bit AGPR mapping 2020-08-17 09:53:26 -04:00
Matt Arsenault af162ac785 AMDGPU/GlobalISel: Fix using readfirstlane with ballot intrinsics
This should use the default mapping and insert a copy to the vcc bank,
and not try to insert a readfirstlane.
2020-08-17 09:53:25 -04:00
Matt Arsenault da3f357de6 AMDGPU: Don't look at dbg users for foldable operands
These would have always failed to fold, so checking them or adding
them to the fold candidates is useless.
2020-08-17 09:53:25 -04:00
Matt Arsenault 66ffa0e91f AMDGPU/GlobalISel: Fix using post-legal combiner without LegalizerInfo 2020-08-17 09:19:22 -04:00
Matt Arsenault e0375dbcb3 AMDGPU: Fix using wrong offsets for global atomic fadd intrinsics
Global instructions have the signed offsets.
2020-08-17 09:19:15 -04:00
Matt Arsenault f0af434b79 AMDGPU: Remove register class params from flat memory patterns 2020-08-15 12:12:33 -04:00
Matt Arsenault a7455652c0 AMDGPU: Fix global atomic saddr operand class 2020-08-15 12:12:28 -04:00
Matt Arsenault 625db2fe5b AMDGPU: Remove slc from flat offset complex patterns
This was always set to 0. Use a default value of 0 in this context to
satisfy the instruction definition patterns. We can't unconditionally
use SLC with a default value of 0 due to limitations in TableGen's
handling of defaulted operands when followed by non-default operands.
2020-08-15 12:12:24 -04:00
Matt Arsenault e5077b5c2a AMDGPU: Fix matching wrong offsets for global atomic loads
These used signed offsets with a different size.
2020-08-15 12:12:17 -04:00
Matt Arsenault 8cb022982a AMDGPU: Remove redundant FLAT complex patterns
These were identical to the non-atomic cases. I'm not sure why these
were ever separated.
2020-08-15 12:12:01 -04:00
Matt Arsenault 47af1ac69a AMDGPU: Correct definitions for global saddr instructions
The VGPR component is a 32-bit offset, not 64-bits.

I'm not sure what the correct syntax is for this. This maintains the
vaddr position and leaves saddr in the end "off" position. This is
particularly terrible for stores, since the operand order is now <vgpr
offset>, <data>, <sgpr base>, splitting the pointer operands. I
suppose this is a logical consequence from the mistake of not putting
the data operand first. I'm not sure what sp3 does.
2020-08-15 12:11:57 -04:00
Matt Arsenault 79298a5067 AMDGPU: Remove SIFixupVectorISel pass
This was only used for matching the saddr addressing mode of global
instructions, but this was not implemented correctly. The instruction
definitions aren't even correct, and are defined as using a 64-bit
VGPR component. Eliminate this pass to enable correcting the
instruction definitions. A new matching implementation can work in
GlobalISel or relying on DAG divergence information for the base
address.
2020-08-15 12:11:51 -04:00
Stanislav Mekhanoshin 43a38dc251 [AMDGPU] Fix MAI ld/st hazard handling
It did not process hazard for ds_permute because it does not
load or store even though it is DS.

Differential Revision: https://reviews.llvm.org/D86003
2020-08-14 17:07:37 -07:00
Craig Topper c7a0b2684f [X86][MC][Target] Initial backend support a tune CPU to support -mtune
This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line.

This patch adds MC layer support a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables . These features lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all target as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned.

One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU.

I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning.

Differential Revision: https://reviews.llvm.org/D85165
2020-08-14 15:31:50 -07:00
Matt Arsenault 40a142fa57 AMDGPU/GlobalISel: Match andn2/orn2 for more types
Unfortunately this ends up not working as expected on targets with
16-bit operations due to AMDGPUCodeGenPrepare's promotion of uniform
16-bit ops to i32.

The vector case annoyingly requires switching the checked opcode,
since constants for vectors aren't directly handled.

I also need to think more carefully about whether this is valid for i1.
2020-08-14 13:18:03 -04:00
Sebastian Neubauer 9aa0ff77bd [AMDGPU] Enable .rodata for amdpal os
PAL recently got support for multiple ELF sections and relocations,
therefore we can now use .rodata sections instead of forcing constants
into .text.

Differential Revision: https://reviews.llvm.org/D85895
2020-08-14 09:05:48 +02:00
Austin Kerbow 7d1cb187fb [AMDGPU] Fix FP/BP spills when MUBUF constant offset exceeded
If we need a scratch register for the spill don't use the same scratch
register that is being used for the MBUF offset.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D85772
2020-08-13 14:12:00 -07:00
Stanislav Mekhanoshin 0462aef5f3 [AMDGPU] Inhibit SDWA if target instruction has FI
Differential Revision: https://reviews.llvm.org/D85918
2020-08-13 11:34:28 -07:00
Stanislav Mekhanoshin d25cb5a8a2 [AMDGPU] Fix misleading SDWA verifier error. NFC.
The old error from GFX9 shall be updated to GFX9+.
2020-08-13 11:32:17 -07:00
Carl Ritson d538c5837a [AMDGPU] Fix missed SI_RETURN_TO_EPILOG in pre-emit peephole
SIPreEmitPeephole does not process all terminators, which means
it can fail to handle SI_RETURN_TO_EPILOG if immediately preceeded
by a branch to the early exit block.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D85872
2020-08-13 21:52:41 +09:00
Ruiling Song 18b1e67523 [AMDGPU] Fix crash when dag-combining bitcast
From the code after the 'break', they are processing 64bit scalar and
vector bitcast. So I think the break-condition should be (cond1 || cond2)
This means we only execute following code if (64bit and dest-is-vector).

Also remove a previous fix which is not needed with this new fix.
(introduced in: 1349a04ef5)

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D85804
2020-08-13 10:23:13 +08:00
Matt Arsenault e14474a39a AMDGPU/GlobalISel: Select llvm.amdgcn.global.atomic.fadd
Remove the intermediate transform in the DAG path. I believe this is
the last non-deprecated intrinsic that needs handling.
2020-08-12 10:04:53 -04:00
Matt Arsenault 701228c411 AMDGPU: Handle intrinsics in performMemSDNodeCombine
This avoids a possible regression in a future patch
2020-08-12 10:04:53 -04:00
Matt Arsenault 0dc4c36d3a AMDGPU/GlobalISel: Manually select llvm.amdgcn.writelane
Fixup the special case constant bus handling pre-gfx10.
2020-08-11 11:56:16 -04:00
Matt Arsenault 076305568c AMDGPU/GlobalISel: Prepare for more custom load lowerings
Slight restructuring of the code to avoid formatting changes when more
cases are handled here.
2020-08-11 11:09:05 -04:00
Matt Arsenault e2f1b48f86 GlobalISel: Implement bitcast action for G_INSERT_VECTOR_ELT
This mirrors the support for the equivalent extracts. This also
creates a huge mess that would be greatly improved if we had any bit
operation combines.
2020-08-11 10:39:14 -04:00
Matt Arsenault 53f21e0fb7 TableGen/GlobalISel: Hack the operand order for atomic_store
ISD::ATOMIC_STORE arbitrarily has the operands in the opposite order
from regular ISD::STORE, which always introduced an annoying
duplication of patterns to handle both cases. Since in GlobalISel
there's just the one G_STORE, we need to swap the operands to
correctly emit the type check for the pointer operand.

Some work started in 20aafa3156 to
migrate SelectionDAG to use ISD::STORE for atomics, but that work
seems to have stalled. Since this is the pretty much the last
operation which matters which isn't supported for AMDGPU, use this
compatibility hack to unblock declaring it functionally complete.

Not sure what's going on with the pending_phis AArch64 test. It seems
it didn't always use atomics, and I'm not sure what it was originally
testing matters anymore.
2020-08-11 10:22:44 -04:00
Kerry McLaughlin 85c7e89f3b [CodeGen] Refactor getMemBasePlusOffset & getObjectPtrOffset to accept a TypeSize
Changes the Offset arguments to both functions from int64_t to TypeSize
& updates all uses of the functions to create the offset using TypeSize::Fixed()

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85220
2020-08-11 12:17:10 +01:00