Commit Graph

7180 Commits

Author SHA1 Message Date
Simon Pilgrim f9de13232f [X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch
This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis.

For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling.

Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU.

Differential Revision: https://reviews.llvm.org/D132520
2022-08-24 17:28:18 +01:00
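The i8/i16 promotion described in the X86 CTTZ commit above can be sketched in plain C++. This is a minimal illustration, assuming "masking the operand above the msb" means setting the bit just above the narrow type's most significant bit before a 32-bit count; the helper names `cttz32_nonzero`, `cttz16_promoted` and `cttz8_promoted` are hypothetical and do not come from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Reference 32-bit count-trailing-zeros. Like BSF, the result is not
// meaningful for a zero input, so the caller must guarantee x != 0.
inline unsigned cttz32_nonzero(uint32_t x) {
  assert(x != 0 && "source must be known non-zero");
  unsigned n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// i16 cttz promoted to 32 bits (hypothetical helper): setting bit 16 makes
// the source non-zero, and a zero input then yields 16, the bit width.
inline unsigned cttz16_promoted(uint16_t x) {
  return cttz32_nonzero(static_cast<uint32_t>(x) | (1u << 16));
}

// Same idea for i8 (hypothetical helper): a zero input yields 8.
inline unsigned cttz8_promoted(uint8_t x) {
  return cttz32_nonzero(static_cast<uint32_t>(x) | (1u << 8));
}
```

Because the promoted source can never be zero, no separate zero-input fixup is needed in this sketch, which is the property the commit message relies on to drop the CMOV handling.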
Simon Pilgrim 3cf48963ff [AMDGPU] Remove old isCheapToSpeculateCttz FIXME
As confirmed on D132520 - this should always return true
2022-08-24 15:53:38 +01:00
Pierre van Houtryve 59cf9dd923 [AMDGPU][GISel] Enable Selection of ADD3 for G_PTR_ADD
Allows things like `(G_PTR_ADD (G_PTR_ADD a, b), c)` to be
simplified into a single ADD3 instruction instead of two adds.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D131254
2022-08-24 14:44:19 +00:00
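The fold named in the G_PTR_ADD commit above can be illustrated on a toy expression tree. This is only a sketch: the `Expr` type and `foldToAdd3` function are hypothetical stand-ins for the GlobalISel machinery, and a real selector would also have to check things such as types and whether the inner add has other users.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy expression node (hypothetical, not an LLVM type).
struct Expr {
  enum Op { Leaf, PtrAdd, Add3 } op;
  std::vector<std::shared_ptr<Expr>> operands;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr leaf() { return std::make_shared<Expr>(Expr{Expr::Leaf, {}}); }

ExprPtr ptrAdd(ExprPtr a, ExprPtr b) {
  return std::make_shared<Expr>(Expr{Expr::PtrAdd, {a, b}});
}

// Fold (PtrAdd (PtrAdd a, b), c) into (Add3 a, b, c).
// Any other shape is returned unchanged.
ExprPtr foldToAdd3(const ExprPtr &e) {
  if (e->op != Expr::PtrAdd)
    return e;
  const ExprPtr &inner = e->operands[0];
  if (inner->op != Expr::PtrAdd)
    return e;
  return std::make_shared<Expr>(Expr{
      Expr::Add3, {inner->operands[0], inner->operands[1], e->operands[1]}});
}
```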
Alex Richardson 38107171ed [RegisterInfoEmitter] Generate isConstantPhysReg(). NFCI
This commit moves the information on whether a register is constant into
the Tablegen files to allow generating the implementation of
isConstantPhysReg(). I've marked isConstantPhysReg() as final in this
generated file to ensure that changes are made to tablegen instead of
overriding this function, but if that turns out to be too restrictive,
we can remove the qualifier.

This should be pretty much NFC, but I did notice that e.g. the AMDGPU
generated file also includes the LO16/HI16 registers now.

The new isConstant flag will also be used by D131958 to ensure that
constant registers are marked as call-preserved.

Differential Revision: https://reviews.llvm.org/D131962
2022-08-24 14:16:20 +00:00
Jay Foad 1bca81c12e [AMDGPU] Remove unused S_ADD_U64_CO_PSEUDO and S_SUB_U64_CO_PSEUDO 2022-08-24 10:28:35 +01:00
Raghav 79d2529c10 AMDGPU/MetaData: Restrict address space key to only be emitted for "global_buffer" and "dynamic_shared_pointer"
This matches .address_space docs at https://llvm.org/docs/AMDGPUUsage.html#amdgpu-amdhsa-code-object-kernel-argument-metadata-map-table-v3

Differential Revision: https://reviews.llvm.org/D132145
2022-08-23 14:01:01 -04:00
Thomas Symalla 5ee0fb7ed2 [NFC][AMDGPU] Some cleanups in the SIOptimizeExecMasking pass.
Fix typos and remove an unused argument.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D132292
2022-08-23 18:16:47 +02:00
Philip Reames 104fa367ee [TTI] Use OperandValueInfo in getArithmeticInstrCost implementation [NFC]
This change completes the process of replacing OperandValueKind and OperandValueProperties which were previously passed independently in this API with a single container class which contains both.

This is the change which motivated the whole sequence which preceded it.  In an original spike version of this change, I'd noticed a nasty bug: I'd changed the signature without changing names, and as a result, we silently passed additional information through a callsite which previously dropped the power-of-two fact.  This might be harmless in most cases, but at least a couple clearly depended for correctness on not passing that property through.

I did my best to split off prior changes which reduced the scope of this one, and which made it possible to use compiler assistance.  For instance, every parameter which changes type in this change also changes name.  This was intentional to make sure that every call site possibly affected must show up in the diff.  This let me audit each one closely.
2022-08-22 15:16:39 -07:00
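The idea of bundling the operand kind and properties into one container, as described in the TTI commit above, can be sketched as follows. The type below only mirrors the names mentioned in the message; its members, the `divCost` function and the cost numbers are assumptions made for illustration, not the real TargetTransformInfo API.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two values that used to be passed separately.
enum class OperandValueKind { AnyValue, UniformValue, UniformConstantValue };
enum class OperandValueProperties { None, PowerOf2 };

// One container carrying both facts about an operand.
struct OperandValueInfo {
  OperandValueKind Kind = OperandValueKind::AnyValue;
  OperandValueProperties Properties = OperandValueProperties::None;

  bool isConstant() const {
    return Kind == OperandValueKind::UniformConstantValue;
  }
  bool isPowerOf2() const {
    return Properties == OperandValueProperties::PowerOf2;
  }
  // Explicitly drop the properties when a callee must not see them.
  OperandValueInfo getNoProps() const {
    return {Kind, OperandValueProperties::None};
  }
};

// Made-up cost function: a division by a constant power of two is treated
// as a cheap shift, anything else as an expensive divide.
int divCost(OperandValueInfo Op2Info) {
  return (Op2Info.isConstant() && Op2Info.isPowerOf2()) ? 1 : 10;
}
```

With a single container, a call site that must hide the power-of-two fact has to say so explicitly (here via `getNoProps()`), which is the kind of silent pass-through the message warns about.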
Simon Pilgrim 5263155d5b [CostModel] Add CostKind argument to getShuffleCost
Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or were just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future.

Differential Revision: https://reviews.llvm.org/D132287
2022-08-21 10:54:51 +01:00
Kazu Hirata 8b1b0d1d81 Revert "Use std::is_same_v instead of std::is_same (NFC)"
This reverts commit c5da37e42d.

This patch seems to break builds with some versions of MSVC.
2022-08-20 23:00:39 -07:00
Kazu Hirata c5da37e42d Use std::is_same_v instead of std::is_same (NFC) 2022-08-20 22:36:26 -07:00
Thomas e565e2fa5c [NFC][AMDGPU] Fix typo. 2022-08-20 08:30:42 +02:00
Austin Kerbow b0f4678b90 [AMDGPU] Add iglp_opt builtin and MFMA GEMM Opt strategy
Adds a builtin that serves as an optimization hint to apply specific optimized
DAG mutations during scheduling. This also disables any other mutations or
clustering that may interfere with the desired pipeline. The first optimization
strategy that is added here is designed to improve the performance of small gemm
kernels on gfx90a.

Reviewed By: jrbyrnes

Differential Revision: https://reviews.llvm.org/D132079
2022-08-19 15:38:36 -07:00
jeff 20cf170e68 [InferAddressSpaces] [AMDGPU] Add inference for flat_atomic intrinsics
Certain address space dependent optimizations, like SeparateConstOffsetFromGEP, assume agreement between the address space of the recursive uses and the address space of the def. If this assumption is invalid, then optimizations may or may not be correct depending on properties of an address space for a given target, the address spaces of recursive uses, and the optimization being done.

This patch infers the previous address space for flat_atomic ptr arguments. As a result, the address spaces of the uses in flat_atomic cases will agree with the address space in recursive defs. If this results in non-flat address space, then isel may infer a different intrinsic. For example, if the result is a flat_atomic using global address space, then it will be lowered to the corresponding global_atomic intrinsic.

Change-Id: Ifcd981709dc2ea94d4acbcb84efe7176593ec8c7
2022-08-19 11:37:20 -07:00
Joe Nash 063ee26ea3 [AMDGPU] Update comment on shrinking dpp. NFC 2022-08-18 11:29:32 -04:00
Jeffrey Byrnes 1c8d7ea973 [AMDGPU] Implement pipeline solver for non-trivial pipelines
Requested SchedGroup pipelines may be non-trivial to satisfy. A minimal example is if the requested pipeline is {2 VMEM, 2 VALU, 2 VMEM} and the original order of SUnits is {VMEM, VALU, VMEM, VALU, VMEM}. Because of existing dependencies, the choice of which SchedGroup the middle VMEM goes into impacts how closely we are able to match the requested pipeline. It seems minimizing the degree of misfit (as measured by the number of edges we can't add) w.r.t the choice we make when mapping an instruction -> SchedGroup is an NP problem. This patch implements the PipelineSolver class which produces a solution for the defined problem for the sched_group_barrier mutation. The solver has both an exponential time exact algorithm and a greedy algorithm. The patch includes some controls which allow the user to select the greedy/exact algorithm.

Differential Revision: https://reviews.llvm.org/D130797
2022-08-17 16:21:59 -07:00
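The trade-off between the exact and greedy approaches in the pipeline solver commit above can be shown on a toy model. Everything here is an assumption made for illustration: the `Problem` layout, the misfit measure (one unit per existing dependency that contradicts the group order), the requirement that every instruction is placed, and the chain of dependencies in `exampleProblem`. It is not the PipelineSolver implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <utility>
#include <vector>

enum class Kind { VMEM, VALU };
struct Group { Kind kind; int capacity; };
using Dep = std::pair<int, int>; // (pred, succ), with pred < succ

struct Problem {
  std::vector<Kind> insts;   // original instruction order
  std::vector<Group> groups; // requested pipeline, in order
  std::vector<Dep> deps;     // existing dependencies
};

// Misfit added by placing inst in group g: one per predecessor that already
// sits in a later group, because the pipeline edge cannot be added there.
static int misfit(const Problem &p, const std::vector<int> &assign, int inst, int g) {
  int cost = 0;
  for (const Dep &d : p.deps)
    if (d.second == inst && assign[d.first] > g)
      ++cost;
  return cost;
}

static bool fits(const Problem &p, const std::vector<int> &room, int inst, int g) {
  return p.groups[g].kind == p.insts[inst] && room[g] > 0;
}

static std::vector<int> capacities(const Problem &p) {
  std::vector<int> room;
  for (const Group &g : p.groups)
    room.push_back(g.capacity);
  return room;
}

// Greedy: place each instruction in the currently cheapest matching group.
int solveGreedy(const Problem &p) {
  std::vector<int> assign(p.insts.size(), -1), room = capacities(p);
  int total = 0;
  for (int i = 0; i < (int)p.insts.size(); ++i) {
    int best = -1, bestCost = INT_MAX;
    for (int g = 0; g < (int)p.groups.size(); ++g) {
      if (!fits(p, room, i, g))
        continue;
      int c = misfit(p, assign, i, g);
      if (c < bestCost) { best = g; bestCost = c; }
    }
    if (best < 0)
      return INT_MAX; // no matching group has room left
    assign[i] = best;
    --room[best];
    total += bestCost;
  }
  return total;
}

// Exact: try every matching group for every instruction (exponential time).
static int search(const Problem &p, std::vector<int> &assign, std::vector<int> &room, int i) {
  if (i == (int)p.insts.size())
    return 0;
  int best = INT_MAX;
  for (int g = 0; g < (int)p.groups.size(); ++g) {
    if (!fits(p, room, i, g))
      continue;
    int here = misfit(p, assign, i, g);
    assign[i] = g; --room[g];
    int rest = search(p, assign, room, i + 1);
    ++room[g]; assign[i] = -1;
    if (rest != INT_MAX)
      best = std::min(best, here + rest);
  }
  return best;
}

int solveExact(const Problem &p) {
  std::vector<int> assign(p.insts.size(), -1), room = capacities(p);
  return search(p, assign, room, 0);
}

// The pipeline and SUnit order from the commit message; the dependency
// chain 0->1->2->3->4 is an assumption, the message does not list the edges.
Problem exampleProblem() {
  return Problem{
      {Kind::VMEM, Kind::VALU, Kind::VMEM, Kind::VALU, Kind::VMEM},
      {{Kind::VMEM, 2}, {Kind::VALU, 2}, {Kind::VMEM, 2}},
      {{0, 1}, {1, 2}, {2, 3}, {3, 4}}};
}
```

In this toy model the middle VMEM costs one unit of misfit whichever VMEM group it joins, so both solvers agree; in general the greedy result is only an upper bound on the exact one.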
Daniil Fukalov 7ed3d81333 [NFCI] Move cost estimation from TargetLowering to TargetTransformInfo.
TargetLowering had two last InstructionCost related `getTypeLegalizationCost()`
and `getScalingFactorCost()` members, but all other costs are processed in TTI.

E.g. it is inconvenient to use other TTI members in these two functions
when they are overridden in a target.

Minor refactoring: `getTypeLegalizationCost()` now doesn't need DataLayout
parameter - it was always passed from TTI.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D117723
2022-08-18 00:38:55 +03:00
Ivan Kosarev 7a355e9027 [AMDGPU][MC][NFC] Refine SMEM store, probe and discard definitions.
Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D131968
2022-08-17 13:54:26 +01:00
Kazu Hirata 6d9cd9199a Use llvm::all_of (NFC) 2022-08-14 16:25:36 -07:00
Kazu Hirata 109df7f9a4 [llvm] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-13 12:55:42 -07:00
David Stuttard 1d1cc05539 AMDGPU: mbcnt allow for non-zero src1 for known-bits
Src1 for mbcnt can be a non-zero literal or register. Take this into account
when calculating known bits.

Differential Revision: https://reviews.llvm.org/D131478
2022-08-11 13:23:43 +01:00
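The effect of a non-zero src1 on the known bits of mbcnt, as described in the commit above, can be approximated with a simple upper-bound argument. This sketch assumes that one mbcnt instruction counts at most 32 mask bits and adds src1 to that count; the function names are hypothetical, and the real known-bits computation tracks individual bits rather than a single maximum value.

```cpp
#include <cassert>
#include <cstdint>

// Number of high bits of a 32-bit result that are known to be zero when the
// result cannot exceed maxValue (returns 0 if the value may not fit).
inline unsigned knownLeadingZeros(uint64_t maxValue) {
  unsigned zeros = 32;
  while (maxValue != 0 && zeros > 0) {
    maxValue >>= 1;
    --zeros;
  }
  return zeros;
}

// Hypothetical helper: known-zero high bits of an mbcnt result, given an
// upper bound on src1. The count itself contributes at most 32.
inline unsigned mbcntKnownLeadingZeros(uint32_t src1Max) {
  return knownLeadingZeros(uint64_t{32} + src1Max);
}
```

With src1 equal to zero the bound is 32, but any larger src1 raises the bound and leaves fewer high bits known zero, which is why ignoring src1 would be incorrect.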
Evgenii Stepanov 8ea1cf3111 Revert "[AMDGPU] SIFixSGPRCopies refactoring"
Breaks ASan tests.

This reverts commit 3f8ae7efa8.
2022-08-10 11:32:46 -07:00
Venkata Ramanaiah Nalamothu 486594119d [AMDGPU] Fix prologue/epilogue markers in .debug_line table for trivial functions
All the prologue instructions should have unknown source location
co-ordinates while the epilogue instructions should have source
location of last non-debug instruction after which epilogue
instructions are inserted.

This ensures the prologue/epilogue markers are generated correctly
in the line table.

Changes are brought in from the downstream CFI patches.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D131485
2022-08-10 23:00:19 +05:30
alex-t 3f8ae7efa8 [AMDGPU] SIFixSGPRCopies refactoring
This change finalizes the series of patches aiming to replace the old
strategy of VGPR to SGPR copies lowering.  Following
https://reviews.llvm.org/D128252 and https://reviews.llvm.org/D130367, code
parts that are no longer used were removed.  The pass main loop no longer makes
MIR changes but collects information for further analysis.  The actual MIR
lowering happens later, according to the analysis result, in a set of separate
functions. Another important change concerns the order of lowering: VGPR to
SGPR copies lowering is done first to have priority over the rest of the MIR
changes.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D131246
2022-08-10 00:51:57 +02:00
Yaxun (Sam) Liu e780648a15 [AMDGPU] Unify unreachable intrinsics
si-annotate-control-flow does depth first traversal of BB's of
a function to insert amdgcn if intrinsics for conditional
branches so that isel can generate correct instructions later.

si-annotate-control-flow checks whether the successor BB for the 'else'
branch of a conditional branch has been visited. If it has been
visited, si-annotate-control-flow assumes the conditional
branch has been handled and will not try to insert if intrinsic
for it.

This assumption is not correct when the IR contains multiple
unreachable BB's. Then 'if' intrinsics are not inserted and incorrect
ISA is generated.

This patch fixes the issue by letting amdgpu-unify-divergent-exit-nodes
unify unreachables even if they are uniformly reached. In this way
the IR will not contain multiple exits, and structurizer is able to
structurize the IR containing one unified exit.

Reviewed by: Ruiling Song, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D131181

Fixes: SWDEV-343244
2022-08-09 10:23:32 -04:00
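The unification step described in the commit above can be sketched on a toy control-flow graph. The `Block` type and `unifyUnreachable` function are hypothetical and only show the shape of the transformation: every block that ends in unreachable is rewritten to branch to one shared unreachable block.

```cpp
#include <cassert>
#include <vector>

// Toy basic block (hypothetical): a terminator kind plus successor indices.
struct Block {
  enum Term { Branch, Return, Unreachable } term;
  std::vector<int> succs;
};

// If the CFG has more than one unreachable-terminated block, append a single
// shared unreachable block and make the old ones branch to it.
// Returns the index of the unified block, or -1 if nothing had to change.
int unifyUnreachable(std::vector<Block> &cfg) {
  std::vector<int> unreachables;
  for (int i = 0; i < (int)cfg.size(); ++i)
    if (cfg[i].term == Block::Unreachable)
      unreachables.push_back(i);
  if (unreachables.size() <= 1)
    return -1;

  int unified = (int)cfg.size();
  cfg.push_back(Block{Block::Unreachable, {}});
  for (int i : unreachables) {
    cfg[i].term = Block::Branch;
    cfg[i].succs = {unified};
  }
  return unified;
}
```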
Fangrui Song de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
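For reference, the standard C++17 attribute that replaces the macro in the commit above is used like this. The `classify` function and its flag values are made up for illustration.

```cpp
#include <cassert>

// Case 0 deliberately falls through into case 1; the attribute documents
// the intent and silences implicit-fallthrough warnings.
int classify(int opcode) {
  int flags = 0;
  switch (opcode) {
  case 0:
    flags |= 1;
    [[fallthrough]];
  case 1:
    flags |= 2;
    break;
  default:
    break;
  }
  return flags;
}
```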
Kazu Hirata e20d210eef [llvm] Qualify auto (NFC)
Identified with readability-qualified-auto.
2022-08-07 23:55:27 -07:00
Kazu Hirata ba0407ba86 [llvm] Use range-based for loops (NFC) 2022-08-07 00:16:21 -07:00
Kazu Hirata d0ec61c9ff [Target] Remove unused forward declarations (NFC) 2022-08-07 00:16:16 -07:00
Leon Clark 6a275cd53c Transform illegal intrinsics to V_ILLEGAL
Related tasks:

- SWDEV-240194
- SWDEV-309417
- SWDEV-334876

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D123693
2022-08-06 08:59:00 +01:00
Mirko Brkusanin 19bb535ed9 [AMDGPU] Remove unused MIMG tablegen variants
There are no AMDGPUSampleVariant versions for _G16; it is treated more like a
modifier for derivatives (_D) (also for intrinsics, where it is an overloaded type
instead of part of the intrinsic name), so we ended up making more variants for
these instructions than we actually needed.

32-bit derivatives need 6 dwords at most, while 16-bit need 4 at most. Using
same AMDGPUSampleVariant for both, we ended up creating 2 extra variants per
instruction than were necessary.

In total this deletes 260 unused tablegen records.

Differential Revision: https://reviews.llvm.org/D131252
2022-08-05 15:30:47 +02:00
Mingming Liu bc8f2f3649 [AArch64][TTI][NFC] Overload method 'getVectorInstrCost' to provide vector instruction itself, as a context information for cost estimation.
1) Overloaded (instruction-based) method is a wrapper around the current (opcode-based) method.
2) This patch also changes a few callsites (VectorCombine.cpp,
   SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method.
3) This is a split of D128302.

Differential Revision: https://reviews.llvm.org/D131114
2022-08-04 12:58:25 -07:00
David Truby 9a976f3661 [llvm] Always use TargetConstant for FP_ROUND ISD Nodes
This patch ensures consistency in the construction of FP_ROUND nodes
such that they always use ISD::TargetConstant instead of ISD::Constant.

This additionally fixes a bug in the AArch64 SVE backend where patterns
were matching against TargetConstant nodes and sometimes failing when
passed a Constant node.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D130370
2022-08-03 14:02:11 +01:00
Dmitry Preobrazhensky 05b3aadfff [AMDGPU][MC][GFX11] Correct v_dot2_f16_f16 and v_dot2_bf16_bf16
Enable SGPRs for the following operands of these opcodes:

- src operands of VOP3 variant.
- src2 operand of DPP variants.

Differential Revision: https://reviews.llvm.org/D130989
2022-08-03 15:08:23 +03:00
Dmitry Preobrazhensky ae553f9e49 [AMDGPU][MC][GFX10] Correct encoding of VOP3 v_cmpx* opcodes
Encode dst=EXEC but allow the disassembler to accept any dst value.

Differential Revision: https://reviews.llvm.org/D130978
2022-08-03 15:03:44 +03:00
Austin Kerbow 3dfa562643 [AMDGPU] Add CL option for max-ilp scheduler.
When compiling for multiple targets the scheduler that is selected via the
-misched option is applied globally. This patch adds a target CL option instead.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D131022
2022-08-02 16:52:14 -07:00
Austin Kerbow 40eec27618 [AMDGPU] Add llvm_unreachable to switch statement added in d7100b398. 2022-08-02 13:45:38 -07:00
Austin Kerbow d7100b398b [AMDGPU] Add GCNMaxILPSchedStrategy
Creates a new scheduling strategy that attempts to maximize ILP for a single
wave.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D130869
2022-08-02 13:21:24 -07:00
Alexander Timofeev a321d95b59 [AMDGPU] avoid blind converting to VALU REG_SEQUENCE and PHIs
In 2e29b0138c we introduced a specific solving algorithm
that analyzes the VGPR to SGPR copies use chains and either lowers
the copy to v_readfirstlane_b32 or converts the whole chain to VALU forms.
At the same time we still have the code that blindly converts to VALU REG_SEQUENCE and PHIs
in case they produce SGPR but have VGPR input operands. In case the REG_SEQUENCE and PHIs
are in the VGPR to SGPR copy use chain, and this chain was considered long enough to convert
the copy to v_readfirstlane_b32, further lowering them to VALU leads to several kinds of issues.
First, we have a v_readfirstlane_b32 which is completely useless because most parts of its use chain
were moved to VALU forms. Second, we may encounter subtle bugs related to the EXEC-dependent CF
because of the weird mixing of SALU and VALU instructions.
This change removes the code that moves REG_SEQUENCE and PHIs to VALU. Instead, we use the fact
that both REG_SEQUENCE and PHIs have copy semantics. That is, if they define SGPR but have VGPR inputs,
we insert VGPR to SGPR copies to make them pure SGPR. Then, the new copies are processed by the common
VGPR to SGPR lowering algorithm.
This is Part 2 in the series of commits aiming at the massive refactoring of the SIFixSGPRCopies pass.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D130367
2022-08-02 18:37:57 +02:00
Jay Foad e301e071ba [AMDGPU] Remove IR SpeculativeExecution pass from codegen pipeline
This pass seems to have very little effect because all it does is hoist
some instructions, but it is followed later in the codegen pipeline by
the IR CodeSinking pass which does the opposite.

Differential Revision: https://reviews.llvm.org/D130258
2022-08-02 17:35:20 +01:00
Jay Foad c24d68fff1 [AMDGPU] Take advantage of VOP3 literals in convertToThreeAddress
This improves a corner case where v_fmac can be converted to v_fma on
GFX10+ even if it has a literal operand.

Differential Revision: https://reviews.llvm.org/D130992
2022-08-02 17:27:11 +01:00
Vang Thao 7fc52d7c8b [AMDGPU] Fix DGEMM hazard for GFX90a
For VALU write and memory (VM, L/DS, FLAT) instructions, SQ would insert
wait-states to avoid data hazard. However when there is a DGEMM instruction
in-between them, SQ incorrectly disables the wait-states thus the data hazard
needs to be handled with this workaround.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D130677
2022-08-01 11:56:22 -07:00
Piotr Sobczak f29a19b0b8 [AMDGPU] Extend cases for ReadM0MovRelInterpHazard
Extend hazard recognizer of ReadM0MovRelInterpHazard with
DS_READ_ADDTID and DS_WRITE_ADDTID, as they also
require a manually inserted S_NOP after SALU writing m0.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D130783
2022-08-01 17:59:33 +02:00
Dmitry Preobrazhensky 3aae8cd842 [AMDGPU][MC] Verify selection of LDS MUBUF opcodes
Differential Revision: https://reviews.llvm.org/D130761
2022-08-01 16:44:39 +03:00
Dmitry Preobrazhensky bb901dcc5a [AMDGPU][MC][GFX940] Correct disassembly of MFMA opcodes
Add a decoder table for GFX940 MFMA opcodes.

Differential Revision: https://reviews.llvm.org/D130759
2022-08-01 16:00:47 +03:00
Pierre van Houtryve a847e3dc52 [NFC][AMDGPU] Fix typo in SIRegisterInfo.cpp 2022-08-01 07:01:33 -04:00
Petar Avramovic e8d260753e [AMDGPU] gfx11 allow dlc for MUBUF atomics
Add MC support for dlc in gfx11 MUBUF atomic instructions.

Differential Revision: https://reviews.llvm.org/D129075
2022-08-01 12:18:01 +02:00
Austin Kerbow 7898426a72 [AMDGPU] Remove unused function 2022-07-30 07:47:35 -07:00
Simon Pilgrim 49c0980eac Fix Wdocumentation warning. NFC.
warning: '\returns' command used in a comment that is attached to a function returning void
2022-07-30 15:41:13 +01:00
Simon Pilgrim 276480b1d3 [AMDGPU] Fix || vs && precedence warning. NFC. 2022-07-30 14:02:54 +01:00