Commit Graph

5333 Commits

Author SHA1 Message Date
Matt Arsenault cc2e2b80a1 AMDGPU: Update test checks to include -NEXT 2022-03-31 16:30:01 -04:00
Thomas Symalla 1a6aa8b195 [AMDGPU] Add missing use check in SIOptimizeExecMasking pass.
Whenever a v_cmp, s_and_saveexec instruction sequence shall be
transformed to an equivalent s_mov, v_cmpx sequence, it needs
to be detected if the v_cmp target register is used between
the two instructions as the v_cmp result gets omitted by
using the v_cmpx instruction, resulting in invalid code.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D122797
2022-03-31 19:25:35 +02:00
Abinav Puthan Purayil 898d5776ec [AMDGPU][GlobalISel] Scalarize add/sub with overflow ops in the legalizer
Differential Revision: https://reviews.llvm.org/D122803
2022-03-31 21:46:34 +05:30
Abinav Puthan Purayil db17ebd593 [AMDGPU][GlobalISel] Add end to end IR tests for add/sub with overflow
Differential Revision: https://reviews.llvm.org/D122818
2022-03-31 21:46:34 +05:30
Jay Foad e8e32e5714 [AMDGPU] Fix typo in RUN line 2022-03-31 16:23:40 +01:00
Changpeng Fang 1711020c37 AMDGPU: Use isLiteralConstantLike to check whether the operand could ever be literal
Summary:
  To compute the size of a VALU/SALU instruction, we need to check whether an operand
could ever be literal. Previously isLiteralConstant was used, which missed cases
like global variables or external symbols. These misses lead to under-estimation of
the instruction size and branch offset, and thus incorrectly skip the necessary branch
relaxation when the branch offset is actually greater than what the branch bits can hold.
In this work, we use isLiteralConstantLike to check the operands. It maybe conservative,
but it is safe.

Reviewers: arsenm

Differential Revision: https://reviews.llvm.org/D122778
2022-03-31 08:06:31 -07:00
Jay Foad fdaf606c8e [AMDGPU] Fix last remaining checks in perfhint.ll
Unfortunately this just shows that the test case for D47740 never
really tested what it was supposed to test.

Differential Revision: https://reviews.llvm.org/D122664
2022-03-31 13:39:15 +01:00
Abinav Puthan Purayil 2f284b0ff9 [AMDGPU] Regenerate checks in some mir tests 2022-03-31 17:49:00 +05:30
Stanislav Mekhanoshin f311f934e1 [AMDGPU] gfx940 VALU hazard recognizer
Differntial Revision: https://reviews.llvm.org/D122339
2022-03-29 10:57:54 -07:00
Jay Foad 2b754384ad [AMDGPU] Generate checks in atomic_optimizations_*.ll
This had already been done for some of these files but not all.
2022-03-29 11:05:23 +01:00
Thomas Symalla 3bd15c03c6 [AMDGPU] Fix adding modifiers when creating v_cmpx instructions.
Revision https://reviews.llvm.org/D122332 added a pattern transformation
where v_cmpx instructions are introduced. However, the modifiers are
not correctly inherited from the original operands. The patch
adds the source modifiers, if they are exist, or sets them to 0.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D122489
2022-03-28 17:52:53 +02:00
Carl Ritson 1f52d02ceb [AMDGPU] Split waterfall loop exec manipulation
Split waterfall loops into multiple blocks so that exec mask
manipulation (s_and_saveexec) does not occur in the middle of
a block.

VGPR live range optimizer is updated to handle waterfall loops
spanning multiple blocks.

Reviewed By: ruiling

Differential Revision: https://reviews.llvm.org/D122200
2022-03-28 17:44:54 +09:00
Johannes Doerfert a81fff8afd Reapply "[Intrinsics] Add `nocallback` to the default intrinsic attributes"
This reverts commit c5f789050d and
reapplies 7aea3ea8c3 with additional test
changes.
2022-03-25 09:36:50 -05:00
Thomas Symalla 718aec209c [AMDGPU] Improve v_cmpx usage on GFX10.3.
On GFX10.3 targets, the following instruction sequence

v_cmp_* SGPR, ...
s_and_saveexec ..., SGPR

leads to a fairly long stall caused by a VALU write to a SGPR and having the
following SALU wait for the SGPR.

An equivalent sequence is to save the exec mask manually instead of letting
s_and_saveexec do the work and use a v_cmpx instruction instead to do the
comparison.

This patch modifies the SIOptimizeExecMasking pass as this is the last position
where s_and_saveexec instructions are inserted. It does the transformation by
trying to find the pattern, extracting the operands and generating the new
instruction sequence.

It also changes some existing lit tests and introduces a few new tests to show
the changed behavior on GFX10.3 targets.

Same as D119696 including a buildbot and MIR test fix.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D122332
2022-03-25 11:40:18 +01:00
Stanislav Mekhanoshin cad9de71d7 [AMDGPU] gfx940 MAI hazard recognizer
Differential Revision: https://reviews.llvm.org/D122263
2022-03-24 12:59:52 -07:00
Stanislav Mekhanoshin 6e3e14f600 [AMDGPU] Support gfx940 smfmac instructions
Differential Revision: https://reviews.llvm.org/D122191
2022-03-24 12:40:42 -07:00
Stanislav Mekhanoshin 27439a7642 [AMDGPU] New gfx940 mfma instructions
Differential Revision: https://reviews.llvm.org/D122044
2022-03-24 12:12:52 -07:00
Austin Kerbow 1e15adba62 [AMDGPU] Add s_nop WaitStates between neighboring mfma
In some cases padding bubbles between sequential MFMA instructions may
lead to increased inter-wave performance. Add option to request to pad
some portion of these stall cycles with s_nops.

Fixes: SWDEV-326925

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D121437
2022-03-23 13:56:09 -07:00
hsmahesha f5b6866d7e [AMDGPU] Add missing testcase for SGPR to AGPR copy
and, also update the function indirectCopyToAGPR() to ensure that it is called only on GFX908 sub-target.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D122286
2022-03-23 21:38:04 +05:30
Stanislav Mekhanoshin 72c1a0d9c2 [AMDGPU] Allow v_accvgpr_write to use SGPR on gfx90a
This is undocumented, but it should work.

Differential Revision: https://reviews.llvm.org/D122252
2022-03-22 13:52:29 -07:00
Stanislav Mekhanoshin 631a643940 [AMDGPU] Update mfma test to run gfx940 checks. NFC. 2022-03-22 12:43:57 -07:00
alex-t 0a488cba2c [AMDGPU] use scalar shift for SALU users in frame index elimination
In the frame index lowering we have to insert shift and add
instructions to adjust stack object access.  We need to take care of the stack
object user kind and use scalar shift/add for scalar users.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D121524
2022-03-22 11:43:23 +01:00
Carl Ritson 8e64d84995 [MachineSink] Check block prologue interference
Sinking must check for interference between the block prologue
and the instruction being sunk.
Specifically check for clobbering of uses by the prologue, and
overwrites to prologue defined registers by the sunk instruction.

Reviewed By: rampitec, ruiling

Differential Revision: https://reviews.llvm.org/D121277
2022-03-22 11:15:37 +09:00
alex-t a0ea7ec90f [AMDGPU] divergence patterns for the BUILD_VECTOR i16, undef expansion.
BUILD_VECTOR of i16 and undef gets expanded to the COPY_TO_REGCLASS.
         The latter is further lowererd to the copy instructions.
	 We need to provide the correct register class for the uniform and divergent BUILD_VECTOR nodes
	 to avoid VGPR to SGPR copies.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D122068
2022-03-21 21:11:20 +01:00
Jay Foad 321c8ab81b [AMDGPU] Add an agpr copy propagation test 2022-03-21 11:42:57 +00:00
Jay Foad 692341e998 [AMDGPU] Update checks in agpr-copy-propagation.mir 2022-03-21 11:42:56 +00:00
Thomas Symalla 7de6107dce Revert "[AMDGPU] Improve v_cmpx usage on GFX10.3."
This reverts commit 011c64191e and
e725e2afe0.

Differential Revision: https://reviews.llvm.org/D122117
2022-03-21 09:50:44 +01:00
Thomas Symalla 011c64191e [AMDGPU] Improve v_cmpx usage on GFX10.3.
On GFX10.3 targets, the following instruction sequence

v_cmp_* SGPR, ...
s_and_saveexec ..., SGPR

leads to a fairly long stall caused by a VALU write to a SGPR and having the
following SALU wait for the SGPR.

An equivalent sequence is to save the exec mask manually instead of letting
s_and_saveexec do the work and use a v_cmpx instruction instead to do the
comparison.

This patch modifies the SIOptimizeExecMasking pass as this is the last position
where s_and_saveexec instructions are inserted. It does the transformation by
trying to find the pattern, extracting the operands and generating the new
instruction sequence.

It also changes some existing lit tests and introduces a few new tests to show
the changed behavior on GFX10.3 targets.

Reviewed By: sebastian-ne, critson

Differential Revision: https://reviews.llvm.org/D119696
2022-03-21 09:31:59 +01:00
Stanislav Mekhanoshin d898c9563e [AMDGPU] Add gfx940 run line to gfx90a mfma test. NFC. 2022-03-18 15:23:47 -07:00
Abinav Puthan Purayil aee3684995 [AMDGPU] Use COPY_TO_REGCLASS for buffer_atomic_cmpswap selection
GlobalISel was selecting the av_* regclass for some cases.

Differential Revision: https://reviews.llvm.org/D121933
2022-03-18 08:56:23 +05:30
Stanislav Mekhanoshin 275b0c5a5a [AMDGPU] Add 2 gfx940 mfma tests. NFC. 2022-03-17 15:47:13 -07:00
Changpeng Fang dd5895cc39 AMDGPU: Use the implicit kernargs for code object version 5
Summary:
  Specifically, for trap handling, for targets that do not support getDoorbellID,
we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1].
To get aperture bases when targets do not have aperture registers, we load
private_base or shared_base directly from the implicit kernarg. In clang, we use
implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}.

Reviewers: arsenm, sameerds, yaxunl

Differential Revision: https://reviews.llvm.org/D120265
2022-03-17 14:12:36 -07:00
Stanislav Mekhanoshin 522b259976 [AMDGPU] Allow v_accvgpr_write to use SGPR src on gfx940
Differential Revision: https://reviews.llvm.org/D121843
2022-03-17 12:12:06 -07:00
Vang Thao 27e1931508 [AMDGPU] Fix PreRARematerialize scheduler pass sinking subreg defs
When collecting trivially rematerializable defs, skip any subreg defs. We do not want to sink these.

Differential Revision: https://reviews.llvm.org/D121874
2022-03-17 11:38:53 -07:00
Matt Arsenault 8d66603a48 Revert "RegAllocGreedy: Fix last chance recolor assert in impossible case"
This reverts commit c46aab01c0.

This evidently blocks compiling in some cases that used to work
before. I'm also not fully convinced this is the correct place to fix
this problem.
2022-03-17 13:12:01 -04:00
Abinav Puthan Purayil f59cb41ba1 [AMDGPU] Select buffer_atomic_cmpswap* in tblgen
This change replaces the manual selection of buffer_atomic_cmpswap*
instructions in SelectionDAG and GlobalISel with a tblgen based
selection in BUFInstructions.td. This allows us to select the return and
no-return variants in tblgen.

Differential Revision: https://reviews.llvm.org/D121770
2022-03-17 10:12:32 +05:30
Christudasan Devadasan 6dd21d1db1 [AMDGPU][SIFoldOperands] Consider the alignment constraints
Enforced an alignment check while folding the operands.
2022-03-17 08:27:53 +05:30
Christudasan Devadasan af717d4aca [AMDGPU][MachineVerifier] Alignment check for fp32 packed math instructions
The fp32 packed math instructions are introduced in gfx90a.
If their vector register operands are not properly aligned, the
verifier should flag them. Currently, the verifier failed to
report it and the compiler ended up emitting a broken assembly.
This patch fixes that missed case in TII::verifyInstruction.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D121794
2022-03-17 08:21:35 +05:30
Florian Hahn e5822ded56
[FunctionAttrs] Infer argmemonly .
This patch adds initial argmemonly inference, by checking the underlying
objects of locations returned by MemoryLocation.

I think this should cover most cases, except function calls to other
argmemonly functions.

I'm not sure if there's a reason why we don't infer those yet.

Additional argmemonly can improve codegen in some cases. It also makes
it easier to come up with a C reproducer for 7662d1687b (already fixed,
but I'm trying to see if C/C++ fuzzing could help to uncover similar
issues.)

Compile-time impact:
NewPM-O3: +0.01%
NewPM-ReleaseThinLTO: +0.03%
NewPM-ReleaseLTO+g: +0.05%

https://llvm-compile-time-tracker.com/compare.php?from=067c035012fc061ad6378458774ac2df117283c6&to=fe209d4aab5b593bd62d18c0876732ddcca1614d&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D121415
2022-03-16 10:24:33 +00:00
Nikita Popov 57d57b1afd [AAEval] Make compatible with opaque pointers
With opaque pointers, we cannot use the pointer element type to
determine the LocationSize for the AA query. Instead, -aa-eval
tests are now required to have an explicit load or store for any
pointer they want to compute alias results for, and the load/store
types are used to determine the location size.

This may affect ordering of results, and sorting within one result,
as the type is not considered part of the sorted string anymore.

To somewhat minimize the churn, printing still uses faux typed
pointer notation.
2022-03-16 10:02:11 +01:00
Joe Nash 687d20de7f [AMDGPU] Regen checks again no-remat-indirect-mov
NFC. Update script does not behave right since the run lines have
identical output. Delete the duplicated check prefix added in
22cfbf7eca
2022-03-15 13:44:41 -04:00
Joe Nash 4cf86bd744 [AMDGPU] Regen checks for schedule-barrier
NFC. Hasn't been updated since script added check-next
2022-03-15 13:35:43 -04:00
Joe Nash 22cfbf7eca [AMDGPU] Regen checks for no-remat-indirect-mov
NFC. Hasn't been updated since the update script started adding
check-next.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D121719
2022-03-15 13:33:42 -04:00
Stanislav Mekhanoshin c4500de255 [AMDGPU] gfx940: disable OP_SEL on V_DOT instructions
Differential Revision: https://reviews.llvm.org/D121634
2022-03-14 17:02:00 -07:00
Stanislav Mekhanoshin 1f53f20fc1 [AMDGPU] Support gfx940 v_lshl_add_u64 instruction
Differential Revision: https://reviews.llvm.org/D121401
2022-03-14 15:45:42 -07:00
Stanislav Mekhanoshin 36fe3f13a9 [AMDGPU] flat scratch SVS addressing mode for gfx940
Both VADDR and SADDR are used in SVS mode.

Differential Revision: https://reviews.llvm.org/D121254
2022-03-14 15:23:36 -07:00
Stanislav Mekhanoshin 47bac63d3f [AMDGPU] gfx940 memory model
Differential Revision: https://reviews.llvm.org/D121242
2022-03-14 15:01:46 -07:00
Stanislav Mekhanoshin 72a9e5f891 [AMDGPU] Restrict machine copy propagation from creating unaligned classes
Fixes: SWDEV-326366

Differential Revision: https://reviews.llvm.org/D121491
2022-03-14 14:09:40 -07:00
Austin Kerbow 62bcfcb5a5 [AMDGPU] Add llvm.amdgcn.s.setprio intrinsic
Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D120976
2022-03-12 22:15:42 -08:00
Stanislav Mekhanoshin 31f215ab0c [AMDGPU] Support v_mov_b64 in dpp combine
Differential Revision: https://reviews.llvm.org/D121411
2022-03-11 11:37:32 -08:00