Commit Graph

3403 Commits

Author SHA1 Message Date
Austin Kerbow eab9a4f119 [AMDGPU] Don't assert on partial exec copy
After Machine CSE and coalescing we can end up with copies of exec to
subregister SGPRs.

Differential Revision: https://reviews.llvm.org/D77992
2020-04-12 21:14:36 -07:00
Matt Arsenault 96819011ca AMDGPU/GlobalISel: Fix RegBankSelect for v2s16 shifts
These need to be promoted and scalarized for the SALU.
2020-04-11 20:55:33 -04:00
Matt Arsenault ac8d51a3c6 AMDGPU/GlobalISel: Legalize 16-bit shift amounts to s16
The current selector depends on 16-bit shifts using 16-bit shift
amount types, but really it should accept either for all types.
2020-04-11 18:12:26 -04:00
Matt Arsenault c5497e5399 AMDGPU/GlobalISel: Fix legalizing <3 x s16> vselects 2020-04-11 15:59:51 -04:00
Matt Arsenault cf29333f40 AMDGPU/GlobalISel: Work around forming illegal zextload after legalize
Selection would fail after the post legalize combiner put an illegal
zextload back together.

The base combiner has parameter to only allow legal operations, but
they appear to not be used. I also don't see a nice way to remove a
single entry from all_combines, so just hack around this.
2020-04-11 10:52:58 -04:00
Matt Arsenault 49ae0fc2f0 GlobalISel: Fix incorrect lowering G_FCOPYSIGN
In the basic case, this was reading the sign from the wrong operand.
2020-04-10 21:00:25 -04:00
Stanislav Mekhanoshin 44920e8566 [AMDGPU] Disable sub-dword scralar loads IR widening
These will be widened in the DAG. In the meanwhile early
widening prevents otherwise possible vectorization of
such loads.

Differential Revision: https://reviews.llvm.org/D77835
2020-04-10 08:20:49 -07:00
Simon Pilgrim 54502476e7 [AMDGPU] Refresh fmin_legacy.ll checks to fix issue reported on D77354
Some of the condition codes had inverted since this was generated
2020-04-08 17:05:29 +01:00
Simon Pilgrim 7e62684251 [AMDGPU] Regenerate vector-extract-insert test checks to fix issue reported on D77354 2020-04-08 13:18:32 +01:00
Simon Pilgrim 98181a1f98 [AMDGPU] Regenerate si-annotate-cfg-loop-assert test checks to fix issue reported on D77354 2020-04-08 13:09:08 +01:00
Dominik Montada c8393240ab [GlobalISel] combine trunc(trunc) pattern
Summary:
Legalization can introduce the trunc(trunc) pattern. This can cause
problems if one of these intermediate truncs is not legal.
Combine truncs of this pattern, if the resulting trunc is legal.

Reviewers: arsenm, aemerson, dsanders

Reviewed By: arsenm

Subscribers: jvesely, wdng, nhaehnle, rovka, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76601
2020-04-08 11:58:28 +02:00
Dominik Montada 432720f1c4 [GlobalISel] Combine sext([sz]ext) -> [sz]ext, zext(zext) -> zext
Summary:
Combine sext(zext x) to (zext x) since the sign-bit is 0
after the zero-extension.

Combine sext(sext x) to (sext x) and ext(zext x) to (zext x)
since the intermediate step is not needed.

Reviewers: arsenm, volkan, aemerson, aditya_nandakumar

Reviewed By: arsenm

Subscribers: jvesely, wdng, nhaehnle, rovka, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77210
2020-04-08 11:24:29 +02:00
Dominik Montada 35950fea8d [GlobalISel] support narrow G_IMPLICIT_DEF for DstSize % NarrowSize != 0
Summary:
When narrowing G_IMPLICIT_DEF where the original size is not a multiple
of the narrow size, emit a smaller G_IMPLICIT_DEF and use G_ANYEXT.

To prevent a potential endless loop in the legalizer, the condition
to combine G_ANYEXT(G_IMPLICIT_DEF) is changed from isInstUnsupported
to !isInstLegal, since in this case the combine is only valid if
consequent legalization of the newly combined G_IMPLICIT_DEF does not
introduce G_ANYEXT due to narrowing.

Although this legalization for G_IMPLICIT_DEF would also be valid for
the general case, it actually caused a lot of code regressions when
tried due to superfluous COPYs and combines not getting hit anymore.

Reviewers: dsanders, aemerson, volkan, arsenm, aditya_nandakumar

Reviewed By: arsenm

Subscribers: jvesely, nhaehnle, kerbowa, wdng, rovka, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76598
2020-04-08 11:00:07 +02:00
Stanislav Mekhanoshin f96810ff34 [AMDGPU] Expand vector trunc stores from i16 to i8
Differential Revision: https://reviews.llvm.org/D77693
2020-04-07 21:47:45 -07:00
Stanislav Mekhanoshin 96e51ed005 [AMDGPU] Implement copyPhysReg for 16 bit subregs
Differential Revision: https://reviews.llvm.org/D74937
2020-04-07 14:22:46 -07:00
Graham Sellers a19a56f6a1 [AMDGPU] Extend constant folding for logical operations
This patch extends existing constant folding in logical operations to
handle S_XNOR, S_NAND, S_NOR, S_ANDN2, S_ORN2, V_LSHL_ADD_U32 and
V_AND_OR_B32. Also added a couple of tests for existing folds.
2020-04-07 14:37:16 -04:00
Matt Arsenault f524194ffd AMDGPU: Cleanup test MIR 2020-04-07 14:09:25 -04:00
Stanislav Mekhanoshin 12a324393d [AMDGPU] Limit endcf-collapase to simple if
We can only collapse adjacent SI_END_CF if outer statement
belongs to a simple SI_IF, otherwise correct mask is not in the
register we expect, but is an argument of an S_XOR instruction.

Even if SI_IF is simple it might be lowered using S_XOR because
lowering is dependent on a basic block layout. It is not
considered simple if instruction consuming its output is
not an SI_END_CF. Since that SI_END_CF might have already been
lowered to an S_OR isSimpleIf() check may return false.

This situation is an opportunity for a further optimization
of SI_IF lowering, but that is a separate optimization. In the
meanwhile move SI_END_CF post the lowering when we already know
how the rest of the CFG was lowered since a non-simple SI_IF
case still needs to be handled.

Differential Revision: https://reviews.llvm.org/D77610
2020-04-07 10:27:23 -07:00
Stanislav Mekhanoshin 9f09550c50 [AMDGPU] Remove clutter from endcf test. NFC. 2020-04-06 16:32:21 -07:00
Konstantin Pyzhov 72e8754916 [AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU.
Reviewers: sameerds, dstuttard

Differential Revision: https://reviews.llvm.org/D77228
2020-04-06 09:05:58 -04:00
Konstantin Pyzhov 51dc028314 Revert e1730cfeb3 2020-04-06 05:56:11 -04:00
Konstantin Pyzhov e1730cfeb3 [AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU.
Reviewers: sameerds, dstuttard

Differential Revision: https://reviews.llvm.org/D77228
2020-04-06 05:10:37 -04:00
Jonathan Roelofs 7c5d2bec76 [llvm] Fix missing FileCheck directive colons
https://reviews.llvm.org/D77352
2020-04-06 09:59:08 -06:00
Matt Arsenault 8a5f0dafd4 AMDGPU/GlobalISel: Select llvm.amdgcn.div.scale 2020-04-06 11:50:19 -04:00
Matt Arsenault e87ec66762 AMDGPU/GlobalISel: Fix llvm.amdgcn.div.fmas.ll 2020-04-06 11:50:16 -04:00
Matt Arsenault 08772f1742 AMDGPU/GlobalISel: Add unmerge of concat tests 2020-04-06 11:03:55 -04:00
Matt Arsenault 70726cec5b DAG: Combine extract_vector_elt of concat_vectors
Fixes extra canonicalize regressions when legalizing
vector fminnum/fmaxnum.
2020-04-06 09:26:29 -04:00
Matt Arsenault 9620fe02df AMDGPU/GlobalISel: Add some G_INSERT/G_EXTRACT select tests 2020-04-05 10:54:24 -04:00
Matt Arsenault 6bfe28e92f AMDGPU: Fix annotate kernel features through casted calls
I thought I was testing this before, but the workitem id x
case isn't great since it's mandatory in the parent kernel.
2020-04-04 20:44:44 -04:00
Matt Arsenault b801577c59 AMDGPU: Fix a few more tests with old denormal subtarget features 2020-04-03 23:42:13 -04:00
Stanislav Mekhanoshin 8c5dc084e5 [AMDGPU] Added label to test. NFC. 2020-04-03 11:36:32 -07:00
Stanislav Mekhanoshin 0462795095 [AMDGPU] Propagate AGPR RC from PHI to its PHI operands
We can fix register class of PHI based on its all AGPR uses.
That leaves behind all PHIs which were already processed
earlier. Propagate RC back to PHI operands of a PHI.

Differential Revision: https://reviews.llvm.org/D77344
2020-04-03 11:23:02 -07:00
Jay Foad c7e1fc8496 [AMDGPU] Fix CHECK lines 2020-04-03 10:07:21 +01:00
Austin Kerbow 30f18ed387 [AMDGPU] Handle SMRD signed offset immediate
Summary:
This fixes a few issues related to SMRD offsets. On gfx9 and gfx10 we have a
signed byte offset immediate, however we can overflow into a negative since we
treat it as unsigned.

Also, the SMRD SOFFSET sgpr is an unsigned offset on all subtargets. We
sometimes tried to use negative values here.

Third, S_BUFFER instructions should never use a signed offset immediate.

Differential Revision: https://reviews.llvm.org/D77082
2020-04-02 17:41:52 -07:00
Matt Arsenault 2680e88069 AMDGPU: Fix broken check lines 2020-04-02 18:52:49 -04:00
Matt Arsenault f68cc2a7ed AMDGPU: Use 128-bit DS operations by default 2020-04-02 17:17:47 -04:00
Matt Arsenault 192cccb152 AMDGPU: Add some tests for exotic denormal mode combinations 2020-04-02 17:17:12 -04:00
Matt Arsenault 5660bb6bc9 AMDGPU: Remove denormal subtarget features
Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.
2020-04-02 17:17:12 -04:00
Matt Arsenault 75cf30918f AMDGPU: Assume f32 denormals are enabled by default
This will likely introduce catastrophic performance regressions on
older subtargets, but should be correct. A follow up change will
remove the old fp32-denormals subtarget features, and switch to using
the new denormal-fp-math/denormal-fp-math-f32 attributes. Frontends
should be making sure to add the denormal-fp-math-f32 attribute when
appropriate to avoid performance regressions.
2020-04-02 17:17:12 -04:00
Matt Arsenault c3d3c22a58 AMDGPU: Hack out noinline on functions using LDS globals
This is a workaround for clang adding noinline to all functions at
-O0. Previously, we would just add alwaysinline, and the verifier
would complain about having both noinline and alwaysinline. We
currently can't truly codegen this case as a freestanding function, so
override the user forcing noinline.
2020-04-02 14:12:07 -04:00
Stanislav Mekhanoshin f2334a7ef2 [AMDGPU] Fix crash in SILoadStoreOptimizer
SILoadStoreOptimizer::checkAndPrepareMerge() expects base and
paired instruction to come in order and scans MBB from base to
the paired instruction. An original order can be changed if
there were a dependent instruction in between and base instruction
was moved.

Fixed by bailing the optimization. In theory it might be possible
still to perform a merge by swapping instructions, but on practice
it bails anyway because it finds dependency on that same instruction
which has resulted in the base move.

Differential Revision: https://reviews.llvm.org/D77245
2020-04-02 10:26:47 -07:00
Matt Arsenault 68e283940a AMDGPU/GlobalISel: Switch test to checking final ISA
The naming convention is for unprefixed .ll tests to check the final
ISA instructions.
2020-04-01 13:03:02 -04:00
Matt Arsenault 5e4e8d0388 AMDGPU/GlobalISel: Change intrinsic ID for _L to _LZ opt
Still should handle the other case changes the opcode this way.
2020-04-01 13:03:02 -04:00
Matt Arsenault 43e576593e AMDGPU/GlobalISel: Fix insert point when lowering G_FMAD 2020-03-31 19:57:06 -04:00
Stanislav Mekhanoshin 08682dcc86 [AMDGPU] Define 16 bit VGPR subregs
We have loads preserving low and high 16 bits of their
destinations. However, we always use a whole 32 bit register
for these. The same happens with 16 bit stores, we have to
use full 32 bit register so if high bits are clobbered the
register needs to be copied. One example of such code is
added to the load-hi16.ll.

The proper solution to the problem is to define 16 bit subregs
and use them in the operations which do not read another half
of a VGPR or preserve it if the VGPR is written.

This patch simply defines subregisters and register classes.
At the moment there should be no difference in code generation.
A lot more work is needed to actually use these new register
classes. Therefore, there are no new tests at this time.

Register weight calculation has changed with new subregs so
appropriate changes were made to keep all calculations just
as they are now, especially calculations of register pressure.

Differential Revision: https://reviews.llvm.org/D74873
2020-03-31 11:49:06 -07:00
Sebastian Neubauer 5d3a69feca [AMDGPU] New llvm.amdgcn.ballot intrinsic
Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function
in GLSL and other shader languages. It returns a bitfield containing the
result of its boolean argument in all active lanes, and zero in all
inactive lanes.

This is intended to replace the existing llvm.amdgcn.icmp and
llvm.amdgcn.fcmp intrinsics after a suitable transition period.

Use the new intrinsic in the atomic optimizer pass.

Differential Revision: https://reviews.llvm.org/D65088
2020-03-31 10:35:39 +02:00
Matt Arsenault db9f0d1ce5 AMDGPU: Form v_cvt_ubyte* with f16 results
We get 2 conversion instructions anyway. Previously we would get a
conversion with SDWA reading from a byte source, which has a larger
encoding.
2020-03-30 17:59:49 -04:00
Matt Arsenault b27d255e1e AMDGPU/GlobalISel: Form CVT_F32_UBYTE0 2020-03-30 17:45:55 -04:00
Matt Arsenault bcb643c8af AMDGPU/GlobalISel: Handle image atomics 2020-03-30 17:41:04 -04:00
Matt Arsenault 48eda37282 AMDGPU/GlobalISel: Start selecting image intrinsics
Does not handled atomics yet.
2020-03-30 17:33:04 -04:00