Commit Graph

3244 Commits

Author SHA1 Message Date
Jay Foad 4917a9a965 [AMDGPU] Precommit some scheduler related test updates
Summary:
The point of this is to make some tests with manual checks robust
against scheduler tweaks, so that only autogenerated test updates will
be required when pushing D68338 "[AMDGPU] Remove dubious logic in
bidirectional list scheduler".

Reviewers: arsenm, rampitec, vpykhtin

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75302
2020-02-28 11:20:58 +00:00
Stanislav Mekhanoshin 6b813f2762 [AMDGPU] Enable runtime unroll for LDS
We want to do unroll for LDS even for runtime trip count
to combine LDS operations.

Differential Revision: https://reviews.llvm.org/D75293
2020-02-27 12:59:35 -08:00
Sameer Sahasrabuddhe 0c8a218798 [AMDGPU] improve fragile test for divergent branches
Summary:
The affected LIT test intends to test the correct use of divergence
analysis to detect a divergent branch with a uniform predicate. The
passes involved are LLVM IR passes, but the test runs llc and tries to
match against generated ISA, which makes it hard to demonstrate that
the intended behavior was really tested. Replaced this with a test
that invokes opt on the required passes and then checks for the
appropriate changes in the LLVM IR.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D75267
2020-02-27 23:31:03 +05:30
Matt Arsenault 79493e721a AMDGPU/GlobalISel: Add missing test for G_UMULH 2020-02-26 22:30:13 -05:00
Matt Arsenault 6fc0d00823 GlobalISel: Fix lowering for G_UADDE/G_USUBE
The type parameter passed into lower is invalid and should be removed
from the function.
2020-02-26 19:10:52 -08:00
Matt Arsenault 6dcf43102c AMDGPU/GlobalISel: Add missing G_[US]ADDE/G_[US]SUBE tests
The s64 case currently crashes, so leave that for later.
2020-02-26 19:10:34 -08:00
Jay Foad 09a6b26753 AMDGPU: Fix some more incorrect check lines 2020-02-26 14:37:22 +00:00
Nicolai Hähnle 0f1df48925 AMDGPU/SIInsertSkips: Fix the determination of whether early-exit-after-kill is possible
Summary:
The old code made some incorrect assumptions about the order in which
basic blocks are laid out in a function. This could lead to incorrect
early-exits, especially when kills occurred inside of loops.

The new approach is to check whether the point where the conditional
kill occurs dominates all reachable code. If that is the case, there
cannot be any other threads in the wave that are waiting to rejoin
at a later point in the CFG, i.e. if exec=0 at that point, then all
threads really are dead and we can exit the wave.

Make some other minor cleanups to the pass while we're at it.

v2: preserve the dominator tree

Reviewers: arsenm, cdevadas, foad, critson

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74908

Change-Id: Ia0d2b113ac944ad642d1c622b6da1b20aa1aabcc
2020-02-26 15:30:42 +01:00
Jay Foad 80d7e473e0 AMDGPU: Fix some incorrect FUNC-LABEL checks 2020-02-26 09:43:13 +00:00
Jay Foad c66db21165 AMDGPU/GlobalISel: Un-XFAIL a test
This was missed in 12fe9b26ec
2020-02-25 16:46:46 +00:00
Matt Arsenault 86e13ec194 AMDGPU/GlobalISel: Use packed for G_ADD/G_SUB/G_MUL v2s16 2020-02-25 11:20:35 -05:00
Jay Foad 33cbd5ee08 AMDGPU/GlobalISel: Legalize s64 min/max by lowering
Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75108
2020-02-25 16:00:43 +00:00
Jay Foad ab96ec41ea [AMDGPU] Precommit some test updates for D68338 "Remove dubious logic in bidirectional list scheduler" 2020-02-25 14:51:42 +00:00
Jay Foad dc78190811 AMDGPU/GlobalISel: add legalize tests for s64 max/min 2020-02-25 09:49:19 +00:00
Matt Arsenault fee41517fe AMDGPU/GlobalISel: Introduce post-legalize combiner
The current set of custom combines are only really useful after
legalization, so move them there. There is a lot of overlap in the
boilerplate here, but I think we do want a pretty different set of
combines before and after legalize. I think we will want a lot of
overlap between the post-legalize and a post-regbankselect combiner.
2020-02-24 22:12:12 -05:00
Matt Arsenault 0b46b078b6 AMDGPU/GlobalISel: Fix incorrect VOP3P fneg folding
We use some s32 values in VOP3P operands, and won't see any
intervening casts from a 32-bit fneg. Make sure it's really a packed
fneg before folding.
2020-02-24 21:20:35 -05:00
Matt Arsenault 11e3dde625 GlobalISel: Reimplement fewerElementsVectorBasic
Changes the handling of odd breakdowns, and avoids using
G_EXTRACT/G_INSERT. Pad with undef to a wider size, and unmerge. Also
avoid introducing instructions for the fully undef components.
2020-02-24 21:19:47 -05:00
Jay Foad 0ed4744bb5 AMDGPU/GlobalISel: Lower 64-bit uaddo/usubo
Summary: Add more test cases for signed and unsigned add/sub with overflow.

Reviewers: arsenm, rampitec, kerbowa

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75051
2020-02-24 23:08:14 +00:00
Mark Searles d3e170c438 Revert "[AMDGPU] Don’t marke the .note section as ALLOC"
This reverts commit 977cd661cf.

It breaks OpenCL testing. OpenCL Runtime is using PT_LOAD information
to calculate memory for global variables. This commit should be relanded once
the OpenCL runtime stops relying on PT_LOAD information for calculating global
variable memory size.

Differential Revision: https://reviews.llvm.org/D74995
2020-02-21 16:08:30 -08:00
Jay Foad b72f1448ce AMDGPU/GlobalISel: Better code for one case of G_SHUFFLE_VECTOR on v2i16
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74987
2020-02-21 21:16:39 +00:00
Matt Arsenault 00955a62e4 AMDGPU/GlobalISel: Fix SALU mapping for v2s16 min/max
The legalizer helper functions are unusably awkward to perform the 3-5
part legalization. This needs to be widened, scalarized, lowered, and
we should avoid creating vector extends and truncates. Manually do all
of this and expand.
2020-02-21 14:02:16 -05:00
Matt Arsenault db06870dbd AMDGPU: Move dot intrinsic patterns to instruction def
I tried to use some of the new tablegen features to avoid creating
different operand list permutations, but I still don't see a way to
programmatically build a source pattern dag.

Also add GlobalISel tests, which now all import successfully.

Some of the fneg fold tests are incorrect, which need to be fixed in a
future commit
2020-02-21 13:35:40 -05:00
Matt Arsenault 4c1c9422a3 AMDGPU/GlobalISel: Select llvm.amdgcn.fdot2
I'm slighly worried about the generated checks, since they won't catch
incorrect modifiers being added at the end of the line.
2020-02-21 13:35:40 -05:00
Matt Arsenault dfce5fd50a AMDGPU/GlobalISel: Select VOP3P instructions
This only handles the basic cases. More work is needed to make better
use of op_sel.
2020-02-21 13:35:40 -05:00
Matt Arsenault 72eef820d5 AMDGPU/GlobalISel: Select G_SHUFFLE_VECTOR
G_SHUFFLE_VECTOR is legal since it theoretically may help match op_sel
for VOP3P instructions. Expand it in some other way in case it doesn't
fold into the use instructions.
2020-02-21 13:35:40 -05:00
Jay Foad cab39e4b8c GlobalISel: Fix narrowing of (G_ASHR i64:x, 32)
Reviewers: arsenm

Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, volkan, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74950
2020-02-21 16:51:03 +00:00
Matt Arsenault 6a479220b5 AMDGPU/GlobalISel: Commit test changes I forgot to squash
These should have been in ac7abe0ba9
2020-02-21 11:43:39 -05:00
Matt Arsenault 043ed2e22a AMDGPU/GlobalISel: Fix xnor matching
We should try the generated matchers before the manual selection. This
means the patterns are now handling the common cases, but the manual
selection code is not yet dead. It's still handling the non-s32/s64
cases (like v2s16 and v2s32). Currently tablegen doesn't have a nice
way to have a single pattern that covers multiple types.
2020-02-21 11:42:49 -05:00
Matt Arsenault 89dc8fe622 AMDGPU/GlobalISel: Precommit xnor matching test 2020-02-21 11:09:59 -05:00
Matt Arsenault ac7abe0ba9 AMDGPU/GlobalISel: Manually select G_BUILD_VECTOR_TRUNC
We have patterns for s_pack* selection, but they assume the inputs are
a build_vector with 16-bit inputs, not a truncating build
vector. Since there's still outstanding work for how to handle
mismatched result and source element vector operations, and since I'm
trying a different packed vector strategy than SelectionDAG, just
manually select this for now.
2020-02-21 10:34:11 -05:00
Matt Arsenault 79ff188add AMDGPU/GlobalISel: Legalize G_FPOW
There are few differences from the DAG handling. First, the DAG
handling uses a primitive selection pattern instead of custom
legalizing it. Because of this, this makes use of source modifiers
while the DAG does not.

Also instead of promoting f16, try to use the f16 log/exp. There's no
f16 fmul_legacy, so widen just for the multiply, although I'm not sure
that's the best solution.
2020-02-21 10:31:13 -05:00
Matt Arsenault fab4cdea39 AMDGPU/GlobalISel: Select llvm.amdgcn.fmul.legacy 2020-02-21 10:30:26 -05:00
Matt Arsenault b64aa8c715 AMDGPU/GlobalISel: Fix constant bus violation with source modifiers
This looked through copies to find the source modifiers, which may
have been SGPR->VGPR copies added to avoid potential constant bus
violations. Re-insert a copy to a VGPR if this happens.
2020-02-21 10:30:23 -05:00
Nicolai Hähnle 32e4e71966 test/CodeGen/AMDGPU: Add a test case that shows a miscompilation
Related to https://reviews.llvm.org/D74908

Change-Id: I6ebf3b5c7a32493016994f30d6796c41e95aecde
2020-02-21 13:38:24 +01:00
Simon Pilgrim fc2b4a02b1 [DAGCombine] visitEXTRACT_VECTOR_ELT - add SimplifyDemandedBits multi use support
Similar to what we already do with SimplifyDemandedVectorElts, call SimplifyDemandedBits across all the extracted elements of the source vector, treating it as single use.

There's a minor regression in store-weird-sizes.ll which will be addressed in an upcoming SimplifyDemandedBits patch.
2020-02-20 15:49:38 +00:00
Matt Arsenault 083717cf49 AMDGPU: Fix v2i64<->v4f32 bitcast
I'm not sure how to test the v2i64->v4f32 case since I can't think of
any v2i64 cases that won't legalize to v4i32.
2020-02-20 09:49:09 -05:00
Sebastian Neubauer 977cd661cf [AMDGPU] Don’t marke the .note section as ALLOC
Marking a section as ALLOC tells the ELF loader to load the section into memory.
As we do not want to load the notes into VRAM, the flag should not be there.

Differential Revision: https://reviews.llvm.org/D74600
2020-02-20 15:14:48 +01:00
Simon Pilgrim 6085593c12 [AMDGPU] simplifyI24 - replace GetDemandedBits with SimplifyMultipleUseDemandedBits
GetDemandedBits mostly just calls SimplifyMultipleUseDemandedBits now, but it does a very blunt constant simplification that SimplifyMultipleUseDemandedBits avoids.

If we need to demand bits from constants we should handle this through ShrinkDemandedConstant/targetShrinkDemandedConstant.

@arsenm confirmed that the sign extended immediates are better for code size.

Differential Revision: https://reviews.llvm.org/D74857
2020-02-20 12:03:08 +00:00
dfukalov dbfc682e2b SpeculativeExecution: fixed ingoring free execution
Summary:
After updating cost model in AMDGPU target (47a5c36b37) the pass started to
ignore some BBs since they got all instructions estimated as free.

Reviewers: arsenm, chandlerc, nhaehnle

Reviewed By: nhaehnle

Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74825
2020-02-20 14:45:02 +03:00
Matt Arsenault 4bb0c8f91c AMDGPU: Enable integer division bypass
We probably want this, and I've meant to turn this on for a long
time. SC actually emits a special case to early-out for a 1
denominator, which perhaps should also be considered.
2020-02-19 17:50:19 -05:00
Matt Arsenault 0b6ead018a AMDGPU/GlobalISel: Cleanup min/max RegBankSelect tests
Use common check prefix, although update_mir_test_checks makes this
unnecessarily annoying. Also make sure to have uses in case that ever
ends up mattering.
2020-02-19 17:32:25 -05:00
Simon Pilgrim 025ff5a4ea [AMDGPU] Regenerate immediate constant tests 2020-02-19 18:58:44 +00:00
Matt Arsenault ff4639f060 AMDGPU/GlobalISel: Select MUBUF path for global atomic cmpxchg
I'm not sure why this isn't a pattern, but the DAG manually selects
this.
2020-02-19 06:19:22 -08:00
Simon Pilgrim 4af8db317d [AMDGPU] performCvtF32UByteNCombine - add SHL and SimplifyMultipleUseDemandedBits support
This is part of the work to remove SelectionDAG::GetDemandedBits and just use SimplifyMultipleUseDemandedBits.

Recent experiments raised some v_cvt_f32_ubyte*_e32 regressions, so I've added some additional abilities to performCvtF32UByteNCombine to help unpack byte data more aggressively.

We still don't remove all OR(SHL,SRL) patterns as some of the regenerated nodes don't get combined again, but we are getting closer.

Differential Revision: https://reviews.llvm.org/D74786
2020-02-19 11:45:57 +00:00
Matt Arsenault 37c452a289 AMDGPU/GlobalISel: Adjust branch target when lowering loop intrinsic
This needs to steal the branch target like the other control flow
intrinsics.
2020-02-18 06:35:40 -08:00
Matt Arsenault 5e8792453d AMDGPU/GlobalISel: Fix RegBankSelect for G_SHUFFLE_VECTOR 2020-02-17 15:11:25 -05:00
Matt Arsenault f742a28ae3 AMDGPU/GlobalISel: Custom lower 32-bit G_SDIV/G_SREM 2020-02-17 15:09:51 -05:00
Matt Arsenault e240b27d6d AMDGPU/GlobalISel: Allow arbitrary global values
Treat unknown address spaces as global
2020-02-17 11:32:28 -08:00
Matt Arsenault 54137bbaaf GlobalISel: Allow running localizer earlier
This required legal and regbankselected MIR for seemingly no
reason. For AMDGPU this wouldn't see legalized G_GLOBAL_VALUEs.
2020-02-17 11:24:06 -08:00
Matt Arsenault 96db12d507 AMDGPU/GlobalISel: Custom lower 32-bit G_UDIV/G_UREM
AMDGPUCodeGenPrepare expands this most of the time, but not always. We
will always at least need a fallback option here. This is the 3rd
implementation of the same expansion in the backend. Eventually I
would like to eliminate the IR expansion (and the DAG version
obviously).

Currently the new legalizer path produces a better result, since the
IR expansion results in extra operations which need to be combined
out. Notably, the IR expansion results in multiplies by 0.
2020-02-17 11:05:50 -08:00