llvm-project

Commit Graph

Author	SHA1	Message	Date
Jay Foad	4917a9a965	[AMDGPU] Precommit some scheduler related test updates Summary: The point of this is to make some tests with manual checks robust against scheduler tweaks, so that only autogenerated test updates will be required when pushing D68338 "[AMDGPU] Remove dubious logic in bidirectional list scheduler". Reviewers: arsenm, rampitec, vpykhtin Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75302	2020-02-28 11:20:58 +00:00
Stanislav Mekhanoshin	6b813f2762	[AMDGPU] Enable runtime unroll for LDS We want to do unroll for LDS even for runtime trip count to combine LDS operations. Differential Revision: https://reviews.llvm.org/D75293	2020-02-27 12:59:35 -08:00
Sameer Sahasrabuddhe	0c8a218798	[AMDGPU] improve fragile test for divergent branches Summary: The affected LIT test intends to test the correct use of divergence analysis to detect a divergent branch with a uniform predicate. The passes involved are LLVM IR passes, but the test runs llc and tries to match against generated ISA, which makes it hard to demonstrate that the intended behavior was really tested. Replaced this with a test that invokes opt on the required passes and then checks for the appropriate changes in the LLVM IR. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D75267	2020-02-27 23:31:03 +05:30
Matt Arsenault	79493e721a	AMDGPU/GlobalISel: Add missing test for G_UMULH	2020-02-26 22:30:13 -05:00
Matt Arsenault	6fc0d00823	GlobalISel: Fix lowering for G_UADDE/G_USUBE The type parameter passed into lower is invalid and should be removed from the function.	2020-02-26 19:10:52 -08:00
Matt Arsenault	6dcf43102c	AMDGPU/GlobalISel: Add missing G_[US]ADDE/G_[US]SUBE tests The s64 case currently crashes, so leave that for later.	2020-02-26 19:10:34 -08:00
Jay Foad	09a6b26753	AMDGPU: Fix some more incorrect check lines	2020-02-26 14:37:22 +00:00
Nicolai Hähnle	0f1df48925	AMDGPU/SIInsertSkips: Fix the determination of whether early-exit-after-kill is possible Summary: The old code made some incorrect assumptions about the order in which basic blocks are laid out in a function. This could lead to incorrect early-exits, especially when kills occurred inside of loops. The new approach is to check whether the point where the conditional kill occurs dominates all reachable code. If that is the case, there cannot be any other threads in the wave that are waiting to rejoin at a later point in the CFG, i.e. if exec=0 at that point, then all threads really are dead and we can exit the wave. Make some other minor cleanups to the pass while we're at it. v2: preserve the dominator tree Reviewers: arsenm, cdevadas, foad, critson Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74908 Change-Id: Ia0d2b113ac944ad642d1c622b6da1b20aa1aabcc	2020-02-26 15:30:42 +01:00
Jay Foad	80d7e473e0	AMDGPU: Fix some incorrect FUNC-LABEL checks	2020-02-26 09:43:13 +00:00
Jay Foad	c66db21165	AMDGPU/GlobalISel: Un-XFAIL a test This was missed in `12fe9b26ec`	2020-02-25 16:46:46 +00:00
Matt Arsenault	86e13ec194	AMDGPU/GlobalISel: Use packed for G_ADD/G_SUB/G_MUL v2s16	2020-02-25 11:20:35 -05:00
Jay Foad	33cbd5ee08	AMDGPU/GlobalISel: Legalize s64 min/max by lowering Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75108	2020-02-25 16:00:43 +00:00
Jay Foad	ab96ec41ea	[AMDGPU] Precommit some test updates for D68338 "Remove dubious logic in bidirectional list scheduler"	2020-02-25 14:51:42 +00:00
Jay Foad	dc78190811	AMDGPU/GlobalISel: add legalize tests for s64 max/min	2020-02-25 09:49:19 +00:00
Matt Arsenault	fee41517fe	AMDGPU/GlobalISel: Introduce post-legalize combiner The current set of custom combines are only really useful after legalization, so move them there. There is a lot of overlap in the boilerplate here, but I think we do want a pretty different set of combines before and after legalize. I think we will want a lot of overlap between the post-legalize and a post-regbankselect combiner.	2020-02-24 22:12:12 -05:00
Matt Arsenault	0b46b078b6	AMDGPU/GlobalISel: Fix incorrect VOP3P fneg folding We use some s32 values in VOP3P operands, and won't see any intervening casts from a 32-bit fneg. Make sure it's really a packed fneg before folding.	2020-02-24 21:20:35 -05:00
Matt Arsenault	11e3dde625	GlobalISel: Reimplement fewerElementsVectorBasic Changes the handling of odd breakdowns, and avoids using G_EXTRACT/G_INSERT. Pad with undef to a wider size, and unmerge. Also avoid introducing instructions for the fully undef components.	2020-02-24 21:19:47 -05:00
Jay Foad	0ed4744bb5	AMDGPU/GlobalISel: Lower 64-bit uaddo/usubo Summary: Add more test cases for signed and unsigned add/sub with overflow. Reviewers: arsenm, rampitec, kerbowa Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75051	2020-02-24 23:08:14 +00:00
Mark Searles	d3e170c438	Revert "[AMDGPU] Don’t marke the .note section as ALLOC" This reverts commit `977cd661cf`. It breaks OpenCL testing. OpenCL Runtime is using PT_LOAD information to calculate memory for global variables. This commit should be relanded once the OpenCL runtime stops relying on PT_LOAD information for calculating global variable memory size. Differential Revision: https://reviews.llvm.org/D74995	2020-02-21 16:08:30 -08:00
Jay Foad	b72f1448ce	AMDGPU/GlobalISel: Better code for one case of G_SHUFFLE_VECTOR on v2i16 Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74987	2020-02-21 21:16:39 +00:00
Matt Arsenault	00955a62e4	AMDGPU/GlobalISel: Fix SALU mapping for v2s16 min/max The legalizer helper functions are unusably awkward to perform the 3-5 part legalization. This needs to be widened, scalarized, lowered, and we should avoid creating vector extends and truncates. Manually do all of this and expand.	2020-02-21 14:02:16 -05:00
Matt Arsenault	db06870dbd	AMDGPU: Move dot intrinsic patterns to instruction def I tried to use some of the new tablegen features to avoid creating different operand list permutations, but I still don't see a way to programmatically build a source pattern dag. Also add GlobalISel tests, which now all import successfully. Some of the fneg fold tests are incorrect, which need to be fixed in a future commit	2020-02-21 13:35:40 -05:00
Matt Arsenault	4c1c9422a3	AMDGPU/GlobalISel: Select llvm.amdgcn.fdot2 I'm slighly worried about the generated checks, since they won't catch incorrect modifiers being added at the end of the line.	2020-02-21 13:35:40 -05:00
Matt Arsenault	dfce5fd50a	AMDGPU/GlobalISel: Select VOP3P instructions This only handles the basic cases. More work is needed to make better use of op_sel.	2020-02-21 13:35:40 -05:00
Matt Arsenault	72eef820d5	AMDGPU/GlobalISel: Select G_SHUFFLE_VECTOR G_SHUFFLE_VECTOR is legal since it theoretically may help match op_sel for VOP3P instructions. Expand it in some other way in case it doesn't fold into the use instructions.	2020-02-21 13:35:40 -05:00
Jay Foad	cab39e4b8c	GlobalISel: Fix narrowing of (G_ASHR i64:x, 32) Reviewers: arsenm Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, volkan, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74950	2020-02-21 16:51:03 +00:00
Matt Arsenault	6a479220b5	AMDGPU/GlobalISel: Commit test changes I forgot to squash These should have been in `ac7abe0ba9`	2020-02-21 11:43:39 -05:00
Matt Arsenault	043ed2e22a	AMDGPU/GlobalISel: Fix xnor matching We should try the generated matchers before the manual selection. This means the patterns are now handling the common cases, but the manual selection code is not yet dead. It's still handling the non-s32/s64 cases (like v2s16 and v2s32). Currently tablegen doesn't have a nice way to have a single pattern that covers multiple types.	2020-02-21 11:42:49 -05:00
Matt Arsenault	89dc8fe622	AMDGPU/GlobalISel: Precommit xnor matching test	2020-02-21 11:09:59 -05:00
Matt Arsenault	ac7abe0ba9	AMDGPU/GlobalISel: Manually select G_BUILD_VECTOR_TRUNC We have patterns for s_pack* selection, but they assume the inputs are a build_vector with 16-bit inputs, not a truncating build vector. Since there's still outstanding work for how to handle mismatched result and source element vector operations, and since I'm trying a different packed vector strategy than SelectionDAG, just manually select this for now.	2020-02-21 10:34:11 -05:00
Matt Arsenault	79ff188add	AMDGPU/GlobalISel: Legalize G_FPOW There are few differences from the DAG handling. First, the DAG handling uses a primitive selection pattern instead of custom legalizing it. Because of this, this makes use of source modifiers while the DAG does not. Also instead of promoting f16, try to use the f16 log/exp. There's no f16 fmul_legacy, so widen just for the multiply, although I'm not sure that's the best solution.	2020-02-21 10:31:13 -05:00
Matt Arsenault	fab4cdea39	AMDGPU/GlobalISel: Select llvm.amdgcn.fmul.legacy	2020-02-21 10:30:26 -05:00
Matt Arsenault	b64aa8c715	AMDGPU/GlobalISel: Fix constant bus violation with source modifiers This looked through copies to find the source modifiers, which may have been SGPR->VGPR copies added to avoid potential constant bus violations. Re-insert a copy to a VGPR if this happens.	2020-02-21 10:30:23 -05:00
Nicolai Hähnle	32e4e71966	test/CodeGen/AMDGPU: Add a test case that shows a miscompilation Related to https://reviews.llvm.org/D74908 Change-Id: I6ebf3b5c7a32493016994f30d6796c41e95aecde	2020-02-21 13:38:24 +01:00
Simon Pilgrim	fc2b4a02b1	[DAGCombine] visitEXTRACT_VECTOR_ELT - add SimplifyDemandedBits multi use support Similar to what we already do with SimplifyDemandedVectorElts, call SimplifyDemandedBits across all the extracted elements of the source vector, treating it as single use. There's a minor regression in store-weird-sizes.ll which will be addressed in an upcoming SimplifyDemandedBits patch.	2020-02-20 15:49:38 +00:00
Matt Arsenault	083717cf49	AMDGPU: Fix v2i64<->v4f32 bitcast I'm not sure how to test the v2i64->v4f32 case since I can't think of any v2i64 cases that won't legalize to v4i32.	2020-02-20 09:49:09 -05:00
Sebastian Neubauer	977cd661cf	[AMDGPU] Don’t marke the .note section as ALLOC Marking a section as ALLOC tells the ELF loader to load the section into memory. As we do not want to load the notes into VRAM, the flag should not be there. Differential Revision: https://reviews.llvm.org/D74600	2020-02-20 15:14:48 +01:00
Simon Pilgrim	6085593c12	[AMDGPU] simplifyI24 - replace GetDemandedBits with SimplifyMultipleUseDemandedBits GetDemandedBits mostly just calls SimplifyMultipleUseDemandedBits now, but it does a very blunt constant simplification that SimplifyMultipleUseDemandedBits avoids. If we need to demand bits from constants we should handle this through ShrinkDemandedConstant/targetShrinkDemandedConstant. @arsenm confirmed that the sign extended immediates are better for code size. Differential Revision: https://reviews.llvm.org/D74857	2020-02-20 12:03:08 +00:00
dfukalov	dbfc682e2b	SpeculativeExecution: fixed ingoring free execution Summary: After updating cost model in AMDGPU target (`47a5c36b37`) the pass started to ignore some BBs since they got all instructions estimated as free. Reviewers: arsenm, chandlerc, nhaehnle Reviewed By: nhaehnle Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74825	2020-02-20 14:45:02 +03:00
Matt Arsenault	4bb0c8f91c	AMDGPU: Enable integer division bypass We probably want this, and I've meant to turn this on for a long time. SC actually emits a special case to early-out for a 1 denominator, which perhaps should also be considered.	2020-02-19 17:50:19 -05:00
Matt Arsenault	0b6ead018a	AMDGPU/GlobalISel: Cleanup min/max RegBankSelect tests Use common check prefix, although update_mir_test_checks makes this unnecessarily annoying. Also make sure to have uses in case that ever ends up mattering.	2020-02-19 17:32:25 -05:00
Simon Pilgrim	025ff5a4ea	[AMDGPU] Regenerate immediate constant tests	2020-02-19 18:58:44 +00:00
Matt Arsenault	ff4639f060	AMDGPU/GlobalISel: Select MUBUF path for global atomic cmpxchg I'm not sure why this isn't a pattern, but the DAG manually selects this.	2020-02-19 06:19:22 -08:00
Simon Pilgrim	4af8db317d	[AMDGPU] performCvtF32UByteNCombine - add SHL and SimplifyMultipleUseDemandedBits support This is part of the work to remove SelectionDAG::GetDemandedBits and just use SimplifyMultipleUseDemandedBits. Recent experiments raised some v_cvt_f32_ubyte*_e32 regressions, so I've added some additional abilities to performCvtF32UByteNCombine to help unpack byte data more aggressively. We still don't remove all OR(SHL,SRL) patterns as some of the regenerated nodes don't get combined again, but we are getting closer. Differential Revision: https://reviews.llvm.org/D74786	2020-02-19 11:45:57 +00:00
Matt Arsenault	37c452a289	AMDGPU/GlobalISel: Adjust branch target when lowering loop intrinsic This needs to steal the branch target like the other control flow intrinsics.	2020-02-18 06:35:40 -08:00
Matt Arsenault	5e8792453d	AMDGPU/GlobalISel: Fix RegBankSelect for G_SHUFFLE_VECTOR	2020-02-17 15:11:25 -05:00
Matt Arsenault	f742a28ae3	AMDGPU/GlobalISel: Custom lower 32-bit G_SDIV/G_SREM	2020-02-17 15:09:51 -05:00
Matt Arsenault	e240b27d6d	AMDGPU/GlobalISel: Allow arbitrary global values Treat unknown address spaces as global	2020-02-17 11:32:28 -08:00
Matt Arsenault	54137bbaaf	GlobalISel: Allow running localizer earlier This required legal and regbankselected MIR for seemingly no reason. For AMDGPU this wouldn't see legalized G_GLOBAL_VALUEs.	2020-02-17 11:24:06 -08:00
Matt Arsenault	96db12d507	AMDGPU/GlobalISel: Custom lower 32-bit G_UDIV/G_UREM AMDGPUCodeGenPrepare expands this most of the time, but not always. We will always at least need a fallback option here. This is the 3rd implementation of the same expansion in the backend. Eventually I would like to eliminate the IR expansion (and the DAG version obviously). Currently the new legalizer path produces a better result, since the IR expansion results in extra operations which need to be combined out. Notably, the IR expansion results in multiplies by 0.	2020-02-17 11:05:50 -08:00

3244 Commits