llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	09ed0e44d9	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.tbuffer.load	2020-01-27 13:40:37 -05:00
Stanislav Mekhanoshin	53eb0f8c07	[AMDGPU] Attempt to reschedule withou clustering We want to have more load/store clustering but we also want to maintain low register pressure which are oposit targets. Allow scheduler to reschedule regions without mutations applied if we hit a register limit. Differential Revision: https://reviews.llvm.org/D73386	2020-01-27 10:27:16 -08:00
Matt Arsenault	97711228fd	AMDGPU/GlobalISel: Select llvm.amdgcn.struct.buffer.load.format	2020-01-27 13:23:35 -05:00
Matt Arsenault	ce7ca2caf2	AMDGPU/GlobalISel: Select llvm.amdgcn.struct.buffer.load	2020-01-27 13:05:55 -05:00
Matt Arsenault	198624c39d	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.load.format	2020-01-27 13:02:19 -05:00
Matt Arsenault	fc90222a91	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.load Use intermediate instructions, unlike with buffer stores. This is necessary because of the need to have an internal way to distinguish between signed and unsigned extloads. This introduces some duplication and near duplication with the buffer store selection path. The store handling should maybe be moved into legalization to match and eliminate the duplication.	2020-01-27 12:49:23 -05:00
Matt Arsenault	e60d658260	AMDGPU/GlobalISel: Handle VOP3NoMods	2020-01-27 09:03:44 -08:00
Matt Arsenault	0968234590	AMDGPU/GlobalISel: Minor refactor of MUBUF complex patterns This will make it easier to support the small variants in the complex patterns for atomics.	2020-01-27 09:00:00 -08:00
Matt Arsenault	bef27175c7	AMDGPU: Fix not using f16 fsin/fcos I noticed this because this accidentally started working for GlobalISel.	2020-01-27 08:59:59 -08:00
Matt Arsenault	a1d33ce73a	AMDGPU/GlobalISel: Custom legalize v2s16 G_SHUFFLE_VECTOR Try to keep simple v2s16 cases as-is. This will more naturally map to how the VOP3P op_sel modifiers work compared to the expansion involving bitcasts and bitshifts. This could maybe try harder with wider source vector types, although that could be handled with a pre-legalize combine.	2020-01-27 08:28:05 -08:00
Matt Arsenault	4e69df091d	Revert "AMDGPU: Temporary drop s_mul_hi_i/u32 patterns" This reverts commit `fe23ed2c68`. It was never really clear this was responsible for the performance regressions that caused this to be reverted. It's been a long time, and we need to have scalar patterns for this to get GlobalISel working.	2020-01-27 08:07:21 -08:00
Matt Arsenault	bc3d900fa5	AMDGPU/GlobalISel: Fix not using global atomics on gfx9+ For some reason the flat/global atomics end up in the generated matcher table in a different order from SelectionDAG. Use AddedComplexity to prefer checking for global atomics first.	2020-01-27 07:42:42 -08:00
Matt Arsenault	ac0b9b4ccf	AMDPGPU/GlobalISel: Select more MUBUF global addressing modes The handling of the high bits of the resource descriptor seem weird to me, where the 3rd dword changes based on the instruction.	2020-01-27 07:28:36 -08:00
Matt Arsenault	fdaad485e6	AMDGPU/GlobalISel: Initial selection of MUBUF addr64 load/store Fixes the main reason for compile failures on SI, but doesn't really try to use the addressing modes yet.	2020-01-27 07:13:56 -08:00
Matt Arsenault	2214bc81d0	AMDGPU: Allow i16 shader arguments Not allowing this just creates unnecessary complications when writing simple tests.	2020-01-27 06:55:32 -08:00
Jay Foad	1bf00219fc	[AMDGPU] Handle multiple base operands in areMemAccessesTriviallyDisjoint Summary: This is in preparation for getMemOperandsWithOffset returning more base operands. Depends on D73455. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73456	2020-01-27 14:45:21 +00:00
Jay Foad	6461eadf8f	[AMDGPU] Handle multiple base operands in shouldClusterMemOps Summary: This is in preparation for getMemOperandsWithOffset returning more base operands. Depends on D73454. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73455	2020-01-27 14:45:21 +00:00
Jay Foad	fcf5254fa7	[AMDGPU] Handle frame index base operands in memOpsHaveSameBasePtr Summary: This is in preparation for getMemOperandsWithOffset returning more base operands. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, arphaman, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73454	2020-01-27 14:45:21 +00:00
vpykhtin	4332f1a4c8	[AMDGPU] Fix GCN regpressure trackers for INLINEASM instructions. Differential revision: https://reviews.llvm.org/D73338	2020-01-27 17:25:25 +03:00
Matt Arsenault	2a160ba5b0	GlobalISel: Reimplement widenScalar for G_UNMERGE_VALUES results Only use shifts if the requested type exactly matches the source type, and create sub-unmerges otherwise.	2020-01-27 06:18:26 -08:00
Maheaha Shivamallappa	66f93071cd	AMDGPU/GlobalISel: Clean-up code around ISel for Intrinsics. Summary: A minor code clean-up around ISel for intrinsic llvm.amdgcn.end.cf() Reviewers: arsenm, mshivama Reviewed By: arsenm Tags: #llvm Differential Revision: https://reviews.llvm.org/D73358	2020-01-26 14:09:31 +05:30
Tom Stellard	cb297050bb	AMDGPU/SILoadStoreOptimizer: Fix uninitialized variable error This was introduced by `86c944d790` and caught by the sanitizer-x86_64-linux-fast bot.	2020-01-24 21:53:05 -08:00
Tom Stellard	86c944d790	AMDGPU/SILoadStoreOptimizer: Improve merging of out of order offsets Summary: This improves merging of sequences like: store a, ptr + 4 store b, ptr + 8 store c, ptr + 12 store d, ptr + 16 store e, ptr + 20 store f, ptr Prior to this patch the basic block was scanned in order to find instructions to merge and the above sequence would be transformed to: store4 <a, b, c, d>, ptr + 4 store e, ptr + 20 store r, ptr With this change, we now sort all the candidate merge instructions by their offset, so instructions are visited in offset order rather than in the order they appear in the basic block. We now transform this sequnce into: store4 <f, a, b, c>, ptr store2 <d, e>, ptr + 16 Another benefit of this change is that since we have sorted the mergeable lists by offset, we can easily check if an instruction is mergeable by checking the offset of the instruction that becomes before or after it in the sorted list. Once we determine an instruction is not mergeable we can remove it from the list and avoid having to do the more expensive mergeablilty checks. Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin Reviewed By: arsenm, nhaehnle Subscribers: kerbowa, merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65966	2020-01-24 19:45:56 -08:00
Matt Arsenault	3b93945587	AMDGPU/GlobalISel: Select wqm, softwqm and wwm intrinsics	2020-01-24 13:06:44 -08:00
Matt Arsenault	87c46a3129	AMDGPU: Don't error on ds.ordered intrinsic in function These should be assumed to be called from a compute context. Also don't use a 2 entry switch over constants.	2020-01-24 13:06:44 -08:00
Stanislav Mekhanoshin	be8e38cbd9	Correct NumLoads in clustering Scheduler sends NumLoads argument into shouldClusterMemOps() one less the actual cluster length. So for 2 instructions it will pass just 1. Correct this number. This is NFC for in tree targets. Differential Revision: https://reviews.llvm.org/D73292	2020-01-24 12:45:28 -08:00
Matt Arsenault	84e035d8f1	AMDGPU: Don't check constant address space for atomic stores We define a separate list for storable address spaces. This saves entry in the matcher table address space list.	2020-01-24 12:15:09 -08:00
Stanislav Mekhanoshin	555d8f4ef5	[AMDGPU] Bundle loads before post-RA scheduler We are relying on atrificial DAG edges inserted by the MemOpClusterMutation to keep loads and stores together in the post-RA scheduler. This does not work all the time since it allows to schedule a completely independent instruction in the middle of the cluster. Removed the DAG mutation and added pass to bundle already clustered instructions. These bundles are unpacked before the memory legalizer because it does not work with bundles but also because it allows to insert waitcounts in the middle of a store cluster. Removing artificial edges also allows a more relaxed scheduling. Differential Revision: https://reviews.llvm.org/D72737	2020-01-24 11:33:38 -08:00
Stanislav Mekhanoshin	44b865fa7f	[AMDGPU] Allow narrowing muti-dword loads Currently BE allows only a little load narrowing because of the fear it will produce sub-dword ext loads. However, we can always allow narrowing if we are shrinking one multi-dword load to another multi-dword load. In particular we were unable to reduce s_load_dwordx8 into s_load_dwordx4 if identity shuffle was used to extract low 4 dwords. Differential Revision: https://reviews.llvm.org/D73133	2020-01-24 11:03:41 -08:00
Austin Kerbow	c226646337	Resubmit: [DA][TTI][AMDGPU] Add option to select GPUDA with TTI Summary: Enable the new diveregence analysis by default for AMDGPU. Resubmit with test updates since GPUDA was causing failures on Windows. Reviewers: rampitec, nhaehnle, arsenm, thakis Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73315	2020-01-24 10:39:40 -08:00
Guillaume Chatelet	805c157e8a	[Alignment][NFC] Deprecate Align::None() Summary: This is a follow up on https://reviews.llvm.org/D71473#inline-647262. There's a caveat here that `Align(1)` relies on the compiler understanding of `Log2_64` implementation to produce good code. One could use `Align()` as a replacement but I believe it is less clear that the alignment is one in that case. Reviewers: xbolva00, courbet, bollu Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73099	2020-01-24 12:53:58 +01:00
Changpeng Fang	2531535984	AMDGPU: Implement FDIV optimizations in AMDGPUCodeGenPrepare Summary: RCP has the accuracy limit. If FDIV fpmath require high accuracy rcp may not meet the requirement. However, in DAG lowering, fpmath information gets lost, and thus we may generate either inaccurate rcp related computation or slow code for fdiv. In patch implements fdiv optimizations in the AMDGPUCodeGenPrepare, which could exactly know !fpmath. FastUnsafeRcpLegal: We determine whether it is legal to use rcp based on unsafe-fp-math, fast math flags, denormals and fpmath accuracy request. RCP Optimizations: 1/x -> rcp(x) when fast unsafe rcp is legal or fpmath >= 2.5ULP with denormals flushed. a/b -> a*rcp(b) when fast unsafe rcp is legal. Use fdiv.fast: a/b -> fdiv.fast(a, b) when RCP optimization is not performed and fpmath >= 2.5ULP with denormals flushed. 1/x -> fdiv.fast(1,x) when RCP optimization is not performed and fpmath >= 2.5ULP with denormals. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D71293	2020-01-23 16:57:43 -08:00
Matt Arsenault	86e5b56a7c	AMDGPU/GlobalISel: Fix RegBanKSelect for llvm.amdgcn.exp.compr This wasn't updated for the immarg handling change. We really need a verifier for this.	2020-01-23 13:30:46 -08:00
Matt Arsenault	fac9941e57	AMDGPU: Fix ubsan error Since register classes go up to 1024, 32 elements, all masks bits are needed and a 32-bit shift by 32 is illegal. We didn't have any instructions theoretically using a 32 element VGPR before `d1dbb5e471`	2020-01-23 15:05:47 -05:00
Matt Arsenault	618fa77ae4	AMDGPU/GlobalISel: Select V_ADD3_U32/V_XOR3_B32 The other 3-op patterns should also be theoretically handled, but currently there's a bug in the inferred pattern complexity. I'm not sure what the error handling strategy should be for potential constant bus violations. I think the correct strategy is to never produce mixed SGPR and VGPR operands in a typical VOP instruction, which will trivially avoid them. However, it's possible to still have hand written MIR (or erroneously transformed code) with these operands. When these fold, the restriction will be violated. We currently don't have any verifiers for reg bank legality. For now, just ignore the restriction. It might be worth triggering a DAG fallback on verifier error.	2020-01-23 12:04:20 -05:00
Guillaume Chatelet	59f95222d4	[Alignment][NFC] Use Align with CreateAlignedStore Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, bollu Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73274	2020-01-23 17:34:32 +01:00
Matt Arsenault	dfec702290	AMDGPU: Check for other uses when looking through casted select Fixes mesa regression on ext_transform_feedback-max-varyings	2020-01-23 11:31:24 -05:00
Guillaume Chatelet	279fa8e006	[Alignement][NFC] Deprecate untyped CreateAlignedLoad Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73260	2020-01-23 13:34:32 +01:00
Matt Arsenault	4d14772f5c	AMDGPU/GlobalISel: Remove redundant or patterns These ended up with higher priority than or3 patterns in a future patch. This also fixes the using VOP2 forms.	2020-01-22 21:45:51 -05:00
Jan Vesely	1b8eab179d	AMDGPU/R600: Emit rodata in text segment R600 relies on this behaviour. Fixes: `6e18266aa4` ('Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0"') Fixes ~100 piglit regressions since `6e18266` Differential Revision: https://reviews.llvm.org/D72991	2020-01-22 14:31:51 -05:00
Nico Weber	cd470717d1	Revert "[DA][TTI][AMDGPU] Add option to select GPUDA with TTI" This reverts commit `a90a6502ab`. Broke tests on Windows: http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/13808	2020-01-22 12:56:19 -05:00
Matt Arsenault	1192d7b254	AMDGPU/GlobalISel: Handle 16-bank LDS llvm.amdgcn.interp.p1.f16 The pattern is also mishandled by the generated matcher, so workaround this as in the DAG path. The existing DAG tests aren't particularly targeted to just this one intrinsic. These also end up differing in scheduling from SGPR->VGPR operand constraint copies.	2020-01-22 12:10:59 -05:00
Matt Arsenault	c05f23e409	AMDGPU/GlobalISel: Select llvm.amdgcn.mov.dpp This is deprecated, but easy to support.	2020-01-22 11:43:53 -05:00
Matt Arsenault	dd09ec1208	AMDGPU/GlobalISel: Select llvm.amdgcn.mov.dpp8	2020-01-22 11:43:40 -05:00
Matt Arsenault	0bf434ccd5	AMDGPU: Fix element size assertion The GlobalISel usage called this with bits, but the DAG usage was incorrectly using bytes.	2020-01-22 11:18:45 -05:00
Matt Arsenault	bb562d1af0	AMDGPU/GlobalISel: Keep G_BITCAST out of waterfall loop The waterfall utility function blindly inserts a phi for every def in the loop. We don't need this one to be preserved for every iteration. Saves an extra phi and copy inside the loop body.	2020-01-22 11:16:19 -05:00
Matt Arsenault	52ec7379ad	AMDGPU/GlobalISel: Fold add of constant into G_INSERT_VECTOR_ELT Move the subregister base like in the extract case.	2020-01-22 11:09:15 -05:00
Matt Arsenault	d1dbb5e471	AMDGPU/GlobalISel: Select G_INSERT_VECTOR_ELT	2020-01-22 11:00:49 -05:00
Matt Arsenault	3524d4412c	AMDGPU/GlobalISel: Fix RegBankSelect for G_INSERT_VECTOR_ELT The result and source vector are going to be tied, so these need to be the same bank. The inserted value also needs to be broken down based on the result bank, not the inserted value itself.	2020-01-22 10:57:50 -05:00
Matt Arsenault	e3d352c541	AMDGPU/GlobalISel: Fold constant offset vector extract indexes Handle dynamic vector extracts that use an index that's an add of a constant offset into moving the base subregister of the indexing operation. Force the add into the loop in regbankselect, which will be recognized when selected.	2020-01-22 10:50:59 -05:00
Matt Arsenault	e93e1b621c	AMDGPU: Fix typo	2020-01-22 10:17:46 -05:00
Matt Arsenault	2fe500ab5b	AMDGPU: Look through casted selects to constant fold bin ops The promotion of the uniform select to i32 interfered with this fold.	2020-01-22 10:16:39 -05:00
Matt Arsenault	bcd91778fe	AMDGPU: Do binop of select of constant fold in AMDGPUCodeGenPrepare DAGCombiner does this, but divisions expanded here miss this optimization. Since `67aa18f165`, divisions have been expanded here and missed out on this optimization. Avoids test regressions in a future patch.	2020-01-22 10:16:39 -05:00
Matt Arsenault	a174f0da62	AMDGPU/GlobalISel: Add pre-legalize combiner pass Just copy the AArch64 pass as-is for now, except for removing the memcpy handling.	2020-01-22 10:16:39 -05:00
Jay Foad	e0f0d0e55c	[MachineScheduler] Allow clustering mem ops with complex addresses The generic BaseMemOpClusterMutation calls into TargetInstrInfo to analyze the address of each load/store instruction, and again to decide whether two instructions should be clustered. Previously this had to represent each address as a single base operand plus a constant byte offset. This patch extends it to support any number of base operands. The old target hook getMemOperandWithOffset is now a convenience function for callers that are only prepared to handle a single base operand. It calls the new more general target hook getMemOperandsWithOffset. The only requirements for the base operands returned by getMemOperandsWithOffset are: - they can be sorted by MemOpInfo::Compare, such that clusterable ops get sorted next to each other, and - shouldClusterMemOps knows what they mean. One simple follow-on is to enable clustering of AMDGPU FLAT instructions with both vaddr and saddr (base register + offset register). I've left a FIXME in the code for this case. Differential Revision: https://reviews.llvm.org/D71655	2020-01-22 14:28:24 +00:00
Matt Arsenault	70096ca111	AMDGPU/GlobalISel: Fix RegbankSelect for llvm.amdgcn.fmul.legacy	2020-01-22 09:26:17 -05:00
Matt Arsenault	a722cbf77c	AMDGPU/GlobalISel: Handle atomic_inc/atomic_dec The intermediate instruction drops the extra volatile argument. We are missing an atomic ordering on these.	2020-01-22 09:26:17 -05:00
Matt Arsenault	9c928649a0	AMDGPU: Fix interaction of tfe and d16 This using the wrong result register, and dropping the result entirely for v2f16. This would fail to select on the scalar case. I believe it was also mishandling packed/unpacked subtargets.	2020-01-22 09:26:17 -05:00
Matt Arsenault	b94d3b9b77	AMDGPU/GlobalISel: RegBankSelect interp intrinsics Note this assumes the future use of immediates for immarg, not the current G_CONSTANT which will be emitted.	2020-01-22 09:01:34 -05:00
Matt Arsenault	64e9528201	AMDGPU: Fix missing immarg on llvm.amdgcn.interp.mov The first operand maps to an immediate field, so this should be immarg.	2020-01-22 09:01:34 -05:00
Austin Kerbow	a90a6502ab	[DA][TTI][AMDGPU] Add option to select GPUDA with TTI Summary: Enable the new diveregence analysis by default for AMDGPU. Reviewers: rampitec, nhaehnle, arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73049	2020-01-21 21:13:20 -08:00
Carl Ritson	6b4b3e2856	[AMDGPU] SIRemoveShortExecBranches should not remove branches exiting loops Summary: Check that a s_cbranch_execz is not a loop exit before removing it. As the pass is generating infinite loops. Reviewers: cdevadas, arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, tpr, t-tye, hiraditya, kerbowa, llvm-commits, dstuttard, foad Tags: #llvm Differential Revision: https://reviews.llvm.org/D72997	2020-01-22 13:18:40 +09:00
cdevadas	e53a9d96e6	Resubmit: [AMDGPU] Invert the handling of skip insertion. The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a sideeffect and the skip branch is really necessary. This new pass will replace the handling of skip insertion in the existing SIInsertSkip Pass. Differential revision: https://reviews.llvm.org/D68092	2020-01-22 13:18:32 +09:00
Amara Emerson	67a8775322	[AArch64] Don't generate gpr CSEL instructions in early-ifcvt if regclasses aren't compatible. In GlobalISel we may in some unfortunate circumstances generate PHIs with operands that are on separate banks. If-conversion doesn't currently check for that case and ends up generating a CSEL on AArch64 with incorrect register operands. Differential Revision: https://reviews.llvm.org/D72961	2020-01-21 16:51:31 -08:00
Matt Arsenault	e47965bf64	AMDGPU/GlobalISel: Merge trivial legalize rules Also move constant-like rules together	2020-01-21 17:37:19 -05:00
Matt Arsenault	9a5a6e9465	AMDGPU/GlobalISel: Merge G_PTR_ADD/G_PTR_MASK rules	2020-01-21 16:57:01 -05:00
Matt Arsenault	fd109308a7	AMDGPU/GlobalISel: Legalize G_PTR_ADD for arbitrary pointers Pointers of unrecognized address spaces shoudl be treated as global-like pointers. Even if loads and stores of them aren't handled, dumb operations that just operate on the bits should work.	2020-01-21 16:35:36 -05:00
Krzysztof Parzyszek	020041d99b	Update spelling of {analyze,insert,remove}Branch in strings and comments These names have been changed from CamelCase to camelCase, but there were many places (comments mostly) that still used the old names. This change is NFC.	2020-01-21 10:15:38 -06:00
Nicolai Hähnle	a80291ce10	Revert "[AMDGPU] Invert the handling of skip insertion." This reverts commit `0dc6c249bf`. The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for Mesa.	2020-01-21 09:17:25 +01:00
Fangrui Song	5721483b64	[AMDGPU] Fix -Wunused-variable after `e5823bf806`	2020-01-20 22:41:13 -08:00
Matt Arsenault	c72aa27f91	AMDDGPU/GlobalISel: Fix RegBankSelect for llvm.amdgcn.ps.live	2020-01-20 23:21:53 -05:00
Matt Arsenault	e5823bf806	AMDGPU: Don't create weird sized integers There's no reason to introduce a new, unnaturally sized value here. This has a chance to produce worse code with legalization. Avoids regression in a future patch.	2020-01-20 20:02:54 -05:00
Matt Arsenault	9b13b4a0e3	AMDGPU: Prepare to use scalar register indexing Define pseudos mirroring the the VGPR indexing ones, and adjust the operands in the s_movrel* instructions to avoid the result def.	2020-01-20 17:19:16 -05:00
Matt Arsenault	8615eeb455	AMDGPU: Partially merge indirect register write handling `a785209bc2` switched to using a pseudos instead of manually tying operands on the regular instruction. The VGPR indexing mode path should have the same problems that change attempted to avoid, so these should use the same strategy. Use a single pseudo for the VGPR indexing mode and movreld paths, and expand it based on the subtarget later. These have essentially the same constraints, reading the index from m0. Switch from using an offset to the subregister index directly, instead of computing an offset and re-adding it back. Also add missing pseudos for existing register class sizes.	2020-01-20 17:19:16 -05:00
Matt Arsenault	f6418d72f5	AMDGPU/GlobalISel: Add documentation for RegisterBankInfo Document some high level strategies that should be used for register bank selection. The constant bus restriction section hasn't actually been implemented yet.	2020-01-20 15:41:25 -05:00
Fangrui Song	8e8a75ad50	[TargetRegisterInfo] Default trackLivenessAfterRegAlloc() to true Except AMDGPU/R600RegisterInfo (a bunch of MIR tests seem to have problems), every target overrides it with true. PostMachineScheduler requires livein information. Not providing it can cause assertion failures in ScheduleDAGInstrs::addSchedBarrierDeps().	2020-01-19 14:20:37 -08:00
Michael Liao	6d0d86a64d	[DAG] Add helper for creating constant vector index with correct type. NFC.	2020-01-18 01:23:36 -05:00
Matt Arsenault	592de0009f	AMDGPU/GlobalISel: Select llvm.amdgcn.update.dpp The existing test is overly reliant on -mattr=-flat-for-global, and some missing optimizations to re-use.	2020-01-17 20:09:53 -05:00
Matt Arsenault	ec9628318d	AMDGPU/GlobalISel: Select DS append/consume	2020-01-17 20:09:53 -05:00
Stanislav Mekhanoshin	eebdd85e7d	[AMDGPU] allow multi-dword flat scratch access since GFX9 This is supported starting with GFX9. Differential Revision: https://reviews.llvm.org/D72865	2020-01-17 10:47:03 -08:00
Matt Arsenault	886f9071c6	AMDGPU: Don't assert on a16 images on targets without FeatureR128A16 Currently the lowering for i16 image coordinates asserts on gfx10. I'm somewhat confused by this though. The feature is missing from the gfx10 feature lists, but the a16 bit appears to be present in the manual for MIMG instructions.	2020-01-17 11:07:00 -05:00
Matt Arsenault	117d4f1900	AMDGPU: Add register classes to MUBUF load patterns	2020-01-16 22:00:44 -05:00
Matt Arsenault	91e758b732	AMDGPU: Move permlane discard vdst_in optimization This case can be handled as a regular selection pattern, so move it out of the weird post-isel folding code which doesn't have an exactly equivalent place in GlobalISel. I think it doesn't make much sense to do this optimization here though, and it would be more useful in instcombine. There's not really any new information that will be gained during lowering since these inputs were known from the beginning.	2020-01-16 17:27:53 -05:00
Matt Arsenault	f5d98543b8	AMDGPU: Remove outdated comment	2020-01-16 14:54:27 -05:00
Matt Arsenault	e12b840abf	AMDGPU/GlobalISel: Improve lowering of G_SEXT_INREG Clamping the scalar is much better than lowering with superwide shifts for types > s64.	2020-01-16 14:29:37 -05:00
Matt Arsenault	4ca1ad85b7	AMDGPU/GlobalISel: Don't handle legacy buffer intrinsic	2020-01-16 11:31:12 -05:00
Matt Arsenault	9b2f3532c7	AMDGPU/GlobalISel: Select DS GWS intrinsics	2020-01-16 11:25:10 -05:00
Matt Arsenault	711a17afaf	AMDGPU/GlobalISel: Select exp with patterns This does produce slightly different code. Now a unique IMPLICIT_DEF is emitted for each of the implicit_def operands, rather than reusing the same one.	2020-01-15 18:33:15 -05:00
Matt Arsenault	eef92f25cc	AMDGPU: Remove custom node for exports I'm mildly worried about potentially reordering exp/exp_done with IntrWriteMem on the intrinsic. Requires hacking out the illegal type on SI, so manually select that case during lowering.	2020-01-15 18:33:15 -05:00
Mircea Trofin	5466597fee	[NFC] Refactor InlineResult for readability Summary: InlineResult is used both in APIs assessing whether a call site is inlinable (e.g. llvm::isInlineViable) as well as in the function inlining utility (llvm::InlineFunction). It means slightly different things (can/should inlining happen, vs did it happen), and the implicit casting may introduce ambiguity (casting from 'false' in InlineFunction will default a message about hight costs, which is incorrect here). The change renames the type to a more generic name, and disables implicit constructors. Reviewers: eraman, davidxl Reviewed By: davidxl Subscribers: kerbowa, arsenm, jvesely, nhaehnle, eraman, hiraditya, haicheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72744	2020-01-15 13:34:20 -08:00
Matt Arsenault	936483fb7d	GlobalISel: Implement lower for G_BITCAST Bitcast only really applies between scalars and vectors. Implement as an unmerge and remerge. The test needs to tolerate failure since one of the unmerges currently fails to legalize.	2020-01-15 08:58:58 -05:00
Matt Arsenault	bd7658a212	AMDGPU: Partially directly select llvm.amdgcn.interp.p1.f16 The 16 bank LDS case is complicated due to using multiple instructions. If I attempt to write a pattern for it, the generated selector incorrectly places the copy to m0 after the first instruction, so that needs to be separately addressed. Also fix not gluing the copy to m0 to the second operation in the second half of the 16 bank lowering.	2020-01-15 08:58:58 -05:00
cdevadas	0dc6c249bf	[AMDGPU] Invert the handling of skip insertion. The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a sideeffect and the skip branch is really necessary. This new pass will replace the handling of skip insertion in the existing SIInsertSkip Pass. Differential revision: https://reviews.llvm.org/D68092	2020-01-15 15:18:16 +05:30
Tom Stellard	0dbcb36394	CMake: Make most target symbols hidden by default Summary: For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF this change makes all symbols in the target specific libraries hidden by default. A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these libraries public, which is mainly needed for the definitions of the LLVMInitialize* functions. This patch reduces the number of public symbols in libLLVM.so by about 25%. This should improve load times for the dynamic library and also make abi checker tools, like abidiff require less memory when analyzing libLLVM.so One side-effect of this change is that for builds with LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that access symbols that are no longer public will need to be statically linked. Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1): nm before/libLLVM-9svn.so \| grep ' [A-Zuvw] ' \| wc -l 36221 nm after/libLLVM-9svn.so \| grep ' [A-Zuvw] ' \| wc -l 26278 Reviewers: chandlerc, beanz, mgorny, rnk, hans Reviewed By: rnk, hans Subscribers: merge_guards_bot, luismarques, smeenai, ldionne, lenary, s.egerton, pzheng, sameer.abuasal, MaskRay, wuzish, echristo, Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54439	2020-01-14 19:46:52 -08:00
Michael Liao	01a4b83154	[codegen,amdgpu] Enhance MIR DIE and re-arrange it for AMDGPU. Summary: - `dead-mi-elimination` assumes MIR in the SSA form and cannot be arranged after phi elimination or DeSSA. It's enhanced to handle the dead register definition by skipping use check on it. Once a register def is `dead`, all its uses, if any, should be `undef`. - Re-arrange the DIE in RA phase for AMDGPU by placing it directly after `detect-dead-lanes`. - Many relevant tests are refined due to different register assignment. Reviewers: rampitec, qcolombet, sunfish Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72709	2020-01-14 19:26:15 -05:00
Stanislav Mekhanoshin	ad741853c3	[AMDGPU] Model distance to instruction in bundle This change allows to model the height of the instruction within a bundle for latency adjustment purposes. Differential Revision: https://reviews.llvm.org/D72669	2020-01-14 01:18:59 -08:00
Stanislav Mekhanoshin	eca4474587	[AMDGPU] Fix getInstrLatency() always returning 1 We do not have InstrItinerary so generic getInstLatency() was always defaulting to return 1 cycle. We need to use TargetSchedModel instead to compute an instruction's latency. Differential Revision: https://reviews.llvm.org/D72655	2020-01-14 01:08:30 -08:00
Matt Arsenault	203801425d	AMDGPU/GlobalISel: Select llvm.amdgcn.ds.ordered.{add\|swap}	2020-01-13 13:09:38 -05:00
Matt Arsenault	3d8f1b2d22	AMDGPU/GlobalISel: Set insert point after waterfall loop The current users of the waterfall loop utility functions do not make use of the restored original insert point. The insertion is either done, or they set the insert point somewhere else. A future change will want to insert instructions after the waterfall loop, but figuring out the point after the loop is more difficult than ensuring the insert point is there after the loop.	2020-01-13 12:51:05 -05:00
Matt Arsenault	ca19d7a399	AMDGPU/GlobalISel: Fix branch targets when emitting SI_IF The branch target needs to be changed depending on whether there is an unconditional branch or not. Loops also need to be similarly fixed, but compiling a simple testcase end to end requires another set of patches that aren't upstream yet.	2020-01-13 12:51:05 -05:00
Matt Arsenault	7d9b0a61c3	AMDGPU/GlobalISel: Simplify assert	2020-01-13 12:51:05 -05:00
Matt Arsenault	555e7ee04c	AMDGPU/GlobalISel: Don't use XEXEC class for SGPRs We don't use the xexec register classes for arbitrary values anymore. Avoids a test variance beween GlobalISel and SelectionDAG>	2020-01-12 22:44:51 -05:00
Matt Arsenault	a10527cd37	AMDGPU/GlobalISel: Copy type when inserting readfirstlane getDefIgnoringCopies will fail to find any def if no type is set if we try to use it on the use's operand, so propagate the type.	2020-01-12 22:44:51 -05:00
Fangrui Song	6fdd6a7b3f	[Disassembler] Delete the VStream parameter of MCDisassembler::getInstruction() The argument is llvm::null() everywhere except llvm::errs() in llvm-objdump in -DLLVM_ENABLE_ASSERTIONS=On builds. It is used by no target but X86 in -DLLVM_ENABLE_ASSERTIONS=On builds. If we ever have the needs to add verbose log to disassemblers, we can record log with a member function, instead of passing it around as an argument.	2020-01-11 13:34:52 -08:00
Michael Bedy	4a32cd11ac	[AMDGPU] Remove unnecessary v_mov from a register to itself in WQM lowering. Summary: - SI Whole Quad Mode phase is replacing WQM pseudo instructions with v_mov instructions. While this is necessary for the special handling of moving results out of WWM live ranges, it is not necessary for WQM live ranges. The result is a v_mov from a register to itself after every WQM operation. This change uses a COPY psuedo in these cases, which allows the register allocator to coalesce the moves away. Reviewers: tpr, dstuttard, foad, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71386	2020-01-10 23:01:19 -05:00
Stanislav Mekhanoshin	987bf8b6c1	Let targets adjust operand latency of bundles This reverts the AMDGPU DAG mutation implemented in D72487 and gives a more general way of adjusting BUNDLE operand latency. It also replaces FixBundleLatencyMutation with adjustSchedDependency callback in the AMDGPU, fixing not only successor latencies but predecessors' as well. Differential Revision: https://reviews.llvm.org/D72535	2020-01-10 14:56:53 -08:00
Matt Arsenault	bac995d978	AMDGPU/GlobalISel: Clamp G_ZEXT source sizes Also clamps G_SEXT/G_ANYEXT, but the implementation is more limited so fewer cases actually work.	2020-01-10 09:42:49 -05:00
Matt Arsenault	35c3d101ae	AMDGPU/GlobalISel: Select G_EXTRACT_VECTOR_ELT Doesn't try to do the fold into the base register of an add of a constant in the index like the DAG path does.	2020-01-09 19:52:24 -05:00
Matt Arsenault	5cabb8357a	AMDGPU/GlobalISel: Fix G_EXTRACT_VECTOR_ELT mapping for s-v case If an SGPR vector is indexed with a VGPR, the actual indexing will be done on the SGPR and produce an SGPR. A copy needs to be inserted inside the waterwall loop to the VGPR result.	2020-01-09 19:46:54 -05:00
Stanislav Mekhanoshin	cd69e4c74c	[AMDGPU] Fix bundle scheduling Bundles coming to scheduler considered free, i.e. zero latency. Fixed. Differential Revision: https://reviews.llvm.org/D72487	2020-01-09 15:56:36 -08:00
Matt Arsenault	b4a647449f	TableGen/GlobalISel: Add way for SDNodeXForm to work on timm The current implementation assumes there is an instruction associated with the transform, but this is not the case for timm/TargetConstant/immarg values. These transforms should directly operate on a specific MachineOperand in the source instruction. TableGen would assert if you attempted to define an equivalent GISDNodeXFormEquiv using timm when it failed to find the instruction matcher. Specially recognize SDNodeXForms on timm, and pass the operand index to the render function. Ideally this would be a separate render function type that looks like void renderFoo(MachineInstrBuilder, const MachineOperand&), but this proved to be somewhat mechanically painful. Add an optional operand index which will only be passed if the transform should only look at the one source operand. Theoretically it would also be possible to only ever pass the MachineOperand, and the existing renderers would check the parent. I think that would be somewhat ugly for the standard usage which may want to inspect other operands, and I also think MachineOperand should eventually not carry a pointer to the parent instruction. Use it in one sample pattern. This isn't a great example, since the transform exists to satisfy DAG type constraints. This could also be avoided by just changing the MachineInstr's arbitrary choice of operand type from i16 to i32. Other patterns have nontrivial uses, but this serves as the simplest example. One flaw this still has is if you try to use an SDNodeXForm defined for imm, but the source pattern uses timm, you still see the "Failed to lookup instruction" assert. However, there is now a way to avoid it.	2020-01-09 17:37:52 -05:00
Matt Arsenault	0ea3c7291f	GlobalISel: Handle llvm.read_register Compared to the attempt in `bdcc6d3d26`, this uses intermediate generic instructions.	2020-01-09 17:37:52 -05:00
Matt Arsenault	255cc5a760	CodeGen: Use LLT instead of EVT in getRegisterByName Only PPC seems to be using it, and only checks some simple cases and doesn't distinguish between FP. Just switch to using LLT to simplify use from GlobalISel.	2020-01-09 17:37:52 -05:00
Matt Arsenault	767aa507a4	AMDGPU/GlobalISel: Fix argument lowering for vectors of pointers When these arguments are broken down by the EVT based callbacks, the pointer information is lost. Hack around this by coercing the register types to be the expected pointer element type when building the remerge operations.	2020-01-09 16:29:44 -05:00
Matt Arsenault	35ad66fae8	AMDGPU/GlobalISel: Widen 16-bit shift amount sources This should be legal, but will require future selection work. 16-bit shift amounts were already removed from being legal, but this didn't adjust the transformation rules.	2020-01-09 16:29:44 -05:00
Matt Arsenault	9ffd0ed838	AMDGPU/GlobalISel: Fix import of integer med3 This isn't too useful now, since nothing is currently trying to form min/max from cmp+select.	2020-01-09 10:29:32 -05:00
Matt Arsenault	c66b2e1c87	AMDGPU: Eliminate more legacy codepred address space PatFrags These should now be limited to R600 code.	2020-01-09 10:29:32 -05:00
Matt Arsenault	3766f4bacc	AMDGPU: Use new PatFrag system for d16 stores	2020-01-09 10:29:32 -05:00
Matt Arsenault	c1d4963b44	AMDGPU: Use new PatFrag system for d16 load nodes	2020-01-09 10:29:32 -05:00
Matt Arsenault	7d67742160	AMDGPU/GlobalISel: Fix import of zext of s16 op patterns	2020-01-09 10:29:32 -05:00
Matt Arsenault	e71af77568	AMDGPU/GlobalISel: Add IMMPopCount xform Partially fixes BFE pattern import.	2020-01-09 10:29:32 -05:00
Matt Arsenault	79450a4ea2	AMDGPU/GlobalISel: Add selectVOP3Mods_nnan This doesn't enable any new imports yet, but moves the fmed patterns from failing on this to hitting the "complex suboperand referenced more than once" limitation in tablegen.	2020-01-09 10:29:32 -05:00
Matt Arsenault	d964086c62	AMDGPU/GlobalISel: Add equiv xform for bitcast_fpimm_to_i32 Only partially fixes one pattern import.	2020-01-09 10:29:31 -05:00
Matt Arsenault	3952748ffd	AMDGPU/GlobalISel: Fix add of neg inline constant pattern	2020-01-09 10:29:31 -05:00
Matt Arsenault	db7c920779	AMDGPU: Add register class to DS_SWIZZLE_B32 pattern Reduces diff for a future patch.	2020-01-09 10:29:31 -05:00
Ehud Katz	24b326cc61	[APFloat] Fix checked error assert failures `APFLoat::convertFromString` returns `Expected` result, which must be "checked" if the LLVM_ENABLE_ABI_BREAKING_CHECKS preprocessor flag is set. To mark an `Expected` result as "checked" we must consume the `Error` within. In many cases, we are only interested in knowing if an error occured, without the need to examine the error info. This is achieved, easily, with the `errorToBool()` API.	2020-01-09 09:42:32 +02:00
Michael Liao	07a569a053	[amdgpu] Remove unused header. NFC.	2020-01-08 11:32:09 -05:00
Matt Arsenault	22700f68e1	AMDGPU: Annotate EXTRACT_SUBREGs with source register classes This partially fixes GlobalISel import of the patterns, but removes a lot of entriess from the end of the skipped pattern log.	2020-01-07 21:56:16 -05:00
Matt Arsenault	6652cc0cf7	AMDGPU/GlobalISel: Fix scalar G_SELECT for arbitrary pointers `4e85ca9562` missed updating the legal condition type set for pointers with any unrecognized address space.	2020-01-07 16:36:31 -05:00
Matt Arsenault	4844bf0fe2	AMDGPU: Apply i16 add->sub pattern with zext to i32 This was only applying the deeper nested zext pattern, and missing the special case code size fold.	2020-01-07 16:36:31 -05:00
Matt Arsenault	c3a10faadc	AMDGPU: Remove VOP3Mods0Clamp0OMod Now that overridable default operands work, there's no reason to use complex patterns to just produce 0s.	2020-01-07 15:10:08 -05:00
Matt Arsenault	de46ab698b	AMDGPU: Fix misleading, misplaced end block comments	2020-01-07 15:10:08 -05:00
Matt Arsenault	bd8d696c14	AMDGPU: Use ImmLeaf	2020-01-07 15:10:07 -05:00
Matt Arsenault	68e70fb098	AMDGPU: Fix not using v_cvt_f16_[iu]16 We weren't treating i16->f16 casts as legal on targets with these instructions, and always using a pair of casts through i32.	2020-01-07 15:10:07 -05:00
Matt Arsenault	78b30a54c9	AMDGPU/GlobalISel: Fix readfirstlane pattern import The imm folding optimization pattern failed to import. The instruction pattern was already working, but failing to fail on SGPR inputs.	2020-01-07 11:07:08 -05:00
Matt Arsenault	e699c03c9b	AMDGPU/GlobalISel: Fix import of s_abs_i32 pattern	2020-01-07 10:32:07 -05:00
Matt Arsenault	9150d6bd73	AMDGPU/GlobalISel: Select llvm.amdgcn.wqm.vote	2020-01-07 10:15:29 -05:00
Matt Arsenault	a428386d4a	AMDGPU/GlobalISel: Partially fix llvm.amdgcn.kill pattern import Tests deferred since the existing DAG test depends on some other operations, but isn't far from working as-is.	2020-01-07 10:09:59 -05:00
Simon Pilgrim	6ff1ea3244	Fix "use of uninitialized variable" static analyzer warning. NFCI.	2020-01-07 12:06:54 +00:00
Fangrui Song	3d87d0b925	[MC] Add parameter `Address` to MCInstrPrinter::printInstruction Follow-up of D72172. Reviewed By: jhenderson, rnk Differential Revision: https://reviews.llvm.org/D72180	2020-01-06 20:44:14 -08:00
Fangrui Song	aa708763d3	[MC] Add parameter `Address` to MCInstPrinter::printInst printInst prints a branch/call instruction as `b offset` (there are many variants on various targets) instead of `b address`. It is a convention to use address instead of offset in most external symbolizers/disassemblers. This difference makes `llvm-objdump -d` output unsatisfactory. Add `uint64_t Address` to printInst(), so that it can pass the argument to printInstruction(). `raw_ostream &OS` is moved to the last to be consistent with other print* methods. The next step is to pass `Address` to printInstruction() (generated by tablegen from the instruction set description). We can gradually migrate targets to print addresses instead of offsets. In any case, downstream projects which don't know `Address` can pass 0 as the argument. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D72172	2020-01-06 20:42:22 -08:00
Matt Arsenault	dc7b84c66c	AMDGPU/GlobalISel: Fix unused variable warning in release	2020-01-06 22:31:33 -05:00
Matt Arsenault	452f6243c9	AMDGPU: Select llvm.amdgcn.interp.p2.f16 directly This will enable automatic GlobalISel support in a future commit.	2020-01-06 20:34:21 -05:00
Matt Arsenault	e93b1ffc84	AMDGPU: Use default operands for clamp/omod We have a lot of complex pattern variants that just set the source modifiers that are really handled, and then set the output modifiers to 0. We're unlikely to ever match output modifiers from the use instruction side, and we already match clamp/omod in a separate pass.	2020-01-06 20:22:13 -05:00
Matt Arsenault	52afc93c38	AMDGPU/GlobalISel: Legalize G_READCYCLECOUNTER	2020-01-06 19:16:32 -05:00
Matt Arsenault	d4c9e13324	AMDGPU/GlobalISel: Select G_UADDE/G_USUBE	2020-01-06 18:27:52 -05:00
Matt Arsenault	4e85ca9562	AMDGPU/GlobalISel: Replace handling of boolean values This solves selection failures with generated selection patterns, which would fail due to inferring the SGPR reg bank for virtual registers with a set register class instead of VCC bank. Use instruction selection would constrain the virtual register to a specific class, so when the def was selected later the bank no longer was set to VCC. Remove the SCC reg bank. SCC isn't directly addressable, so it requires copying from SCC to an allocatable 32-bit register during selection, so these might as well be treated as 32-bit SGPR values. Now any scalar boolean value that will produce an outupt in SCC should be widened during RegBankSelect to s32. Any s1 value should be a vector boolean during selection. This makes the vcc register bank unambiguous with a normal SGPR during selection. Summary of how this should now work: - G_TRUNC is always a no-op, and never should use a vcc bank result. - SALU boolean operations should be promoted to s32 in RegBankSelect apply mapping - An s1 value means vcc bank at selection. The exception is for legalization artifacts that use s1, which are never VCC. All other contexts should infer the VCC register classes for s1 typed registers. The LLT for the register is now needed to infer the correct register class. Extensions with vcc sources should be legalized to a select of constants during RegBankSelect. - Copy from non-vcc to vcc ensures high bits of the input value are cleared during selection. - SALU boolean inputs should ensure the inputs are 0/1. This includes select, conditional branches, and carry-ins. There are a few somewhat dirty details. One is that G_TRUNC/G_*EXT selection ignores the usual register-bank from register class functions, and can't handle truncates with VCC result banks. I think this is OK, since the artifacts are specially treated anyway. This does require some care to avoid producing cases with vcc. There will also be no 100% reliable way to verify this rule is followed in selection in case of register classes, and violations manifests themselves as invalid copy instructions much later. Standard phi handling also only considers the bank of the result register, and doesn't insert copies to make the source banks match. This doesn't work for vcc, so we have to manually correct phi inputs in this case. We should add a verifier check to make sure there are no phis with mixed vcc and non-vcc register bank inputs. There's also some duplication with the LegalizerHelper, and some code which should live in the helper. I don't see a good way to share special knowledge about what types to use for intermediate operations depending on the bank for example. Using the helper to replace extensions with selects also seems somewhat awkward to me. Another issue is there are some contexts calling getRegBankFromRegClass that apparently don't have the LLT type for the register, but I haven't yet run into a real issue from this. This also introduces new unnecessary instructions in most cases, since we don't yet try to optimize out the zext when the source is known to come from a compare.	2020-01-06 18:26:42 -05:00
Matt Arsenault	f3de8ab5cc	GlobalISel: Implement lower for G_INTRINSIC_ROUND Mostly copied from AMDGPU lowering implementation, except used G_SITOFP instead of directly creating a select on -1.0, 0.0.	2020-01-06 18:26:42 -05:00
Matt Arsenault	ee6b8722ff	GlobalISel: Fix unsupported legalize action This would complain about invalid legalizer rules otherwise. Mark some operations as unsupported for AMDGPU. This currently seems to produce the same legalize error as when no rules are defined, but eventually this should produce a proper user facing error.	2020-01-06 17:21:51 -05:00
Matt Arsenault	7f2db2917d	AMDGPU: Fix legalizing f16 fpow The existing test only covered one case for r600. The use of mul_legacy also looks suspicious to me, but leave it for now. The patterns are also not making use of source modifiers.	2020-01-06 17:21:51 -05:00
Matt Arsenault	a506efff18	AMDGPU: Use ImmLeaf This solves one GlobalISel importer error, but the pattern still fails for another reason.	2020-01-06 17:21:51 -05:00
Matt Arsenault	14d25052a2	AMDGPU: Use ImmLeaf for inline immediate predicates	2020-01-06 17:21:51 -05:00
Simon Pilgrim	ea2c159f96	[AMDGPU] Fix "use of uninitialized variable" static analyzer warning. NFCI. Add "unreachable" default case to AMDGPUTargetStreamer::getArchNameFromElfMach	2020-01-06 16:36:56 +00:00
Matt Arsenault	e4464bf3d4	AMDGPU/GlobalISel: Select scalar v2s16 G_BUILD_VECTOR	2020-01-06 11:19:33 -05:00
Matt Arsenault	f1c85ecdfc	AMDGPU/GlobalISel: Select more G_EXTRACTs correctly This assumed a 32-bit extract size, which would produce invalid copies with 64-bit extracts. Handle the easy case. Ideally we would have a way to get the proper subreg index for any 32-bit offset, but there should probably be a tablegenerated way of getting the subreg index for any size and offset.	2020-01-06 11:10:13 -05:00
James Henderson	d68904f957	[NFC] Fix trivial typos in comments Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D72143 Patch by Kazuaki Ishizaki.	2020-01-06 10:50:26 +00:00
Ehud Katz	c5fb73c5d1	[APFloat] Add recoverable string parsing errors to APFloat Implementing the APFloat part in PR4745. Differential Revision: https://reviews.llvm.org/D69770	2020-01-06 10:09:01 +02:00
Matt Arsenault	d12f2a2998	GlobalISel: Scalarize all division operations This only handled G_SDIV, but they all are trivially scalarizable. Also define placeholder AMDGPU division legalizer rules.	2020-01-04 13:47:10 -05:00
Matt Arsenault	4e972224c4	AMDGPU/GlobalISel: Refine SMRD selection rules Fix selecting these for volatile global loads, and ensure the loads are constant enough.	2020-01-04 12:40:35 -05:00
Matt Arsenault	d9b5063b25	AMDGPU/GlobalISel: Legalize more odd sized loads The attempts to widen sufficently aligned, odd sized loads wasn't consistently applied.	2020-01-04 12:38:39 -05:00
Matt Arsenault	5fb59f16e2	AMDGPU/GlobalISel: Assume vcc phis for any vcc input This produces more intelligible looking results, more comparabble to the DAG output in the simplest cases. This is probably wrong in complex control flow, but RegBankSelect doesn't attempt analyzing if this is on a masked path for selecting the bank yet.	2020-01-04 12:38:11 -05:00
Matt Arsenault	5eed4e2664	AMDGPU/GlobalISel: Implement applyMappingImpl less incorrectly We're checking the current register bank of the registers in the instruction, but the mapping may have inserted cross bank copies and is expecting to replace the registers. We mostly get away with this currently, because VGPR->SGPR copies are illegal, and we assume this won't happen. In a future change, we'll start relying on more cross register bank copies being inserted, and this starts to break down.	2020-01-04 12:36:05 -05:00
alex-t	ca8b20ca3b	[AMDGPU] need to insert wait between the scalar load and vector store to the same address to avoid WAR conflict. Reviewers: rampitec, vpykhtin, nhaehnle Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D71934	2020-01-04 18:23:14 +03:00
Stanislav Mekhanoshin	4aa7fb7752	[AMDGPU] Revert scheduling to reduce spilling We can revert region schedule if new schedule decreases occupancy. However, if we already have only one wave we would accept any new schedule even if it blows up register pressure. Such schedule may result in quite heavy spilling which can be avoided if we reject this new schedule. Differential Revision: https://reviews.llvm.org/D72181	2020-01-03 15:20:21 -08:00
Matt Arsenault	21309eafde	GlobalISel: Add type argument to getRegBankFromRegClass AMDGPU can't unambiguously go back from the selected instruction register class to the register bank without knowing if this was used in a boolean context.	2020-01-03 16:25:10 -05:00
Michael Liao	3566c75ca8	[amdgpu] Skip non-instruction values in CF user tracing. Summary: - CF users won't be non-instruction values. Skip them to save the compilation time. It's especially true when there are multiple functions in that module, where, says, a constant may be used in most functions. The current CF user tracing adds significant overhead. Reviewers: alex-t, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72174	2020-01-03 16:02:47 -05:00
Matt Arsenault	9861a8538c	AMDGPU/GlobalISel: Add new utils file There are some things that are shareable between the legalizer, regbankselect, and the selector that don't have an obvious place to go.	2020-01-03 15:25:50 -05:00
Matt Arsenault	92ff017a85	AMDGPU: Only allow regs for s_movrel_{b32\|b64} This would incorrectly allowing folding immediates. These currently aren't selectable, but will be from GlobalISel soon.	2020-01-03 15:25:49 -05:00
Reid Kleckner	9c2b72821b	Move tail call disabling code to target independent code When the "disable-tail-calls" attribute was added, checks were added for it in various backends. Now this code has proliferated, and it is something the target is responsible for checking. Move that responsibility back to the ISels (fast, global, and SD). There's no major functionality change, except for targets that never implemented this check. This LLVM attribute was originally added in `d9699bc7bd` (2015). Reviewers: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D72118	2020-01-03 11:27:41 -08:00
Matt Arsenault	53fc484067	AMDGPU/GlobalISel: Fix off by one in operand index This should be looking at the RHS of the add for a constant.	2020-01-03 10:30:30 -05:00
Matt Arsenault	25e7da0c24	AMDGPU/GlobalISel: Remove manual G_FENCE selection The tablegen emitter now handles the immediate operand correctly, so let the generatedd matcher works.	2020-01-02 17:16:10 -05:00
Matt Arsenault	0d9f919b73	DAG: Use TargetConstant for FENCE operands	2020-01-02 17:16:10 -05:00
Jay Foad	13a7a4ccbf	Remove unneeded extra variable realArgIdx. NFC.	2020-01-02 14:27:32 +00:00
Michael Liao	79d401905f	[amdgpu] Fix scoreboard updating on `s_waitcnt_vscnt`. Summary: - Other counters are accidentally cleared. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71866	2019-12-31 14:20:30 -05:00
Craig Topper	787e078f3e	[TargetLowering][AMDGPU] Make scalarizeVectorLoad return a pair of SDValues instead of creating a MERGE_VALUES node. NFCI This allows us to clean up some places that were peeking through the MERGE_VALUES node after the call. By returning the SDValues directly, we can clean that up. Unfortunately, there are several call sites in AMDGPU that wanted the MERGE_VALUES and now need to create their own.	2019-12-30 19:36:04 -08:00
Matt Arsenault	7fa0bfe7d5	AMDGPU/GlobalISel: Select mul24 intrinsics	2019-12-30 14:24:25 -05:00
Matt Arsenault	48e0e68edb	AMDGPU/GlobalISel: Re-use MRI available in selector	2019-12-30 13:00:17 -05:00
Matt Arsenault	1247865fe0	AMDGPU/GlobalISel: Select llvm.amdgcn.fmad.ftz	2019-12-30 11:12:35 -05:00
Matt Arsenault	9fd31fdbd3	GlobalISel: moreElementsVector for FP min/max	2019-12-30 10:39:53 -05:00
Matt Arsenault	9e1a2a668b	AMDGPU: Improve llvm.round.f64 lowering for CI+ The path already used for f16/f32 works a lot better when v_trunc_f64 is available.	2019-12-30 09:55:46 -05:00
Matt Arsenault	491cfa4250	AMDGPU/GlobalISel: Account for G_PHI result bank Sometimes the result bank of the phi is already assigned to something, and should not be ignored. This is in preparation for additional boolean phi handling changes. Also refine the logic to fix some cases that were incorrectly deciding to use SGPRs.	2019-12-30 09:55:21 -05:00
Matt Arsenault	5ce2ca524e	AMDGPU/GlobalISel: Use SReg_32 for readfirstlane constraining This matches the DAG behavior where we don't use SReg_32_XM0 everywhere anymore, and fixes not coalescing the copies into m0.	2019-12-27 17:52:12 -05:00
Matt Arsenault	e29ae3799b	TII: Fix using Register for a subregister index argument	2019-12-27 16:53:29 -05:00
Matt Arsenault	9acd9544db	AMDGPU: Use Register	2019-12-27 16:53:21 -05:00
Matt Arsenault	e088846712	AMDGPU/GlobalISel: Fix extra result register in fdiv64 lowering There ended up being two result registers, which would fail on select. It was really defing a new temp register in the correct def position, instead of the correct result register.	2019-12-27 08:49:43 -05:00
Matt Arsenault	ed9a56b0f2	AMDGPU/GlobalISel: Select some 128-bit load/stores	2019-12-27 08:49:43 -05:00
Matt Arsenault	a37e958558	AMDGPU: Use correct DebugLoc	2019-12-27 08:49:43 -05:00
Matt Arsenault	1f054d667e	AMDGPU/GlobalISel: Fix mapping and selection of llvm.amdgcn.div.fixup	2019-12-24 15:36:29 -05:00
Matt Arsenault	df5c2159d0	AMDGPU/GlobalISel: Legalize some 16-bit round instructions	2019-12-24 09:53:01 -05:00
Matt Arsenault	9035fa6b54	AMDGPU/GlobalISel: Lower llvm.amdgcn.else	2019-12-24 09:53:01 -05:00
Jay Foad	c7c05b0c8a	[AMDGPU] Don't create MachinePointerInfos with an UndefValue pointer Summary: The only useful information the UndefValue conveys is the address space, which MachinePointerInfo can represent directly without referring to an IR value. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71838	2019-12-23 15:58:19 +00:00
Mark de Wever	2d903cc965	[AMDGPU] Fixes -Wrange-loop-analysis warnings This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall. Differential Revision: https://reviews.llvm.org/D71815	2019-12-22 19:39:28 +01:00
Matt Arsenault	42a26445f9	AMDGPU/GlobalISel: Fix misuse of div_scale intrinsics Confusingly, the intrinsic operands do not match the instruction/custom node. The order is shuffled, and the 3rd operand is an immediate to select operands. I'm not 100% sure I did this right, but fdiv still doesn't select end to end and it will be easier to tell when it does. This at least avoids an assertion in RegBankSelect and allows hitting the fallback on selection.	2019-12-21 04:55:36 -05:00
Matt Arsenault	dff3f8d742	AMDGPU/GlobalISel: Fix missing scc imp-def on scalar and/or/xor	2019-12-21 04:55:36 -05:00
Matt Arsenault	d688a6739d	AMDGPU/GlobalISel: Simplify code This can directly access the register bank, and doesn't need to get it through the ID.	2019-12-21 04:55:36 -05:00
Jay Foad	c5c935ab66	Make more use of MachineInstr::mayLoadOrStore.	2019-12-19 11:51:52 +00:00
Stanislav Mekhanoshin	58578f7056	[AMDGPU] Implemented fma cost analysis Differential Revision: https://reviews.llvm.org/D71676	2019-12-18 23:54:20 -08:00
Stanislav Mekhanoshin	b8ac5894a1	[AMDGPU] Fixed cost model for packed 16 bit ops Differential Revision: https://reviews.llvm.org/D71622	2019-12-17 15:14:17 -08:00
Tom Stellard	c3bc805f4f	AMDGPU/SILoadStoreOptimillzer: Refactor CombineInfo struct Summary: Modify CombineInfo to only store information about a single instruction. This is a little easier to work with and removes a lot of duplicate initialization code. Reviewers: arsenm, nhaehnle Reviewed By: arsenm, nhaehnle Subscribers: merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71045	2019-12-17 13:43:10 -08:00
Jay Foad	0412f518dc	[AMDGPU] Fix typo in SIInstrInfo::memOpsHaveSameBasePtr Summary: The typo has been present since memOpsHaveSameBasePtr was introduced in r313208. It caused SIInstrInfo::shouldClusterMemOps to cluster more mem ops than it was supposed to. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71616	2019-12-17 18:54:27 +00:00
Kristof Beyls	870f39d310	Fix assertion failure in getMemOperandWithOffsetWidth This fixes an assertion failure that triggers inside getMemOperandWithOffset when Machine Sinking calls it on a MachineInstr that is not a memory operation. Different backends implement getMemOperandWithOffset differently: some return false on non-memory MachineInstrs, others assert. The Machine Sinking pass in at least SinkingPreventsImplicitNullCheck relies on getMemOperandWithOffset to return false on non-memory MachineInstrs, instead of asserting. This patch updates the documentation on getMemOperandWithOffset that it should return false on any MachineInstr it cannot handle, instead of asserting. It also adapts the in-tree backends accordingly where necessary. Differential Revision: https://reviews.llvm.org/D71359	2019-12-17 10:56:09 +00:00
Guillaume Chatelet	531c1161b9	Resubmit "[Alignment][NFC] Deprecate CreateMemCpy/CreateMemMove" Summary: This is a resubmit of D71473. This patch introduces a set of functions to enable deprecation of IRBuilder functions without breaking out of tree clients. Functions will be deprecated one by one and as in tree code is cleaned up. This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: aaron.ballman, courbet Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71547	2019-12-17 10:07:46 +01:00
Guillaume Chatelet	4658da10e4	Revert "[Alignment][NFC] Deprecate CreateMemCpy/CreateMemMove" This reverts commit `181ab91efc`.	2019-12-16 15:19:49 +01:00
Guillaume Chatelet	181ab91efc	[Alignment][NFC] Deprecate CreateMemCpy/CreateMemMove Summary: This patch introduces a set of functions to enable deprecation of IRBuilder functions without breaking out of tree clients. Functions will be deprecated one by one and as in tree code is cleaned up. This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71473	2019-12-16 13:35:55 +01:00
Jay Foad	f8495017f0	Fix whitespace.	2019-12-16 10:42:34 +00:00
Jay Foad	4f17b1784e	Fix for AMDGPU MUL_I24 known bits calculation Summary: At present, the code calculating known bits of AMDGPU MUL_I24 confuses the concepts of "non-negative number" and "positive number". In some situations, it results in incorrect code. I have a case where the optimizer replaces the result of calculating MUL_I24(-5, 0) with -8. Reviewers: foad, arsenm Reviewed By: arsenm Subscribers: foad, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Patch by Eugene Kuznetsov. Differential Revision: https://reviews.llvm.org/D70367	2019-12-16 10:25:57 +00:00
Tim Renouf	fce1a6f584	Revert "AMDGPU: Try to commute sub of boolean ext" This reverts commit `69fcfb7d35`. As shown in the test I attached to this commit, the change I reverted causes a problem with "zext(cc1) - zext(cc2)". It commuted the operands to the sub and used different logic to select the addc/subc instruction: sub zext (setcc), x => addcarry 0, x, setcc sub sext (setcc), x => subcarry 0, x, setcc ... but that is bogus. I believe it is not possible to fold those commuted patterns into any form of addcarry or subcarry. It may have worked as intended before "AMDGPU: Change boolean content type to 0 or 1" because the setcc was considered to be -1 rather than 1. Differential Revision: https://reviews.llvm.org/D70978 Change-Id: If2139421aa6c935cbd1d925af58fe4a4aa9e8f43	2019-12-13 12:49:06 +00:00
Alex Richardson	be15dfa88f	[NFC] Use EVT instead of bool for getSetCCInverse() Summary: The use of a boolean isInteger flag (generally initialized using VT.isInteger()) caused errors in our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project). In our backend, pointers use a separate ValueType (iFATPTR) and therefore .isInteger() returns false. This meant that getSetCCInverse() was using the floating-point variant and generated incorrect code for us: `(void )0x12033091e < (void )0xffffffffffffffff` would return false. Committing this change will significantly reduce our merge conflicts for each upstream merge. Reviewers: spatel, bogner Reviewed By: bogner Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70917	2019-12-13 12:22:03 +00:00
Michael Liao	11b2b2f4b1	[amdgpu] Fix `-Wenum-compare` warning. NFC.	2019-12-12 11:44:16 -05:00
Tom Stellard	bf13a71095	AMDGPU/SILoadStoreOptimizer: Simplify function Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71044	2019-12-12 06:53:03 -08:00
Reid Kleckner	5d986953c8	[IR] Split out target specific intrinsic enums into separate headers This has two main effects: - Optimizes debug info size by saving 221.86 MB of obj file size in a Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of object file size. - Incremental step towards decoupling target intrinsics. The enums are still compact, so adding and removing a single target-specific intrinsic will trigger a rebuild of all of LLVM. Assigning distinct target id spaces is potential future work. Part of PR34259 Reviewers: efriedma, echristo, MaskRay Reviewed By: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D71320	2019-12-11 18:02:14 -08:00
Guillaume Chatelet	1b2842bf90	[Alignment][NFC] CreateMemSet use MaybeAlign Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71213	2019-12-10 15:17:44 +01:00
David Green	be7a107070	[ARM] Teach the Arm cost model that a Shift can be folded into other instructions This attempts to teach the cost model in Arm that code such as: %s = shl i32 %a, 3 %a = and i32 %s, %b Can under Arm or Thumb2 become: and r0, r1, r2, lsl #3 So the cost of the shift can essentially be free. To do this without trying to artificially adjust the cost of the "and" instruction, it needs to get the users of the shl and check if they are a type of instruction that the shift can be folded into. And so it needs to have access to the actual instruction in getArithmeticInstrCost, which if available is added as an extra parameter much like getCastInstrCost. We otherwise limit it to shifts with a single user, which should hopefully handle most of the cases. The list of instruction that the shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and ICmp. Differential Revision: https://reviews.llvm.org/D70966	2019-12-09 10:24:33 +00:00
Jay Foad	7847986ceb	[AMDGPU][MC] Remove duplicate code introduced in r359316.	2019-12-04 11:44:12 +00:00
David Stuttard	46db606834	AMDGPU: Avoid folding 2 constant operands into an SALU operation Summary: Catch the (admittedly unusual) case where SIFoldOperands attempts to fold 2 constant operands into the same SALU operation, with neither operand able to be encoded as an inline constant. Change-Id: Ibc48d662c9ffd8bbacd154976b0b1c257ace0927 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70896	2019-12-04 10:25:34 +00:00
Tim Renouf	3d5ba7c60f	AMDGPU: Fixed indeterminate map iteration in SIPeepholeSDWA Differential Revision: https://reviews.llvm.org/D70783 Change-Id: Ic26f915a4acb4c00ecefa9d09d7c24cec370ed06	2019-12-02 12:08:49 +00:00
Matt Arsenault	269c1c703d	Fix broken comment phrasing and indentation	2019-12-02 12:23:20 +05:30
Austin Kerbow	cfbbdc83b4	AMDGPU/GlobalISel: Add AGPR bank and RegBankSelect mfma intrinsics Differential Revision: https://reviews.llvm.org/D70871	2019-12-01 22:15:48 -08:00
Austin Kerbow	256ad954a9	AMDGPU: Reuse carry out register during FI elimination Summary: Pre gfx9 we need to scavenge a 64-bit SGPR to use as the carry out for an Add. If only one SGPR was available this crashed when trying to scavenge another 32bit SGPR to materialize the offset. Instead, reuse a 32-bit SGPR from the carry out as the offset register. Also prefer to use vcc for the unused carry out when it is available. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70614	2019-11-28 10:13:48 -08:00
vpykhtin	008e65a7bf	[AMDGPU] Fix emitIfBreak CF lowering: use temp reg to make register coalescer life easier. Differential revision: https://reviews.llvm.org/D70405	2019-11-26 18:59:37 +03:00
Jay Foad	357bd914a1	[AMDGPU] Fix function name in debug output	2019-11-25 15:22:04 +00:00
Austin Kerbow	fef69706dc	AMDGPU: Handle waitcnt overflow Summary: The waitcnt pass can overflow the counters when the number of outstanding events for a type exceed the capacity of the counter. This can lead to inefficient insertion of waitcnts, or to waitcnt instructions with max values for each type. The last situation can cause an instruction which when disassembled appears to be an illegal waitcnt without an operand. In these cases we should add a wait for the 'counter maximum' - 1, and update the waitcnt brackets accordingly. Reviewers: rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70418	2019-11-23 09:34:23 -08:00
Tom Stellard	ab411801b8	[cmake] Explicitly mark libraries defined in lib/ as "Component Libraries" Summary: Most libraries are defined in the lib/ directory but there are also a few libraries defined in tools/ e.g. libLLVM, libLTO. I'm defining "Component Libraries" as libraries defined in lib/ that may be included in libLLVM.so. Explicitly marking the libraries in lib/ as component libraries allows us to remove some fragile checks that attempt to differentiate between lib/ libraries and tools/ libraires: 1. In tools/llvm-shlib, because llvm_map_components_to_libnames(LIB_NAMES "all") returned a list of all libraries defined in the whole project, there was custom code needed to filter out libraries defined in tools/, none of which should be included in libLLVM.so. This code assumed that any library defined as static was from lib/ and everything else should be excluded. With this change, llvm_map_components_to_libnames(LIB_NAMES, "all") only returns libraries that have been added to the LLVM_COMPONENT_LIBS global cmake property, so this custom filtering logic can be removed. Doing this also fixes the build with BUILD_SHARED_LIBS=ON and LLVM_BUILD_LLVM_DYLIB=ON. 2. There was some code in llvm_add_library that assumed that libraries defined in lib/ would not have LLVM_LINK_COMPONENTS or ARG_LINK_COMPONENTS set. This is only true because libraries defined lib lib/ use LLVMBuild.txt and don't set these values. This code has been fixed now to check if the library has been explicitly marked as a component library, which should now make it easier to remove LLVMBuild at some point in the future. I have tested this patch on Windows, MacOS and Linux with release builds and the following combinations of CMake options: - "" (No options) - -DLLVM_BUILD_LLVM_DYLIB=ON - -DLLVM_LINK_LLVM_DYLIB=ON - -DBUILD_SHARED_LIBS=ON - -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=ON - -DBUILD_SHARED_LIBS=ON -DLLVM_LINK_LLVM_DYLIB=ON Reviewers: beanz, smeenai, compnerd, phosek Reviewed By: beanz Subscribers: wuzish, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, mgorny, mehdi_amini, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, dang, Jim, lenary, s.egerton, pzheng, sameer.abuasal, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70179	2019-11-21 10:48:08 -08:00
Tim Corringham	6821a3ccd6	[AMDGPU] Add attribute for target loop unroll threshold default Summary: Add a function attribute to allow the target specific default loop unroll threshold to be specified on a per-function basis. This allows a front-end to give guidance where it has insight that is not available to the back-end, while still allowing the target specific heuristics to also have an effect. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68873	2019-11-21 09:47:28 +00:00
Piotr Sobczak	4a801170f3	[AMDGPU][SILoadStoreOptimizer] Merge TBUFFER loads/stores Summary: Extend SILoadStoreOptimizer to merge tbuffer loads and stores. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69794	2019-11-20 22:59:30 +01:00
Michael Liao	4a308d302c	[AMDGPU] Keep consistent check of legal addressing mode. Summary: - Add test cases for GFX10, which has narrower offset range compared to GFX9. Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70473	2019-11-20 15:08:17 -05:00
Dmitry Preobrazhensky	6778a62eb0	[AMDGPU][GFX10] Disabled v_movrel*[sdwa\|dpp] opcodes in codegen These opcodes use indirect register addressing so they need special handling by codegen (currently missing). Reviewers: vpykhtin, arsenm, rampitec Differential Revision: https://reviews.llvm.org/D70400	2019-11-20 17:57:50 +03:00
Dmitry Preobrazhensky	525f9c0be5	[AMDGPU][DPP] Corrected DPP combiner Added a check to make sure that the selected dpp opcode is supported by target. Reviewers: vpykhtin, arsenm, rampitec Differential Revision: https://reviews.llvm.org/D70402	2019-11-20 15:56:45 +03:00
Sameer Sahasrabuddhe	52c5014da0	[AMDGPU] add support for hostcall buffer pointer as hidden kernel argument Hostcall is a service that allows a kernel to submit requests to the host using shared buffers, and block until a response is received. This will eventually replace the shared buffer currently used for printf, and repurposes the same hidden kernel argument. This change introduces a new ValueKind in the HSA metadata to represent the hostcall buffer. Differential Revision: https://reviews.llvm.org/D70038	2019-11-20 15:53:55 +05:30
Austin Kerbow	f3225f2abe	AMDGPU/GlobalISel: Legalize FDIV64 Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70403	2019-11-19 21:02:27 -08:00
Matt Arsenault	db0ed3e429	AMDGPU: Refactor treatment of denormal mode Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget, and be moved into a separate function attribute. This patch is still NFC. The denormal mode remains as a subtarget feature for now, but make the necessary changes to switch to using an attribute.	2019-11-19 19:55:43 +05:30
Matt Arsenault	ea23b6428b	AMDGPU: Be explicit about denormal mode in MIR tests Start checking the machine function in GlobalISel instead of the target directly. This temporarily breaks fcanonicalize selection in GlobalISel.	2019-11-19 19:55:43 +05:30
Matt Arsenault	b696b9dba7	DAG: Add function context to isFMAFasterThanFMulAndFAdd AMDGPU needs to know the FP mode for the function to answer this correctly when this is removed from the subtarget. AArch64 had to make this more complicated by using this from an IR hook, so add an IR typed overload.	2019-11-19 19:25:26 +05:30
dfukalov	6fd11b14f6	[AMDGPU] Tune inlining parameters for AMDGPU target (part 2) Summary: Most of IR instructions got better code size estimations after commit `47a5c36b`. So default parameters values should be updated to improve inlining and unrolling for the target. Reviewers: rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70391	2019-11-19 16:33:16 +03:00
Dmitry Preobrazhensky	edd9f70163	[AMDGPU][MC][GFX10] Enabled v_movrel*[sdwa\|dpp\|dpp8] opcodes See https://bugs.llvm.org/show_bug.cgi?id=43712 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D70170	2019-11-18 17:23:40 +03:00
Nicolai Hähnle	d8f7c68e28	AMDGPU/SILoadStoreOptimizer: fix a likely bug introduced recently Summary: We should check for same instruction class before checking whether they have the same base address, else we might iterate out of bounds of a MachineInstr operands list. The InstClass check is also cheaper. This was introduced in SVN r373630. Reviewers: tstellar Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68690	2019-11-16 11:35:34 +01:00
Piotr Sobczak	02419ab5c7	[AMDGPU] Lower llvm.amdgcn.s.buffer.load.v3[i\|f]32 Summary: Add lowering support for 32-bit vec3 variant of s.buffer.load intrinsic. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70118	2019-11-15 15:01:15 +01:00
Matt Arsenault	31479d868e	AMDGPU: Change boolean content type to 0 or 1 The usage of target boolean checks is overly inflexible, since sext and zext of a compare are equally cheap. The choice is arbitrary, but using 0/1 to some degree is the choice of lower resistance since that's what most targets use. This enables a few combines that don't bother to support ZeroOrNegativeOneBooleanContent.	2019-11-15 13:43:47 +05:30
Matt Arsenault	69fcfb7d35	AMDGPU: Try to commute sub of boolean ext Avoids another regression in a future patch.	2019-11-15 13:43:42 +05:30
Matt Arsenault	bc276c6379	GlobalISel: Lower s1 source G_SITOFP/G_UITOFP	2019-11-15 13:37:20 +05:30
Reid Kleckner	05da2fe521	Sink all InitializePasses.h includes This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of recompilation. I found this fact by looking at this table, which is sorted by the number of times a file was changed over the last 100,000 git commits multiplied by the number of object files that depend on it in the current checkout: recompiles touches affected_files header 342380 95 3604 llvm/include/llvm/ADT/STLExtras.h 314730 234 1345 llvm/include/llvm/InitializePasses.h 307036 118 2602 llvm/include/llvm/ADT/APInt.h 213049 59 3611 llvm/include/llvm/Support/MathExtras.h 170422 47 3626 llvm/include/llvm/Support/Compiler.h 162225 45 3605 llvm/include/llvm/ADT/Optional.h 158319 63 2513 llvm/include/llvm/ADT/Triple.h 140322 39 3598 llvm/include/llvm/ADT/StringRef.h 137647 59 2333 llvm/include/llvm/Support/Error.h 131619 73 1803 llvm/include/llvm/Support/FileSystem.h Before this change, touching InitializePasses.h would cause 1345 files to recompile. After this change, touching it only causes 550 compiles in an incremental rebuild. Reviewers: bkramer, asbirlea, bollu, jdoerfert Differential Revision: https://reviews.llvm.org/D70211	2019-11-13 16:34:37 -08:00
Matt Arsenault	9d7bccab66	AMDGPU: Extend add x, (ext setcc) combine to sub This is the same as the add case, but inverts the operation type. This avoids regressions in a future patch.	2019-11-13 07:13:58 +05:30
Matt Arsenault	4b47213951	AMDGPU: Switch backend default max workgroup size to 1024 Previously this would default to 256, not the maximum supported size of 1024. Using a maximum lower than the hardware maximum requires language runtimes to enforce this limit for correctness, which no language has correctly done. Switch the default to the conservatively correct maximum, and force frontends to opt-in to the more optimal 256 default maximum. I don't really understand why the changes in occupancy-levels.ll increased the computed occupancy, which I expected to decrease. I'm not sure if these tests should be forcing the old maximum.	2019-11-13 07:11:02 +05:30
Matt Arsenault	25c5da5a42	AMDGPU Reduce reported maximum group size to 1024 While some targets allow encoding 2048, this was never tested or supported.	2019-11-13 06:34:28 +05:30
Fangrui Song	3c4f8bb108	AMDGPU/SI: make ~SIScheduleBlockCreator trivial	2019-11-11 21:51:59 -08:00
Matt Arsenault	e6c9a9af39	Use MCRegister in copyPhysReg	2019-11-11 14:42:33 +05:30
Simon Pilgrim	dda8015434	Remove duplicate MemVT to fix shadow variable warning. NFCI.	2019-11-09 13:01:04 +00:00
Simon Pilgrim	a35a44fd4b	Remove superfluous break after return. NFC.	2019-11-09 13:01:03 +00:00
Simon Pilgrim	59a14f9d4b	Fix shadow variable warning by reducing scope of CC/InverseCC CondCodes. NFCI.	2019-11-09 13:01:03 +00:00
Dmitry Preobrazhensky	e25bc5e024	[AMDGPU][MC] Corrected src0 for v_movrelsd_b32 and v_movrelsd_2_b32 See https://bugs.llvm.org/show_bug.cgi?id=40903 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D69888	2019-11-08 16:38:56 +03:00
dfukalov	6e8251046b	[AMDGPU] Fix bug introduced in `47a5c36b37` Summary: [AMDGPU] Fix bug introduced in `47a5c36b37` Reviewers: foad, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69915	2019-11-07 11:50:14 +03:00
Matt Arsenault	e16a71382d	AMDGPU: Select global atomicrmw fadd This only works if there is no use of the return value.	2019-11-06 16:06:38 -08:00
Stanislav Mekhanoshin	d17bcf2bb9	[AMDGPU] Add handling of 160 bit registers in analyzeResourceUsage This was omitted. Also SReg_96Reg missed IsSGPR assignment. Differential Revision: https://reviews.llvm.org/D69919	2019-11-06 15:47:32 -08:00
dfukalov	47a5c36b37	[AMDGPU] Improve code size cost model (part 2) Summary: Added estimations for ShuffleVector, some cast and arithmetic instructions Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69629	2019-11-06 13:55:48 +03:00
Stanislav Mekhanoshin	521fc5e620	[AMDGPU] Add missing flags to DS_Real Differential Revision: https://reviews.llvm.org/D69867	2019-11-05 14:24:48 -08:00
Stanislav Mekhanoshin	f2e7679d0f	[AMDGPU] Removed dead code from R600ISelLowering.cpp This was added to inhibit a warning from gcc 7.3 according to the comment. However, it triggers warning from PVS. In addition I cannot reproduce it with gcc 7.4 and I also cannot reproduce it with gcc 7.3 using compiler explorer. Differential Revision: https://reviews.llvm.org/D69863	2019-11-05 12:02:48 -08:00
Stanislav Mekhanoshin	4f12ba50bb	[AMDGPU] Removed dead code handling M0CopyReg Static analyzer complains about always false condition. See https://bugs.llvm.org/show_bug.cgi?id=43886 Differential Revision: https://reviews.llvm.org/D69860	2019-11-05 11:05:13 -08:00
Daniel Sanders	e74c5b9661	[globalisel] Rename G_GEP to G_PTR_ADD Summary: G_GEP is rather poorly named. It's a simple pointer+scalar addition and doesn't support any of the complexities of getelementptr. I therefore propose that we rename it. There's a G_PTR_MASK so let's follow that convention and go with G_PTR_ADD Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69734	2019-11-05 10:31:17 -08:00
Stanislav Mekhanoshin	de56a89072	[AMDGPU] return Fail instead of SolfFail from addOperand() addOperand() method of AMDGPU disassembler returns SoftFail on error. All instances which may lead to that place are an impossible encdoing, not something which is possible to encode, but semantically incorrect as described for SoftFail. Then tablegen generates a check of the following form: if (Decode...(..) == MCDisassembler::Fail) { return MCDisassembler::Fail; } Since we can only return Success and SoftFail that is dead code as detected by the static code analyzer. Solution: return Fail as it should be. See https://bugs.llvm.org/show_bug.cgi?id=43886 Differential Revision: https://reviews.llvm.org/D69819	2019-11-05 10:25:27 -08:00
Stanislav Mekhanoshin	1bfcc60828	[AMDGPU] Added assert in SIFoldOperands before ptr use. NFC.	2019-11-04 13:31:21 -08:00
Stanislav Mekhanoshin	4312c4afd4	[AMDGPU] deduplicate tablegen predicates We are duplicating predicates if several parts of the combined predicate list contain the same condition. Added code to deduplicate the list. We have AssemblerPredicates and AssemblerPredicate in the PredicateControl, but we never use AssemblerPredicates with an actual list, so this one is dropped. This addresses the first part of the llvm bug 43886: https://bugs.llvm.org/show_bug.cgi?id=43886 Differential Revision: https://reviews.llvm.org/D69815	2019-11-04 12:19:17 -08:00
Dávid Bolvanský	3fbd1c00b0	[SIMachineScheduler] Fixed ''then' statement is equivalent to the 'else' statement.' warning. NFCI.	2019-11-03 20:40:53 +01:00
Dávid Bolvanský	c3d6f0ddee	[SILoadStoreOptimizer] Fixed typo. NFCI.	2019-11-03 20:38:29 +01:00
Michael Liao	4531aee2ac	[amdgpu] Fix known bits compuation on `MUL_I24`/`MUL_U24`. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D69735	2019-11-01 17:06:17 -04:00
Matt Arsenault	19e7f8a21d	AMDGPU: Add default denormal mode to MachineFunctionInfo The default FP mode should really be a property of a specific function, and not a subtarget. Introduce the necessary fields to the SIMachineFunctionInfo to help move towards this goal.	2019-11-01 00:03:39 -07:00
Matt Arsenault	6221767055	DAG: Add DAG argument to isFPExtFoldable For AMDGPU this is dependent on the FP mode, which should eventually not be a property of the subtarget.	2019-10-31 22:32:45 -07:00
Matt Arsenault	1725f28841	DAG: Add new control for ISD::FMAD formation For AMDGPU this depends on whether denormals are enabled in the default FP mode for the function. Currently this is treated as a subtarget feature, so FMAD is selectively legal based on that. I want to move this out of the subtarget features so this can be controlled with a denormal mode attribute. Additionally, this will allow folding based on a future ftz fast math flag.	2019-10-31 07:51:38 -07:00
Matt Arsenault	bc56166281	AMDGPU: Simplify getAddressSpace calls These can be directly taken from the GlobalValue instead of going through the type.	2019-10-31 07:51:38 -07:00
Matt Arsenault	d9e0a2942a	AMDGPU: Disallow spill folding with m0 copies readlane and writelane instructions are not allowed to use m0 as the data operand, so spilling them is tricky and would require an intermediate SGPR to spill it. Constrain the virtual register class in this caes to disallow the inline spiller from folding the m0 operand directly into the spill instruction. I copied this hack from AArch64 which has the same problem for $sp.	2019-10-30 14:56:33 -07:00
Matt Arsenault	edca9ac0de	AMDGPU: Don't fold S_NOPs with implicit operands	2019-10-30 14:40:56 -07:00
Jay Foad	e5972f2a04	[AMDGPU] Simplify VCCZ bug handling Summary: VCCZBugHandledSet was used to make sure we don't apply the same workaround more than once to a single cbranch instruction, but it's not necessary because the workaround involves inserting an s_waitcnt instruction, which is enough for subsequent iterations to detect that no further workaround is necessary. Also beef up the test case to check that the workaround was only applied once. I have also manually verified that the test still passes even if I hack the big do-while loop in runOnMachineFunction to run a minimum of five iterations. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69621	2019-10-30 17:09:07 +00:00
Jay Foad	b592253ec6	[AMDGPU] Consolidate one more getGeneration check This one should have been done in r363902 when hasReadVCCZBug was introduced.	2019-10-30 11:16:42 +00:00
Austin Kerbow	2b88b344f2	AMDGPU/GlobalISel: Legalize FDIV32 Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69581	2019-10-29 17:18:06 -07:00
Matt Arsenault	21bc8e5a13	AMDGPU: Make VReg_1 only include 1 artificial register When TableGen is inferring register classes from contexts, it uses a sorting function based on the number of registers in the class. Since this was being treated as an alias of VGPR_32, they had exactly the same size. The sort used wasn't a stable sort, and even if it were, I believe the tie breaker would effectively end up being the alphabetical ordering of the class name. There appear to be issues trying to use an empty set of registers, so add only one so this will always sort to the end. Also add a comment explaining how VReg_1 is a dirty hack for SelectionDAG. This does end up changing the behavior of i1 with inline asm and VGPR constraints, but the existing behavior was was already nonsensical and inconsistent. It should probably be disallowed anyway. Fixes bug 43699	2019-10-28 20:51:51 -07:00
Austin Kerbow	d11b93ec6a	AMDGPU: Avoid overwriting saved PC Summary: An outstanding load with same destination sgpr as call could cause PC to be updated with junk value on return. Reviewers: arsenm, rampitec Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69474	2019-10-28 10:02:22 -07:00
Dmitry Preobrazhensky	b8042dbe2b	[AMDGPU][MC][GFX10] Added v_interp_[p1/p2/mov]_f32_e64 See https://bugs.llvm.org/show_bug.cgi?id=43747 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D69348	2019-10-28 15:03:43 +03:00
cdevadas	e921ede540	[AMDGPU] Fix Vreg_1 PHI lowering in SILowerI1Copies. There is a minor flaw in the implementation of function lowerPhis. This function replaces values of regclass Vreg_1 (boolean values) involved in PHIs into an SGPR. Currently it iterates over the MBBs and performs an inplace lowering of PHIs and fails to lower any incoming value that itself is another PHI of Vreg_1 regclass. The failure occurs only when the MBB where the incoming PHI value belongs is not visited/lowered yet. To fix this problem, collect all Vreg_1 PHIs upfront and then perform the lowering. Differential Revision: https://reviews.llvm.org/D69182	2019-10-26 14:37:45 +05:30
Stanislav Mekhanoshin	4c0251da14	[AMDGPU] Enable SGPR copy folding That used to fail in the last testcase function because after %0:sreg_64.sub0 was folded into %3:sreg_32_xm0_xexec COPY, it was further folded into S_STORE_DWORD_IMM. Its legal effective subreg class is SReg_32 while instruction expects more restricted SReg_32_XM0_EXEC. However, SIInstrInfo::isLegalRegOperand() passed the legality check and it was caught in the verifier. Borrowed code from the verifier to check for RC legality. Differential Revision: https://reviews.llvm.org/D69445	2019-10-25 15:08:30 -07:00
Stanislav Mekhanoshin	c7dcacf16a	[AMDGPU] Fixed asan failure in SIFoldOperands Both tryFoldOMod() and tryFoldClamp() remove original instruction, so the check MI.modifiesRegister() may use a deleted MI. Differential Revision: https://reviews.llvm.org/D69448	2019-10-25 13:59:56 -07:00
Matt Arsenault	171cf5302f	AMDGPU/GlobalISel: Handle flat/global G_ATOMIC_CMPXCHG Custom lower this to a target instruction with the merge operands. I think it might be better to directly select this and emit a REG_SEQUENCE, but this would be more work since it would require splitting the tablegen patterns for these cases from the other atomics.	2019-10-25 13:11:09 -07:00
Changpeng Fang	1ce552f3ef	AMDGPU: Fix the broken dominator tree when creating waterfall loop for resource descriptor Summary: In loadSRsrcFromVGPR, if MBB is the same as Succ, Remiander is not the immediate dominator of Succ. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D69358	2019-10-25 13:08:04 -07:00
Stanislav Mekhanoshin	d4303b3861	[AMDGPU] Fold AGPR reg_sequence initializers Differential Revision: https://reviews.llvm.org/D69413	2019-10-25 11:39:02 -07:00
vpykhtin	c9c18e5a31	[AMDGPU] Disallow dpp combining for dpp instructions without Src2 operand (when Src2 is required) Differential revision: https://reviews.llvm.org/D69430	2019-10-25 21:30:37 +03:00
Austin Kerbow	c35b358b74	AMDGPU/GlobalISel: Legalize FDIV16 Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69347	2019-10-25 11:07:17 -07:00
Stanislav Mekhanoshin	3c8e055187	[AMDGPU] Fix mfma scheduling crash An SUnit can be neither intruction not SDNode. It is all null if represents a nop. Fixed a crash on using SU->getInstr(). Differential Revision: https://reviews.llvm.org/D69395	2019-10-24 11:01:52 -07:00
dfukalov	6d0fc4373e	[NFC] Remove redundant lines Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69375	2019-10-24 19:54:28 +03:00
Michael Liao	b2a65f0d70	[AMDGPU] Skip additional folding on the same operand. Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69355	2019-10-24 11:30:22 -04:00
Stanislav Mekhanoshin	61e7a61bdc	[AMDGPU] Allow folding of sgpr to vgpr copy Potentially sgpr to sgpr copy should also be possible. That is however trickier because we may end up with a wrong register class at use because of xm0/xexec permutations. Differential Revision: https://reviews.llvm.org/D69280	2019-10-23 18:42:48 -07:00
Mirko Brkusanin	4b63ca1379	[Mips] Use appropriate private label prefix based on Mips ABI MipsMCAsmInfo was using '$' prefix for Mips32 and '.L' for Mips64 regardless of -target-abi option. By passing MCTargetOptions to MCAsmInfo we can find out Mips ABI and pick appropriate prefix. Tags: #llvm, #clang, #lldb Differential Revision: https://reviews.llvm.org/D66795	2019-10-23 12:24:35 +02:00
Stanislav Mekhanoshin	48f57138be	[AMDGPU] Allow tied operand subreg folding Turns out it makes sense, contrarily to what comment said. Differential Revision: https://reviews.llvm.org/D69287	2019-10-22 11:27:36 -07:00
Austin Kerbow	97263fa2dd	AMDGPU/GlobalISel: Legalize fast unsafe FDIV Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69231 llvm-svn: 375460	2019-10-21 22:18:26 +00:00
Matt Arsenault	ef9a0278f0	AMDGPU: Select basic interp directly from intrinsics llvm-svn: 375457	2019-10-21 21:49:44 +00:00
Matt Arsenault	38038f116f	AMDGPU: Use CopyToReg for interp intrinsic lowering This doesn't use the default value, so doesn't benefit from the hack to help optimize it. llvm-svn: 375450	2019-10-21 19:53:49 +00:00
Matt Arsenault	8ebbf25cb1	AMDGPU: Erase redundant redefs of m0 in SIFoldOperands Only handle simple inter-block redefs of m0 to the same value. This avoids interference from redefs of m0 in SILoadStoreOptimzer. I was initially teaching that pass to ignore redefs of m0, but having them not exist beforehand is much simpler. This is in preparation for deleting the current special m0 handling in SIFixSGPRCopies to allow the register coalescer to handle the difficult cases. llvm-svn: 375449	2019-10-21 19:53:46 +00:00
Matt Arsenault	dd6cf159ba	AMDGPU: Stop adding m0 implicit def to SGPR spills r375293 removed the SGPR spilling with scalar stores path, so this is no longer necessary. This also always had the defect of adding the def even when this path wasn't in use. llvm-svn: 375448	2019-10-21 19:42:29 +00:00
Matt Arsenault	b5234b64af	AMDGPU: Slightly restructure m0 init code This will allow using another operation to produce the glue in a future change. llvm-svn: 375447	2019-10-21 19:42:26 +00:00
Stanislav Mekhanoshin	33092194f2	[AMDGPU] Select AGPR in PHI operand legalization If a PHI defines AGPR legalize its operands to AGPR. At the moment we can get an AGPR PHI with VGPR operands. I am not aware of any problems as it seems to be handled gracefully in RA, but this is not right anyway. It also slightly decreases VGPR pressure in some cases because we do not have to a copy via VGPR. Differential Revision: https://reviews.llvm.org/D69206 llvm-svn: 375446	2019-10-21 19:25:27 +00:00
Guillaume Chatelet	3cc4835c00	Use Align for TFL::TransientStackAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dschuff, jyknight, sdardis, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69216 llvm-svn: 375398	2019-10-21 08:31:25 +00:00
Zinovy Nis	5fa36e42c4	Fix buildbot error in SIRegisterInfo.cpp. llvm-svn: 375373	2019-10-20 20:01:16 +00:00
Matt Arsenault	e5be543a55	AMDGPU: Increase vcc liveness scan threshold Avoids a test regression in a future patch. Also add debug printing on this case, so I waste less time debugging folds in the future. llvm-svn: 375367	2019-10-20 17:44:17 +00:00
Matt Arsenault	7cd57dcd5b	AMDGPU: Split flat offsets that don't fit in DAG We handle it this way for some other address spaces. Since r349196, SILoadStoreOptimizer has been trying to do this. This is after SIFoldOperands runs, which can change the addressing patterns. It's simpler to just split this earlier. llvm-svn: 375366	2019-10-20 17:34:44 +00:00
Matt Arsenault	1aad3835f8	AMDGPU: Fix missing OPERAND_IMMEDIATE llvm-svn: 375365	2019-10-20 16:56:10 +00:00
Matt Arsenault	fc205f1d11	AMDGPU: Don't re-get the subtarget It's already available in the class. llvm-svn: 375363	2019-10-20 16:26:26 +00:00
Matt Arsenault	8a8b317460	AMDGPU: Don't error on calls to null or undef Calls to constants should probably be generally handled. llvm-svn: 375356	2019-10-20 07:46:04 +00:00
Reid Kleckner	904cd3e06b	Prune a LegacyDivergenceAnalysis and MachineLoopInfo include each Now X86ISelLowering doesn't depend on many IR analyses. llvm-svn: 375320	2019-10-19 01:31:09 +00:00
Reid Kleckner	1d7b41361f	Prune two MachineInstr.h includes, fix up deps MachineInstr.h included AliasAnalysis.h, which includes a world of IR constructs mostly unneeded in CodeGen. Prune it. Same for DebugInfoMetadata.h. Noticed with -ftime-trace. llvm-svn: 375311	2019-10-19 00:22:07 +00:00
Stanislav Mekhanoshin	0fab220eb5	[AMDGPU] move PHI nodes to AGPR class If all uses of a PHI are in AGPR register class we should avoid unneeded copies via VGPRs. Differential Revision: https://reviews.llvm.org/D69200 llvm-svn: 375297	2019-10-18 22:48:45 +00:00
Jay Foad	a9aa4ec6a3	[AMDGPU] Remove -amdgpu-spill-sgpr-to-smem. Summary: The implementation was never completed and never used except in tests. Reviewers: arsenm, mareko Subscribers: qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69163 llvm-svn: 375293	2019-10-18 21:48:22 +00:00
Quentin Colombet	9f9151d494	[GISel][CallLowering] Make isIncomingArgumentHandler a pure virtual method The default implementation of isIncomingArgumentHandler could lead to generating incorrect code. Make it a pure virtual method, so that targets know they have to override it to produce correct code. NFC Differential Revision: https://reviews.llvm.org/D69187 llvm-svn: 375277	2019-10-18 20:13:42 +00:00
Matt Arsenault	f9a42ed0a7	AMDGPU: Relax 32-bit SGPR register class Mostly use SReg_32 instead of SReg_32_XM0 for arbitrary values. This will allow the register coalescer to do a better job eliminating copies to m0. For GlobalISel, as a terrible hack, use SGPR_32 for things that should use SCC until booleans are solved. llvm-svn: 375267	2019-10-18 18:26:37 +00:00
Austin Kerbow	2f41a023af	AMDGPU: Fix SMEM WAR hazard for gfx10 readlane Summary: Hazard recognizer fails to see hazard with V_READLANE_B32_gfx10. Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69172 llvm-svn: 375265	2019-10-18 18:20:30 +00:00
Dmitry Preobrazhensky	6c7d7eebda	[AMDGPU][MC][GFX10] Added sdwa/dpp versions of v_cndmask_b32 See https://bugs.llvm.org/show_bug.cgi?id=43608 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D69096 llvm-svn: 375241	2019-10-18 14:49:53 +00:00
Dmitry Preobrazhensky	7d325fe57b	[AMDGPU][MC][GFX9] Corrected parsing of v_cndmask_b32_sdwa See https://bugs.llvm.org/show_bug.cgi?id=43607 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D69095 llvm-svn: 375231	2019-10-18 13:31:53 +00:00
Stanislav Mekhanoshin	befab66a2c	[AMDGPU] drop getIsFP td helper We already have isFloatType helper, and they are out of sync. Drop one and merge the type list. Differential Revision: https://reviews.llvm.org/D69138 llvm-svn: 375175	2019-10-17 21:46:56 +00:00
Daniil Fukalov	3972057511	[AMDGPU] Improve code size cost model Summary: Added estimation for zero size insertelement, extractelement and llvm.fabs operators. Updated inline/unroll parameters default values. Reviewers: rampitec, arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68881 llvm-svn: 375109	2019-10-17 12:15:35 +00:00
Guillaume Chatelet	882c43d703	[Alignment][NFC] Use Align for TargetFrameLowering/Subtarget Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68993 llvm-svn: 375084	2019-10-17 07:49:39 +00:00
Matt Arsenault	34ed76e180	GlobalISel: Implement lower for G_SADDO/G_SSUBO Port directly from SelectionDAG, minus the path using ISD::SADDSAT/ISD::SSUBSAT. llvm-svn: 375042	2019-10-16 20:46:32 +00:00
Stanislav Mekhanoshin	edcd5815ce	[AMDGPU] Do not combine dpp mov reading physregs We cannot be sure physregs will stay unchanged. Differential Revision: https://reviews.llvm.org/D69065 llvm-svn: 375033	2019-10-16 19:28:25 +00:00
Stanislav Mekhanoshin	3d99310c15	[AMDGPU] Do not combine dpp with physreg def We will remove dpp mov along with the physreg def otherwise. Differential Revision: https://reviews.llvm.org/D69063 llvm-svn: 375030	2019-10-16 18:48:54 +00:00
Stanislav Mekhanoshin	d4ab74ee0b	[AMDGPU] Supress unused sdwa insts generation Do not generate non-existing sdwa instructions. It reduces the number of generated instructions by 185. Differential Revision: https://reviews.llvm.org/D69010 llvm-svn: 375016	2019-10-16 16:58:06 +00:00
David Stuttard	2d6a2303f8	[AMDGPU] Fix-up cases where writelane has 2 SGPR operands Summary: Even though writelane doesn't have the same constraints as other valu instructions it still can't violate the >1 SGPR operand constraint Due to later register propagation (e.g. fixing up vgpr operands via readfirstlane) changing writelane to only have a single SGPR is tricky. This implementation puts a new check after SIFixSGPRCopies that prevents multiple SGPRs being used in any writelane instructions. The algorithm used is to check for trivial copy prop of suitable constants into one of the SGPR operands and perform that if possible. If this isn't possible put an explicit copy of Src1 SGPR into M0 and use that instead (this is allowable for writelane as the constraint is for SGPR read-port and not constant-bus access). Reviewers: rampitec, tpr, arsenm, nhaehnle Reviewed By: rampitec, arsenm, nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, mgorny, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D51932 Change-Id: Ic7553fa57440f208d4dbc4794fc24345d7e0e9ea llvm-svn: 375004	2019-10-16 14:37:39 +00:00
Piotr Sobczak	02baaca742	[AMDGPU] Extend the SI Load/Store optimizer Summary: Extend the SI Load/Store optimizer to merge MIMG load instructions. Handle different flavours of image_load and image_sample instructions. When the instructions of the same subclass differ only in dmask, merge them and update dmask accordingly. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64911 llvm-svn: 374984	2019-10-16 10:17:02 +00:00
Austin Kerbow	527e9f9a3f	AMDGPU: Fix infinite searches in SIFixSGPRCopies Summary: Two conditions could lead to infinite loops when processing PHI nodes in SIFixSGPRCopies. The first condition involves a REG_SEQUENCE that uses registers defined by both a PHI and a COPY. The second condition arises when a physical register is copied to a virtual register which is then used in a PHI node. If the same virtual register is copied to the same physical register, the result is an endless loop. %0:sgpr_64 = COPY $sgpr0_sgpr1 %2 = PHI %0, %bb.0, %1, %bb.1 $sgpr0_sgpr1 = COPY %0 Reviewers: alex-t, rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68970 llvm-svn: 374944	2019-10-15 19:59:45 +00:00
Stanislav Mekhanoshin	1184c27fa5	[AMDGPU] Support mov dpp with 64 bit operands We define mov/update dpp intrinsics as overloaded but do not support i64, which is a practically useful type. Fix the selection and lowering. Differential Revision: https://reviews.llvm.org/D68673 llvm-svn: 374910	2019-10-15 16:41:15 +00:00
Stanislav Mekhanoshin	6e8599d939	[AMDGPU] Allow DPP combiner to work with REG_SEQUENCE Differential Revision: https://reviews.llvm.org/D68828 llvm-svn: 374908	2019-10-15 16:17:50 +00:00
Guillaume Chatelet	b65fa48305	[Alignment] Migrate Attribute::getWith(Stack)Alignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, jdoerfert Reviewed By: courbet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D68792 llvm-svn: 374884	2019-10-15 12:56:24 +00:00
Guillaume Chatelet	0e62011df8	[Alignment][NFC] Remove dependency on GlobalObject::setAlignment(unsigned) Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, mehdi_amini, jvesely, nhaehnle, hiraditya, steven_wu, dexonsmith, dang, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68944 llvm-svn: 374880	2019-10-15 11:24:36 +00:00
Matt Arsenault	2bd166ad94	AMDGPU: Fix redundant setting of m0 for atomic load/store Atomic load/store would have their setting of m0 handled twice, which happened to be optimized out later. llvm-svn: 374801	2019-10-14 18:30:31 +00:00
Alexander Timofeev	c4d256a590	[AMDGPU] Come back patch for the 'Assign register class for cross block values according to the divergence.' Detailed description: After https://reviews.llvm.org/D59990 submit several issues were discovered. Changes in common code were preserved but AMDGPU specific part was reverted to keep the backend working correctly. Discovered issues were addressed in the following commits: https://reviews.llvm.org/D67662 https://reviews.llvm.org/D67101 https://reviews.llvm.org/D63953 https://reviews.llvm.org/D63731 This change brings back AMDGPU specific changes. Reviewed by: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D68635 llvm-svn: 374767	2019-10-14 12:01:10 +00:00
Stanislav Mekhanoshin	e2d104f64c	[AMDGPU] link dpp pseudos and real instructions on gfx10 This defaults to zero fi operand, but we do not expose it anyway. Should we expose it later it needs to be added to the pseudo. This enables dpp combining on gfx10. Differential Revision: https://reviews.llvm.org/D68888 llvm-svn: 374604	2019-10-11 22:03:36 +00:00
Dmitry Preobrazhensky	c4995076c6	[AMDGPU][MC][GFX9][GFX10] Corrected number of src operands for ds_[read/write]_addtid_b32 See https://bugs.llvm.org/show_bug.cgi?id=37941 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D68787 llvm-svn: 374561	2019-10-11 14:53:26 +00:00
Dmitry Preobrazhensky	b82fae01ea	[AMDGPU][MC][GFX6][GFX7][GFX10] Added instructions buffer_atomic_[fcmpswap/fmin/fmax]* See https://bugs.llvm.org/show_bug.cgi?id=28232 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D68788 llvm-svn: 374559	2019-10-11 14:44:51 +00:00
Dmitry Preobrazhensky	472c6b0aa0	[AMDGPU][MC][GFX10] Enabled null for 64-bit dst operands See https://bugs.llvm.org/show_bug.cgi?id=43524 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D68785 llvm-svn: 374557	2019-10-11 14:35:11 +00:00
Dmitry Preobrazhensky	882c3e3db5	[AMDGPU][MC] Corrected parsing of optional operands See https://bugs.llvm.org/show_bug.cgi?id=43486 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D68350 llvm-svn: 374553	2019-10-11 14:05:09 +00:00
Matt Arsenault	4227c62bc7	AMDGPU: Move SelectFlatOffset back into AMDGPUISelDAGToDAG llvm-svn: 374495	2019-10-11 01:28:27 +00:00
Stanislav Mekhanoshin	19a1a739b1	[AMDGPU] Handle undef old operand in DPP combine It was missing an undef flag. Differential Revision: https://reviews.llvm.org/D68813 llvm-svn: 374455	2019-10-10 21:32:41 +00:00
Jay Foad	765055658c	Revert "[AMDGPU] Run `unreachable-mbb-elimination` after isel to clean up PHIs." Summary: This has been superseded by "[AMDGPU]: PHI Elimination hooks added for custom COPY insertion." This reverts the code changes from commit `53f967f2bd` but keeps the test case. Reviewers: hliao, arsenm, tpr, dstuttard Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68769 llvm-svn: 374347	2019-10-10 13:34:31 +00:00
Matt Arsenault	12994a70cf	AMDGPU: Use SGPR_128 instead of SReg_128 for vregs SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating SReg_128 vregs. This shrinks the size of the classes regalloc needs to consider, which is usually good. llvm-svn: 374284	2019-10-10 07:11:33 +00:00
Matt Arsenault	f8bf7d7f42	AMDGPU: Don't fold copies to physregs In a future patch, this will help cleanup m0 handling. The register coalescer handles copies from a register that materializes an immediate, but doesn't handle move immediates itself. The virtual register uses will often be allocated to the same register, so there end up being no real copy. llvm-svn: 374257	2019-10-09 22:51:42 +00:00
Matt Arsenault	85dfa82302	AMDGPU/GlobalISel: Fix crash on wide constant load with VGPR pointer This was ignoring the register bank of the input pointer, and isUniformMMO seems overly aggressive. This will now conservatively assume a VGPR in cases where the incoming bank hasn't been determined yet (i.e. is from a loop phi). llvm-svn: 374255	2019-10-09 22:44:49 +00:00
Matt Arsenault	001826835a	AMDGPU: Relax register classes used llvm-svn: 374254	2019-10-09 22:44:48 +00:00
Matt Arsenault	e114be608f	AMDGPU: Fix typos llvm-svn: 374253	2019-10-09 22:44:47 +00:00
Matt Arsenault	3cd3959fe2	GlobalISel: Implement fewerElementsVector for G_BUILD_VECTOR Turn it into a G_CONCAT_VECTORS of G_BUILD_VECTOR. llvm-svn: 374252	2019-10-09 22:44:43 +00:00
Stanislav Mekhanoshin	c6dec1d828	[AMDGPU] Fixed dpp combine of VOP1 If original instruction did not have source modifiers they were not added to the new DPP instruction as well, even if needed. Differential Revision: https://reviews.llvm.org/D68729 llvm-svn: 374241	2019-10-09 22:02:58 +00:00
Vitaly Buka	c001144b10	[System Model] [TTI] Define AMDGPUTTIImpl::getST and AMDGPUTTIImpl::getTLI To fix "infinite recursion" warning. llvm-svn: 374222	2019-10-09 20:48:54 +00:00
Evandro Menezes	c57a9dc487	[AMDGPU] Use math constants defined in MathExtras (NFC) Use the the new math constants in `MathExtras.h`. Differential revision: https://reviews.llvm.org/D68285 llvm-svn: 374208	2019-10-09 20:00:43 +00:00
Matt Arsenault	190a17bbd1	AMDGPU: Fix i16 arithmetic pattern redundancy There were 2 problems here. First, these patterns were duplicated to handle the inverted shift operands instead of using the commuted PatFrags. Second, the point of the zext folding patterns don't apply to the non-0ing high subtargets. They should be skipped instead of inserting the extension. The zeroing high code would be emitted when necessary anyway. This was also emitting unnecessary zexts in cases where the high bits were undefined. llvm-svn: 374092	2019-10-08 17:36:38 +00:00
Tom Stellard	3a8d80944b	AMDGPU: Add offsets to MMO when lowering buffer intrinsics Summary: Without offsets on the MachineMemOperands (MMOs), MachineInstr::mayAlias() will return true for all reads and writes to the same resource descriptor. This leads to O(N^2) complexity in the MachineScheduler when analyzing dependencies of buffer loads and stores. It also limits the SILoadStoreOptimizer from merging more instructions. This patch reduces the compile time of one pathological compute shader from 12 seconds to 1 second. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65097 llvm-svn: 374087	2019-10-08 17:04:51 +00:00
Stanislav Mekhanoshin	8f002193bf	[AMDGPU] Disable unused gfx10 dpp instructions Inhibit generation of unused real dpp instructions on gfx10 just like it is done on other subtargets. This does not change anything because these are illegal anyway and not accepted, but it does reduce the number of instruction definitions generated. Differential Revision: https://reviews.llvm.org/D68607 llvm-svn: 374083	2019-10-08 16:56:01 +00:00
Nicolai Haehnle	df6e67697b	AMDGPU: Propagate undef flag during pre-RA exec mask optimizations Summary: Issue: https://github.com/GPUOpen-Drivers/llpc/issues/204 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68184 llvm-svn: 374041	2019-10-08 12:46:32 +00:00
Matt Arsenault	c8a6df7130	AMDGPU/GlobalISel: Clamp G_SITOFP/G_UITOFP sources llvm-svn: 373989	2019-10-07 23:33:08 +00:00
Matt Arsenault	538b73b797	AMDGPU/GlobalISel: Handle more G_INSERT cases Start manually writing a table to get the subreg index. TableGen should probably generate this, but I'm not sure what it looks like in the arbitrary case where subregisters are allowed to not fully cover the super-registers. llvm-svn: 373947	2019-10-07 19:16:26 +00:00
Matt Arsenault	4bcdcad91b	GlobalISel: Partially implement lower for G_INSERT llvm-svn: 373946	2019-10-07 19:13:27 +00:00
Matt Arsenault	1237aa2996	AMDGPU/GlobalISel: Fix selection of 16-bit shifts llvm-svn: 373945	2019-10-07 19:10:44 +00:00
Matt Arsenault	09ec6918bc	AMDGPU/GlobalISel: Select VALU G_AMDGPU_FFBH_U32 llvm-svn: 373944	2019-10-07 19:10:43 +00:00
Matt Arsenault	0b2ea91d6d	AMDGPU/GlobalISel: Use S_MOV_B64 for inline constants This hides some defects in SIFoldOperands when the immediates are split. llvm-svn: 373943	2019-10-07 19:07:19 +00:00
Matt Arsenault	578fa2819f	AMDGPU/GlobalISel: Widen 16-bit G_MERGE_VALUEs sources Continue making a mess of merge/unmerge legality. llvm-svn: 373942	2019-10-07 19:05:58 +00:00
Matt Arsenault	b4cbf9862c	AMDGPU/GlobalISel: Select more G_INSERT cases At minimum handle the s64 insert type, which are emitted in real cases during legalization. We really need TableGen to emit something to emit something like the inverse of composeSubRegIndices do determine the subreg index to use. llvm-svn: 373938	2019-10-07 18:43:31 +00:00
Matt Arsenault	27269054d2	GlobalISel: Add target pre-isel instructions Allows targets to introduce regbankselectable pseudo-instructions. Currently the closet feature to this is an intrinsic. However this requires creating a public intrinsic declaration. This litters the public intrinsic namespace with operations we don't necessarily want to expose to IR producers, and would rather leave as private to the backend. Use a new instruction bit. A previous attempt tried to keep using enum value ranges, but it turned into a mess. llvm-svn: 373937	2019-10-07 18:43:29 +00:00
Jordan Rose	fdaa742174	Second attempt to add iterator_range::empty() Doing this makes MSVC complain that `empty(someRange)` could refer to either C++17's std::empty or LLVM's llvm::empty, which previously we avoided via SFINAE because std::empty is defined in terms of an empty member rather than begin and end. So, switch callers over to the new method as it is added. https://reviews.llvm.org/D68439 llvm-svn: 373935	2019-10-07 18:14:24 +00:00
Matt Arsenault	e59296a051	AMDGPU/GlobalISel: Fall back on weird G_EXTRACT offsets llvm-svn: 373842	2019-10-06 01:41:22 +00:00
Matt Arsenault	786a3953ba	AMDGPU/GlobalISel: RegBankSelect mul24 intrinsics llvm-svn: 373841	2019-10-06 01:37:39 +00:00
Matt Arsenault	c0ec72d4f8	AMDGPU/GlobalISel: RegBankSelect DS GWS intrinsics llvm-svn: 373840	2019-10-06 01:37:38 +00:00
Matt Arsenault	bcd6b1d209	AMDGPU/GlobalISel: Lower G_ATOMIC_CMPXCHG_WITH_SUCCESS llvm-svn: 373839	2019-10-06 01:37:37 +00:00
Matt Arsenault	a5b9c75674	GlobalISel: Partially implement lower for G_EXTRACT Turn into shift and truncate. Doesn't yet handle pointers. llvm-svn: 373838	2019-10-06 01:37:35 +00:00
Matt Arsenault	69c65a8609	AMDGPU/GlobalISel: Fix RegBankSelect for sendmsg intrinsics This wasn't updated for the immarg handling change. llvm-svn: 373837	2019-10-06 01:37:34 +00:00
Huihui Zhang	da9e252491	[NFC] Add { } to silence compiler warning [-Wmissing-braces]. ../llvm-project/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:355:48: warning: suggest braces around initialization of subobject [-Wmissing-braces] return addMappingFromTable<1>(MI, MRI, { 0 }, Table); ^ {} llvm-svn: 373784	2019-10-04 20:04:34 +00:00
Dmitry Preobrazhensky	434d59250e	[AMDGPU][MC][GFX10][WS32] Corrected decoding of dst operand for v_cmp_*_sdwa opcodes See bug 43484: https://bugs.llvm.org/show_bug.cgi?id=43484 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D68349 llvm-svn: 373745	2019-10-04 13:04:17 +00:00
Dmitry Preobrazhensky	9bd763679f	[AMDGPU][MC][GFX10] Enabled decoding of 'null' operand See bug 43485: https://bugs.llvm.org/show_bug.cgi?id=43485 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D68348 llvm-svn: 373740	2019-10-04 12:38:36 +00:00
Dmitry Preobrazhensky	94d040706d	[AMDGPU][MC][GFX10] Corrected definition of FLAT GLOBAL/SCRATCH instructions See bug 43483: https://bugs.llvm.org/show_bug.cgi?id=43483 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D68347 llvm-svn: 373736	2019-10-04 12:10:22 +00:00
Matt Arsenault	d7cad4fb41	AMDGPU/GlobalISel: Fix using wrong addrspace for aperture This was always passing the destination flat address space, when it should be picking between the two valid source options. llvm-svn: 373716	2019-10-04 08:35:38 +00:00
Matt Arsenault	412e0bf8f3	AMDGPU/GlobalISel: Select G_PTRTOINT llvm-svn: 373715	2019-10-04 08:35:37 +00:00
Matt Arsenault	be9521acaa	AMDGPU/GlobalISel: Support wave32 waterfall loops llvm-svn: 373714	2019-10-04 08:35:35 +00:00
Piotr Sobczak	165e469145	[AMDGPU][SILoadStoreOptimizer] NFC: Refactor code Summary: This patch fixes a potential aliasing problem in InstClassEnum, where local values were mixed with machine opcodes. Introducing InstSubclass will keep them separate and help extending InstClassEnum with other instruction types (e.g. MIMG) in the future. This patch also makes getSubRegIdxs() more concise. Reviewers: nhaehnle, arsenm, tstellar Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68384 llvm-svn: 373699	2019-10-04 07:09:40 +00:00
Jordan Rupprecht	b2b43c8576	[NFC] Fix unused variable in release builds llvm-svn: 373646	2019-10-03 18:35:44 +00:00
Matt Arsenault	ed77b27441	AMDGPU/GlobalISel: Handle RegBankSelect of G_INSERT_VECTOR_ELT llvm-svn: 373639	2019-10-03 17:59:03 +00:00
Matt Arsenault	233ff982c7	AMDGPU/GlobalISel: Split 64-bit vector extracts during RegBankSelect Register indexing 64-bit elements is possible on the SALU, but not the VALU. Handle splitting this into two 32-bit indexes. Extend waterfall loop handling to allow moving a range of instructions. llvm-svn: 373638	2019-10-03 17:55:27 +00:00
Matt Arsenault	56271fe180	AMDGPU/GlobalISel: Allow VGPR to index SGPR register We can still do a waterfall loop over the index if using a VGPR to index an SGPR. The result will still be a VGPR, but we can avoid the wide copy of the source register to a VGPR. llvm-svn: 373637	2019-10-03 17:50:32 +00:00
Matt Arsenault	3d23e58dbe	AMDGPU/GlobalISel: Fix mutationIsSane assert v8s8 and This would try to do FewerElements to v9s8 llvm-svn: 373635	2019-10-03 17:50:29 +00:00
Tom Stellard	e6f5171305	AMDGPU/SILoadStoreOptimizer: Optimize scanning for mergeable instructions Summary: This adds a pre-pass to this optimization that scans through the basic block and generates lists of mergeable instructions with one list per unique address. In the optimization phase instead of scanning through the basic block for mergeable instructions, we now iterate over the lists generated by the pre-pass. The decision to re-optimize a block is now made per list, so if we fail to merge any instructions with the same address, then we do not attempt to optimize them in future passes over the block. This will help to reduce the time this pass spends re-optimizing instructions. In one pathological test case, this change reduces the time spent in the SILoadStoreOptimizer from 0.2s to 0.03s. This restructuring will also make it possible to implement further solutions in this pass, because we can now add less expensive checks to the pre-pass and filter instructions out early which will avoid the need to do the expensive scanning during the optimization pass. For example, checking for adjacent offsets is an inexpensive test we can move to the pre-pass. Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65961 llvm-svn: 373630	2019-10-03 17:11:47 +00:00
Sander de Smalen	4f99b6f0fe	[AArch64] Static (de)allocation of SVE stack objects. Adds support to AArch64FrameLowering to allocate fixed-stack SVE objects. The focus of this patch is purely to allow the stack frame to allocate/deallocate space for scalable SVE objects. More dynamic allocation (at compile-time, i.e. determining placement of SVE objects on the stack), or resolving frame-index references that include scalable-sized offsets, are left for subsequent patches. SVE objects are allocated in the stack frame as a separate region below the callee-save area, and above the alignment gap. This is done so that the SVE objects can be accessed directly from the FP at (runtime) VL-based offsets to benefit from using the VL-scaled addressing modes. The layout looks as follows: +-------------+ \| stack arg \| +-------------+ \| Callee Saves\| \| X29, X30 \| (if available) \|-------------\| <- FP (if available) \| : \| \| SVE area \| \| : \| +-------------+ \|/////////////\| alignment gap. \| : \| \| Stack objs \| \| : \| +-------------+ <- SP after call and frame-setup SVE and non-SVE stack objects are distinguished using different StackIDs. The offsets for objects with TargetStackID::SVEVector should be interpreted as purely scalable offsets within their respective SVE region. Reviewers: thegameg, rovka, t.p.northover, efriedma, rengolin, greened Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D61437 llvm-svn: 373585	2019-10-03 11:33:50 +00:00
Matt Arsenault	efb5a24ab0	AMDGPU/GlobalISel: Don't re-get subtarget It's already available in the class. llvm-svn: 373568	2019-10-03 05:46:10 +00:00
Matt Arsenault	1c135a39aa	AMDGPU/GlobalISel: Expand G_BITCAST legality llvm-svn: 373567	2019-10-03 05:46:08 +00:00
Stanislav Mekhanoshin	1384c3a5b8	[AMDGPU] Fix illegal agpr use by VALU When SIFixSGPRCopies attempts to fix an illegal copy from vector to scalar register it calls moveToVALU(). A copy from an agpr to sgpr becomes a copy from agpr to agpr, which may result in the illegal register class at a use of this copy. Solution is to copy it always into a vgpr. This may result in a subsequent copy into an agpr if that is what really needed, however should not happen too often and likely will be folded later. The opposite situation may not happen because an sgpr is always illegal where agpr is legal, so such user instructions may not exist. Differential Revision: https://reviews.llvm.org/D68358 llvm-svn: 373544	2019-10-02 23:23:46 +00:00
Piotr Sobczak	265e94e657	[AMDGPU] Extend buffer intrinsics with swizzling Summary: Extend cachepolicy operand in the new VMEM buffer intrinsics to supply information whether the buffer data is swizzled. Also, propagate this information to MIR. Intrinsics updated: int_amdgcn_raw_buffer_load int_amdgcn_raw_buffer_load_format int_amdgcn_raw_buffer_store int_amdgcn_raw_buffer_store_format int_amdgcn_raw_tbuffer_load int_amdgcn_raw_tbuffer_store int_amdgcn_struct_buffer_load int_amdgcn_struct_buffer_load_format int_amdgcn_struct_buffer_store int_amdgcn_struct_buffer_store_format int_amdgcn_struct_tbuffer_load int_amdgcn_struct_tbuffer_store Furthermore, disable merging of VMEM buffer instructions in SI Load/Store optimizer, if the "swizzled" bit on the instruction is on. The default value of the bit is 0, meaning that data in buffer is linear and buffer instructions can be merged. There is no difference in the generated code with this commit. However, in the future it will be expected that front-ends use buffer intrinsics with correct "swizzled" bit set. Reviewers: arsenm, nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68200 llvm-svn: 373491	2019-10-02 17:22:36 +00:00
Jay Foad	dafda61021	[AMDGPU] Make printf lowering faster when there are no printfs Summary: Printf lowering unconditionally visited every instruction in the module. To make it faster in the common case where there are no printfs, look up the printf function (if any) and iterate over its users instead. Reviewers: rampitec, kzhuravl, alex-t, arsenm Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68145 llvm-svn: 373433	2019-10-02 08:44:15 +00:00
Matt Arsenault	86f864dace	AMDGPU/GlobalISel: Use getIntrinsicID helper llvm-svn: 373417	2019-10-02 01:02:27 +00:00
Matt Arsenault	cdfe5efe9b	AMDGPU/GlobalISel: Assume VGPR for G_FRAME_INDEX In principle this should behave as any other constant. However eliminateFrameIndex currently assumes a VALU use and uses a vector shift. Work around this by selecting to VGPR for now until eliminateFrameIndex is fixed. llvm-svn: 373415	2019-10-02 01:02:24 +00:00
Matt Arsenault	bfce0c2664	AMDGPU/GlobalISel: Private loads always use VGPRs llvm-svn: 373414	2019-10-02 01:02:21 +00:00
Matt Arsenault	05aa8a733e	AMDGPU/GlobalISel: Legalize 1024-bit G_BUILD_VECTOR This will be needed to support AGPR operations. llvm-svn: 373413	2019-10-02 01:02:18 +00:00
Matt Arsenault	3a657afb3a	AMDGPU/GlobalISel: Fix RegBankSelect for 1024-bit values llvm-svn: 373412	2019-10-02 01:02:14 +00:00
Stanislav Mekhanoshin	075bc48a7f	[AMDGPU] separate accounting for agprs Account and report agprs separately on gfx908. Other targets do not change the reporting. Differential Revision: https://reviews.llvm.org/D68307 llvm-svn: 373411	2019-10-02 00:26:58 +00:00
Changpeng Fang	e4ee28d14c	AMDGPU: Fix an out of date assert in addressing FrameIndex Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D67574 llvm-svn: 373404	2019-10-01 23:07:14 +00:00
Tom Stellard	004c79157e	AMDGPU/SILoadStoreOptimizer: Add helper functions for working with CombineInfo Summary: This is a refactoring that will make future improvements to this pass easier. This change should not change the behavior of the pass. Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin Reviewed By: nhaehnle, vpykhtin Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65496 llvm-svn: 373366	2019-10-01 17:56:59 +00:00
Matt Arsenault	9dba603748	AMDGPU/GlobalISel: Increase max legal size to 1024 There are 1024 bit register classes defined for AGPRs. Additionally OpenCL defines vectors up to 16 x i64, and this helps those tests legalize. llvm-svn: 373350	2019-10-01 16:35:06 +00:00
Jay Foad	e536800022	[AMDGPU] Add VerifyScheduling support. Summary: This is cut and pasted from the corresponding GenericScheduler functions. Reviewers: arsenm, atrick, tstellar, vpykhtin Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68264 llvm-svn: 373346	2019-10-01 15:45:47 +00:00
Matt Arsenault	fdea5e02ce	AMDGPU/GlobalISel: Select s1 src G_SITOFP/G_UITOFP llvm-svn: 373298	2019-10-01 02:23:20 +00:00
Matt Arsenault	59b91aa93e	AMDGPU/GlobalISel: Add support for init.exec intrinsics TThe existing wave32 behavior seems broken and incomplete, but this reproduces it. llvm-svn: 373296	2019-10-01 02:07:25 +00:00
Matt Arsenault	5823a28270	AMDGPU/GlobalISel: Allow scc/vcc alternative mappings for s1 constants llvm-svn: 373295	2019-10-01 02:07:19 +00:00
Matt Arsenault	8f6bdb7668	AMDGPU/GlobalISel: Avoid creating shift of 0 in arg lowering This is sort of papering over the fact that we don't run a combiner anywhere, but avoiding creating 2 instructions in the first place is easy. llvm-svn: 373293	2019-10-01 01:44:46 +00:00
Matt Arsenault	f24ac13aaa	TLI: Remove DAG argument from getRegisterByName Replace with the MachineFunction. X86 is the only user, and only uses it for the function. This removes one obstacle from using this in GlobalISel. The other is the more tolerable EVT argument. The X86 use of the function seems questionable to me. It checks hasFP, before frame lowering. llvm-svn: 373292	2019-10-01 01:44:39 +00:00
Matt Arsenault	54167ea316	AMDGPU/GlobalISel: Select G_UADDO/G_USUBO llvm-svn: 373288	2019-10-01 01:23:13 +00:00
Matt Arsenault	ed85b0cee6	GlobalISel: Implement widenScalar for G_SITOFP/G_UITOFP sources Legalize 16-bit G_SITOFP/G_UITOFP for AMDGPU. llvm-svn: 373287	2019-10-01 01:06:48 +00:00
Matt Arsenault	77ac400117	AMDGPU/GlobalISel: Legalize G_GLOBAL_VALUE Handle other cases besides LDS. Mostly a straight port of the existing handling, without the intermediate custom nodes. llvm-svn: 373286	2019-10-01 01:06:43 +00:00
Alexander Timofeev	565b1d3d46	[AMDGPU] SIFoldOperands should not fold register acrocc the EXEC definition Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D67662 llvm-svn: 373221	2019-09-30 15:31:17 +00:00
Guillaume Chatelet	ab11b9188d	[Alignment][NFC] Remove AllocaInst::setAlignment(unsigned) Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jholewinski, arsenm, jvesely, nhaehnle, eraman, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D68141 llvm-svn: 373207	2019-09-30 13:34:44 +00:00
Matt Arsenault	317d991fa5	AMDGPU/GlobalISel: Fix select for v2s16 and/or/xor llvm-svn: 373180	2019-09-30 06:31:30 +00:00
Matt Arsenault	76f44f6b53	AMDGPU/GlobalISel: Avoid getting MRI in every function Store it in AMDGPUInstructionSelector to avoid boilerplate in nearly every select function. llvm-svn: 373139	2019-09-28 03:41:13 +00:00
Dmitry Preobrazhensky	436d5b335a	[AMDGPU][MC] Corrected parsing of registers Summary of changes: refactored code for better readability and future improvements; fixed bug 41281: https://bugs.llvm.org/show_bug.cgi?id=41281 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D65224 llvm-svn: 373094	2019-09-27 15:41:31 +00:00
Guillaume Chatelet	18f805a7ea	[Alignment][NFC] Remove unneeded llvm:: scoping on Align types llvm-svn: 373081	2019-09-27 12:54:21 +00:00
Changpeng Fang	f5524f0451	Remove the AliasAnalysis argument in function areMemAccessesTriviallyDisjoint Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D58360 llvm-svn: 373024	2019-09-26 22:53:44 +00:00
Stanislav Mekhanoshin	486cd9a90d	[AMDGPU] copy OtherPredicates from pseudo to VOP3_Real Differential Revision: https://reviews.llvm.org/D68102 llvm-svn: 373015	2019-09-26 21:06:17 +00:00
Thomas Raoux	3c8c667235	[TargetLowering] Make allowsMemoryAccess methode virtual. Rename old function to explicitly show that it cares only about alignment. The new allowsMemoryAccess call the function related to alignment by default and can be overridden by target to inform whether the memory access is legal or not. Differential Revision: https://reviews.llvm.org/D67121 llvm-svn: 372935	2019-09-26 00:16:01 +00:00
Stanislav Mekhanoshin	d3b2b97195	[AMDGPU] gfx10 v_fmac_f16 operand folding Fold immediates into v_fmac_f16. Differential Revision: https://reviews.llvm.org/D68037 llvm-svn: 372906	2019-09-25 18:40:20 +00:00
Simon Pilgrim	5f2d8b2618	[TargetInstrInfo] Let findCommutedOpIndices take const MachineInstr& Neither the base implementation of findCommutedOpIndices nor any in-tree target modifies the instruction passed in and there is no reason why they would in the future. Committed on behalf of @hvdijk (Harald van Dijk) Differential Revision: https://reviews.llvm.org/D66138 llvm-svn: 372882	2019-09-25 14:55:57 +00:00
Jakub Kuderski	269bd15c68	[Dominators][AMDGPU] Don't use virtual exit node in findNearestCommonDominator. Cleanup MachinePostDominators. Summary: This patch fixes a bug that originated from passing a virtual exit block (nullptr) to `MachinePostDominatorTee::findNearestCommonDominator` and resulted in assertion failures inside its callee. It also applies a small cleanup to the class. The patch introduces a new function in PDT that given a list of `MachineBasicBlock`s finds their NCD. The new overload of `findNearestCommonDominator` handles virtual root correctly. Note that similar handling of virtual root nodes is not necessary in (forward) `DominatorTree`s, as right now they don't use virtual roots. Reviewers: tstellar, tpr, nhaehnle, arsenm, NutshellySima, grosser, hliao Reviewed By: hliao Subscribers: hliao, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits Tags: #amdgpu, #llvm Differential Revision: https://reviews.llvm.org/D67974 llvm-svn: 372874	2019-09-25 14:04:36 +00:00
Jay Foad	60d419e5cd	Add tracing in pickNodeFromQueue. This matches GenericScheduler::pickNodeFromQueue, from which this function was mostly cut and pasted. llvm-svn: 372829	2019-09-25 08:45:41 +00:00
Huihui Zhang	a18b00c8d5	[NFC] Add { } to silence compiler warning [-Wmissing-braces]. /local/mnt/workspace/huihuiz/llvm-comm-git-2/llvm-project/llvm/lib/Object/MachOObjectFile.cpp:2731:7: warning: suggest braces around initialization of subobject [-Wmissing-braces] "i386", "x86_64", "x86_64h", "armv4t", "arm", "armv5e", ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ { 1 warning generated. /local/mnt/workspace/huihuiz/llvm-comm-git-2/llvm-project/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:355:46: warning: suggest braces around initialization of subobject [-Wmissing-braces] return addMappingFromTable<1>(MI, MRI, { 0 }, Table); ^ {} 1 warning generated. /local/mnt/workspace/huihuiz/llvm-comm-git-2/llvm-project/llvm/tools/llvm-objcopy/ELF/Object.cpp:400:57: warning: suggest braces around initialization of subobject [-Wmissing-braces] static constexpr std::array<uint8_t, 4> ZlibGnuMagic = {'Z', 'L', 'I', 'B'}; ^~~~~~~~~~~~~~~~~~ { } 1 warning generated. llvm-svn: 372811	2019-09-25 04:40:07 +00:00
Dmitry Preobrazhensky	6784a3cd79	[AMDGPU][MC] Corrected handling of relocatable expressions See bug 43359: https://bugs.llvm.org//show_bug.cgi?id=43359 Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D67829 llvm-svn: 372622	2019-09-23 15:41:51 +00:00
Simon Pilgrim	557cee337b	[AMDGPU] isSDNodeAlwaysUniform - silence static analyzer dyn_cast<LoadSDNode> null dereference warning. NFCI. The static analyzer is warning about a potential null dereference, but we should be able to use cast<LoadSDNode> directly and if not assert will fire for us. llvm-svn: 372528	2019-09-22 21:01:13 +00:00
Simon Pilgrim	2de9b107fa	AMDGPUPrintfRuntimeBinding - silence static analyzer null dereference warnings. NFCI. llvm-svn: 372501	2019-09-22 13:01:49 +00:00
Matt Arsenault	eb6eb694e4	AMDGPU/GlobalISel: Allow selection of scalar min/max I believe all of the uniform/divergent pattern predicates are redundant and can be removed. The uniformity bit already influences the register class, and nothhing has broken when I've removed this and others. llvm-svn: 372450	2019-09-21 02:37:33 +00:00
Fangrui Song	084801bdc1	Use llvm::StringLiteral instead of StringRef in few places llvm-svn: 372395	2019-09-20 14:31:42 +00:00
Bjorn Pettersson	169cb63478	[AMDGPU] Use std::make_tuple to make some toolchains happy again My toolchain stopped working (LLVM 8.0 , libstdc++ 5.4.0) after r372338. The same problem was seen in clang-cuda-build buildbots: clang-cuda-build/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:763:12: error: chosen constructor is explicit in copy-initialization return {Reg, 0, nullptr}; ^~~~~~~~~~~~~~~~~ /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here constexpr tuple(_UElements&&... __elements) ^ This commit adds explicit calls to std::make_tuple to work around the problem. llvm-svn: 372384	2019-09-20 12:13:12 +00:00
Stanislav Mekhanoshin	d487d6401d	[AMDGPU] fixed underflow in getOccupancyWithNumVGPRs The function could return zero if an extreme number or registers were used. Minimal possible occupancy is 1. Differential Revision: https://reviews.llvm.org/D67771 llvm-svn: 372350	2019-09-19 20:09:04 +00:00
Matt Arsenault	3ecab8e455	Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297) This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338	2019-09-19 16:26:14 +00:00
Hans Wennborg	13bdae8541	Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This broke the Chromium build, causing it to fail with e.g. fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15> See llvm-commits thread of r372285 for details. This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit. > Encode them directly as an imm argument to G_INTRINSIC. > > Since now intrinsics can now define what parameters are required to be > immediates, avoid using registers for them. Intrinsics could > potentially want a constant that isn't a legal register type. Also, > since G_CONSTANT is subject to CSE and legalization, transforms could > potentially obscure the value (and create extra work for the > selector). The register bank of a G_CONSTANT is also meaningful, so > this could throw off future folding and legalization logic for AMDGPU. > > This will be much more convenient to work with than needing to call > getConstantVRegVal and checking if it may have failed for every > constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth > immarg operands, many of which need inspection during lowering. Having > to find the value in a register is going to add a lot of boilerplate > and waste compile time. > > SelectionDAG has always provided TargetConstant for constants which > should not be legalized or materialized in a register. The distinction > between Constant and TargetConstant was somewhat fuzzy, and there was > no automatic way to force usage of TargetConstant for certain > intrinsic parameters. They were both ultimately ConstantSDNode, and it > was inconsistently used. It was quite easy to mis-select an > instruction requiring an immediate. For SelectionDAG, start emitting > TargetConstant for these arguments, and using timm to match them. > > Most of the work here is to cleanup target handling of constants. Some > targets process intrinsics through intermediate custom nodes, which > need to preserve TargetConstant usage to match the intrinsic > expectation. Pattern inputs now need to distinguish whether a constant > is merely compatible with an operand or whether it is mandatory. > > The GlobalISelEmitter needs to treat timm as a special case of a leaf > node, simlar to MachineBasicBlock operands. This should also enable > handling of patterns for some G_ instructions with immediates, like > G_FENCE or G_EXTRACT. > > This does include a workaround for a crash in GlobalISelEmitter when > ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372314	2019-09-19 12:33:07 +00:00
Tom Stellard	9f4c7571a1	AMDGPU/SILoadStoreOptimizer: Add const to more functions Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65901 llvm-svn: 372298	2019-09-19 04:39:45 +00:00
Matt Arsenault	bffbeecb44	AMDGPU/GlobalISel: RegBankSelect llvm.amdgcn.ds.swizzle llvm-svn: 372297	2019-09-19 04:11:17 +00:00
Matt Arsenault	4f663a6367	AMDGPU/GlobalISel: RegBankSelect tbuffer load/store These have the same operand structure as the non-t buffer operations. llvm-svn: 372296	2019-09-19 04:11:12 +00:00
Matt Arsenault	494243597b	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.store.format This needs special handling due to some subtargets that have a nonstandard register layout for f16 vectors Also reject some illegal types on other targets. llvm-svn: 372293	2019-09-19 02:35:08 +00:00
Matt Arsenault	67f1f6ff8c	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.store llvm-svn: 372292	2019-09-19 02:30:27 +00:00
Matt Arsenault	838ff36553	AMDGPU/GlobalISel: RegBankSelect struct buffer load/store llvm-svn: 372291	2019-09-19 02:26:53 +00:00
Matt Arsenault	a62ef58346	AMDGPU/GlobalISel: RegBankSelect llvm.amdgcn.raw.buffer.{load\|store} llvm-svn: 372290	2019-09-19 02:25:09 +00:00
Matt Arsenault	a30d022db6	AMDGPU/GlobalISel: Attempt to RegBankSelect image intrinsics Images should always have 2 consecutive, mandatory SGPR arguments. llvm-svn: 372289	2019-09-19 02:23:06 +00:00
Matt Arsenault	22e2c09515	AMDGPU/GlobalISel: Fix RegBankSelect G_SMULH/G_UMULH pre-gfx9 The scalar versions were only introduced in gfx9. llvm-svn: 372286	2019-09-19 01:42:34 +00:00
Matt Arsenault	d8399d12cd	GlobalISel: Don't materialize immarg arguments to intrinsics Encode them directly as an imm argument to G_INTRINSIC. Since now intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU. This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time. SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them. Most of the work here is to cleanup target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory. The GlobalISelEmitter needs to treat timm as a special case of a leaf node, simlar to MachineBasicBlock operands. This should also enable handling of patterns for some G_ instructions with immediates, like G_FENCE or G_EXTRACT. This does include a workaround for a crash in GlobalISelEmitter when ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372285	2019-09-19 01:33:14 +00:00
Guillaume Chatelet	d4c4671aa7	[Alignment][NFC] Remove LogAlignment functions Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, MaskRay, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67620 llvm-svn: 372231	2019-09-18 15:49:49 +00:00
Tim Renouf	1786117111	[AMDGPU] Allow FP inline constant in v_madak_f16 and v_fmaak_f16 Differential Revision: https://reviews.llvm.org/D67680 Change-Id: Ic38f47cb2079c2c1070a441b5943854844d80a7c llvm-svn: 372208	2019-09-18 09:32:06 +00:00
Stanislav Mekhanoshin	1fb584f7a2	[AMDGPU] Added MI bit IsDOT NFC, needed for future commit. Differential Revision: https://reviews.llvm.org/D67669 llvm-svn: 372151	2019-09-17 17:56:13 +00:00
Graham Hunter	1a9195d817	[SVE][MVT] Fixed-length vector MVT ranges * Reordered MVT simple types to group scalable vector types together. * New range functions in MachineValueType.h to only iterate over the fixed-length int/fp vector types. * Stopped backends which don't support scalable vector types from iterating over scalable types. Reviewers: sdesmalen, greened Reviewed By: greened Differential Revision: https://reviews.llvm.org/D66339 llvm-svn: 372099	2019-09-17 10:19:23 +00:00
Alexander Timofeev	6524a7a2b9	[AMDGPU]: PHI Elimination hooks added for custom COPY insertion. Fixed Defferential Revision: https://reviews.llvm.org/D67101 Reviewers: rampitec, vpykhtin llvm-svn: 372086	2019-09-17 09:08:58 +00:00
Matt Arsenault	fb51e64eac	AMDGPU/GlobalISel: Fail select of G_INSERT non-32-bit source This was producing an illegal copy which would hit an assert later. Error on selection for now until this is implemented. llvm-svn: 371993	2019-09-16 14:26:14 +00:00
Matt Arsenault	1fc07d6648	AMDGPU/GlobalISel: Fix RegBankSelect for G_FRINT and G_FCEIL llvm-svn: 371991	2019-09-16 14:14:37 +00:00
Matt Arsenault	bc8de8a8da	AMDGPU/GlobalISel: Select SMRD loads for more types llvm-svn: 371954	2019-09-16 00:54:07 +00:00
Matt Arsenault	48b158acae	AMDGPU/GlobalISel: RegBankSelect for kill llvm-svn: 371953	2019-09-16 00:48:37 +00:00
Matt Arsenault	01c7f40de3	AMDGPU/GlobalISel: Legalize s1 source G_[SU]ITOFP llvm-svn: 371952	2019-09-16 00:37:10 +00:00
Matt Arsenault	60169ed613	AMDGPU/GlobalISel: Set type on vgpr live in special arguments Fixes assertion with workitem ID intrinsics used in non-kernel functions. llvm-svn: 371951	2019-09-16 00:33:00 +00:00
Matt Arsenault	9f52c1ea58	AMDGPU/GlobalISel: Select S16->S32 fptoint llvm-svn: 371950	2019-09-16 00:32:56 +00:00
Matt Arsenault	0a6123595f	AMDGPU/GlobalISel: Select s32->s16 G_[US]ITOFP llvm-svn: 371949	2019-09-16 00:29:12 +00:00

... 7 8 9 10 11 ...

4705 Commits