llvm-project

Commit Graph

Author	SHA1	Message	Date
alex-t	5b898bddff	[AMDGPU] Enable carry out ADD/SUB operations divergence driven instruction selection. Summary: This change enables all kind of carry out ISD opcodes to be selected according to the node divergence. Reviewers: rampitec, arsenm, vpykhtin Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78091	2020-05-04 16:42:25 +03:00
Craig Topper	a58b62b4a2	[IR] Replace all uses of CallBase::getCalledValue() with getCalledOperand(). This method has been commented as deprecated for a while. Remove it and replace all uses with the equivalent getCalledOperand(). I also made a few cleanups in here. For example, to removes use of getElementType on a pointer when we could just use getFunctionType from the call. Differential Revision: https://reviews.llvm.org/D78882	2020-04-27 22:17:03 -07:00
Jay Foad	cca6bc42d9	[AMDGPU] Use RegClass helper functions in getRegForInlineAsmConstraint. This avoids more long lists of register classes that have to be updated every time we add a new one. NFC. Differential Revision: https://reviews.llvm.org/D78570	2020-04-23 12:26:52 +01:00
Jay Foad	0337017a9f	[AMDGPU] Use SGPR instead of SReg classes `12994a70cf` did this for 128-bit classes: SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating SReg_128 vregs. This shrinks the size of the classes regalloc needs to consider, which is usually good. This patch extends it to all classes > 64 bits, for consistency. Differential Revision: https://reviews.llvm.org/D78622	2020-04-23 11:45:22 +01:00
Jay Foad	dbdffe3ee9	[AMDGPU] Add 192-bit register classes Differential Revision: https://reviews.llvm.org/D78312	2020-04-22 13:10:37 +01:00
Jay Foad	d625b4b081	[AMDGPU] Add missing AReg classes Add 96-bit, 160-bit and 256-bit AReg classes to match VReg and SReg. NFC as far as I know, but it may avoid weird legalization problems. Differential Revision: https://reviews.llvm.org/D78348	2020-04-22 13:10:37 +01:00
Jay Foad	7318625674	[AMDGPU] Remove obsolete special case for 1024-bit vector types. NFC.	2020-04-22 09:05:24 +01:00
Matt Arsenault	f463792506	AMDGPU: Remove custom node for RSQ_LEGACY Directly select from the intrinsic. This wasn't getting much value from the custom node.	2020-04-17 19:50:36 -04:00
Matt Arsenault	0ba40d4ccf	AMDGPU/GlobalISel: Combines for V_CVT_F32_UBYTE[0-3] Ports the existing DAG combines, minus the simplify demanded bits which seems to have no equivalent now. Without these, this isn't particularly helpful in most of the IR sample cases.	2020-04-13 19:18:19 -04:00
Craig Topper	113f37a1f9	[CallSite removal][TargetLowering] Replace ImmutableCallSite with CallBase Differential Revision: https://reviews.llvm.org/D77995	2020-04-13 13:50:15 -07:00
Matt Arsenault	e6605a209c	DAG: Fix wrong legality check for ISD::FMAD Since `1725f28841`, this should check isFMADLegalForFAddFSub rather than the the plain isOperationLegal. This would assert in a subset of cases due to an oddity in how FMAD is selected. We will allow FMA formation pre-legalize, but not FMAD even in cases where it would be valid. The current hook requires passing in the root fadd/fsub. However, in this distributed case, this would be far more complicated to pass in the relevant operand. AMDGPU doesn't get any value from the node, and only needs the type and is the only implementor, so I'm not sure why we have this complexity. Just rename and expand the assert to avoid the more complicated checks spread through the distribution logic.	2020-04-13 10:25:39 -07:00
Craig Topper	95192f548d	[CallSite removal][TargetLowering] Use CallBase instead of CallSite in TargetLowering::ParseConstraints interface. Differential Revision: https://reviews.llvm.org/D77929	2020-04-12 11:26:25 -07:00
Stanislav Mekhanoshin	f96810ff34	[AMDGPU] Expand vector trunc stores from i16 to i8 Differential Revision: https://reviews.llvm.org/D77693	2020-04-07 21:47:45 -07:00
Matt Arsenault	869f05c834	AMDGPU: Remove dead paths for requiresUniformRegister The extracts from control flow intrinsics are already properly handled by divergence analysis. The inline asm case isn't dead, but has also never really worked correctly so leave it as-is for now.	2020-04-06 16:15:10 -04:00
Apelete Seketeli	8aadb442d1	[scan-build] fix dead store warnings emitted on LLVM AMDGPU code base This fixes dead store warnings of the type "dead assignment" reported by Clang Static Analyzer.	2020-04-05 11:19:03 -04:00
Matt Arsenault	178050c3ba	AMDGPU: Use Register in more places	2020-04-03 14:52:54 -04:00
Guillaume Chatelet	c9d5c19597	[Alignment][NFC] Transitionning more getMachineMemOperand call sites Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, Jim, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77121	2020-03-31 08:36:18 +00:00
Sebastian Neubauer	5d3a69feca	[AMDGPU] New llvm.amdgcn.ballot intrinsic Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function in GLSL and other shader languages. It returns a bitfield containing the result of its boolean argument in all active lanes, and zero in all inactive lanes. This is intended to replace the existing llvm.amdgcn.icmp and llvm.amdgcn.fcmp intrinsics after a suitable transition period. Use the new intrinsic in the atomic optimizer pass. Differential Revision: https://reviews.llvm.org/D65088	2020-03-31 10:35:39 +02:00
Matt Arsenault	db9f0d1ce5	AMDGPU: Form v_cvt_ubyte* with f16 results We get 2 conversion instructions anyway. Previously we would get a conversion with SDWA reading from a byte source, which has a larger encoding.	2020-03-30 17:59:49 -04:00
Matt Arsenault	570a578e46	AMDGPU: Account for dmask when computing image mem size Only the number of elements in the dmask will really be accessed.	2020-03-30 17:30:58 -04:00
Jakub Kuderski	77ce2e21a8	[AMDGPU] Add Relocation Constant Support Summary: This change adds amdgcn.reloc.constant intrinsic to the amdgpu backend, which will compile into a relocation entry in the resulting elf. The intrinsics takes a MetadataNode (String) as its only argument, which specifies the symbol name of the relocation entry. `SelectionDAGBuilder::getValueImpl` is changed to allow metadata operands passed through to ISel. Author: csyonghe <yonghe@google.com> Reviewers: tpr, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76440	2020-03-30 13:49:20 -04:00
Matt Arsenault	ab7a41069e	AMDGPU: Fix using wrong instruction for FP conversion This was was never actually hit, but FTRUNC was clearly not the intent here.	2020-03-29 14:03:07 -04:00
Matt Arsenault	97bbe7ad2a	AMDGPU: Fix typo	2020-03-29 14:03:06 -04:00
Matt Arsenault	a950e3beef	AMDGPU: Move towards deprecating alignbit intrinsic This is equivalent to llvm.fshr, so legalize the intrinsic to the generic node.	2020-03-20 11:03:04 -04:00
Scott Linder	0e9368cc8c	[AMDGPU] Move frame pointer from s34 to s33 Remove the gap left between the stack pointer (s32) and frame pointer (s34) now that the scratch wave offset is no longer a part of the calling convention ABI. Update llvm/docs/AMDGPUUsage.rst to reflect the change. Tags: #llvm Differential Revision: https://reviews.llvm.org/D75657	2020-03-19 15:35:16 -04:00
Scott Linder	60b1967c39	[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in the entry function prologue. This allows us to removes the scratch wave offset register from the calling convention ABI. As part of this change, allow the use of an inline constant zero for the SOffset of MUBUF instructions accessing the stack in entry functions when a frame pointer is not requested/required. Entry functions with calls still need to set up the calling convention ABI stack pointer register, and reference it in order to address arguments of called functions. The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative. Non-entry functions also use an inline constant zero SOffset for wave-relative scratch access, but continue to use the stack and frame pointers as before. When the stack or frame pointer is converted to a swizzled offset it is now scaled directly, as the scratch wave offset no longer needs to be subtracted first. Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling convention. Tags: #llvm Differential Revision: https://reviews.llvm.org/D75138	2020-03-19 15:35:16 -04:00
Matt Arsenault	4ea1baf6a0	AMDGPU: Initial, crude support for indirect calls This isn't really usable, and requires using the -amdgpu-fixed-function-abi flag to work. Assumes a uniform call target, and will hit a verifier error if the call target ends up in a VGPR. Also doesn't attempt to do anything sensible for the reported register/stack usage.	2020-03-18 12:03:48 -04:00
Matt Arsenault	015b640be4	AMDGPU: Add flag to used fixed function ABI Pass all arguments to every function, rather than only passing the minimum set of inputs needed for the call graph.	2020-03-13 13:27:05 -07:00
Matt Arsenault	bb8622094d	AMDGPU: Don't handle kernarg.segment.ptr in functions Just lower this to null. Pass implicitarg.ptr in its place in the argument list.	2020-03-13 12:51:12 -07:00
alex-t	39e1a90784	[AMDGPU] SI_INDIRECT_DST_V* pseudos expansion should place EXEC restore to separate basic block Summary: When SI_INDIRECT_DST_V* pseudos has indexes in VGPR, they get expanded into the self-looped basic block that modifies EXEC in a loop. To keep EXEC consistent it is stored before and then re-stored after the pseudo expansion result. %95:vreg_512 = SI_INDIRECT_DST_V16 %93:vreg_512(tied-def 0), %94:sreg_32, 0, killed %1500:vgpr_32 results to s_mov_b64 s[6:7], exec BB0_16: v_readfirstlane_b32 s8, v28 v_cmp_eq_u32_e32 vcc, s8, v28 s_and_saveexec_b64 vcc, vcc s_set_gpr_idx_on s8, gpr_idx(DST) v_mov_b32_e32 v6, v25 s_set_gpr_idx_off s_xor_b64 exec, exec, vcc s_cbranch_execnz BB0_16 ; %bb.17: s_mov_b64 exec, s[6:7] The bug appeared in case this expansion occurs in the ELSE block of the CF. Originally %110:vreg_512 = SI_INDIRECT_DST_V16 %103:vreg_512(tied-def 0), %85:vgpr_32, 0, %107:vgpr_32, %112:sreg_64 = SI_ELSE %108:sreg_64, %bb.19, 0, implicit-def dead $exec, implicit-def dead $scc, implicit $exec expanded to ****************** <== here exec has "THEN" context s_mov_b64 s[6:7], exec BB0_16: v_readfirstlane_b32 s8, v28 v_cmp_eq_u32_e32 vcc, s8, v28 s_and_saveexec_b64 vcc, vcc s_set_gpr_idx_on s8, gpr_idx(DST) v_mov_b32_e32 v6, v25 s_set_gpr_idx_off s_xor_b64 exec, exec, vcc s_cbranch_execnz BB0_16 ; %bb.17: s_or_saveexec_b64 s[4:5], s[4:5] <-- exec mask is restored for "ELSE" but immediately overwritten. s_mov_b64 exec, s[6:7] The rest of the "ELSE" block is executed not by the workitems which constitute the "else mask" but by those which constitute "then mask" SILowerControlFlow::emitElse always considers the basic block begin() as an insertion point for s_or_saveexec. Proposed fix: The SI_INDIRECT_DST_V* procedure should split the reminder block to create landing pad for the EXEC restoration. Reviewers: rampitec, vpykhtin, nhaehnle Reviewed By: vpykhtin Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75472	2020-03-10 14:04:22 +03:00
Jay Foad	11d1573bb6	[APFloat] Make use of new overloaded comparison operators. NFC. Reviewers: ekatz, spatel, jfb, tlively, craig.topper, RKSimon, nikic, scanon Subscribers: arsenm, jvesely, nhaehnle, hiraditya, dexonsmith, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75744	2020-03-06 16:42:53 +00:00
Simon Pilgrim	e2f0093800	[AMDGPU] performCvtF32UByteNCombine - revisit node after src operand simplification. If SimplifyDemandedBits succeeds in simplifying the byte src, add the CVT_F32_UBYTE node back to the worklist as we might be able to simplify further. Yet another step towards removing SelectionDAG::GetDemandedBits.	2020-03-04 11:25:50 +00:00
Simon Pilgrim	4af8db317d	[AMDGPU] performCvtF32UByteNCombine - add SHL and SimplifyMultipleUseDemandedBits support This is part of the work to remove SelectionDAG::GetDemandedBits and just use SimplifyMultipleUseDemandedBits. Recent experiments raised some v_cvt_f32_ubyte*_e32 regressions, so I've added some additional abilities to performCvtF32UByteNCombine to help unpack byte data more aggressively. We still don't remove all OR(SHL,SRL) patterns as some of the regenerated nodes don't get combined again, but we are getting closer. Differential Revision: https://reviews.llvm.org/D74786	2020-02-19 11:45:57 +00:00
Matt Arsenault	e240b27d6d	AMDGPU/GlobalISel: Allow arbitrary global values Treat unknown address spaces as global	2020-02-17 11:32:28 -08:00
Matt Arsenault	8c2c0b3637	AMDGPU: Improve i16/v2i16 bswap	2020-02-14 09:53:22 -08:00
Matt Arsenault	bfe3779459	AMDGPU: Use v_perm_b32 to implement bswap Also greatly improve i64 lowering. LegalizeIntegerTypes does the correct narrowing if i64 isn't legal. Just workaround this for SelectionDAG by making i64 legal and splitting in the patterns.	2020-02-13 09:45:31 -08:00
Matt Arsenault	86f9117d47	AMDGPU: Don't report 2-byte alignment as fast This is apparently worse than 1-byte alignment. This does not attempt to decompose 2-byte aligned wide stores, but will stop trying to produce them. Also fix bug in LoadStoreVectorizer which was decreasing the alignment and vectorizing stack accesses. It was assuming a stack object was an alloca that could have its base alignment changed, which is not true if the pointer is derived from a function argument.	2020-02-11 18:35:00 -05:00
Matt Arsenault	7af7b96a9b	AMDGPU: Move R600 test compatability hack Instead of handling the r600 intrinsics on amdgcn, handle the amdgcn intrinsics on r600.	2020-02-10 10:02:06 -08:00
Stanislav Mekhanoshin	ed3527c648	[AMDGPU] Split R600 and GCN subregs These are generated and do not need to have the same values. We are defining separate subregs for R600 and GCN but then using AMDGPU subregs on R600. Differential Revision: https://reviews.llvm.org/D74248	2020-02-10 08:29:56 -08:00
Sebastian Neubauer	8756869170	[AMDGPU] Add a16 feature to gfx10 Based on D72931 This adds a new feature called A16 which is enabled for gfx10. gfx9 keeps the R128A16 feature so it can share all the instruction encodings with gfx7/8. Differential Revision: https://reviews.llvm.org/D73956	2020-02-10 09:04:23 +01:00
Changpeng Fang	884acbb9e1	AMDGPU: Enhancement on FDIV lowering in AMDGPUCodeGenPrepare Summary: The accuracy limit to use rcp is adjusted to 1.0 ulp from 2.5 ulp. Also, afn instead of arcp is used to allow inaccurate rcp to be used. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D73588	2020-02-07 11:46:23 -08:00
Guillaume Chatelet	f85d3408e6	[NFC] Introduce an API for MemOp Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code. Reviewers: courbet Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73964	2020-02-07 11:32:27 +01:00
Stanislav Mekhanoshin	cacc3b7a55	[AMDGPU] Cleanup assumptions about generated subregs We are using countPopulation on a LaneBitmask to determine a number of registers it covers. This is the assumption which does not necessarily need to be true. It is not changed but factored into a single call SIRegisterInfo::getNumCoveredRegs(). Some other places are cleaned up with respect to assumptions about subreg indexes values and tablegen behavior. Differential Revision: https://reviews.llvm.org/D74177	2020-02-06 17:39:24 -08:00
Matt Arsenault	03a2d0045d	AMDGPU: Add compile time hack for hasCFUser Assume the control flow intrinsic results are never casted, and early exit based on the type.	2020-02-06 11:41:34 -08:00
Sebastian Neubauer	163e33b290	[AMDGPU] Fix lowering a16 image intrinsics scalar_to_vector takes only one argument, not two. The a16 tests now also check the packing of coordinates into registers Differential Revision: https://reviews.llvm.org/D73482	2020-02-05 10:54:34 +01:00
Sebastian Neubauer	3bc7ffdaab	[AMDGPU] Use v3f32 type in image instructions This should lower the amount of used registers for gfx9. I updated some of the changed tests with the update script because changing them by hand is tedious. Differential Revision: https://reviews.llvm.org/D73884	2020-02-05 10:35:41 +01:00
Matt Arsenault	9260d01faa	AMDGPU: Correct memory size for image intrinsics This was incorrectly rounding up to the next power of 2. v4f32 was rounding up to v8f32, which was just wrong. There are also v3i16/v3f16 available in MVT, so we don't even need to round the f16 cases anymore. Additionally, this field is really an EVT so we don't even need to consider this. Also switch some asserts to return invalid. We should have an IR verifier for these intrinsic return types, but for now it's better to not assert on IR that passes the verifier. This should also probably be fixed to consider that dmask is really eliminating some of the loaded components.	2020-02-04 22:29:23 -05:00
Matt Arsenault	1024b73ef5	AMDGPU: Split denormal mode tracking bits Prepare to accurately track the future denormal-fp-math attribute changes. The way to actually set these separately is not wired in yet. This is just a mechanical change, and mostly still assumes the input and output mode match. This should be refined for some cases. For example, fcanonicalize lowering should use the flushing variant if either input or output flushing is enabled	2020-02-04 10:44:21 -08:00
Guillaume Chatelet	b8144c0536	[NFC] Encapsulate MemOp logic Summary: This patch simply introduces functions instead of directly accessing the fields. This helps introducing additional check logic. A second patch will add simplifying functions. Reviewers: courbet Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73945	2020-02-04 10:36:26 +01:00
Matt Arsenault	cb7b661d3d	AMDGPU: Analyze divergence of inline asm	2020-02-03 12:42:16 -08:00
Matt Arsenault	726446a009	AMDGPU: Fix splitting wide f32 s.buffer.load intrinsics This would witch f32 to i32, and produce an invald concat_vectors from i32 pieces to an f32 vector.	2020-02-03 12:28:08 -08:00
Guillaume Chatelet	333f2ad8b8	[Alignment][NFC] Use Align for getMemcpy/Memmove/Memset Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73885	2020-02-03 17:13:19 +01:00
Guillaume Chatelet	3c89b75f23	[NFC] Introduce a type to model memory operation Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code. Reviewers: courbet Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73785	2020-01-31 17:29:01 +01:00
Matt Arsenault	97a1d4bc02	AMDGPU: Don't use separate cache arguments for s_buffer_load node There's not much value to this separate node from the intrinsic. Make the operand structure the same as the intrinsic, so we can reuse the same pattern for GlobalISel.	2020-01-30 14:15:26 -08:00
Matt Arsenault	68b102b97a	AMDGPU: Directly select 16-bank LDS case of llvm.amdgcn.interp.p1.f16 Manually select this is as a tablegen workraound. Both SelectionDAG and GlobalISel end up misplacing the copy to m0 when both instructions in the output need it. Neither considers that both output instructions depend on m0. I don't know of any other pattern we need to handle this case, so it's less effort to just workaround this for now.	2020-01-29 08:24:31 -08:00
Matt Arsenault	96352e0a1b	AMDGPU/GlobalISel: Handle LDS with relocations case	2020-01-29 08:18:55 -08:00
Matt Arsenault	bef27175c7	AMDGPU: Fix not using f16 fsin/fcos I noticed this because this accidentally started working for GlobalISel.	2020-01-27 08:59:59 -08:00
Matt Arsenault	87c46a3129	AMDGPU: Don't error on ds.ordered intrinsic in function These should be assumed to be called from a compute context. Also don't use a 2 entry switch over constants.	2020-01-24 13:06:44 -08:00
Changpeng Fang	2531535984	AMDGPU: Implement FDIV optimizations in AMDGPUCodeGenPrepare Summary: RCP has the accuracy limit. If FDIV fpmath require high accuracy rcp may not meet the requirement. However, in DAG lowering, fpmath information gets lost, and thus we may generate either inaccurate rcp related computation or slow code for fdiv. In patch implements fdiv optimizations in the AMDGPUCodeGenPrepare, which could exactly know !fpmath. FastUnsafeRcpLegal: We determine whether it is legal to use rcp based on unsafe-fp-math, fast math flags, denormals and fpmath accuracy request. RCP Optimizations: 1/x -> rcp(x) when fast unsafe rcp is legal or fpmath >= 2.5ULP with denormals flushed. a/b -> a*rcp(b) when fast unsafe rcp is legal. Use fdiv.fast: a/b -> fdiv.fast(a, b) when RCP optimization is not performed and fpmath >= 2.5ULP with denormals flushed. 1/x -> fdiv.fast(1,x) when RCP optimization is not performed and fpmath >= 2.5ULP with denormals. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D71293	2020-01-23 16:57:43 -08:00
Matt Arsenault	0bf434ccd5	AMDGPU: Fix element size assertion The GlobalISel usage called this with bits, but the DAG usage was incorrectly using bytes.	2020-01-22 11:18:45 -05:00
Matt Arsenault	d1dbb5e471	AMDGPU/GlobalISel: Select G_INSERT_VECTOR_ELT	2020-01-22 11:00:49 -05:00
Matt Arsenault	9c928649a0	AMDGPU: Fix interaction of tfe and d16 This using the wrong result register, and dropping the result entirely for v2f16. This would fail to select on the scalar case. I believe it was also mishandling packed/unpacked subtargets.	2020-01-22 09:26:17 -05:00
Matt Arsenault	64e9528201	AMDGPU: Fix missing immarg on llvm.amdgcn.interp.mov The first operand maps to an immediate field, so this should be immarg.	2020-01-22 09:01:34 -05:00
Matt Arsenault	8615eeb455	AMDGPU: Partially merge indirect register write handling `a785209bc2` switched to using a pseudos instead of manually tying operands on the regular instruction. The VGPR indexing mode path should have the same problems that change attempted to avoid, so these should use the same strategy. Use a single pseudo for the VGPR indexing mode and movreld paths, and expand it based on the subtarget later. These have essentially the same constraints, reading the index from m0. Switch from using an offset to the subregister index directly, instead of computing an offset and re-adding it back. Also add missing pseudos for existing register class sizes.	2020-01-20 17:19:16 -05:00
Michael Liao	6d0d86a64d	[DAG] Add helper for creating constant vector index with correct type. NFC.	2020-01-18 01:23:36 -05:00
Stanislav Mekhanoshin	eebdd85e7d	[AMDGPU] allow multi-dword flat scratch access since GFX9 This is supported starting with GFX9. Differential Revision: https://reviews.llvm.org/D72865	2020-01-17 10:47:03 -08:00
Matt Arsenault	886f9071c6	AMDGPU: Don't assert on a16 images on targets without FeatureR128A16 Currently the lowering for i16 image coordinates asserts on gfx10. I'm somewhat confused by this though. The feature is missing from the gfx10 feature lists, but the a16 bit appears to be present in the manual for MIMG instructions.	2020-01-17 11:07:00 -05:00
Matt Arsenault	91e758b732	AMDGPU: Move permlane discard vdst_in optimization This case can be handled as a regular selection pattern, so move it out of the weird post-isel folding code which doesn't have an exactly equivalent place in GlobalISel. I think it doesn't make much sense to do this optimization here though, and it would be more useful in instcombine. There's not really any new information that will be gained during lowering since these inputs were known from the beginning.	2020-01-16 17:27:53 -05:00
Matt Arsenault	eef92f25cc	AMDGPU: Remove custom node for exports I'm mildly worried about potentially reordering exp/exp_done with IntrWriteMem on the intrinsic. Requires hacking out the illegal type on SI, so manually select that case during lowering.	2020-01-15 18:33:15 -05:00
Matt Arsenault	bd7658a212	AMDGPU: Partially directly select llvm.amdgcn.interp.p1.f16 The 16 bank LDS case is complicated due to using multiple instructions. If I attempt to write a pattern for it, the generated selector incorrectly places the copy to m0 after the first instruction, so that needs to be separately addressed. Also fix not gluing the copy to m0 to the second operation in the second half of the 16 bank lowering.	2020-01-15 08:58:58 -05:00
Matt Arsenault	35c3d101ae	AMDGPU/GlobalISel: Select G_EXTRACT_VECTOR_ELT Doesn't try to do the fold into the base register of an add of a constant in the index like the DAG path does.	2020-01-09 19:52:24 -05:00
Matt Arsenault	255cc5a760	CodeGen: Use LLT instead of EVT in getRegisterByName Only PPC seems to be using it, and only checks some simple cases and doesn't distinguish between FP. Just switch to using LLT to simplify use from GlobalISel.	2020-01-09 17:37:52 -05:00
Matt Arsenault	68e70fb098	AMDGPU: Fix not using v_cvt_f16_[iu]16 We weren't treating i16->f16 casts as legal on targets with these instructions, and always using a pair of casts through i32.	2020-01-07 15:10:07 -05:00
Matt Arsenault	452f6243c9	AMDGPU: Select llvm.amdgcn.interp.p2.f16 directly This will enable automatic GlobalISel support in a future commit.	2020-01-06 20:34:21 -05:00
Matt Arsenault	7f2db2917d	AMDGPU: Fix legalizing f16 fpow The existing test only covered one case for r600. The use of mul_legacy also looks suspicious to me, but leave it for now. The patterns are also not making use of source modifiers.	2020-01-06 17:21:51 -05:00
Michael Liao	3566c75ca8	[amdgpu] Skip non-instruction values in CF user tracing. Summary: - CF users won't be non-instruction values. Skip them to save the compilation time. It's especially true when there are multiple functions in that module, where, says, a constant may be used in most functions. The current CF user tracing adds significant overhead. Reviewers: alex-t, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72174	2020-01-03 16:02:47 -05:00
Reid Kleckner	9c2b72821b	Move tail call disabling code to target independent code When the "disable-tail-calls" attribute was added, checks were added for it in various backends. Now this code has proliferated, and it is something the target is responsible for checking. Move that responsibility back to the ISels (fast, global, and SD). There's no major functionality change, except for targets that never implemented this check. This LLVM attribute was originally added in `d9699bc7bd` (2015). Reviewers: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D72118	2020-01-03 11:27:41 -08:00
Jay Foad	13a7a4ccbf	Remove unneeded extra variable realArgIdx. NFC.	2020-01-02 14:27:32 +00:00
Craig Topper	787e078f3e	[TargetLowering][AMDGPU] Make scalarizeVectorLoad return a pair of SDValues instead of creating a MERGE_VALUES node. NFCI This allows us to clean up some places that were peeking through the MERGE_VALUES node after the call. By returning the SDValues directly, we can clean that up. Unfortunately, there are several call sites in AMDGPU that wanted the MERGE_VALUES and now need to create their own.	2019-12-30 19:36:04 -08:00
Jay Foad	c7c05b0c8a	[AMDGPU] Don't create MachinePointerInfos with an UndefValue pointer Summary: The only useful information the UndefValue conveys is the address space, which MachinePointerInfo can represent directly without referring to an IR value. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71838	2019-12-23 15:58:19 +00:00
Tim Renouf	fce1a6f584	Revert "AMDGPU: Try to commute sub of boolean ext" This reverts commit `69fcfb7d35`. As shown in the test I attached to this commit, the change I reverted causes a problem with "zext(cc1) - zext(cc2)". It commuted the operands to the sub and used different logic to select the addc/subc instruction: sub zext (setcc), x => addcarry 0, x, setcc sub sext (setcc), x => subcarry 0, x, setcc ... but that is bogus. I believe it is not possible to fold those commuted patterns into any form of addcarry or subcarry. It may have worked as intended before "AMDGPU: Change boolean content type to 0 or 1" because the setcc was considered to be -1 rather than 1. Differential Revision: https://reviews.llvm.org/D70978 Change-Id: If2139421aa6c935cbd1d925af58fe4a4aa9e8f43	2019-12-13 12:49:06 +00:00
Michael Liao	4a308d302c	[AMDGPU] Keep consistent check of legal addressing mode. Summary: - Add test cases for GFX10, which has narrower offset range compared to GFX9. Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70473	2019-11-20 15:08:17 -05:00
Matt Arsenault	db0ed3e429	AMDGPU: Refactor treatment of denormal mode Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget, and be moved into a separate function attribute. This patch is still NFC. The denormal mode remains as a subtarget feature for now, but make the necessary changes to switch to using an attribute.	2019-11-19 19:55:43 +05:30
Matt Arsenault	b696b9dba7	DAG: Add function context to isFMAFasterThanFMulAndFAdd AMDGPU needs to know the FP mode for the function to answer this correctly when this is removed from the subtarget. AArch64 had to make this more complicated by using this from an IR hook, so add an IR typed overload.	2019-11-19 19:25:26 +05:30
Piotr Sobczak	02419ab5c7	[AMDGPU] Lower llvm.amdgcn.s.buffer.load.v3[i\|f]32 Summary: Add lowering support for 32-bit vec3 variant of s.buffer.load intrinsic. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70118	2019-11-15 15:01:15 +01:00
Matt Arsenault	31479d868e	AMDGPU: Change boolean content type to 0 or 1 The usage of target boolean checks is overly inflexible, since sext and zext of a compare are equally cheap. The choice is arbitrary, but using 0/1 to some degree is the choice of lower resistance since that's what most targets use. This enables a few combines that don't bother to support ZeroOrNegativeOneBooleanContent.	2019-11-15 13:43:47 +05:30
Matt Arsenault	69fcfb7d35	AMDGPU: Try to commute sub of boolean ext Avoids another regression in a future patch.	2019-11-15 13:43:42 +05:30
Matt Arsenault	9d7bccab66	AMDGPU: Extend add x, (ext setcc) combine to sub This is the same as the add case, but inverts the operation type. This avoids regressions in a future patch.	2019-11-13 07:13:58 +05:30
Matt Arsenault	e16a71382d	AMDGPU: Select global atomicrmw fadd This only works if there is no use of the return value.	2019-11-06 16:06:38 -08:00
Matt Arsenault	6221767055	DAG: Add DAG argument to isFPExtFoldable For AMDGPU this is dependent on the FP mode, which should eventually not be a property of the subtarget.	2019-10-31 22:32:45 -07:00
Matt Arsenault	1725f28841	DAG: Add new control for ISD::FMAD formation For AMDGPU this depends on whether denormals are enabled in the default FP mode for the function. Currently this is treated as a subtarget feature, so FMAD is selectively legal based on that. I want to move this out of the subtarget features so this can be controlled with a denormal mode attribute. Additionally, this will allow folding based on a future ftz fast math flag.	2019-10-31 07:51:38 -07:00
Matt Arsenault	bc56166281	AMDGPU: Simplify getAddressSpace calls These can be directly taken from the GlobalValue instead of going through the type.	2019-10-31 07:51:38 -07:00
Matt Arsenault	171cf5302f	AMDGPU/GlobalISel: Handle flat/global G_ATOMIC_CMPXCHG Custom lower this to a target instruction with the merge operands. I think it might be better to directly select this and emit a REG_SEQUENCE, but this would be more work since it would require splitting the tablegen patterns for these cases from the other atomics.	2019-10-25 13:11:09 -07:00
Matt Arsenault	ef9a0278f0	AMDGPU: Select basic interp directly from intrinsics llvm-svn: 375457	2019-10-21 21:49:44 +00:00
Matt Arsenault	38038f116f	AMDGPU: Use CopyToReg for interp intrinsic lowering This doesn't use the default value, so doesn't benefit from the hack to help optimize it. llvm-svn: 375450	2019-10-21 19:53:49 +00:00
Matt Arsenault	8a8b317460	AMDGPU: Don't error on calls to null or undef Calls to constants should probably be generally handled. llvm-svn: 375356	2019-10-20 07:46:04 +00:00
Reid Kleckner	904cd3e06b	Prune a LegacyDivergenceAnalysis and MachineLoopInfo include each Now X86ISelLowering doesn't depend on many IR analyses. llvm-svn: 375320	2019-10-19 01:31:09 +00:00
Matt Arsenault	f9a42ed0a7	AMDGPU: Relax 32-bit SGPR register class Mostly use SReg_32 instead of SReg_32_XM0 for arbitrary values. This will allow the register coalescer to do a better job eliminating copies to m0. For GlobalISel, as a terrible hack, use SGPR_32 for things that should use SCC until booleans are solved. llvm-svn: 375267	2019-10-18 18:26:37 +00:00
Guillaume Chatelet	882c43d703	[Alignment][NFC] Use Align for TargetFrameLowering/Subtarget Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68993 llvm-svn: 375084	2019-10-17 07:49:39 +00:00
Alexander Timofeev	c4d256a590	[AMDGPU] Come back patch for the 'Assign register class for cross block values according to the divergence.' Detailed description: After https://reviews.llvm.org/D59990 submit several issues were discovered. Changes in common code were preserved but AMDGPU specific part was reverted to keep the backend working correctly. Discovered issues were addressed in the following commits: https://reviews.llvm.org/D67662 https://reviews.llvm.org/D67101 https://reviews.llvm.org/D63953 https://reviews.llvm.org/D63731 This change brings back AMDGPU specific changes. Reviewed by: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D68635 llvm-svn: 374767	2019-10-14 12:01:10 +00:00
Matt Arsenault	12994a70cf	AMDGPU: Use SGPR_128 instead of SReg_128 for vregs SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating SReg_128 vregs. This shrinks the size of the classes regalloc needs to consider, which is usually good. llvm-svn: 374284	2019-10-10 07:11:33 +00:00
Tom Stellard	3a8d80944b	AMDGPU: Add offsets to MMO when lowering buffer intrinsics Summary: Without offsets on the MachineMemOperands (MMOs), MachineInstr::mayAlias() will return true for all reads and writes to the same resource descriptor. This leads to O(N^2) complexity in the MachineScheduler when analyzing dependencies of buffer loads and stores. It also limits the SILoadStoreOptimizer from merging more instructions. This patch reduces the compile time of one pathological compute shader from 12 seconds to 1 second. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65097 llvm-svn: 374087	2019-10-08 17:04:51 +00:00
Piotr Sobczak	265e94e657	[AMDGPU] Extend buffer intrinsics with swizzling Summary: Extend cachepolicy operand in the new VMEM buffer intrinsics to supply information whether the buffer data is swizzled. Also, propagate this information to MIR. Intrinsics updated: int_amdgcn_raw_buffer_load int_amdgcn_raw_buffer_load_format int_amdgcn_raw_buffer_store int_amdgcn_raw_buffer_store_format int_amdgcn_raw_tbuffer_load int_amdgcn_raw_tbuffer_store int_amdgcn_struct_buffer_load int_amdgcn_struct_buffer_load_format int_amdgcn_struct_buffer_store int_amdgcn_struct_buffer_store_format int_amdgcn_struct_tbuffer_load int_amdgcn_struct_tbuffer_store Furthermore, disable merging of VMEM buffer instructions in SI Load/Store optimizer, if the "swizzled" bit on the instruction is on. The default value of the bit is 0, meaning that data in buffer is linear and buffer instructions can be merged. There is no difference in the generated code with this commit. However, in the future it will be expected that front-ends use buffer intrinsics with correct "swizzled" bit set. Reviewers: arsenm, nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68200 llvm-svn: 373491	2019-10-02 17:22:36 +00:00
Matt Arsenault	f24ac13aaa	TLI: Remove DAG argument from getRegisterByName Replace with the MachineFunction. X86 is the only user, and only uses it for the function. This removes one obstacle from using this in GlobalISel. The other is the more tolerable EVT argument. The X86 use of the function seems questionable to me. It checks hasFP, before frame lowering. llvm-svn: 373292	2019-10-01 01:44:39 +00:00
Guillaume Chatelet	18f805a7ea	[Alignment][NFC] Remove unneeded llvm:: scoping on Align types llvm-svn: 373081	2019-09-27 12:54:21 +00:00
Thomas Raoux	3c8c667235	[TargetLowering] Make allowsMemoryAccess methode virtual. Rename old function to explicitly show that it cares only about alignment. The new allowsMemoryAccess call the function related to alignment by default and can be overridden by target to inform whether the memory access is legal or not. Differential Revision: https://reviews.llvm.org/D67121 llvm-svn: 372935	2019-09-26 00:16:01 +00:00
Matt Arsenault	3ecab8e455	Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297) This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338	2019-09-19 16:26:14 +00:00
Hans Wennborg	13bdae8541	Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This broke the Chromium build, causing it to fail with e.g. fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15> See llvm-commits thread of r372285 for details. This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit. > Encode them directly as an imm argument to G_INTRINSIC. > > Since now intrinsics can now define what parameters are required to be > immediates, avoid using registers for them. Intrinsics could > potentially want a constant that isn't a legal register type. Also, > since G_CONSTANT is subject to CSE and legalization, transforms could > potentially obscure the value (and create extra work for the > selector). The register bank of a G_CONSTANT is also meaningful, so > this could throw off future folding and legalization logic for AMDGPU. > > This will be much more convenient to work with than needing to call > getConstantVRegVal and checking if it may have failed for every > constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth > immarg operands, many of which need inspection during lowering. Having > to find the value in a register is going to add a lot of boilerplate > and waste compile time. > > SelectionDAG has always provided TargetConstant for constants which > should not be legalized or materialized in a register. The distinction > between Constant and TargetConstant was somewhat fuzzy, and there was > no automatic way to force usage of TargetConstant for certain > intrinsic parameters. They were both ultimately ConstantSDNode, and it > was inconsistently used. It was quite easy to mis-select an > instruction requiring an immediate. For SelectionDAG, start emitting > TargetConstant for these arguments, and using timm to match them. > > Most of the work here is to cleanup target handling of constants. Some > targets process intrinsics through intermediate custom nodes, which > need to preserve TargetConstant usage to match the intrinsic > expectation. Pattern inputs now need to distinguish whether a constant > is merely compatible with an operand or whether it is mandatory. > > The GlobalISelEmitter needs to treat timm as a special case of a leaf > node, simlar to MachineBasicBlock operands. This should also enable > handling of patterns for some G_ instructions with immediates, like > G_FENCE or G_EXTRACT. > > This does include a workaround for a crash in GlobalISelEmitter when > ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372314	2019-09-19 12:33:07 +00:00
Matt Arsenault	d8399d12cd	GlobalISel: Don't materialize immarg arguments to intrinsics Encode them directly as an imm argument to G_INTRINSIC. Since now intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU. This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time. SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them. Most of the work here is to cleanup target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory. The GlobalISelEmitter needs to treat timm as a special case of a leaf node, simlar to MachineBasicBlock operands. This should also enable handling of patterns for some G_ instructions with immediates, like G_FENCE or G_EXTRACT. This does include a workaround for a crash in GlobalISelEmitter when ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372285	2019-09-19 01:33:14 +00:00
Guillaume Chatelet	d4c4671aa7	[Alignment][NFC] Remove LogAlignment functions Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, MaskRay, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67620 llvm-svn: 372231	2019-09-18 15:49:49 +00:00
Matt Arsenault	60169ed613	AMDGPU/GlobalISel: Set type on vgpr live in special arguments Fixes assertion with workitem ID intrinsics used in non-kernel functions. llvm-svn: 371951	2019-09-16 00:33:00 +00:00
Matt Arsenault	c0ceca5883	AMDGPU/GlobalISel: First pass at attempting to legalize load/stores There's still a lot more to do, but this handles decomposing due to alignment. I've gotten it to the point where nothing crashes or infinite loops the legalizer. llvm-svn: 371533	2019-09-10 16:20:14 +00:00
Guillaume Chatelet	3729b17cff	[Alignment][NFC] Use llvm::Align for TargetLowering::getPrefLoopAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Reviewed By: courbet Subscribers: wuzish, arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, MaskRay, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67386 llvm-svn: 371511	2019-09-10 12:00:43 +00:00
Matt Arsenault	acc9571406	AMDGPU: Remove pointless wrapper nodes for init.exec intrinsics llvm-svn: 371364	2019-09-09 05:49:52 +00:00
Guillaume Chatelet	aff45e4b23	[LLVM][Alignment] Make functions using log of alignment explicit Summary: This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align. The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment. A few renames uncovered dubious assignments: - `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation. - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation, - `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation, Reviewers: lattner, thegameg, courbet Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet Tags: #llvm Differential Revision: https://reviews.llvm.org/D65945 llvm-svn: 371045	2019-09-05 10:00:22 +00:00
Matt Arsenault	f581d575ce	AMDGPU: Add intrinsics for address space identification The library currently uses ptrtoint and directly checks the queue ptr for this, which counts as a pointer capture. llvm-svn: 371009	2019-09-05 02:20:39 +00:00
Jay Foad	6e18266aa4	Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0" Summary: D61491 caused us to use relocs when they're not strictly necessary, to refer to symbols in the text section. This is a pessimization and it's a problem for some loaders that don't support relocs yet. Reviewers: nhaehnle, arsenm, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65813 llvm-svn: 370667	2019-09-02 14:40:57 +00:00
Matt Arsenault	0c096da02f	AMDGPU: Fix crash from inconsistent register types for v3i16/v3f16 This is something of a workaround since computeRegisterProperties seems to be doing the wrong thing. llvm-svn: 370086	2019-08-27 17:51:56 +00:00
Daniel Sanders	0c47611131	Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible). Partial reverts in: X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned& MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register PPCFastISel.cpp - No Register::operator-=() PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned& MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor Manual fixups in: ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned& HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register. PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned& Depends on D65919 Reviewers: arsenm, bogner, craig.topper, RKSimon Reviewed By: arsenm Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65962 llvm-svn: 369041	2019-08-15 19:22:08 +00:00
Matt Arsenault	1f2b727298	MVT: Add v3i16/v3f16 vectors AMDGPU has some buffer intrinsics which theoretically could use this. Some of the generated tables include the 3 and 4 element vector versions of these rounded to 64-bits, which is ambiguous. Add these to help the table disambiguate these. Assertion change is for the path odd sized vectors now take for R600. v3i16 is widened to v4i16, which then needs to be promoted to v4i32. llvm-svn: 369038	2019-08-15 18:58:25 +00:00
Austin Kerbow	a05c384132	Re-commit: [AMDGPU] Use S_DENORM_MODE for gfx10 Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10. Reviewers: arsenm, rampitec Reviewed By: arsenm, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65620 llvm-svn: 367969	2019-08-06 02:16:11 +00:00
Dmitri Gribenko	37aa8ad663	Revert "[AMDGPU] Use S_DENORM_MODE for gfx10" This reverts commit r367882. It broke the test MC/Disassembler/AMDGPU/gfx10_dasm_all.txt. llvm-svn: 367904	2019-08-05 18:36:43 +00:00
Austin Kerbow	8d229dbb47	[AMDGPU] Use S_DENORM_MODE for gfx10 Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10. Reviewers: arsenm, rampitec Reviewed By: arsenm, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65620 llvm-svn: 367882	2019-08-05 16:09:49 +00:00
Matt Arsenault	3922392969	AMDGPU: Correct behavior of f16 buffer loads Don't assume format loads for f16. Also fixes support for targets without i16. llvm-svn: 367879	2019-08-05 15:59:07 +00:00
Matt Arsenault	0e0a1c80fb	AMDGPU: Correct behavior of f16/i16 non-format store intrinsics This was switching to use a format store for a non-format store for f16 types. Also fixes i16/f16 stores on targets without legal f16. The corresponding loads also need to be fixed. llvm-svn: 367872	2019-08-05 14:57:59 +00:00
Guillaume Chatelet	c97a3d15d2	[LLVM][Alignment] Introduce Alignment Type Summary: This is patch is part of a serie to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, jfb, jakehehrlich Reviewed By: jfb Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65514 llvm-svn: 367828	2019-08-05 11:02:05 +00:00
Nicolai Haehnle	e204786b6c	AMDGPU: add missing llvm.amdgcn.{raw,struct}.buffer.atomic.{inc,dec} Summary: Wrapping increment/decrement. These aren't exposed by many APIs... Change-Id: I1df25c7889de5a5ba76468ad8e8a2597efa9af6c Reviewers: arsenm, tpr, dstuttard Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65283 llvm-svn: 367821	2019-08-05 09:36:06 +00:00
Daniel Sanders	2bea69bf65	Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC llvm-svn: 367633	2019-08-01 23:27:28 +00:00
Matt Arsenault	bb582ebdba	AMDGPU: Remove v0 workaround for DS_GWS_* instructions Any register should work for the src field since r366067, since the used value is not pulled from the expected encoding field. llvm-svn: 367598	2019-08-01 18:41:32 +00:00
Matt Arsenault	aff2995f46	AMDGPU: Use tablegen pattern for sendmsg intrinsics Since this now emits a direct copy to m0, SIFixSGPRCopies has to handle a physical register. llvm-svn: 367593	2019-08-01 18:27:11 +00:00
Stanislav Mekhanoshin	450afcea39	[AMDGPU] Reserve all AGPRs on targets which do not have them Differential Revision: https://reviews.llvm.org/D65471 llvm-svn: 367347	2019-07-30 19:29:33 +00:00
David Stuttard	20235ef3e7	[AMDGPU] Enable v4f16 and above for v_pk_fma instructions Summary: If isel is presented with <2 x half> vectors then it will correctly select v_pk_fma style instructions. If isel is presented with e.g. <4 x half> vectors it will scalarize, unlike for other instruction types (such as fadd, fmul etc.) Added extra support to enable this. Updated one of the tests to include a test for this (as well as extending the test to GFX9) Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65325 Change-Id: I50a4577a3f8223fb53992af3b7d26121f65b71ee llvm-svn: 367206	2019-07-29 08:15:10 +00:00
Carl Ritson	0b28357053	[AMDGPU] Move WQM/WWM intrinsic instruction selection to AMDGPUISelDAGToDAG Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65328 llvm-svn: 367105	2019-07-26 13:11:44 +00:00
Matt Arsenault	a85af76c72	AMDGPU: Don't assert on v4f16 arguments to shader calling conventions llvm-svn: 367018	2019-07-25 13:55:07 +00:00
Matt Arsenault	85f3890126	AMDGPU: Force s_waitcnt after GWS instructions This is apparently required to be the immediately following instruction, so force it into a bundle with a waitcnt. llvm-svn: 366607	2019-07-19 19:47:30 +00:00
Matt Arsenault	fecf43eba3	AMDGPU/GlobalISel: Rewrite lowerFormalArguments This should now handle everything except structs passed as multiple registers. I think most of the packing logic should be handled by handleAssignments, but I'm unclear on what the contract is for multiple registers. This is copying how x86 handles this. This does change the behavior of the test_sgpr_alignment0 amdgpu_vs test. I don't think shader arguments should try to follow the alignment, and registers need to be repacked. I also don't think it matters, since I think the pointers are packed to the beginning of the argument list anyway. llvm-svn: 366582	2019-07-19 14:15:18 +00:00
Matt Arsenault	1022c0dfde	AMDGPU: Decompose all values to 32-bit pieces for calling conventions This is the more natural lowering, and presents more opportunities to reduce 64-bit ops to 32-bit. This should also help avoid issues graphics shaders have had with 64-bit values, and simplify argument lowering in globalisel. llvm-svn: 366578	2019-07-19 13:57:44 +00:00
Stanislav Mekhanoshin	6e0fa292c2	[AMDGPU] Change register type for v32 vectors When it is AReg_1024 this results in unnecessary copying into AGPRs of a 32 element vectors even though they are not intended for an mfma instruction. Differential Revision: https://reviews.llvm.org/D64815 llvm-svn: 366252	2019-07-16 20:06:00 +00:00
Matt Arsenault	49169a963e	AMDGPU: Add 24-bit mul intrinsics Insert these during codegenprepare. This works around a DAG issue where generic combines eliminate the and asserting the high bits are zero, which then exposes an unknown read source to the mul combine. It doesn't worth the hassle of trying to insert an AssertZext or something to try to deal with it. llvm-svn: 366094	2019-07-15 17:50:31 +00:00
Stanislav Mekhanoshin	1dfae6fe50	[AMDGPU] use v32f32 for 3 mfma intrinsics These should really use v32f32, but were defined as v32i32 due to the lack of the v32f32 type. Differential Revision: https://reviews.llvm.org/D64667 llvm-svn: 365972	2019-07-12 22:42:01 +00:00
Matt Arsenault	51a05d72ae	AMDGPU: Drop remnants of byval support for shaders Before 2018, mesa used to use byval interchangably with inreg, which didn't really make sense. Fix tests still using it to avoid breaking in a future commit. llvm-svn: 365953	2019-07-12 20:12:17 +00:00
Stanislav Mekhanoshin	e67cc380a8	[AMDGPU] gfx908 mfma support Differential Revision: https://reviews.llvm.org/D64584 llvm-svn: 365824	2019-07-11 21:19:33 +00:00
Stanislav Mekhanoshin	e93279fd1b	[AMDGPU] gfx908 atomic fadd and atomic pk_fadd Differential Revision: https://reviews.llvm.org/D64435 llvm-svn: 365717	2019-07-11 00:10:17 +00:00
Stanislav Mekhanoshin	50d7f46460	[AMDGPU] gfx908 mAI instructions, MC part Differential Revision: https://reviews.llvm.org/D64446 llvm-svn: 365563	2019-07-09 21:43:09 +00:00
Christudasan Devadasan	b2d24bd540	[AMDGPU] Created a sub-register class for the return address operand in the return instruction. Function return instruction lowering, currently uses the fixed register pair s[30:31] for holding the return address. It can be any SGPR pair other than the CSRs. Created an SGPR pair sub-register class exclusive of the CSRs, and used this regclass while lowering the return instruction. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D63924 llvm-svn: 365512	2019-07-09 16:48:42 +00:00
Matt Arsenault	71dfb7ec5c	AMDGPU: Make s34 the FP register Make the FP register callee saved. This is tricky because now the FP needs to be spilled in the prolog relative to the incoming SP register, rather than the frame register used throughout the rest of the function. I don't like how this bypassess the standard mechanism for CSR spills just to get the correct insert point. I may look for a better solution, since all CSR VGPRs may also need to have all lanes activated. Another option might be to make getFrameIndexReference change the base register if the frame index is a CSR, and then try to figure out the right insertion point in emitProlog. If there is a free VGPR lane available for SGPR spilling, try to use it for the FP. If that would require intrtoducing a new VGPR spill, try to use a free call clobbered SGPR. Only fallback to introducing a new VGPR spill as a last resort. This also doesn't attempt to handle SGPR spilling with scalar stores. llvm-svn: 365372	2019-07-08 19:03:38 +00:00
Tim Renouf	5816889c74	[AMDGPU] Custom lower INSERT_SUBVECTOR v3, v4, v5, v8 Summary: Since the changes to introduce vec3 and vec5, INSERT_VECTOR for these sizes has been marked "expand", which made LegalizeDAG lower it to loads and stores via a stack slot. The code got optimized a bit later, but the now-unused stack slot was never deleted. This commit avoids that problem by custom lowering INSERT_SUBVECTOR into an EXTRACT_VECTOR_ELT and INSERT_VECTOR_ELT for each element in the subvector to insert. V2: Addressed review comments re test. Differential Revision: https://reviews.llvm.org/D63160 Change-Id: I9e3c13e36f68cfa3431bb9814851cc1f673274e1 llvm-svn: 365148	2019-07-04 17:38:24 +00:00
Matt Arsenault	5fe851b6cd	AMDGPU: Custom lower vector_shuffle for v4i16/v4f16 Ordinarily it is lowered as a build_vector of each extract_vector_elt, which in turn get lowered to bitcasts and bit shifts. Very little understand the lowered extract pattern, resulting in much worse code. We treat concat_vectors of v2i16 as legal, so prefer that. llvm-svn: 364959	2019-07-02 19:15:45 +00:00
Nicolai Haehnle	10c911db63	AMDGPU/GFX10: implement ds_ordered_count changes Summary: ds_ordered_count can now simultaneously operate on up to 4 dwords in a single instruction, which are taken from (and returned to) lanes 0..3 of a single VGPR. Change-Id: I19b6e7b0732b617c10a779a7f9c0303eec7dd276 Reviewers: mareko, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63716 llvm-svn: 364815	2019-07-01 17:17:52 +00:00
Nicolai Haehnle	4dc3b2bf95	AMDGPU: Support GDS atomics Summary: Original patch by Marek Olšák Change-Id: Ia97d5d685a63a377d86e82942436d1fe6e429bab Reviewers: mareko, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63452 llvm-svn: 364814	2019-07-01 17:17:45 +00:00
Stanislav Mekhanoshin	07fd88d735	[AMDGPU] Packed thread ids in function call ABI Differential Revision: https://reviews.llvm.org/D63851 llvm-svn: 364619	2019-06-28 01:52:13 +00:00
Nicolai Haehnle	2710171a15	AMDGPU: Write LDS objects out as global symbols in code generation Summary: The symbols use the processor-specific SHN_AMDGPU_LDS section index introduced with a previous change. The linker is then expected to resolve relocations, which are also emitted. Initially disabled for HSA and PAL environments until they have caught up in terms of linker and runtime loader. Some notes: - The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered to a constant at compile times, which means some tests can no longer be applied. The current "solution" is a terrible hack, but the intrinsic isn't used by Mesa, so we can keep it for now. - We no longer know the full LDS size per kernel at compile time, which means that we can no longer generate a relevant error message at compile time. It would be possible to add a check for the size of individual variables, but ultimately the linker will have to perform the final check. Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275 Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61494 llvm-svn: 364297	2019-06-25 11:52:30 +00:00
Matt Arsenault	22e3dc60a0	AMDGPU: Fix not using s33 for scratch wave offset in kernels Fixes missing piece from r363990. llvm-svn: 364099	2019-06-21 20:04:02 +00:00
Matt Arsenault	d88db6d7fc	AMDGPU: Always use s33 for global scratch wave offset Every called function could possibly need this to calculate the absolute address of stack objectst, and this avoids inserting a copy around every call site in the kernel. It's also somewhat cleaner to keep this in a callee saved SGPR. llvm-svn: 363990	2019-06-20 21:58:24 +00:00
Matt Arsenault	740322f1eb	AMDGPU: Add intrinsics for DS GWS semaphore instructions llvm-svn: 363983	2019-06-20 21:11:42 +00:00
Matt Arsenault	8ad1decf45	AMDGPU: Insert mem_viol check loop around GWS pre-GFX9 It is necessary to emit this loop around GWS operations in case the wave is preempted pre-GFX9. llvm-svn: 363979	2019-06-20 20:54:32 +00:00
Matt Arsenault	e4c2e9b016	AMDGPU: Consolidate some getGeneration checks This is incomplete, and ideally these would all be removed, but it's better to localize them to the subtarget first with comments about what they're for. llvm-svn: 363902	2019-06-19 23:54:58 +00:00
Matt Arsenault	4d55d024be	Reapply "AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics" This reapplies r363678, using the correct chain for the CopyToReg for v0. glueCopyToM0 counterintuitively changes the operands of the original node. llvm-svn: 363870	2019-06-19 19:55:27 +00:00
Simon Pilgrim	128ce93c60	Revert rL363678 : AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics There may or may not be additional work to handle this correctly on SI/CI. ........ Breaks EXPENSIVE_CHECKS buildbots - http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/78/ llvm-svn: 363797	2019-06-19 13:00:54 +00:00
Matt Arsenault	8d35dcd703	AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics There may or may not be additional work to handle this correctly on SI/CI. llvm-svn: 363678	2019-06-18 13:19:57 +00:00
Nicolai Haehnle	6d71be4e67	AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0 Summary: Instead of encoding a high-word of 0 using a fake TargetGlobalAddress, just use a literal target constant. This simplifies some subsequent changes. The generated assembly is now more explicit about the kind of relocation that is to be used. Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61491 llvm-svn: 363516	2019-06-16 17:32:01 +00:00
Nicolai Haehnle	490e83cd43	AMDGPU/GFX10: Support DLC bit in llvm.amdgcn.s.buffer.load intrinsic Summary: Change-Id: Ie4c971462a7749740938c687144e77441dac2539 Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62486 Change-Id: Iae59523edd75c74918d2118df6571a7b671717a0 llvm-svn: 363514	2019-06-16 17:14:12 +00:00
Stanislav Mekhanoshin	5250021672	[AMDGPU] gfx10 conditional registers handling This is cpp source part of wave32 support, excluding overriden getRegClass(). Differential Revision: https://reviews.llvm.org/D63351 llvm-svn: 363513	2019-06-16 17:13:09 +00:00
Stanislav Mekhanoshin	68a2fef9ae	[AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32 Differential Revision: https://reviews.llvm.org/D63301 llvm-svn: 363339	2019-06-13 23:47:36 +00:00
Stanislav Mekhanoshin	5f581c9f08	[AMDGPU] gfx1010 premlane instructions Differential Revision: https://reviews.llvm.org/D63202 llvm-svn: 363185	2019-06-12 17:52:51 +00:00
Simon Pilgrim	4e0648a541	[TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123) As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space. This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them. If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores. Differential Revision: https://reviews.llvm.org/D63075 llvm-svn: 363179	2019-06-12 17:14:03 +00:00
Simon Pilgrim	266f43964e	[TargetLowering] Add allowsMemoryAccess(MachineMemOperand) helper wrapper. NFCI. As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075. llvm-svn: 363048	2019-06-11 11:00:23 +00:00
Piotr Sobczak	9b11e93d90	[AMDGPU] Optimize image_[load\|store]_mip Summary: Replace image_load_mip/image_store_mip with image_load/image_store if lod is 0. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63073 llvm-svn: 362957	2019-06-10 15:58:51 +00:00
Alexander Timofeev	37bd9bd137	[AMDGPU] Partial revert for the `ba447bae74` "Divergence driven ISel. Assign register class for cross block values according to the divergence." that discovered the design flaw leading to several issues that required to be solved before. This change reverts AMDGPU specific changes and keeps common part unaffected. llvm-svn: 362749	2019-06-06 21:13:02 +00:00
Matt Arsenault	34c8b835b1	AMDGPU: Don't fix emergency stack slot at offset 0 This forced the caller to be aware of this, which is an ugly ABI feature. Partially reverts r295877. The original reasons for doing this are mostly fixed. Alloca is now in a non-0 address space, so it should be OK to have 0 as a valid pointer. Since we treat the absolute address as the pointer value, this part only really needed to apply to kernels. Since r357093, we avoid the need to increment/decrement the offset register in more cases, and since r354816 the scavenger can fail without spilling, so it's less critical that we try to avoid an offset that fits in the MUBUF offset. Restrict to callable functions for now to split this into 2 steps to limit thte number of test updates and in case anything breaks. llvm-svn: 362665	2019-06-05 22:37:50 +00:00
Matt Arsenault	b812b7a45e	AMDGPU: Invert frame index offset interpretation Since the beginning, the offset of a frame index has been consistently interpreted backwards. It was treating it as an offset from the scratch wave offset register as a frame register. The correct interpretation is the offset from the SP on entry to the function, before the prolog. Frame index elimination then should select either SP or another register as an FP. Treat the scratch wave offset on kernel entry as the pre-incremented SP. Rely more heavily on the standard hasFP and frame pointer elimination logic, and clean up the private reservation code. This saves a copy in most callee functions. The kernel prolog emission code is still kind of a mess relying on checking the uses of physical registers, which I would prefer to eliminate. Currently selection directly emits MUBUF instructions, which require using a reference to some register. Use the register chosen for SP, and then ignore this later. This should probably be cleaned up to use pseudos that don't refer to any specific base register until frame index elimination. Add a workaround for shaders using large numbers of SGPRs. I'm not sure these cases were ever working correctly, since as far as I can tell the logic for figuring out which SGPR is the scratch wave offset doesn't match up with the shader input initialization in the shader programming guide. llvm-svn: 362661	2019-06-05 22:20:47 +00:00
Matt Arsenault	8dbeb9256c	TTI: Improve default costs for addrspacecast For some reason multiple places need to do this, and the variant the loop unroller and inliner use was not handling it. Also, introduce a new wrapper to be slightly more precise, since on AMDGPU some addrspacecasts are free, but not no-ops. llvm-svn: 362436	2019-06-03 18:41:34 +00:00
Aakanksha Patil	d5443f8c21	AMDGPU: Return address lowering The patch computes the return address for the current function. Differential revision: https://reviews.llvm.org/D59666 llvm-svn: 362001	2019-05-29 18:20:11 +00:00
Michael Liao	5fc1dfa784	[AMDGPU] Correct the handling of inlineasm output registers. Summary: - There's a regression due to the cross-block RC assignment. Use the proper way to derive the output register RC in inline asm. Reviewers: rampitec, alex-t Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, eraman, hiraditya, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D62537 llvm-svn: 361868	2019-05-28 19:37:09 +00:00
Alexander Timofeev	ba447bae74	[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 This commit was reverted because of the build failure. The reason was mlformed patch. Build failure fixed. llvm-svn: 361741	2019-05-26 20:33:26 +00:00
Peter Collingbourne	3b93737446	Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence." Broke sanitizer bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio llvm-svn: 361688	2019-05-25 01:52:38 +00:00
Alexander Timofeev	dffedea014	[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 llvm-svn: 361644	2019-05-24 15:32:18 +00:00
Matt Arsenault	5c714cbdd8	AMDGPU: Correct maximum possible private allocation size We were assuming a much larger possible per-wave visible stack allocation than is possible: `faa3ae5138/src/core/runtime/amd_gpu_agent.cpp (L70)` Based on this, we can assume the high 15 bits of a frame index or sret are 0. The frame index value is the per-lane offset, so the maximum frame index value is MAX_WAVE_SCRATCH / wavesize. Remove the corresponding subtarget feature and option that made this configurable. llvm-svn: 361541	2019-05-23 19:38:14 +00:00
Matt Arsenault	99e6f4d11a	AMDGPU: Introduce TokenFactor for ABI register copies in call sequence The call was missing chain dependencies on the pre-call copies. I don't think this was causing any real issues however. llvm-svn: 360906	2019-05-16 15:10:27 +00:00
Stanislav Mekhanoshin	28a1936f6d	[AMDGPU] gfx1010: use fmac instructions Differential Revision: https://reviews.llvm.org/D61527 llvm-svn: 359959	2019-05-04 04:20:37 +00:00
Stanislav Mekhanoshin	93f15c922f	[AMDGPU] gfx1010 loop alignment Differential Revision: https://reviews.llvm.org/D61529 llvm-svn: 359935	2019-05-03 21:17:29 +00:00
Sanjay Patel	284472be6d	[SelectionDAG] remove constant folding limitations based on FP exceptions We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops), so it does not make sense to have them here in the DAG either. Nothing else in the backend tries to preserve exceptions (again outside of strict ops), so I don't see how this could have ever worked for real code that cares about FP exceptions. There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least partially) to preserve exceptions without even asking if the target supports FP exceptions. Those should be corrected in subsequent patches. Real support for FP exceptions requires several changes to handle the constrained/strict FP ops. Differential Revision: https://reviews.llvm.org/D61331 llvm-svn: 359791	2019-05-02 14:47:59 +00:00
Stanislav Mekhanoshin	64399da8b8	[AMDGPU] gfx1010 lost VOP2 forms of some add/sub Add legalization of V_ADD_I32, V_SUB_I32, V_SUBREV_I32. Differential Revision: llvm-svn: 359757	2019-05-02 04:26:35 +00:00
Stanislav Mekhanoshin	692560dc98	[AMDGPU] gfx1010 MIMG implementation Differential Revision: https://reviews.llvm.org/D61339 llvm-svn: 359698	2019-05-01 16:32:58 +00:00
Stanislav Mekhanoshin	a224f68a10	[AMDGPU] gfx1010 DS implementation Differential Revision: https://reviews.llvm.org/D61332 llvm-svn: 359696	2019-05-01 16:11:11 +00:00
Sjoerd Meijer	180f1ae57c	[TargetLowering] Change getOptimalMemOpType to take a function attribute list The MachineFunction wasn't used in getOptimalMemOpType, but more importantly, this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType. This is the groundwork for the changes in D59766 and D59787, that allows implementation of TTI::getMemcpyCost. Differential Revision: https://reviews.llvm.org/D59785 llvm-svn: 359537	2019-04-30 08:38:12 +00:00
Matt Arsenault	055e4dce45	AMDGPU: Remove dx10-clamp from subtarget features Since this can be set with s_setreg*, it should not be a subtarget property. Set a default based on the calling convention, and Introduce a new amdgpu-dx10-clamp attribute to override this if desired. Also introduce a new amdgpu-ieee attribute to match. The values need to match to allow inlining. I think it is OK for the caller's dx10-clamp attribute to override the callee, but there doesn't appear to be the infrastructure to do this currently without definining the attribute in the generic Attributes.td. Eventually the calling convention lowering will need to insert a mode switch somewhere for these. llvm-svn: 357302	2019-03-29 19:14:54 +00:00
Tim Renouf	6f0191a55a	[AMDGPU] Use three- and five-dword result type in image ops Some image ops return three or five dwords. Previously, we modeled that with a 4 or 8 dword register class. The register allocator could cleverly spot that some subregs were dead and allocate something else there, but that caused the de-optimization that waitcnt insertion would think that the result was used immediately. This commit allows such an image op to have a result with a three or five dword result, avoiding the above de-optimization. Differential Revision: https://reviews.llvm.org/D58905 Change-Id: I3651211bbd7ed22721ee7b9fefd7bcc60a809d8b llvm-svn: 356757	2019-03-22 15:21:11 +00:00
Tim Renouf	677387d8dc	[AMDGPU] Implemented dwordx3 variants of buffer/tbuffer load/store intrinsics Now we have vec3 MVTs, this commit implements dwordx3 variants of the buffer intrinsics. On gfx6, a dwordx3 buffer load intrinsic is implemented as a dwordx4 instruction, and a dwordx3 buffer store intrinsic is not supported. We need to support the dwordx3 load intrinsic because it is generated by subtarget-unaware code in InstCombine. Differential Revision: https://reviews.llvm.org/D58904 Change-Id: I016729d8557b98a52f529638ae97c340a5922a4e llvm-svn: 356755	2019-03-22 14:58:02 +00:00
Tim Renouf	033f99a2e5	[AMDGPU] Added v5i32 and v5f32 register classes They are not used by anything yet, but a subsequent commit will start using them for image ops that return 5 dwords. Differential Revision: https://reviews.llvm.org/D58903 Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271 llvm-svn: 356735	2019-03-22 10:11:21 +00:00
Tim Renouf	361b5b2193	[AMDGPU] Support for v3i32/v3f32 Added support for dwordx3 for most load/store types, but not DS, and not intrinsics yet. SI (gfx6) does not have dwordx3 instructions, so they are not enabled there. Some of this patch is from Matt Arsenault, also of AMD. Differential Revision: https://reviews.llvm.org/D58902 Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9 llvm-svn: 356659	2019-03-21 12:01:21 +00:00
David Stuttard	fc2a747345	[AMDGPU] Allow MIMG with no uses in adjustWritemask in isel Summary: If an MIMG instruction has managed to get through to adjustWritemask in isel but has no uses (and doesn't enable TFC) then prevent an assertion by not attempting to adjust the writemask. The instruction will be removed anyway. Change-Id: I9a5dba6bafe1f35ac99c1b73df390936e2ac27a7 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58964 llvm-svn: 356540	2019-03-20 09:29:55 +00:00
Ryan Taylor	00e063ab92	[AMDGPU] Add buffer/load 8/16 bit overloaded intrinsics Summary: Add buffer store/load 8/16 overloaded intrinsics for buffer, raw_buffer and struct_buffer Change-Id: I166a29f071b2ff4e4683fb0392564b1f223ac61d Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59265 llvm-svn: 356465	2019-03-19 16:07:00 +00:00
Neil Henning	e85f6bd64f	[AMDGPU] Ban i8 min3 promotion. I found this really weird WWM-related case whereby through the WWM transformations our isel lowering was trying to promote 2 min's into a min3 for the i8 type, which our hardware doesn't support. The new min3_i8.ll test case would previously spew the error: PromoteIntegerResult #0: t69: i8 = SMIN3 t70, Constant:i8<0>, t68 Before the simple fix to our isel lowering to not do it for i8 MVT's. Differential Revision: https://reviews.llvm.org/D59543 llvm-svn: 356464	2019-03-19 15:50:24 +00:00
Tim Renouf	2e94f6e584	[AMDGPU] Asm/disasm v_cndmask_b32_e64 with abs/neg source modifiers This commit allows v_cndmask_b32_e64 with abs, neg source modifiers on src0, src1 to be assembled and disassembled. This does appear to be allowed, even though they are floating point modifiers and the operand type is b32. To do this, I added src0_modifiers and src1_modifiers to the MachineInstr, which involved fixing up several places in codegen and mir tests. Differential Revision: https://reviews.llvm.org/D59191 Change-Id: I69bf4a8c73ebc65744f6110bb8fc4e937d79fbea llvm-svn: 356398	2019-03-18 19:25:39 +00:00
Neil Henning	523dab0788	[AMDGPU] Add an experimental buffer fat pointer address space. Add an experimental buffer fat pointer address space that is currently unhandled in the backend. This commit reserves address space 7 as a non-integral pointer repsenting the 160-bit fat pointer (128-bit buffer descriptor + 32-bit offset) that is heavily used in graphics workloads using the AMDGPU backend. Differential Revision: https://reviews.llvm.org/D58957 llvm-svn: 356373	2019-03-18 14:44:28 +00:00
Matt Arsenault	bc6d07ca46	MIR: Allow targets to serialize MachineFunctionInfo This has been a very painful missing feature that has made producing reduced testcases difficult. In particular the various registers determined for stack access during function lowering were necessary to avoid undefined register errors in a large percentage of cases. Implement a subset of the important fields that need to be preserved for AMDGPU. Most of the changes are to support targets parsing register fields and properly reporting errors. The biggest sort-of bug remaining is for fields that can be initialized from the IR section will be overwritten by a default initialized machineFunctionInfo section. Another remaining bug is the machineFunctionInfo section is still printed even if empty. llvm-svn: 356215	2019-03-14 22:54:43 +00:00
Matt Arsenault	caf1316f71	IR: Add immarg attribute This indicates an intrinsic parameter is required to be a constant, and should not be replaced with a non-constant value. Add the attribute to all AMDGPU and generic intrinsics that comments indicate it should apply to. I scanned other target intrinsics, but I don't see any obvious comments indicating which arguments are intended to be only immediates. This breaks one questionable testcase for the autoupgrade. I'm unclear on whether the autoupgrade is supposed to really handle declarations which were never valid. The verifier fails because the attributes now refer to a parameter past the end of the argument list. llvm-svn: 355981	2019-03-12 21:02:54 +00:00
Matt Arsenault	26e76ef0e2	DAG: Don't try to cluster loads with tied inputs This avoids breaking possible value dependencies when sorting loads by offset. AMDGPU has some load instructions that write into the high or low bits of the destination register, and have a tied input for the other input bits. These can easily have the same base pointer, but be a swizzle so the high address load needs to come first. This was inserting glue forcing the opposite ordering, producing a cycle the InstrEmitter would assert on. It may be potentially expensive to look for the dependency between the other loads, so just skip any where this could happen. Fixes bug 40936 by reverting r351379, which added a hacky attempt to fix this by adding chains in this case, which I think was just working around broken glue before the InstrEmitter. The core of the patch is re-implementing the fix for that problem. llvm-svn: 355728	2019-03-08 20:46:15 +00:00
Dmitry Preobrazhensky	ef92035827	[AMDGPU][MC][GFX8+] Added syntactic sugar for 'vgpr index' operand of instructions s_set_gpr_idx_on and s_set_gpr_idx_mode See bug 39331: https://bugs.llvm.org/show_bug.cgi?id=39331 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D58288 llvm-svn: 354969	2019-02-27 13:12:12 +00:00
Stanislav Mekhanoshin	da1628eb67	[AMDGPU] Fixed hang during DAG combine SITargetLowering::reassociateScalarOps() does not touch constants so that DAGCombiner::ReassociateOps() does not revert the combine. However a global address is not a ConstantSDNode. Switched to the method used by DAGCombiner::ReassociateOps() itself to detect constants. Differential Revision: https://reviews.llvm.org/D58695 llvm-svn: 354926	2019-02-26 20:56:25 +00:00
Matt Arsenault	aa6fb4c45e	AMDGPU: Remove debugger related subtarget features As far as I know these aren't needed anymore. llvm-svn: 354634	2019-02-21 23:27:46 +00:00
Stanislav Mekhanoshin	42e229e130	[AMDGPU] fix commuted case of sub combine Differential Revision: https://reviews.llvm.org/D58481 llvm-svn: 354543	2019-02-21 02:58:00 +00:00
Stanislav Mekhanoshin	871821f786	[AMDGPU] Ressociate 'add (add x, y), z' to use SALU Reassociate adds to collect scalar operands in a single instruction when possible. That will result in a scalar add followed by vector instead of two vector adds, thus better utilizing SALU. Differential Revision: https://reviews.llvm.org/D58220 llvm-svn: 354066	2019-02-14 22:11:25 +00:00
Stanislav Mekhanoshin	0e858b028d	[AMDGPU] Split dot-insts feature Differential Revision: https://reviews.llvm.org/D57971 llvm-svn: 353587	2019-02-09 00:34:21 +00:00
Craig Topper	784929d045	Implementation of asm-goto support in LLVM This patch accompanies the RFC posted here: http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html This patch adds a new CallBr IR instruction to support asm-goto inline assembly like gcc as used by the linux kernel. This instruction is both a call instruction and a terminator instruction with multiple successors. Only inline assembly usage is supported today. This also adds a new INLINEASM_BR opcode to SelectionDAG and MachineIR to represent an INLINEASM block that is also considered a terminator instruction. There will likely be more bug fixes and optimizations to follow this, but we felt it had reached a point where we would like to switch to an incremental development model. Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii Differential Revision: https://reviews.llvm.org/D53765 llvm-svn: 353563	2019-02-08 20:48:56 +00:00
Matt Arsenault	a8b4339c2f	AMDGPU/GlobalISel: Legalize addrspacecast Use a placeholder constant for now on targets that need the load from the queue ptr. llvm-svn: 353497	2019-02-08 02:40:47 +00:00
Scott Linder	e2c5847414	[AMDGPU] Consider XOR in waterfall loop as a terminator Ensure the XOR in the waterfall loop for indirect addressing is considered a terminator. Differential Revision: https://reviews.llvm.org/D57703 llvm-svn: 353207	2019-02-05 19:50:32 +00:00
Scott Linder	d19d197221	[AMDGPU] Support emitting GOT relocations for function calls Differential Revision: https://reviews.llvm.org/D57416 llvm-svn: 353083	2019-02-04 20:00:07 +00:00
Tim Corringham	fa3e4e5b53	[AMDGPU] Fix for vector element insertion Summary: Incorrect code was generated when lowering insertelement operations for vectors with 8 or 16 bit elements. The value being inserted was not adjusted for the position of the element within the 32 bit word and so only the low element within each 32 bit word could receive the intended value. Fixed by simply replicating the value to each element of a congruent vector before the mask and or operation used to update the intended element. A number of affected LIT tests have been updated appropriately. before the mask & or into the intended Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: llvm-commits, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Tags: #llvm Differential Revision: https://reviews.llvm.org/D57588 llvm-svn: 352885	2019-02-01 16:51:09 +00:00
Matt Arsenault	cdd191d9db	AMDGPU: Add DS append/consume intrinsics Since these pass the pointer in m0 unlike other DS instructions, these need to worry about whether the address is uniform or not. This assumes the address is dynamically uniform, and just uses readfirstlane to get a copy into an SGPR. I don't know if these have the same 16-bit add for the addressing mode offset problem on SI or not, but I've just assumed they do. Also includes some misc. changes to avoid test differences between the LDS and GDS versions. llvm-svn: 352422	2019-01-28 20:14:49 +00:00
Tim Corringham	824ca3f3dd	[AMDGPU] Add intrinsics for 16 bit interpolation Summary: Added the intrinsics llvm.amdgcn.interp.p1.f16() and llvm.amdgcn.interp.p2.f16() and related LIT test. The p1 intrinsic generates code appropriate for both 16 and 32 bank LDS. Reviewers: #amdgpu, dstuttard, arsenm, tpr Reviewed By: #amdgpu, arsenm Subscribers: jvesely, mgorny, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46754 llvm-svn: 352357	2019-01-28 13:48:59 +00:00
Matt Arsenault	a5840c3c39	Codegen support for atomicrmw fadd/fsub llvm-svn: 351851	2019-01-22 18:36:06 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Matt Arsenault	85af701e85	AMDGPU: Remove llvm.SI.load.const It's taken 3 years, but now all of the old AMDGPU and SI intrinsics are finally gone llvm-svn: 351586	2019-01-18 20:27:02 +00:00
Changpeng Fang	fe9269f804	AMDGPU: Adjust the chain for loads writing to the HI part of a register. Summary: For these loads that write to the HI part of a register, we should chain them to the op that writes to the LO part of the register to maintain the appropriate order. Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D56454 llvm-svn: 351379	2019-01-16 21:32:53 +00:00
Marek Olsak	c5cec5e1fa	AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52944 llvm-svn: 351351	2019-01-16 15:43:53 +00:00
Marek Olsak	33eb4d947d	AMDGPU: Add a fast path for icmp.i1(src, false, NE) Summary: This allows moving the condition from the intrinsic to the standard ICmp opcode, so that LLVM can do simplifications on it. The icmp.i1 intrinsic is an identity for retrieving the SGPR mask. And we can also get the mask from and i1, or i1, xor i1. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52060 llvm-svn: 351150	2019-01-15 02:13:18 +00:00
David Stuttard	f77079f892	[AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 This re-submit of the change also includes a slight modification in SIISelLowering.cpp to work-around a compiler bug for the powerpc_le platform that caused a buildbot failure on a previous submission. Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda Work around for ppcle compiler bug Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b llvm-svn: 351054	2019-01-14 11:55:24 +00:00
Stanislav Mekhanoshin	d3757d3f3a	[AMDGPU] Separate feature dot-insts Differential Revision: https://reviews.llvm.org/D56524 llvm-svn: 350793	2019-01-10 03:25:20 +00:00
Stanislav Mekhanoshin	ed0d6c60af	Remove check for single use in ShrinkDemandedConstant This removes check for single use from general ShrinkDemandedConstant to the BE because of the AArch64 regression after D56289/rL350475. After several hours of experiments I did not come up with a testcase failing on any other targets if check is not performed. Moreover, direct call to ShrinkDemandedConstant is not really needed and superceed by SimplifyDemandedBits. Differential Revision: https://reviews.llvm.org/D56406 llvm-svn: 350684	2019-01-09 02:24:22 +00:00
Piotr Sobczak	378131bae0	[AMDGPU] Handle OR as operand of raw load/store Summary: Use isBaseWithConstantOffset() which handles OR as an operand to llvm.amdgcn.raw.buffer.load and llvm.amdgcn.raw.buffer.store. Change-Id: Ifefb9dc5ded8710d333df07ab1900b230e33539a Reviewers: nhaehnle, mareko, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55999 llvm-svn: 350208	2019-01-02 09:47:41 +00:00
Simon Pilgrim	44dfd81d01	Fix unused variable warning. NFCI. llvm-svn: 348649	2018-12-07 21:44:25 +00:00
Matt Arsenault	ce2e053134	AMDGPU: Allow f32 types for llvm.amdgcn.s.buffer.load llvm-svn: 348625	2018-12-07 18:41:39 +00:00
Matt Arsenault	ca8eb0b672	AMDGPU: Remove llvm.SI.tbuffer.store llvm-svn: 348619	2018-12-07 18:03:47 +00:00
Matt Arsenault	aa9bcd56b1	AMDGPU: Remove llvm.AMDGPU.kill This is the last of the old AMDGPU intrinsics. llvm-svn: 348615	2018-12-07 17:46:16 +00:00
Nicolai Haehnle	a7b00058e0	AMDGPU: Divergence-driven selection of scalar buffer load intrinsics Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis. If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane. There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads. Change-Id: I170e6816323beb1348677b358c9d380865cd1a19 Reviewers: arsenm, alex-t, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53283 llvm-svn: 348050	2018-11-30 22:55:38 +00:00
Nicolai Haehnle	a9cc92c247	AMDGPU: Fix various issues around the VirtReg2Value mapping Summary: The VirtReg2Value mapping is crucial for getting consistently reliable divergence information into the SelectionDAG. This patch fixes a bunch of issues that lead to incorrect divergence info and introduces tight assertions to ensure we don't regress: 1. VirtReg2Value is generated lazily; there were some cases where a lookup was performed before all relevant virtual registers were created, leading to an out-of-sync mapping. Those cases were: - Complex code to lower formal arguments that generated CopyFromReg nodes from live-in registers (fixed by never querying the mapping for live-in registers). - Code that generates CopyToReg for formal arguments that are used outside the entry basic block (fixed by never querying the mapping for Register nodes, which don't need the divergence info anyway). 2. For complex values that are lowered to a sequence of registers, all registers must be reflected in the VirtReg2Value mapping. I am not adding any new tests, since I'm not actually aware of any bugs that these problems are causing with trunk as-is. However, I recently added a test case (in r346423) which fails when D53283 is applied without this change. Also, the new assertions should provide most of the effective test coverage. There is one test change in sdwa-peephole.ll. The underlying issue is that since the divergence info is now correct, the DAGISel will select V_OR_B32 directly instead of S_OR_B32. This leads to an extra COPY which affects the behavior of MachineLICM in a way that ends up with the S_MOV_B32 with the constant in a different basic block than the V_OR_B32, which is presumably what defeats the peephole. Reviewers: alex-t, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D54340 llvm-svn: 348049	2018-11-30 22:55:29 +00:00
David Stuttard	c6603861d8	Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic" Also revert fix r347876 One of the buildbots was reporting a failure in some relevant tests that I can't repro or explain at present, so reverting until I can isolate. llvm-svn: 347911	2018-11-29 20:14:17 +00:00
David Stuttard	535c1af0bf	Fix: Add support for TFE/LWE in image intrinsic My change svn-id: 347871 caused a buildbot failure due to an unused variable def (used in an assert). Change-Id: Ia882d18bb6fa79b4d7bbfda422b9ea5d23eab336 llvm-svn: 347876	2018-11-29 15:56:36 +00:00
David Stuttard	de02e4b1cc	Add support for TFE/LWE in image intrinsics TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda llvm-svn: 347871	2018-11-29 15:21:13 +00:00
Stanislav Mekhanoshin	443a7f9788	[AMDGPU] Disable DAG combine at -O0 Differential Revision: https://reviews.llvm.org/D54358 llvm-svn: 347659	2018-11-27 15:13:37 +00:00
Fangrui Song	d83a5526d5	[AMDGPU] Fix -Wunused-variable llvm-svn: 347234	2018-11-19 17:54:27 +00:00
Stanislav Mekhanoshin	054f8101f1	[AMDGPU] Convert insert_vector_elt into set of selects This allows to avoid scratch use or indirect VGPR addressing for small vectors. Differential Revision: https://reviews.llvm.org/D54606 llvm-svn: 347231	2018-11-19 17:39:20 +00:00
Stanislav Mekhanoshin	bcb34ac2ea	[AMDGPU] combine extractelement into several selects An extractelement with non-constant index will be lowered either to scratch or movrel loop in most cases. This patch converts such instruction into a set of selects if vector size is not too big. Differential Revision: https://reviews.llvm.org/D54351 llvm-svn: 346800	2018-11-13 21:18:21 +00:00
Nicolai Haehnle	bc233f5523	Revert "AMDGPU: Divergence-driven selection of scalar buffer load intrinsics" This reverts commit r344696 for now (except for some test additions). See https://bugs.freedesktop.org/show_bug.cgi?id=108611. llvm-svn: 346364	2018-11-07 21:53:43 +00:00
Craig Topper	0b5f8169b0	[TargetLowering] Change TargetLoweringBase::getPreferredVectorAction to take an MVT instead of an EVT. NFC The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit. llvm-svn: 346180	2018-11-05 23:26:13 +00:00
Sylvestre Ledru	df92dabaef	Fixed inclusion of M_PI fow MinGW-w64 Patch by KOLANICH llvm-svn: 346000	2018-11-02 17:25:40 +00:00
Neil Henning	7d1b77df57	[AMDGPU] UBSan bug fix for r345710 UBSan detected an error in our ISelLowering that is exposed only when you have a dmask == 0x1. Fix this by adding in an explicit check to ensure we don't do the UBSan detected shl << 32. llvm-svn: 345962	2018-11-02 10:24:57 +00:00
Reid Kleckner	4dc0b1ac60	Fix clang -Wimplicit-fallthrough warnings across llvm, NFC This patch should not introduce any behavior changes. It consists of mostly one of two changes: 1. Replacing fall through comments with the LLVM_FALLTHROUGH macro 2. Inserting 'break' before falling through into a case block consisting of only 'break'. We were already using this warning with GCC, but its warning behaves slightly differently. In this patch, the following differences are relevant: 1. GCC recognizes comments that say "fall through" as annotations, clang doesn't 2. GCC doesn't warn on "case N: foo(); default: break;", clang does 3. GCC doesn't warn when the case contains a switch, but falls through the outer case. I will enable the warning separately in a follow-up patch so that it can be cleanly reverted if necessary. Reviewers: alexfh, rsmith, lattner, rtrieu, EricWF, bollu Differential Revision: https://reviews.llvm.org/D53950 llvm-svn: 345882	2018-11-01 19:54:45 +00:00
Neil Henning	63718b214a	[AMDGPU] support image load/store a16 Our a16 support was only enabled for sample/gather and buffer load/store, but not for image load/store operations (which take an i16 as the pixel index rather than a half). Fix our isel lowering and add test cases to prove it out. Differential Revision: https://reviews.llvm.org/D53750 llvm-svn: 345710	2018-10-31 10:34:48 +00:00
Matt Arsenault	abc4f29f9c	AMDGPU: Remove custom BUILD_VECTOR combine This was looping in a testcase and removing it now slightly improves a test. llvm-svn: 345560	2018-10-30 01:37:59 +00:00
Matt Arsenault	687ec75d10	DAG: Change behavior of fminnum/fmaxnum nodes Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs. There are some regressions from inserting unnecessary canonicalizes when these are matched from fast math fcmp + select which should be fixed in a future commit. llvm-svn: 344914	2018-10-22 16:27:27 +00:00
Nicolai Haehnle	4821937d2e	AMDGPU: Avoid selecting ds_{read,write}2_b32 on SI Summary: To workaround a hardware issue in the (base + offset) calculation when base is negative. The impact on code quality should be limited since SILoadStoreOptimizer still runs afterwards and is able to combine loads/stores based on known sign information. This fixes visible corruption in Hitman on SI (easily reproducible by running benchmark mode). Change-Id: Ia178d207a5e2ac38ae7cd98b532ea2ae74704e5f Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99923 Reviewers: arsenm, mareko Subscribers: jholewinski, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53160 llvm-svn: 344698	2018-10-17 15:37:48 +00:00
Nicolai Haehnle	c4a2ff0950	AMDGPU: Divergence-driven selection of scalar buffer load intrinsics Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis. If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane. There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads. Change-Id: I170e6816323beb1348677b358c9d380865cd1a19 Reviewers: arsenm, alex-t, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53283 llvm-svn: 344696	2018-10-17 15:37:30 +00:00
Konstantin Zhuravlyov	aa067cb9fb	AMDGPU: Rename isAmdCodeObjectV2 -> isAmdHsaOrMesa The isAmdCodeObjectV2 is a misleading name which actually checks whether the os is amdhsa or mesa. Also add a test to make sure we do not generate old kernel header for code object v3. Differential Revision: https://reviews.llvm.org/D52897 llvm-svn: 343813	2018-10-04 21:02:16 +00:00
Tim Renouf	a37679d67b	[AMDGPU] Fix for negative offsets in buffer/tbuffer intrinsics Summary: The new buffer/tbuffer intrinsics handle an out-of-range immediate offset by moving/adding offset&-4096 to a vgpr, leaving an in-range immediate offset, with a chance of the move/add being CSEd for similar loads/stores. However it turns out that a negative offset in a vgpr is illegal, even if adding the immediate offset makes it legal again. Therefore, this commit disables the offset&-4096 thing if the offset is negative. Differential Revision: https://reviews.llvm.org/D52683 Change-Id: Ie02f0a74f240a138dc2a29d17cfbd9e350e4ed13 llvm-svn: 343672	2018-10-03 10:29:43 +00:00
Matt Arsenault	f432011d33	AMDGPU: Fix private handling for allowsMisalignedMemoryAccesses If the alignment is at least 4, this should report true. Something still seems off with how < 4-byte types are handled here though. Fixing this seems to change how some combines get to where they get, but somehow isn't changing the net result. llvm-svn: 342879	2018-09-24 13:18:15 +00:00
Matt Arsenault	ebf46143ea	AMDGPU: Don't form fmed3 if it will require materialization If there is a single use constant, it can be folded into the min/max, but not into med3. llvm-svn: 342443	2018-09-18 02:34:54 +00:00
David Stuttard	20de3e99b5	[AMDGPU] Ensure trig range reduction only used for subtargets that require it Summary: GFX9 and above support sin/cos instructions with a greater range and thus don't require a fract instruction prior to invocation. Added a subtarget feature to reflect this and added code to take advantage of expanded range on GFX9+ Also updated the tests to check correct behaviour Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51933 Change-Id: I1c1f1d3726a5ae32116646ca5cfa1ab4ef69e5b0 llvm-svn: 342222	2018-09-14 10:27:19 +00:00

... 3 4 5 6 7 ...

967 Commits