llvm-project

Commit Graph

Author	SHA1	Message	Date
Fangrui Song	a55daa1461	[MC] De-capitalize some MCStreamer::Emit* functions	2020-02-14 19:11:53 -08:00
Matt Arsenault	65dbdc329f	AMDGPU: Don't preserve analyses with div64 IR expansion The dominator tree needs to be updated, but that isn't handled now.	2020-02-14 20:06:02 -05:00
Matt Arsenault	dc3e499dd4	AMDGPU/GlobalISel: Fix G_EXTRACT of 96-bit results This would assert on an unhandled size in getRegSplitParts.	2020-02-14 15:57:40 -08:00
Matt Arsenault	60fea2713d	AMDGPU/GlobalISel: Improve 16-bit bswap Match the new DAG behavior and use v_perm_b32 when available. Also does better on SI/CI by expanding 16-bit swaps. Also fix non-power-of-2 cases.	2020-02-14 15:57:39 -08:00
Stanislav Mekhanoshin	922197d664	[TBLGEN] Allow to override RC weight Differential Revision: https://reviews.llvm.org/D74509	2020-02-14 15:49:52 -08:00
Austin Kerbow	07824e65bf	[AMDGPU] Always enable XNACK feature when support is explicitly requested Differential Revision: https://reviews.llvm.org/D74630	2020-02-14 11:58:58 -08:00
Matt Arsenault	9ec668606b	AMDGPU: Add option to disable CGP division expansion The division expansions in AMDGPUCodeGenPrepare can't be relied on for correctness, since they punt to later optimization and possibly legalization in some cases. We still need a way to be able to write tests for the legalizer versions of the expansion. This is mostly for GlobalISel, since the optimizations it expects aren't implemented. The interaction with the flag to expand 64-bit division in the IR is pretty confusing, but these flags have different purposes.	2020-02-14 11:37:07 -08:00
Matt Arsenault	34d9a16e54	AMDGPU: Add option to expand 64-bit integer division in IR I didn't realize we were already expanding 24/32-bit division here. Use the available IntegerDivision utilities. This uses loops, so produces significantly smaller code than the inline DAG expansion. This now requires width reductions of 64-bit divisions before introducing the expanded loops. This helps work around missing legalization in GlobalISel for division, which are the only remaining core instructions that didn't work at all. I think this is plausibly a better implementation than exists in the DAG, although turning it on by default misses out on the constant value optimizations and also needs benchmarking.	2020-02-14 11:16:08 -08:00
Matt Arsenault	bfbfa18591	GlobalISel: Lower s64->s16 G_FPTRUNC This is more or less directly ported from the AMDGPU custom lowering for FP_TO_FP16. I made a few minor fixups (using G_UNMERGE_VALUES instead of creating shift/trunc to extract the two halves, and zexting an inverted compare instead of select_cc). This also does not include the fast math expansion in the DAG which converts to f32 and then to f16. I think that belongs in a pre-legalize combine instead.	2020-02-14 10:46:58 -08:00
Matt Arsenault	8c2c0b3637	AMDGPU: Improve i16/v2i16 bswap	2020-02-14 09:53:22 -08:00
Matt Arsenault	a257bde420	AMDGPU/GlobalISel: Handle G_BSWAP	2020-02-14 09:09:44 -08:00
Fangrui Song	bcd24b2d43	[AsmPrinter][MCStreamer] De-capitalize EmitInstruction and EmitCFI*	2020-02-13 22:08:55 -08:00
Fangrui Song	1d49eb00d9	[AsmPrinter] De-capitalize all AsmPrinter::Emit* but EmitInstruction Similar to rL328848.	2020-02-13 17:06:24 -08:00
Fangrui Song	0bc77a0f0d	[AsmPrinter] De-capitalize some AsmPrinter::Emit* functions Similar to rL328848.	2020-02-13 13:38:33 -08:00
Fangrui Song	0dce409cee	[AsmPrinter] De-capitalize Emit{Function,BasicBlock}* and Emit{Start,End}OfAsmFile	2020-02-13 13:22:49 -08:00
Matt Arsenault	5adbf7d57f	AMDGPU/GlobalISel: Make G_TRUNC legal This is required to be legal. I'm not sure how we were getting away without defining any rules for it.	2020-02-13 15:25:52 -05:00
Matt Arsenault	bfe3779459	AMDGPU: Use v_perm_b32 to implement bswap Also greatly improve i64 lowering. LegalizeIntegerTypes does the correct narrowing if i64 isn't legal. Just workaround this for SelectionDAG by making i64 legal and splitting in the patterns.	2020-02-13 09:45:31 -08:00
Austin Kerbow	5db0b2521c	[AMDGPU][GlobalISel] Handle 64-byte EltSize in getRegSplitParts Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74518	2020-02-12 19:11:52 -08:00
Matt Arsenault	d1b393d92c	AMDGPU/GlobalISel: Select G_CTTZ_ZERO_UNDEF Directly select this rather than going through the intermediate instruction, which may provide some combine value in the future.	2020-02-12 16:19:46 -08:00
Matt Arsenault	045a8921d7	AMDGPU/GlobalISel: Select G_CTLZ_ZERO_UNDEF Directly select this rather than going through the intermediate instruction, which may provide some combine value in the future.	2020-02-12 16:19:45 -08:00
Matt Arsenault	e174c278ca	AMDGPU/GlobalISel: Fix mapping G_ICMP with constrained result When SI_IF is inserted, it constrains the source register with a register class, which was quite likely a G_ICMP. This was incorrectly treating it as a scalar, and then applyMappingImpl would end up producing invalid MIR since this was unexpected. Also fix not using all VGPR sources for vcc outputs.	2020-02-12 16:19:45 -08:00
Matt Arsenault	fa61e200e5	AMDGPU/GlobalISel: Widen non-power-of-2 load results Load extra bits if suitably aligned. This allows using widened 3-vector loads on SI, and fixes legalization for <9 x s32> (which LSV apparently forms frequently on lowered kernel argument lists). Fix incorrectly treating these as legal on SI. This should emit a 64-bit store and a 32-bit store. I think all of the load and store rules are just about complete, but due for a rewrite.	2020-02-12 09:35:10 -05:00
Hans Wennborg	a19de32095	Fix unused function warning (PR44808)	2020-02-12 15:12:48 +01:00
Simon Pilgrim	9eb426c88c	[TargetLowering] Add NegatibleCost enum for isNegatibleForFree return codes The isNegatibleForFree/getNegatedExpression methods currently rely on a raw char value to indicate whether a negation is beneficial or not. This patch replaces the char return value with a NegatibleCost enum to more clearly demonstrate what is implied. It also renames isNegatibleForFree to getNegatibleCost to more accurately reflect what's going on. Differential Revision: https://reviews.llvm.org/D74221	2020-02-12 11:51:42 +00:00
Jay Foad	e9900b1fbf	[AMDGPU] Add one more pass to LLVMInitializeAMDGPUTarget	2020-02-12 11:19:14 +00:00
Nicolai Hähnle	ab2f610f38	AMDGPU: llvm.amdgcn.writelane is a source of divergence Summary: Consider: %r = call i32 @llvm.amdgcn.writelane(i32 0, i32 1, i32 2) This produces a value that is 0 on lane 1, and 2 everywhere else; i.e., it is divergent. Reported-by: Marek Olsak <Marek.Olsak@amd.com> Reviewers: arsenm, foad, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74400	2020-02-12 09:12:56 +01:00
Matt Arsenault	6d4ebada79	AMDGPU: Use conditions directly in division expansion This was creating a select on true/false values, and then comparing that later. This produced more work for later combines, which can be avoided by just using the boolean values. This was copied from the original DAG expansion, which also has the same problem. This doesn't have an observable change using SelectionDAG, but since GlobalISel is missing these optimizations, the final code was noticeably longer.	2020-02-11 23:11:30 -05:00
Austin Kerbow	3a312c3ee5	[AMDGPU][GlobalISel] Refactor selectDS1Addr1Offset/selectDS64Bit4ByteAligned Differential Revision: https://reviews.llvm.org/D74261	2020-02-11 16:57:13 -08:00
Matt Arsenault	b30e122333	AMDGPU: Don't expand more special div cases in IR These have nicer expansions implemented in the DAG. Ideally we would either directly implement all of these special expansions, or stop expanding division in the IR.	2020-02-11 19:01:06 -05:00
Matt Arsenault	86f9117d47	AMDGPU: Don't report 2-byte alignment as fast This is apparently worse than 1-byte alignment. This does not attempt to decompose 2-byte aligned wide stores, but will stop trying to produce them. Also fix bug in LoadStoreVectorizer which was decreasing the alignment and vectorizing stack accesses. It was assuming a stack object was an alloca that could have its base alignment changed, which is not true if the pointer is derived from a function argument.	2020-02-11 18:35:00 -05:00
Matt Arsenault	f734ce0488	AMDGPU: Fix crash on v3i15 kernel arguments This was split into 3 i15 arguments. The i15 piece needs to be rounded to a simple MVT for the memory type.	2020-02-11 18:11:39 -05:00
Matt Arsenault	92c62582fc	AMDGPU: Directly use rcp intrinsic in idiv expansions Since natural fdiv lowering is now more conservative even with denormals disabled, we get a slower expansion from just a plain 1.0/fdiv. Directly emit the rcp intrinsic when using it to implement integer division to avoid a pointlessly complex sequence.	2020-02-11 18:11:39 -05:00
Matt Arsenault	b87e3e2d0d	AMDGPU: Don't create potentially dead rcp declarations This will introduce unused declarations if this doesn't reach any of the paths that will really use it.	2020-02-11 18:11:39 -05:00
Jay Foad	9df0c264d4	[AMDGPU] Fix implicit operands for ENTER_WWM pseudo Summary: SIInstrInfo::expandPostRAPseudo converts ENTER_WWM in-place into an S_OR_SAVEEXEC instruction that needs certain implicit operands. Without this patch I get errors like this that make it harder to use -stop-after to bisect the pass pipeline: $ llc -march=amdgcn test/CodeGen/AMDGPU/wqm.ll -stop-after=postrapseudos -o - | sed -E 's/ (from|into) custom "TargetCustom[0-9]+"//' | llc -march=amdgcn -x=mir error: <stdin>:1295:70: missing implicit register operand 'implicit-def $scc' renamable $sgpr2_sgpr3 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec ^ Note that this error is currently only generated by MIParser but it comes with a FIXME comment: // FIXME: Move the implicit operand verification to the machine verifier. Reviewers: critson, arsenm, rampitec, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74428	2020-02-11 20:11:41 +00:00
Stanislav Mekhanoshin	453a8f3af7	[AMDGPU] Remove AMDGPURegisterInfo R600 and GCN do not have anything in common in terms of register file organization anymore. Differential Revision: https://reviews.llvm.org/D74426	2020-02-11 11:13:38 -08:00
Eric Astor	8d5bf0422b	[ms] [llvm-ml] Add support for attempted register parsing Summary: Add a new method (tryParseRegister) that attempts to parse a register specification. MASM allows the use of IFDEF <register>, as well as IFDEF <symbol>. To accommodate this, we make it possible to check whether a register specification can be parsed at the current location, without failing the entire parse if it can't. Reviewers: thakis Reviewed By: thakis Tags: #llvm Differential Revision: https://reviews.llvm.org/D73486	2020-02-11 10:45:33 -05:00
Jay Foad	b06a13f541	[AMDGPU] Fix non-deterministic iteration order Summary: As far as I know this did not affect code generation, but it did affect the order of -debug-only=si-wqm output and the naming of autonamed values in -print-after=si-wqm output. Reviewers: arsenm, rampitec, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, mgrang, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74317	2020-02-11 09:19:30 +00:00
diggerlin	09d26b79d2	[NFC] Refactor the tuple of symbol information with structure for llvm-objdump SUMMARY: refactor the std::tuple<uint64_t, StringRef, uint8_t> into a struct Reviewers: daltenty Subscribers: wuzish, nemanjai, hiraditya Differential Revision: https://reviews.llvm.org/D74240	2020-02-10 19:23:01 -05:00
Matt Arsenault	7af7b96a9b	AMDGPU: Move R600 test compatibility hack Instead of handling the r600 intrinsics on amdgcn, handle the amdgcn intrinsics on r600.	2020-02-10 10:02:06 -08:00
Stanislav Mekhanoshin	ed3527c648	[AMDGPU] Split R600 and GCN subregs These are generated and do not need to have the same values. We are defining separate subregs for R600 and GCN but then using AMDGPU subregs on R600. Differential Revision: https://reviews.llvm.org/D74248	2020-02-10 08:29:56 -08:00
Sebastian Neubauer	8756869170	[AMDGPU] Add a16 feature to gfx10 Based on D72931 This adds a new feature called A16 which is enabled for gfx10. gfx9 keeps the R128A16 feature so it can share all the instruction encodings with gfx7/8. Differential Revision: https://reviews.llvm.org/D73956	2020-02-10 09:04:23 +01:00
Matt Arsenault	312a9d1b83	GlobalISel: Fix narrowScalar for G_{CTLZ|CTTZ}_ZERO_UNDEF Narrow these for 64-bit VALU for AMDGPU.	2020-02-09 19:02:38 -05:00
Matt Arsenault	c437f6c687	AMDGPU/GlobalISel: Split 64-bit G_CTPOP in RegBankSelect	2020-02-09 18:39:33 -05:00
Matt Arsenault	2126c70e3a	AMDGPU/GlobalISel: Don't mis-select vector index on a constant Vector indexing with a constant index should be folded out in the legalizer, but this was accidentally falling through. This would produce the indexing operation with $noreg. Handle this case as a dynamic index just in case a bug like this happens again in the future.	2020-02-09 18:02:37 -05:00
Matt Arsenault	f4a38c114e	AMDGPU/GlobalISel: Look through casts when legalizing vector indexing We were failing to find constants that were casted. I feel like the artifact combiner should have folded the constant in the trunc before the custom lowering, but that doesn't happen.	2020-02-09 18:02:10 -05:00
Matt Arsenault	00115d767f	AMDGPU: Remove dead kill handling At one point a custom node was used for kill handling, but now the intrinsic is directly selected. Remove leftover pattern machinery.	2020-02-09 17:59:24 -05:00
Matt Arsenault	6e1770821f	AMDGPU: Fix SI_IF lowering when the save exec reg has terminator uses Reverts part of `6524a7a2b9`. Since that commit, the expansion was ignoring the actual save exec register produced by the instruction, and looking at other instructions. I do not understand why it was looking at other instructions, but relying on this scan was wrong. Fixes verifier errors after SI_IF is tail duplicated, which should be correct to do. The results were fed into a phi, which was lowered to the S_MOV_B64_term instructions.	2020-02-09 17:59:19 -05:00
Fangrui Song	ee3f13b81d	Fix -Wunused-lambda-capture for -DLLVM_ENABLE_ASSERTIONS=off builds after `6556c615f3`	2020-02-08 19:03:58 -08:00
Huihui Zhang	6556c615f3	Reland "[AMDGPU] Fix data race on RegisterBank initialization."	2020-02-07 14:18:48 -08:00
Changpeng Fang	884acbb9e1	AMDGPU: Enhancement on FDIV lowering in AMDGPUCodeGenPrepare Summary: The accuracy limit to use rcp is adjusted to 1.0 ulp from 2.5 ulp. Also, afn instead of arcp is used to allow inaccurate rcp to be used. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D73588	2020-02-07 11:46:23 -08:00
Changpeng Fang	6370c7c13e	AMDGPU: Limit the search in finding the instruction pattern for v_swap generation. Summary: Current implementation of matchSwap in SIShrinkInstructions searches the entire use_nodbg_operands set to find the possible pattern to generate v_swap instruction. This approach will lead to O(N^3) compile time for SIShrinkInstructions. But in reality, the matching pattern only exists within nearby instructions in the same basic block. This work limits the search to a maximum of 16 instructions, and has a linear compile time consumption. Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D74180	2020-02-07 11:06:33 -08:00
Petar Avramovic	7df5fc9e03	[GlobalISel] Add buildMerge with SrcOp initializer list Allows more flexible use of buildMerge in places where use operands are available as SrcOp since it does not require explicit conversion to Register. Simplify code with new buildMerge. Differential Revision: https://reviews.llvm.org/D74223	2020-02-07 18:43:45 +01:00
Matt Arsenault	2f885cbe90	AMDGPU/GlobalISel: Fix move s.buffer.load to VALU We were executing this in a waterfall loop as a placeholder, but this should really be converted to a MUBUF load. Also execute in a waterfall loop if the resource isn't an SGPR. This is a case where the DAG handling was wrong because doing the right thing was too hard. Currently, this will mishandle 96-bit loads. There's currently no way to track the original memory size with an MMO, so these loads will be widened and the resulting memory size will be 128-bits.	2020-02-07 07:19:01 -08:00
Matt Arsenault	8de2dad9e0	GlobalISel: Fix lowering of G_CTLZ/G_CTTZ The type passed to lower was invalid, so I'm not sure how this was even working before. The source and destination type also do not have to match, so make sure to use the right ones.	2020-02-07 06:54:12 -08:00
Guillaume Chatelet	f85d3408e6	[NFC] Introduce an API for MemOp Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code. Reviewers: courbet Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73964	2020-02-07 11:32:27 +01:00
Matt Arsenault	6a570dc548	AMDGPU/GlobalISel: Fix non-pow-2 add/sub/mul for 16-bit insts These wouldn't legalize between 16-bits and 32-bits on targets with 16-bit instructions.	2020-02-06 21:43:54 -05:00
Stanislav Mekhanoshin	cacc3b7a55	[AMDGPU] Cleanup assumptions about generated subregs We are using countPopulation on a LaneBitmask to determine a number of registers it covers. This is the assumption which does not necessarily need to be true. It is not changed but factored into a single call SIRegisterInfo::getNumCoveredRegs(). Some other places are cleaned up with respect to assumptions about subreg indexes values and tablegen behavior. Differential Revision: https://reviews.llvm.org/D74177	2020-02-06 17:39:24 -08:00
Stanislav Mekhanoshin	2863c26968	Revert "AMDGPU: Limit the search in finding the instruction pattern for v_swap generation." This reverts commit `9827806481`.	2020-02-06 17:38:55 -08:00
Changpeng Fang	9827806481	AMDGPU: Limit the search in finding the instruction pattern for v_swap generation. Summary: Current implementation of matchSwap in SIShrinkInstructions searches the entire use_nodbg_operands set to find the possible pattern to generate v_swap instruction. This approach will lead to O(N^3) compile time for SIShrinkInstructions. But in reality, the matching pattern only exists within nearby instructions in the same basic block. This work limits the search to a maximum of 16 instructions, and has a linear compile time consumption. Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D74180	2020-02-06 16:40:21 -08:00
Matt Arsenault	03a2d0045d	AMDGPU: Add compile time hack for hasCFUser Assume the control flow intrinsic results are never casted, and early exit based on the type.	2020-02-06 11:41:34 -08:00
Matt Arsenault	5a8c0f552b	AMDGPU/GlobalISel: Avoid handling registers twice in waterfall loops When multiple instructions are moved into a waterfall loop, it's possible some of them re-use the same operands. Avoid creating multiple sequences of readfirstlanes for them. None of the current uses will hit this, but will be used in a future patch.	2020-02-06 09:38:24 -08:00
Matt Arsenault	89b7091c28	AMDGPU: Make LDS_DIRECT an artificial register	2020-02-05 17:47:22 -05:00
Matt Arsenault	baafe82b07	AMDGPU/GlobalISel: Remove bitcast legality hack	2020-02-05 16:24:24 -05:00
Matt Arsenault	364326ce66	AMDGPU/GlobalISel: Add mem operand to s.buffer.load intrinsic Really the intrinsic definition is wrong, but work around this here. The DAG lowering introduces an MMO. We have to introduce a new operation to avoid the verifier complaining about the missing mayLoad.	2020-02-05 15:04:42 -05:00
Matt Arsenault	5aa6e246a1	AMDGPU/GlobalISel: Legalize f64 G_FFLOOR for SI Use cmp ord instead of cmp_class compared to the DAG version for the nan check, but mostly try to match the existing pattern. I think the sign doesn't matter for fract, so we could do a little better with the source modifier matching. I think this is also still broken as in D22898, but I'm leaving it as-is for now while I don't have an SI system to test on.	2020-02-05 14:32:01 -05:00
Matt Arsenault	7bffa97285	AMDGPU/GlobalISel: Prefer merge/unmerge ops to legalize TFE These have a better chance of combining with other operations and are currently much better supported than G_EXTRACT.	2020-02-05 12:56:10 -05:00
Matt Arsenault	e65e6d052e	AMDGPU/GlobalISel: Legalize TFE image result loads Rewrite the result register pair into the expected single register format in the legalizer. I'm also operating under the assumption that TFE doesn't apply to stores or atomics, but don't know if this is true or not.	2020-02-05 12:40:20 -05:00
Matt Arsenault	096cd991ee	AMDGPU: Fix divergence analysis of control flow intrinsics The mask results of these should be uniform. The trickier part is the dummy booleans used as IR glue need to be treated as divergent. This should make the divergence analysis results correct for the IR the DAG is constructed from. This should allow us to eliminate requiresUniformRegister, which has an expensive, recursive scan over all users looking for control flow intrinsics. This should avoid recent compile time regressions.	2020-02-05 09:30:54 -08:00
Jordan Rupprecht	9f507bfd8d	NFC: fix unused var warnings in no-assert builds	2020-02-05 09:26:59 -08:00
Matt Arsenault	69cc9f3046	AMDGPU/GlobalISel: Legalize llvm.amdgcn.s.buffer.load The 96-bit results need to be widened. I find the interaction between LegalizerHelper and MIRBuilder somewhat awkward. The custom legalization is called by the LegalizerHelper, but then does not have access to the helper. You have to construct a new helper, which then does not own the MachineIRBuilder, but does modify it. Maybe custom legalization should be passed the helper?	2020-02-05 12:01:34 -05:00
Matt Arsenault	307e0d5490	AMDGPU/GlobalISel: Fix processing new phi in waterfall loop The adjusted iterator range included the last we just inserted, and don't want to process. Figure out the new iterator range before inserting phis. This was a harmless problem, but added an unnecessary complication for a future patch.	2020-02-05 11:52:42 -05:00
Matt Arsenault	dfa9420f09	AMDGPU/GlobalISel: Don't use legal v2s16 G_BUILD_VECTOR If we have s_pack_* instructions, legalize this to G_BUILD_VECTOR_TRUNC from s32 elements. This is closer to how the s_pack_* instructions really behave. If we don't have s_pack_ instructions, expand this by creating a merge to s32 and bitcasting. This expands to the expected bit operations. I think this eventually should go in a new bitcast legalize action type in LegalizerHelper. We already directly emit the shift operations in RegBankSelect for the vector case. This could possibly be cleaned up, but I also may want to defer doing this expansion to selection anyway. I'll see about that when I try to actually match VOP3P instructions. This breaks the selection of the build_vector since tablegen doesn't know how to match G_BUILD_VECTOR_TRUNC yet, so just xfail it for now.	2020-02-05 11:52:18 -05:00
Sebastian Neubauer	163e33b290	[AMDGPU] Fix lowering a16 image intrinsics scalar_to_vector takes only one argument, not two. The a16 tests now also check the packing of coordinates into registers Differential Revision: https://reviews.llvm.org/D73482	2020-02-05 10:54:34 +01:00
Sebastian Neubauer	3bc7ffdaab	[AMDGPU] Use v3f32 type in image instructions This should lower the amount of used registers for gfx9. I updated some of the changed tests with the update script because changing them by hand is tedious. Differential Revision: https://reviews.llvm.org/D73884	2020-02-05 10:35:41 +01:00
Jan Vesely	e6686adf8a	AMDGPU/EG,CM: Implement fsqrt using recip(rsqrt(x)) instead of x * rsqrt(x) The old version might be faster on EG (RECIP_IEEE is Trans only), but it'd need extra corner case checks. This gives correct corner case behaviour and saves a register. Fixes OCL CTS sqrt test (1-thread, scalar) on Turks. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D74017	2020-02-05 00:24:07 -05:00
Matt Arsenault	9260d01faa	AMDGPU: Correct memory size for image intrinsics This was incorrectly rounding up to the next power of 2. v4f32 was rounding up to v8f32, which was just wrong. There are also v3i16/v3f16 available in MVT, so we don't even need to round the f16 cases anymore. Additionally, this field is really an EVT so we don't even need to consider this. Also switch some asserts to return invalid. We should have an IR verifier for these intrinsic return types, but for now it's better to not assert on IR that passes the verifier. This should also probably be fixed to consider that dmask is really eliminating some of the loaded components.	2020-02-04 22:29:23 -05:00
Matt Arsenault	4f9f5d09de	AMDGPU: Fix isAlwaysUniform for simple asm SGPR results We were handling the case where the result was a struct with an extracted SGPR component, but not for the simple case.	2020-02-04 13:34:14 -08:00
Matt Arsenault	12fe9b26ec	AMDGPU/GlobalISel: Select G_SEXT_INREG	2020-02-04 13:23:53 -08:00
Matt Arsenault	0693e827ed	AMDGPU/GlobalISel: Do a better job splitting 64-bit G_SEXT_INREG We don't need to expand to full shifts for the > 32-bit case. This just switches to a sext_inreg of the high half.	2020-02-04 13:23:53 -08:00
Matt Arsenault	05f2a04ba7	AMDGPU/GlobalISel: Legalize G_SEXT_INREG Split the VALU 64-bit case in RegBankSelect.	2020-02-04 13:23:53 -08:00
Austin Kerbow	0f116fd9d8	[AMDGPU] Fix infinite loop with fma combines https://reviews.llvm.org/D72312 introduced an infinite loop which involves DAGCombiner::visitFMA and AMDGPUTargetLowering::performFNegCombine. fma( a, fneg(b), fneg(c) ) => fneg( fma (a, b, c) ) => fma( a, fneg(b), fneg(c) ) ... This only breaks with types where 'isFNegFree' returns false, e.g. v4f32. Reproducing the issue also needs the attribute 'no-signed-zeros-fp-math', and no source mods allowed on one of the users of the Op. This fix makes changes to indicate that it is not free to negate a fma if it has users with source mods. Differential Revision: https://reviews.llvm.org/D73939	2020-02-04 13:11:09 -08:00
Matt Arsenault	9b0ce8edfa	AMDGPU/GlobalISel: Remove extension legality hacks The legalization has improved since this was added, and the tests relying on this no longer need it.	2020-02-04 12:50:47 -08:00
Matt Arsenault	5d2749938c	AMDGPU/GlobalISel: Custom lower G_FEXP	2020-02-04 11:50:55 -08:00
Matt Arsenault	b461436d01	AMDGPU/GlobalISel: Legalize s16 G_FEXP2	2020-02-04 11:50:55 -08:00
Matt Arsenault	1024b73ef5	AMDGPU: Split denormal mode tracking bits Prepare to accurately track the future denormal-fp-math attribute changes. The way to actually set these separately is not wired in yet. This is just a mechanical change, and mostly still assumes the input and output mode match. This should be refined for some cases. For example, fcanonicalize lowering should use the flushing variant if either input or output flushing is enabled	2020-02-04 10:44:21 -08:00
Matt Arsenault	75fcdfa1fc	AMDGPU: Cleanup SMRD buffer selection The usage of the Imm out argument from SelectSMRDOffset is pretty confusing. Stop trying to reject CI immediates in the case where the offset field can be used. It's not an illegal way to encode the immediate, so just prefer the better encoding pattern with AddedComplexity. We probably don't even really need the different opcodes for the different offset types anymore, but that will be more work to cleanup. The SMRD non-buffer load patterns could also use a cleanup to be done separately.	2020-02-04 10:28:08 -08:00
Guillaume Chatelet	b8144c0536	[NFC] Encapsulate MemOp logic Summary: This patch simply introduces functions instead of directly accessing the fields. This helps introducing additional check logic. A second patch will add simplifying functions. Reviewers: courbet Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73945	2020-02-04 10:36:26 +01:00
Jay Foad	2252cac694	[AMDGPU] getMemOperandsWithOffset: support BUF non-stack-access instructions with resource but no vaddr Summary: This enables clustering for many more BUF instructions. Reviewers: rampitec, arsenm, nhaehnle Subscribers: jvesely, wdng, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73868	2020-02-03 22:49:30 +00:00
Matt Arsenault	7d3aace3f5	AMDGPU: Add flag to control mem intrinsic expansion GlobalISel doesn't implement the expansion for these yet, so add a flag to force expanding these so it's possible to avoid these for a while.	2020-02-03 14:26:01 -08:00
Matt Arsenault	cb7b661d3d	AMDGPU: Analyze divergence of inline asm	2020-02-03 12:42:16 -08:00
Matt Arsenault	2758ae41ae	AMDGPU/GlobalISel: Allow selecting s128 load/stores	2020-02-03 12:28:08 -08:00
Matt Arsenault	726446a009	AMDGPU: Fix splitting wide f32 s.buffer.load intrinsics This would switch f32 to i32, and produce an invalid concat_vectors from i32 pieces to an f32 vector.	2020-02-03 12:28:08 -08:00
Matt Arsenault	cd7650c186	GlobalISel: Implement fewerElementsVector for G_SEXT_INREG Start using a new strategy with a combination of merge and unmerges. This allows scalarizing before lowering, which in cases like <2 x s128> avoids producing giant illegal shifts.	2020-02-03 11:47:33 -08:00
Jay Foad	05297b7cbe	[AMDGPU] getMemOperandsWithOffset: add resource operand for BUF instructions Summary: This prevents unwanted clustering of BUF instructions with the same vaddr but different resource descriptors. Reviewers: rampitec, arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73867	2020-02-03 17:06:09 +00:00
Guillaume Chatelet	333f2ad8b8	[Alignment][NFC] Use Align for getMemcpy/Memmove/Memset Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73885	2020-02-03 17:13:19 +01:00
Matt Arsenault	00b22df71d	AMDGPU: Fix extra type mangling on llvm.amdgcn.if.break These have to be the same mask type.	2020-02-03 07:02:05 -08:00
Matt Arsenault	e4bc55bd94	AMDGPU/GlobalISel: Reduce indentation	2020-02-03 05:41:14 -08:00
Simon Moll	5c8ba508b2	[NFC] unsigned->Register in storeRegTo/loadRegFromStack Summary: This patch makes progress on the 'unsigned -> Register' rewrite for `TargetInstrInfo::loadRegFromStack` and `TII::storeRegToStack`. Reviewers: arsenm, craig.topper, uweigand, jpienaar, atanasyan, venkatra, robertlytton, dylanmckay, t.p.northover, kparzysz, tstellar, k-ishizaka Reviewed By: arsenm Subscribers: wuzish, merge_guards_bot, jyknight, sdardis, nemanjai, jvesely, wdng, nhaehnle, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73870	2020-02-03 14:22:16 +01:00
Jay Foad	97d9a76afc	[AMDGPU] Don't remove short branches over kills Summary: D68092 introduced a new SIRemoveShortExecBranches optimization pass and broke some graphics shaders. The problem is that it was removing branches over KILL pseudo instructions, and the fix is to explicitly check for that in mustRetainExeczBranch. Reviewers: critson, arsenm, nhaehnle, cdevadas, hakzsam Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73771	2020-02-03 09:26:52 +00:00
Nicolai Hähnle	ba8110161d	AMDGPU/GFX10: Fix NSA reassign pass when operands are undef Summary: Virtual registers that are undef have an empty LiveInterval at this point, which means beginIndex() and endIndex() cannot be used. We only need those indices to determine the range in which to scan for affected other NSA instructions, and undef operands cannot contribute to that range. Reviewers: arsenm, rampitec, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73831	2020-02-01 22:41:40 +01:00
Matt Arsenault	c0b12916a7	AMDGPU/GlobalISel: Use more wide vector load/stores This improves the type breakdown for some large vectors. For example, we now get a <4 x s32> and s32 store instead of 5 s32 stores for <5 x s32>.	2020-02-01 10:47:21 -05:00
Matt Arsenault	e3117e5c30	AMDGPU/GlobalISel: Improve legalization of wide stores This fixes legalizations of global stores > 128-bits. It seems work is needed on how this split actually occurs. For example, we get the right code for s160, with an s128 and s32 load, but get 5 s32 loads for <5 x s32>.	2020-02-01 10:47:03 -05:00
Matt Arsenault	98aaed2980	AMDGPU/GlobalISel: Fix forming G_TRUNC with vcc result This somehow got lost when I fixed the boolean handling.	2020-01-31 20:29:41 -05:00
alex-t	5df1ac7846	[AMDGPU] fixed divergence driven shift operations selection Differential Revision: https://reviews.llvm.org/D73483 Reviewers: rampitec	2020-01-31 20:49:56 +03:00
Jay Foad	2a1b5af299	[GlobalISel] Tidy up unnecessary calls to createGenericVirtualRegister Summary: As a side effect some redundant copies of constant values are removed by CSEMIRBuilder. Reviewers: aemerson, arsenm, dsanders, aditya_nandakumar Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, hiraditya, jrtc27, atanasyan, volkan, Petar.Avramovic, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73789	2020-01-31 17:07:16 +00:00
Guillaume Chatelet	3c89b75f23	[NFC] Introduce a type to model memory operation Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code. Reviewers: courbet Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73785	2020-01-31 17:29:01 +01:00
Matt Arsenault	b3726ecea4	AMDGPU: Fix potential use of undefined value	2020-01-31 10:38:58 -05:00
Matt Arsenault	6fb544d1d2	AMDGPU/GlobalISel: Combine FMIN_LEGACY/FMAX_LEGACY Try out using combine definition rules. This really should be a post-legalizer combine, but the combiner pass is currently pre-legalize. Most of the target combines are really post-legalize, so we should probably move the pass.	2020-01-31 06:58:04 -08:00
Matt Arsenault	49e424e08e	AMDGPU/GlobalISel: Select global MUBUF atomicrmw	2020-01-31 06:05:41 -08:00
Matt Arsenault	0426c2d07d	Reapply "AMDGPU: Cleanup and fix SMRD offset handling" This reverts commit `6a4acb9d80`.	2020-01-31 06:01:28 -08:00
Jay Foad	31e29d4afe	AMDGPU/GlobalISel: Make use of MachineIRBuilder helper functions. NFC.	2020-01-31 13:53:39 +00:00
Matt Arsenault	6a4acb9d80	Revert "AMDGPU: Cleanup and fix SMRD offset handling" This reverts commit `17dbc6611d`. A test is failing on some bots	2020-01-30 15:39:51 -08:00
Matt Arsenault	17dbc6611d	AMDGPU: Cleanup and fix SMRD offset handling I believe this also fixes bugs with CI 32-bit handling, which was incorrectly skipping offsets that look like signed 32-bit values. Also validate the offsets are dword aligned before folding.	2020-01-30 15:04:21 -08:00
Matt Arsenault	f7521dc292	AMDGPU: Replace subtarget check with an assert This is already checked by the pattern subtarget predicate.	2020-01-30 14:15:26 -08:00
Matt Arsenault	97a1d4bc02	AMDGPU: Don't use separate cache arguments for s_buffer_load node There's not much value to this separate node from the intrinsic. Make the operand structure the same as the intrinsic, so we can reuse the same pattern for GlobalISel.	2020-01-30 14:15:26 -08:00
hsmahesha	1d9e08ec35	[AMDGPU] Add file headers for a few files where they are missing. Summary: Added file headers for files which implement iterative lightweight scheduling strategies. This is basically an exercise which I undertook in order to get used to the LLVM development process. Reviewers: arsenm, vpykhtin, cdevadas Reviewed By: vpykhtin Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73417	2020-01-31 02:06:41 +05:30
Matt Arsenault	d6b83d6ba5	AMDGPU/GlobalISel: Don't use pointless getConstantVRegVal This is always a G_CONSTANT already	2020-01-30 09:38:43 -05:00
Matt Arsenault	ea956685a1	GlobalISel: Implement s32->s64 G_FPTOSI lowering Port directly from DAG version. The lowering for G_FPTOUI used to fail on AMDGPU because it uses G_FPTOSI.	2020-01-30 08:47:07 -05:00
Matt Arsenault	b21571f4d5	AMDGPU/GlobalISel: Handle s64->s64 G_FPTOSI/G_FPTOUI	2020-01-30 08:46:37 -05:00
Matt Arsenault	8184176efd	AMDGPU/GlobalISel: Custom lower G_LOG/G_LOG10 I'm pretty sure this is wrong and we should expand these in a correct way, but this matches the existing behavior.	2020-01-30 08:38:50 -05:00
Matt Arsenault	872e899b75	AMDGPU/GlobalISel: Legalize unpacked d16 image operations On targets that don't have the normal packed f16 layout, handle these during legalization. Directly modify the register types. We can infer this was a d16 load based on the mem operand size during selection. A16 operands should possibly be handled here as well, but don't worry about that yet.	2020-01-30 08:36:11 -05:00
Matt Arsenault	d21182d692	AMDGPU/GlobalISel: Only map VOP operands to VGPRs This trivially avoids violating the constant bus restriction. Previously this was allowing one SGPR in the first source operand, which technically also avoided violating this for most operations (but not for special cases reading vcc). We do need to write some new, smarter operand folds to pick the optimal SGPR to use in some kind of post-isel fold, but that's purely an optimization. I was originally thinking we would pick which operands should be SGPRs in RegBankSelect, but I think this isn't really manageable. There would be additional complexity to handle every G_* instruction, and then any nontrivial instruction patterns would need to know when to avoid violating it, which is likely to be very error prone. I think having all inputs being canonically copies to VGPRs will simplify the operand folding logic. The current folding we do is backwards, and only considers one operand at a time, relative to operands it already has. It therefore poorly handles the case where there is already a constant bus operand user. If all operands are copies, it's somewhat simpler to consider all input operands at once to choose the optimal constant bus user. Since the failure mode for constant bus violations is now a verifier error and not a selection failure, this moves towards a place where we can turn on the fallback mode. The SGPR copy folding optimizations can be left for later.	2020-01-30 08:32:35 -05:00
Matt Arsenault	b4a0766c8d	AMDGPU/GlobalISel: Select llvm.amdgcn.buffer.atomic.cmpswap	2020-01-30 08:22:43 -05:00
Connor Abbott	ce06d50756	AMDGPU: Fix AMDGPUUnifyDivergentExitNodes with no normal returns Summary: The code was assuming in a few places that if there was only one exit from the function that it was a normal return, which is invalid. It could be an infinite loop, in which case we still need to insert the usual fake edge so that the null export happens. This fixes shaders that end with an infinite loop that discards. Reviewers: arsenm, nhaehnle, critson Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71192	2020-01-30 10:55:02 +01:00
Matt Arsenault	c5fffa4da3	GlobalISel: Add observer argument to legalizeIntrinsic This is passed to legalizeCustom, but not intrinsic. Also remove the MRI argument, since you can get that from the MachineIRBuilder. I'm not sure why MachineIRBuilder has a private observer member, and this is passed separately.	2020-01-29 18:33:45 -05:00
Matt Arsenault	7f3280ecdd	AMDGPU/GlobalISel: Select permlane16/permlanex16	2020-01-29 17:55:31 -05:00
Huihui Zhang	af620fc36a	Revert "[AMDGPU] Fix data race on RegisterBank initialization." There appears to be a related buildbot failure. This reverts commit `8bb6c8a22a`.	2020-01-29 11:16:27 -08:00
Austin Kerbow	2605adb69c	[AMDGPU][GlobalISel] Select 8-byte LDS Ops with 4-byte alignment Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73585	2020-01-29 10:42:12 -08:00
Huihui Zhang	8bb6c8a22a	[AMDGPU] Fix data race on RegisterBank initialization. Summary: The initialization of RegisterBank needs to be done only once. The logic of AlreadyInit has a data race; use llvm::call_once instead. This continues the work of D73587. Reviewers: arsenm, tstellar, ronlieb, efriedma, apazos, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73604	2020-01-29 10:14:40 -08:00
Jay Foad	d07a789579	[AMDGPU] Cluster FLAT instructions with both vaddr and saddr Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73634	2020-01-29 17:01:35 +00:00
Matt Arsenault	62129878a6	AMDGPU/GlobalISel: Fix tablegen selection for scalar bin ops Fixes selection for scalar G_SMULH/G_UMULH. Also switches to using tablegen selected add/sub, which switch to the signed version of the opcode. This matches the current DAG behavior. We can't drop the manual selection for add/sub yet, because it's still both for VALU add/sub and for G_PTR_ADD.	2020-01-29 08:55:54 -08:00
Matt Arsenault	68b102b97a	AMDGPU: Directly select 16-bank LDS case of llvm.amdgcn.interp.p1.f16 Manually select this as a tablegen workaround. Both SelectionDAG and GlobalISel end up misplacing the copy to m0 when both instructions in the output need it. Neither considers that both output instructions depend on m0. I don't know of any other pattern we need to handle this case, so it's less effort to just work around this for now.	2020-01-29 08:24:31 -08:00
Matt Arsenault	96352e0a1b	AMDGPU/GlobalISel: Handle LDS with relocations case	2020-01-29 08:18:55 -08:00
Connor Abbott	87d98c1495	AMDGPU: Fix handling of infinite loops in fragment shaders Summary: Due to the fact that kill is just a normal intrinsic, even though it's supposed to terminate the thread, we can end up with provably infinite loops that are actually supposed to end successfully. The AMDGPUUnifyDivergentExitNodes pass breaks up these loops, but because there's no obvious place to make the loop branch to, it just makes it return immediately, which skips the exports that are supposed to happen at the end and hangs the GPU if all the threads end up being killed. While it would be nice if the fact that kill terminates the thread were modeled in the IR, I think that the structurizer as-is would make a mess if we did that when the kill is inside control flow. For now, we just add a null export at the end to make sure that it always exports something, which fixes the immediate problem without penalizing the more common case. This means that we sometimes do two "done" exports when only some of the threads enter the discard loop, but from tests the hardware seems ok with that. This fixes dEQP-VK.graphicsfuzz.while-inside-switch with radv. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70781	2020-01-29 17:13:25 +01:00
Matt Arsenault	94e8ef4d4c	AMDGPU/GlobalISel: Look through copies for source modifiers When all VOP instructions are legalized to VGPRs, any SGPR source modifiers will have a copy in the way.	2020-01-29 08:08:13 -08:00
Stanislav Mekhanoshin	c2ad7ee1a9	[AMDGPU] override isHighLatencyDef SIMachineScheduler uses isHighLatencyInstruction with the same semantics, but TargetInstrInfo has a virtual isHighLatencyDef method, so override it instead. Added FLAT to the list of high latency opcodes and a check for mayLoad since stores are not technically high latency in terms of data dependency. This change did not produce any visible impact on our tests. Differential Revision: https://reviews.llvm.org/D73582	2020-01-29 08:01:29 -08:00
Connor Abbott	08b205bb48	Revert "AMDGPU: Fix handling of infinite loops in fragment shaders" This reverts commit `0994c485e6`.	2020-01-29 16:14:52 +01:00
Connor Abbott	13ab22ab22	Revert "AMDGPU: Fix AMDGPUUnifyDivergentExitNodes with no normal returns" This reverts commit `323bfde20c`.	2020-01-29 16:14:49 +01:00
Matt Arsenault	02adfb5155	AMDGPU/GlobalISel: Manually select scalar f64 G_FNEG This should be no problem to support with a pattern, but it turns out there are just too many yaks to shave. The main problem is in the DAG emitter, which I have no desire to sink effort into fixing. If we had a bit to disable patterns in the DAG importer, fixing the GlobalISelEmitter is more manageable.	2020-01-29 06:49:16 -08:00
Connor Abbott	323bfde20c	AMDGPU: Fix AMDGPUUnifyDivergentExitNodes with no normal returns Summary: The code was assuming in a few places that if there was only one exit from the function that it was a normal return, which is invalid. It could be an infinite loop, in which case we still need to insert the usual fake edge so that the null export happens. This fixes shaders that end with an infinite loop that discards. Reviewers: arsenm, nhaehnle, critson Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71192	2020-01-29 15:08:46 +01:00
Connor Abbott	0994c485e6	AMDGPU: Fix handling of infinite loops in fragment shaders Summary: Due to the fact that kill is just a normal intrinsic, even though it's supposed to terminate the thread, we can end up with provably infinite loops that are actually supposed to end successfully. The AMDGPUUnifyDivergentExitNodes pass breaks up these loops, but because there's no obvious place to make the loop branch to, it just makes it return immediately, which skips the exports that are supposed to happen at the end and hangs the GPU if all the threads end up being killed. While it would be nice if the fact that kill terminates the thread were modeled in the IR, I think that the structurizer as-is would make a mess if we did that when the kill is inside control flow. For now, we just add a null export at the end to make sure that it always exports something, which fixes the immediate problem without penalizing the more common case. This means that we sometimes do two "done" exports when only some of the threads enter the discard loop, but from tests the hardware seems ok with that. This fixes dEQP-VK.graphicsfuzz.while-inside-switch with radv. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70781	2020-01-29 15:08:46 +01:00
Jay Foad	ad08c01d6c	[AMDGPU] Simplify DS and SM cases in getMemOperandsWithOffset Summary: This removes a couple of unnecessary isReg checks, now that memOpsHaveSameBasePtr can handle FI operands, but is otherwise NFC. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73485	2020-01-29 09:43:24 +00:00
Benjamin Kramer	adcd026838	Make llvm::StringRef to std::string conversions explicit. This is how it should've been and brings it more in line with std::string_view. There should be no functional change here. This is mostly mechanical from a custom clang-tidy check, with a lot of manual fixups. It uncovers a lot of minor inefficiencies. This doesn't actually modify StringRef yet, I'll do that in a follow-up.	2020-01-28 23:25:25 +01:00
Jay Foad	4a331beadc	[AMDGPU] Fix vccz after v_readlane/v_readfirstlane to vcc_lo/hi Summary: Up to gfx9, writes to vcc_lo and vcc_hi by instructions like v_readlane and v_readfirstlane do not update vccz to reflect the new value of vcc. Fix it by reusing part of the existing vccz bug handling code, which inserts an "s_mov_b64 vcc, vcc" instruction to restore vccz just before an instruction that needs the correct value. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69661	2020-01-28 10:52:17 +00:00
Matt Arsenault	d2a9739274	AMDGPU/GlobalISel: Eliminate SelectVOP3Mods_f32 Trivial type predicates should be moved into the tablegen pattern itself, and not checked inside complex patterns. This eliminates a redundant complex pattern, and fixes select source modifiers for GlobalISel. I have further patches which fully handle select in tablegen and remove all of the C++ selection, although it requires the ugliness to support the entire range of legal register types.	2020-01-27 17:53:54 -05:00
Matt Arsenault	c3075e6171	AMDGPU/GlobalISel: Select buffer atomics The cmpswap handling is incomplete and fails to select.	2020-01-27 15:16:44 -05:00
Matt Arsenault	0eb62d5b3f	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.tbuffer.store	2020-01-27 15:16:21 -05:00
Matt Arsenault	a69c26a927	AMDGPU/GlobalISel: Select llvm.amdgcn.struct.buffer.store[.format]	2020-01-27 15:00:21 -05:00
Matt Arsenault	533d650e94	AMDGPU/GlobalISel: Move llvm.amdgcn.raw.buffer.store handling Treat this the same way as loads. There's less value to the intermediate nodes, but it's good to be consistent.	2020-01-27 14:59:30 -05:00
Matt Arsenault	75d66f8434	AMDGPU/GlobalISel: Select llvm.amdgcn.struct.tbuffer.load	2020-01-27 14:42:04 -05:00
Matt Arsenault	09ed0e44d9	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.tbuffer.load	2020-01-27 13:40:37 -05:00
Stanislav Mekhanoshin	53eb0f8c07	[AMDGPU] Attempt to reschedule without clustering We want to have more load/store clustering but we also want to maintain low register pressure, which are opposing goals. Allow the scheduler to reschedule regions without mutations applied if we hit a register limit. Differential Revision: https://reviews.llvm.org/D73386	2020-01-27 10:27:16 -08:00
Matt Arsenault	97711228fd	AMDGPU/GlobalISel: Select llvm.amdgcn.struct.buffer.load.format	2020-01-27 13:23:35 -05:00
Matt Arsenault	ce7ca2caf2	AMDGPU/GlobalISel: Select llvm.amdgcn.struct.buffer.load	2020-01-27 13:05:55 -05:00
Matt Arsenault	198624c39d	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.load.format	2020-01-27 13:02:19 -05:00
Matt Arsenault	fc90222a91	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.load Use intermediate instructions, unlike with buffer stores. This is necessary because of the need to have an internal way to distinguish between signed and unsigned extloads. This introduces some duplication and near duplication with the buffer store selection path. The store handling should maybe be moved into legalization to match and eliminate the duplication.	2020-01-27 12:49:23 -05:00
Matt Arsenault	e60d658260	AMDGPU/GlobalISel: Handle VOP3NoMods	2020-01-27 09:03:44 -08:00
Matt Arsenault	0968234590	AMDGPU/GlobalISel: Minor refactor of MUBUF complex patterns This will make it easier to support the small variants in the complex patterns for atomics.	2020-01-27 09:00:00 -08:00
Matt Arsenault	bef27175c7	AMDGPU: Fix not using f16 fsin/fcos I noticed this because this accidentally started working for GlobalISel.	2020-01-27 08:59:59 -08:00
Matt Arsenault	a1d33ce73a	AMDGPU/GlobalISel: Custom legalize v2s16 G_SHUFFLE_VECTOR Try to keep simple v2s16 cases as-is. This will more naturally map to how the VOP3P op_sel modifiers work compared to the expansion involving bitcasts and bitshifts. This could maybe try harder with wider source vector types, although that could be handled with a pre-legalize combine.	2020-01-27 08:28:05 -08:00
Matt Arsenault	4e69df091d	Revert "AMDGPU: Temporary drop s_mul_hi_i/u32 patterns" This reverts commit `fe23ed2c68`. It was never really clear this was responsible for the performance regressions that caused this to be reverted. It's been a long time, and we need to have scalar patterns for this to get GlobalISel working.	2020-01-27 08:07:21 -08:00
Matt Arsenault	bc3d900fa5	AMDGPU/GlobalISel: Fix not using global atomics on gfx9+ For some reason the flat/global atomics end up in the generated matcher table in a different order from SelectionDAG. Use AddedComplexity to prefer checking for global atomics first.	2020-01-27 07:42:42 -08:00
Matt Arsenault	ac0b9b4ccf	AMDGPU/GlobalISel: Select more MUBUF global addressing modes The handling of the high bits of the resource descriptor seems weird to me, where the 3rd dword changes based on the instruction.	2020-01-27 07:28:36 -08:00
Matt Arsenault	fdaad485e6	AMDGPU/GlobalISel: Initial selection of MUBUF addr64 load/store Fixes the main reason for compile failures on SI, but doesn't really try to use the addressing modes yet.	2020-01-27 07:13:56 -08:00
Matt Arsenault	2214bc81d0	AMDGPU: Allow i16 shader arguments Not allowing this just creates unnecessary complications when writing simple tests.	2020-01-27 06:55:32 -08:00
Jay Foad	1bf00219fc	[AMDGPU] Handle multiple base operands in areMemAccessesTriviallyDisjoint Summary: This is in preparation for getMemOperandsWithOffset returning more base operands. Depends on D73455. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73456	2020-01-27 14:45:21 +00:00
Jay Foad	6461eadf8f	[AMDGPU] Handle multiple base operands in shouldClusterMemOps Summary: This is in preparation for getMemOperandsWithOffset returning more base operands. Depends on D73454. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73455	2020-01-27 14:45:21 +00:00
Jay Foad	fcf5254fa7	[AMDGPU] Handle frame index base operands in memOpsHaveSameBasePtr Summary: This is in preparation for getMemOperandsWithOffset returning more base operands. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, arphaman, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73454	2020-01-27 14:45:21 +00:00
vpykhtin	4332f1a4c8	[AMDGPU] Fix GCN regpressure trackers for INLINEASM instructions. Differential revision: https://reviews.llvm.org/D73338	2020-01-27 17:25:25 +03:00
Matt Arsenault	2a160ba5b0	GlobalISel: Reimplement widenScalar for G_UNMERGE_VALUES results Only use shifts if the requested type exactly matches the source type, and create sub-unmerges otherwise.	2020-01-27 06:18:26 -08:00
Maheaha Shivamallappa	66f93071cd	AMDGPU/GlobalISel: Clean-up code around ISel for Intrinsics. Summary: A minor code clean-up around ISel for intrinsic llvm.amdgcn.end.cf() Reviewers: arsenm, mshivama Reviewed By: arsenm Tags: #llvm Differential Revision: https://reviews.llvm.org/D73358	2020-01-26 14:09:31 +05:30
Tom Stellard	cb297050bb	AMDGPU/SILoadStoreOptimizer: Fix uninitialized variable error This was introduced by `86c944d790` and caught by the sanitizer-x86_64-linux-fast bot.	2020-01-24 21:53:05 -08:00
Tom Stellard	86c944d790	AMDGPU/SILoadStoreOptimizer: Improve merging of out of order offsets Summary: This improves merging of sequences like: store a, ptr + 4 store b, ptr + 8 store c, ptr + 12 store d, ptr + 16 store e, ptr + 20 store f, ptr Prior to this patch the basic block was scanned in order to find instructions to merge and the above sequence would be transformed to: store4 <a, b, c, d>, ptr + 4 store e, ptr + 20 store f, ptr With this change, we now sort all the candidate merge instructions by their offset, so instructions are visited in offset order rather than in the order they appear in the basic block. We now transform this sequence into: store4 <f, a, b, c>, ptr store2 <d, e>, ptr + 16 Another benefit of this change is that since we have sorted the mergeable lists by offset, we can easily check if an instruction is mergeable by checking the offset of the instruction that comes before or after it in the sorted list. Once we determine an instruction is not mergeable we can remove it from the list and avoid having to do the more expensive mergeability checks. Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin Reviewed By: arsenm, nhaehnle Subscribers: kerbowa, merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65966	2020-01-24 19:45:56 -08:00
Matt Arsenault	3b93945587	AMDGPU/GlobalISel: Select wqm, softwqm and wwm intrinsics	2020-01-24 13:06:44 -08:00
Matt Arsenault	87c46a3129	AMDGPU: Don't error on ds.ordered intrinsic in function These should be assumed to be called from a compute context. Also don't use a 2 entry switch over constants.	2020-01-24 13:06:44 -08:00
Stanislav Mekhanoshin	be8e38cbd9	Correct NumLoads in clustering Scheduler sends the NumLoads argument into shouldClusterMemOps() as one less than the actual cluster length. So for 2 instructions it will pass just 1. Correct this number. This is NFC for in tree targets. Differential Revision: https://reviews.llvm.org/D73292	2020-01-24 12:45:28 -08:00
Matt Arsenault	84e035d8f1	AMDGPU: Don't check constant address space for atomic stores We define a separate list for storable address spaces. This saves an entry in the matcher table address space list.	2020-01-24 12:15:09 -08:00
Stanislav Mekhanoshin	555d8f4ef5	[AMDGPU] Bundle loads before post-RA scheduler We are relying on artificial DAG edges inserted by the MemOpClusterMutation to keep loads and stores together in the post-RA scheduler. This does not work all the time since it allows a completely independent instruction to be scheduled in the middle of the cluster. Removed the DAG mutation and added a pass to bundle already clustered instructions. These bundles are unpacked before the memory legalizer because it does not work with bundles but also because it allows waitcounts to be inserted in the middle of a store cluster. Removing artificial edges also allows a more relaxed scheduling. Differential Revision: https://reviews.llvm.org/D72737	2020-01-24 11:33:38 -08:00
Stanislav Mekhanoshin	44b865fa7f	[AMDGPU] Allow narrowing multi-dword loads Currently BE allows only a little load narrowing because of the fear it will produce sub-dword ext loads. However, we can always allow narrowing if we are shrinking one multi-dword load to another multi-dword load. In particular we were unable to reduce s_load_dwordx8 into s_load_dwordx4 if identity shuffle was used to extract low 4 dwords. Differential Revision: https://reviews.llvm.org/D73133	2020-01-24 11:03:41 -08:00
Austin Kerbow	c226646337	Resubmit: [DA][TTI][AMDGPU] Add option to select GPUDA with TTI Summary: Enable the new divergence analysis by default for AMDGPU. Resubmit with test updates since GPUDA was causing failures on Windows. Reviewers: rampitec, nhaehnle, arsenm, thakis Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73315	2020-01-24 10:39:40 -08:00
Guillaume Chatelet	805c157e8a	[Alignment][NFC] Deprecate Align::None() Summary: This is a follow up on https://reviews.llvm.org/D71473#inline-647262. There's a caveat here that `Align(1)` relies on the compiler understanding of `Log2_64` implementation to produce good code. One could use `Align()` as a replacement but I believe it is less clear that the alignment is one in that case. Reviewers: xbolva00, courbet, bollu Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73099	2020-01-24 12:53:58 +01:00
Changpeng Fang	2531535984	AMDGPU: Implement FDIV optimizations in AMDGPUCodeGenPrepare Summary: RCP has limited accuracy. If FDIV fpmath requires high accuracy, rcp may not meet the requirement. However, in DAG lowering, fpmath information gets lost, and thus we may generate either inaccurate rcp related computation or slow code for fdiv. This patch implements fdiv optimizations in AMDGPUCodeGenPrepare, where the !fpmath information is still available. FastUnsafeRcpLegal: We determine whether it is legal to use rcp based on unsafe-fp-math, fast math flags, denormals and fpmath accuracy request. RCP Optimizations: 1/x -> rcp(x) when fast unsafe rcp is legal or fpmath >= 2.5ULP with denormals flushed. a/b -> a*rcp(b) when fast unsafe rcp is legal. Use fdiv.fast: a/b -> fdiv.fast(a, b) when RCP optimization is not performed and fpmath >= 2.5ULP with denormals flushed. 1/x -> fdiv.fast(1,x) when RCP optimization is not performed and fpmath >= 2.5ULP with denormals. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D71293	2020-01-23 16:57:43 -08:00
Matt Arsenault	86e5b56a7c	AMDGPU/GlobalISel: Fix RegBankSelect for llvm.amdgcn.exp.compr This wasn't updated for the immarg handling change. We really need a verifier for this.	2020-01-23 13:30:46 -08:00
Matt Arsenault	fac9941e57	AMDGPU: Fix ubsan error Since register classes go up to 1024, 32 elements, all mask bits are needed and a 32-bit shift by 32 is illegal. We didn't have any instructions theoretically using a 32 element VGPR before `d1dbb5e471`	2020-01-23 15:05:47 -05:00
Matt Arsenault	618fa77ae4	AMDGPU/GlobalISel: Select V_ADD3_U32/V_XOR3_B32 The other 3-op patterns should also be theoretically handled, but currently there's a bug in the inferred pattern complexity. I'm not sure what the error handling strategy should be for potential constant bus violations. I think the correct strategy is to never produce mixed SGPR and VGPR operands in a typical VOP instruction, which will trivially avoid them. However, it's possible to still have hand written MIR (or erroneously transformed code) with these operands. When these fold, the restriction will be violated. We currently don't have any verifiers for reg bank legality. For now, just ignore the restriction. It might be worth triggering a DAG fallback on verifier error.	2020-01-23 12:04:20 -05:00
Guillaume Chatelet	59f95222d4	[Alignment][NFC] Use Align with CreateAlignedStore Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, bollu Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73274	2020-01-23 17:34:32 +01:00
Matt Arsenault	dfec702290	AMDGPU: Check for other uses when looking through casted select Fixes mesa regression on ext_transform_feedback-max-varyings	2020-01-23 11:31:24 -05:00
Guillaume Chatelet	279fa8e006	[Alignment][NFC] Deprecate untyped CreateAlignedLoad Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73260	2020-01-23 13:34:32 +01:00
Matt Arsenault	4d14772f5c	AMDGPU/GlobalISel: Remove redundant or patterns These ended up with higher priority than or3 patterns in a future patch. This also fixes the using VOP2 forms.	2020-01-22 21:45:51 -05:00
Jan Vesely	1b8eab179d	AMDGPU/R600: Emit rodata in text segment R600 relies on this behaviour. Fixes: `6e18266aa4` ('Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0"') Fixes ~100 piglit regressions since `6e18266` Differential Revision: https://reviews.llvm.org/D72991	2020-01-22 14:31:51 -05:00
Nico Weber	cd470717d1	Revert "[DA][TTI][AMDGPU] Add option to select GPUDA with TTI" This reverts commit `a90a6502ab`. Broke tests on Windows: http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/13808	2020-01-22 12:56:19 -05:00
Matt Arsenault	1192d7b254	AMDGPU/GlobalISel: Handle 16-bank LDS llvm.amdgcn.interp.p1.f16 The pattern is also mishandled by the generated matcher, so work around this as in the DAG path. The existing DAG tests aren't particularly targeted to just this one intrinsic. These also end up differing in scheduling from SGPR->VGPR operand constraint copies.	2020-01-22 12:10:59 -05:00
Matt Arsenault	c05f23e409	AMDGPU/GlobalISel: Select llvm.amdgcn.mov.dpp This is deprecated, but easy to support.	2020-01-22 11:43:53 -05:00
Matt Arsenault	dd09ec1208	AMDGPU/GlobalISel: Select llvm.amdgcn.mov.dpp8	2020-01-22 11:43:40 -05:00
Matt Arsenault	0bf434ccd5	AMDGPU: Fix element size assertion The GlobalISel usage called this with bits, but the DAG usage was incorrectly using bytes.	2020-01-22 11:18:45 -05:00
Matt Arsenault	bb562d1af0	AMDGPU/GlobalISel: Keep G_BITCAST out of waterfall loop The waterfall utility function blindly inserts a phi for every def in the loop. We don't need this one to be preserved for every iteration. Saves an extra phi and copy inside the loop body.	2020-01-22 11:16:19 -05:00
Matt Arsenault	52ec7379ad	AMDGPU/GlobalISel: Fold add of constant into G_INSERT_VECTOR_ELT Move the subregister base like in the extract case.	2020-01-22 11:09:15 -05:00
Matt Arsenault	d1dbb5e471	AMDGPU/GlobalISel: Select G_INSERT_VECTOR_ELT	2020-01-22 11:00:49 -05:00
Matt Arsenault	3524d4412c	AMDGPU/GlobalISel: Fix RegBankSelect for G_INSERT_VECTOR_ELT The result and source vector are going to be tied, so these need to be the same bank. The inserted value also needs to be broken down based on the result bank, not the inserted value itself.	2020-01-22 10:57:50 -05:00
Matt Arsenault	e3d352c541	AMDGPU/GlobalISel: Fold constant offset vector extract indexes Handle dynamic vector extracts that use an index that's an add of a constant offset into moving the base subregister of the indexing operation. Force the add into the loop in regbankselect, which will be recognized when selected.	2020-01-22 10:50:59 -05:00
Matt Arsenault	e93e1b621c	AMDGPU: Fix typo	2020-01-22 10:17:46 -05:00
Matt Arsenault	2fe500ab5b	AMDGPU: Look through casted selects to constant fold bin ops The promotion of the uniform select to i32 interfered with this fold.	2020-01-22 10:16:39 -05:00
Matt Arsenault	bcd91778fe	AMDGPU: Do binop of select of constant fold in AMDGPUCodeGenPrepare DAGCombiner does this, but divisions expanded here miss this optimization. Since `67aa18f165`, divisions have been expanded here and missed out on this optimization. Avoids test regressions in a future patch.	2020-01-22 10:16:39 -05:00
Matt Arsenault	a174f0da62	AMDGPU/GlobalISel: Add pre-legalize combiner pass Just copy the AArch64 pass as-is for now, except for removing the memcpy handling.	2020-01-22 10:16:39 -05:00
Jay Foad	e0f0d0e55c	[MachineScheduler] Allow clustering mem ops with complex addresses The generic BaseMemOpClusterMutation calls into TargetInstrInfo to analyze the address of each load/store instruction, and again to decide whether two instructions should be clustered. Previously this had to represent each address as a single base operand plus a constant byte offset. This patch extends it to support any number of base operands. The old target hook getMemOperandWithOffset is now a convenience function for callers that are only prepared to handle a single base operand. It calls the new more general target hook getMemOperandsWithOffset. The only requirements for the base operands returned by getMemOperandsWithOffset are: - they can be sorted by MemOpInfo::Compare, such that clusterable ops get sorted next to each other, and - shouldClusterMemOps knows what they mean. One simple follow-on is to enable clustering of AMDGPU FLAT instructions with both vaddr and saddr (base register + offset register). I've left a FIXME in the code for this case. Differential Revision: https://reviews.llvm.org/D71655	2020-01-22 14:28:24 +00:00
Matt Arsenault	70096ca111	AMDGPU/GlobalISel: Fix RegbankSelect for llvm.amdgcn.fmul.legacy	2020-01-22 09:26:17 -05:00
Matt Arsenault	a722cbf77c	AMDGPU/GlobalISel: Handle atomic_inc/atomic_dec The intermediate instruction drops the extra volatile argument. We are missing an atomic ordering on these.	2020-01-22 09:26:17 -05:00
Matt Arsenault	9c928649a0	AMDGPU: Fix interaction of tfe and d16 This was using the wrong result register, and dropping the result entirely for v2f16. This would fail to select on the scalar case. I believe it was also mishandling packed/unpacked subtargets.	2020-01-22 09:26:17 -05:00
Matt Arsenault	b94d3b9b77	AMDGPU/GlobalISel: RegBankSelect interp intrinsics Note this assumes the future use of immediates for immarg, not the current G_CONSTANT which will be emitted.	2020-01-22 09:01:34 -05:00
Matt Arsenault	64e9528201	AMDGPU: Fix missing immarg on llvm.amdgcn.interp.mov The first operand maps to an immediate field, so this should be immarg.	2020-01-22 09:01:34 -05:00
Austin Kerbow	a90a6502ab	[DA][TTI][AMDGPU] Add option to select GPUDA with TTI Summary: Enable the new divergence analysis by default for AMDGPU. Reviewers: rampitec, nhaehnle, arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73049	2020-01-21 21:13:20 -08:00
Carl Ritson	6b4b3e2856	[AMDGPU] SIRemoveShortExecBranches should not remove branches exiting loops Summary: Check that a s_cbranch_execz is not a loop exit before removing it, as the pass is generating infinite loops. Reviewers: cdevadas, arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, tpr, t-tye, hiraditya, kerbowa, llvm-commits, dstuttard, foad Tags: #llvm Differential Revision: https://reviews.llvm.org/D72997	2020-01-22 13:18:40 +09:00
cdevadas	e53a9d96e6	Resubmit: [AMDGPU] Invert the handling of skip insertion. The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a sideeffect and the skip branch is really necessary. This new pass will replace the handling of skip insertion in the existing SIInsertSkip Pass. Differential revision: https://reviews.llvm.org/D68092	2020-01-22 13:18:32 +09:00
Amara Emerson	67a8775322	[AArch64] Don't generate gpr CSEL instructions in early-ifcvt if regclasses aren't compatible. In GlobalISel we may in some unfortunate circumstances generate PHIs with operands that are on separate banks. If-conversion doesn't currently check for that case and ends up generating a CSEL on AArch64 with incorrect register operands. Differential Revision: https://reviews.llvm.org/D72961	2020-01-21 16:51:31 -08:00
Matt Arsenault	e47965bf64	AMDGPU/GlobalISel: Merge trivial legalize rules Also move constant-like rules together	2020-01-21 17:37:19 -05:00
Matt Arsenault	9a5a6e9465	AMDGPU/GlobalISel: Merge G_PTR_ADD/G_PTR_MASK rules	2020-01-21 16:57:01 -05:00
Matt Arsenault	fd109308a7	AMDGPU/GlobalISel: Legalize G_PTR_ADD for arbitrary pointers Pointers of unrecognized address spaces should be treated as global-like pointers. Even if loads and stores of them aren't handled, dumb operations that just operate on the bits should work.	2020-01-21 16:35:36 -05:00
Krzysztof Parzyszek	020041d99b	Update spelling of {analyze,insert,remove}Branch in strings and comments These names have been changed from CamelCase to camelCase, but there were many places (comments mostly) that still used the old names. This change is NFC.	2020-01-21 10:15:38 -06:00
Nicolai Hähnle	a80291ce10	Revert "[AMDGPU] Invert the handling of skip insertion." This reverts commit `0dc6c249bf`. The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for Mesa.	2020-01-21 09:17:25 +01:00
Fangrui Song	5721483b64	[AMDGPU] Fix -Wunused-variable after `e5823bf806`	2020-01-20 22:41:13 -08:00
Matt Arsenault	c72aa27f91	AMDGPU/GlobalISel: Fix RegBankSelect for llvm.amdgcn.ps.live	2020-01-20 23:21:53 -05:00
Matt Arsenault	e5823bf806	AMDGPU: Don't create weird sized integers There's no reason to introduce a new, unnaturally sized value here. This has a chance to produce worse code with legalization. Avoids regression in a future patch.	2020-01-20 20:02:54 -05:00
Matt Arsenault	9b13b4a0e3	AMDGPU: Prepare to use scalar register indexing Define pseudos mirroring the VGPR indexing ones, and adjust the operands in the s_movrel* instructions to avoid the result def.	2020-01-20 17:19:16 -05:00
Matt Arsenault	8615eeb455	AMDGPU: Partially merge indirect register write handling `a785209bc2` switched to using pseudos instead of manually tying operands on the regular instruction. The VGPR indexing mode path should have the same problems that change attempted to avoid, so these should use the same strategy. Use a single pseudo for the VGPR indexing mode and movreld paths, and expand it based on the subtarget later. These have essentially the same constraints, reading the index from m0. Switch from using an offset to the subregister index directly, instead of computing an offset and re-adding it back. Also add missing pseudos for existing register class sizes.	2020-01-20 17:19:16 -05:00
Matt Arsenault	f6418d72f5	AMDGPU/GlobalISel: Add documentation for RegisterBankInfo Document some high level strategies that should be used for register bank selection. The constant bus restriction section hasn't actually been implemented yet.	2020-01-20 15:41:25 -05:00
Fangrui Song	8e8a75ad50	[TargetRegisterInfo] Default trackLivenessAfterRegAlloc() to true Except AMDGPU/R600RegisterInfo (a bunch of MIR tests seem to have problems), every target overrides it with true. PostMachineScheduler requires livein information. Not providing it can cause assertion failures in ScheduleDAGInstrs::addSchedBarrierDeps().	2020-01-19 14:20:37 -08:00
Michael Liao	6d0d86a64d	[DAG] Add helper for creating constant vector index with correct type. NFC.	2020-01-18 01:23:36 -05:00
Matt Arsenault	592de0009f	AMDGPU/GlobalISel: Select llvm.amdgcn.update.dpp The existing test is overly reliant on -mattr=-flat-for-global, and some missing optimizations to re-use.	2020-01-17 20:09:53 -05:00
Matt Arsenault	ec9628318d	AMDGPU/GlobalISel: Select DS append/consume	2020-01-17 20:09:53 -05:00
Stanislav Mekhanoshin	eebdd85e7d	[AMDGPU] allow multi-dword flat scratch access since GFX9 This is supported starting with GFX9. Differential Revision: https://reviews.llvm.org/D72865	2020-01-17 10:47:03 -08:00
Matt Arsenault	886f9071c6	AMDGPU: Don't assert on a16 images on targets without FeatureR128A16 Currently the lowering for i16 image coordinates asserts on gfx10. I'm somewhat confused by this though. The feature is missing from the gfx10 feature lists, but the a16 bit appears to be present in the manual for MIMG instructions.	2020-01-17 11:07:00 -05:00
Matt Arsenault	117d4f1900	AMDGPU: Add register classes to MUBUF load patterns	2020-01-16 22:00:44 -05:00
Matt Arsenault	91e758b732	AMDGPU: Move permlane discard vdst_in optimization This case can be handled as a regular selection pattern, so move it out of the weird post-isel folding code which doesn't have an exactly equivalent place in GlobalISel. I think it doesn't make much sense to do this optimization here though, and it would be more useful in instcombine. There's not really any new information that will be gained during lowering since these inputs were known from the beginning.	2020-01-16 17:27:53 -05:00
Matt Arsenault	f5d98543b8	AMDGPU: Remove outdated comment	2020-01-16 14:54:27 -05:00
Matt Arsenault	e12b840abf	AMDGPU/GlobalISel: Improve lowering of G_SEXT_INREG Clamping the scalar is much better than lowering with superwide shifts for types > s64.	2020-01-16 14:29:37 -05:00
Matt Arsenault	4ca1ad85b7	AMDGPU/GlobalISel: Don't handle legacy buffer intrinsic	2020-01-16 11:31:12 -05:00
Matt Arsenault	9b2f3532c7	AMDGPU/GlobalISel: Select DS GWS intrinsics	2020-01-16 11:25:10 -05:00
Matt Arsenault	711a17afaf	AMDGPU/GlobalISel: Select exp with patterns This does produce slightly different code. Now a unique IMPLICIT_DEF is emitted for each of the implicit_def operands, rather than reusing the same one.	2020-01-15 18:33:15 -05:00
Matt Arsenault	eef92f25cc	AMDGPU: Remove custom node for exports I'm mildly worried about potentially reordering exp/exp_done with IntrWriteMem on the intrinsic. Requires hacking out the illegal type on SI, so manually select that case during lowering.	2020-01-15 18:33:15 -05:00
Mircea Trofin	5466597fee	[NFC] Refactor InlineResult for readability Summary: InlineResult is used both in APIs assessing whether a call site is inlinable (e.g. llvm::isInlineViable) as well as in the function inlining utility (llvm::InlineFunction). It means slightly different things (can/should inlining happen, vs did it happen), and the implicit casting may introduce ambiguity (casting from 'false' in InlineFunction will default a message about high costs, which is incorrect here). The change renames the type to a more generic name, and disables implicit constructors. Reviewers: eraman, davidxl Reviewed By: davidxl Subscribers: kerbowa, arsenm, jvesely, nhaehnle, eraman, hiraditya, haicheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72744	2020-01-15 13:34:20 -08:00
Matt Arsenault	936483fb7d	GlobalISel: Implement lower for G_BITCAST Bitcast only really applies between scalars and vectors. Implement as an unmerge and remerge. The test needs to tolerate failure since one of the unmerges currently fails to legalize.	2020-01-15 08:58:58 -05:00
Matt Arsenault	bd7658a212	AMDGPU: Partially directly select llvm.amdgcn.interp.p1.f16 The 16 bank LDS case is complicated due to using multiple instructions. If I attempt to write a pattern for it, the generated selector incorrectly places the copy to m0 after the first instruction, so that needs to be separately addressed. Also fix not gluing the copy to m0 to the second operation in the second half of the 16 bank lowering.	2020-01-15 08:58:58 -05:00
cdevadas	0dc6c249bf	[AMDGPU] Invert the handling of skip insertion. The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a sideeffect and the skip branch is really necessary. This new pass will replace the handling of skip insertion in the existing SIInsertSkip Pass. Differential revision: https://reviews.llvm.org/D68092	2020-01-15 15:18:16 +05:30
Tom Stellard	0dbcb36394	CMake: Make most target symbols hidden by default Summary: For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF this change makes all symbols in the target specific libraries hidden by default. A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these libraries public, which is mainly needed for the definitions of the LLVMInitialize* functions. This patch reduces the number of public symbols in libLLVM.so by about 25%. This should improve load times for the dynamic library and also make abi checker tools, like abidiff require less memory when analyzing libLLVM.so One side-effect of this change is that for builds with LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that access symbols that are no longer public will need to be statically linked. Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1): nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l 36221 nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l 26278 Reviewers: chandlerc, beanz, mgorny, rnk, hans Reviewed By: rnk, hans Subscribers: merge_guards_bot, luismarques, smeenai, ldionne, lenary, s.egerton, pzheng, sameer.abuasal, MaskRay, wuzish, echristo, Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54439	2020-01-14 19:46:52 -08:00
Michael Liao	01a4b83154	[codegen,amdgpu] Enhance MIR DIE and re-arrange it for AMDGPU. Summary: - `dead-mi-elimination` assumes MIR in the SSA form and cannot be arranged after phi elimination or DeSSA. It's enhanced to handle the dead register definition by skipping use check on it. Once a register def is `dead`, all its uses, if any, should be `undef`. - Re-arrange the DIE in RA phase for AMDGPU by placing it directly after `detect-dead-lanes`. - Many relevant tests are refined due to different register assignment. Reviewers: rampitec, qcolombet, sunfish Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72709	2020-01-14 19:26:15 -05:00
Stanislav Mekhanoshin	ad741853c3	[AMDGPU] Model distance to instruction in bundle This change allows to model the height of the instruction within a bundle for latency adjustment purposes. Differential Revision: https://reviews.llvm.org/D72669	2020-01-14 01:18:59 -08:00
Stanislav Mekhanoshin	eca4474587	[AMDGPU] Fix getInstrLatency() always returning 1 We do not have InstrItinerary so generic getInstLatency() was always defaulting to return 1 cycle. We need to use TargetSchedModel instead to compute an instruction's latency. Differential Revision: https://reviews.llvm.org/D72655	2020-01-14 01:08:30 -08:00
Matt Arsenault	203801425d	AMDGPU/GlobalISel: Select llvm.amdgcn.ds.ordered.{add|swap}	2020-01-13 13:09:38 -05:00
Matt Arsenault	3d8f1b2d22	AMDGPU/GlobalISel: Set insert point after waterfall loop The current users of the waterfall loop utility functions do not make use of the restored original insert point. The insertion is either done, or they set the insert point somewhere else. A future change will want to insert instructions after the waterfall loop, but figuring out the point after the loop is more difficult than ensuring the insert point is there after the loop.	2020-01-13 12:51:05 -05:00
Matt Arsenault	ca19d7a399	AMDGPU/GlobalISel: Fix branch targets when emitting SI_IF The branch target needs to be changed depending on whether there is an unconditional branch or not. Loops also need to be similarly fixed, but compiling a simple testcase end to end requires another set of patches that aren't upstream yet.	2020-01-13 12:51:05 -05:00
Matt Arsenault	7d9b0a61c3	AMDGPU/GlobalISel: Simplify assert	2020-01-13 12:51:05 -05:00
Matt Arsenault	555e7ee04c	AMDGPU/GlobalISel: Don't use XEXEC class for SGPRs We don't use the xexec register classes for arbitrary values anymore. Avoids a test variance between GlobalISel and SelectionDAG.	2020-01-12 22:44:51 -05:00
Matt Arsenault	a10527cd37	AMDGPU/GlobalISel: Copy type when inserting readfirstlane getDefIgnoringCopies will fail to find any def if no type is set if we try to use it on the use's operand, so propagate the type.	2020-01-12 22:44:51 -05:00
Fangrui Song	6fdd6a7b3f	[Disassembler] Delete the VStream parameter of MCDisassembler::getInstruction() The argument is llvm::null() everywhere except llvm::errs() in llvm-objdump in -DLLVM_ENABLE_ASSERTIONS=On builds. It is used by no target but X86 in -DLLVM_ENABLE_ASSERTIONS=On builds. If we ever have the needs to add verbose log to disassemblers, we can record log with a member function, instead of passing it around as an argument.	2020-01-11 13:34:52 -08:00
Michael Bedy	4a32cd11ac	[AMDGPU] Remove unnecessary v_mov from a register to itself in WQM lowering. Summary: - SI Whole Quad Mode phase is replacing WQM pseudo instructions with v_mov instructions. While this is necessary for the special handling of moving results out of WWM live ranges, it is not necessary for WQM live ranges. The result is a v_mov from a register to itself after every WQM operation. This change uses a COPY pseudo in these cases, which allows the register allocator to coalesce the moves away. Reviewers: tpr, dstuttard, foad, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71386	2020-01-10 23:01:19 -05:00
Stanislav Mekhanoshin	987bf8b6c1	Let targets adjust operand latency of bundles This reverts the AMDGPU DAG mutation implemented in D72487 and gives a more general way of adjusting BUNDLE operand latency. It also replaces FixBundleLatencyMutation with adjustSchedDependency callback in the AMDGPU, fixing not only successor latencies but predecessors' as well. Differential Revision: https://reviews.llvm.org/D72535	2020-01-10 14:56:53 -08:00
Matt Arsenault	bac995d978	AMDGPU/GlobalISel: Clamp G_ZEXT source sizes Also clamps G_SEXT/G_ANYEXT, but the implementation is more limited so fewer cases actually work.	2020-01-10 09:42:49 -05:00
Matt Arsenault	35c3d101ae	AMDGPU/GlobalISel: Select G_EXTRACT_VECTOR_ELT Doesn't try to do the fold into the base register of an add of a constant in the index like the DAG path does.	2020-01-09 19:52:24 -05:00
Matt Arsenault	5cabb8357a	AMDGPU/GlobalISel: Fix G_EXTRACT_VECTOR_ELT mapping for s-v case If an SGPR vector is indexed with a VGPR, the actual indexing will be done on the SGPR and produce an SGPR. A copy needs to be inserted inside the waterfall loop to the VGPR result.	2020-01-09 19:46:54 -05:00
Stanislav Mekhanoshin	cd69e4c74c	[AMDGPU] Fix bundle scheduling Bundles coming to scheduler considered free, i.e. zero latency. Fixed. Differential Revision: https://reviews.llvm.org/D72487	2020-01-09 15:56:36 -08:00
Matt Arsenault	b4a647449f	TableGen/GlobalISel: Add way for SDNodeXForm to work on timm The current implementation assumes there is an instruction associated with the transform, but this is not the case for timm/TargetConstant/immarg values. These transforms should directly operate on a specific MachineOperand in the source instruction. TableGen would assert if you attempted to define an equivalent GISDNodeXFormEquiv using timm when it failed to find the instruction matcher. Specially recognize SDNodeXForms on timm, and pass the operand index to the render function. Ideally this would be a separate render function type that looks like void renderFoo(MachineInstrBuilder, const MachineOperand&), but this proved to be somewhat mechanically painful. Add an optional operand index which will only be passed if the transform should only look at the one source operand. Theoretically it would also be possible to only ever pass the MachineOperand, and the existing renderers would check the parent. I think that would be somewhat ugly for the standard usage which may want to inspect other operands, and I also think MachineOperand should eventually not carry a pointer to the parent instruction. Use it in one sample pattern. This isn't a great example, since the transform exists to satisfy DAG type constraints. This could also be avoided by just changing the MachineInstr's arbitrary choice of operand type from i16 to i32. Other patterns have nontrivial uses, but this serves as the simplest example. One flaw this still has is if you try to use an SDNodeXForm defined for imm, but the source pattern uses timm, you still see the "Failed to lookup instruction" assert. However, there is now a way to avoid it.	2020-01-09 17:37:52 -05:00
Matt Arsenault	0ea3c7291f	GlobalISel: Handle llvm.read_register Compared to the attempt in `bdcc6d3d26`, this uses intermediate generic instructions.	2020-01-09 17:37:52 -05:00
Matt Arsenault	255cc5a760	CodeGen: Use LLT instead of EVT in getRegisterByName Only PPC seems to be using it, and only checks some simple cases and doesn't distinguish between FP. Just switch to using LLT to simplify use from GlobalISel.	2020-01-09 17:37:52 -05:00
Matt Arsenault	767aa507a4	AMDGPU/GlobalISel: Fix argument lowering for vectors of pointers When these arguments are broken down by the EVT based callbacks, the pointer information is lost. Hack around this by coercing the register types to be the expected pointer element type when building the remerge operations.	2020-01-09 16:29:44 -05:00
Matt Arsenault	35ad66fae8	AMDGPU/GlobalISel: Widen 16-bit shift amount sources This should be legal, but will require future selection work. 16-bit shift amounts were already removed from being legal, but this didn't adjust the transformation rules.	2020-01-09 16:29:44 -05:00
Matt Arsenault	9ffd0ed838	AMDGPU/GlobalISel: Fix import of integer med3 This isn't too useful now, since nothing is currently trying to form min/max from cmp+select.	2020-01-09 10:29:32 -05:00
Matt Arsenault	c66b2e1c87	AMDGPU: Eliminate more legacy codepred address space PatFrags These should now be limited to R600 code.	2020-01-09 10:29:32 -05:00
Matt Arsenault	3766f4bacc	AMDGPU: Use new PatFrag system for d16 stores	2020-01-09 10:29:32 -05:00
Matt Arsenault	c1d4963b44	AMDGPU: Use new PatFrag system for d16 load nodes	2020-01-09 10:29:32 -05:00
Matt Arsenault	7d67742160	AMDGPU/GlobalISel: Fix import of zext of s16 op patterns	2020-01-09 10:29:32 -05:00
Matt Arsenault	e71af77568	AMDGPU/GlobalISel: Add IMMPopCount xform Partially fixes BFE pattern import.	2020-01-09 10:29:32 -05:00
Matt Arsenault	79450a4ea2	AMDGPU/GlobalISel: Add selectVOP3Mods_nnan This doesn't enable any new imports yet, but moves the fmed patterns from failing on this to hitting the "complex suboperand referenced more than once" limitation in tablegen.	2020-01-09 10:29:32 -05:00
Matt Arsenault	d964086c62	AMDGPU/GlobalISel: Add equiv xform for bitcast_fpimm_to_i32 Only partially fixes one pattern import.	2020-01-09 10:29:31 -05:00
Matt Arsenault	3952748ffd	AMDGPU/GlobalISel: Fix add of neg inline constant pattern	2020-01-09 10:29:31 -05:00
Matt Arsenault	db7c920779	AMDGPU: Add register class to DS_SWIZZLE_B32 pattern Reduces diff for a future patch.	2020-01-09 10:29:31 -05:00
Ehud Katz	24b326cc61	[APFloat] Fix checked error assert failures `APFloat::convertFromString` returns `Expected` result, which must be "checked" if the LLVM_ENABLE_ABI_BREAKING_CHECKS preprocessor flag is set. To mark an `Expected` result as "checked" we must consume the `Error` within. In many cases, we are only interested in knowing if an error occurred, without the need to examine the error info. This is achieved, easily, with the `errorToBool()` API.	2020-01-09 09:42:32 +02:00
Michael Liao	07a569a053	[amdgpu] Remove unused header. NFC.	2020-01-08 11:32:09 -05:00
Matt Arsenault	22700f68e1	AMDGPU: Annotate EXTRACT_SUBREGs with source register classes This partially fixes GlobalISel import of the patterns, but removes a lot of entries from the end of the skipped pattern log.	2020-01-07 21:56:16 -05:00
Matt Arsenault	6652cc0cf7	AMDGPU/GlobalISel: Fix scalar G_SELECT for arbitrary pointers `4e85ca9562` missed updating the legal condition type set for pointers with any unrecognized address space.	2020-01-07 16:36:31 -05:00
Matt Arsenault	4844bf0fe2	AMDGPU: Apply i16 add->sub pattern with zext to i32 This was only applying the deeper nested zext pattern, and missing the special case code size fold.	2020-01-07 16:36:31 -05:00
Matt Arsenault	c3a10faadc	AMDGPU: Remove VOP3Mods0Clamp0OMod Now that overridable default operands work, there's no reason to use complex patterns to just produce 0s.	2020-01-07 15:10:08 -05:00
Matt Arsenault	de46ab698b	AMDGPU: Fix misleading, misplaced end block comments	2020-01-07 15:10:08 -05:00
Matt Arsenault	bd8d696c14	AMDGPU: Use ImmLeaf	2020-01-07 15:10:07 -05:00
Matt Arsenault	68e70fb098	AMDGPU: Fix not using v_cvt_f16_[iu]16 We weren't treating i16->f16 casts as legal on targets with these instructions, and always using a pair of casts through i32.	2020-01-07 15:10:07 -05:00
Matt Arsenault	78b30a54c9	AMDGPU/GlobalISel: Fix readfirstlane pattern import The imm folding optimization pattern failed to import. The instruction pattern was already working, but failing to fail on SGPR inputs.	2020-01-07 11:07:08 -05:00
Matt Arsenault	e699c03c9b	AMDGPU/GlobalISel: Fix import of s_abs_i32 pattern	2020-01-07 10:32:07 -05:00
Matt Arsenault	9150d6bd73	AMDGPU/GlobalISel: Select llvm.amdgcn.wqm.vote	2020-01-07 10:15:29 -05:00
Matt Arsenault	a428386d4a	AMDGPU/GlobalISel: Partially fix llvm.amdgcn.kill pattern import Tests deferred since the existing DAG test depends on some other operations, but isn't far from working as-is.	2020-01-07 10:09:59 -05:00
Simon Pilgrim	6ff1ea3244	Fix "use of uninitialized variable" static analyzer warning. NFCI.	2020-01-07 12:06:54 +00:00
Fangrui Song	3d87d0b925	[MC] Add parameter `Address` to MCInstrPrinter::printInstruction Follow-up of D72172. Reviewed By: jhenderson, rnk Differential Revision: https://reviews.llvm.org/D72180	2020-01-06 20:44:14 -08:00
Fangrui Song	aa708763d3	[MC] Add parameter `Address` to MCInstPrinter::printInst printInst prints a branch/call instruction as `b offset` (there are many variants on various targets) instead of `b address`. It is a convention to use address instead of offset in most external symbolizers/disassemblers. This difference makes `llvm-objdump -d` output unsatisfactory. Add `uint64_t Address` to printInst(), so that it can pass the argument to printInstruction(). `raw_ostream &OS` is moved to the last to be consistent with other print* methods. The next step is to pass `Address` to printInstruction() (generated by tablegen from the instruction set description). We can gradually migrate targets to print addresses instead of offsets. In any case, downstream projects which don't know `Address` can pass 0 as the argument. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D72172	2020-01-06 20:42:22 -08:00
Matt Arsenault	dc7b84c66c	AMDGPU/GlobalISel: Fix unused variable warning in release	2020-01-06 22:31:33 -05:00
Matt Arsenault	452f6243c9	AMDGPU: Select llvm.amdgcn.interp.p2.f16 directly This will enable automatic GlobalISel support in a future commit.	2020-01-06 20:34:21 -05:00
Matt Arsenault	e93b1ffc84	AMDGPU: Use default operands for clamp/omod We have a lot of complex pattern variants that just set the source modifiers that are really handled, and then set the output modifiers to 0. We're unlikely to ever match output modifiers from the use instruction side, and we already match clamp/omod in a separate pass.	2020-01-06 20:22:13 -05:00
Matt Arsenault	52afc93c38	AMDGPU/GlobalISel: Legalize G_READCYCLECOUNTER	2020-01-06 19:16:32 -05:00
Matt Arsenault	d4c9e13324	AMDGPU/GlobalISel: Select G_UADDE/G_USUBE	2020-01-06 18:27:52 -05:00
Matt Arsenault	4e85ca9562	AMDGPU/GlobalISel: Replace handling of boolean values This solves selection failures with generated selection patterns, which would fail due to inferring the SGPR reg bank for virtual registers with a set register class instead of VCC bank. Use-instruction selection would constrain the virtual register to a specific class, so when the def was selected later the bank no longer was set to VCC. Remove the SCC reg bank. SCC isn't directly addressable, so it requires copying from SCC to an allocatable 32-bit register during selection, so these might as well be treated as 32-bit SGPR values. Now any scalar boolean value that will produce an output in SCC should be widened during RegBankSelect to s32. Any s1 value should be a vector boolean during selection. This makes the vcc register bank unambiguous with a normal SGPR during selection. Summary of how this should now work: - G_TRUNC is always a no-op, and never should use a vcc bank result. - SALU boolean operations should be promoted to s32 in RegBankSelect apply mapping - An s1 value means vcc bank at selection. The exception is for legalization artifacts that use s1, which are never VCC. All other contexts should infer the VCC register classes for s1 typed registers. The LLT for the register is now needed to infer the correct register class. Extensions with vcc sources should be legalized to a select of constants during RegBankSelect. - Copy from non-vcc to vcc ensures high bits of the input value are cleared during selection. - SALU boolean inputs should ensure the inputs are 0/1. This includes select, conditional branches, and carry-ins. There are a few somewhat dirty details. One is that G_TRUNC/G_*EXT selection ignores the usual register-bank from register class functions, and can't handle truncates with VCC result banks. I think this is OK, since the artifacts are specially treated anyway. This does require some care to avoid producing cases with vcc. There will also be no 100% reliable way to verify this rule is followed in selection in case of register classes, and violations manifest themselves as invalid copy instructions much later. Standard phi handling also only considers the bank of the result register, and doesn't insert copies to make the source banks match. This doesn't work for vcc, so we have to manually correct phi inputs in this case. We should add a verifier check to make sure there are no phis with mixed vcc and non-vcc register bank inputs. There's also some duplication with the LegalizerHelper, and some code which should live in the helper. I don't see a good way to share special knowledge about what types to use for intermediate operations depending on the bank for example. Using the helper to replace extensions with selects also seems somewhat awkward to me. Another issue is there are some contexts calling getRegBankFromRegClass that apparently don't have the LLT type for the register, but I haven't yet run into a real issue from this. This also introduces new unnecessary instructions in most cases, since we don't yet try to optimize out the zext when the source is known to come from a compare.	2020-01-06 18:26:42 -05:00
Matt Arsenault	f3de8ab5cc	GlobalISel: Implement lower for G_INTRINSIC_ROUND Mostly copied from AMDGPU lowering implementation, except used G_SITOFP instead of directly creating a select on -1.0, 0.0.	2020-01-06 18:26:42 -05:00
Matt Arsenault	ee6b8722ff	GlobalISel: Fix unsupported legalize action This would complain about invalid legalizer rules otherwise. Mark some operations as unsupported for AMDGPU. This currently seems to produce the same legalize error as when no rules are defined, but eventually this should produce a proper user facing error.	2020-01-06 17:21:51 -05:00
Matt Arsenault	7f2db2917d	AMDGPU: Fix legalizing f16 fpow The existing test only covered one case for r600. The use of mul_legacy also looks suspicious to me, but leave it for now. The patterns are also not making use of source modifiers.	2020-01-06 17:21:51 -05:00

4705 Commits