llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	cc149172da	AMDGPU/GlobalISel: Fix selection of scalar f64 G_FABS This wasn't covered by existing tablegen patterns, but also suffers the same issues as G_FNEG. Workaround them by manually selecting, like G_FNEG.	2020-04-14 22:05:22 -04:00
Matt Arsenault	b27d255e1e	AMDGPU/GlobalISel: Form CVT_F32_UBYTE0	2020-03-30 17:45:55 -04:00
Matt Arsenault	9564f46766	AMDGPU: Make use of default operands	2020-03-28 17:33:29 -04:00
Matt Arsenault	a950e3beef	AMDGPU: Move towards deprecating alignbit intrinsic This is equivalent to llvm.fshr, so legalize the intrinsic to the generic node.	2020-03-20 11:03:04 -04:00
Matt Arsenault	4ea1baf6a0	AMDGPU: Initial, crude support for indirect calls This isn't really usable, and requires using the -amdgpu-fixed-function-abi flag to work. Assumes a uniform call target, and will hit a verifier error if the call target ends up in a VGPR. Also doesn't attempt to do anything sensible for the reported register/stack usage.	2020-03-18 12:03:48 -04:00
Matt Arsenault	c9b454a1b7	AMDGPU/GlobalISel: Fix verifier errors on image atomics	2020-03-17 20:06:25 -04:00
Matt Arsenault	83ffbf2618	AMDGPU/GlobalISel: Legalize non-a16 non-NSA images	2020-03-17 10:02:09 -04:00
Simon Pilgrim	e91feeed21	[AMDGPU] Add ISD::FSHR -> ALIGNBIT support This patch allows ISD::FSHR(i32) patterns to lower to ALIGNBIT instructions. This improves test coverage of ISD::FSHR matching - x86 has both FSHL/FSHR instructions and we prefer FSHL by default. Differential Revision: https://reviews.llvm.org/D76070	2020-03-12 20:16:57 +00:00
Matt Arsenault	200b20639a	AMDGPU: Use V_MAC_F32 for fmad.ftz This avoids regressions in a future patch. I'm confused by the use of the gfx9 usage legacy_mad. Was this a pointless instruction rename, or uses fmul_legacy handling? Why is regular mac avilable in that case?	2020-03-10 14:41:06 -07:00
Jay Foad	c7b2e7f527	[AMDGPU] Fix scheduling info for terminator SALU instructions Summary: Instruction variants like S_MOV_B32_term should have the same SchedRW class as the base instruction, S_MOV_B32. This probably doesn't make any difference in practice because as terminators, they'll always be scheduled at the end of a basic block, but it's simply more correct than giving them all the default SchedRW class of Write32Bit, which implies a VALU operation. Reviewers: rampitec, arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75860	2020-03-09 21:39:52 +00:00
Matt Arsenault	083717cf49	AMDGPU: Fix v2i64<->v4f32 bitcast I'm not sure how to test the v2i64->v4f32 case since I can't think of any v2i64 cases that won't legalize to v4i32.	2020-02-20 09:49:09 -05:00
Matt Arsenault	cbc3b3046f	AMDGPU/GlobalISel: Remove outdated comment	2020-02-19 17:32:25 -05:00
Matt Arsenault	9ec668606b	AMDGPU: Add option to disable CGP division expansion The division expansions in AMDGPUCodeGenPrepare can't be relied on for correctness, since they punt to later optimization and possibly legalization in some cases. We still need a way to be able to write tests for the legalizer versions of the expansion. This is mostly for GlobalISel, since the expected optimzations is expecting aren't implemented. The interaction with the flag to expand 64-bit division in the IR is pretty confusing, but these flags have different purposes.	2020-02-14 11:37:07 -08:00
Matt Arsenault	8c2c0b3637	AMDGPU: Improve i16/v2i16 bswap	2020-02-14 09:53:22 -08:00
Matt Arsenault	a257bde420	AMDGPU/GlobalISel: Handle G_BSWAP	2020-02-14 09:09:44 -08:00
Matt Arsenault	bfe3779459	AMDGPU: Use v_perm_b32 to implement bswap Also greatly improve i64 lowering. LegalizeIntegerTypes does the correct narrowing if i64 isn't legal. Just workaround this for SelectionDAG by making i64 legal and splitting in the patterns.	2020-02-13 09:45:31 -08:00
Jay Foad	9df0c264d4	[AMDGPU] Fix implicit operands for ENTER_WWM pseudo Summary: SIInstrInfo::expandPostRAPseudo converts ENTER_WWM in-place into an S_OR_SAVEEXEC instruction that needs certain implicit operands. Without this patch I get errors like this that make it harder to use -stop-after to bisect the pass pipeline: $ llc -march=amdgcn test/CodeGen/AMDGPU/wqm.ll -stop-after=postrapseudos -o - \| sed -E 's/ (from\|into) custom "TargetCustom[0-9]+"//' \| llc -march=amdgcn -x=mir error: <stdin>:1295:70: missing implicit register operand 'implicit-def $scc' renamable $sgpr2_sgpr3 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec ^ Note that this error is currently only generated by MIParser but it comes with a FIXME comment: // FIXME: Move the implicit operand verification to the machine verifier. Reviewers: critson, arsenm, rampitec, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74428	2020-02-11 20:11:41 +00:00
Matt Arsenault	00115d767f	AMDGPU: Remove dead kill handling At one point a custom node was used for kill handling, but now the intrinsic is directly selected. Remove leftover pattern machinery.	2020-02-09 17:59:24 -05:00
Matt Arsenault	364326ce66	AMDGPU/GlobalISel: Add mem operand to s.buffer.load intrinsic Really the intrinsic definition is wrong, but work around this here. The DAG lowering introduces an MMO. We have to introduce a new operation to avoid the verifier complaining about the missing mayLoad.	2020-02-05 15:04:42 -05:00
Matt Arsenault	5aa6e246a1	AMDGPU/GlobalISel: Legalize f64 G_FFLOOR for SI Use cmp ord instead of cmp_class compared to the DAG version for the nan check, but mostly try to match the existsing pattern. I think the sign doesn't matter for fract, so we could do a little better with the source modifier matching. I think this is also still broken as in D22898, but I'm leaving it as-is for now while I don't have an SI system to test on.	2020-02-05 14:32:01 -05:00
Matt Arsenault	6fb544d1d2	AMDGPU/GlobalISel: Combine FMIN_LEGACY/FMAX_LEGACY Try out using combine definition rules. This really should be a post-legalizer combine, but the combiner pass is currently pre-legalize. Most of the target combines are really post-legalize, so we should probably move the pass.	2020-01-31 06:58:04 -08:00
Matt Arsenault	d2a9739274	AMDGPU/GlobalISel: Eliminate SelectVOP3Mods_f32 Trivial type predicates should be moved into the tablegen pattern itself, and not checked inside complex patterns. This eliminates a redundant complex pattern, and fixes select source modifiers for GlobalISel. I have further patches which fully handle select in tablegen and remove all of the C++ selection, although it requires the ugliness to support the entire range of legal register types.	2020-01-27 17:53:54 -05:00
Matt Arsenault	c3075e6171	AMDGPU/GlobalISel: Select buffer atomics The cmpswap handling is incomplete and fails to select.	2020-01-27 15:16:44 -05:00
Matt Arsenault	533d650e94	AMDGPU/GlobalISel: Move llvm.amdgcn.raw.buffer.store handling Treat this the same way as loads. There's less value to the intermediate nodes, but it's good to be consistent.	2020-01-27 14:59:30 -05:00
Matt Arsenault	09ed0e44d9	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.tbuffer.load	2020-01-27 13:40:37 -05:00
Matt Arsenault	198624c39d	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.load.format	2020-01-27 13:02:19 -05:00
Matt Arsenault	fc90222a91	AMDGPU/GlobalISel: Select llvm.amdgcn.raw.buffer.load Use intermediate instructions, unlike with buffer stores. This is necessary because of the need to have an internal way to distinguish between signed and unsigned extloads. This introduces some duplication and near duplication with the buffer store selection path. The store handling should maybe be moved into legalization to match and eliminate the duplication.	2020-01-27 12:49:23 -05:00
Matt Arsenault	c05f23e409	AMDGPU/GlobalISel: Select llvm.amdgcn.mov.dpp This is deprecated, but easy to support.	2020-01-22 11:43:53 -05:00
Matt Arsenault	a722cbf77c	AMDGPU/GlobalISel: Handle atomic_inc/atomic_dec The intermediate instruction drops the extra volatile argument. We are missing an atomic ordering on these.	2020-01-22 09:26:17 -05:00
Matt Arsenault	64e9528201	AMDGPU: Fix missing immarg on llvm.amdgcn.interp.mov The first operand maps to an immediate field, so this should be immarg.	2020-01-22 09:01:34 -05:00
Matt Arsenault	9b13b4a0e3	AMDGPU: Prepare to use scalar register indexing Define pseudos mirroring the the VGPR indexing ones, and adjust the operands in the s_movrel* instructions to avoid the result def.	2020-01-20 17:19:16 -05:00
Matt Arsenault	592de0009f	AMDGPU/GlobalISel: Select llvm.amdgcn.update.dpp The existing test is overly reliant on -mattr=-flat-for-global, and some missing optimizations to re-use.	2020-01-17 20:09:53 -05:00
Matt Arsenault	eef92f25cc	AMDGPU: Remove custom node for exports I'm mildly worried about potentially reordering exp/exp_done with IntrWriteMem on the intrinsic. Requires hacking out the illegal type on SI, so manually select that case during lowering.	2020-01-15 18:33:15 -05:00
Matt Arsenault	9ffd0ed838	AMDGPU/GlobalISel: Fix import of integer med3 This isn't too useful now, since nothing is currently trying to form min/max from cmp+select.	2020-01-09 10:29:32 -05:00
Matt Arsenault	3952748ffd	AMDGPU/GlobalISel: Fix add of neg inline constant pattern	2020-01-09 10:29:31 -05:00
Matt Arsenault	78b30a54c9	AMDGPU/GlobalISel: Fix readfirstlane pattern import The imm folding optimization pattern failed to import. The instruction pattern was already working, but failing to fail on SGPR inputs.	2020-01-07 11:07:08 -05:00
Matt Arsenault	a428386d4a	AMDGPU/GlobalISel: Partially fix llvm.amdgcn.kill pattern import Tests deferred since the existing DAG test depends on some other operations, but isn't far from working as-is.	2020-01-07 10:09:59 -05:00
Matt Arsenault	7f2db2917d	AMDGPU: Fix legalizing f16 fpow The existing test only covered one case for r600. The use of mul_legacy also looks suspicious to me, but leave it for now. The patterns are also not making use of source modifiers.	2020-01-06 17:21:51 -05:00
Matt Arsenault	14d25052a2	AMDGPU: Use ImmLeaf for inline immediate predicates	2020-01-06 17:21:51 -05:00
Matt Arsenault	e4464bf3d4	AMDGPU/GlobalISel: Select scalar v2s16 G_BUILD_VECTOR	2020-01-06 11:19:33 -05:00
Matt Arsenault	0d9f919b73	DAG: Use TargetConstant for FENCE operands	2020-01-02 17:16:10 -05:00
Matt Arsenault	1247865fe0	AMDGPU/GlobalISel: Select llvm.amdgcn.fmad.ftz	2019-12-30 11:12:35 -05:00
Matt Arsenault	171cf5302f	AMDGPU/GlobalISel: Handle flat/global G_ATOMIC_CMPXCHG Custom lower this to a target instruction with the merge operands. I think it might be better to directly select this and emit a REG_SEQUENCE, but this would be more work since it would require splitting the tablegen patterns for these cases from the other atomics.	2019-10-25 13:11:09 -07:00
Matt Arsenault	ef9a0278f0	AMDGPU: Select basic interp directly from intrinsics llvm-svn: 375457	2019-10-21 21:49:44 +00:00
Stanislav Mekhanoshin	1184c27fa5	[AMDGPU] Support mov dpp with 64 bit operands We define mov/update dpp intrinsics as overloaded but do not support i64, which is a practically useful type. Fix the selection and lowering. Differential Revision: https://reviews.llvm.org/D68673 llvm-svn: 374910	2019-10-15 16:41:15 +00:00
Matt Arsenault	27269054d2	GlobalISel: Add target pre-isel instructions Allows targets to introduce regbankselectable pseudo-instructions. Currently the closet feature to this is an intrinsic. However this requires creating a public intrinsic declaration. This litters the public intrinsic namespace with operations we don't necessarily want to expose to IR producers, and would rather leave as private to the backend. Use a new instruction bit. A previous attempt tried to keep using enum value ranges, but it turned into a mess. llvm-svn: 373937	2019-10-07 18:43:29 +00:00
Matt Arsenault	fdea5e02ce	AMDGPU/GlobalISel: Select s1 src G_SITOFP/G_UITOFP llvm-svn: 373298	2019-10-01 02:23:20 +00:00
Matt Arsenault	59b91aa93e	AMDGPU/GlobalISel: Add support for init.exec intrinsics TThe existing wave32 behavior seems broken and incomplete, but this reproduces it. llvm-svn: 373296	2019-10-01 02:07:25 +00:00
Matt Arsenault	3ecab8e455	Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297) This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338	2019-09-19 16:26:14 +00:00
Hans Wennborg	13bdae8541	Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This broke the Chromium build, causing it to fail with e.g. fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15> See llvm-commits thread of r372285 for details. This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit. > Encode them directly as an imm argument to G_INTRINSIC. > > Since now intrinsics can now define what parameters are required to be > immediates, avoid using registers for them. Intrinsics could > potentially want a constant that isn't a legal register type. Also, > since G_CONSTANT is subject to CSE and legalization, transforms could > potentially obscure the value (and create extra work for the > selector). The register bank of a G_CONSTANT is also meaningful, so > this could throw off future folding and legalization logic for AMDGPU. > > This will be much more convenient to work with than needing to call > getConstantVRegVal and checking if it may have failed for every > constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth > immarg operands, many of which need inspection during lowering. Having > to find the value in a register is going to add a lot of boilerplate > and waste compile time. > > SelectionDAG has always provided TargetConstant for constants which > should not be legalized or materialized in a register. The distinction > between Constant and TargetConstant was somewhat fuzzy, and there was > no automatic way to force usage of TargetConstant for certain > intrinsic parameters. They were both ultimately ConstantSDNode, and it > was inconsistently used. It was quite easy to mis-select an > instruction requiring an immediate. For SelectionDAG, start emitting > TargetConstant for these arguments, and using timm to match them. > > Most of the work here is to cleanup target handling of constants. Some > targets process intrinsics through intermediate custom nodes, which > need to preserve TargetConstant usage to match the intrinsic > expectation. Pattern inputs now need to distinguish whether a constant > is merely compatible with an operand or whether it is mandatory. > > The GlobalISelEmitter needs to treat timm as a special case of a leaf > node, simlar to MachineBasicBlock operands. This should also enable > handling of patterns for some G_ instructions with immediates, like > G_FENCE or G_EXTRACT. > > This does include a workaround for a crash in GlobalISelEmitter when > ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372314	2019-09-19 12:33:07 +00:00

1 2 3 4 5 ...

399 Commits