llvm-project

Commit Graph

Author	SHA1	Message	Date
Jay Foad	a8816ebee0	[AMDGPU] Fix and simplify AMDGPULegalizerInfo::legalizeUDIV_UREM32Impl Use the algorithm from AMDGPUCodeGenPrepare::expandDivRem32. Differential Revision: https://reviews.llvm.org/D83383	2020-07-08 19:14:49 +01:00
Matt Arsenault	f25d020c2e	AMDGPU/GlobalISel: Add types to special inputs When passing special ABI inputs, we have no existing context for the type to use.	2020-07-06 17:00:55 -04:00
Matt Arsenault	443556c18f	AMDGPU/GlobalISel: Fix some legalization of < dword vector stores This avoids many instances of failing to legalize a vector truncstore of <4 x s8> to 2 bytes. We don't perfectly handle every truncstore yet, largely because the given set of legalization actions can't actually differentiate between changing the result type and changing the memory type.	2020-06-26 18:07:39 -04:00
Matt Arsenault	a448670752	AMDGPU/GlobalISel: Legalize 64-bit G_SDIV/G_SREM Now all the divisions should be complete, although we should fix emitting the entire common part for div/rem when you use both.	2020-06-24 11:39:45 -04:00
Matt Arsenault	b5c4e6c148	AMDGPU/GlobalISel: Invert parameter for div/rem lowering function	2020-06-24 11:39:45 -04:00
Eli Friedman	a2caa3b614	Remove GlobalValue::getAlignment(). This function is deceptive at best: it doesn't return what you'd expect. If you have an arbitrary GlobalValue and you want to determine the alignment of that pointer, Value::getPointerAlignment() returns the correct value. If you want the actual declared alignment of a function or variable, GlobalObject::getAlignment() returns that. This patch switches all the users of GlobalValue::getAlignment to an appropriate alternative. Differential Revision: https://reviews.llvm.org/D80368	2020-06-23 19:13:42 -07:00
Michael Liao	20a1700293	[amdgpu] Fix REL32 relocations with negative offsets. Summary: - The offset should be treated as a signed one. Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82234	2020-06-21 23:09:03 -04:00
Eric Christopher	cf23852587	[Target] As part of using inclusive language within the llvm project, migrate away from the use of blacklist and whitelist. This change affects an internal llvm command line option.	2020-06-20 00:06:39 -07:00
Carl Ritson	8f3b2c8aa3	AMDGPU/GlobalISel: Remove selection of MAD/MAC when not available Add code to respect mad-mac-f32-insts target feature. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D81990	2020-06-19 10:30:19 +09:00
Matt Arsenault	7f8b2e1b91	GlobalISel: Pass LegalizerHelper to custom legalize callbacks This was passing in all the parameters needed to construct a LegalizerHelper in the custom legalization, when it's simpler to just pass in the existing helper. This is slightly more annoying to use in the common case where you don't need the legalizer helper, but we could add back the common parameters back in addition to the helper. I didn't propagate this to all the internal target changes that this logically implies, but did update a sample one for legalizeMinNumMaxNum. This is in preparation for moving AMDGPU load/store legalization entirely into custom lowering. The current set of legalization actions is really constraining and not really capable of expressing all the actions needed to legalize loads/stores. In particular there's no way to express when the memory access itself needs to change size vs. the result type. There's also a lot of redundancy since the same split/widen actions need to be applied in both vector and scalar cases. All of the sub-cases logically belong as steps in the legalizer helper, but it will be easier to consider everything at once in custom lowering.	2020-06-18 17:17:38 -04:00
Matt Arsenault	3b34f3fcca	AMDGPU/GlobalISel: Fix obvious bug in ported 32-bit udiv/urem This was hidden by the IR expansion in AMDGPUCodeGenPrepare, which I forgot to turn off.	2020-06-16 22:46:35 -04:00
Matt Arsenault	e07cf92377	AMDGPU/GlobalISel: Don't hardcode maximum register size This is a somewhat artifical limit, so avoid repeating it many places in case it changes.	2020-06-15 15:01:19 -04:00
Matt Arsenault	1a7f115dce	AMDGPU/GlobalISel: Extend load/store workaround to i128 vectors	2020-06-15 14:55:11 -04:00
Matt Arsenault	dae9554b2b	AMDGPU/GlobalISel: Workaround some load/store type selection patterns The logic is written for what loads/stores should be selectable. There are a set of cases that should be selectable, but due to missing MVTs and/or selection patterns, will fail to select. I think eventually load/store select patterns should ignore the type and only look at the value size, but until that happens, bitcast these to equivalent i32 vectors.	2020-06-15 07:42:20 -04:00
Sebastian Neubauer	29a6ad94fd	[AMDGPU] Add G16 support to image instructions Add G16 feature for GFX10 and support A16 and G16 in GlobalISel. Differential Revision: https://reviews.llvm.org/D76836	2020-06-12 11:26:31 +02:00
Matt Arsenault	2247072b65	AMDGPU/GlobalISel: Set insert point when emitting control flow pseudos This was implicitly assuming the branch instruction was the next after the pseudo. It's possible for another non-terminator instruction to be inserted between the intrinsic and the branch, so adjust the insertion point. Fixes a non-terminator after terminator verifier error (which without the verifier, manifested itself as an infinite loop in analyzeBranch much later on).	2020-06-11 18:53:26 -04:00
Matt Arsenault	19b3b886b7	AMDGPU/GlobalISel: Fix porting error in 32-bit division The baffling thing is this passed the OpenCL conformance test for 32-bit integer divisions, but only failed in the 32-bit path of BypassSlowDivisions for the 64-bit tests.	2020-06-10 21:48:58 -04:00
Matt Arsenault	721f8f7530	AMDGPU: Stop using getSelectCC in division lowering This was promoting booleans to i32 to perform a comparison against them to feed to a select condition. Just use the booleans directly. This produces the same final code, since the combiner is unable to undo the mess this creates. I untangled this logic when I ported this code to GlobalISel, so port the cleanups back.	2020-06-10 13:56:53 -04:00
Matt Arsenault	ea1bd95411	AMDGPU/GlobalISel: Make G_IMPLICIT_DEF legality more consistent Makes <6 x s16> legal, <4 x s8> illegal, and clamps the maximum size to 1024.	2020-06-10 11:05:59 -04:00
Matt Arsenault	32823091c3	GlobalISel: Set instr/debugloc before any legalizer action It was annoying enough that every custom lowering needed to set the insert point, but this was made worse since now these all needed to be updated to setInstrAndDebugLoc. Consolidate these so every legalization action has the right insert position by default. This should fix dropping debug info in every custom AMDGPU legalization.	2020-06-09 15:37:02 -04:00
Matt Arsenault	bc20bdb9f9	AMDGPU/GlobalISel: Start rewriting load/store legality rules The current set is an incomprehensible mess riddled with ordering hacks for various limitations in the legalizer at the time of writing, many of which have been fixed. This takes a very small step in correcting this. The core first change is to start checking for fully legal cases first, rather than trying to figure out all of the actions that could need to be performed. It's recommended to check the legal cases first for faster legality checks in the common case. This still has a table listing some common cases, but it needs measuring whether this really helps or not. More significantly, stop trying to allow any arbitrary type with a legal bitwidth as a legal memory type, and start using the bitcast legalize action for them. Allowing loads of these weird vector types produced new burdens we don't need for handling all of the legalization artifacts. Unlike the SelectionDAG handling, this is still not casting 64 or 16-bit element vectors to 32-bit vectors. These cases should still be handled by increasing/decreasing the number of 16-bit elements. This is primarily to fix 8-bit element vectors. Another change is to stop trying to handle the load-widening based on a higher alignment. We should still do this, but the way it was handled wasn't really correct. We really need to modify the MMO's size at the same time, and not just increase the result type. The LegalizerHelper does not do this, and I think this would really require a separate WidenMemory action (or to add a memory action payload to the LegalizeMutation). These will now fail to legalize. The structure of the legalizer rules makes writing concise rules here difficult. It would be easier if the same function could answer the query the query, and report the action to perform at the same time. Instead these two are split into distinct predicate and action functions. This is mostly tolerable for other cases, but the load/store rules get pretty complicated so it's difficult to keep two versions of these functions in sync.	2020-06-06 09:59:46 -04:00
Matt Arsenault	fe0d5121fa	AMDGPU/GlobalISel: Fix making LDS FP atomics legal on SI/CI	2020-06-04 16:50:19 -04:00
Matt Arsenault	a1a93ca48a	AMDGPU/GlobalISel: Handle uniform G_DYN_STACKALLOC	2020-06-03 19:56:07 -04:00
Jay Foad	b28d038ff3	[AMDGPU] Better use of llvm::numbers Tweak a few constant expressions involving numbers::pi etc to avoid rounding errors. NFCI though it's possible some of these will now be more accurate in the last bit.	2020-05-29 09:55:36 +01:00
Jay Foad	036d4b0dbf	[AMDGPU] Use numbers::pi instead of M_PI. NFC.	2020-05-29 09:55:36 +01:00
Matt Arsenault	e13c84c3be	GlobalISel: Work on improving stock set of legality predicates I get confused by a lot of the predicate names here, since I would assume they apply to vectors as well. Rename to reflect they only apply to scalars. Also add a few predicates AMDGPU uses that should be generally useful. Also add any() to complement all. I've wanted to use this a few times but then worked around it not being there.	2020-05-28 20:28:24 -04:00
Matt Arsenault	ef3e831226	GlobalISel: Basic legalization for G_PTRMASK	2020-05-26 21:20:30 -04:00
Matt Arsenault	8bc03d2168	GlobalISel: Merge G_PTR_MASK with llvm.ptrmask intrinsic Confusingly, these were unrelated and had different semantics. The G_PTR_MASK instruction predates the llvm.ptrmask intrinsic, but has a different format. G_PTR_MASK only allows clearing the low bits of a pointer, and only a constant number of bits. The ptrmask intrinsic allows an arbitrary mask. Replace G_PTR_MASK to match the intrinsic. Only selects the cases that look like the old instruction. More work is needed to select the general case. Also new legalization code is still needed to deal with the case where the incoming mask size does not match the pointer size, which has a specified behavior in the langref.	2020-05-26 11:48:13 -04:00
Matt Arsenault	66fe60220c	AMDGPU/GlobalISel: Fix masked control flow with fallthrough blocks Unlike SelectionDAGBuilder, IRTranslator omits the unconditional branch in fallthrough cases. Confusingly, the control flow pseudos function in the opposite way the intrinsics are used, and the branch targets always need to be swapped. We're inverting the target blocks, so we need to figure out the old fallthrough block and insert a branch to the original unconditional branch target.	2020-05-22 10:31:44 -04:00
Matt Arsenault	bf527a1dc4	AMDGPU/GlobalISel: Fix f64 G_FDIV lowering This was using an integer multiply instead of FP.	2020-05-18 15:14:08 -04:00
Matt Arsenault	3af85fa8f0	GlobalISel: Handle more cases in lowerUnmergeValues Handle scalar sources, as well as vectors.	2020-05-09 19:33:32 -04:00
Matt Arsenault	69999605ee	GlobalISel: Move code into lowering for G_MERGE_VALUES Currently this code exists in widenScalar for G_MERGE_VALUE sources. I'm not sure if the existing expansion in widenScalar should be removed or not. The widenScalar variant tries to extend to the requested size, but this just uses the original bitwidth.	2020-05-09 16:39:37 -04:00
Dominik Montada	55e3a7c6b2	[GlobalISel][AMDGPU] add legalization for G_FREEZE Summary: Copy the legalization rules from SelectionDAG: -widenScalar using anyext -narrowScalar using intermediate merges -scalarize/fewerElements using unmerge -moreElements using G_IMPLICIT_DEF and insert Add G_FREEZE legalization actions to AMDGPULegalizerInfo. Use the same legalization actions as G_IMPLICIT_DEF. Depends on D77795. Reviewers: dsanders, arsenm, aqjune, aditya_nandakumar, t.p.northover, lebedev.ri, paquette, aemerson Reviewed By: arsenm Subscribers: kzhuravl, yaxunl, dstuttard, tpr, t-tye, jvesely, nhaehnle, kerbowa, wdng, rovka, hiraditya, volkan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78092	2020-04-17 16:44:46 +02:00
Austin Kerbow	a69b3e010c	[AMDGPU][GlobalISel] Fix div_scale in FDIV lowering Differential Revision: https://reviews.llvm.org/D78004	2020-04-13 15:54:49 -07:00
Matt Arsenault	ac8d51a3c6	AMDGPU/GlobalISel: Legalize 16-bit shift amounts to s16 The current selector depends on 16-bit shifts using 16-bit shift amount types, but really it should accept either for all types.	2020-04-11 18:12:26 -04:00
Matt Arsenault	c5497e5399	AMDGPU/GlobalISel: Fix legalizing <3 x s16> vselects	2020-04-11 15:59:51 -04:00
Matt Arsenault	5e4e8d0388	AMDGPU/GlobalISel: Change intrinsic ID for _L to _LZ opt Still should handle the other case changes the opcode this way.	2020-04-01 13:03:02 -04:00
Simon Pilgrim	552e46ea1e	Fix unused variable warnings. NFCI.	2020-04-01 14:36:51 +01:00
Matt Arsenault	43e576593e	AMDGPU/GlobalISel: Fix insert point when lowering G_FMAD	2020-03-31 19:57:06 -04:00
Guillaume Chatelet	c9d5c19597	[Alignment][NFC] Transitionning more getMachineMemOperand call sites Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, Jim, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77121	2020-03-31 08:36:18 +00:00
Matt Arsenault	bcb643c8af	AMDGPU/GlobalISel: Handle image atomics	2020-03-30 17:41:04 -04:00
Matt Arsenault	48eda37282	AMDGPU/GlobalISel: Start selecting image intrinsics Does not handled atomics yet.	2020-03-30 17:33:04 -04:00
Matt Arsenault	2641ba52a9	AMDGPU/GlobalISel: Round up image operations with 5, 6 or 7 addresses The instruction definitions are missing for these register types, so round up to 8 like the DAG.	2020-03-30 17:02:47 -04:00
Matt Arsenault	42d5609809	AMDGPU/GlobalISel: Start handling _L to _LZ optimization We currently don't have a way to map to the equivalent intrinsic opcode, so track immediate 0s in place of the address for the selection to know to change the final opcode.	2020-03-30 17:02:30 -04:00
Matt Arsenault	4919f2e1c5	AMDGPU/GlobalISel: Basic legalize rules for G_FSHR Only handles easy 32-bit cases.	2020-03-30 11:53:01 -07:00
Matt Arsenault	90a36bbd7c	AMDGPU/GlobalISel: Legalize 64-bit G_UDIV/G_UREM Mostly ported from the DAG version. This results in much worse code than the DAG version, largely due to a much worse expansion for G_UMULH.	2020-03-30 10:57:37 -04:00
David Blaikie	9002db05a2	Roll otherwise unused subexpressions into an assertion	2020-03-26 11:32:33 -07:00
Matt Arsenault	678da7b109	AMDGPU/GlobalISel: Remove leftover #if 0 The subtarget feature used to be missing from subtargets, but that was fixed.	2020-03-19 20:07:05 -04:00
Matt Arsenault	ea4597eef1	Reapply "AMDGPU/GlobalISel: Fully handle 0 dmask case during legalize" This reverts commit `9bca8fc4cf`. Rearrange handling to avoid changing the instruction in the case where it's going to be erased and replaced with undef.	2020-03-18 12:01:22 -04:00
Vitaly Buka	9bca8fc4cf	Revert "AMDGPU/GlobalISel: Fully handle 0 dmask case during legalize" The patch introduced use-after-poison. This reverts commit `d0fe13ecf9`.	2020-03-17 22:04:14 -07:00
Matt Arsenault	039c917b43	AMDGPU/GlobalISel: Fix asserting on gather4 intrinsics	2020-03-17 11:07:30 -04:00
Matt Arsenault	d0fe13ecf9	AMDGPU/GlobalISel: Fully handle 0 dmask case during legalize For normal loads, fully eliminate the load. For the TFE case, adjust the dmask value in the instruction so the selector doesn't need to handle it. For the TFE special case, I guess it would be possible to replace the loaded data register with undef, but as-is this will start treating it as a well defined value.	2020-03-17 10:15:30 -04:00
Matt Arsenault	d9a012ed8a	AMDGPU/GlobalISel: Adjust image load register type based on dmask Trim elements that won't be written. The equivalent still needs to be done for writes. Also start widening 3 elements to 4 elements. Selection will get the count from the dmask.	2020-03-17 10:09:18 -04:00
Matt Arsenault	83ffbf2618	AMDGPU/GlobalISel: Legalize non-a16 non-NSA images	2020-03-17 10:02:09 -04:00
Matt Arsenault	2aba9b6cf8	AMDGPU/GlobalISel: Legalize a16 images Pack the address registers in the legalizer. Avoid introducing a huge family of new intermediate operations by filling dead operands with noreg.	2020-03-17 10:02:09 -04:00
Matt Arsenault	57d896e838	AMDGPU/GlobalISel: Make some large merges legal We allow up to 1024-bit registers, so we should support merges all the way to the maximum.	2020-03-16 10:49:10 -04:00
Matt Arsenault	bb8622094d	AMDGPU: Don't handle kernarg.segment.ptr in functions Just lower this to null. Pass implicitarg.ptr in its place in the argument list.	2020-03-13 12:51:12 -07:00
Matt Arsenault	1e0c540360	AMDGPU: Don't hard error on LDS globals in functions Instead, emit a trap and a warning. We force inlining of this situation, so any function where this happens should be dead as indirect or external calls are not yet supported. This should avoid erroring on dead code.	2020-03-11 15:34:11 -04:00
Matt Arsenault	edd0dfca0d	AMDGPU/GlobalISel: Refine G_TRUNC legality rules Scalarize most truncates. Avoid touching cases that could end up in unresolvable infinite loops.	2020-03-10 15:32:22 -07:00
Matt Arsenault	ce8a1f7294	GlobalISel: Implement fewerElementsVector for G_TRUNC Extend fewerElementsVectorBasic to handle operands with different element types.	2020-03-10 15:17:20 -07:00
hsmahesha	3fda1fde8f	AMDGPU/GlobalISel: Support llvm.trap and llvm.debugtrap intrinsics Summary: Lower trap and debugtrap intrinsics to AMDGPU machine instruction(s). Reviewers: arsenm, nhaehnle, kerbowa, cdevadas, t-tye, kzhuravl Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, rovka, dstuttard, tpr, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74688	2020-03-05 08:16:57 +05:30
Matt Arsenault	86e13ec194	AMDGPU/GlobalISel: Use packed for G_ADD/G_SUB/G_MUL v2s16	2020-02-25 11:20:35 -05:00
Jay Foad	33cbd5ee08	AMDGPU/GlobalISel: Legalize s64 min/max by lowering Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75108	2020-02-25 16:00:43 +00:00
Jay Foad	0ed4744bb5	AMDGPU/GlobalISel: Lower 64-bit uaddo/usubo Summary: Add more test cases for signed and unsigned add/sub with overflow. Reviewers: arsenm, rampitec, kerbowa Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75051	2020-02-24 23:08:14 +00:00
Matt Arsenault	72eef820d5	AMDGPU/GlobalISel: Select G_SHUFFLE_VECTOR G_SHUFFLE_VECTOR is legal since it theoretically may help match op_sel for VOP3P instructions. Expand it in some other way in case it doesn't fold into the use instructions.	2020-02-21 13:35:40 -05:00
Matt Arsenault	79ff188add	AMDGPU/GlobalISel: Legalize G_FPOW There are few differences from the DAG handling. First, the DAG handling uses a primitive selection pattern instead of custom legalizing it. Because of this, this makes use of source modifiers while the DAG does not. Also instead of promoting f16, try to use the f16 log/exp. There's no f16 fmul_legacy, so widen just for the multiply, although I'm not sure that's the best solution.	2020-02-21 10:31:13 -05:00
Matt Arsenault	37c452a289	AMDGPU/GlobalISel: Adjust branch target when lowering loop intrinsic This needs to steal the branch target like the other control flow intrinsics.	2020-02-18 06:35:40 -08:00
Matt Arsenault	f742a28ae3	AMDGPU/GlobalISel: Custom lower 32-bit G_SDIV/G_SREM	2020-02-17 15:09:51 -05:00
Matt Arsenault	e240b27d6d	AMDGPU/GlobalISel: Allow arbitrary global values Treat unknown address spaces as global	2020-02-17 11:32:28 -08:00
Matt Arsenault	96db12d507	AMDGPU/GlobalISel: Custom lower 32-bit G_UDIV/G_UREM AMDGPUCodeGenPrepare expands this most of the time, but not always. We will always at least need a fallback option here. This is the 3rd implementation of the same expansion in the backend. Eventually I would like to eliminate the IR expansion (and the DAG version obviously). Currently the new legalizer path produces a better result, since the IR expansion results in extra operations which need to be combined out. Notably, the IR expansion results in multiplies by 0.	2020-02-17 11:05:50 -08:00
Michael Liao	487fcc8d3d	Fix `-Wpedantic` warning. NFC.	2020-02-17 00:18:01 -05:00
Matt Arsenault	295bbea3ed	AMDGPU/GlobalISel: Fix non-power-of-2 G_SITOFP/G_UITOFP This wouldn't work for s33-s63 sources.	2020-02-16 22:48:57 -05:00
Matt Arsenault	044d40ed46	AMDGPU/GlobalISel: Move lambdas to normal function These aren't using any local state	2020-02-16 22:48:32 -05:00
Matt Arsenault	60fea2713d	AMDGPU/GlobalISel: Improve 16-bit bswap Match the new DAG behavior and use v_perm_b32 when available. Also does better on SI/CI by expanding 16-bit swaps. Also fix non-power-of-2 cases.	2020-02-14 15:57:39 -08:00
Matt Arsenault	bfbfa18591	GlobalISel: Lower s64->s16 G_FPTRUNC This is more or less directly ported from the AMDGPU custom lowering for FP_TO_FP16. I made a few minor fixups (using G_UNMERGE_VALUES instead of creating shift/trunc to extract the two halves, and zexting an inverted compare instead of select_cc). This also does not include the fast math expansion the DAG which converts to f32 and then to f16. I think that belongs in a pre-legalize combine instead.	2020-02-14 10:46:58 -08:00
Matt Arsenault	a257bde420	AMDGPU/GlobalISel: Handle G_BSWAP	2020-02-14 09:09:44 -08:00
Matt Arsenault	5adbf7d57f	AMDGPU/GlobalISel: Make G_TRUNC legal This is required to be legal. I'm not sure how we were getting away without defining any rules for it.	2020-02-13 15:25:52 -05:00
Matt Arsenault	fa61e200e5	AMDGPU/GlobalISel: Widen non-power-of-2 load results Load extra bits if suitably aligned. This allows using widened 3-vector loads on SI, and fixes legalization for <9 x s32> (which LSV apparently forms frequently on lowered kernel argument lists). Fix incorrectly treating these as legal on SI. This should emit a 64-bit store and a 32-bit store. I think all of the load and store rules are just about complete, but due for a rewrite.	2020-02-12 09:35:10 -05:00
Matt Arsenault	f4a38c114e	AMDGPU/GlobalISel: Look through casts when legalizing vector indexing We were failing to find constants that were casted. I feel like the artifact combiner should have folded the constant in the trunc before the custom lowering, but that doesn't happen.	2020-02-09 18:02:10 -05:00
Petar Avramovic	7df5fc9e03	[GlobalISel] Add buildMerge with SrcOp initializer list Allows more flexible use of buildMerge in places where use operands are available as SrcOp since it does not require explicit conversion to Register. Simplify code with new buildMerge. Differential Revision: https://reviews.llvm.org/D74223	2020-02-07 18:43:45 +01:00
Matt Arsenault	8de2dad9e0	GlobalISel: Fix lowering of G_CTLZ/G_CTTZ The type passed to lower was invalid, so I'm not sure how this was even working before. The source and destination type also do not have to match, so make sure to use the right ones.	2020-02-07 06:54:12 -08:00
Matt Arsenault	6a570dc548	AMDGPU/GlobalISel: Fix non-pow-2 add/sub/mul for 16-bit insts These wouldn't legalize between 16-bits and 32-bits on targets with 16-bit instructions.	2020-02-06 21:43:54 -05:00
Matt Arsenault	baafe82b07	AMDGPU/GlobalISel: Remove bitcast legality hack	2020-02-05 16:24:24 -05:00
Matt Arsenault	364326ce66	AMDGPU/GlobalISel: Add mem operand to s.buffer.load intrinsic Really the intrinsic definition is wrong, but work around this here. The DAG lowering introduces an MMO. We have to introduce a new operation to avoid the verifier complaining about the missing mayLoad.	2020-02-05 15:04:42 -05:00
Matt Arsenault	5aa6e246a1	AMDGPU/GlobalISel: Legalize f64 G_FFLOOR for SI Use cmp ord instead of cmp_class compared to the DAG version for the nan check, but mostly try to match the existsing pattern. I think the sign doesn't matter for fract, so we could do a little better with the source modifier matching. I think this is also still broken as in D22898, but I'm leaving it as-is for now while I don't have an SI system to test on.	2020-02-05 14:32:01 -05:00
Matt Arsenault	7bffa97285	AMDGPU/GlobalISel: Prefer merge/unmerge ops to legalize TFE These have a better chance of combining with other operations and are currently much better supported than G_EXTRACT.	2020-02-05 12:56:10 -05:00
Matt Arsenault	e65e6d052e	AMDGPU/GlobalISel: Legalize TFE image result loads Rewrite the result register pair into the expected sinigle register format in the legalizer. I'm also operating under the assumption that TFE doesn't apply to stores or atomics, but don't know if this is true or not.	2020-02-05 12:40:20 -05:00
Jordan Rupprecht	9f507bfd8d	NFC: fix unused var warnings in no-assert builds	2020-02-05 09:26:59 -08:00
Matt Arsenault	69cc9f3046	AMDGPU/GlobalISel: Legalize llvm.amdgcn.s.buffer.load The 96-bit results need to be widened. I find the interaction between LegalizerHelper and MIRBuilder somewhat awkward. The custom legalization is called by the LegalizerHelper, but then does not have access to the helper. You have to construct a new helper, which then does not own the MachineIRBuilder, but does modify it. Maybe custom legalization should be passed the helper?	2020-02-05 12:01:34 -05:00
Matt Arsenault	dfa9420f09	AMDGPU/GlobalISel: Don't use legal v2s16 G_BUILD_VECTOR If we have s_pack_* instructions, legalize this to G_BUILD_VECTOR_TRUNC from s32 elements. This is closer to how how the s_pack_* instructions really behave. If we don't have s_pack_ instructions, expand this by creating a merge to s32 and bitcasting. This expands to the expected bit operations. I think this eventually should go in a new bitcast legalize action type in LegalizerHelper. We already directly emit the shift operations in RegBankSelect for the vector case. This could possibly be cleaned up, but I also may want to defer doing this expansion to selection anyway. I'll see about that when I try to actually match VOP3P instructions. This breaks the selection of the build_vector since tablegen doesn't know how to match G_BUILD_VECTOR_TRUNC yet, so just xfail it for now.	2020-02-05 11:52:18 -05:00
Matt Arsenault	05f2a04ba7	AMDGPU/GlobalISel: Legalize G_SEXT_INREG Split the VALU 64-bit case in RegBankSelect.	2020-02-04 13:23:53 -08:00
Matt Arsenault	9b0ce8edfa	AMDGPU/GlobalISel: Remove extension legality hacks The legalization has improved since this was added, and the tests relying on this no longer need it.	2020-02-04 12:50:47 -08:00
Matt Arsenault	5d2749938c	AMDGPU/GlobalISel: Custom lower G_FEXP	2020-02-04 11:50:55 -08:00
Matt Arsenault	b461436d01	AMDGPU/GlobalISel: Legalize s16 G_FEXP2	2020-02-04 11:50:55 -08:00
Matt Arsenault	1024b73ef5	AMDGPU: Split denormal mode tracking bits Prepare to accurately track the future denormal-fp-math attribute changes. The way to actually set these separately is not wired in yet. This is just a mechanical change, and mostly still assumes the input and output mode match. This should be refined for some cases. For example, fcanonicalize lowering should use the flushing variant if either input or output flushing is enabled	2020-02-04 10:44:21 -08:00
Matt Arsenault	cd7650c186	GlobalISel: Implement fewerElementsVector for G_SEXT_INREG Start using a new strategy with a combination of merge and unmerges. This allows scalarizing before lowering, which in cases like <2 x s128> avoids producing giant illegal shifts.	2020-02-03 11:47:33 -08:00
Matt Arsenault	c0b12916a7	AMDGPU/GlobalISel: Use more wide vector load/stores This improves the type breakdown for some large vectors. For example, we now get a <4 x s32> and s32 store instead of 5 s32 stores for <5 x s32>.	2020-02-01 10:47:21 -05:00
Matt Arsenault	e3117e5c30	AMDGPU/GlobalISel: Improve legalization of wide stores This fixes legalizations of global stores > 128-bits. It seems work is needed on how this split actually occurs. For example, we get the right code for s160, with an s128 and s32 load, but get 5 s32 loads for <5 x s32>.	2020-02-01 10:47:03 -05:00
Jay Foad	2a1b5af299	[GlobalISel] Tidy up unnecessary calls to createGenericVirtualRegister Summary: As a side effect some redundant copies of constant values are removed by CSEMIRBuilder. Reviewers: aemerson, arsenm, dsanders, aditya_nandakumar Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, hiraditya, jrtc27, atanasyan, volkan, Petar.Avramovic, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73789	2020-01-31 17:07:16 +00:00
Jay Foad	31e29d4afe	AMDGPU/GlobalISel: Make use of MachineIRBuilder helper functions. NFC.	2020-01-31 13:53:39 +00:00

1 2 3 4 5 ...

428 Commits