llvm-project

Commit Graph

Author	SHA1	Message	Date
Yeting Kuo	cefb7aab61	[VP][RISCV] Add vp.copysign and RISC-V support. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134935	2022-10-01 10:19:10 +08:00
Philip Reames	1e3c179519	[RISCV] Address post commit review comments from D134881	2022-09-30 08:31:40 -07:00
Philip Reames	2b5960028e	[RISCV] Branchless lowering for select (and (x , 0x1) == 0), y, (z ^ y) ) and select (and (x , 0x1) == 0), y, (z | y) ) This code is directly ported from the X86 backend which applies the same rewrite (along with several others). Planning on looking more closely at the other branchless variants from x86 to see if any are worth porting in future changes. Motivation here is the coremark crc8 routine from https://github.com/eembc/coremark/blob/main/core_util.c#L165. This patch significantly reduces the number of unpredictable branches in the workload. Differential Revision: https://reviews.llvm.org/D134881	2022-09-30 08:24:32 -07:00
Ray Wang	4c786c9747	[RISCV] Remove some unused var decl. NFC Differential Revision: https://reviews.llvm.org/D134707	2022-09-30 08:08:15 -07:00
Yeting Kuo	1cc02b05b7	[SelectionDAG] Add helper function to check whether a SDValue is neutral element. NFC. Using this helper makes working with neutral elements easier. Although I have found only one use so far, I expect more, since so many combines involve neutral elements. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D133866	2022-09-30 11:29:11 +08:00
eopXD	02a982829c	[RISCV] Add lowering for llvm.roundeven Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134785	2022-09-29 06:08:14 -07:00
eopXD	9677d70eb2	[VP][RISCV] Add vp.floor, vp.round, vp.roundeven and their RISC-V support Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134759	2022-09-27 19:45:58 -07:00
Han-Kuan Chen	c595c874cb	[RISCV] Lower BUILD_VECTOR to RISCVISD::VID_VL if it is floating-point type. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D133688	2022-09-27 17:25:34 -07:00
eopXD	163cb33854	[VP][RISCV] Add vp.ceil and RISC-V support Previous commit `8b00b24f85` omitted the `int_ceil` anchor for the llvm.ceil.* section under LangRef.rst Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134586	2022-09-27 12:04:09 -07:00
eopXD	384b8b3da7	Revert "[VP][RISCV] Add vp.ceil and RISC-V support" This reverts commit `8b00b24f85`.	2022-09-27 11:12:57 -07:00
eopXD	8b00b24f85	[VP][RISCV] Add vp.ceil and RISC-V support Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134586	2022-09-27 11:08:27 -07:00
Yeting Kuo	04e1301f3d	[VP][RISCV] Add vp.maxnum and vp.minnum intrinsics and RISC-V support. Add vp.maxnum and vp.minnum which are vector predicted intrinsics of llvm.maxnum and llvm.minnum. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134639	2022-09-27 13:36:45 +08:00
Yeting Kuo	43c5fbdd3a	[VP][RISCV] Add vp.sqrt intrinsic and RISC-V support. The patch modeled vp.fabs patch D132793. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D133690	2022-09-26 10:47:40 +08:00
Philip Reames	6e7c54ecaf	[RISCV] Add lowering for scalable @llvm.riscv.masked.strided.load/store The code previously assumed fixed length vectors; make the relevant code conditional. Having the lowering in place is necessary for an upcoming change to generalize scatter/gather matching to scalable vectors. Differential Revision: https://reviews.llvm.org/D134489	2022-09-24 17:41:57 -07:00
Craig Topper	19850cc2d8	Revert "[RISCV] Lower BUILD_VECTOR to RISCVISD::VID_VL if it is floating-point type." This reverts commit `dd53a0bb30`. We have seen crashes from this internally. Probably due to the use of RoundingMode::Dynamic.	2022-09-23 18:41:41 -07:00
Craig Topper	90a5d8499a	[RISCV] Promote f16 STRICT_FCEIL/FLOOR/TRUNC/NEARBYINT/RINT/ROUND,ROUNDEVEN to f32.	2022-09-23 14:01:51 -07:00
Philip Reames	60c91fd364	[RISCV] Disallow scale for scatter/gather RISCV doesn't actually support a scaled form of indexed load and store. We previously handled this by forming the scaled SDNode, and then doing custom legalization during lowering. This patch instead adds a callback via TLI to prevent formation entirely. This has two effects: * First, the GEP gets expanded (and used). Instead of the shift being created with an SDLoc of the memory operation, it has the SDLoc of the GEP instruction. This avoids the scheduler perturbing IR order when there's no reason to. * Second, we fix what appears to be a bug in index calculation with RV32. The rules for GEPs require index calculation be done in particular bitwidth, and it appears the custom legalization code got this wrong for the case where index type exceeds pointer width. (Or at least, I trust the generic GEP lowering to be correct a lot more.) The DAGCombiner change to handle VPScatter/VPGather is technically separate, but is required to prevent a regression on those intrinsics. Differential Revision: https://reviews.llvm.org/D134382	2022-09-22 15:31:26 -07:00
Craig Topper	52708be182	[RISCV] Remove support for the unratified Zbe, Zbf, and Zbm extensions. These extensions do not appear to be on their way to ratification.	2022-09-22 13:04:41 -07:00
Fraser Cormack	92d71c615d	[RISCV] Use structured bindings in common RVV lowering code This patch uses structured bindings to simplify a couple of specific cases when lowering RVV operations where we commonly declare two SDValues and immediately 'tie' them to the mask and vector length. There's also a couple places where we split vectors that structured bindings make sense to use. This patch tries to keep these sorts of changes minimal and to cases where the returned types are commonly understood, rather than applying this wholesale to the RISCV backend. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D134442	2022-09-22 16:38:40 +01:00
Craig Topper	bf7c7696fe	[RISCV] Improve support for vector fp_to_sint_sat/uint_sat. The default fixed vector legalization is to unroll. The default scalable vector legalization is to clamp in the FP domain. The RVV vfcvt instructions have saturating behavior so we can use them directly. The only difference is that RVV instructions turn NaN into the max value, but the _SAT intrinsics want 0. I'm only supporting 1 step of narrowing for now. I think we can support more steps by using VNCLIP to saturate and narrow. The only case that needs 2 steps of widening is f16->i64 which we can do as f16->f32->i64. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D134400	2022-09-22 08:13:48 -07:00
Craig Topper	8b8e18e11f	[RISCV] Replace RISCVISD::GREV/GORC/SHFL/UNSHFL with BREV8/ORC_B/ZIP/UNZIP. With Zbp removed, we no longer need the generalized forms. The computeKnownBitsForTargetNode code brev8/orc.b is still based on the general form with the shift amount forced to 7.	2022-09-21 21:57:59 -07:00
Craig Topper	182aa0cbe0	[RISCV] Remove support for the unratified Zbp extension. This extension does not appear to be on its way to ratification. Still need some follow up to simplify the RISCVISD nodes.	2022-09-21 21:22:42 -07:00
Craig Topper	1d8a7adca6	[RISCV] Rename RISCVISD::SINT_TO_FP_VL/UINT_TO_FP_VL. NFC Name them after the instructions VFCVT_RTZ_X(U)_F_VL to make it clear that the ISD nodes don't have the poison semantics of ISD::SINT_TO_FP/UINT_TO_FP. I plan to reuse this node for a FP_TO_SINT_SAT/FP_TO_UINT_SAT patch and need the instruction semantics.	2022-09-21 15:33:04 -07:00
Craig Topper	70a64fe7b1	[RISCV] Remove support for the unratified Zbt extension. This extension does not appear to be on its way to ratification. Out of the unratified bitmanip extensions, this one had the largest impact on the compiler. Posting this patch to start a discussion about whether we should remove these extensions. We'll talk more at the RISC-V sync meeting this Thursday. Reviewed By: asb, reames Differential Revision: https://reviews.llvm.org/D133834	2022-09-20 20:26:48 -07:00
LiaoChunyu	2e74157ad4	[RISCV]Preserve (and X, 0xffff) in targetShrinkDemandedConstant ShrinkDemandedConstant does some optimizations but is not very friendly to RISC-V; use targetShrinkDemandedConstant to limit the damage. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134155	2022-09-19 14:19:38 +08:00
LiaoChunyu	8fee91c435	[RISCV][NFC]Remove outdated comment from targetShrinkDemandedConstant Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D134154	2022-09-19 10:23:06 +08:00
Craig Topper	61595c45af	[RISCV] Simplify some code in vector fp<->int handling. NFC We changed the way container types are selected since this code was written. We no longer need to use the largest type.	2022-09-16 12:56:42 -07:00
Sergei Barannikov	c6acb4eb0f	[SDAG] Add `getCALLSEQ_END` overload taking `uint64_t`s All in-tree targets pass pointer-sized ConstantSDNodes to the method. This overload reduced amount of boilerplate code a bit. This also makes getCALLSEQ_END consistent with getCALLSEQ_START, which already takes uint64_ts.	2022-09-15 14:02:12 -04:00
Han-Kuan Chen	dd53a0bb30	[RISCV] Lower BUILD_VECTOR to RISCVISD::VID_VL if it is floating-point type. Differential Revision: https://reviews.llvm.org/D133688	2022-09-13 18:50:20 -07:00
Craig Topper	8d7e73effe	[RISCV] Teach lowerVECTOR_SHUFFLE to recognize some shuffles as vnsrl. Unary shuffles such as <0,2,4,6,8,10,12,14> or <1,3,5,7,9,11,13,15> where half the elements are returned, can be lowered using vnsrl. SelectionDAGBuilder lowers such shuffles as a build_vector of extract_elements since the mask has fewer elements than the source. To fix this, I've enabled the extractSubvectorIsCheapHook to allow DAGCombine to rebuild the shuffle using 2 extract_subvectors preceding the shuffle. I've gone very conservative on extractSubvectorIsCheapHook to minimize test impact and match what we have test coverage for. This can be improved in the future. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133736	2022-09-13 11:07:11 -07:00
Alex Bradbury	c44c1e9d3e	[RISCV] Implement isMaskAndCmp0FoldingBeneficial hook This hook is currently only used by CodeGenPrepare, which will sink and duplicate an 'and' into a block that has an 'icmp 0' user of it if the hook returns true. This hook is less useful for RISC-V than for targets like AArch64 that have a TBZ (test bit and branch if zero instruction), but may still be profitable if Zbs is available and a BEXTI can be selected. Conservatively, we return false even if Zbs is enabled for any masks that fit in the ANDI immediate because it's possible the only use is a branch on the result, and ANDI+BNEZ => BEXTI+BNEZ isn't a profitable transformation. Differential Revision: https://reviews.llvm.org/D131492	2022-09-13 18:54:00 +01:00
Alex Bradbury	547160848c	[RISCV] Return true in hasBitTest when Zbs is enabled and update BEXTI pattern for resulting canonicalisation As the Zbs extension includes bext[i] for bit extract, we can unconditionally return true from this hook. This hook causes the DAG combiner to perform the following canonicalisation: and (not (srl X, C)), 1 --> (and X, 1<<C) == 0 and (srl (not X), C)), 1 --> (and X, 1<<C) == 0 As simply changing the hook causes a codegen regression, this patch also modifies a BEXTI pattern to match this canonicalised form. As BSETINVMask is now used for BEXT as well as BSET and BINV, it has been renamed to the more generic SingleBitSetMask. There is one codegen change in bittest.ll for bittest_31_i64 (NOT+BEXTI rather than NOT+SRLIW). This is neutral in terms of code quality. Differential Revision: https://reviews.llvm.org/D131482	2022-09-13 16:51:47 +01:00
Craig Topper	5224bae613	[RISCV] Fix a bug in i32 FP_TO_UINT_SAT lowering on RV64. We use the saturating behavior of fcvt.wu.h/s/d but forgot to take into account that fcvt.wu will sign extend the saturated result. According to computeKnownBits a promoted FP_TO_UINT_SAT is expected to zero extend the saturated value. In many cases the upper bits aren't demanded, so this wouldn't be an issue. But if computeKnownBits caused an AND to be removed, it would be a bug. This patch inserts an AND to zero the upper bits. Unfortunately, this pessimizes code if we aren't able to tell if the upper bits are demanded. To fix that we could custom type promote the FP_TO_UINT_SAT with SEXT_INREG after it, but I'll leave that for future work. I haven't found a failure from this; I was revisiting the code to add vector support and spotted it. Differential Revision: https://reviews.llvm.org/D133746	2022-09-13 08:41:32 -07:00
Craig Topper	4186a49d79	[RISCV] Custom type legalize i32 loads by sign extending. The default is to use extload which can become a zextload or sextload if it is followed by an 'and' or sext_inreg. Sometimes type legalization will introduce an 'and' from promoting something like 'srl X, C' and a sext_inreg from a setcc. The 'and' could be freely folded with the promoted 'srl' by using srliw, but the sext_inreg can't be folded into a compare. DAG combiner will see both of these choices and may decide to fold the 'and' instead of the 'sext_inreg'. This forces the sext_inreg to become a sext.w. By picking sextload in the type legalizer we take this choice away. Looking at spec2006 compiled with Zba and Zbb this appeared to be a net reduction in lines of code in the objdump disassembly output. This is similar to what we do with i32 add/sub/mul/shl in type legalization where we always emit a sext_inreg. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D130397	2022-09-12 09:13:07 -07:00
Joe Loser	5e96cea1db	[llvm] Use std::size instead of llvm::array_lengthof LLVM contains a helpful function for getting the size of a C-style array: `llvm::array_lengthof`. This is useful prior to C++17, but not as helpful for C++17 or later: `std::size` already has support for C-style arrays. Change call sites to use `std::size` instead. Differential Revision: https://reviews.llvm.org/D133429	2022-09-08 09:01:53 -06:00
Craig Topper	5d30565d80	[RISCV] Improve vector fround lowering by changing FRM. This is a follow up to D133238 which did this for ceil/floor. Reviewed By: arcbbb, frasercrmck Differential Revision: https://reviews.llvm.org/D133335	2022-09-06 09:33:13 -07:00
Craig Topper	f0332d12ae	[RISCV] Improve vector fceil/ffloor lowering by changing FRM. This adds new VFCVT pseudoinstructions that take a rounding mode operand. A custom inserter is used to insert additional instructions to change FRM around the VFCVT. Some of this is borrowed from D122860, but takes a somewhat different direction. We may migrate to that patch, but for now I was trying to keep this as independent from RVV intrinsics as I could. A followup patch will use this approach for FROUND too. Still need to fix the cost model. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D133238	2022-09-05 19:03:44 -07:00
liqinweng	c45810f810	[RISCV] When ISD::SETUGT && Imm == -1, has processed before lowering When ISD::SETUGT && Imm == -1, the case has already been processed before lowering. Replace the check with an assert. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D132373	2022-09-01 15:38:16 +08:00
Craig Topper	6e0ae7e940	[RISCV] Slightly simplify code in combineVWADD_W_VL_VWSUB_W_VL and combineMUL_VLToVWMUL_VL. NFC Use computeMaxSignificantBits instead of ComputeNumSignBits. Create APInt as part of call to MaskedValueIsZero instead of creating a named temporary.	2022-08-31 15:02:03 -07:00
Craig Topper	1c334b306e	[RISCV] Add more invertible setccs to tryDemorganOfBooleanCondition. This builds on D132771 to invert (setlt 0, X) to (setlt X, 1) and vice versa. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132798	2022-08-29 12:23:03 -07:00
Craig Topper	9d12bb77f9	[RISCV] Apply DeMorgan to (beqz (and/or (seteq), (xor Z, 1))) to remove the xor. We can rewrite to (bnez (or/and (setne), Z) if Z is 0/1. Alternatively, we could canonicalize to (xor (or/and (setne), Z), 1) even if there is no branch. The xor would not always get removed, but it might enable other DeMorgan combines. I decided to be conservative for this first patch and require the xor to be removed. I have a couple other invertible setccs I will add in a follow up patch. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132771	2022-08-29 12:16:34 -07:00
Craig Topper	2f811a6c7f	[VP][RISCV] Add vp.fabs intrinsic and RISC-V support. Mostly just modeled after vp.fneg except there is a "functional instruction" for fneg while fabs is always an intrinsic. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D132793	2022-08-29 09:32:06 -07:00
Craig Topper	41a3b5739b	[RISCV] Teach combineDeMorganOfBoolean to handle (and (xor X, 1), (not Y)). SimplifyDemandedBits tries to aggressively turn xor immediates into -1 to match a 'not' instruction. In this case, because X is a boolean, the upper bits of (xor X, 1) are known to be 0. Because this is an AND instruction, that means those bits aren't demanded from the other operand, and thus SimplifyDemandedBits can turn (xor Y, 1) to (not Y). We need to detect that this has happened to enable the DeMorgan optimization. To do this we allow one of the xors to use -1 when the outer operation is And. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132671	2022-08-25 10:55:45 -07:00
Craig Topper	ec91d761ac	[RISCV] Apply DeMorgan's law to (and/or (xor X, 1), (xor Y, 1)) if X and Y are 0/1. This optimizes xors that appear due to legalizing setge/setle which require an xor with 1. This reduces the number of xors and may allow the xor to fold with a beqz or bnez. Differential Revision: https://reviews.llvm.org/D132614	2022-08-25 08:49:30 -07:00
Simon Pilgrim	f9de13232f	[X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis. For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling. Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU. Differential Revision: https://reviews.llvm.org/D132520	2022-08-24 17:28:18 +01:00
Craig Topper	1a042dd6ed	[RISCV] Optimize x <s -1 ? x : -1. Improve x >u 1 ? x : 1. Similar to D132211, we can optimize x <s -1 ? x : -1 -> x <s 0 ? x : -1 Also improve the unsigned case from D132211 to use x != 0 which will give a bnez instruction which might be compressible. Differential Revision: https://reviews.llvm.org/D132252	2022-08-21 11:48:28 -07:00
LiaoChunyu	1fb87ace4d	[RISCV] Optimize x > 1 ? x : 1 -> x > 0 ? x : 1 If x == 1, x > 1 ? x : 1 returns 1, and x > 0 ? x : 1 returns x, which is also 1. This reduces the number of load 1 instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D132211	2022-08-21 20:26:39 +08:00
Craig Topper	6227b7ae31	[RISCV] Move xori creation for scalar setccs to lowering. This patch enables expansion or custom lowering for some integer condition codes so that any xori that is needed is created before the last DAG combine to enable optimization. I've seen cases where we end up with (or (xori (setcc), 1), (xori (setcc), 1)) which we would ideally convert to (xori (and (setcc), (setcc)), 1). This patch doesn't accomplish that yet, but it should allow us to add DAG combines as follow ups. Example https://godbolt.org/z/Y4qnvsq1b Reviewed By: reames Differential Revision: https://reviews.llvm.org/D131729	2022-08-19 13:51:53 -07:00
Craig Topper	961838cc13	[RISCV] Add passthru operand to RISCVISD::SETCC_VL. Use it to fix a bug in the fceil/ffloor lowerings. We were setting the passthru to IMPLICIT_DEF before and using a mask agnostic policy. This means where the incoming bits in the mask were 0 they could be anything in the outgoing mask. We want those bits in the outgoing mask to be 0. This means we need to pass the input mask as the passthru. This generates worse code because we are unable to allocate the v0 register to the output due to an earlyclobber constraint. We probably need a special TIED pseudoinstruction and probably custom isel since you can't use V0 twice in the input pattern. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132058	2022-08-19 08:53:44 -07:00
Craig Topper	ba1f4cab44	[RISCV] Copy SDNodeFlags in lowerToScalableOp. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D132177	2022-08-18 20:42:59 -07:00

806 Commits