llvm-project

Commit Graph

Author	SHA1	Message	Date
Kazu Hirata	a2d4501718	[llvm] Fix comment typos (NFC)	2022-08-07 00:16:14 -07:00
Craig Topper	12a1ca9c42	[RISCV] Relax another one use restriction in performSRACombine. When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), i32), C) it's possible that the add is used by multiple sras. We should allow the combine if all the SRAs will eventually be updated. After transforming all of the sras, the shls will share a single (sext_inreg (add X, C1), i32). This pattern occurs if an sra with 32 is used as index in multiple GEPs with different scales. The shl from the GEPs will be combined with the sra before we get a chance to match the sra pattern.	2022-08-04 14:32:31 -07:00
Craig Topper	a2de12c987	[RISCV] Relax a one use restriction performSRACombine When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), C) ignore the use count on the (shl X, 32). The sext_inreg after the transform is free. So we're only making 2 new instructions, the add and the shl. So we only need to be concerned with replacing the original sra+add. The original shl can have other uses. This helps if there are multiple different constants being added to the same shl.	2022-08-04 11:25:08 -07:00
Craig Topper	53d560b22f	[RISCV] Prevent infinite loop after D129980. D129980 converts (seteq (i64 (and X, 0xffffffff)), C1) into (seteq (i64 (sext_inreg X, i32)), C1). If bit 31 of X is 0, it will be turned back into an 'and' by SimplifyDemandedBits which can cause an infinite loop. To prevent this, check if bit 31 is 0 with computeKnownBits before doing the transformation. Fixes PR56905. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D131113	2022-08-03 15:19:07 -07:00
David Truby	9a976f3661	[llvm] Always use TargetConstant for FP_ROUND ISD Nodes This patch ensures consistency in the construction of FP_ROUND nodes such that they always use ISD::TargetConstant instead of ISD::Constant. This additionally fixes a bug in the AArch64 SVE backend where patterns were matching against TargetConstant nodes and sometimes failing when passed a Constant node. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D130370	2022-08-03 14:02:11 +01:00
Alex Bradbury	28f12a09ae	[RISCV] Teach ComputeNumSignBitsForTargetNode about masked atomic intrinsics An unnecessary sext.w is generated when masking the result of the riscv_masked_cmpxchg_i64 intrinsic. Implementing handling of the intrinsic in ComputeNumSignBitsForTargetNode allows it to be removed. Although this isn't a particularly important optimisation, removing the sext.w simplifies implementation of an additional cmpxchg-related optimisation in D130192. Although I can't produce a test with different codegen for the other atomics intrinsics, these are added as well for completeness. Differential Revision: https://reviews.llvm.org/D130191	2022-08-03 13:41:58 +01:00
Fraser Cormack	646e2f4803	[VP] Rename VP int<->float conversion ISD opcodes These should be named like the non-VP versions for consistency. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D130967	2022-08-03 10:04:38 +01:00
wanglian	e208bab55f	[RISCV][NFC] Use defined variable instead of some code. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D130687	2022-08-02 16:26:33 +08:00
Lorenzo Albano	71b7c03fd6	[RISCV][VP] Custom lower VP_STRIDED_LOAD and VP_STRIDED_STORE Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D121113	2022-08-01 09:23:45 -07:00
Craig Topper	d21b315360	[RISCV] Remove vmerges from vector ceil, floor, trunc lowering. Use masked operations to suppress spurious exception bits being set in fflags. Unfortunately, doing this adds extra copies.	2022-07-30 10:58:41 -07:00
Craig Topper	a23f07fb1d	[RISCV] Add merge operands to more RISCVISD::*_VL opcodes. This adds a merge operand to all of the binary _VL nodes. Including integer and widening. They all share multiclasses in tablegen so doing them all at once was easiest. I plan to use FADD_VL in an upcoming patch. The rest are just for consistency to keep tablegen working. This does reduce the isel table size by about 25k so that's nice. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D130816	2022-07-30 10:26:38 -07:00
Craig Topper	9bf305fe2b	[RISCV] Swap the merge and mask operand order for VRGATHER*_VL and FCOPYSIGN_VL nodes. Based on review feedback from D130816.	2022-07-30 09:57:05 -07:00
Craig Topper	2750873dfe	[RISCV] Update lowerFROUND to use masked instructions. This avoids a vmerge at the end and avoids spurious fflags updates. This isn't used for constrained intrinsics so we technically don't have to worry about fflags, but it doesn't cost much to support it. To support this, I've extended our FCOPYSIGN_VL node to support a passthru operand. Similar to what was done for VRGATHER*_VL nodes. I plan to do a similar update for trunc, floor, and ceil. Reviewed By: reames, frasercrmck Differential Revision: https://reviews.llvm.org/D130659	2022-07-28 10:05:19 -07:00
Craig Topper	89173dee71	[RISCV] Remove duplicate code. NFC The same operations are part of `FloatingPointVecReduceOps` a little bit earlier.	2022-07-28 10:05:19 -07:00
Craig Topper	1d1d8d6025	[RISCV] Reorder code in lowerFROUND to make the diff in D130659 cleaner. NFC	2022-07-27 17:13:04 -07:00
Craig Topper	98647330bf	[RISCV] Add merge operand to RISCVISD::FCOPYSIGN_VL. Similar to what was done for VRGATHER*_VL recently. This will be used in D130659.	2022-07-27 15:25:34 -07:00
LiaoChunyu	bf4f9a468a	[RISCV]Enable isIntDivCheap when attribute is minsize Don't expand divisions by constants when attribute is minsize. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D130543	2022-07-27 18:22:51 +08:00
Craig Topper	45944e7cf4	[RISCV] Refactor translateSetCCForBranch to prepare for D130508. NFC. D130508 handles more constants than just 1 or -1. We need to extract the constant instead of relying on isOneConstant or isAllOnesConstant.	2022-07-25 15:54:54 -07:00
jacquesguan	d8800ead62	[RISCV] Scalarize binop followed by extractelement. This patch adds shouldScalarizeBinop to RISCV target in order to convert an extract element of a vector binary operation into an extract element followed by a scalar binary operation. Differential Revision: https://reviews.llvm.org/D129545	2022-07-25 17:23:31 +08:00
Craig Topper	9adc00a9d0	[RISCV] Add a continue to reduce nesting. NFC	2022-07-23 17:36:12 -07:00
Kazu Hirata	1cc7f5bede	Use static_assert instead of assert (NFC) Identified with misc-static-assert.	2022-07-23 09:22:27 -07:00
Craig Topper	add17fc8e4	[RISCV] Combine (select_cc (srl (and X, 1<<C), C), 0, eq/ne, true, false) (srl (and X, 1<<C), C) is the form we receive for testing bit C. An earlier combine removed the setcc so it wasn't there to match when we created the SELECT_CC. This doesn't happen for BR_CC because generic DAG combine rebuilds the setcc if it is used by BRCOND. We can shift X left by XLen-1-C to put the bit to be tested in the MSB, and use a signed compare with 0 to test the MSB.	2022-07-20 22:32:11 -07:00
Craig Topper	7dda6c71b1	[RISCV] Refactor the common combines for SELECT_CC and BR_CC into a helper function. The only difference between the combines were the calls to getNode that include the true/false values for SELECT_CC or the chain and branch target for BR_CC. Wrap the rest of the code into a helper that reads LHS, RHS, and CC and outputs new values and a bool if a new node needs to be created.	2022-07-20 21:18:07 -07:00
Craig Topper	8983db15a3	[RISCV] Optimize (brcond (seteq (and X, 1 << C), 0)) If C > 10, this will require a constant to be materialized for the And. To avoid this, we can shift X left by XLen-1-C bits to put the tested bit in the MSB, then we can do a signed compare with 0 to determine if the MSB is 0 or 1. Thanks to @reames for the suggestion. I've implemented this inside of translateSetCCForBranch which is called when setcc+brcond or setcc+select is converted to br_cc or select_cc during lowering. It doesn't make sense to do this for general setcc since we lack a sgez instruction. I've tested bit 10, 11, 31, 32, 63 and a couple bits between 11 and 31 and between 32 and 63 for both i32 and i64 where applicable. Select has some deficiencies where we receive (and (srl X, C), 1) instead. This doesn't happen for br_cc due to the call to rebuildSetCC in the generic DAGCombiner for brcond. I'll explore improving select in a future patch. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D130203	2022-07-20 18:40:49 -07:00
ksyx	3198364e6e	[RISCV][Clang] Add support for Zmmul extension This patch implements the recently ratified Zmmul extension, a subextension of M (Integer Multiplication and Division) consisting of only the multiplication part. Differential Revision: https://reviews.llvm.org/D103313 Reviewed By: craig.topper, jrtc27, asb	2022-07-18 20:26:08 -04:00
Craig Topper	0b02752899	[RISCV] Optimize (seteq (i64 (and X, 0xffffffff)), C1) (and X, 0xffffffff) requires 2 shifts in the base ISA. Since we know the result is being used by a compare, we can use a sext_inreg instead of an AND if we also modify C1 to have 33 sign bits instead of 32 leading zeros. This can also improve the generated code for materializing C1. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D129980	2022-07-18 10:54:45 -07:00
Simon Pilgrim	259c36e7c1	[DAG] Add asserts to isDesirableToCommuteWithShift overrides to ensure it's being called from a shift. NFC.	2022-07-18 13:11:24 +01:00
jacquesguan	2b11174079	[RISCV][NFC] Use more Arrayref in TargetLowering functions. This patch replaces some foreach with Arrayref, and abstract some same literal array with a variable. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125656	2022-07-18 03:33:45 +00:00
Fangrui Song	d955497112	[RISCV] Simplify lowerGlobalAddress. NFC	2022-07-17 15:42:45 -07:00
Craig Topper	decf385c27	[RISCV] Teach targetShrinkDemandedConstant to handle OR and XOR. We were only handling AND before, but SimplifyDemandedBits can also call it for OR and XOR.	2022-07-17 12:36:33 -07:00
Craig Topper	257755530a	[RISCV] Fold (sra (sext_inreg (shl X, C1), i32), C2) -> (sra (shl X, C1+32), C2+32). The former pattern will select as slliw+sraiw while the latter will select as slli+srai. This can enable the slli+srai to be compressed. Differential Revision: https://reviews.llvm.org/D129688	2022-07-13 14:34:17 -07:00
Philip Reames	dde2a7fb6d	[RISCV] Exploit fact that vscale is always power of two to replace urem sequence When doing scalable vectorization, the loop vectorizer uses a urem in the computation of the vector trip count. The RHS of that urem is a (possibly shifted) call to @llvm.vscale. vscale is effectively the number of "blocks" in the vector register. (That is, types such as <vscale x 8 x i8> and <vscale x 1 x i8> both fill one 64 bit block, and vscale is essentially how many of those blocks there are in a single vector register at runtime.) We know from the RISCV V extension specification that VLEN must be a power of two between ELEN and 2^16. Since our block size is 64 bits, there must be a power-of-two number of blocks. (For everything other than VLEN<=32, but that's already broken.) It is worth noting that AArch64 SVE specification explicitly allows non-power-of-two sizes for the vector registers and thus can't claim that vscale is a power of two by this logic. Differential Revision: https://reviews.llvm.org/D129609	2022-07-13 10:54:47 -07:00
Craig Topper	c5be6a8308	[RISCV] Use X0 in place of VLMaxSentinel in lowering. I thought I had already fixed all of these, but I guess I missed one.	2022-07-11 23:29:04 -07:00
Craig Topper	c3c17b1695	[RISCV] Use MVT for the argument to getMaskTypeFor. NFC Only one caller didn't already have an MVT and that was easy to fix. Since the return type is MVT and it uses MVT::getVectorVT, taking an MVT as input makes the most sense.	2022-07-11 15:14:44 -07:00
Craig Topper	1a2bd44b77	[RISCV] Make shouldConvertConstantLoadToIntImm return true unless enableUnalignedScalarMem is true. This restores the old behavior before D129402 when enableUnalignedScalarMem is false. This fixes a regression spotted by @asb. To fix this correctly, we need to consider alignment of the load we'd be replacing, but that's not possible in the current interface.	2022-07-11 09:40:08 -07:00
LiaoChunyu	3f68f0f816	[RISCV] Optimize 2x SELECT for floating-point types Including the following opcode: Select_FPR16_Using_CC_GPR Select_FPR32_Using_CC_GPR Select_FPR64_Using_CC_GPR Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D127871	2022-07-11 14:10:27 +08:00
Craig Topper	35ec8a423d	[RISCV] Teach shouldConvertConstantLoadToIntImm that constant materialization can use constant pools. I think it only makes sense to return true here if we aren't going to turn around and create a constant pool for the immediate. I left out the check for useConstantPoolForLargeInts() thinking that even if you don't want the compiler to create a constant pool you might still want to avoid materializing an integer that is already available in a global variable. Test file was copied from AArch64/ARM and has not been committed yet. Will post separate review for that. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D129402	2022-07-10 14:10:17 -07:00
Lian Wang	9cfb28d672	[RISCV] Change VECTOR_SPLICE mask operation from expand to promote Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D128717	2022-07-08 06:20:22 +00:00
Diego Caballero	bf1758c3dc	Revert "[RISCV] Optimize 2x SELECT for floating-point types" This reverts commit `1178992c72`.	2022-07-07 22:54:00 +00:00
Craig Topper	51d672946e	[RISCV] Fold (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), C) Similar for a subtract with a constant left hand side. (sra (add (shl X, 32), C1<<32), 32) is the canonical IR from InstCombine for (sext (add (trunc X to i32), 32) to i32). For RISCV, we should lower this as addiw which means turning it into (sext_inreg (add X, C1)). There is an existing DAG combine to convert back to (sext (add (trunc X to i32), 32) to i32), but it requires isTruncateFree to return true and for i32 to be a legal type as it uses sign_extend and truncate nodes. So that doesn't work for RISCV. If the outer sra happens to be used by a shl by constant, it will be folded and the shift amount of the sra will be changed before we can do our own DAG combine. This requires us to match the more general pattern and restore the shl. I had wanted to do this as a separate (add (shl X, 32), C1<<32) -> (shl (add X, C1), 32) combine, but that hit an infinite loop for some values of C1. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D128869	2022-06-30 09:01:24 -07:00
Craig Topper	9ace5af049	[RISCV] DAG combine (sra (shl X, 32), 32 - C) -> (shl (sext_inreg X, i32), C). The sext_inreg can often be folded into an earlier instruction by using a W instruction. The sext_inreg also works better with our ABI. This is one of the steps to improving the generated code for this https://godbolt.org/z/hssn6sPco Reviewed By: asb Differential Revision: https://reviews.llvm.org/D128843	2022-06-30 09:01:24 -07:00
Philip Reames	860c62f53c	[RISCV] Refine known bits for READ_VLENB This implements known bits for READ_VALUE using any information known about minimum and maximum VLEN. There's an additional assumption that VLEN is a power of two. The motivation here is mostly to remove the last use of getMinVLen, but while I was here, I decided to also fix the bug for VLEN < 128 and handle max from command line generically too. Differential Revision: https://reviews.llvm.org/D128758	2022-06-28 15:42:14 -07:00
Lian Wang	96ab083622	[RISCV] Support VECTOR_REVERSE mask operation. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D128627	2022-06-28 07:48:51 +00:00
LiaoChunyu	1178992c72	[RISCV] Optimize 2x SELECT for floating-point types Including the following opcode: Select_FPR16_Using_CC_GPR Select_FPR32_Using_CC_GPR Select_FPR64_Using_CC_GPR Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D127871	2022-06-28 12:02:05 +08:00
Craig Topper	ea1b861278	[RISCV] Fix misleading formatting and remove a dead getNode call. NFC	2022-06-27 18:49:57 -07:00
Philip Reames	0533b6e2f6	[RISCV] Remove a use of getMinVLen in favor of getRealMinVLen The latter is possibly greater than the former, and thus the assert was overly strong when a wider VLEN was set at the command line.	2022-06-27 12:52:24 -07:00
Philip Reames	a0443dd47c	[RISCV] Simplify 16 bit index handling in lowerVECTOR_REVERSE [nfc] getRealMaxVLen returns an upper bound on the value of VLEN. We can use this upper bound (which unless explicitly set at command line is going to result in an e8 MaxVLMax of much greater than 256) instead of explicitly handling the unknown case separately from the bounded by number greater than 256 case. Note as well that this code already implicitly depends on a capped value for VLEN. If infinite VLEN were possible, then 16 bit indices wouldn't be enough.	2022-06-24 13:08:39 -07:00
Philip Reames	f1e1c3ce77	[RISCV] Replace two calls to getMinRVVVectorSizeInBits in fixed length lowering [nfc] Both of these are only reached if useRVVForFixedLengthVectors is true. Given that, we know that getRealMinVLen() == getMinRVVVectorSizeInBits().	2022-06-24 13:00:57 -07:00
Craig Topper	c579ab53bd	[RISCV] Move vfma_vl+fneg_vl matching to DAG combine. This patch adds 3 new _VL RISCVISD opcodes to represent VFMA_VL with different portions negated. It also adds a DAG combine to peek through FNEG_VL to create these new opcodes. This is modeled after similar code from X86. This makes the isel patterns more regular and reduces the size of the isel table by ~37K. The test changes look like regressions, but they point to a bug that was already there. We aren't able to commute a masked FMA instruction to improve register allocation because we always use a mask undisturbed policy. Prior to this patch we matched two multiply operands in a different order and hid this issue for these test cases, but a different test still could have encountered it. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D128310	2022-06-24 00:00:37 -07:00
Craig Topper	8b10ffabae	[RISCV] Disable <vscale x 1 x *> types with Zve32x or Zve32f. According to the vector spec, mf8 is not supported for i8 if ELEN is 32. Similarly mf4 is not supported for i16/f16 or mf2 for i32/f32. Since RVVBitsPerBlock is 64 and LMUL is calculated as ((MinNumElements * ElementSize) / RVVBitsPerBlock) this means we need to disable any type with MinNumElements==1. For generic IR, these types will now be widened in type legalization. For RVV intrinsics, we'll probably hit a fatal error somewhere. I plan to work on disabling the intrinsics in the riscv_vector.h header. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D128286	2022-06-23 08:49:18 -07:00

739 Commits