llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	188582b7e0	[RISCV] Considering existing offset in the alignment when folding ADDIs into load/store. getPointerAlignment and ConstantPoolSDNode::getAlign only consider the alignment of the object. If we already have a non-zero offset into the offset that may have reduced the alignment. Since the base pointer will become an LUI with the old offset, we need to be sure the new offset fits in the alignment of the address that will be used to create the LUI immediate. I'm not sure it is possible to have a non-zero offset in the GlobalAddressSDNode or ConstantPoolSDNode at this point today so this may only be a theoretical bug. Differential Revision: https://reviews.llvm.org/D129006	2022-07-01 11:18:40 -07:00
Craig Topper	058d521ea4	[RISCV] Avoid repeated code in SelectAddrRegImm. NFC	2022-06-30 17:22:04 -07:00
Craig Topper	5ca39a55a7	[RISCV] Remove an unnecessary copy of X0 in selectShiftMask. We know which instruction we're emitting so its ok to directly encode X0 into the instruction. We only need to create a copy when a constant 0 is selected without context of what instructions uses it.	2022-06-30 15:11:58 -07:00
Craig Topper	354e04554a	[RISCV] Make custom isel for (add X, imm) used by load/stores more selective. Only handle immediates that would produce an ADDI or ADDIW of Lo12 as the final instruction in their materialization. As the test change show this removes immediates that materialize with lui+addiw that is not the same as lui+addi.	2022-06-30 14:20:11 -07:00
Craig Topper	ae5f5eb2f1	[RISCV] Replace some uses of XLenVT in RISCVDAGToDAGISel::Select with the original Node VT. NFCI These should contain the same thing, but we aren't consistent about which we use. Since we call ReplaceNode, it seems more correct to use the initial VT.	2022-06-30 13:00:44 -07:00
Craig Topper	2b7b609821	[RISCV] Use getVTList to simplify creation of vleff MachineSDNode. NFC We don't need to pass the 3 VTs separately, we already have a list available to us.	2022-06-30 11:34:02 -07:00
Craig Topper	89e7e59621	[RISCV] Use the VT passed into selectImm instead of XLenVT. NFCI I think the VT pased in will always be XLen.	2022-06-30 11:15:28 -07:00
Craig Topper	7cbfb4eb7a	[RISCV] Select (srl (and X, C2) as (slli (srliw X, C3), C3-C). If C2 has 32 leading zeros and C3 trailing zeros.	2022-06-29 09:15:09 -07:00
Craig Topper	5dcc525492	[RISCV] Fold (add X, [-4096, -2049]) or (add X, [2048,4096]) into load/store address during isel. Previously we iseled this to a pair of ADDIs and relied on a post isel peephole to fold one of the ADDIs into the load/store. Now we split the immediate in two parts the same way isel does and fold one of the pieces. If the add has a non-memory use it will emit two isels and larger one will CSE with the ADDI we created for the the memory use. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D128741	2022-06-28 16:59:39 -07:00
Craig Topper	87077c7eb5	[RISCV] Remove repeated calls to getSExtValue. NFC	2022-06-27 13:42:58 -07:00
Craig Topper	5e944e9eb7	[RISCV] Refactor SelectAddrRegImm to not depend on SelectBaseAddr. SelectBaseAddr was a minor convenience to use since it already' existed for vector load/store. D128187 is going to remove the other uses of SelectBaseAddr so it has less reason to exist. This patch removes the dependency on SelectBaseAddr and adds a new SelectAddrFrameIndex to share some code with SelectFrameAddrRegImm.	2022-06-26 11:11:41 -07:00
Craig Topper	352346fa9e	[RISCV] Refactor code to remove some small wrapper methods and merge two functions together. NFC	2022-06-22 23:04:58 -07:00
Craig Topper	8780630ded	[RISCV] Merge two similar asserts from different if/else blocks. NFC	2022-06-19 19:48:50 -07:00
Craig Topper	cef03e3dcd	[RISCV] Move creation of constant pools from isel to lowering. This simplifies the isel code by removing the manual load creation. It also improves our ability to use 0 strided loads for vector splats. There is an assumption here that Mask and ShiftedMask constants are cheap enough that they don't become constant pool loads so that our isel optimizations involving And still work. I believe those constants are 3 instructions in the worst case. The rv64zbp-intrinsic.ll changes is a regression caused by intrinsics being expanded to RISCVISD also occuring during lowering. So the optimizations were only happening during the last DAGCombine, which can't see through the load. I believe we can fix this test by implementing TargetLowering::getTargetConstantFromLoad for RISC-V or by adding the intrinsic to computeKnownBitsForTargetNode to enable earlier DAG combine. Since Zbp is not a ratified extension, I don't view these as blocking this patch. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127520	2022-06-13 09:07:57 -07:00
Yeting Kuo	f68cad9087	[RISCV] Lower VLEFF/VLSEGFF SDNodes to MachineInstrs with VL outputs. The patch is a replacement of D125199. PseudoReadVL with vtype has worry for computing same vtypes of VLEFF/VLSEGFF in two different places, DAGToDAG and InsertVSETVLI. VLEFF/VLSEGFF MI with VL output still could provide the vtype of VLEFF/VLSEGFF to the users of its VL. The patch names the new pseudo as original VLEFF/VLSEGFF name suffixed "_VL" and expand them in RISCVInsertVSETVLI pass. This patch also reverts commit `4537aae0d5`, "[RISCV] Make PseudoReadVL have the vtypes of the corresponding VLEFF/VLSEGFF.". Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126794	2022-06-10 13:57:10 +08:00
Philip Reames	28be4b7454	[RISCV] Simplify InstrInfo access in doPeepholeMaskedRVV [nfc]	2022-06-09 17:02:40 -07:00
Craig Topper	cc3bd43533	[RISCV] Support LUI+ADDIW in doPeepholeLoadStoreADDI. This fixes an inconsistency between RV32 and RV64. Still considering trying to do this peephole during isel, but wanted to fix the inconsistency first. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126986	2022-06-03 18:06:56 -07:00
Craig Topper	170c550ca8	[RISCV] Use SelectionDAG::isBaseWithConstantOffset in scalar load/store address matching. Test changes are because isBaseWithConstantOffset uses computeKnownBits and that is able to see that an earlier AND instruction guaranteed alignment so that we can treat an OR as an ADD. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126970	2022-06-03 10:55:28 -07:00
Craig Topper	4402852002	[RISCV] Reduce scalar load/store isel patterns to a single ComplexPattern. NFCI Previously we had 3 different isel patterns for every scalar load store instruction. This reduces them to a single ComplexPattern that returns the Base and Offset. Or an offset of 0 if there was no offset identified I've done a similar thing for the 2 isel patterns that match add/or with FrameIndex and immediate. Using the offset of 0, I was also able to remove the custom handler for FrameIndex. Happy to split that to another patch. We might be able to enhance in the future to remove the post-isel peephole or the special handling for ADD with constant added by D126576. A nice side effect is that this removes nearly 3000 bytes from the isel table. Differential Revision: https://reviews.llvm.org/D126932	2022-06-03 09:00:17 -07:00
Craig Topper	dbead2388b	[RISCV] Add custom isel for (add X, imm) used by load/stores. If the imm is out of range for an ADDI, we will materialize it in a register using multiple instructions. If the ADD is used by a load/store, doPeepholeLoadStoreADDI can try to pull an ADDI from the constant materialization into the load/store offset. This only works if the ADD has a single use, otherwise the peephole would have to rebuild multiple nodes. This patch instead tries to solve the problem when the add is selected. We check that the add is only used by loads/stores and if it is we will select it to (ADDI (ADD X, Imm-Lo12), Lo12). This will enable the simple case in doPeepholeLoadStoreADDI that can bypass an ADDI used as a pointer. As a result we can remove the more complicated peephole from doPeepholeLoadStoreADDI. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126576	2022-06-02 13:45:32 -07:00
eopXD	2cadf84fc8	[RISCV] Pass OptLevel to `RISCVDAGToDAGISel` correctly Originally, `OptLevel` isn't passed into the `MachineFunctionPass`. This lets the default parameter of `SelectionDAGISel`, which is `CodeGenOpt::Default`, be passed in. OptLevelChanger captures the optimization level with the parameter, and rather not the value within `TargetMachine`. This lets the optimization be unintentionally overwriten if other value than `CodeGenOpt::Default` passed. This patch fixes this by passing the optimization level rather than using the default value. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D126641	2022-05-30 17:22:50 -07:00
Craig Topper	b09e54541a	[RISCV] Use template version of SignExtend64 for constant extends. NFC We were inconsistent about which one we used.	2022-05-27 13:11:15 -07:00
Craig Topper	d2ee2c9c8d	[RISCV] Add an operand kind to the opcode/imm returned from RISCVMatInt. Instead of matching opcodes to know the format to emit, use an enum value that we can get from the RISCVMatInt::Inst class. Change the consumers to use fully covered switches so that we get a compiler warning if a new kind is added. With the opcode checks it was easier to forget to update one of the 3 consumers. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126317	2022-05-24 14:56:29 -07:00
Zakk Chen	7dfc56c107	[RISCV] Add the passthru operand for RVV unmasked segment load IR intrinsics. The goal is support tail and mask policy in RVV builtins. We focus on IR part first. If the passthru operand is undef, we use tail agnostic, otherwise use tail undisturbed. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125323	2022-05-13 02:16:40 -07:00
Craig Topper	5c7ec998a9	[RISCV] Fold addiw from (add X, (addiw (lui C1, C2))) into load/store address This is a followup to D124231. We can fold the ADDIW in this pattern if we can prove that LUI+ADDI would have produced the same result as LUI+ADDIW. This pattern occurs because constant materialization prefers LUI+ADDIW for all simm32 immediates. Only immediates in the range 0x7ffff800-0x7fffffff require an ADDIW. Other simm32 immediates work with LUI+ADDI. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D124693	2022-05-11 12:47:13 -07:00
Yeting Kuo	4537aae0d5	[RISCV] Make PseudoReadVL have the vtypes of the corresponding VLEFF/VLSEGFF. The patch make PseudoReadVL have the vtypes of the corresponding VLEFF/VLSEGFF. It's useful to get the vtypes of locations of PseudoReadVL without finding the corresponding VLEFF/VLSEGFF. It could simplify optimizations in RISCVInsertVSETVLI like D123581. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125199	2022-05-11 14:07:58 +08:00
Zakk Chen	5807e59a0a	[RISCV] Fix incorrect codegen for masked vmsge{u}.vx with mask agnostic. The result was totally wrong. We could use mask undisturbed result to emulate the mask agnostic result. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D124684	2022-05-02 17:57:29 -07:00
Craig Topper	f91690f7db	[RISCV] Don't merge addi into load/store address if addi has a FrameIndex operand. This fixes a crash from D124231. We can't fold (load (add base, (addi src, off1)), off2) -> (load (add base, src), off1+off2) if the src is a FrameIndex. FrameIndex cannot be the operand of an add. There was an immediate==0 check that I think was trying to catch the common case of FrameIndex addis where the immediate is 0, but they can also appear in non-zero form. Instead explicitly check for a FrameIndex operand.	2022-04-29 18:22:20 -07:00
Hsiangkai Wang	c62b014db9	[RISCV] Merge addi into load/store as there is a ADD between them This patch adds peephole optimizations for the following patterns: (load (add base, (addi src, off1)), off2) -> (load (add base, src), off1+off2) (store val, (add base, (addi src, off1)), off2) -> (store val, (add base, src), off1+off2) Differential Revision: https://reviews.llvm.org/D124231	2022-04-29 04:33:05 +00:00
ShihPo Hung	6b55f133fb	[RISCV][RVV] Select unmasked TU RVV pseudos in a DAG post-process Following D118810 that reduced the size of ISel table, this patch optimizes allone-masked RVV pseudos with TU policy and swap them out to their unmasked TU pseudos. Since the UNDEF merge operand is not preserved, we turn it into TA pseudo regardless of the policy operand. Reviewed By: craig.topper, frasercrmck Differential Revision: https://reviews.llvm.org/D121881	2022-04-26 20:14:54 -07:00
Lian Wang	3100893f63	[RISCV] Remove sext_inreg+riscv_grev/riscv_gorc isel patterns Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D123565	2022-04-14 08:16:32 +00:00
Craig Topper	1903b99154	[RISCV] Always select (and (srl X, C), Mask) as (srli (slli X, C2), C3). SLLI is always compressible to C.SLLI as long as the source and dest register is the same. ANDI and SRLI are only compressible if the register is x8-x15. By using SLLI we have a better chance of generating shorter code. I had to exclude one exclusion for the BEXTI case so that it's pattern match could still fire. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D123336	2022-04-08 09:04:04 -07:00
Craig Topper	4477500533	[RISCV] ISel (and (shift X, C1), C2)) to shift pair in more cases Previously, these isel optimizations were disabled if the AND could be selected as a ANDI instruction. This patch disables the optimizations only if the immediate is valid for C.ANDI. If we can't use C.ANDI, we might be able to compress the shift instructions instead. I'm not checking the C extension since we have relatively poor test coverage of the C extension. Without C extension the code size should be equal. My only concern would be if the shift+andi had better latency/throughput on a particular CPU. I did have to add a peephole to match SRLIW if the input is zexti32 to prevent a regression in rv64zbp.ll. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D122701	2022-03-30 11:46:42 -07:00
Zakk Chen	b578330754	[RISCV] Use maskedoff to decide mask policy for masked compare and vmsbf/vmsif/vmsof. masked compare and vmsbf/vmsif/vmsof are always tail agnostic, we could check maskedoff value to decide mask policy rather than have a addtional policy operand. Reviewed By: craig.topper, arcbbb Differential Revision: https://reviews.llvm.org/D122456	2022-03-29 18:05:33 -07:00
Zakk Chen	10b2760da0	Revert "[RISCV] Add policy operand for masked compare and vmsbf/vmsif/vmsof IR" This reverts commit `10fd2822b7`. I have a better implementation for those operations without the additional policy operand. masked compare and vmsbf/vmsif/vmsof are always tail agnostic so we could assume undef maskedoff is mask agnostic. Differential Revision: https://reviews.llvm.org/D122455	2022-03-29 18:05:33 -07:00
Zakk Chen	10fd2822b7	[RISCV] Add policy operand for masked compare and vmsbf/vmsif/vmsof IR intrinsics. Those operations are updated under a tail agnostic policy, but they could have mask agnostic or undisturbed. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D120228	2022-03-22 07:47:21 -07:00
Craig Topper	2e10671ec7	[RISCV] Improve detection of when to skip (and (srl x, c2) c1) -> (srli (slli x, c3-c2), c3) isel. We have a special case to skip this transform if c1 is 0xffffffff and x is sext_inreg in order to use sraiw+zext.w. But we were only checking that we have a sext_inreg opcode, not how many bits are being sign extended. This commit adds a check that it is a sext_inreg from i32 so we know for sure that an sraiw can be created.	2022-03-16 14:54:34 -07:00
Zakk Chen	3be907621f	[RISCV] Fix incorrect optimization for masked vmsgeu.vi with 0 immediate. vmsgeu.vi with 0 is always true, but in the masked with mask undisturbed policy, we still need to keep inactive elelemt which come from maskedoff. We could return mask directly if it's mask agnostic policy in the future. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D121080	2022-03-06 19:22:35 -08:00
Zakk Chen	33b61c5678	[RISCV] Fix incorrect codegen introduced by D119688. We should not emit a tail agnostic vlse for a tail undisturbed vmv.s.x In D119688: - if (IsScalarMove && !Node->getOperand(0).isUndef()) + bool HasPassthruOperand = Node->getOpcode() != ISD::SPLAT_VECTOR; + if (HasPassthruOperand && !IsScalarMove && !Node->getOperand(0).isUndef()) break; The IsScalarMove check in the if statement had been changed. Differential Revision: https://reviews.llvm.org/D120963	2022-03-05 06:10:26 -08:00
Lian Wang	db85cd729a	[RISCV] Add FMV_W_X and FMV_H_X instrutions to hasAllNBitUsers Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120699	2022-03-01 08:13:59 +00:00
Chenbing Zheng	7f811ce127	[RISCV] Optimize (sext.w, srli) to sraiw with Zba. In this patch, we add a more narrower exclusion for zeroext (srl x) -> srli (slli x), so that it provides an opportunity for the selection of sraiw. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120467	2022-02-28 10:34:35 +08:00
Haocong.Lu	865fe131f8	[RISCV] Fix a mistake in PostprocessISelDAG With the condition N->use_empty(), the root node of DAG always misses peephole optimization. So a dummy node is needed. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D119934	2022-02-25 12:38:31 +00:00
Craig Topper	954fe404ab	[RISCV] Fix incorrect MemOperand copy converting splat+load to vlse. Due to an incorrect copy/paste from load intrinsic handling we checked if the splat node was a MemSDNode which of course it isn't. Instead get the MemOperand from the LoadSDNode for the source of the splat. This enables LICM to see the load is loop invariant and hoist it out of the loop. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D120014	2022-02-17 08:15:50 -08:00
Zakk Chen	eeb7754f68	[RISCV] Add the passthru operand for vmv.vv/vmv.vx/vfmv.vf IR intrinsics. Add the passthru operand for VMV_V_X_VL, VFMV_V_F_VL and SPLAT_VECTOR_SPLIT_I64_VL also. The goal is support tail and mask policy in RVV builtins. We focus on IR part first. If the passthru operand is undef, we use tail agnostic, otherwise use tail undisturbed. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D119688	2022-02-17 06:38:14 -08:00
Fangrui Song	8eb750189c	[RISCV] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds	2022-02-10 20:10:12 -08:00
Craig Topper	b861ddf365	[RISCV] Move the creation of VLMaxSentinel to isel. Use X0 during lowering. The VLMaxSentinel is represented as TargetConstant, but that's included in isa<ConstantSDNode>. To keep constant VLs and VLMax separate as long as possible, use the X0 register during lowering and only convert to VLMaxSentinel during isel. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D118845	2022-02-10 09:28:44 -08:00
Fraser Cormack	fd43d99c93	[RISCV] Pre-process FP SPLAT_VECTOR to RISCVISD::VFMV_V_F_VL This patch builds on top of D119197 to canonicalize floating-point SPLAT_VECTOR as RISCVISD::VFMV_V_F_VL as a pre-process ISel step. This primarily benefits scalable-vector VP code, where our VP patterns only match VFMV_V_F_VL to reduce the burden on our ISel patterns, but where at the same time, scalable-vector code doesn't custom-legalize SPLAT_VECTOR. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117670	2022-02-10 09:56:00 +00:00
Craig Topper	c45c1b130b	[RISCV] Teach RISCVDAGToDAGISel::selectShiftMask to replace sub from constant with neg. If the shift amount is (sub C, X) where C is 0 modulo the size of the shift, we can replace it with neg or negw. Similar is is done for AArch64 and X86. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D119089	2022-02-09 12:33:01 -08:00
Craig Topper	e305b1de7e	[RISCV] Pre-process integer ISD::SPLAT_VECTOR to RISCISD::VMV_V_X_VL before isel. This allows us to remove some isel patterns that exist for both operations. Saving nearly 3000 bytes from the isel table. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D119197	2022-02-09 08:10:21 -08:00
Fraser Cormack	6449bea508	[RISCV] Select unmasked RVV pseudos in a DAG post-process This patch drops TableGen patterns matching all-ones masked RVV pseudos in the case where there are fallback patterns matching the generic masked forms to "_MASK" pseudos. This optimization is now performed with a SelectionDAG post-processing step which peephole-optimizes these same pseudos with all-ones masks and swaps them out to their unmasked pseudos. This cuts our generated ISel table down by around ~5% (~110kB) in lieu of a far smaller auto-generated table to help with the peephole. This only targets our custom RISCVISD::*_VL binary operator nodes, which use the one form for both masked and unmasked variants. A similar approach could be used for our intrinsics but we'd need to do some work, e.g., to represent unmasked intrinsics as true-masked intrinsics at the IR or ISel level. At a rough estimate, this could save us a further 9% on the size of our ISel table for the binary intrinsic patterns alone. There is no observable impact on our tests. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118810	2022-02-09 07:50:15 +00:00
Craig Topper	5f35009996	[RISCV] Remove a ComputeNumSignBits call from an isel special case. Only isel (and (srl (sexti32 Y), c2), c1) -> (srliw (sraiw Y, 31), c3 - 32) when there is a sext_inreg present. Don't both checking for Y having 32 sign bits.	2022-02-04 23:26:53 -08:00
Craig Topper	d752ea9a72	[RISCV] Remove exclusions for zext.h/zext.w from our (and (srl X, C1), C2) selection code. This code tries to replace the pattern with a pair of shifts, but we were excluding if the And could be a zext.h or zext.w. The SLLI/SRL pair is more compressible and doesn't come with much down side. We do regress one test case in rv64i-exhaustive-w-insts.ll but we can probably add a narrower exclusion for that case.	2022-02-04 17:10:48 -08:00
Craig Topper	2349fb0312	[RISCV] Remove RISCVISD::SPLAT_VECTOR_I64 in favor of RISCVISD::VMV_V_X_VL. SPLAT_VECTOR_I64 has the same semantics as RISCVISD::VMV_V_X_VL, it just assumed VLMax instead of carrying a VL operand. Include order of RISCVInstrInfoVSDPatterns.td and RISCVInstrInfoVVLPatterns.td has been swapped to avoid moving riscv_vmv_v_x_vl into RISCVInstrInfoVSDPatterns.td and to allow moving other "_vl" SDNodes back to RISCVInstrInfoVVLPatterns.td Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D118841	2022-02-03 08:30:25 -08:00
Craig Topper	f1720abb54	[RISCV] Cleanup some places that assumed VLMaxSentinel and -1 constant mean the same thing. NFCI VLMaxSentintel happens to be represented as -1 TargetConstant. A user provided -1 would be an ISD::Constant. We shouldn't assume that they are the same thing. I'm still not entirely convinced that we should be treating -1 from the user as VLMAX. Also fix one place that failed to use XLenVT for the VLMaxSentinel, using MVT::i64 in code that only executes on RV32.	2022-02-02 12:23:12 -08:00
Alex Bradbury	588f121ada	[RISCV][NFC] Make Zb* instruction naming match the convention used elsewhere in the RISC-V backend Where the instruction mnemonic contains a dot, we name the corresponding instruction in the .td file using a _ in the place of the dot. e.g. LR_W rather than LRW. This commit updates RISCVInstrInfoZb.td to follow that convention.	2022-01-28 15:20:37 +00:00
Zakk Chen	9273378b85	[RISCV] Add the passthru operand for RVV nomask load intrinsics. The goal is support tail and mask policy in RVV builtins. We focus on IR part first. If the passthru operand is undef, we use tail agnostic, otherwise use tail undisturbed. Co-Authored-by: Hsiangkai Wang <Hsiangkai@gmail.com> Reviewers: craig.topper, frasercrmck Differential Revision: https://reviews.llvm.org/D117647	2022-01-25 17:31:36 -08:00
Fraser Cormack	d42678b453	[RISCV] Add side-effect-free vsetvli intrinsics This patch introduces new intrinsics that enable the use of vsetvli in contexts where only the returned vector length is of interest. The pre-existing intrinsics are marked with side-effects, which prevents even trivial optimizations on/across them. These intrinsics are intended to be used in situations where the vector length is fed in turn to RVV intrinsics or to vector-predication intrinsics during loop vectorization, for example. Those codegen paths ensure that instructions are generated with their own implicit vsetvli, so the vector length and vtype can be relied upon to be correct. No corresponding C builtins are planned at this stage, though that is a possibility for the future if the need arises. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117910	2022-01-24 13:52:08 +00:00
Chenbing.Zheng	9ea772ff81	[RISCV] Block vmsgeu.vi with 0 immediate in Isel For vmsgeu.vi with 0, we know this is always true. So we can replace it with vmset.m (unmasked) or vmset.m+vmand.mm (masked). Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116584	2022-01-11 03:04:44 +00:00
jacquesguan	d0554ae4cf	[RISCV] Select vl op to X0 when it is equal to ~0. Now the backend will select ~0 vl to a register and load instruction, we could use X0 to replace it. Differential Revision: https://reviews.llvm.org/D116798	2022-01-11 10:56:25 +08:00
jacquesguan	b607cd3928	[RISCV] Use vmv.s.x to build one element splat vector. When we want to create an splat vector that only the first element is initialized, we could use vmv.s.x or vfmv.s.f to build it. Differential Revision: https://reviews.llvm.org/D116277	2022-01-11 10:21:18 +08:00
Craig Topper	b645bcd98a	[RISCV] Generalize (srl (and X, 0xffff), C) -> (srli (slli X, (XLen-16), (XLen-16) + C) optimization. This can be generalized to (srl (and X, C2), C) -> (srli (slli X, (XLen-C3), (XLen-C3) + C). Where C2 is a mask with C3 trailing ones. This can avoid constant materialization for C2. This is beneficial even when C2 can be selected to ANDI because the SLLI can become C.SLLI, but C.ANDI cannot cover all the immediates of ANDI. This also enables CSE in some cases of i8 sdiv by constant codegen.	2022-01-09 23:37:10 -08:00
Craig Topper	296e8cae5c	[RISCV] Isel (sra (sext_inreg X, i16), C) -> (srai (slli X, (XLen-16), (XLen-16) + C). Similar for (sra (sext_inreg X, i8), C). With Zbb, sext_inreg of i8 and i16 are legal for sext.b and sext.h. This transform makes the Zbb codegen the same as without Zbb. The shifts are more compressible. This also exposes an opportunity for CSE with another slli in the i16 sdiv by constant codegen.	2022-01-09 21:23:43 -08:00
jacquesguan	6b8362eb8d	[RISCV] Disable EEW=64 for index values when XLEN=32. Disable EEW=64 for vector index load/store when XLEN=32. Differential Revision: https://reviews.llvm.org/D106518	2022-01-10 10:51:27 +08:00
Craig Topper	2dd52f840b	[RISCV] Fold (srl (and X, 0xffff), C)->(srli (slli X, (XLen-16), (XLen-16) + C) even with Zbb/Zbp. We can use zext.h with Zbb, but srli/slli may offer more opportunities for compression.	2022-01-09 18:42:03 -08:00
Craig Topper	fd992aac19	[RISCV] Use macros to reduce repetive switch cases. NFC These 3 switches map LMUL enum to instruction names. These follow a regular pattern. Use a macro to reduce the number of source code lines. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D116631	2022-01-05 09:00:48 -08:00
wangpc	41454ab256	[RISCV] Use constant pool for large integers For large integers (for example, magic numbers generated by TargetLowering::BuildSDIV when dividing by constant), we may need about 4~8 instructions to build them. In the same time, it just takes two instructions to load constants (with extra cycles to access memory), so it may be profitable to put these integers into constant pool. Reviewed By: asb, craig.topper Differential Revision: https://reviews.llvm.org/D114950	2021-12-31 14:48:48 +08:00
Craig Topper	015ff729cb	[RISCV] Add a few more instructions to hasAllNBitUsers.	2021-12-29 09:17:47 -08:00
Craig Topper	7598ac5ec5	[RISCV] Convert (splat_vector (load)) to vlse with 0 stride. We already do this for splat nodes that carry a VL, but not for splats that use VLMAX. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D115483	2021-12-14 09:14:03 -08:00
Craig Topper	6f7de819b9	[RISCV] Use MULHU for more division by constant cases. D113805 improved handling of i32 divu/remu on RV64. The basic idea from that can be extended to (mul (and X, C2), C1) where C2 is any mask constant. We can replace the and with an SLLI by shifting by the number of leading zeros in C2 if we also shift C1 left by XLen - lzcnt(C1) bits. This will give the full product XLen additional trailing zeros, putting the result in the output of MULHU. If we can't use ANDI, ZEXT.H, or ZEXT.W, this will avoid materializing C2 in a register. The downside is it make take 1 additional instruction to create C1. But since that's not on the critical path, it can hopefully be interleaved with other operations. The previous tablegen pattern is replaced by custom isel code. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D115310	2021-12-09 09:10:14 -08:00
Zakk Chen	0649dfebba	[RISCV] Rename some assembler mnemonic and intrinsic functions for RVV 1.0. Rename vpopc/vmandnot/vmornot to vcpop/vmandn/vmorn assembler mnemonic. Reviewed By: frasercrmck, jrtc27, craig.topper Differential Revision: https://reviews.llvm.org/D111062	2021-11-04 10:08:01 -07:00
Craig Topper	1387483e72	[RISCV] Replace most uses of RISCVSubtarget::hasStdExtV. NFCI Add new hasVInstructions() which is currently equivalent. Replace vector uses of hasStdExtZfh/F/D with new vector specific versions. The vector spec no longer requires that the vectors implement the same types as scalar. It only requires that the scalar type is the maximum size the vectors can support. This is currently implemented using the scalar rule we were using before. Add new hasVInstructionsI64() begin using to qualify code that requires i64 vector elements. This is all NFC for now, but we can start using this to better implement D112408 which introduces the Zve extensions. Reviewed By: frasercrmck, eopXD Differential Revision: https://reviews.llvm.org/D112496	2021-10-27 19:33:48 -07:00
Ben Shi	4fe5ab4b00	[RISCV] Optimize immediate materialisation with SHADD Use SH1ADD/SH2ADD/SH3ADD along with LUI+ADDI to compose int323, int325 and int329. Reviewed By: craig.topper, luismarques Differential Revision: https://reviews.llvm.org/D111484	2021-10-15 06:46:41 +00:00
Hsiangkai Wang	80a6456306	[RISCV] Update to vlm.v and vsm.v according to v1.0-rc1. vle1.v -> vlm.v vse1.v -> vsm.v Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D106044	2021-10-05 21:49:54 +08:00
Craig Topper	715cf6ffb9	[RISCV] Add another isel optimization for (and (shl X, c2), c1). Where c1 is a shifted mask with 32-c2 leading zeros and c3 trailing zeros and c3>c2. We can select it as (slli (srliw X, c3-c2), c3).	2021-09-24 15:10:25 -07:00
Hsiangkai Wang	7d39a8a921	[RISCV] (1/2) Add the tail policy argument to builtins/intrinsics. Add the tail policy argument to LLVM IR intrinsics. There are two policies for tail elements. Tail agnostic means users do not care about the values in the tail elements and tail undisturbed means the values in the tail elements need to be kept after the operation. In order to let users control the tail policy, we add an additional argument at the end of the argument list. For unmasked operations, we have no maskedoff and the tail policy is always tail agnostic. If users want to keep tail elements under unmasked operations, they could use all one mask in the masked operations to do it. So, we only add the additional argument for masked operations for most cases. There are exceptions listed below. In this patch, we do not handle the following cases to reduce the complexity of the patch. There could be two separate patches for them. * Use dest argument to control tail policy vmerge.vvm/vmerge.vxm/vmerge.vim (add _t builtins with additional dest argument) vfmerge.vfm (add _t builtins with additional dest argument) vmv.v.v (add _t builtins with additional dest argument) vmv.v.x (add _t builtins with additional dest argument) vmv.v.i (add _t builtins with additional dest argument) vfmv.v.f (add _t builtins with additional dest argument) vadc.vvm/vadc.vxm/vadc.vim (add _t builtins with additional dest argument) vsbc.vvm/vsbc.vxm (add _t builtins with additional dest argument) * Always has tail argument for masked/unmasked intrinsics Vector Single-Width Integer Multiply-Add Instructions (add _t and _mt builtins) Vector Widening Integer Multiply-Add Instructions (add _t and _mt builtins) Vector Single-Width Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins) Vector Widening Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins) Vector Reduction Operations (add _t and _mt builtins) Vector Slideup Instructions (add _t and _mt builtins) Vector Slidedown Instructions (add _t and _mt builtins) Discussion: https://github.com/riscv/rvv-intrinsic-doc/pull/101 Differential Revision: https://reviews.llvm.org/D105092	2021-09-24 17:09:50 +08:00
Craig Topper	70f50114f3	[RISCV] Add another isel optimization for (and (shl x, c2), c1) Turn (and (shl x, c2), c1) -> (slli (srli x, c3-c2), c3) if c1 is a shifted mask with no leading zeros and c3 trailing zeros where c3 is greater than c2.	2021-09-23 14:18:07 -07:00
Craig Topper	4a69551d66	[RISCV] Add more isel optimizations for (and (shr x, c2), c1). Turn (and (shr x, c2), c1) -> (slli (srli x, c2+c3), c3) if c1 is a shifted mask with c2 leading zeros and c3 trailing zeros. When the leading zeros is C2+32 we can use SRLIW in place of SRLI.	2021-09-23 11:29:04 -07:00
Craig Topper	f0a422f935	[RISCV] Add fcvt.s.w(u)/fcvt.d.w(u)/fcvt.h.w(u) to hasAllNBitUsers These instructions only read the lower 32 bits of their input.	2021-09-22 14:24:26 -07:00
Craig Topper	73e5b9ea90	[RISCV] Select (srl (sext_inreg X, i32), uimm5) to SRAIW if only lower 32 bits are used. SimplifyDemandedBits can turn srl into sra if the bits being shifted in aren't demanded. This patch can recover the original sra in some cases. I've renamed the tablegen class for detecting W users since the "overflowing operator" term I originally borrowed from Operator.h does not include srl. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D109162	2021-09-16 11:03:35 -07:00
Craig Topper	9af8f1b18e	[SelectionDAG] Add isZero/isAllOnes methods to ConstantSDNode. Soft deprecrate isNullValue/isAllOnesValue and update in tree callers. This matches the changes to the APInt interface from D109483. Reviewed By: lattner Differential Revision: https://reviews.llvm.org/D109535	2021-09-09 13:28:30 -07:00
Craig Topper	b5fd6b46f5	[RISCV] Teach instruction selection to elide sext.w in some cases. If a sext_inreg is up for isel, and all its users are W instructions, we can skip emitting the sext_inreg. This helpful if the producing instruction can't become a W instruction. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D108966	2021-09-02 07:54:34 -07:00
Craig Topper	e4e69ba4d1	[RISCV] Split PseudoVSETVLI into 2 instructions to allow different register classes for rs1. X0 has special meaning for vsetvli, we need to make sure we never create it a vsetvli that uses it by accident. This could happen if the register coalescer coalesces a copy from X0 into this instruction. This patch splits the instruction so that we can have GPRNoX0 register class to use for the cases where we don't want the source to be X0. The verifier won't let us explicitly use X0 on a GPRNoX0 operand so we need a separate pseudo for those cases. I don't currently have a failing example for this. There was a failure in D107957, but the coalescable copy from that example should have been optimized away much earlier so I've fixed that. This is not a complete fix. We still need to prevent the same possible issue on the AVL operand of all of the vector instruction pseudos. I don't want to make two versions of all of those so we need to find a different solution for those. I have an idea I'm going to try. Differential Revision: https://reviews.llvm.org/D109110	2021-09-02 07:45:31 -07:00
Craig Topper	3f9b37ccb1	[RISCV] Remove sext_inreg+add/sub/mul/shl isel patterns. Let the sext_inreg be selected to sext.w. Remove unneeded sext.w during PostProcessISelDAG. This gives opportunities for some other isel patterns to match like the ADDIPair or matching mul with immediate to shXadd. This becomes possible after D107658 started selecting W instructions based on users. The sext.w will be considered a W user so isel will often select a W instruction for the sext.w input and we can just remove the sext.w. Otherwise we can combine the sext.w with a ADD/SUB/MUL/SLLI to create a new W instruction in parallel to the the original instruction. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D107708	2021-08-18 11:07:11 -07:00
Craig Topper	20e6265873	[RISCV] Improve constant materialization for stores of i16 or i32 negative constants. DAGCombiner::visitStore can clear the upper bits of constants used by stores. This leads prevents them from being recognized as sign extended negative values making them more expensive to materialize. This patch uses the hasAllNBitUsers method from D107658 to make a negative constant if none of the users care about the upper bits. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D108052	2021-08-18 10:25:12 -07:00
Craig Topper	d9ba1a9c5c	[RISCV] Teach isel to select ADDW/SUBW/MULW/SLLIW when only the lower 32-bits are used. We normally select these when the root node is a sext_inreg, but SimplifyDemandedBits can sometimes bypass the sext_inreg for some users. This can create situation where sext_inreg+add/sub/mul/shl is selected to a W instruction, and then the add/sub/mul/shl is separately selected to a non-W instruction with the same inputs. This patch tries to detect when it would still be ok to use a W instruction without the sext_inreg by checking the direct users. This can allow the W instruction to CSE with one created for a sext_inreg+add/sub/mul/shl. To minimize complexity and cost of checking, we make no attempt to determine if the CSE will happen and just always use a W instruction when we can. Differential Revision: https://reviews.llvm.org/D107658	2021-08-18 10:22:00 -07:00
Craig Topper	98d4adc2d1	[RISCV] Add custom isel to select (and (srl X, C1), C2) and (and (shl X, C1), C2) Replace some existing isel patterns that are covered by the new code. SLLIUWPat has been removed in favor of folding its root case into the new code. The other uses in isel patterns for shXadd.uw have been switched to using hardcoded AND masks. This is based on the original version of D49585 from ARM. The final version of that was made a DAG combine, but I've chosen to keep it as custom isel. I'm not convinced DAG combine is as good with shift pairs as it is with and+shift. I saw some issues optimizing the shifts created by vscale lowering if an and isn't created for from a shift pair. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D106230	2021-07-20 08:53:55 -07:00
Craig Topper	4dbb788068	[RISCV] Teach constant materialization that it can use zext.w at the end with Zba to reduce number of instructions. If the upper 32 bits are zero and bit 31 is set, we might be able to use zext.w to fill in the zeros after using an lui and/or addi. Most of this patch is plumbing the subtarget features into the constant materialization. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D105509	2021-07-16 09:35:56 -07:00
Craig Topper	a37cf17834	[RISCV] Add explicit copy to V0 in the masked vmsge(u).vx intrinsic handling. This is consistent with our other masked vector instructions. Previously we found cases where not doing this broke fast reg alloc.	2021-06-23 08:04:42 -07:00
Craig Topper	420bd5ee8e	[RISCV] Use ComputeNumSignBits/MaskedValueIsZero in RISCVDAGToDAGISel::selectSExti32/selectZExti32. This helps us select W instructions in more cases. Most of the affected tests have had the sign_extend_inreg or AND folded into sextload/zextload. Differential Revision: https://reviews.llvm.org/D104079	2021-06-10 19:06:45 -07:00
Craig Topper	7c4e9a6826	[RISCV] Use 0 for Log2SEW for vle1/vse1 intrinsics to enable vsetvli optimization. Missed in D103299.	2021-06-07 22:41:14 -07:00
Craig Topper	9b92ae01ee	[RISCV] Store Log2 of EEW in the vector load/store intrinsic to pseudo lookup tables. NFCI This uses 3 bits of data instead of 7. I'm wondering if we can use bitfields for the lookup table key where this would matter. I also name the shift_amount template to log2 since it is used with more than just an srl now.	2021-06-07 15:47:45 -07:00
Craig Topper	c653711fd3	[RISCV] Teach vsetvli insertion pass that operations on masks don't care about SEW/LMUL. All that really matters is that the VLMAX of the preceding instructions is the same as the VLMAX required by the mask operation. Also update the vmsge(u) handling to use the SEW/LMUL we use for other mask register operations. We were matching it to the compare before. Some cases will be improve if we fix masked compares to use tail agnostic policy. I think they ignore the tail policy anyway. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D103299	2021-06-04 09:17:46 -07:00
Craig Topper	9065118b64	[RISCV] Optimize SEW=64 shifts by splat on RV32. SEW=64 shifts only uses the log2(64) bits of shift amount. If we're splatting a 64 bit value in 2 parts, we can avoid splatting the upper bits and just let the low bits be sign extended. They won't be read anyway. For the purposes of SelectionDAG semantics of the generic ISD opcodes, if hi was non-zero or bit 31 of the low is 1, the shift was already undefined so it should be ok to replace high with sign extend of low. In order do be able to find the split i64 value before it becomes a stack operation, I added a new ISD opcode that will be expanded to the stack spill in PreprocessISelDAG. This new node is conceptually similar to BuildPairF64, but I expanded earlier so that we could go through regular isel to get the right VLSE opcode for the LMUL. BuildPairF64 is expanded in a CustomInserter. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D102521	2021-05-26 10:23:32 -07:00
Craig Topper	0a34ff8bcb	[RISCV] Replace AddiPair ComplexPattern with a PatLeaf. NFC The ComplexPattern is looking for an immediate in a certain range that has a single use. This can be handled with a PatLeaf since we aren't matching multiple patterns or checking any complicated relationships between nodes. This shrinks the isel table a little bit since tablegen no longer has to generate patterns with commuted operands. With the PatLeaf, tablegen can see we're matching an immediate which should always be on the right hand side of add. Reviewed By: benshi001 Differential Revision: https://reviews.llvm.org/D102510	2021-05-16 12:17:52 -07:00
Craig Topper	9c345407b4	[RISCV] Remove RISCVII:VSEW enum. Make encodeVYPE operate directly on SEW. The VSEW encoding isn't a useful value to pass around. It's better to use SEW or log2(SEW) directly. The only real ugliness is that the vsetvli IR intrinsics use the VSEW encoding, but it's easy enough to decode that when the intrinsic is processed.	2021-05-12 13:19:08 -07:00
Evandro Menezes	3a64b7080d	[RISCV] Move instruction information into the RISCVII namespace (NFC) Move instruction attributes into the `RISCVII` namespace and add associated helper functions. Differential Revision: https://reviews.llvm.org/D102268	2021-05-11 16:32:42 -05:00
Craig Topper	191ffda3f7	[RISCV] Remove unused ComplexPatterns. NFC	2021-05-06 12:17:41 -07:00
Craig Topper	ba63cdb8f2	[RISCV] Store SEW in RISCV vector pseudo instructions in log2 form. This shrinks the immediate that isel table needs to emit for these instructions. Hoping this allows me to change OPC_EmitInteger to use a better variable length encoding for representing negative numbers. Similar to what was done a few months ago for OPC_CheckInteger. The alternative encoding uses less bytes for negative numbers, but increases the number of bytes need to encode 64 which was a very common number in the RISCV table due to SEW=64. By using Log2 this becomes 6 and is no longer a problem.	2021-05-02 12:09:20 -07:00
Craig Topper	ce09dd54e6	[RISCV] Select 5 bit immediate for VSETIVLI during isel rather than peepholing in the custom inserter. This adds a special operand type that is allowed to be either an immediate or register. By giving it a unique operand type the machine verifier will ignore it. This perturbs a lot of tests but mostly it is just slightly different instruction orders. Something bad did happen to some min/max reduction tests. We're spilling vector registers when we weren't before. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D101246	2021-04-27 14:38:16 -07:00
Craig Topper	e2cd92cb9b	[RISCV] Match splatted load to scalar load + splat. Form strided load during isel. This modifies my previous patch to push the strided load formation to isel. This gives us opportunity to fold the splat into a .vx operation first. Using a scalar register and a .vx operation reduces vector register pressure which can be important for larger LMULs. If we can't fold the splat into a .vx operation, then it can make sense to use a strided load to free up the vector arithmetic ALU to do actual arithmetic rather than tying it up with vmv.v.x. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D101138	2021-04-26 13:32:03 -07:00

1 2 3 4 5 ...

262 Commits