Let the sext_inreg be selected to sext.w. Remove the unneeded sext.w
during PostprocessISelDAG.
This gives some other isel patterns an opportunity to match, like
the ADDIPair or matching a mul with an immediate to shXadd.
This becomes possible after D107658 started selecting W instructions
based on their users. The sext.w will be considered a W user, so isel
will often select a W instruction for the sext.w input and we can
just remove the sext.w. Otherwise we can combine the sext.w with
an ADD/SUB/MUL/SLLI to create a new W instruction in parallel
to the original instruction.
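As a standalone illustration of why the sext.w is redundant once its
input is a W instruction (a minimal sketch in plain C++, not the actual
backend code):
  #include <cstdint>
  // addw computes the 32-bit sum and sign-extends it to 64 bits.
  int64_t addw(int64_t a, int64_t b) {
    return (int64_t)(int32_t)(uint32_t)((uint32_t)a + (uint32_t)b);
  }
  // sext.w (addiw rd, rs, 0) sign-extends the low 32 bits of its input.
  int64_t sext_w(int64_t x) { return (int64_t)(int32_t)x; }
  // sext_w(addw(a, b)) == addw(a, b) for all inputs, so once the input of
  // a sext.w has been selected to ADDW/SUBW/MULW/SLLIW, the sext.w is a
  // no-op and can simply be deleted.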
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D107708
DAGCombiner::visitStore can clear the upper bits of constants
used by stores. This prevents them from being recognized as
sign-extended negative values, making them more expensive to
materialize.
This patch uses the hasAllNBitUsers method from D107658 to form
a negative constant if none of the users care about the upper bits.
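For example (a hedged sketch of the idea using a hypothetical
store_low32 helper): on RV64, -1 is a single addi, while the truncated
constant 0x00000000FFFFFFFF needs a longer sequence, yet a 32-bit store
of either value writes identical bytes:
  #include <cstdint>
  #include <cstring>
  // A 32-bit store reads only the low 32 bits of its source register, so
  // storing the sign-extended -1 writes the same bytes as storing the
  // truncated 0xFFFFFFFF constant.
  void store_low32(uint8_t *p, int64_t v) {
    uint32_t lo = (uint32_t)v;
    std::memcpy(p, &lo, sizeof(lo));
  }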
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D108052
We normally select these when the root node is a sext_inreg, but
SimplifyDemandedBits can sometimes bypass the sext_inreg for some
users. This can create a situation where sext_inreg+add/sub/mul/shl
is selected to a W instruction, and then the add/sub/mul/shl is
separately selected to a non-W instruction with the same inputs.
This patch tries to detect when it would still be ok to use a W
instruction without the sext_inreg by checking the direct users.
This can allow the W instruction to CSE with one created for a
sext_inreg+add/sub/mul/shl. To minimize complexity and cost of
checking, we make no attempt to determine if the CSE will happen
and just always use a W instruction when we can.
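The property being relied on can be illustrated in plain C++ (a sketch,
not the actual hasAllNBitUsers code):
  #include <cstdint>
  // add and addw produce the same low 32 bits; they differ only in how
  // the upper 32 bits are filled (addw sign-extends its 32-bit result).
  uint32_t low32_of_add (uint64_t a, uint64_t b) { return (uint32_t)(a + b); }
  uint32_t low32_of_addw(uint64_t a, uint64_t b) {
    int64_t w = (int64_t)(int32_t)(uint32_t)(a + b);  // what addw writes
    return (uint32_t)w;
  }
  // These are equal for all inputs, so when every user of an add only
  // reads the low 32 bits, the W form can be used without changing the
  // result.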
Differential Revision: https://reviews.llvm.org/D107658
Replace some existing isel patterns that are covered by the new
code. SLLIUWPat has been removed in favor of folding its root case
into the new code. The other uses in isel patterns for shXadd.uw
have been switched to using hardcoded AND masks.
This is based on the original version of D49585 from ARM. The final
version of that was made a DAG combine, but I've chosen to keep it
as custom isel. I'm not convinced DAG combine is as good with
shift pairs as it is with and+shift. I saw some issues optimizing
the shifts created by vscale lowering if an and isn't created
from a shift pair.
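For reference, the two forms in question compute the same thing (a
minimal illustration, not the actual patterns):
  #include <cstdint>
  // A shift pair like this...
  uint64_t zext_lo32_via_shifts(uint64_t x) { return (x << 32) >> 32; }
  // ...computes the same value as the and form, which the existing
  // and+shift combines handle well.
  uint64_t zext_lo32_via_and(uint64_t x) { return x & 0xFFFFFFFFu; }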
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D106230
If the upper 32 bits are zero and bit 31 is set, we might be able to
use zext.w to fill in the zeros after using an lui and/or addi.
Most of this patch is plumbing the subtarget features into the constant
materialization.
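A rough sketch of the kind of constant this helps with, assuming the
Zba zext.w (add.uw rd, rs, zero):
  #include <cstdint>
  // lui produces a sign-extended 32-bit value on RV64.
  int64_t lui(uint32_t imm20) { return (int64_t)(int32_t)(imm20 << 12); }
  // zext.w (add.uw rd, rs, zero from Zba) zero-extends the low 32 bits.
  uint64_t zext_w(int64_t x) { return (uint32_t)x; }
  // 0x00000000FFFF0000 has bit 31 set and zero upper bits:
  //   lui(0xFFFF0)  -> 0xFFFFFFFFFFFF0000
  //   zext_w(...)   -> 0x00000000FFFF0000
  // i.e. two instructions instead of a longer shift-based sequence.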
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D105509
This helps us select W instructions in more cases. Most of the
affected tests have had the sign_extend_inreg or AND folded into
sextload/zextload.
Differential Revision: https://reviews.llvm.org/D104079
This uses 3 bits of data instead of 7. I'm wondering if we can use
bitfields for the lookup table key where this would matter.
I also renamed the shift_amount template to log2 since it is used
with more than just an srl now.
All that really matters is that the VLMAX of the preceding
instructions is the same as the VLMAX required by the mask
operation.
Also update the vmsge(u) handling to use the SEW/LMUL we use for
other mask register operations. We were matching it to the compare
before. Some cases would improve if we fixed masked compares to
use a tail agnostic policy. I think they ignore the tail policy
anyway.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D103299
SEW=64 shifts only use the low log2(64) bits of the shift amount. If we're
splatting a 64 bit value in 2 parts, we can avoid splatting the
upper bits and just let the low bits be sign extended. They won't
be read anyway.
For the purposes of the SelectionDAG semantics of the generic ISD opcodes,
if the high half was non-zero or bit 31 of the low half was 1, the shift
was already undefined, so it should be ok to replace the high half with a
sign extension of the low half.
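The scalar analogue of that argument, as a sketch:
  #include <cstdint>
  // A SEW=64 shift only reads the low log2(64) = 6 bits of each amount.
  uint64_t sll64(uint64_t v, uint64_t amt) { return v << (amt & 63); }
  // If the original 64-bit amount had a non-zero upper word, or bit 31 of
  // the lower word set, the amount was >= 64 and the generic ISD shift was
  // already undefined, so substituting sign-extend(low word) for the high
  // word cannot change any defined result.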
In order to be able to find the split i64 value before it becomes
a stack operation, I added a new ISD opcode that will be expanded
to the stack spill in PreprocessISelDAG. This new node is conceptually
similar to BuildPairF64, but it is expanded earlier so that we can
go through regular isel to get the right VLSE opcode for the LMUL.
BuildPairF64 is expanded in a CustomInserter.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D102521
The ComplexPattern is looking for an immediate in a certain range
that has a single use. This can be handled with a PatLeaf since
we aren't matching multiple patterns or checking any complicated
relationships between nodes.
This shrinks the isel table a little bit since tablegen no longer
has to generate patterns with commuted operands. With the PatLeaf,
tablegen can see we're matching an immediate which should always
be on the right hand side of add.
Reviewed By: benshi001
Differential Revision: https://reviews.llvm.org/D102510
The VSEW encoding isn't a useful value to pass around. It's better
to use SEW or log2(SEW) directly. The only real ugliness is that
the vsetvli IR intrinsics use the VSEW encoding, but it's easy
enough to decode that when the intrinsic is processed.
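For reference, decoding it is trivial (a small sketch):
  // VSEW encodes SEW = 8 * 2^VSEW, so SEW = 8 << VSEW and
  // log2(SEW) = VSEW + 3.
  unsigned decodeVSEW(unsigned vsew) { return 8u << vsew; }
  // e.g. decodeVSEW(3) == 64 for SEW=64.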
This shrinks the immediate that the isel table needs to emit for these
instructions. Hoping this allows me to change OPC_EmitInteger to
use a better variable length encoding for representing negative
numbers. Similar to what was done a few months ago for OPC_CheckInteger.
The alternative encoding uses fewer bytes for negative numbers, but
increases the number of bytes needed to encode 64, which was a very
common number in the RISCV table due to SEW=64. By using Log2 this
becomes 6 and is no longer a problem.
This adds a special operand type that is allowed to be either
an immediate or a register. By giving it a unique operand type the
machine verifier will ignore it.
This perturbs a lot of tests but mostly it is just slightly different
instruction orders. Something bad did happen to some min/max reduction
tests. We're spilling vector registers when we weren't before.
Reviewed By: khchen
Differential Revision: https://reviews.llvm.org/D101246
This modifies my previous patch to push the strided load formation
to isel. This gives us the opportunity to fold the splat into a .vx
operation first. Using a scalar register and a .vx operation reduces
vector register pressure which can be important for larger LMULs.
If we can't fold the splat into a .vx operation, then it can make
sense to use a strided load to free up the vector arithmetic
ALU to do actual arithmetic rather than tying it up with vmv.v.x.
Reviewed By: khchen
Differential Revision: https://reviews.llvm.org/D101138
We can have RISCVISelDAGToDAG.cpp call the VT only version by
finding the RISCVTargetLowering object via the Subtarget.
Make the static versions just global static functions in
RISCVISelLowering that can be called by static functions in that
file.
These instructions don't really exist, but we have ways we can
emulate them.
.vv will swap operands and use vmsle(u).vv. .vi will adjust the
immediate and use vmsgt(u).vi when possible. For .vx we need to
use some of the multiple instruction sequences from the V extension
spec.
For unmasked vmsge(u).vx we use:
vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd
For cases where mask and maskedoff are the same value, we have
vmsge{u}.vx v0, va, x, v0.t. This is the vd==v0 case, which
requires a temporary, so we use:
vmslt{u}.vx vt, va, x; vmandnot.mm vd, vd, vt
For other masked cases we use this sequence:
vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0
We trust that register allocation will prevent vd in vmslt{u}.vx
from being v0 since v0 is still needed by the vmxor.
Differential Revision: https://reviews.llvm.org/D100925
This patch adds more optimized codegen for the above SETCC forms,
by matching the '.vi' vector forms when the immediate is a 5-bit signed
immediate plus 1. The immediate can be decremented and the corresponding
SET[U]LE or SET[U]GT forms can be matched.
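The scalar intuition (a minimal sketch; the real patterns operate on
splatted vector immediates):
  #include <cstdint>
  // There is no vmslt.vi or vmsge.vi, but when the splatted constant is
  // c + 1 with c a valid simm5, the compare can be rewritten:
  //   x <  c + 1   ==   x <= c   (vmsle.vi)
  //   x >= c + 1   ==   x >  c   (vmsgt.vi)
  bool lt_form(int64_t x, int64_t c) { return x < c + 1; }
  bool le_form(int64_t x, int64_t c) { return x <= c; }  // same result
                                                         // while c + 1 doesn't overflow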
This work was left as a TODO from D94168.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D100096
Many of the operands are handled the same or in the same order
for all these intrinsics. Factor out the code for selecting and
pushing them into the Operands vector.
Differential Revision: https://reviews.llvm.org/D99923
I missed a few intrinsics in 3dd4aa7d09
when I did this for masked loads and masked segment loads/stores.
Found while trying to share more code between these custom isel
functions.
This adds a new integer materialization strategy mainly targeted
at 64-bit constants like 0xffffffff where there are 32 or more trailing
ones with leading zeros. We can materialize these by using an addi -1
and srli to restore the leading zeros. This matches what gcc does.
I haven't limited it to just these cases though. The implementation
here takes the constant, shifts out all the leading zeros and
shifts ones into the LSBs, creates the new sequence, adds an srli,
and checks if this is shorter than our original strategy.
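Concretely, for 0x00000000FFFFFFFF the new sequence is (shown as a plain
C++ sketch):
  #include <cstdint>
  uint64_t materialize_0xFFFFFFFF() {
    int64_t all_ones = -1;            // addi a0, zero, -1
    return (uint64_t)all_ones >> 32;  // srli a0, a0, 32 -> 0x00000000FFFFFFFF
  }
  // Two instructions, shorter than what the original recursive strategy
  // produces for this constant.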
I've separated the recursive portion into a standalone function
so I could append the new strategy outside of the recursion. Since
external users are no longer using the recursive function, I've
cleaned up the external interface to return the sequence instead of
taking a vector by reference.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D98821
This matches what we do in our isel patterns. In our internal
testing we've found this is needed to make the fast register
allocator happy at -O0. Otherwise it may assign V0 to an earlier
operand and find itself with no registers left when it reaches
the mask operand. By using V0 explicitly, the fast register allocator
will see it when it checks for phys register usages before it
starts allocating vregs. I'll try to update this with a test case.
Unfortunately, this does appear to prevent some instruction reordering
by the pre-RA scheduler which leads to the increased spills seen in
some tests. I suspect that problem could already occur for other
instructions that already used V0 directly.
There's a lot of repeated code here that could do with some
wrapper functions. Not sure if that should be at the level of the
new code that deals with V0. That would require multiple output
parameters to pass the glue, chain and register back. Maybe it
should be at a higher level over the entire set of push_backs.
Reviewed By: frasercrmck, HsiangKai
Differential Revision: https://reviews.llvm.org/D99367
We look for this pattern frequently in isel patterns so it's a
good idea to try to preserve it.
This also lets us remove our special isel handling for srliw
and use a direct pattern match of (srl (and X, 0xffffffff), C)
since no bits will be removed from the and mask.
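The underlying equivalence, as a scalar sketch:
  #include <cstdint>
  // srliw shifts the low 32 bits right and sign-extends the 32-bit result.
  int64_t srliw(int64_t x, unsigned c) {
    return (int64_t)(int32_t)((uint32_t)x >> c);
  }
  // For c in 1..31 the shifted 32-bit value has bit 31 clear, so the sign
  // extension fills with zeros and srliw(x, c) == ((x & 0xffffffff) >> c),
  // which is exactly the (srl (and X, 0xffffffff), C) pattern.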
Differential Revision: https://reviews.llvm.org/D99042
Previously we used selectImm for RV64 and isel patterns for
RV32. This should be NFC, but will allow RV32 and RV64 to share
improvements in the future. For example, it might be useful to
use BSETI from Zbs to make single bit constants.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D98877
This patch addresses a compiler crash resulting from passing a
fixed-length type to one that expects scalable vector types. An
assertion was added to prevent this regressing in the future.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97868
This patch unifies the two disparate paths for lowering INSERT_SUBVECTOR
operations under one roof. Consequently, with this patch it is possible to
support any fixed-length subvector insertion, not just "cast-like" ones.
As before, support for the insertion of mask vectors will come in a
separate patch.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97543
This will allow FrameIndex as the base address instead of
emitting a separate ADDI from isel. eliminateFrameIndex will likely turn
it back into an ADDI, but this makes things consistent with the
SDPatterns and VLPatterns.
I only tested one case for simplicity. I can test more if reviewers
want.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97221
This patch unifies the two disparate paths for lowering
EXTRACT_SUBVECTOR operations under one roof. Consequently, with this
patch it is possible to support any fixed-length subvector extraction,
not just "cast-like" ones.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97192
We just started using a ComplexPattern for sexti32. This updates
zexti32 to match.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D97231
This patch extends the support for RVV INSERT_SUBVECTOR to cover those
which don't align to a vector register boundary. Like the support for
EXTRACT_SUBVECTOR in D96959, it accomplishes this by extracting the
nearest register-sized subvector (a subregister operation), then sliding
the vector down with VSLIDEDOWN, inserting the subvector to the first
position, and sliding the vector back up again afterwards.
Unlike subvector extraction, for vectors that occupy less than a full
vector register we must preserve the untouched elements. We do this by
lowering to an LMUL=1 INSERT_SUBVECTOR using the above method and
lowering that to a VSLIDEUP with a zero offset. This uses a
tail-undisturbed policy and so has the effect of "sliding in" the
subvector elements while preserving the surrounding ones.
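Very roughly, the effect of a tail-undisturbed VSLIDEUP can be modelled
on plain arrays (a simplification that ignores VL and LMUL details;
slideup_tu is just an illustrative name):
  #include <array>
  #include <cstddef>
  // Writes src's elements starting at 'offset' and leaves every other
  // element of dest untouched, which is how the subvector is "slid in"
  // while the surrounding elements are preserved.
  template <std::size_t N, std::size_t M>
  std::array<int, N> slideup_tu(std::array<int, N> dest,
                                const std::array<int, M> &src,
                                std::size_t offset) {
    for (std::size_t i = 0; i < M && offset + i < N; ++i)
      dest[offset + i] = src[i];
    return dest;
  }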
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96972
An i64 AssertZExt from a type smaller than i32 has at least 33
leading zeros, which means it has at least 33 sign bits.
Since we have a couple patterns that use two sexti32, I've
switched to a ComplexPattern so tablegen didn't have to generate
9 different permutations.
As noted in the FIXME, maybe we should just call computeNumSignBits,
but we don't have tests that benefit from that yet.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D97130
This should fix the issue reported in D96972.
I don't have a good test case for this without those changes.
Differential Revision: https://reviews.llvm.org/D97082
A previous patch moved the index versions. This moves the rest.
I also removed the custom lowering for VLEFF since we can now
do everything directly in the isel handling.
I had to update getLMUL to handle mask registers to index the
pseudo table correctly for VLE1/VSE1.
This is good for another 15K reduction in llc size.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97097
This patch extends the support for RVV EXTRACT_SUBVECTOR to cover those
which don't align to a vector register boundary. It accomplishes this by
extracting the nearest register-sized subvector (a subregister
operation), then sliding the vector down with VSLIDEDOWN and extracting
the subvector from the first position (a COPY operation).
Since this procedure involves the use of VSCALE and multiplication, the
handling of such operations is done during lowering to simplify the
implementation and make use of DAG combining. This necessitated moving
some helper functions from RISCVISelDAGToDAG to RISCVTargetLowering.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96959
We don't currently create memory operands for these intrinsics,
but there was a suggestion of using the indexed load/store
intrinsics to implement isel for scalable vector gather/scatter.
That may propagate the memory operand from the gather/scatter
ISD nodes.
We had more combinations of data and index lmuls than we needed.
Also add some asserts to verify that the IndexVT and data VT have
the same element count when we isel these pseudo instructions.
There are many legal combinations of index and data VTs supported
for these intrinsics. This results in a lot of isel patterns in
RISCVGenDAGISel.inc.
By adding a separate table similar to what we use for segment
load/stores, we can more efficiently manually select these
intrinsics. We should also be able to reuse this table for scalable
vector gather/scatter.
This reduces the llc binary size by ~56K.
Reviewed By: khchen
Differential Revision: https://reviews.llvm.org/D97033
Just like we do for isel patterns, we need to call selectVLOp
to prevent 0 from being selected to X0 by the default isel.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97021
Intrinsic ID is a 32-bit value which made each row of the table 4
byte aligned. The remaining fields used 5 bytes. This meant 3 bytes
of padding per row.
This patch breaks the table into 4 separate tables and indexes them
by properties we know about the intrinsic. NF, masked,
strided, ordered, etc. The indexed load/store tables have no
padding in their rows now.
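The layout savings, sketched with hypothetical row structs:
  #include <cstdint>
  // A row keyed by the 32-bit intrinsic ID is padded to 4-byte alignment:
  struct RowWithID { uint32_t IntrinsicID; uint8_t Fields[5]; };
  // Indexing by known properties (NF, masked, strided, ordered, ...)
  // instead lets the row drop the ID and pack tightly:
  struct RowNoID { uint8_t Fields[5]; };
  static_assert(sizeof(RowWithID) == 12, "3 bytes of padding per row");
  static_assert(sizeof(RowNoID) == 5, "no padding");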
All together this reduces the size of llc binary by ~28K.
I'm considering adding similar tables for isel of non-segment
load/store as well to cut down the size of the isel table and
probably improve our isel performance. Those tables would need to
be indexed from intrinsics, IR loads/stores, gathers/scatters, and
RISCVISD opcodes. So having a table that can be indexed without using
intrinsic ID is more flexible.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D96894