llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	0a34ff8bcb	[RISCV] Replace AddiPair ComplexPattern with a PatLeaf. NFC The ComplexPattern is looking for an immediate in a certain range that has a single use. This can be handled with a PatLeaf since we aren't matching multiple patterns or checking any complicated relationships between nodes. This shrinks the isel table a little bit since tablegen no longer has to generate patterns with commuted operands. With the PatLeaf, tablegen can see we're matching an immediate which should always be on the right hand side of add. Reviewed By: benshi001 Differential Revision: https://reviews.llvm.org/D102510	2021-05-16 12:17:52 -07:00
Hsiangkai Wang	b41e1306b8	[RISCV] Add the DebugLoc parameter to getVLENFactoredAmount(). The MachineBasicBlock::iterator is continuously changing during generating the frame handling instructions. We should use the DebugLoc from the caller, instead of getting it from the changing iterator. If the prologue instructions located in a basic block without any other instructions after these prologue instructions, the iterator will be updated to the boundary of the basic block and it is invalid to use the iterator to access DebugLoc. This patch also fixes the crash when accessing DebugLoc using the iterator. Differential Revision: https://reviews.llvm.org/D102386	2021-05-14 21:31:06 +08:00
Craig Topper	9c345407b4	[RISCV] Remove RISCVII:VSEW enum. Make encodeVYPE operate directly on SEW. The VSEW encoding isn't a useful value to pass around. It's better to use SEW or log2(SEW) directly. The only real ugliness is that the vsetvli IR intrinsics use the VSEW encoding, but it's easy enough to decode that when the intrinsic is processed.	2021-05-12 13:19:08 -07:00
Evandro Menezes	3a64b7080d	[RISCV] Move instruction information into the RISCVII namespace (NFC) Move instruction attributes into the `RISCVII` namespace and add associated helper functions. Differential Revision: https://reviews.llvm.org/D102268	2021-05-11 16:32:42 -05:00
Fangrui Song	ec27c5f170	[RISCV] Prefer to lower MC_GlobalAddress operands to .Lfoo$local Similar to X86 D73230 and AArch64 D101872 With this change, we can set dso_local in clang's -fpic -fno-semantic-interposition mode, for default visibility external linkage non-ifunc-non-COMDAT definitions. For such dso_local definitions, variable access/taking the address of a function/calling a function will go through a local alias to avoid GOT/PLT. Reviewed By: jrtc27, luismarques Differential Revision: https://reviews.llvm.org/D101875	2021-05-11 11:29:45 -07:00
Craig Topper	ce6e4f27dd	[RISCV] Use fractional LMULs for fixed length types smaller than riscv-v-vector-bits-min. My thought process is that if v2i64 is an LMUL=1 type then v2i32 should be an LMUL=1/2 type. We limit the fractional LMUL so that SEW=64 clips to LMUL=1, SEW=32 clips to LMUL=1/2, etc. This ensures there's always a fractional LMUL available to truncate a type. This does reduce the number of vsetvlis in some cases. Some tests increase vsetvlis because the best container type for a mask type is dependent on the LMUL+SEW that the mask was produced from, but you can't tell that from the type. I think this is something we need to solve this in the machine IR when optimizing vsetvlis. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D101215	2021-05-11 09:42:48 -07:00
Craig Topper	dc00cbb505	[RISCV] Match trunc_vector_vl+sra_vl/srl_vl with splat shift amount to vnsra/vnsrl. Limited to splats because we would need to truncate the shift amount vector otherwise. I tried to do this with new ISD nodes and a DAG combine to avoid such a large pattern, but we don't form the splat until LegalizeDAG and need DAG combine to remove a scalable->fixed->scalable cast before it becomes visible to the shift node. By the time that happens we've already visited the truncate node and won't revisit it. I think I have an idea how to improve i64 on RV32 I'll save for a follow up. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D102019	2021-05-11 09:29:31 -07:00
Hsiangkai Wang	d8ec2b183e	[RISCV] Fix the calculation of the offset of Zvlsseg spilling. For Zvlsseg spilling, we need to convert the pseudo instructions into multiple vector load/store instructions with appropriate offsets. For example, for PseudoVSPILL3_M2, we need to convert it to VS2R %v2, %base ADDI %base, %base, (vlenb x 2) VS2R %v4, %base ADDI %base, %base, (vlenb x 2) VS2R %v6, %base We need to keep the size of the offset in the pseudo spilling instructions. In this case, it is (vlenb x 2). In the original implementation, we use the size of frame objects divide the number of vectors in zvlsseg types. The size of frame objects is not necessary exactly the same as the spilling data. It may be larger than it. So, we change it to (VLENB x LMUL) in this patch. The calculation is more direct and easy to understand. Differential Revision: https://reviews.llvm.org/D101869	2021-05-11 10:13:18 +08:00
Craig Topper	80b9510806	[RISCV] Correct VL for fixed length masked scatter. We were incorrectly calling getVectorNumElements on a scalable vector type. This shouldn't be allowed. This gives a warning on EVT, but not MVT.	2021-05-10 09:50:08 -07:00
Fraser Cormack	6db0cedd23	[LegalizeVectorOps][RISCV] Add scalable-vector SELECT expansion This patch extends VectorLegalizer::ExpandSELECT to permit expansion also for scalable vector types. The only real change is conditionally checking for BUILD_VECTOR or SPLAT_VECTOR legality depending on the vector type. We can use this to fix "cannot select" errors for scalable vector selects on the RISCV target. Note that in future patches RISCV will possibly custom-lower vector SELECTs to VSELECTs for branchless codegen. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102063	2021-05-10 08:22:35 +01:00
Zakk Chen	446ed6394b	[RISCV][NFC] Don't need to create a new STI in RISCVAsmPrinter. RISCVAsmPrinter already has MCSubtargetInfo. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D101889	2021-05-10 09:33:23 +08:00
Craig Topper	191ffda3f7	[RISCV] Remove unused ComplexPatterns. NFC	2021-05-06 12:17:41 -07:00
Craig Topper	a577d59db2	[RISCV] Minor vector instruction tablegen cleanup. NFC Use result_type for the IMPLICIT_DEF in masked vector patterns. This doesn't matter today because result_type and op_type are always the same. Use multiclass inheritance to reduce repeated code.	2021-05-06 11:23:59 -07:00
Craig Topper	6660319cef	[RISCV] Remove unused RISCV::VLEFF and VLEFF_MASK. NFC Looks like these got left behind when vleff isel was moved to X86ISelDAGToDAG.cpp	2021-05-06 09:41:29 -07:00
Craig Topper	58323be415	[RISCV] Cleanup instruction formats used for B extension ternary operations. Rename RVInstR4 as used by F/D/Zfh extensions to RVInstR4Frm. Introduce new RVInstR4 that takes funct3 as a parameter. Add new format classes for FSRI and FSRIW instead of trying to bend RVInstR4 to use a shamt overlayed on rs2 and funct2. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D100427	2021-05-06 08:59:05 -07:00
Saleem Abdulrasool	ba5c122647	RISSCV: clang-format RISC-V AsmParser (NFC) This corrects a few issues identified by `clang-format`. This is meant to be preparation for a subsequent change.	2021-05-05 10:16:41 -07:00
Fraser Cormack	efc31be7f8	[RISCV][NFC] Fix up pseudoinstruction name in comment	2021-05-05 16:40:28 +01:00
Fraser Cormack	6f17613bfb	[RISCV][VP] Lower VP ISD nodes to RVV instructions This patch supports all of the current set of VP integer binary intrinsics by lowering them to to RVV instructions. It does so by using the existing RISCVISD *_VL custom nodes as an intermediate layer. Both scalable and fixed-length vectors are supported by using this method. One notable change to the existing vector codegen strategy is that scalable all-ones and all-zeros mask SPLAT_VECTORs are now lowered to RISCVISD VMSET_VL and VMCLR_VL nodes to match their fixed-length BUILD_VECTOR counterparts. This allows them to reuse the existing "all-ones" VL patterns. To reduce the size of the phabricator diff, some tests are intentionally left out and will be added later if the patch is accepted. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101826	2021-05-05 12:32:24 +01:00
Fraser Cormack	cd6a52fede	[RISCV] Cap legal fixed-length vectors to 256-element types Previously, RISC-V would make legal all fixed-length vectors types whose size are less than or equal to some function of the minimum value of VLEN and the maximum-permissible LMUL grouping. Due to vector legalization issues, this patch instead caps the legal fixed-length vector types to those with 256 elements. This value was chosen because it is the longest vector length which has corresponding MVTs across all supported element types. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101839	2021-05-05 09:51:08 +01:00
Fraser Cormack	46fa214a6f	[RISCV] Lower splats of non-constant i1s as SETCCs This patch adds support for splatting i1 types to fixed-length or scalable vector types. It does so by lowering the operation to a SETCC of the equivalent i8 type. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101465	2021-05-04 09:14:05 +01:00
Fraser Cormack	d23e4f6872	[RISCV] Add support for fmin/fmax vector reductions Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101518	2021-05-03 10:33:51 +01:00
Craig Topper	ba63cdb8f2	[RISCV] Store SEW in RISCV vector pseudo instructions in log2 form. This shrinks the immediate that isel table needs to emit for these instructions. Hoping this allows me to change OPC_EmitInteger to use a better variable length encoding for representing negative numbers. Similar to what was done a few months ago for OPC_CheckInteger. The alternative encoding uses less bytes for negative numbers, but increases the number of bytes need to encode 64 which was a very common number in the RISCV table due to SEW=64. By using Log2 this becomes 6 and is no longer a problem.	2021-05-02 12:09:20 -07:00
Fraser Cormack	791766e6d2	[RISCV] Support STEP_VECTOR with a step greater than one DAGCombiner was recently taught how to combine STEP_VECTOR nodes, meaning the step value is no longer guaranteed to be one by the time it reaches the backend for lowering. This patch supports such cases on RISC-V by lowering to other step values to a multiply following the vid.v instruction. It includes a small optimization for common cases where the multiply can be expressed as a shift left. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D100856	2021-04-30 09:36:18 +01:00
luxufan	5603ed60ad	[RISCV] Fix StackOffset calculation when using sp to access the fixed stack object in the case of rvv vector objects existed When rvv vector objects existed, using sp to access the fixed stack object will pass the rvv vector objects field. So the StackOffset needs add a scalable offset of the size of rvv vector objects field Differential Revision: https://reviews.llvm.org/D100286	2021-04-30 11:02:38 +08:00
Craig Topper	dcdda2bdf2	[RISCV] Teach DAG combine to fold (and (select_cc lhs, rhs, cc, -1, c), x) -> (select_cc lhs, rhs, cc, x, (and, x, c)) Similar for or/xor with 0 in place of -1. This is the canonical form produced by InstCombine for something like `c ? x & y : x;` Since we have to use control flow to expand select we'll usually end up with a mv in basic block. By folding this we may be able to pull the and/or/xor into the block instead and avoid a mv instruction. The code here is based on code from ARM that uses this to create predicated instructions. I'm doing it on SELECT_CC so it happens late, but we could do it on select earlier which is what ARM does. I'm not sure if we lose any combine opportunities if we do it earlier. I left out add and sub because this can separate sext.w from the add/sub. It also made a conditional i64 addition/subtraction on RV32 worse. I guess both of those would be fixed by doing this earlier on select. The select-binop-identity.ll test has not been commited yet, but I made the diff show the changes to it. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D101485	2021-04-29 09:43:51 -07:00
Craig Topper	0c330afdfa	[RISCV] Enable SPLAT_VECTOR for fixed vXi64 types on RV32. This replaces D98479. This allows type legalization to form SPLAT_VECTOR_PARTS so we don't lose the splattedness when the scalar type is split. I'm handling SPLAT_VECTOR_PARTS for fixed vectors separately so we can continue using non-VL nodes for scalable vectors. I limited to RV32+vXi64 because DAGCombiner::visitBUILD_VECTOR likes to form SPLAT_VECTOR before seeing if it can replace the BUILD_VECTOR with other operations. Especially interesting is a splat BUILD_VECTOR of the extract_vector_elt which can become a splat shuffle, but won't if we form SPLAT_VECTOR first. We either need to reorder visitBUILD_VECTOR or add visitSPLAT_VECTOR. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D100803	2021-04-29 08:20:09 -07:00
Craig Topper	25391cec3a	[RISCV] Teach computeKnownBits that vsetvli returns number less than 2^31. This seems like a reasonable upper bound on VL. WG discussions for the V spec would probably allow us to use 2^16 as an upper bound on VLEN, but this is good enough for now. This allows us to remove sext and zext if user happens to assign the size_t result into an int and then uses it as a VL intrinsic argument which is size_t. Reviewed By: frasercrmck, rogfer01, arcbbb Differential Revision: https://reviews.llvm.org/D101472	2021-04-29 08:07:59 -07:00
Fraser Cormack	43ad058a01	[RISCV] Fix stack slot for argument types (Bug 49500) This is an complementary/alternative fix for D99068. It takes a slightly different approach by explicitly summing up all of the required split part type sizes and ensuring we allocate enough space for them. It also takes the maximum alignment of each part. Compared with D99068 there are fewer changes to the stack objects in existing tests. However, @luismarques has shown in that patch that there are opportunities to reduce our stack usage in the future. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D99087	2021-04-29 09:10:48 +01:00
Craig Topper	1d4d6a9616	[RISCV] Add explanatory comment to RISCVOp::OPERAND_AVL.	2021-04-28 09:55:36 -07:00
Craig Topper	ce09dd54e6	[RISCV] Select 5 bit immediate for VSETIVLI during isel rather than peepholing in the custom inserter. This adds a special operand type that is allowed to be either an immediate or register. By giving it a unique operand type the machine verifier will ignore it. This perturbs a lot of tests but mostly it is just slightly different instruction orders. Something bad did happen to some min/max reduction tests. We're spilling vector registers when we weren't before. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D101246	2021-04-27 14:38:16 -07:00
Craig Topper	262a72f50f	[RISCV] Use stack slot to handle SPLAT_VECTOR_PARTS on RV32. Reduces the amount of vector ALU operations and reduces vector register pressure.	2021-04-26 15:43:02 -07:00
Craig Topper	e2cd92cb9b	[RISCV] Match splatted load to scalar load + splat. Form strided load during isel. This modifies my previous patch to push the strided load formation to isel. This gives us opportunity to fold the splat into a .vx operation first. Using a scalar register and a .vx operation reduces vector register pressure which can be important for larger LMULs. If we can't fold the splat into a .vx operation, then it can make sense to use a strided load to free up the vector arithmetic ALU to do actual arithmetic rather than tying it up with vmv.v.x. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D101138	2021-04-26 13:32:03 -07:00
Craig Topper	837442de9c	[RISCV] Cleanup setOperationAction calls for INTRINSIC_WO_CHAIN/INTRINSIC_W_CHAIN We have several extensions that need i32 to be Custom for INTRINSIC_WO_CHAIN with RV64 so enable it for all RV64. For V extension, make i32 Custom for RV64 and i64 Custom for RV32. When the i32 or i64 is legal, the operation action doesn't matter. LegalizeDAG checks MVT::Other rather than the real type.	2021-04-25 23:44:28 -07:00
Ben Shi	60ed86d350	[RISCV] Optimize addition with immediate Reviewed by: craig.topper Differential Revision: https://reviews.llvm.org/D101244	2021-04-26 13:26:17 +08:00
Craig Topper	8f5cd49405	[RISCV] Teach DAG combine what bits Zbp instructions demanded from their inputs. This teaches DAG combine that shift amount operands for grev, gorc shfl, unshfl only read a few bits. This also teaches DAG combine that grevw, gorcw, shflw, unshflw, bcompressw, bdecompressw only consume the lower 32 bits of their inputs. In the future we can teach SimplifyDemandedBits to also propagate demanded bits of the output to the inputs in some cases.	2021-04-25 21:54:06 -07:00
Levy Hsu	8cf54c7ff5	[RISCV] [1/2] Add IR intrinsic for Zbe extension RV32/64: bcompress bdecompress RV64 ONLY: bcompressw bdecompressw Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101143	2021-04-25 19:14:34 -07:00
Craig Topper	bd28d86119	[RISCV] Removed getLMULForFixedLengthVector. Use getContainerForFixedLengthVector and getRegClassIDForVecVT to get the register class to use when making a fixed vector type legal. Inline it into the other two call sites. I'm looking into using fractional lmul for fixed length vectors and getLMULForFixedLengthVector returned an integer making it unable to express this. I considered returning the LMUL enum, but that seemed like it would introduce more complexity to convert it for use.	2021-04-23 16:56:46 -07:00
Craig Topper	bcf321015b	[RISCV] Move getLMULForFixedLengthVector out of RISCVSubtarget. Make it a static function RISCVISelLowering, the only place it is used. I think I'm going to make this return a fractional LMULs in some cases so I'm sorting out where it should live before I start making changes.	2021-04-23 15:06:20 -07:00
Craig Topper	baa107f018	[RISCV] Only expose one interface for getContainerForFixedLengthVector in the RISCVTargetLowering class We can have RISCVISelDAGToDAG.cpp call the VT only version by finding the RISCVTargetLowering object via the Subtarget. Make the static versions just global static functions in RISCVISelLowering that can be called by static functions in that file.	2021-04-23 15:06:10 -07:00
Craig Topper	3064a63b2b	[RISCV] Remove GetVRegNoV0 from the output register class of masked compare pseudo instructions. Theses instructions are allowed to write v0 when they are masked. We'll still never use v0 because of the earlyclobber constraint so this doesn't really help anything. It just makes the definitions correct. While I was there remove an unused multiclass I noticed. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D101118	2021-04-23 09:33:29 -07:00
Craig Topper	fae1d31c09	[RISCV] Have assembler check that the temp register is different than dest register for vmsgeu.vx pseudo. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D101015	2021-04-23 09:33:29 -07:00
Sander de Smalen	f9a50f04ba	[TTI] NFC: Change getIntImmCost[Inst\|Intrin] to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D100565	2021-04-23 16:06:36 +01:00
Fraser Cormack	83b8f8da82	[RISCV] Custom lower vector F(MIN\|MAX)NUM to vf(min\|max) This patch adds support for both scalable- and fixed-length vector code lowering of the llvm.minnum and llvm.maxnum intrinsics to the equivalent RVV instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101035	2021-04-23 12:22:15 +01:00
Levy Hsu	b49337bbb9	[RISCV] [1/2] Add IR intrinsic for Zbp extension RV32/64: grev grevi gorc gorci shfl shfli unshfl unshfli RV64 ONLY: grevw greviw gorcw gorciw shflw shfli (For non-existing shfliw) unshfli (For non-existing unshfliw) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100830	2021-04-22 16:34:51 -07:00
Craig Topper	e01c419ecd	[RISCV] Add IR intrinsics for vmsge(u).vv/vx/vi. These instructions don't really exist, but we have ways we can emulate them. .vv will swap operands and use vmsle().vv. .vi will adjust the immediate and use .vmsgt(u).vi when possible. For .vx we need to use some of the multiple instruction sequences from the V extension spec. For unmasked vmsge(u).vx we use: vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd For cases where mask and maskedoff are the same value then we have vmsge{u}.vx v0, va, x, v0.t which is the vd==v0 case that requires a temporary so we use: vmslt{u}.vx vt, va, x; vmandnot.mm vd, vd, vt For other masked cases we use this sequence: vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0 We trust that register allocation will prevent vd in vmslt{u}.vx from being v0 since v0 is still needed by the vmxor. Differential Revision: https://reviews.llvm.org/D100925	2021-04-22 10:44:38 -07:00
Craig Topper	d77d56acfd	[RISCV] Add missing tests for vector type for second operand of vmsgt and vmsgtu IR intrinsics. Refactor to use new multiclass instead of individual patterns. We already supported this due to SEW=64 on RV32, but we didn't have test cases for all the types we supported. Part of D100925	2021-04-22 10:44:38 -07:00
Craig Topper	9524a0553d	[RISCV] Support vector type for second operand of vmfge and vmfgt IR intrinsics. We don't have instructions for these, but can swap the operands to use vmle/vmflt. This makes the IR interface more consistent and simplifies the frontend implementation. Part of D100925	2021-04-22 10:44:38 -07:00
Craig Topper	70254ccb69	[RISCV] Turn splat shuffles of vector loads into strided load with stride of x0. Implementations are allowed to optimize an x0 stride to perform less memory accesses. This is the case in SiFive cores. No idea if this is the case in other implementations. We might need a tuning flag for this. Reviewed By: frasercrmck, arcbbb Differential Revision: https://reviews.llvm.org/D100815	2021-04-22 10:02:57 -07:00
Craig Topper	77f14c96e5	[RISCV] Use stack temporary to splat two GPRs into SEW=64 vector on RV32. Rather than doing splatting each separately and doing bit manipulation to merge them in the vector domain, copy the data to the stack and splat it using a strided load with x0 stride. At least on some implementations this vector load is optimized to not do a load for each element. This is equivalent to how we move i64 to f64 on RV32. I've only implemented this for the intrinsic fallbacks in this patch. I think we do similar splatting/shifting/oring in other places. If this is approved, I'll refactor the others to share the code. Differential Revision: https://reviews.llvm.org/D101002	2021-04-22 09:50:07 -07:00
Serge Pavlov	740962e5d0	[RISCV] Custom lowering of SET_ROUNDING Differential Revision: https://reviews.llvm.org/D91242	2021-04-22 15:04:55 +07:00
Craig Topper	58c5b4c2c3	[RISCV] Use TargetConstant for condition code of RISCVISD::SELECT_CC. The value is always an immediate and can never be in a register. This the kind of thing TargetConstant is for. Saves a step GenDAGISel to convert a Constant to a TargetConstant.	2021-04-21 23:08:52 -07:00
Serge Pavlov	6e63dfdae2	[RISCV] Custom lowering of FLT_ROUNDS_ Differential Revision: https://reviews.llvm.org/D90854	2021-04-22 11:39:15 +07:00
Craig Topper	f6d8cf7798	[RISCV] Teach lowerSPLAT_VECTOR_PARTS to detect cases where Hi is sign extended from Lo. This recognizes the case when Hi is (sra Lo, 31). We can use SPLAT_VECTOR_I64 rather than splatting the high bits and combining them in the vector register.	2021-04-21 20:24:23 -07:00
Craig Topper	023b243d1d	[RISCV] Cleanup up the spec version references around fmaxnum/fminnum. This previously made references to 2.3-draft which was a short lived version number in 2017. It was replaced by date based versions leading up to ratification. This patch uses the latest ratified version number and just says what the behavior is. Nothing here is in flux. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D100878	2021-04-21 14:50:29 -07:00
Craig Topper	a8822caa1b	[RISCV] Temporary in vmsge(u).vx pseudo instructions can't be V0. This was checked in some asserts, but not enforced by the instruction matching. There's still a second bug that we don't check that vt and vd are different registers, but that will require custom checking. Differential Revision: https://reviews.llvm.org/D100928	2021-04-21 14:50:29 -07:00
Fraser Cormack	3f02d26943	[RISCV] Further fixes for RVV stack offset computation This patch fixes a case missed out by D100574, in which RVV scalable stack offset computations may require three live registers in the case where the offset's fixed component is 12 bits or larger and has a scalable component. Instead of adding an additional emergency spill slot, this patch further optimizes the scalable stack offset computation sequences to reduce register usage. By emitting the sequence to compute the scalable component before the fixed component, we can free up one scratch register to be reallocated by the sequence for the fixed component. Doing this saves one register and thus one additional emergency spill slot. Compare: $x5 = LUI 1 $x1 = ADDIW killed $x5, -1896 $x1 = ADD $x2, killed $x1 $x5 = PseudoReadVLENB $x6 = ADDI $x0, 50 $x5 = MUL killed $x5, killed $x6 $x1 = ADD killed $x1, killed $x5 versus: $x5 = PseudoReadVLENB $x1 = ADDI $x0, 50 $x5 = MUL killed $x5, killed $x1 $x1 = LUI 1 $x1 = ADDIW killed $x1, -1896 $x1 = ADD $x2, killed $x1 $x1 = ADD killed $x1, killed $x5 Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D100847	2021-04-21 10:51:07 +01:00
Zakk Chen	ad0fe5db2f	[RISCV][MC] Mask load should not have VMConstraint. Add a test, dest register could be v0. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D100825	2021-04-21 15:21:37 +08:00
Serge Pavlov	d20a2376d8	[RISCV] Introduce floating point control and state registers New registers FRM, FFLAGS and FCSR was defined. They represent corresponding system registers. The new registers are necessary to properly order floating point instructions in non-default modes. Differential Revision: https://reviews.llvm.org/D99083	2021-04-21 12:55:30 +07:00
Ben Shi	30e2c7be99	[RISCV] Refactor an optimization of addition with immediate Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100769	2021-04-20 18:04:25 +08:00
Fraser Cormack	b4a358a7ba	[RISCV] Fix missing emergency slots for scalable stack offsets This patch adds an additional emergency spill slot to RVV code. This is required as RVV stack offsets may require an additional register to compute. This patch includes an optimization by @HsiangKai <kai.wang@sifive.com> to reduce the number of registers required for the computation of stack offsets from 3 to 2. Otherwise we'd need two additional emergency spill slots. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D100574	2021-04-20 09:59:41 +01:00
Zakk Chen	d5fa71e9ec	[RISCV] Handle PseudoVRELOAD and PseudoVSPILL in getInstSizeInBytes. It's necessary to calculate correct instruction size because PseudoVRELOAD and PseudoSPILL will be expanded into multiple instructions. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D100702	2021-04-19 22:30:03 -07:00
Craig Topper	87afefcd22	[RISCV] Fix mistake in comment. NFC	2021-04-19 11:15:32 -07:00
Craig Topper	7ed01a420a	[RISCV] Pad v4i1/v2i1/v1i1 stores with 0s to make a full byte. As noted in the FIXME there's a sort of agreement that the any extra bits stored will be 0. The generated code is pretty terrible. I was really hoping we could use a tail undisturbed trick, but tail undisturbed no longer applies to masked destinations in the current draft spec. Fingers crossed that it isn't common to do this. I doubt IR from clang or the vectorizer would ever create this kind of store. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D100618	2021-04-19 11:05:18 -07:00
Fraser Cormack	c9a93c3e01	[RISCV] Lower vector shuffles to vrgather operations This patch extends the lowering of RVV fixed-length vector shuffles to avoid the default stack expansion and instead lower to vrgather instructions. For "permute"-style shuffles where one vector is swizzled, we can lower to one vrgather. For shuffles involving two vector operands, we lower to one unmasked vrgather (or splat, where appropriate) followed by a masked vrgather which blends in the second half. On occasion, when it's not possible to create a legal BUILD_VECTOR for the indices, we use vrgatherei16 instructions with 16-bit index types. For 8-bit element vectors where we may have indices over 255, we have a fairly blunt fallback to the stack expansion to avoid custom-splitting of the vector types. To enable the selection of masked vrgather instructions, this patch extends the various RISCVISD::VRGATHER nodes to take a passthru operand. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100549	2021-04-19 11:13:13 +01:00
Jim Lin	2893570e86	[RISCV] Don't emit save-restore call if function is a interrupt handler It has to save all caller-saved registers before a call in the handler. So don't emit a call that save/restore registers. Reviewed By: simoncook, luismarques, asb Differential Revision: https://reviews.llvm.org/D100532	2021-04-16 12:54:47 +08:00
Craig Topper	1656df13da	[RISCV] Share RVInstIShift and RVInstIShiftW instruction format classes with the B extension. This generalizes RVInstIShift/RVInstIShiftW to take the upper 5 or 7 bits of the immediate as an input instead of only bit 30. Then we can share them. For RVInstIShift I left a hardcoded 0 at bit 26 where RV128 gets a 7th bit for the shift amount. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D100424	2021-04-15 11:08:28 -07:00
Craig Topper	c3f1271464	[RISCV] Add a PatFrag to shorten repeated (XLenVT (VLOp GPR:$vl)) in V extension patterns. Reduces the amount of changes needed in D100288.	2021-04-14 22:36:35 -07:00
ShihPo Hung	d5e962f1f2	[RISCV] Implement COPY for Zvlsseg registers When copying Zvlsseg register tuples, we split the COPY to NF whole register moves as below: $v10m2_v12m2 = COPY $v4m2_v6m2 # NF = 2 => $v10m2 = PseudoVMV2R_V $v4m2 $v12m2 = PseudoVMV2R_V $v6m2 This patch copies forwardCopyWillClobberTuple from AArch64 to check register overlapping. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D100280	2021-04-13 18:55:51 -07:00
root	645ce31c20	Title: [RISCV] Add missing part of instruction vmsge {u}. VX Review By: craig.topper Differential Revision : https://reviews.llvm.org/D100115	2021-04-14 06:41:59 +08:00
Craig Topper	6aa6f748ae	[RISCV] Add a generic PatGprImm class and use it to simplify patterns in RISCVInstrInfoB.td. NFC	2021-04-13 12:07:24 -07:00
Craig Topper	cb073f1bc0	[RISCV] Make use of PatGprGpr and PatGpr in RISCVInstrInfoB.td. NFC	2021-04-13 12:06:58 -07:00
Craig Topper	1afdfc6169	[RISCV] Rename RISCVISD::GREVI(W)/GORCI(W) to RISCVISD::GREV(W)/GORC(W). Don't require second operand to be a constant. Prep work for adding intrinsics for these instructions in the future.	2021-04-13 11:04:28 -07:00
Sander de Smalen	fd1f8a5462	[TTI] NFC: Change getGatherScatterOpCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100200	2021-04-13 14:20:59 +01:00
Craig Topper	7c9bbbf735	[RISCV] Rename RISCVISD::SHFLI to RISCVISD::SHFL and don't require the second operand to be an immediate. Prep work for adding intrinsics in the future. Left an assert that the input is constant in ReplaceNodeResults, as the intrinsic shouldn't go through that path.	2021-04-12 23:46:50 -07:00
Fraser Cormack	d737c47137	[RISCV] Support vector SET[U]LT and SET[U]GE with splatted immediates This patch adds more optimized codegen for the above SETCC forms, by matching the '.vi' vector forms when the immediate is a 5-bit signed immediate plus 1. The immediate can be decremented and the corresponding SET[U]LE or SET[U]GT forms can be matched. This work was left as a TODO from D94168. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100096	2021-04-12 18:36:45 +01:00
Jim Lin	a3bfddbb6a	[RISCV][NFC] Remove unneeded explict XLenVT type on codegen patterns Customized SDNode has been specified the explict XLenVT type. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100190	2021-04-12 10:16:06 +08:00
Craig Topper	cb4c793e46	[RISCV] Update computeKnownBitsForTargetNode to treat READ_VLENB as being 16 byte aligned. According to the 0.10 spec, VLEN is at least 128 bits and is a power of 2.	2021-04-11 17:54:23 -07:00
Craig Topper	ff902080a9	[RISCV] Use SLLI/SRLI instead of SLLIW/SRLIW for (srl (and X, 0xffff), C) custom isel on RV64. We don't need the sign extending behavior here and SLLI/SRLI are able to compress to C.SLLI/C.SRLI.	2021-04-11 13:59:51 -07:00
Craig Topper	3ae71226ef	[RISCV] Drop earlyclobber constraint from vwadd(u).wx, vwsub(u).wx, vfwadd.wf and vfwsub.wf. The first source has the same EEW as the destination and the other source is a scalar so the overlap constraints don't apply to the unmasked version. For the masked version we have a constraint that the destination can't be V0 so that covers the only overlap issue there. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D100217	2021-04-11 10:19:45 -07:00
Craig Topper	bc0e052730	[RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffff) when zext.h is supported. Similar to what we do for zext.w. Disable the (srl (and X, 0xffff), C) custom isel when zext.h is available.	2021-04-11 10:03:35 -07:00
Jim Lin	7eaa2810c4	[RISCV][NFC] Replace explicit type i64 with riscv customized SDTypeProfile. New SDTypeProfile can be reused for other word operation patterns without explicit i64 type in the future. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100097	2021-04-09 17:06:17 +08:00
Jim Lin	6169f1537c	[RISCV][NFC] Fix formatting	2021-04-09 14:41:09 +08:00
Jim Lin	49c79e3b56	[RISCV][NFC] Add explicit type i64 to RV64 only patterns. Add explicit type i64 to RV64 only patterns to stop emitting unneeded i32 patterns. It can reduce the isel table size. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100089	2021-04-09 09:37:04 +08:00
Craig Topper	872931e5d8	[RISCV] Use multiclass inheritance where possible for the VPat* multiclasses in RISVInstrInfoVPseudos. NFCI Instead of instantiating multiclasses inside multiclasses, just inherit from them. We can do the same for the VPseudo* multiclasses, but that may interfere with the scheduler class work.	2021-04-08 15:14:06 -07:00
Craig Topper	ac347a8a0f	[RISCV] Remove empty string after 'defm' at top level of vector .td files. NFC This doesn't do anything so it's just wasted characters. I have other plans for the ones in multiclasses.	2021-04-08 15:14:06 -07:00
Levy Hsu	461b554999	[RISCV] Add InstAlias for Zbb Zbp and Zbs extension Add InstAlias that allows the last operand to be an imm for following instructions: 1. Zbb or Zbp: - ror - rorw (RV64 Only) 2. Zbs - best - bclr - binv - bext Reviewed By: craig.topper, jrtc27 Differential Revision: https://reviews.llvm.org/D100083	2021-04-08 11:51:31 -07:00
Fraser Cormack	a5693445ca	[RISCV] Support OR/XOR/AND reductions on vector masks This patch adds RVV codegen support for OR/XOR/AND reductions for both scalable- and fixed-length vector types. There are a few possible codegen strategies for each -- vmfirst.m, vmsbf.m, and vmsif.m could be used to some extent -- but the vpopc.m instruction was chosen since it produces the scalar result in one instruction, after which scalar instructions can finish off the computation. The reductions are lowered identically for both scalable- and fixed-length vectors, although some alternate strategies may be more optimal on fixed-length vectors since it's cheaper to get the length of those types. Other reduction types were not deemed to be relevant for mask vectors. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100030	2021-04-08 09:46:38 +01:00
Hsiangkai Wang	ba72bdef32	[RISCV] Add scalable offset under very large stack size. If the stack size is larger than 12 bits, we have to use a scratch register to store the stack size. Before we introduce the scalable stack offset, we could simplify %0 = ADDI %stack.0, 0 => %scratch = ... # sequence of instructions to move the offset into %%scratch %0 = ADD %fp, %scratch However, if the offset contains scalable part, we need to consider it. %0 = ADDI %stack.0, 0 => %scratch = ... # sequence of instructions to move the offset into %%scratch %scratch = ADD %fp, %scratch %scalable_offset = ... # sequence of instructions for vscaled-offset. %0 = ADD/SUB %scratch, %scalable_offset Differential Revision: https://reviews.llvm.org/D100035	2021-04-08 14:46:05 +08:00
Serge Pavlov	65b1103798	[RISCV] DAG nodes and pseudo instructions for CSR access New custom DAG nodes were added to represent operations on CSR. These nodes are lowered to corresponding pseudo instruction. Using the pseudo instructions allows to specify different scheduling information for operations on different system registers. It also make possible to specify dependencies of instructions on specific system registers. Differential Revision: https://reviews.llvm.org/D98936	2021-04-08 10:36:36 +07:00
Craig Topper	56ea2e2fdd	[RISCV] Add a special case to lowerSELECT for select of 2 constants with a SETLT condition. If the constants have a difference of 1 we can convert one to the other by adding or subtracting the condition. We have a DAG combine for this, but it only runs before type legalization. If the select is introduced later during type legalization or op legalization we will miss it. We don't need a specific condition, but some conditions are harder to materialize than others on RISCV. I know that SETLT will be a single instruction and it is what is used by the motivating pattern from signed saturating add/sub. Differential Revision: https://reviews.llvm.org/D99021	2021-04-07 13:47:17 -07:00
Craig Topper	9895285191	[RISCV] Replace 'return ReplaceNode' with 'ReplaceNode; return;' NFC ReplaceNode is a void function as is the function that we were doing this in. While this is valid code, it was a bit confusing.	2021-04-07 12:18:41 -07:00
Craig Topper	f087d7544a	[RISCV] Support vslide1up/down intrinsics for SEW=64 on RV32. This can't use our normal strategy of splatting the scalar and using a .vv operation instead of .vx. Instead this patch bitcasts the vector to the equivalent SEW=32 vector and inserts the scalar parts using two vslide1up/down. We do that unmasked and apply the mask separately at the end with a vmerge. For vslide1up there maybe some other options here like getting i64 into element 0 and using vslideup.vi with this vector as vd and the original source as vs1. Masking would still need to be done afterwards. That idea doesn't work for vslide1down. We need to slidedown and then insert a single scalar at vl-1 which we could do with a vslideup, but that assumes vl > 0 which I don't think we can assume. The i32 double slide1down implemented here is the best I could come up with and I just made vslide1up consistent. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D99910	2021-04-07 10:44:53 -07:00
Craig Topper	01a23dccb1	[RISCV] Add an assertion to the ReplaceNodeResults handling of bitcasts to make sure the VT is always a scalar integer.	2021-04-06 16:48:40 -07:00
Craig Topper	2641c1f15e	[RISCV] Don't custom type legalize fixed vector to scalar integer bitcasts if the fixed vector type isn't legal. We encountered a hang in our internal code base. I'm having trouble creating a test case because the test that hit it was testing some code that is not upstream.	2021-04-06 15:00:33 -07:00
Craig Topper	3ae03f67fe	[RISCV] Add helper function to share some of the code for isel of vector load/store intrinsics. Many of the operands are handled the same or in the same order for all these intrinsics. Factor out the code for selecting and pushing them into the Operands vector. Differential Revision: https://reviews.llvm.org/D99923	2021-04-06 09:54:24 -07:00
Craig Topper	cb1028a0b9	[RISCV] When custom iseling masked stores, copy the mask into V0 instead of virtual register. I missed a few intrinsics in `3dd4aa7d09` when I did this for masked loads and masked segment loads/stores. Found while trying to share more code between these custom isel functions.	2021-04-05 21:28:32 -07:00
Craig Topper	780a47285a	[RISCV] Add SDTCisInt to the SDTRVVSlide1 since it is only used for vslide1up.vx/vslide1down.vx. The scalar type is already marked as XLenVT. The floating point version would need a different rule.	2021-04-05 13:03:39 -07:00
Craig Topper	af2837675a	[RISCV] Split RISCVISD::VMV_S_XF_VL into separate integer and FP. It's a bit silly, but it allows us to write stricter type constraints for isel. There's still some extra type checks in the generated table due to some type interference limitations around HWMode.	2021-04-05 12:57:35 -07:00
Craig Topper	7edda698c0	[RISCV] Move VSLIDE1UP_VX pattern out of a loop that includes FP types. FP would need VFSLIDE1UP_VF which uses an FP register.	2021-04-05 12:05:54 -07:00
Fraser Cormack	af3a839c70	[RISCV] Add support for bitcasts between scalars and fixed-length vectors This patch supports bitcasts from scalar types to fixed-length vectors and vice versa. It custom-lowers and custom-legalizes them to EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT operations, using a single-element vectors to hold the scalar where appropriate. Previously, some of these would fail to select, others would be expanded through stack loads and stores. Effort was made to ensure the codegen avoids the stack for both legal and illegal scalar types. Some of the codegen could be improved, but on first glance it looks like a general optimization of EXTRACT_VECTOR_ELT when extracting an i64 element on RV32. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99667	2021-04-05 17:21:55 +01:00
Fraser Cormack	3f0df4d7b0	[RISCV] Expand scalable-vector truncstores and extloads Caught in internal testing, these operations are assumed legal by default, even for scalable vector types. Expand them back into separate truncations and stores, or loads and extensions. Also add explicit fixed-length vector tests for these operations, even though they should have been correct already. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99654	2021-04-05 17:03:45 +01:00
Craig Topper	4708a05da0	[RISCV] Use gorciw for i32 orc.b intrinsic when Zbp is enabled. The W version of orc.b does not exist in Zbp so we need to use gorci encoding. If we have Zbp, we can use gorciw which can avoid a sext.w in some cases.	2021-04-04 17:14:28 -07:00
Craig Topper	98d5db3e3a	[RISCV] Lower orc.b intrinsic to RISCVISD::GORCI. This will allow us to share any future known bits, demaned bits, or sign bits improvements.	2021-04-04 12:31:41 -07:00
Craig Topper	a2ea003fcb	[RISCV] Don't convert fshr/fshl to target specific FSL/FSR node if shift amount is a constant. As long as it's a constant we can directly pattern match it without any problems. It's only when it isn't a constant that we need to add an AND. In theory this should allow more target independent optimizations to remain active.	2021-04-03 23:13:30 -07:00
Levy Hsu	f78d932cf2	[RISCV] Add IR intrinsics for Zbc extension Head files are included in a separate patch in case the name needs to be changed. RV32 / 64: clmul clmulh clmulr Differential Revision: https://reviews.llvm.org/D99711	2021-04-02 12:09:13 -07:00
Levy Hsu	944adbf285	Recommit "[RISCV] Add IR intrinsic for Zbb extension" Forgot to amend the Author. Original commit message: Header files are included in a separate patch in case the name needs to be changed. RV32 / 64: orc.b Differential Revision: https://reviews.llvm.org/D99320	2021-04-02 11:50:19 -07:00
Craig Topper	1f0b309f24	Revert "[RISCV] Add IR intrinsic for Zbb extension" This reverts commit `1808194590`. I forgot to change the author.	2021-04-02 11:47:02 -07:00
Craig Topper	1808194590	[RISCV] Add IR intrinsic for Zbb extension Header files are included in a separate patch in case the name needs to be changed. RV32 / 64: orc.b	2021-04-02 11:23:57 -07:00
Levy Hsu	b001d574d7	[RISCV] Add IR intrinsic for Zbr extension Implementation for RISC-V Zbr extension intrinsic. Header files are included in separate patch in case the name needs to be changed RV32 / 64: crc32b crc32h crc32w crc32cb crc32ch crc32cw RV64 Only: crc32d crc32cd Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99009	2021-04-02 10:58:45 -07:00
Craig Topper	d7ffa82a8e	[RISCV] Improve 64-bit integer constant materialization for more cases. For positive constants we try shifting left to remove leading zeros and fill the bottom bits with 1s. We then materialize that constant shift it right. This patch adds a new strategy to try filling the bottom bits with zeros instead. This catches some additional cases.	2021-04-02 10:18:08 -07:00
Fraser Cormack	3b48d849d4	[RISCV] Optimize more redundant VSETVLIs D99717 introduced some test cases which showed that the output of one vsetvli into another would not be picked up by the RISCVCleanupVSETVLI pass. This patch teaches the optimization about such a pattern. The pattern is quite common when using the RVV vsetvli intrinsic to pass the VL onto other intrinsics. The second test case introduced by D99717 is left unoptimized by this patch. It is a rarer case and will require us to rewire any uses of the redundant vset[i]vli's output to the previous one's. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99730	2021-04-02 10:04:07 +01:00
Craig Topper	766d27dc85	[RISCV] Add isel patterns to handle vrsub intrinsic with 2 vector operands. This occurs when we type legalize an i64 scalar input on RV32. We need to manually splat, which requires a vector input. Rather than special case this in lowering just pattern match it.	2021-04-01 14:10:21 -07:00
Craig Topper	dbbc95e3e5	[RISCV] Use softPromoteHalf legalization for fp16 without Zfh rather than PromoteFloat. The default legalization strategy is PromoteFloat which keeps half in single precision format through multiple floating point operations. Conversion to/from float is done at loads, stores, bitcasts, and other places that care about the exact size being 16 bits. This patches switches to the alternative method softPromoteHalf. This aims to keep the type in 16-bit format between every operation. So we promote to float and immediately round for any arithmetic operation. This should be closer to the IR semantics since we are rounding after each operation and not accumulating extra precision across multiple operations. X86 is the only other target that enables this today. See https://reviews.llvm.org/D73749 I had to update getRegisterTypeForCallingConv to force f16 to use f32 when the F extension is enabled. This way we can still pass it in the lower bits of an FPR for ilp32f and lp64f ABIs. The softPromoteHalf would otherwise always give i16 as the argument type. Reviewed By: asb, frasercrmck Differential Revision: https://reviews.llvm.org/D99148	2021-04-01 12:41:57 -07:00
Craig Topper	d157e3f387	[RISCV] Fix handling of nxvXi64 vmsgt(u).vx intrinsics on RV32. We need to splat the scalar separately and use .vv, but there is no vmsgt(u).vv. So add isel patterns to select vmslt(u).vv with swapped operands. We also need to get VT to use for the splat from an operand rather than the result since the result VT is nxvXi1. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D99704	2021-04-01 10:38:05 -07:00
Craig Topper	b7c2e577cc	[RISCV] Add custom type legalization to form MULHSU when possible. There's no target independent ISD opcode for MULHSU, so custom legalize 2*XLen multiplies ourselves. We have to be a little careful to prefer MULHU or MULHSU. I thought about doing this in isel by pattern matching the (add (mul X, (srai Y, XLen-1)), (mulhu X, Y)) pattern. I decided against this because the add might become part of a chain of adds. I don't trust DAG combine not to reassociate with other adds making it difficult to find both pieces again. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D99479	2021-04-01 10:15:55 -07:00
Craig Topper	d61b40ed27	[RISCV] Improve 64-bit integer materialization for some cases. This adds a new integer materialization strategy mainly targeted at 64-bit constants like 0xffffffff where there are 32 or more trailing ones with leading zeros. We can materialize these by using an addi -1 and srli to restore the leading zeros. This matches what gcc does. I haven't limited to just these cases though. The implementation here takes the constant, shifts out all the leading zeros and shifts ones into the LSBs, creates the new sequence, adds an srli, and checks if this is shorter than our original strategy. I've separated the recursive portion into a standalone function so I could append the new strategy outside of the recursion. Since external users are no longer using the recursive function, I've cleaned up the external interface to return the sequence instead of taking a vector by reference. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D98821	2021-04-01 09:12:52 -07:00
Craig Topper	c88ee1a094	[RISCV] Add UnsupportedSchedZfh multiclass to reduce duplicate lines from RISCVSchedRocket.td and RISCVSchedSiFive7.td. NFC	2021-03-31 15:06:14 -07:00
Craig Topper	2a8b7cab6a	[RISCV] Add RISCVISD opcodes for CLZW and CTZW. Our CLZW isel pattern is quite easily broken by surrounding code preventing it from matching sometimes. This usually results in failing to remove the and X, 0xffffffff inserted by type legalization. The add with -32 that type legalization also inserts will often gets combined into other add/sub nodes. That doesn't usually result in extra code when we don't use clzw. CTTZ seems to be less fragile, but I wanted to keep it consistent with CTLZ. Reviewed By: asb, HsiangKai Differential Revision: https://reviews.llvm.org/D99317	2021-03-31 09:40:07 -07:00
Craig Topper	04f10ab367	[RISCV] Add isel patterns to select vsub_vx intrinsic to vadd.vi if it uses a small enough immediate Also modify the simm5_plus1 check because Imm-1 is UB if Imm happens to be INT64_MIN. I don't think the compiler would optimize based on that in this usage, but it could fail UBSan or -ftrapv. Reviewed By: HsiangKai, frasercrmck Differential Revision: https://reviews.llvm.org/D99637	2021-03-31 09:26:41 -07:00
Fraser Cormack	10fc6e4358	[RISCV] Add support for the stepvector intrinsic This adds almost everything required for supporting the new stepvector intrinsic on RVV. It is lowered to the existing VID_VL SDNode. The only exception is a limitation that RV32 cannot yet lower the intrinsic on i64 vectors. This is because the step operand is (currently) required to be at least as large as the vector element type. I will look into patching that out and loosening the requirement to only an integer pointer type. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99594	2021-03-31 11:41:17 +01:00
Craig Topper	5db19cc010	[RISCV] simm12_plus1 should not inherit from Operand. NFC We only use this in Pat patterns, so it just needs to be an ImmLeaf. If we did need it as an instruction operand, the ParserMatchClass, EncoderMethod, and DecoderMethod were probably wrong.	2021-03-30 19:02:11 -07:00
Craig Topper	05998701b9	[RISCV] Remove some unused ImmLeafs. NFC These got left behind when we switched RV32 to use selectImm to match RV64.	2021-03-30 18:54:11 -07:00
Craig Topper	a33fcafaf0	[RISCV] Pass 'half' in the lower 16 bits of an f32 value when F extension is enabled, but Zfh is not. Without Zfh the half type isn't legal, but it could still be used as an argument/return in IR. Clang will not generate this today. Previously we promoted the half value to float for arguments and returns if the F extension is enabled but Zfh isn't. Then depending on which ABI is enabled we would pass it in either an FPR or a GPR in float format. If the F extension isn't enabled, it would get passed in the lower 16 bits of a GPR in half format. With this patch the value will always in half format and will be in the lower bits of a GPR or FPR. This should be consistent with where the bits are located when Zfh is enabled. I've based this implementation off of how this is done on ARM. I've manually nan-boxed the value to 32 bits using integer ops. It looks like flw, fsw, fmv.s, fmv.w.x, fmf.x.w won't canonicalize nans so should leave the value alone. I think those are the instructions that could get used on this value. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D98670	2021-03-30 09:47:54 -07:00
Tomas Matheson	a9968c0a33	[NFC][CodeGen] Tidy up TargetRegisterInfo stack realignment functions Currently needsStackRealignment returns false if canRealignStack returns false. This means that the behavior of needsStackRealignment does not correspond to it's name and description; a function might need stack realignment, but if it is not possible then this function returns false. Furthermore, needsStackRealignment is not virtual and therefore some backends have made use of canRealignStack to indicate whether a function needs stack realignment. This patch attempts to clarify the situation by separating them and introducing new names: - shouldRealignStack - true if there is any reason the stack should be realigned - canRealignStack - true if we are still able to realign the stack (e.g. we can still reserve/have reserved a frame pointer) - hasStackRealignment = shouldRealignStack && canRealignStack (not target customisable) Targets can now override shouldRealignStack to indicate that stack realignment is required. This change will make it easier in a future change to handle the case where we need to realign the stack but can't do so (for example when the register allocator creates an aligned spill after the frame pointer has been eliminated). Differential Revision: https://reviews.llvm.org/D98716 Change-Id: Ib9a4d21728bf9d08a545b4365418d3ffe1af4d87	2021-03-30 17:31:39 +01:00
Craig Topper	f069000b43	[RISCV] Remove floating point condition code legalization from lowerFixedLengthVectorSetccToRVV. After D98939, this is done by LegalizeVectorOps making this code dead. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D99519	2021-03-30 09:11:56 -07:00
Evandro Menezes	fd94cfeeb5	[RISCV] Move scheduling resources for B into a separate file (NFC) Differential Revision: https://reviews.llvm.org/D99557	2021-03-29 20:37:22 -05:00
Craig Topper	3dd4aa7d09	[RISCV] When custom iseling masked loads/stores, copy the mask into V0 instead of virtual register. This matches what we do in our isel patterns. In our internal testing we've found this is needed to make the fast register allocator happy at -O0. Otherwise it may assign V0 to an earlier operand and find itself with no registers left when it reaches the mask operand. By using V0 explicitly, the fast register allocator will see it when it checks for phys register usages before it starts allocating vregs. I'll try to update this with a test case. Unfortunately, this does appear to prevent some instruction reordering by the pre-RA scheduler which leads to the increased spills seen in some tests. I suspect that problem could already occur for other instructions that already used V0 directly. There's a lot of repeated code here that could do with some wrapper functions. Not sure if that should be at the level of the new code that deals with V0. That would require multiple output parameters to pass the glue, chain and register back. Maybe it should be at a higher level over the entire set of push_backs. Reviewed By: frasercrmck, HsiangKai Differential Revision: https://reviews.llvm.org/D99367	2021-03-29 10:20:43 -07:00
Roger Ferrer Ibanez	ef76a333fa	[RISCV] Fix offset computation for RVV In D97111 we changed the RVV frame layout when using sp or bp to address the stack slots so we could address the emergency stack slot. The idea is to put the RVV objects as far as possible (in offset terms) from the frame reference register (sp / fp / bp). When using fp this happens naturally because the RVV objects are already the top of the stack and due to the constraints of RVV (VLENB being a power of two >= 128) the stack remains aligned. The rest of this summary does not apply to this case. When using sp / bp we need to skip the non-RVV stack slots. The size of the the non-RVV objects is computed subtracting the callee saved register size (whose computation is added in D97111 itself) to the total size of the stack (which does not account for RVV stack slots). However, when doing so we round to 16 bytes when computing that size and we end emitting a smaller offset that may belong to a scalar stack slot (see D98801). So this change removes that rounding. Also, because we want the RVV objects be between the non-RVV stack slots and the callee-saved register slots, we need to make sure the RVV objects are properly aligned to 8 bytes. Adding a padding of 8 would render the stack unaligned. So when allocating space for RVV (only when we don't use fp) we need to have extra padding that preserves the stack alignment. This way we can round to 8 bytes the offset that skips the non-RVV objects and we do not misalign the whole stack in the way. In some circumstances this means that the RVV objects may have padding before (=lower offsets from sp/bp) and after (before the CSR stack slots). Differential Revision: https://reviews.llvm.org/D98802	2021-03-29 17:03:49 +00:00
Hsiangkai Wang	bc82e9bf25	[RISCV] Add vfabs.v pseudo instruction. Differential Revision: https://reviews.llvm.org/D99454	2021-03-28 10:24:05 +08:00
Craig Topper	5692fc38e0	[RISCV] Add a pattern for (sext_inreg (mul (and X, 0xffffffff), (and Y, 0xffffffff)), i32) to suppress MULW formation We have a special pattern for (mul (and X, 0xffffffff), (and Y, 0xffffffff)), to optimize the ANDs to shift. But if a sext_inreg coms first, we'll form a MULW and limit the effectiveness of the special match. So this patch adds a larger pattern to suppress the MULW formation by emitting a sext.w and then the same output we use for the (mul (and X, 0xffffffff), (and Y, 0xffffffff)). This should all get CSEd. This is the issue I was trying to fix with D99029, but that affected many more tests.	2021-03-27 15:37:18 -07:00
Craig Topper	4d5ee71b52	[RISCV] Merge FMulAdd and FMulSub scheduler classes to a single FMA scheduler class. NFC It's unlikely that FMADD and FMSUB would have different scheduling information so merge them. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D99140	2021-03-26 16:37:20 -07:00
Craig Topper	c41f2f6492	[RISCV] Add scheduler classes for the Zba and Zbb extensions. I've used IALU for the simplest operations from Zbb: min, minu, max, maxu, sext.b, sext.h, zext.h, andn, orn, xnor I've put add.uw in IALU32 and slli.uw in ShiftImm32. Remaining instructions have received new classes. All 3 shadd are grouped together. shadd.uw are grouped together. Rotate left and right are together. Everything else got their own class containing one instruction. I think what I have here is the minimum granularity we need. I could be convinced that we need more classes. Reviewed By: evandro Differential Revision: https://reviews.llvm.org/D99040	2021-03-26 14:15:29 -07:00
Zakk Chen	9049cf77e3	[RISCV] Add constraint for RVV indexed loads. Add the constraint when destination EEW not equals the source EEW for correctness. The RVV spec has three register overlap rules and I implement the first stricter constraint because the others are difficult to enforce. Reviewed By: frasercrmck, craig.topper Differential Revision: https://reviews.llvm.org/D98920	2021-03-26 07:23:24 -07:00
Craig Topper	8f62a80328	[RISCV] Optimize (and (shl GPR:, uimm5:), 0xffffffff) to use 2 shifts instead of 3. The and would normally become SLLI+SRLI, giving us 2 SLLI+SRLI. We can detect this and combine the 2 SLLIs into 1.	2021-03-25 23:31:01 -07:00
Craig Topper	5a18c576c4	[RISCV] Don't call CheckAndMask from selectZExti32. Now that targetShrinkDemandedConstant preserves 0xffffffff masks we shouldn't need to call computeKnownBits here.	2021-03-25 22:07:41 -07:00
Craig Topper	5797feaa55	[RISCV] Reorder checks in RISCVTTIImpl::getGatherScatterOpCost to avoid calling getMinRVVVectorSizeInBits() when V extension is not enabled. getMinRVVVectorSizeInBits() asserts if the V extension isn't enabled. So check that gather/scatter is legal first since it already contains a check for V extension being enabled. It also already checks getMinRVVVectorSizeInBits for fixed length vectors so we don't need a check in getGatherScatterOpCost.	2021-03-25 14:20:47 -07:00
Craig Topper	c40cea6f08	[RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffffffff). We look for this pattern frequently in isel patterns so its a good idea to try to preserve it. This also let's us remove our special isel handling for srliw and use a direct pattern match of (srl (and X, 0xffffffff), C) since no bits will be removed from the and mask. Differential Revision: https://reviews.llvm.org/D99042	2021-03-25 09:03:25 -07:00
Fraser Cormack	99211352c1	[RISCV] Optimize select-like vector shuffles This patch adds a small optimization for vector shuffle lowering, detecting shuffles which can be re-expressed as vector selects. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99270	2021-03-25 11:39:57 +00:00
Fraser Cormack	321a71a772	[RISCV] Optimize BUILD_VECTOR sequences that reveal hidden splats This patch adds further optimization techniques to RVV BUILD_VECTOR lowering. It teaches the compiler to find splats of larger vector element types "hidden" in smaller ones. For example, a v4i8 build_vector (0x1, 0x2, 0x1, 0x2) could be splat as v2i16 0x0201. This is generally more optimal than the dominant-element BUILD_VECTORs and so takes priority. This optimization is currently limited to all-constant-or-undef BUILD_VECTORs as those were found to be the most common. There's no reason this couldn't be extended to other BUILD_VECTORs, but the additional bit-manipulation instructions may require more sophisticated heuristics. There are some cases where the materialization of the larger constant takes more scalar instructions than it does to build the vector with vector instructions. We could add heuristics to try and catch this. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99195	2021-03-25 10:35:31 +00:00
Serge Pavlov	ddb0bcbdff	Add missing cases in RISCVMCExpr::getVariantKindName Differential Revision: https://reviews.llvm.org/D98929	2021-03-25 12:57:05 +07:00
Craig Topper	0f99c6c56e	[RISCV] Remove duplicate DebugLoc variables from cases in ReplaceNodeResults. NFC We already created a DebugLoc at the top of the function. We can just use that one.	2021-03-24 20:23:03 -07:00
Craig Topper	512bae81cc	[RISCV] Add basic cost modelling for fixed vector gather/scatter. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D99142	2021-03-24 11:14:14 -07:00
Craig Topper	f24f09d256	[RISCV] Add TTI support for cpop with Zbb This will tell loop idiom recognize that it can make popcount loops countable using the ctpop intrinsic. I didn't bother checking for illegal types. Type legalization knows how to split a ctpop into multiple ctops added together. Assuming we only receive reasonable integer bit widths, a few cpop instructions added together is probably better than the loop. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D99203	2021-03-24 10:58:42 -07:00
Sander de Smalen	55d18b3cc2	[TTI] Return a TypeSize from getRegisterBitWidth. This patch changes the interface to take a RegisterKind, to indicate whether the register bitwidth of a scalar register, fixed-width vector register, or scalable vector register must be returned. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D98874	2021-03-24 14:45:13 +00:00
Jim Lin	503f1d845f	[RISCV] Add HasStdExtD predicate to copysign from double and to double patterns Copysign from double and to double patterns have lack of HasStdExtD predicate. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99234	2021-03-24 14:29:23 +08:00
Craig Topper	839a46d88f	[RISCV] Use selectImm for RV32. NFC Previously we used selectImm for RV64 and isel patterns for RV32. This should be NFC, but will allow RV32 and RV64 to share improvements in the future. For example, it might be useful to use BSETI from Zbs to make single bit constants. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D98877	2021-03-23 08:57:15 -07:00
Fraser Cormack	feff66a082	[RISCV] Further optimize BUILD_VECTORs with repeated elements This patch builds upon the initial BUILD_VECTOR work introduced in D98700. It further optimizes the lowering of BUILD_VECTOR by using VSELECT operations to effectively insert repeated elements into the vector with relatively few instructions. This allows us to optimize more BUILD_VECTORs without significantly increasing the size of the generated code. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98969	2021-03-23 14:14:48 +00:00
Fraser Cormack	5bfbd9d938	[RISCV] Optimize all-constant mask BUILD_VECTORs This patch adds an optimization for mask-vector BUILD_VECTOR nodes whose elements are all constants or undef. It lowers such operations by building up the vector via a series of integer operations, in which multiple mask elements are inserted into a vector at a time via i8/i16/i32/i64 element types. The final result is then bitcast from that integer vector. We restrict this optimization in certain circumstances when optimizing for size. If we are required to use more than one integer insert operation, then it will likely increase code size compared with using a load from a constant pool. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98860	2021-03-23 10:11:19 +00:00
Craig Topper	d7b0c19823	[RISCV] Add scheduler classes to Zfh instructions. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D99053	2021-03-22 20:30:09 -07:00
Craig Topper	8db4804da7	[RISCV] Remove unused SchedWrites WriteFConv32/WriteFConv64/WriteFMov32/WriteFMov64. It doesn't look like any instructions have ever been assigned to these classes. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D99050	2021-03-22 20:29:18 -07:00
Craig Topper	294efcd6f7	[RISCV] Add support for fixed vector masked gather/scatter. I've split the gather/scatter custom handler to avoid complicating it with even more differences between gather/scatter. Tests are the scalable vector tests with the vscale removed and dropped the tests that used vector.insert. We're probably not as thorough on the splitting cases since we use 128 for VLEN here but scalable vector use a known min size of 64. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98991	2021-03-22 10:17:30 -07:00
luxufan	02ffbac844	[RISCV] remove redundant instruction when eliminate frame index The reason for generating mv a0, a0 instruction is when the stack object offset is large then int<12>. To deal this situation, in the elimintateFrameIndex function, it will create a virtual register, which needs the register scavenger to scavenge it. If the machine instruction that contains the stack object and the opcode is ADDI(the addi was generated by frameindexNode), and then this instruction's destination register was the same as the register that was generated by the register scavenger, then the mv a0, a0 was generated. So to eliminnate this instruction, in the eliminateFrameIndex function, if the instrution opcode is ADDI, then the virtual register can't be created. Differential Revision: https://reviews.llvm.org/D92479	2021-03-21 18:54:00 +08:00
Jessica Clarke	b2bb003774	[RISCV] Update comment in RISCVInstrInfoM.td Missed in `07ed62b7d5`.	2021-03-20 22:35:40 +00:00
Craig Topper	07ed62b7d5	[RISCV] Disable (mul (and X, 0xffffffff), (and Y, 0xffffffff)) optimization when Zba is enabled. This optimization is trying to save SRLI instructions needed to implement the ANDs. If we have zext.w we won't save anything. Because we don't check that the multiply is the only user of the AND we might even increase instruction count.	2021-03-20 15:31:45 -07:00
Craig Topper	b0d8823a8a	[RISCV] Add isel pattern to optimize (mul (and X, 0xffffffff), (and Y, 0xffffffff)) on RV64 This patterns computes the full 64 bit product of a 32x32 unsigned multiply. This requires a two pairs of SLLI+SRLI to zero the upper 32 bits of the inputs. We can do better than this by using two SLLI to move the lower bits to the upper bits then use MULHU to compute the product. This is the high half of a full 64x64 product. Since we put 32 0s in the lower bits of the inputs we know the 128-bit product will have zeros in the lower 64 bits. So the upper 64 bits, which MULHU computes, will contain the original 64 bit product we were after. The same trick would work for (mul (sext_inreg X, i32), (sext_inreg Y, i32)) using MULHS, but sext_inreg is sext.w which is already one instruction so we wouldn't save anything. Differential Revision: https://reviews.llvm.org/D99026	2021-03-20 14:55:46 -07:00
Craig Topper	d5c1d305b3	[RISCV] Rename WriteShift/ReadShift scheduler classes to WriteShiftImm/ReadShiftImm. Move variable shifts from WriteIALU/ReadIALU to new WriteShiftReg/ReadShiftReg. Previously only immediate shifts were in WriteShift. Register shifts were grouped with IALU. Seems likely that immediate shifts would be as fast or faster than register shifts. And that immediate shifts wouldn't be any faster than IALU. So if any deserved to be in their own group it should be register shifts not immediate shifts. Rather than try to flip them let's just add more granularity and give each kind their own class. I've used new names for both to make them unambiguous and to force any downstream implementations to be forced to put correct information in their scheduler models. Reviewed By: evandro Differential Revision: https://reviews.llvm.org/D98911	2021-03-19 20:39:49 -07:00
Craig Topper	5d315691c4	[RISCV] Add missing bitcasts to the results of lowerINSERT_SUBVECTOR and lowerEXTRACT_SUBVECTOR when handling mask vectors. Found by adding asserts to LegalizeDAG to catch incorrect result types being returned. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98964	2021-03-19 10:54:33 -07:00
Craig Topper	85f3f6b3cc	[RISCV] Lower scalable vector masked loads to intrinsics to match fixed vectors and reduce isel patterns. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98840	2021-03-19 10:39:35 -07:00
Fraser Cormack	d399b82e2a	[RISCV] Maintain fixed-length info when optimizing BUILD_VECTORs I'm not sure how I failed to notice this before, but when optimizing dominant-element BUILD_VECTORs we would lower via the scalable container type, which lost us the information about the fixed length of the vector types. By lowering via the fixed-length type we can preserve that information and eliminate redundant vsetvli instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98938	2021-03-19 17:21:06 +00:00
Fraser Cormack	550292ecb1	[RISCV] Fix missing scalable->fixed-length vector conversion Returning the scalable-vector container type would present problems when the fixed-length INSERT_VECTOR_ELT was used by later operations. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98776	2021-03-19 16:49:47 +00:00
Hsiangkai Wang	aa8d33a6d6	[RISCV] Spilling for Zvlsseg registers. For Zvlsseg, we create several tuple register classes. When spilling for these tuple register classes, we need to iterate NF times to load/store these tuple registers. Differential Revision: https://reviews.llvm.org/D98629	2021-03-19 07:46:16 +08:00
Craig Topper	c9861f722e	[RISCV] Correct the output chain in lowerFixedLengthVectorMaskedLoadToRVV We returned the input chain instead of the output chain from the new load. This bypasses the load in the chain. I haven't found a good way to test this yet. IR order prevents my initial attempts at causing reordering.	2021-03-18 16:34:35 -07:00
Fraser Cormack	3495031a39	[RISCV] Support scalable-vector masked scatter operations This patch adds support for masked scatter intrinsics on scalable vector types. It is mostly an extension of the earlier masked gather support introduced in D96263, since the addressing mode legalization is the same. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D96486	2021-03-18 10:17:50 +00:00
Fraser Cormack	0331399dc9	[RISCV] Support scalable-vector masked gather operations This patch supports the masked gather intrinsics in RVV. The RVV indexed load/store instructions only support the "unsigned unscaled" addressing mode; indices are implicitly zero-extended or truncated to XLEN and are treated as byte offsets. This ISA supports the intrinsics directly, but not the majority of various forms of the MGATHER SDNode that LLVM combines to. Any signed or scaled indexing is extended to the XLEN value type and scaled accordingly. This is done during DAG combining as widening the index types to XLEN may produce illegal vectors that require splitting, e.g. nxv16i8->nxv16i64. Support for scalable-vector CONCAT_VECTORS was added to avoid spilling via the stack when lowering split legalized index operands. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D96263	2021-03-18 09:26:18 +00:00
Fraser Cormack	c2b4600ec8	[RISCV] Support bitcasts of fixed-length mask vectors Without this patch, bitcasts of fixed-length mask vectors would go through the stack. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98779	2021-03-18 08:52:42 +00:00
ShihPo Hung	fca5d63aa8	[RISCV] Fix isel pattern of masked vmslt[u] This patch changes the operand order of masked vmslt[u] from (mask, rs1, scalar, maskedoff, vl) to (maskedoff, rs1, scalar, mask, vl). Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98839	2021-03-17 20:18:11 -07:00
Craig Topper	92b39c6907	[RISCV] Use getTargetExtractSubreg and getTargetInsertSubreg to simplify some code. NFCI	2021-03-17 12:10:19 -07:00
Craig Topper	696ddef569	[RISCV] Support masked load/store for fixed vectors. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98561	2021-03-17 10:26:15 -07:00
Fraser Cormack	70251759a2	[RISCV] Optimize "dominant element" BUILD_VECTORs This patch adds an optimization path for BUILD_VECTOR nodes where the majority of the elements are identical. These can be splatted, with the remaining elements patched up with INSERT_VECTOR_ELTs. The threshold can be tweaked as required - it is currently conservative. Undef elements are disregarded when judging the dominance of a particular element. This allows them to be covered by the splat value. In addition, vectors of 2 elements are always optimized to a splat (for the upper element) and an insert at element zero. This optimization is disabled when optimizing for size. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98700	2021-03-17 10:09:04 +00:00
Fangrui Song	6ab8927931	[RISCV] Support clang -fpatchable-function-entry && GNU function attribute 'patchable_function_entry' Similar to D72215 (AArch64) and D72220 (x86). ``` % clang -target riscv32 -march=rv64g -c -fpatchable-function-entry=2 a.c && llvm-objdump -dr a.o ... 0000000000000000 <main>: 0: 13 00 00 00 nop 4: 13 00 00 00 nop % clang -target riscv32 -march=rv64gc -c -fpatchable-function-entry=2 a.c && llvm-objdump -dr a.o ... 00000002 <main>: 2: 01 00 nop 4: 01 00 nop ``` Recently the mainline kernel started to use -fpatchable-function-entry=8 for riscv (https://git.kernel.org/linus/afc76b8b80112189b6f11e67e19cf58301944814). Differential Revision: https://reviews.llvm.org/D98610	2021-03-16 10:02:35 -07:00
Craig Topper	229eeb187d	[RISCV] Look through copies when trying to find an implicit def in addVSetVL. The InstrEmitter can sometimes insert a copy after an IMPLICIT_DEF before connecting it to the vector instruction. This occurs when constrainRegClass reduces to a class with less than 4 registers. I believe LMUL8 on masked instructions triggers this since the result can only use the v8, v16, or v24 register group as the mask is using v0. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98567	2021-03-16 07:59:09 -07:00
Craig Topper	a33ce06cf5	[RISCV] Improve i32 UADDSAT/USUBSAT on RV64. The default promotion uses zero extends that become shifts. We cam use sign extend instead which is better for RISCV. I've used two different implementations based on whether we have minu/maxu instructions. Differential Revision: https://reviews.llvm.org/D98683	2021-03-16 07:44:06 -07:00
Craig Topper	41759c3d92	[RISCV] Add RISCVISD::BR_CC similar to RISCVISD::SELECT_CC. This allows me to introduce similar combines for branches as we have recently added for SELECT_CC. Some of them are less useful for standalone setccs and only help branch instructions. By having a BR_CC node its easier to only affect branches. I'm using CondCodeSDNode to make isel patterns easier to write so we can refer to the codes by name. SELECT_CC uses a constant instead. I've translated the condition code just like SELECT_CC so we need less patterns for the swapped conditions. This includes special cases for X < 1 and X > -1 that get translated to blez and bgez by using a 0 constant. computeKnownBitsForTargetNode support for SELECT_CC is added to allow MaskedValueIsZero to work for cases where the true and false values of the SELECT_CC are setccs and the result of the SELECT_CC is used by a BR_CC. This was needed to avoid regressions in some of the overflow tests. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D98159	2021-03-15 11:54:01 -07:00
Philipp Tomsich	018e96f71f	[RISCV] Add isel-patterns to optimize (a < 1) into blez (a <= 0) The following code-sequence showed up in a testcase (isolated from SPEC2017) for if-conversion and vectorization when searching for the maximum in an array: addi a2, zero, 1 blt a1, a2, .LBB0_5 which can be expressed as `bge zero,a1,.LBB0_5`/`blez a1,/LBB0_5`. More generally, we want to express (a < 1) as (a <= 0). This adds the required isel-pattern and updates the testcases. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98449	2021-03-15 11:32:43 -07:00
Craig Topper	3dc5b533e0	[RISCV] Improve legalization of i32 UADDO/USUBO on RV64. The default legalization uses zero extends that require pair of shifts on RISCV. Instead we can take advantage of the fact that unsigned compares work equally well on sign extended inputs. This allows us to use addw/subw and sext.w. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D98233	2021-03-15 09:30:23 -07:00
Fraser Cormack	0c5b789c73	[RISCV] Support fixed-length vectors in the calling convention This patch adds fixed-length vector support to the calling convention when RVV is used to lower fixed-length vectors. The scheme follows the regular vector calling convention for the argument/return registers, but uses scalable vector container types as the LocVTs, and converts to/from the fixed-length vector value types as required. Fixed-length vector types may be split when the combination of minimum VLEN and the maximum allowable LMUL is not large enough to fully contain the vector. In this case the behaviour differs between fixed-length vectors passed as parameters and as return values: 1. For return values, vectors must be passed entirely via registers or via the stack. 2. For parameters, unlike scalar values, split vectors continue to be passed by value, and are split across multiple registers until there are no remaining registers. Thus vector parameters may be found partly in registers and partly on the stack. As with scalable vectors, the first fixed-length mask vector is passed via v0. Split mask fixed-length vectors are passed first via v0 and then via the next available vector register: v8,v9,etc. The handling of vector return values uses all available argument registers v8-v23 which does not adhere to the calling convention we're supposedly implementing, but since this issue affects both fixed-length and scalable-vector values, it was left as-is. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97954	2021-03-15 10:43:51 +00:00
Hsiangkai Wang	a81dff1e58	[RISCV] Support inline asm for vector instructions. Types of fractional LMUL and LMUL=1 are all using VR register class. When using inline asm, it will use the first type in the register class as the type for the register. It is not necessary the same as the value type. We need to use INSERT_SUBVECTOR/EXTRACT_SUBVECToR/BITCAST to make it legal to put the value in the corresponding register class. Differential Revision: https://reviews.llvm.org/D97480	2021-03-15 11:02:18 +08:00
Craig Topper	fcdf7f6224	[RISCV] Give an explicit error if 'generic' CPU is passed instead of 'generic-rv32' or 'generic-rv64'. Validate 64Bit feature against the triple. I encountered a project that uses llvm that passes "generic" by default. While I could fix that project, I wouldn't be surprised if other projects did something similar. So it seems like a good idea to provide a better error here. I've also added validation of the 64Bit feature against the triple so that we can catch a mismatched CPU before failing in a mysterious way. We can make it pretty far in isel because we calculate XLenVT from the triple and use that to set up the legal integer type. Reviewed By: luismarques, khchen Differential Revision: https://reviews.llvm.org/D98307	2021-03-14 17:21:31 -07:00
luxufan	a9b9c64fd4	change rvv frame layout This patch change the rvv frame layout that proposed in D94465. In patch D94465, In the eliminateFrameIndex function, to eliminate the rvv frame index, create temp virtual register is needed. This virtual register should be scavenged by class RegsiterScavenger. If the machine function has other unused registers, there is no problem. But if there isn't unused registers, we need a emergency spill slot. Because of the emergency spill slot belongs to the scalar local variables field, to access emergency spill slot, we need a temp virtual register again. This makes the compiler report the "Incomplete scavenging after 2nd pass" error. So I change the rvv frame layout as follows: ``` \|--------------------------------------\| \| arguments passed on the stack \| \|--------------------------------------\|<--- fp \| callee saved registers \| \|--------------------------------------\| \| rvv vector objects(local variables \| \| and outgoing arguments \| \|--------------------------------------\| \| realignment field \| \|--------------------------------------\| \| scalar local variable(also contains\| \| emergency spill slot) \| \|--------------------------------------\|<--- bp \| variable-sized local variables \| \|--------------------------------------\|<--- sp ``` Differential Revision: https://reviews.llvm.org/D97111	2021-03-13 16:05:55 +08:00
Craig Topper	51151828ac	[RISCV] Teach normaliseSetCC to canonicalize X > -1 to X >= 0 and X < 1 to 0 >= X. This allows the use of BGE with X0 instead of puting -1/1 in a register. Reviewed By: jrtc27 Differential Revision: https://reviews.llvm.org/D98542	2021-03-12 11:50:10 -08:00
Craig Topper	45d3ed0304	[RISCV] Add support for scalable vector masked load/store. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98460	2021-03-12 10:32:33 -08:00
Fraser Cormack	641f5700f9	[RISCV] Optimize INSERT_VECTOR_ELT sequences This patch optimizes the codegen for INSERT_VECTOR_ELT in various ways. Primarily, it removes the use of vslidedown during lowering, and the vector element is inserted entirely using vslideup with a custom VL and slide index. Additionally, lowering of i64-element vectors on RV32 has been optimized in several ways. When the 64-bit value to insert is the same as the sign-extension of the lower 32-bits, the codegen can follow the regular path. When this is not possible, a new sequence of two i32 vslide1up instructions is used to get the vector element into a vector. This sequence was suggested by @craig.topper. From there, the value is slid into the final position for more consistent lowering across RV32 and RV64. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98250	2021-03-12 09:13:38 +00:00
Fraser Cormack	4d2d5855c7	[RISCV] Fix up stale VECREDUCE comments. NFC. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98399	2021-03-12 08:49:46 +00:00
Craig Topper	1d26bbcf9b	[RISCV] Return false from isShuffleMaskLegal except for splats. We don't support any other shuffles currently. This changes the bswap/bitreverse tests that check for this in their expansion code. Previously we expanded a byte swapping shuffle through memory. Now we're scalarizing and doing bit operations on scalars to swap bytes. In the future we can probably use vrgather.vx to do a byte swap shuffle.	2021-03-11 20:02:49 -08:00
Craig Topper	c82f442954	[RISCV] Support fixed vector copysign. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98394	2021-03-11 09:57:24 -08:00
Craig Topper	0dff8a9627	[RISCV] Handle vmv.x.s intrinsic for i64 vectors on RV32. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98372	2021-03-11 09:39:50 -08:00
Craig Topper	9c841cb8e8	[RISCV] Support extract_vector_elt for fixed and scalable masked registers. This uses a really simple approach of converting to an i8 vector and extracting. This is probably not the best approach especially if you know the index is constant. Other ideas: -Store to stack temporary using vse1, load as scalar and shift. -Sort of bitcast the vector to a vector of i8, slide down the appropriate 8 bit element, copy to scalar, shift down the correct bit within the 8 bits we extracted. Not exactly sure how to describe such a bitcast from i1 vector to i8 vector within the type system for elements less than 8. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98310	2021-03-11 09:26:44 -08:00
Craig Topper	9106d04554	[RISCV][SelectionDAG] Introduce an ISD::SPLAT_VECTOR_PARTS node that can represent a splat of 2 i32 values into a nxvXi64 vector for riscv32. On riscv32, i64 isn't a legal scalar type but we would like to support scalable vectors of i64. This patch introduces a new node that can represent a splat made of multiple scalar values. I've used this new node to solve the current crashes we experience when getConstant is used after type legalization. For RISCV, we are now default expanding SPLAT_VECTOR to SPLAT_VECTOR_PARTS when needed and then handling the SPLAT_VECTOR_PARTS later during LegalizeOps. I've remove the special case I previously put in for ABS for D97991 as the default expansion is now able to succesfully use getConstant. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98004	2021-03-10 09:46:18 -08:00
Craig Topper	0c73a506e8	[RISCV] Starting fixing issues that prevent us from testing vXi64 intrinsics on RV32. Currently we crash in type legalization any time an intrinsic uses a scalar i64 on RV32. This patch adds support for type legalizing this to prevent crashing. I don't promise that it uses the best possible codegen just that it is functional. This first version handles 3 cases. vmv.v.x intrinsic, vmv.s.x intrinsic and intrinsics that take a scalar input, splat it and then do some operation. For vmv.v.x we'll either rely on hardware sign extension for constants or we'll convert it to multiple splats and bit manipulation. For vmv.s.x we use a really unoptimal sequence inspired by what we do for an INSERT_VECTOR_ELT. For the third case we'll either try to use the .vi form for constants or convert to a complicated splat and bitmanip and use the .vv form of the operation. I've renamed the ExtendOperand field to SplatOperand now use it specifically for the third case. The first two cases are handled by custom lowering specifically for those intrinsics. I haven't updated all tests yet, but I tried to cover a subset that includes single-width, widening, and narrowing. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97895	2021-03-10 09:45:38 -08:00
Craig Topper	1e39118638	[RISCV] Manually split vector operands to VECREDUCE when handling vXi64 vectors on RV32. The type legalizer will visit the result before the operands. To avoid creating an illegal target specific node or falling back to scalarization, we need to manually split vector operands. This still doesn't handle the case of non-power of 2 operands which need to be widened. I'm not sure the type legalizer is ready for it. I think we would need to insert an INSERT_SUBVECTOR with the power of 2 type we want, with an undef first operand, and the non-power of 2 orignal operand as the vector to insert. Then fill in the neutral elements into the elements the padded elements. Alternatively we INSERT_SUBVECTOR into a neutral vector. From there we carry on splitting if needed to get to a legal type then do the target specific code. The problem with this is the type legalizer doesn't know how to widen an insert_subvector yet. We would need to add that including the handling for a non-undef first vector. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98292	2021-03-10 09:27:38 -08:00
Craig Topper	351844edf1	[RISCV] Add support for VECTOR_REVERSE for scalable vector types. I've left mask registers to a future patch as we'll need to convert them to full vectors, shuffle, and then truncate. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97609	2021-03-09 10:03:45 -08:00
Craig Topper	77ac3166e5	[RISCV] Add support for fixed vector reductions. I've included tests that require type legalization to split the vector. The i64 version of these scalarizes on RV32 due to type legalization visiting the result before the vector type. So we have to abort our custom expansion to avoid creating target specific nodes with an illegal type. Then type legalization ends up scalarizing. We might be able to fix this by doing custom splitting for large vectors in our handler to get down to a legal type. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98102	2021-03-09 09:39:59 -08:00
Craig Topper	1c7ad4dd88	[RISCV] Don't modify the SEW immediate on the V extension pseudo instructions after inserting VSETVLI. Previously we set the value to -1, but the SEW information could be useful for scheduling. Reviewed By: frasercrmck, rogfer01 Differential Revision: https://reviews.llvm.org/D98062	2021-03-09 09:02:19 -08:00
Craig Topper	72ecf2f43f	[RISCV] Optimize fixed vector ABS. Fix crash on scalable vector ABS for SEW=64 with RV32. The default fixed vector expansion uses sra+xor+add since it can't see that smax is legal due to our custom handling. So we select smax(X, sub(0, X)) manually. Scalable vectors are able to use the smax expansion automatically for most cases. It crashes in one case because getConstant can't build a SPLAT_VECTOR for nxvXi64 when i64 scalars aren't legal. So we manually emit a SPLAT_VECTOR_I64 for that case. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97991	2021-03-09 08:51:03 -08:00
Craig Topper	478317fbb7	[RISCV] Make the hasStdExtM() check in RISCVInstrInfo::getVLENFactoredAmount emit a diagnostic rather than an assert. As far as I know we're not enforcing the StdExtM must be enabled to use the V extension. If we use an assert here and hit this code in a release build we'll silently emit an invalid instruction. By using a diagnostic we report the error to the user in release builds. I think there may still be a later fatal error from the code emitter though. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97970	2021-03-09 08:50:02 -08:00
ShihPo Hung	5cdb2e9860	[RISCV][MC] Fix nf encoding for vector ld/st whole register The three bit nf is one less than the number of NFIELDS, so we manually decrement 1 for VS1/2/4/8R & VL1/2/4/8R. Reviewed By: craig.topper Differential revision: https://reviews.llvm.org/D98185	2021-03-08 19:30:24 -08:00
Craig Topper	7a64cc4a76	[RISCV] Make use of DAG.getNeutralElement in lowerVECREDUCE to avoid repeating the same list of constants. NFC Reviewed By: frasercrmck, khchen Differential Revision: https://reviews.llvm.org/D98091	2021-03-08 09:11:10 -08:00
Craig Topper	a2651266c5	[RISCV] Add explicit i64 types to RV64 isel patterns to stop tablegen from generating unneeded i32 patterns for RV32 HwMode.	2021-03-08 09:06:56 -08:00
Fraser Cormack	18173c57bd	[RISCV] Add new entry points to getContainerForFixedLengthVector While working on adding fixed-length vectors to the calling convention, it was necessary to be able to query for a fixed-length vector container type without access to an instance of SelectionDAG. This patch modifies the "main" getContainerForFixedLengthVector function to use an instance of TargetLowering rather than SelectionDAG, and preserves the SelectionDAG overload as a wrapper. An additional non-static version of the function was also added to simplify the common case in RISCVTargetLowering. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97925	2021-03-08 09:26:19 +00:00
Craig Topper	c91b3c9e63	[RISCV] Fold (select_cc (setlt X, Y), 0, ne, trueV, falseV) -> (select_cc X, Y, lt, trueV, falseV) A setcc can be created during LegalizeDAG after select_cc has been created. This combine will enable us to fold these late setccs. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D98132	2021-03-07 09:44:56 -08:00

... 2 3 4 5 6 ...

1330 Commits