llvm-project

Commit Graph

Author	SHA1	Message	Date
Fraser Cormack	2b4a1d4b86	[RISCV] Improve codegen for shuffles with LHS/RHS splats Shuffles which are broken into separate halves reveal splats in which a half is accessed via one index; such operations can be optimized to use "vrgather.vi". This optimization could be achieved by adding extra patterns to match `vrgather_vv_vl` which uses a splat as an index operand, but this patch instead identifies the splat earlier. This way, future optimizations can build on top of the data gathered here, e.g., to splat-gather dominant indices and insert any leftovers. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D107449	2021-08-09 10:31:40 +01:00
Fraser Cormack	02dd4b59bc	[RISCV] Optimize floating-point "dominant value" BUILD_VECTORs This patch aims to improve the performance of BUILD_VECTORs which are identified as containing a dominant element. Given that most floating-point constants themselves require a load from the constant pool, it was possible for the optimization to actually increase the number of individual loads on small vectors. The exception is the zero constant -- +0.0 -- which can be materialized efficiently. While this optimization could do with a proper cost model to weigh the benefits of a single vector load vs. the manipulation of individual elements -- even for integer vectors which often require several instructions to materialize -- without a concrete RVV implementation to work with any heuristic is likely to be both more obtuse and inaccurate. Until then, this patch fixes at least one known obvious deficiency. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D106963	2021-07-29 09:22:34 +01:00
Fraser Cormack	a33f60db39	[RISCV] Add test case showing suboptimal BUILD_VECTOR lowering The second test case added here was pointed out to me by @craig.topper and shows how we "optimize" a two-element BUILD_VECTOR from being one load from the constant pool to two loads from the constant pool. The first test case shows that since materialization for the floating-point +0.0 value is cheap and doesn't involve a load, the optimization is more clearly beneficial here. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D106962	2021-07-29 09:21:26 +01:00
Craig Topper	5edccc4581	[RISCV] Avoid using x0,x0 vsetvli for vmv.x.s and vfmv.f.s unless we know the sew/lmul ratio is constant. Since we're changing VTYPE, we may change VLMAX which could invalidate the previous VL. If we can't tell if it is safe we should use an AVL of 1 instead of keeping the old VL. This is a quick fix. We may want to thread VL to the pseudo instruction instead of making up a value. That will require ISD opcode changes and changes to the C intrinsic interface. This fixes the issue raised in D106286. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D106403	2021-07-23 09:12:05 -07:00
Fraser Cormack	d991b7212b	[RISCV] Pass undef VECTOR_SHUFFLE indices on to BUILD_VECTOR Often when lowering vector shuffles, we split the shuffle into two LHS/RHS shuffles which are then blended together. To do so we split the original indices into two, indexed into each respective vector. These two index vectors are then separately lowered as BUILD_VECTORs. This patch forwards on any undef indices to the BUILD_VECTOR, rather than having the VECTOR_SHUFFLE lowering decide on an optimal concrete index. The motivation for this change is so that we don't duplicate optimization logic between the two lowering methods and let BUILD_VECTOR do what it does best. Propagating undef in this way allows us, for example, to generate `vid.v` to produce the LHS indices of commonly-used interleave-type shuffles. I have designs on further optimizing interleave-type and other common shuffle patterns in the near future. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104789	2021-07-13 10:41:54 +01:00
Jim Lin	242ddd5089	[RISCV][NFC] Add a single space after comma for VType In most cases, there is a single space after the comma in assembly operands. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103790	2021-06-09 11:18:22 +08:00
Fraser Cormack	8790e85255	[RISCV] Reserve an emergency spill slot for any RVV spills This patch addresses an issue in which fixed-length (VLS) vector RVV code could fail to reserve an emergency spill slot for their frame index elimination. This is because we were previously only reserving a spill slot when there were `scalable-vector` frame indices being used. However, fixed-length codegen uses regular-type frame indices if it needs to spill. This patch does the fairly brute-force method of checking ahead of time whether the function contains any RVV spill instructions, in which case it reserves one slot. Note that the second RVV slot is still only reserved for `scalable-vector` frame indices. This unfortunately causes quite a bit of churn in existing tests, where we chop and change stack offsets for spill slots. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103269	2021-06-03 10:44:34 +01:00
Craig Topper	527cd01314	[RISCV] Teach vsetvli insertion to use vsetvl x0, x0 form when we can tell that VLMAX and AVL haven't changed. This can help avoid needing a virtual register for the vsetvl output when the AVL is X0. For other register AVLs it can shorten the live range of the AVL register if it isn't needed later. There's probably no advantage when AVL is a 5 bit immediate that can use vsetivli. But do it anyway for consistency. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D103215	2021-05-27 10:11:38 -07:00
Craig Topper	fdf10e6197	[RISCV] Use X0 as destination of inserted vsetvli when possible. We aren't going to connect the result to anything so we might as well avoid allocating a register. Reviewed By: frasercrmck, HsiangKai Differential Revision: https://reviews.llvm.org/D102031	2021-05-26 13:08:51 -07:00
Fraser Cormack	7a211ed110	[RISCV] Prevent store combining from infinitely looping RVV code generation does not successfully custom-lower BUILD_VECTOR in all cases. When it resorts to default expansion it may, on occasion, be expanded to scalar stores through the stack. Unfortunately these stores may then be picked up by the post-legalization DAGCombiner which merges them again. The merged store uses a BUILD_VECTOR which is then expanded, and so on. This patch addresses the issue by overriding the `mergeStoresAfterLegalization` hook. A lack of granularity in this method (being passed the scalar type) means we opt out in almost all cases when RVV fixed-length vector support is enabled. The only exception to this rule are mask vectors, which are always either custom-lowered or are expanded to a load from a constant pool. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D102913	2021-05-24 10:19:32 +01:00
Fraser Cormack	797e580db9	[RISCV][NFC] Simplify test run lines Several tests had -verify-machineinstrs twice, and several tests were explicitly specifying the default FileCheck prefix of CHECK.	2021-05-13 12:41:00 +01:00
Craig Topper	ce6e4f27dd	[RISCV] Use fractional LMULs for fixed length types smaller than riscv-v-vector-bits-min. My thought process is that if v2i64 is an LMUL=1 type then v2i32 should be an LMUL=1/2 type. We limit the fractional LMUL so that SEW=64 clips to LMUL=1, SEW=32 clips to LMUL=1/2, etc. This ensures there's always a fractional LMUL available to truncate a type. This does reduce the number of vsetvlis in some cases. Some tests increase vsetvlis because the best container type for a mask type is dependent on the LMUL+SEW that the mask was produced from, but you can't tell that from the type. I think this is something we need to solve in the machine IR when optimizing vsetvlis. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D101215	2021-05-11 09:42:48 -07:00
Craig Topper	e2cd92cb9b	[RISCV] Match splatted load to scalar load + splat. Form strided load during isel. This modifies my previous patch to push the strided load formation to isel. This gives us opportunity to fold the splat into a .vx operation first. Using a scalar register and a .vx operation reduces vector register pressure which can be important for larger LMULs. If we can't fold the splat into a .vx operation, then it can make sense to use a strided load to free up the vector arithmetic ALU to do actual arithmetic rather than tying it up with vmv.v.x. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D101138	2021-04-26 13:32:03 -07:00
Fraser Cormack	feff66a082	[RISCV] Further optimize BUILD_VECTORs with repeated elements This patch builds upon the initial BUILD_VECTOR work introduced in D98700. It further optimizes the lowering of BUILD_VECTOR by using VSELECT operations to effectively insert repeated elements into the vector with relatively few instructions. This allows us to optimize more BUILD_VECTORs without significantly increasing the size of the generated code. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98969	2021-03-23 14:14:48 +00:00
Fraser Cormack	d399b82e2a	[RISCV] Maintain fixed-length info when optimizing BUILD_VECTORs I'm not sure how I failed to notice this before, but when optimizing dominant-element BUILD_VECTORs we would lower via the scalable container type, which lost us the information about the fixed length of the vector types. By lowering via the fixed-length type we can preserve that information and eliminate redundant vsetvli instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98938	2021-03-19 17:21:06 +00:00
Fraser Cormack	70251759a2	[RISCV] Optimize "dominant element" BUILD_VECTORs This patch adds an optimization path for BUILD_VECTOR nodes where the majority of the elements are identical. These can be splatted, with the remaining elements patched up with INSERT_VECTOR_ELTs. The threshold can be tweaked as required - it is currently conservative. Undef elements are disregarded when judging the dominance of a particular element. This allows them to be covered by the splat value. In addition, vectors of 2 elements are always optimized to a splat (for the upper element) and an insert at element zero. This optimization is disabled when optimizing for size. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98700	2021-03-17 10:09:04 +00:00
Craig Topper	efcdd598b7	[RISCV] Teach VSETVLI inserter to use VSETIVLI when possible. We always create the VL operand using a register, but if we can determine that it came from an ADDI X0, imm with a sufficiently small immediate, we can use VSETIVLI. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97332	2021-02-24 16:07:33 -08:00
Fraser Cormack	0264ee536b	[RISCV] Remove unused CHECKs from recent test addition These didn't show up as failures locally.	2021-02-16 10:33:33 +00:00
Fraser Cormack	04977ce5ce	[RISCV] Fix a crash in fixed-length build_vector lowering Non-splatted non-integer build_vector nodes were mistakenly being lowered as VID expressions, which should not happen. VID can only be used to select integer build_vector nodes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D96718	2021-02-16 10:25:15 +00:00

19 Commits
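
Several of the commits above (70251759a2, d399b82e2a, 02dd4b59bc) describe the "dominant element" BUILD_VECTOR idea: splat the most common defined value, then patch up the remaining elements with individual inserts, ignoring undef elements when judging dominance. The following is a minimal standalone sketch of that selection step only. It is not the LLVM implementation: the names `SplatPlan`, `planDominantSplat` and the `MaxInserts` threshold are hypothetical, undef is modelled as `std::nullopt`, and the two-element special case and the optimize-for-size check mentioned in the commit message are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical result type: the value to splat, plus (index, value) pairs
// for the elements that must be inserted afterwards.
struct SplatPlan {
  int64_t SplatValue;
  std::vector<std::pair<std::size_t, int64_t>> Inserts;
};

// Returns a plan if one value dominates the defined elements, i.e. if at
// most MaxInserts (an assumed, caller-chosen threshold) defined elements
// differ from it. Undef elements (nullopt) are ignored when counting, so
// the splat is free to cover them.
std::optional<SplatPlan>
planDominantSplat(const std::vector<std::optional<int64_t>> &Elts,
                  std::size_t MaxInserts) {
  std::map<int64_t, std::size_t> Counts;
  std::size_t NumDefined = 0;
  for (const auto &E : Elts) {
    if (!E)
      continue; // undef: does not count for or against any value
    ++Counts[*E];
    ++NumDefined;
  }
  if (NumDefined == 0)
    return std::nullopt; // all undef: nothing to splat

  // Pick the most frequent value; on a tie the smallest value wins because
  // std::map iterates in ascending key order and the comparison is strict.
  int64_t Dominant = 0;
  std::size_t Best = 0;
  for (const auto &[Value, Count] : Counts) {
    if (Count > Best) {
      Best = Count;
      Dominant = Value;
    }
  }
  assert(Best > 0 && "at least one defined element was counted");

  if (NumDefined - Best > MaxInserts)
    return std::nullopt; // too many leftovers: not worth splatting

  SplatPlan Plan{Dominant, {}};
  for (std::size_t I = 0; I < Elts.size(); ++I)
    if (Elts[I] && *Elts[I] != Dominant)
      Plan.Inserts.emplace_back(I, *Elts[I]);
  return Plan;
}
```

With the elements {1, 1, undef, 2} and a threshold of one insert, this sketch chooses a splat of 1 followed by a single insert of 2 at index 3; with four distinct values and the same threshold it declines to produce a plan.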