llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	79fbddbea0	[RISCV] Teach vsetvli insertion pass that it doesn't need to insert vsetvli for unit-stride or strided loads/stores in some cases. For unit-stride and strided load/stores we set the SEW operand of the pseudo instruction equal the EEW in the opcode. The LMUL of the pseudo instruction is the LMUL we want. These instructions calculate EMUL=(EEW/SEW) * LMUL. We can use this to avoid changing vtype if the SEW/LMUL of the previous vtype matches the EEW/EMUL ratio we need for the instruction. Due to how the global analysis works, we can only do this optimization when the previous vsetvli was produced in the block containing the store. We need to know in the first phase if the vsetvli will be inserted so we can propagate information to the successors in the second phase correctly. This means we can't depend on predecessors. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D106601	2021-08-12 10:05:27 -07:00
Fraser Cormack	2b4a1d4b86	[RISCV] Improve codegen for shuffles with LHS/RHS splats Shuffles which are broken into separate halves reveal splats in which a half is accessed via one index; such operations can be optimized to use "vrgather.vi". This optimization could be achieved by adding extra patterns to match `vrgather_vv_vl` which uses a splat as an index operand, but this patch instead identifies splat earlier. This way, future optimizations can build on top of the data gathered here, e.g., to splat-gather dominant indices and insert any leftovers. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D107449	2021-08-09 10:31:40 +01:00
Fraser Cormack	7b3a69bc16	[RISCV] Lower more BUILD_VECTOR sequences to RVV's VID This relands `a6ca88e908` which was originally reverted due to overflow bugs in `e3fa2b1eab`. This patch teaches the compiler to identify a wider variety of `BUILD_VECTOR`s which form integer arithmetic sequences, and to lower them to `vid.v` with modifications for non-unit steps and non-zero addends. The sequences handled by this optimization must either be monotonically increasing or decreasing. Consecutive elements holding the same value indicate a fractional step which, while simple mathematically, becomes more complex to handle both in the realm of lossy integer division and in the presence of `undef`s. For example, a common "interleaving" shuffle index will be lowered by LLVM to both `<0,u,1,u,2,...>` and `<u,0,u,1,u,...>` `BUILD_VECTOR` nodes. Either of these would ideally be lowered to `vid.v` shifted right by 1. Detection of this sequence in presence of general `undef` values is more complicated, however: `<0,u,u,1,>` could match either `<0,0,0,1,>` or `<0,0,1,1,>` depending on later values in the sequence. Both are possible, so backtracking or multiple passes is inevitable. Sticking to monotonic sequences keeps the logic simpler as it can be done in one pass. Fractional steps will likely be a separate optimization in a future patch. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104921	2021-07-22 09:36:12 +01:00
Fraser Cormack	e3fa2b1eab	Revert "[RISCV] Lower more BUILD_VECTOR sequences to RVV's VID" This reverts commit `a6ca88e908`. More caution is required to avoid overflow/underflow. Thanks to the santizers for catching this.	2021-07-16 15:00:20 +01:00
Fraser Cormack	a6ca88e908	[RISCV] Lower more BUILD_VECTOR sequences to RVV's VID This patch teaches the compiler to identify a wider variety of `BUILD_VECTOR`s which form integer arithmetic sequences, and to lower them to `vid.v` with modifications for non-unit steps and non-zero addends. The sequences handled by this optimization must either be monotonically increasing or decreasing. Consecutive elements holding the same value indicate a fractional step which, while simple mathematically, becomes more complex to handle both in the realm of lossy integer division and in the presence of `undef`s. For example, a common "interleaving" shuffle index will be lowered by LLVM to both `<0,u,1,u,2,...>` and `<u,0,u,1,u,...>` `BUILD_VECTOR` nodes. Either of these would ideally be lowered to `vid.v` shifted right by 1. Detection of this sequence in presence of general `undef` values is more complicated, however: `<0,u,u,1,>` could match either `<0,0,0,1,>` or `<0,0,1,1,>` depending on later values in the sequence. Both are possible, so backtracking or multiple passes is inevitable. Sticking to monotonic sequences keeps the logic simpler as it can be done in one pass. Fractional steps will likely be a separate optimization in a future patch. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104921	2021-07-16 10:35:13 +01:00
Fraser Cormack	d991b7212b	[RISCV] Pass undef VECTOR_SHUFFLE indices on to BUILD_VECTOR Often when lowering vector shuffles, we split the shuffle into two LHS/RHS shuffles which are then blended together. To do so we split the original indices into two, indexed into each respective vector. These two index vectors are then separately lowered as BUILD_VECTORs. This patch forwards on any undef indices to the BUILD_VECTOR, rather than having the VECTOR_SHUFFLE lowering decide on an optimal concrete index. The motiviation for ths change is so that we don't duplicate optimization logic between the two lowering methods and let BUILD_VECTOR do what it does best. Propagating undef in this way allows us, for example, to generate `vid.v` to produce the LHS indices of commonly-used interleave-type shuffles. I have designs on further optimizing interleave-type and other common shuffle patterns in the near future. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104789	2021-07-13 10:41:54 +01:00
Jim Lin	242ddd5089	[RISCV][NFC] Add a single space after comma for VType In most of cases, it has a single space after comma in assembly operands. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103790	2021-06-09 11:18:22 +08:00
Craig Topper	527cd01314	[RISCV] Teach vsetvli insertion to use vsetvl x0, x0 form when we can tell that VLMAX and AVL haven't changed. This can help avoid needing a virtual register for the vsetvl output when the AVL is X0. For other register AVLs it can shorter the live range of the AVL register if it isn't needed later. There's probably no advantage when AVL is a 5 bit immediate that can use vsetivli. But do it anyway for consistency. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D103215	2021-05-27 10:11:38 -07:00
Craig Topper	fdf10e6197	[RISCV] Use X0 as destination of inserted vsetvli when possible. We aren't going to connect the result to anything so we might as well avoid allocating a register. Reviewed By: frasercrmck, HsiangKai Differential Revision: https://reviews.llvm.org/D102031	2021-05-26 13:08:51 -07:00
Fraser Cormack	797e580db9	[RISCV][NFC] Simplify test run lines Several tests had -verify-machineinstrs twice, and several tests were explicitly specifying the default FileCheck prefix of CHECK.	2021-05-13 12:41:00 +01:00
Craig Topper	ce6e4f27dd	[RISCV] Use fractional LMULs for fixed length types smaller than riscv-v-vector-bits-min. My thought process is that if v2i64 is an LMUL=1 type then v2i32 should be an LMUL=1/2 type. We limit the fractional LMUL so that SEW=64 clips to LMUL=1, SEW=32 clips to LMUL=1/2, etc. This ensures there's always a fractional LMUL available to truncate a type. This does reduce the number of vsetvlis in some cases. Some tests increase vsetvlis because the best container type for a mask type is dependent on the LMUL+SEW that the mask was produced from, but you can't tell that from the type. I think this is something we need to solve this in the machine IR when optimizing vsetvlis. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D101215	2021-05-11 09:42:48 -07:00
Craig Topper	ce09dd54e6	[RISCV] Select 5 bit immediate for VSETIVLI during isel rather than peepholing in the custom inserter. This adds a special operand type that is allowed to be either an immediate or register. By giving it a unique operand type the machine verifier will ignore it. This perturbs a lot of tests but mostly it is just slightly different instruction orders. Something bad did happen to some min/max reduction tests. We're spilling vector registers when we weren't before. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D101246	2021-04-27 14:38:16 -07:00
Craig Topper	e2cd92cb9b	[RISCV] Match splatted load to scalar load + splat. Form strided load during isel. This modifies my previous patch to push the strided load formation to isel. This gives us opportunity to fold the splat into a .vx operation first. Using a scalar register and a .vx operation reduces vector register pressure which can be important for larger LMULs. If we can't fold the splat into a .vx operation, then it can make sense to use a strided load to free up the vector arithmetic ALU to do actual arithmetic rather than tying it up with vmv.v.x. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D101138	2021-04-26 13:32:03 -07:00
Fraser Cormack	c9a93c3e01	[RISCV] Lower vector shuffles to vrgather operations This patch extends the lowering of RVV fixed-length vector shuffles to avoid the default stack expansion and instead lower to vrgather instructions. For "permute"-style shuffles where one vector is swizzled, we can lower to one vrgather. For shuffles involving two vector operands, we lower to one unmasked vrgather (or splat, where appropriate) followed by a masked vrgather which blends in the second half. On occasion, when it's not possible to create a legal BUILD_VECTOR for the indices, we use vrgatherei16 instructions with 16-bit index types. For 8-bit element vectors where we may have indices over 255, we have a fairly blunt fallback to the stack expansion to avoid custom-splitting of the vector types. To enable the selection of masked vrgather instructions, this patch extends the various RISCVISD::VRGATHER nodes to take a passthru operand. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100549	2021-04-19 11:13:13 +01:00
Fraser Cormack	eae0ac3a1f	[RISCV] Pre-commit vector shuffle test cases This codegen will be improved by future patches.	2021-04-15 10:31:13 +01:00
Fraser Cormack	99211352c1	[RISCV] Optimize select-like vector shuffles This patch adds a small optimization for vector shuffle lowering, detecting shuffles which can be re-expressed as vector selects. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99270	2021-03-25 11:39:57 +00:00
Fraser Cormack	1e56e8717f	[RISCV] Pre-commit shuffle test cases for D99270	2021-03-25 10:41:40 +00:00

17 Commits