This patch adds the beginnings of more thorough support in the
legalizers for vector-predicated (VP) operations.
The first step is the ability to widen illegal vectors. The more
complicated scenario in which the result/operands need widening but the
mask doesn't, has not been handled here. That would require a lot of code
without an in-tree target on which to test it.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D107904
We normally select these when the root node is a sext_inreg, but
SimplifyDemandedBits can sometimes bypass the sext_inreg for some
users. This can create a situation where sext_inreg+add/sub/mul/shl
is selected to a W instruction, and then the add/sub/mul/shl is
separately selected to a non-W instruction with the same inputs.
This patch tries to detect when it would still be ok to use a W
instruction without the sext_inreg by checking the direct users.
This can allow the W instruction to CSE with one created for a
sext_inreg+add/sub/mul/shl. To minimize complexity and cost of
checking, we make no attempt to determine if the CSE will happen
and just always use a W instruction when we can.
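For illustration, a standalone sketch (not the LLVM code) of why the W
form is interchangeable for users that only consume the low 32 bits or
the sign-extended value:

```cpp
#include <cassert>
#include <cstdint>

// Model of RISC-V ADDW: add the low 32 bits and sign-extend to 64 bits,
// i.e. exactly (sext_inreg (add x, y), i32).
int64_t addw(int64_t x, int64_t y) {
  return static_cast<int64_t>(static_cast<int32_t>(x + y));
}

int main() {
  int64_t x = 0x7fffffff, y = 1;
  // The plain 64-bit ADD and ADDW can differ in the upper bits...
  assert(x + y != addw(x, y));
  // ...but a user that only looks at the low 32 bits (or at the
  // sign-extended value) sees no difference, so selecting the W form
  // for such users lets it CSE with an existing sext_inreg+add.
  assert(static_cast<int32_t>(x + y) == static_cast<int32_t>(addw(x, y)));
  return 0;
}
```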
Differential Revision: https://reviews.llvm.org/D107658
Currently the isReallyTriviallyReMaterializableGeneric() implementation
prevents rematerialization on any virtual register use on the grounds
that it is not a trivial rematerialization and that we do not want to
extend live ranges.
It appears that the LRE logic does not attempt to extend the live range
of a source register for rematerialization, so that is not an issue.
This is checked in LiveRangeEdit::allUsesAvailableAt().
The only non-trivial aspect is accounting for tied-defs, which normally
represent a read-modify-write operation and are not rematerializable.
The test for a tied-def situation already exists in
CodeGen/AMDGPU/remat-vop.mir
(test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve).
The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets
where I more or less understand the asm it seems to reduce spilling
(as expected) or be neutral. However, it needs a review by all targets'
specialists.
Differential Revision: https://reviews.llvm.org/D106408
Follow-up to D107068, attempt to fold nested concat_vectors/undefs, as long as both the vector and inner subvector types are legal.
This exposed the same issue in ARM's MVE LowerCONCAT_VECTORS_i1 (raised as PR51365) and AArch64's performConcatVectorsCombine which both assumed concat_vectors only took 2 subvector operands.
Differential Revision: https://reviews.llvm.org/D107597
For unit-stride and strided loads/stores we set the SEW operand of
the pseudo instruction equal to the EEW in the opcode. The LMUL
of the pseudo instruction is the LMUL we want.
These instructions calculate EMUL=(EEW/SEW) * LMUL. We can use
this to avoid changing vtype if the SEW/LMUL of the previous
vtype matches the EEW/EMUL ratio we need for the instruction.
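As an illustration (a standalone sketch, not the insertion pass itself;
the helper names are made up), the check reduces to comparing SEW/LMUL
ratios, since EEW/EMUL = SEW/LMUL by construction:

```cpp
#include <cassert>

// Illustrative helper: LMUL is encoded in eighths so fractional LMULs
// stay integral (MF8=1, MF4=2, MF2=4, M1=8, M2=16, M4=32, M8=64).
unsigned sewLmulRatio(unsigned sew, unsigned lmulEighths) {
  return sew * 8 / lmulEighths;
}

// VLMAX = VLEN * LMUL / SEW, so two vtypes interpret the same VL
// identically whenever their SEW/LMUL ratios match. A unit-stride or
// strided memory op with element width EEW and EMUL = (EEW/SEW)*LMUL
// therefore needs no new vsetvli if the previous ratio already matches.
bool canSkipVsetvli(unsigned prevSew, unsigned prevLmulEighths,
                    unsigned eew, unsigned emulEighths) {
  return sewLmulRatio(prevSew, prevLmulEighths) ==
         sewLmulRatio(eew, emulEighths);
}

int main() {
  // Previous vtype e32/m1 (ratio 32); an e64 access with EMUL=m2 also
  // has ratio 32, so the vsetvli before it can be elided.
  assert(canSkipVsetvli(/*prevSew=*/32, /*m1=*/8, /*eew=*/64, /*m2=*/16));
  // e8 with EMUL=m1 has ratio 8, which does not match.
  assert(!canSkipVsetvli(32, 8, 8, 8));
  return 0;
}
```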
Due to how the global analysis works, we can only do this
optimization when the previous vsetvli was produced in the block
containing the store. We need to know in the first phase if the
vsetvli will be inserted so we can propagate information to
the successors in the second phase correctly. This means we can't
depend on predecessors.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D106601
Shuffles which are broken into separate halves reveal splats in which
a half is accessed via one index; such operations can be optimized to
use "vrgather.vi".
This optimization could be achieved by adding extra patterns to match
`vrgather_vv_vl` which uses a splat as an index operand, but this patch
instead identifies the splat earlier. This way, future optimizations can
build on top of the data gathered here, e.g., to splat-gather dominant
indices and insert any leftovers.
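A scalar-loop model of the two gather forms (purely illustrative, not
the backend code) shows why a splatted index vector is equivalent to the
immediate form:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// vrgather.vv gathers through a full index vector; vrgather.vi
// broadcasts a single source element selected by an immediate.
std::vector<int> vrgather_vv(const std::vector<int> &src,
                             const std::vector<int> &idx) {
  std::vector<int> out(src.size());
  for (std::size_t i = 0; i < src.size(); ++i)
    out[i] = src[idx[i]];
  return out;
}
std::vector<int> vrgather_vi(const std::vector<int> &src, int imm) {
  return std::vector<int>(src.size(), src[imm]);
}

int main() {
  std::vector<int> src{10, 20, 30, 40};
  std::vector<int> splatIdx(4, 2); // every lane reads element 2
  // When the index operand is a splat, the cheaper immediate form gives
  // the same result.
  assert(vrgather_vv(src, splatIdx) == vrgather_vi(src, 2));
  return 0;
}
```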
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D107449
These shuffles all take the form of a "splat" of the LHS and/or RHS to
some degree, with one or two elements needing to be patched up afterwards. We
currently lower all of these to full LHS/RHS vector-index shuffles with
vrgather.vv.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D107447
We don't have real demanded bits support for MULHU, but we can
still use the known bits based constant folding support at the end
of SimplifyDemandedBits to simplify a MULHU. This helps with cases
where we know the LHS and RHS have enough leading zeros so that
the high multiply result is always 0.
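A standalone sketch of the arithmetic (not the DAGCombiner code) showing
when the high half is provably zero:

```cpp
#include <cassert>
#include <cstdint>

// A 32-bit MULHU returns the high 32 bits of the 64-bit unsigned
// product. If the operands' leading-zero counts sum to at least the bit
// width, the full product fits in 32 bits and the high half is zero.
uint32_t mulhu32(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32);
}

int main() {
  // Both operands have 16 leading zeros: 16 + 16 >= 32, so the high
  // half is provably zero and the MULHU folds to the constant 0.
  assert(mulhu32(0xffff, 0xffff) == 0);
  // Once the leading zeros no longer cover the width, the fold is off.
  assert(mulhu32(0x1ffff, 0x1ffff) != 0);
  return 0;
}
```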
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D106471
This patch extends the optimization of VID-sequence BUILD_VECTORs
introduced in D104921 to include simple fractional steps composed of a
separate integer numerator and denominator.
A notable limitation in this sequence detection is that only sequences
with steps N/1 or 1/D are found, meaning that the step between elements
and the frequency with which it changes is consistent across the whole
sequence. Fractional steps such as 2/3 won't be matched as those would
involve more complex tracking of state or some level of backtracking.
As it stands, however, this patch is sufficient to match common
interleave-type shuffle indices, for example matching `<0,0,1,1>` (or
commonly `<0,u,1,u>` or `<u,0,u,1>`) to an index sequence divided by 2.
While the optimization is relatively `undef`-tolerant, due to greedy
pattern-matching there are even some simple patterns which confuse the
sequence detection into identifying either a suboptimal sequence or no
sequence at all.
Currently only fractional-step sequences identified as having a
power-of-two denominator are actually lowered to RVV instructions. This
is to avoid introducing divisions into the generated code.
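For illustration, a small sketch (not the patch's code) of how a
power-of-two 1/D step is just vid.v followed by a shift:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A sequence with step 1/D is exactly vid.v divided by D; for a
// power-of-two D the division is a cheap vector shift right.
std::vector<unsigned> vidDividedBy(std::size_t numElts,
                                   unsigned log2Denominator) {
  std::vector<unsigned> v(numElts);
  for (std::size_t i = 0; i < numElts; ++i)
    v[i] = static_cast<unsigned>(i) >> log2Denominator; // vid.v, then srl
  return v;
}

int main() {
  // <0,0,1,1> -- the common interleave-style index sequence -- is the
  // step-1/2 case; <0,u,1,u> and <u,0,u,1> match it once the undefs are
  // treated as wildcards.
  assert((vidDividedBy(4, 1) == std::vector<unsigned>{0, 0, 1, 1}));
  return 0;
}
```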
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D106533
If a vsetvli instruction is not compatible with the next vector instruction,
and nothing else may update or use VL/VTYPE in between, we can merge it with
the next vsetvli instruction that would be inserted for the vector
instruction.
This commit only merges the VTYPE with the former vsetvli instruction when
it has the same VL.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D106857
Currently, the default alignment is much larger than the actual size of
the vector in memory. Fix this to use a sane default.
For SVE, temporarily remove lowering of load/store operations for
predicates with less than 16 elements. The layout the backend was
assuming for SVE predicates with less than 16 elements doesn't agree
with the frontend. More work probably needs to be done here.
This change is, strictly speaking, not backwards-compatible at the
bitcode level. But probably nobody is actually depending on that; i1
vectors in memory are rare, and the code that does use them probably
ends up forcing the alignment to something sane anyway. If we think
this is a concern, I can restrict this to scalable vectors for now
(where it's actually causing issues for me at the moment).
Differential Revision: https://reviews.llvm.org/D88994
This patch aims to improve the performance of BUILD_VECTORs which are
identified as containing a dominant element. Given that most
floating-point constants themselves require a load from the constant
pool, it was possible for the optimization to actually increase the
number of individual loads on small vectors. The exception is the zero
constant -- +0.0 -- which can be materialized efficiently.
While this optimization could do with a proper cost model to weigh the
benefits of a single vector load vs. the manipulation of individual
elements -- even for integer vectors which often require several
instructions to materialize -- without a concrete RVV implementation to
work with, any heuristic is likely to be both more obtuse and inaccurate.
Until then, this patch fixes at least one known obvious deficiency.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D106963
The second test case added here was pointed out to me by @craig.topper
and shows how we "optimize" a two-element BUILD_VECTOR from being one
load from the constant pool to two loads from the constant pool.
The first test case shows that since materialization for the
floating-point +0.0 value is cheap and doesn't involve a load, the
optimization is more clearly beneficial here.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D106962
This patch builds on top of D106575 in which scalable-vector splats were
supported in `ISD::matchBinaryPredicate`. It teaches the DAGCombiner how
to perform a variety of the pre-existing saturating add/sub combines on
scalable-vector types.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D106652
This patch adds support for lowering the saturating vector add/sub
intrinsics to RVV instructions, for both fixed-length and
scalable-vector forms alike.
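For reference, a scalar model of the per-element semantics these
intrinsics have (a sketch for illustration, not the RVV lowering
itself):

```cpp
#include <cassert>
#include <cstdint>

// Unsigned saturating add: the result clamps at the type's maximum
// instead of wrapping, which is what vsaddu.vv implements lane-wise.
uint8_t uaddSat8(uint8_t a, uint8_t b) {
  unsigned sum = static_cast<unsigned>(a) + b;
  return sum > UINT8_MAX ? UINT8_MAX : static_cast<uint8_t>(sum);
}

int main() {
  assert(uaddSat8(200, 100) == 255); // would wrap to 44 without saturation
  assert(uaddSat8(10, 20) == 30);
  return 0;
}
```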
Note that some of the DAG combines are still not triggering for the
scalable-vector tests. These require a bit more work in the DAGCombiner
itself.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D106651
These will be optimized by upcoming patches. The tests are primarily not
being optimized due to the lack of support for saturating vector
arithmetic in the RISC-V backend.
On top of that, however, a large percentage of the scalable-vector tests
are also lacking support in the DAGCombiner: either in
`ISD::matchBinaryPredicate` or due to checks specifically for
`BUILD_VECTOR` and not `SPLAT_VECTOR`.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D106649
This patch extends support for (scalable-vector) splats in the
DAGCombiner via the `ISD::matchBinaryPredicate` function, which enables a
variety of simple combines of constants.
Users of this function may now have to distinguish between
`BUILD_VECTOR` and `SPLAT_VECTOR` vector operands. The way of dealing
with this in-tree follows the approach added for
`ISD::matchUnaryPredicate` implemented in D94501.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D106575
This patch adds a reduced test case which identifies an illegal vsetvli
inserted by the compiler. The compiler emits a vsetvli which is intended
to preserve VL with the SEW/LMUL ratio e32/m1 when in fact the VL could
have been set by e64/m2 in a predecessor block.
Differential Revision: https://reviews.llvm.org/D106286
Since we're changing VTYPE, we may change VLMAX which could
invalidate the previous VL. If we can't tell if it is safe we
should use an AVL of 1 instead of keeping the old VL.
This is a quick fix. We may want to thread VL to the pseudo
instruction instead of making up a value. That will require ISD
opcode changes and changes to the C intrinsic interface.
This fixes the issue raised in D106286.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D106403
These tests show missed opportunities in the SelectionDAG layer when
dealing with scalable-vector splats. All of these are handled for the
equivalent `ISD::BUILD_VECTOR` code, and the tests have largely been
translated from the equivalent X86 tests.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D106574
This relands a6ca88e908 which was originally
reverted due to overflow bugs in e3fa2b1eab.
This patch teaches the compiler to identify a wider variety of
`BUILD_VECTOR`s which form integer arithmetic sequences, and to lower
them to `vid.v` with modifications for non-unit steps and non-zero
addends.
The sequences handled by this optimization must either be monotonically
increasing or decreasing. Consecutive elements holding the same value
indicate a fractional step which, while simple mathematically,
becomes more complex to handle both in the realm of lossy integer
division and in the presence of `undef`s.
For example, a common "interleaving" shuffle index will be lowered by
LLVM to both `<0,u,1,u,2,...>` and `<u,0,u,1,u,...>` `BUILD_VECTOR`
nodes. Either of these would ideally be lowered to `vid.v` shifted right
by 1. Detection of this sequence in the presence of general `undef` values
is more complicated, however: `<0,u,u,1,>` could match either
`<0,0,0,1,>` or `<0,0,1,1,>` depending on later values in the sequence.
Both are possible, so backtracking or multiple passes is inevitable.
Sticking to monotonic sequences keeps the logic simpler as it can be
done in one pass. Fractional steps will likely be a separate
optimization in a future patch.
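For illustration, a self-contained sketch of the kind of check involved
(names and structure are mine, not the patch's):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Returns {step, addend} if every defined element i equals i*step + addend.
// Undef elements (nullopt) are treated as wildcards. Repeated values imply
// a fractional step and are rejected, matching the "monotonic sequences
// only" restriction described above.
std::optional<std::pair<int64_t, int64_t>>
matchVidSequence(const std::vector<std::optional<int64_t>> &elts) {
  std::optional<int64_t> step;
  std::optional<std::pair<std::size_t, int64_t>> prev;
  for (std::size_t i = 0; i < elts.size(); ++i) {
    if (!elts[i])
      continue;
    if (prev) {
      int64_t delta = *elts[i] - prev->second;
      int64_t dist = static_cast<int64_t>(i - prev->first);
      if (delta % dist != 0)
        return std::nullopt; // would need a fractional step
      int64_t thisStep = delta / dist;
      if ((step && *step != thisStep) || thisStep == 0)
        return std::nullopt; // inconsistent or zero step
      step = thisStep;
    }
    prev = std::make_pair(i, *elts[i]);
  }
  if (!step)
    return std::nullopt; // fewer than two defined elements
  // addend = element value minus its position scaled by the step.
  int64_t addend = prev->second - static_cast<int64_t>(prev->first) * *step;
  return std::make_pair(*step, addend);
}

int main() {
  using E = std::optional<int64_t>;
  // <3, undef, 7, 9> is i*2 + 3: lower via vid.v, a shift/multiply by 2,
  // and an add of 3.
  auto seq = matchVidSequence({E{3}, std::nullopt, E{7}, E{9}});
  assert(seq && seq->first == 2 && seq->second == 3);
  // <0, 0, 1, 1> needs the fractional step 1/2 and is rejected by this
  // simple one-pass check.
  assert(!matchVidSequence({E{0}, E{0}, E{1}, E{1}}));
  return 0;
}
```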
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D104921
Prior to this patch, it skipped the instruction defining VNI when checking if the tainted lanes are used.
In the given example, VRGATHER is an illegal instruction because its DstReg overlaps with SrcReg.
Therefore we need to check the defining instruction as well when there is an earlyclobber constraint.
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D105684
The existing rule about the operand type is strange. Instead, just say
the operand is a TargetConstant with the right width. (Legalization
ignores TargetConstants, so it doesn't matter if that width is legal.)
Highlights:
1. I had to substantially rewrite the AArch64 isel patterns to expect a
TargetConstant. Nothing too exotic, but maybe a little hairy. Maybe
worth considering a target-specific node with some dagcombines instead
of this complicated nest of isel patterns.
2. Our behavior on RV32 for vectors of i64 has changed slightly. In
particular, we correctly preserve the width of the arithmetic through
legalization. This changes the DAG a bit. Maybe room for
improvement here.
3. I explicitly defined the behavior around overflow. This is necessary
to make the DAGCombine transforms legal, and I don't think it causes any
practical issues.
Differential Revision: https://reviews.llvm.org/D105673
If we need to shift left anyway we might be able to take advantage
of LUI implicitly shifting its immediate left by 12 to cover part
of the shift. This allows us to use more bits of the LUI immediate
to avoid an ADDI.
isDesirableToCommuteWithShift now considers compressed instruction
opportunities when deciding if commuting should be allowed.
I believe this is the same or similar to one of the optimizations
from D79492.
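A standalone sketch of the immediate arithmetic this exploits (not the
LLVM materialization code; the helper is hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// LUI loads a 20-bit immediate into bits 31:12, so a non-zero 32-bit
// constant whose low 12 bits are zero takes a single instruction, while
// other 32-bit constants generally need LUI + ADDI.
bool singleLui(int64_t c) {
  return c != 0 && c == static_cast<int32_t>(c) && (c & 0xfff) == 0;
}

int main() {
  // Materializing 0x12345 and then shifting it left by 12 would take
  // LUI + ADDI + SLLI; letting LUI's implicit <<12 cover the shift
  // produces 0x12345000 directly, so the ADDI is no longer needed.
  int64_t c = 0x12345;
  assert(!singleLui(c));
  assert(singleLui(c << 12));
  return 0;
}
```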
Reviewed By: luismarques, arcbbb
Differential Revision: https://reviews.llvm.org/D105417
I don't think the semantics of the llvm masked gather intrinsic care
about the order in which the elements are loaded. For example, type
legalization by splitting will chain them in parallel. This is different
from scatter, which we do chain in order.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D106025
We assume VLENB is a multiple of 8 and previously relied on shift
pairs being optimized to an AND+SHL/SHR and computeKnownBits
removing the AND. This doesn't happen if (vlenb >> 3) gets CSEd
to have multiple uses. This patch manually emits the best shift
to work around this.
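A small standalone sketch of the arithmetic this relies on (not the
LLVM code):

```cpp
#include <cassert>
#include <cstdint>

// VLENB is a multiple of 8, so scaling (vlenb >> 3) by a power of two
// can always be done with a single shift of vlenb itself instead of a
// shift-right followed by a shift-left.
uint64_t scaleVlenbNaive(uint64_t vlenb, unsigned log2Scale) {
  return (vlenb >> 3) << log2Scale; // srli + slli
}
uint64_t scaleVlenbBest(uint64_t vlenb, unsigned log2Scale) {
  // One shift suffices because the low 3 bits of vlenb are known zero.
  return log2Scale >= 3 ? vlenb << (log2Scale - 3)
                        : vlenb >> (3 - log2Scale);
}

int main() {
  for (uint64_t vlenb = 8; vlenb <= 1024; vlenb *= 2)
    for (unsigned s = 0; s < 8; ++s)
      assert(scaleVlenbNaive(vlenb, s) == scaleVlenbBest(vlenb, s));
  return 0;
}
```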
This adds new pseudoinstructions with ForceTailAgnostic set. This
matches what we did for non-widening VMACC. We should move to a
tail policy operand on the pseudos when we expand the intrinsic
interface to include the tail policy.
This patch teaches the compiler to identify a wider variety of
`BUILD_VECTOR`s which form integer arithmetic sequences, and to lower
them to `vid.v` with modifications for non-unit steps and non-zero
addends.
The sequences handled by this optimization must either be monotonically
increasing or decreasing. Consecutive elements holding the same value
indicate a fractional step which, while simple mathematically,
becomes more complex to handle both in the realm of lossy integer
division and in the presence of `undef`s.
For example, a common "interleaving" shuffle index will be lowered by
LLVM to both `<0,u,1,u,2,...>` and `<u,0,u,1,u,...>` `BUILD_VECTOR`
nodes. Either of these would ideally be lowered to `vid.v` shifted right
by 1. Detection of this sequence in the presence of general `undef` values
is more complicated, however: `<0,u,u,1,>` could match either
`<0,0,0,1,>` or `<0,0,1,1,>` depending on later values in the sequence.
Both are possible, so backtracking or multiple passes is inevitable.
Sticking to monotonic sequences keeps the logic simpler as it can be
done in one pass. Fractional steps will likely be a separate
optimization in a future patch.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D104921
Using positive zero as the neutral element in 'fadd' reductions, while
it generates better code, is incorrect. The correct neutral element is
negative zero: 0.0 + -0.0 = 0.0, so seeding with positive zero loses the
sign of an all-negative-zero reduction, whereas -0.0 + -0.0 = -0.0.
There are perhaps more optimal lowerings of negative zero avoiding
constant-pool loads which could be left as future work.
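A minimal standalone demonstration of the sign-of-zero issue:

```cpp
#include <cassert>
#include <cmath>

// Reducing the single-element vector <-0.0> must yield -0.0, but
// seeding the reduction with +0.0 flips the sign of zero.
int main() {
  double correct = -0.0 + -0.0; // seed -0.0, then the element
  double wrong = 0.0 + -0.0;    // seed +0.0, then the element
  assert(std::signbit(correct)); // still -0.0
  assert(!std::signbit(wrong));  // became +0.0
  return 0;
}
```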
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D105902
Often when lowering vector shuffles, we split the shuffle into two
LHS/RHS shuffles which are then blended together. To do so we split the
original indices into two, indexed into each respective vector. These
two index vectors are then separately lowered as BUILD_VECTORs.
This patch forwards on any undef indices to the BUILD_VECTOR, rather
than having the VECTOR_SHUFFLE lowering decide on an optimal concrete
index. The motivation for this change is so that we don't duplicate
optimization logic between the two lowering methods and let BUILD_VECTOR
do what it does best.
Propagating undef in this way allows us, for example, to generate
`vid.v` to produce the LHS indices of commonly-used interleave-type
shuffles. I have designs on further optimizing interleave-type and other
common shuffle patterns in the near future.
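For illustration, a small sketch of the mask split with undef forwarding
(names are mine, not the patch's):

```cpp
#include <cassert>
#include <vector>

// Original indices 0..N-1 select from the LHS and N..2N-1 from the RHS.
// Undef indices (-1) are forwarded as undef to both index vectors
// rather than being given an arbitrary concrete value, leaving the
// BUILD_VECTOR lowering free to pick whatever is cheapest.
void splitShuffleMask(const std::vector<int> &mask, int n,
                      std::vector<int> &lhsIdx, std::vector<int> &rhsIdx) {
  for (int m : mask) {
    bool isUndef = m < 0;
    lhsIdx.push_back(!isUndef && m < n ? m : -1);
    rhsIdx.push_back(!isUndef && m >= n ? m - n : -1);
  }
}

int main() {
  // Interleave of two 4-element vectors: mask <0, 4, 1, 5>.
  std::vector<int> lhs, rhs;
  splitShuffleMask({0, 4, 1, 5}, 4, lhs, rhs);
  // LHS indices <0, u, 1, u> and RHS indices <u, 0, u, 1> both match a
  // vid.v shifted right by one once the undefs are left as wildcards.
  assert((lhs == std::vector<int>{0, -1, 1, -1}));
  assert((rhs == std::vector<int>{-1, 0, -1, 1}));
  return 0;
}
```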
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D104789
This adds a DAG combine to detect sext/zext inputs and emit a
new ISD opcode. The extends will either be removed or replaced
with narrower extends.
Isel patterns are used to match add and widening mul to vwmacc
similar to the recently added vmacc patterns.
There's still some work to be done to match vmulsu.
We should also rewrite splats that were extended as scalars and
then splatted.
Reviewed By: arcbbb
Differential Revision: https://reviews.llvm.org/D104802
This will currently accept the old number-of-bytes syntax, and convert
it to a scalar. This should be removed in the near future (I think I
converted all of the tests already, but likely missed a few).
Not sure what the exact syntax and policy should be. We can continue
printing the number of bytes for non-generic instructions to avoid
test churn and only allow non-scalar types for generic instructions.
This will currently print the LLT in parentheses, but accept parsing
the existing integers and implicitly converting to scalar. The
parentheses are a bit ugly, but the parser logic seems unable to deal
without either parentheses or some keyword to indicate the start of a
type.
I thought this might help with another optimization I was
thinking about, but I don't think it will. So it just wastes
compile time calling computeKnownBits for no benefit.
This reverts commit 81b2f95971.
This patch teaches the compiler to generate code to handle larger RVV
stack sizes and stack offsets which resolve to an amount larger than 2047
vector registers in size.
The previous behaviour was asserting on such large values as it was only
able to materialize the constant by feeding it to the 12-bit immediate
of an `ADDI` instruction. The compiler can now materialize this amount
into a temporary register before continuing with the computation.
A test case for this scenario is included which also checks that the
temporary register used to materialize the amount doesn't require an
additional spill slot over what we're already reserving for RVV code.
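For reference, a minimal sketch of the immediate-range check involved
(not the backend code):

```cpp
#include <cassert>
#include <cstdint>

// ADDI takes a 12-bit signed immediate, so offsets outside
// [-2048, 2047] cannot be folded directly and must be materialized into
// a scratch register first.
bool fitsInAddiImmediate(int64_t offset) {
  return offset >= -2048 && offset <= 2047;
}

int main() {
  assert(fitsInAddiImmediate(2047));
  // An RVV stack region of more than 2047 vector registers (scaled by
  // vlenb at runtime) produces a factor that no longer fits, which is
  // the case that previously hit the assertion.
  assert(!fitsInAddiImmediate(2048));
  return 0;
}
```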
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D104727
This patch optimizes the code generation of vector-type SELECTs (LLVM
select instructions with scalar conditions) by custom-lowering to
VSELECTs (LLVM select instructions with vector conditions) by splatting
the condition to a vector. This avoids the default expansion path which
would either introduce control flow or fully scalarize.
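A scalar-loop model of the equivalence being used (purely illustrative,
not the lowering code):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// A vector-typed select with a scalar condition is equivalent to a
// lane-wise vselect whose mask is the scalar condition splatted to
// every lane, so no branch or per-element scalarization is needed.
template <std::size_t N>
std::array<int, N> selectScalarCond(bool c, const std::array<int, N> &a,
                                    const std::array<int, N> &b) {
  std::array<bool, N> mask;
  mask.fill(c); // splat the condition
  std::array<int, N> out;
  for (std::size_t i = 0; i < N; ++i)
    out[i] = mask[i] ? a[i] : b[i]; // lane-wise vselect
  return out;
}

int main() {
  std::array<int, 4> a{1, 2, 3, 4}, b{5, 6, 7, 8};
  assert(selectScalarCond(true, a, b) == a);
  assert(selectScalarCond(false, a, b) == b);
  return 0;
}
```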
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D104772
With regard to overrunning, the langref (llvm/docs/LangRef.rst)
specifies:
(llvm.experimental.vector.insert)
Elements ``idx`` through (``idx`` + num_elements(``subvec``) - 1)
must be valid ``vec`` indices. If this condition cannot be determined
statically but is false at runtime, then the result vector is
undefined.
(llvm.experimental.vector.extract)
Elements ``idx`` through (``idx`` + num_elements(result_type) - 1)
must be valid vector indices. If this condition cannot be determined
statically but is false at runtime, then the result vector is
undefined.
For the non-mixed cases (e.g. inserting/extracting a scalable into/from
another scalable, or inserting/extracting a fixed into/from another
fixed), it is possible to statically check whether or not the above
conditions are met. This was previously missing from the verifier, and
if the conditions were found to be false, the result of the
insertion/extraction would be replaced with an undef.
With regard to invalid indices, the langref (llvm/docs/LangRef.rst)
specifies:
(llvm.experimental.vector.insert)
``idx`` represents the starting element number at which ``subvec``
will be inserted. ``idx`` must be a constant multiple of
``subvec``'s known minimum vector length.
(llvm.experimental.vector.extract)
The ``idx`` specifies the starting element number within ``vec``
from which a subvector is extracted. ``idx`` must be a constant
multiple of the known-minimum vector length of the result type.
Similarly, these conditions were not previously enforced in the
verifier. In some circumstances, invalid indices were permitted
silently, and in other circumstances, an undef was spawned where a
verifier error would have been preferred.
This commit adds verifier checks to enforce the constraints above.
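For illustration, a self-contained sketch of the static checks in the
non-mixed case (names are mine, not the verifier's):

```cpp
#include <cassert>

// For the fixed/fixed and scalable/scalable cases the element counts
// are directly comparable, so both constraints can be enforced at
// verification time.
bool isValidInsertIdx(unsigned vecMinElts, unsigned subMinElts,
                      unsigned idx, bool bothSameScalability) {
  // idx must be a multiple of the subvector's known-minimum length.
  if (idx % subMinElts != 0)
    return false;
  // Overrun can only be checked statically when both types are fixed or
  // both are scalable.
  if (bothSameScalability && idx + subMinElts > vecMinElts)
    return false;
  return true;
}

int main() {
  // Inserting a <2 x i32> into a <4 x i32>: idx 0 and 2 are fine...
  assert(isValidInsertIdx(4, 2, 0, true));
  assert(isValidInsertIdx(4, 2, 2, true));
  // ...idx 1 is not a multiple of 2, and idx 4 would overrun.
  assert(!isValidInsertIdx(4, 2, 1, true));
  assert(!isValidInsertIdx(4, 2, 4, true));
  return 0;
}
```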
Differential Revision: https://reviews.llvm.org/D104468
Previously we went directly to the unknown state on a VTYPE mismatch.
If we instead remember the partial match, we can use this to still emit
the X0, X0 form of vsetvli in successors if the AVL and the needed
SEW/LMUL ratio match.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D104069