llvm-project

Commit Graph

Author	SHA1	Message	Date
David Sherwood	03fee6712a	[LoopVectorize] Add option to use active lane mask for loop control flow Currently, for vectorised loops that use the get.active.lane.mask intrinsic we only use the mask for predicated vector operations, such as masked loads and stores, etc. The loop itself is still controlled by comparing the canonical induction variable with the trip count. However, for some targets this is inefficient when it's cheap to use the mask itself to control the loop. This patch adds support for using the active lane mask for control flow by: 1. Generating the active lane mask for the next iteration of the vector loop, rather than the current one. If there are still any remaining iterations then at least the first bit of the mask will be set. 2. Extracting the first bit of this mask and using this bit for the conditional branch. I did this by creating a new VPActiveLaneMaskPHIRecipe that sets up the initial PHI values in the vector loop pre-header. I've also made use of the new BranchOnCond VPInstruction for the final instruction in the loop region. Differential Revision: https://reviews.llvm.org/D125301	2022-07-11 13:46:55 +01:00
Philip Reames	b12930e133	[RISCV] Switch to using get.active.lane.mask when tail folding The motivation here is to a) bring us closer into alignment with AArch64 under the assumption that codepath is better tested, and b) simplify pattern matching in an upcoming change. The immediate impact is a significant IR reduction but a fairly minimal change in the generated assembly. Due to a difference in expansion behavior we get a saturating add vs an unsaturating one for the old code, but that's about it. This difference comes down to different handling of overflow, which doesn't seem to be possible here anyways, so the assembly codegen is arguably a minor regression. I don't expect that to matter in practice. Differential Revision: https://reviews.llvm.org/D129221	2022-07-08 10:24:59 -07:00
Philip Reames	aadc9d26a3	[RISCV] Cost model for scalable reductions This extends the existing cost model for reductions for scalable vectors. The existing cost model assumes that reductions are roughly logarithmic in cost for unordered variants and linear for ordered ones. This change keeps that same basic model, and extends it out to the maximum number of elements a scalable vector could possibly have. This results in costs which aren't terribly high for unordered reductions, but are for ordered ones. This seems about right; we want to strongly bias away from using scalable ordered reductions if the cost might be linear in VL. Differential Revision: https://reviews.llvm.org/D127447	2022-06-27 12:44:38 -07:00
Philip Reames	9803b0d1e7	[RISCV] Implement getVScaleForTuning and thus prefer scalable vectorization when enabled LoopVectorizer uses getVScaleForTuning for deciding how to discount the cost of a potential vector factor by the amount of work performed. Without the callback implemented, the vectorizer was defaulting to an estimated vscale of 1. This results in fixed vectorization looking falsely profitable (since it used the command line VLEN). The test change is pretty limited since a) we don't have much coverage of the vectorizer with scalable vectors at all, and b) what little coverage we have mostly uses i64 element types. There's a separate issue with <vscale x 1 x i64> which prevents us from getting to this stage of costing, and thus only the one test explicitly written to avoid that is visible in the diff. However, this is actually a very wide impact change as it changes the practical vectorization result when both fixed and scalable are enabled to scalable. As an aside, I think the vectorizer is a little too strongly biased towards scalable when both are legal, but we can explore that separately. For now, let's just get the cost model working the way it was intended. Differential Revision: https://reviews.llvm.org/D128547	2022-06-25 11:25:23 -07:00
Philip Reames	7bfad7b9d8	[RISCV] Replace two calls to getMinRVVVectorSizeInBits with useRVVForFixedLengthVectors [nfc]	2022-06-23 15:59:33 -07:00
Philip Reames	f7bb691d61	[RISCV] Implement isElementTypeLegalForScalableVector TTI hook This brings us into alignment with AArch64, and in the process fixes a compiler crash bug in uniform store handling in the vectorizer. Before the recent invalid cost bailout work, this would have also avoided crashes on invalid costs in some cases. I honestly think the vectorizer should gracefully bail out on uniform stores it can't use a scatter for, but it doesn't, so let's take the path of least resistance here. It's also possible that there are other vectorizer bugs AArch64 isn't seeing because of this hook; we don't want to be finding them either. Differential Revision: https://reviews.llvm.org/D127514	2022-06-10 13:20:58 -07:00
Craig Topper	0c66deb498	[RISCV] Scalarize gather/scatter on RV64 with Zve32* extension. i64 indices aren't supported on Zve32*. Scalarize gathers to prevent generating illegal instructions. Since InstCombine will aggressively canonicalize GEP indices to pointer size, we're pretty much always going to have an i64 index. Trying to predict when SelectionDAG will find a smaller index from the TTI hook used by the ScalarizeMaskedMemIntrinPass seems fragile. To optimize this we probably need an IR pass to rewrite it earlier. Test RUN lines have also been added to make sure the strided load/store optimization still works. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127179	2022-06-07 08:07:50 -07:00
yanming	bc93d51d36	[NFC][RISCV][format] Blank line between functions, remove unnecessary semicolon.	2022-06-06 15:38:14 +08:00
yanming	8d9d8f866a	[RISCV] Define risc-v's own register class to model FP Register. The default RegisterClass is not enough to model RISCV Register. We define risc-v's own register class to model FP Register. This helps to better estimate the register pressure in the loop-vectorize. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D126854	2022-06-06 14:43:52 +08:00
Peter Waller	ade47bdc31	[LV] Improve register pressure estimate at high VFs Previously, `getRegUsageForType` was implemented using `getTypeLegalizationCost`. `getRegUsageForType` is used by the loop vectorizer to estimate the register pressure caused by using a vector type. However, `getTypeLegalizationCost` currently only appears to understand splitting and not scalarization, so it significantly underestimates the register requirements. Instead, use `getNumRegisters`, which understands when scalarization can occur (via computeRegisterProperties). This was discovered while investigating D118979 (Set maximum VF with shouldMaximizeVectorBandwidth), where under fixed-length 512-bit SVE the loop vectorizer previously ended up costing a v128i1 as 2 v64i* registers where it actually occupies 128 i32 registers. I'm sending this patch early for comment, I'm still doing some sanity checking with LNT. I note that getRegisterClassForType appears to return VectorRC even though the types in question (large vNi1 types) end up occupying scalar registers. That might be worth fixing too. Differential Revision: https://reviews.llvm.org/D125918	2022-05-23 07:57:45 +00:00
Liqin.Weng	d95513ae3a	[RISCV] remove useless code When checking the legality of vectorizing a reduction, the hasVInstructions() check is unneeded: RISCV can only perform loop vectorization when hasVInstructions() is true. Reviewed By: kito-cheng, craig.topper Differential Revision: https://reviews.llvm.org/D125460	2022-05-16 12:54:03 +00:00
Vasileios Porpodas	fa8a9fea47	Recommit "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`" This reverts commit `6a9bbd9f20`. Code review: https://reviews.llvm.org/D124202	2022-04-26 14:02:40 -07:00
Craig Topper	76192182d0	[RISCV] Remove riscv-v-fixed-length-vector-elen-max command line option. This was added before Zve extensions were defined. I think users should use Zve32x or Zve32f now. Though we will lose support for limiting ELEN to 16 or 8, but I hope no one was using that. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D123418	2022-04-11 10:14:48 -07:00
LiaoChunyu	505fce5a9e	[RISCV] Add basic code modeling for llvm.experimental.stepvector intrinsic Running code that uses the scalable-vector llvm.experimental.stepvector intrinsic through the loop unroller crashes due to an invalid cost. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D122782	2022-04-11 10:19:23 +08:00
Vasileios Porpodas	39aa202aff	Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 3, fixed assertion crash. Original review: https://reviews.llvm.org/D121354 This reverts commit `e6ead19b77`.	2022-03-23 18:32:17 -07:00
Arthur Eubanks	e6ead19b77	Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash." This reverts commit `27bd8f9492`. Causes crashes, see comments in D121973	2022-03-23 10:57:45 -07:00
Vasileios Porpodas	27bd8f9492	Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash. Original review: https://reviews.llvm.org/D121354 This reverts commit `f7d7d2a08d`.	2022-03-22 16:41:55 -07:00
Arthur Eubanks	f7d7d2a08d	Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads."" This reverts commit `79613185d3`. Causes crashes, see comments in https://reviews.llvm.org/D121973.	2022-03-22 13:33:49 -07:00
Yeting Kuo	ecd7a0132a	[RISCV] Add basic cost model for vector casting To model the cost of vector casts, the patch treats most vector casts like their scalar forms, and assigns a cost of 1 to the vector forms of casts that are free as scalars. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D121771	2022-03-22 14:17:08 +08:00
Vasileios Porpodas	79613185d3	Recommit "[SLP] Fix lookahead operand reordering for splat loads." Original review: https://reviews.llvm.org/D121354 The original commit `9136145eb0` broke the build on several targets. Differential Revision: https://reviews.llvm.org/D121973	2022-03-21 15:57:32 -07:00
Yeting Kuo	ae7c6647f3	[RISCV] Add basic code modeling for fixed length vector reduction. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D121447	2022-03-14 11:04:31 +08:00
Alex Tsao	89f15fc687	[RISCV] Add cost modelling for masked memory op The patch adds a very basic cost model for masked memory ops on scalable vectors. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D117884	2022-03-03 20:47:58 +08:00
Nikita Popov	98cfcae4e9	Revert "[RISCV] Add cost modelling for masked memory op" This reverts commit `76f243b53b`. The newly added test fails.	2022-03-02 17:32:10 +01:00
Alex Tsao	76f243b53b	[RISCV] Add cost modelling for masked memory op The patch adds a very basic cost model for masked memory ops on scalable vectors. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D117884	2022-03-02 22:48:41 +08:00
Craig Topper	09629215c2	[RISCV] Add a really basic cost model for SK_Splice. While testing scalable vectors I found that if we generate a vector splice intrinsic and run the code through the loop unroller, we'll crash due to an invalid cost. This adds a basic cost based on the 2 slide instructions used by the lowering in D119303. We probably need to factor LMUL into this, but that's true for arithmetic instructions too. So I've ignored it for the moment. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D119316	2022-02-09 11:43:31 -08:00
Kito Cheng	cc35161dc7	[RISCV] Add initial support for getRegUsageForType and getNumberOfRegisters Those two TTI hooks are used during vectorization for calculating register pressure. The default implementation doesn't take LMUL into account, and it also returns a definitely wrong number of registers (it assumes every register class has 8 registers). So in this patch we tried to: 1. Calculate the right register usage for vector types and scalar types. 2. Return the right number of registers for general purpose registers and vector registers. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116890	2022-01-17 15:27:54 +08:00
Craig Topper	042394b69e	[RISCV] Add a command line option to control the LMUL used by TTI's getRegisterBitWidth. By default we return the width of an LMUL=1 register. We can enable testing with larger LMUL values by returning a larger bit width. This patch adds a RISCV specific option to provide a LMUL which will be multiplied by the LMUL=1 bit width. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D116339	2022-01-07 20:02:10 -08:00
Kito Cheng	f142c45f1e	[RISCV] Set getMinVectorRegisterBitWidth to 16 if enable fixed length vector code gen for RVV getMinVectorRegisterBitWidth indicates which vector types are supported by this target, and RISC-V actually supports all fixed length vector types with a vector length less than `getMinRVVVectorSizeInBits`, so set it to 16, meaning 2 x i8, which is the minimal fixed length vector size in theory. That also fixed one issue: some testcases might become non-vectorizable when `-riscv-v-vector-bits-min` is set to a larger value, because the vector size is smaller than `-riscv-v-vector-bits-min`. For example, the following code can be vectorized by SLP with `-riscv-v-vector-bits-min=128` or `-riscv-v-vector-bits-min=256`, but can't be vectorized with `-riscv-v-vector-bits-min=512` or larger: ``` void foo(double *da) { da[0] = 0; da[1] = 1; da[2] = 2; da[3] = 3; } ``` Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116534	2022-01-08 11:16:21 +08:00
Craig Topper	a9486a40f7	[RISCV] Disable interleaving scalar loops in the loop vectorizer. The loop vectorizer can interleave scalar loops even if it doesn't vectorize them. I don't believe we intended to enable this when we enabled interleaving for vector instructions. Disable interleaving for VF=1 like X86 and AMDGPU already do. Test lifted from AMDGPU. Differential Revision: https://reviews.llvm.org/D115975	2021-12-23 08:37:24 -06:00
Michael Berg	f95ee6074a	[RISCV] Add target specific loop unrolling and peeling preferences Both these preference helper functions have initial support with this change. The loop unrolling preferences are set with initial settings to control thresholds, size and attributes of loops to unroll with some tuning done. The peeling preferences may need some tuning as well as the initial support looks much like what other architectures utilize. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D113798	2021-12-18 12:54:50 -08:00
Michael Berg	3e363f14e1	Revert "[RISCV] Add target specific loop unrolling and peeling preferences" This reverts commit `8487981a72`.	2021-12-07 15:13:42 -08:00
Michael Berg	8487981a72	[RISCV] Add target specific loop unrolling and peeling preferences Both these preference helper functions have initial support with this change. The loop unrolling preferences are set with initial settings to control thresholds, size and attributes of loops to unroll with some tuning done. The peeling preferences may need some tuning as well as the initial support looks much like what other architectures utilize. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D113798	2021-12-07 15:06:42 -08:00
Craig Topper	1387483e72	[RISCV] Replace most uses of RISCVSubtarget::hasStdExtV. NFCI Add new hasVInstructions() which is currently equivalent. Replace vector uses of hasStdExtZfh/F/D with new vector specific versions. The vector spec no longer requires that the vectors implement the same types as scalar. It only requires that the scalar type is the maximum size the vectors can support. This is currently implemented using the scalar rule we were using before. Add new hasVInstructionsI64() and begin using it to qualify code that requires i64 vector elements. This is all NFC for now, but we can start using this to better implement D112408 which introduces the Zve extensions. Reviewed By: frasercrmck, eopXD Differential Revision: https://reviews.llvm.org/D112496	2021-10-27 19:33:48 -07:00
Craig Topper	d85e347a28	[RISCV] Add a pass to recognize VLS strided loads/store from gather/scatter. For strided accesses the loop vectorizer seems to prefer creating a vector induction variable with a start value of the form <i32 0, i32 1, i32 2, ...>. This value will be incremented each loop iteration by a splat constant equal to the length of the vector. Within the loop, arithmetic using splat values will be done on this vector induction variable to produce indices for a vector GEP. This pass attempts to dig through the arithmetic back to the phi to create a new scalar induction variable and a stride. We push all of the arithmetic out of the loop by folding it into the start, step, and stride values. Then we create a scalar GEP to use as the base pointer for a strided load or store using the computed stride. Loop strength reduce will run after this pass and can do some cleanups to the scalar GEP and induction variable. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D107790	2021-09-20 09:39:44 -07:00
Luke	a78dd726f4	[SLP][RISCV] Implement unsigned getMinVectorRegisterBitWidth() for RISCV Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D108973	2021-09-01 14:25:15 +08:00
Nikita Popov	c1b7540645	[TTI] Sink IVDescriptors.h include (NFC) Forward declare RecurrenceDescriptor and include IVDescriptors.h only in implementation code that actually needs it.	2021-08-30 22:41:58 +02:00
Craig Topper	0eeab8b282	[RISCV] Add -riscv-v-fixed-length-vector-elen-max to limit the ELEN used for fixed length vectorization. This adds an ELEN limit for fixed length vectors. This will scalarize any elements larger than this. It will also disable some fractional LMULs. For example, if ELEN=32 then mf8 becomes illegal, i32/f32 vectors can't use any fractional LMULs, i16/f16 can only use mf2, and i8 can use mf2 and mf4. We may also need something for the scalable vectors, but that has interactions with the intrinsics and we can't scalarize a scalable vector. Longer term this should come from one of the Zve* features	2021-08-27 10:17:35 -07:00
Craig Topper	8f6cea43e7	[RISCV] Use RISCV::RVVBitsPerBlock for RGK_ScalableVector in getRegisterBitWidth. I might be wrong, but I think this should be the width of the known min size we use for scalable vectors. It shouldn't scale with minimum vlen. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D107945	2021-08-17 11:13:15 -07:00
Luke	c2e97ba85e	[RISCV] Don't enable Interleaved Access Vectorization The patch https://reviews.llvm.org/D101469 is intended to enable loop unrolling, not interleaved access vectorization. The method bool enableInterleavedAccessVectorization() should not be implemented.	2021-06-18 12:32:30 +08:00
Simon Pilgrim	5e6bfb661e	[Analysis] Pass RecurrenceDescriptor as const reference. NFCI. We were passing the RecurrenceDescriptor by value to most of the reduction analysis methods, despite it being rather bulky with TrackingVH members (that can be costly to copy). In all these cases we're only using the RecurrenceDescriptor for rather basic purposes (access to types/kinds etc.). Differential Revision: https://reviews.llvm.org/D104029	2021-06-11 10:24:14 +01:00
Fraser Cormack	3b0a33d0ad	[RISCV] Expand unaligned fixed-length vector memory accesses RVV vectors must be aligned to their element types, so anything less is unaligned. For regular loads and stores, our custom-lowering of fixed-length vectors meant that we opted out of LegalizeDAG's built-in unaligned expansion. This patch adds that logic in to our custom lower function. For masked intrinsics, we declare that anything unaligned is not legal, leaving the ScalarizeMaskedMemIntrin pass to do the expansion for us. Note that neither of these methods can handle the expansion of scalable-vector memory ops, so those cases are left alone by this patch. Scalable loads and stores already go through expansion by default but hit an assertion, and scalable masked intrinsics will silently generate incorrect code. It may be prudent to return an error in both of these cases. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102493	2021-06-02 09:27:44 +01:00
Luke	c4c3869554	[RISCV] Enable interleaved vectorization for RVV Enable interleaved vectorization for RVV. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101469	2021-05-29 11:03:27 +08:00
Luke	1595994b28	[RISCV] Add legality check for vectorizing reduction Check if it is legal to vectorize reduction. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D99509	2021-05-20 17:45:46 +08:00
Fraser Cormack	6f17613bfb	[RISCV][VP] Lower VP ISD nodes to RVV instructions This patch supports all of the current set of VP integer binary intrinsics by lowering them to RVV instructions. It does so by using the existing RISCVISD *_VL custom nodes as an intermediate layer. Both scalable and fixed-length vectors are supported by using this method. One notable change to the existing vector codegen strategy is that scalable all-ones and all-zeros mask SPLAT_VECTORs are now lowered to RISCVISD VMSET_VL and VMCLR_VL nodes to match their fixed-length BUILD_VECTOR counterparts. This allows them to reuse the existing "all-ones" VL patterns. To reduce the size of the phabricator diff, some tests are intentionally left out and will be added later if the patch is accepted. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101826	2021-05-05 12:32:24 +01:00
Sander de Smalen	f9a50f04ba	[TTI] NFC: Change getIntImmCost[Inst|Intrin] to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D100565	2021-04-23 16:06:36 +01:00
Sander de Smalen	fd1f8a5462	[TTI] NFC: Change getGatherScatterOpCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100200	2021-04-13 14:20:59 +01:00
Craig Topper	512bae81cc	[RISCV] Add basic cost modelling for fixed vector gather/scatter. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D99142	2021-03-24 11:14:14 -07:00
Craig Topper	f24f09d256	[RISCV] Add TTI support for cpop with Zbb This will tell loop idiom recognize that it can make popcount loops countable using the ctpop intrinsic. I didn't bother checking for illegal types. Type legalization knows how to split a ctpop into multiple ctpops added together. Assuming we only receive reasonable integer bit widths, a few cpop instructions added together is probably better than the loop. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D99203	2021-03-24 10:58:42 -07:00
Sander de Smalen	55d18b3cc2	[TTI] Return a TypeSize from getRegisterBitWidth. This patch changes the interface to take a RegisterKind, to indicate whether the register bitwidth of a scalar register, fixed-width vector register, or scalable vector register must be returned. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D98874	2021-03-24 14:45:13 +00:00
Craig Topper	294efcd6f7	[RISCV] Add support for fixed vector masked gather/scatter. I've split the gather/scatter custom handler to avoid complicating it with even more differences between gather/scatter. Tests are the scalable vector tests with the vscale removed and dropped the tests that used vector.insert. We're probably not as thorough on the splitting cases since we use 128 for VLEN here but scalable vector use a known min size of 64. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98991	2021-03-22 10:17:30 -07:00
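Commit 03fee6712a describes driving the vector loop's branch from the active lane mask instead of comparing the canonical induction variable with the trip count. The C sketch below models that control flow in scalar code. It is an illustration under stated assumptions, not LLVM's implementation: the fixed width `VF`, the helper names `active_lane_mask` and `masked_increment`, and the per-lane `for` loop standing in for a masked vector operation are all hypothetical. As in the commit message, the mask for the next iteration is computed at the bottom of the loop and only its first bit decides whether to branch back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed vectorization factor, chosen only for illustration. */
#define VF 4

/* Scalar model of get.active.lane.mask(base, n): lane i is active iff
   base + i < n. The assert documents this sketch's no-overflow assumption. */
static void active_lane_mask(size_t base, size_t n, bool mask[VF]) {
    assert(base <= SIZE_MAX - VF);
    for (size_t i = 0; i < VF; ++i)
        mask[i] = base + i < n;
}

/* Hypothetical example loop: adds 1 to each of the n elements of a and
   returns the number of vector-loop iterations executed. */
static size_t masked_increment(int *a, size_t n) {
    bool mask[VF];
    size_t iv = 0;
    size_t iters = 0;

    /* Entry mask, computed before the loop (the vector pre-header). */
    active_lane_mask(0, n, mask);
    do {
        /* Predicated body: only active lanes touch memory. */
        for (size_t i = 0; i < VF; ++i)
            if (mask[i])
                a[iv + i] += 1;
        iv += VF;
        ++iters;
        /* Mask for the NEXT iteration; its first lane is set iff iv < n. */
        active_lane_mask(iv, n, mask);
    } while (mask[0]); /* branch on the first bit instead of iv < n */
    return iters;
}
```

Because the active lanes always form a contiguous prefix, the first lane is set exactly when some iteration remains, which is why a single extracted bit is enough for the conditional branch. With `n = 10` and `VF = 4`, the body runs three times and the last run has two lanes masked off.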


58 Commits