llvm-project

Commit Graph

Author	SHA1	Message	Date
Florian Hahn	68884dde70	[LV] Move LoopVersioning creation to LVP::execute. At the moment LoopVersioning is only created for inner-loop vectorization. This patch moves it to LVP::execute, which means it will also be added for epilogue vectorization. As a consequence, the proper noalias metadata is now also added to epilogue vector loops. LVer will be moved to VPTransformState as a follow-up. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D127966	2022-06-30 12:14:32 +01:00
Florian Hahn	24b5f8e0d0	[VPlan] Make sure optimizeInductions removes wide ind from scalar plan. In some cases, there may be widened users of inductions even though the plan includes the scalar VF. In those cases, make sure we still replace the VPWidenIntOrFpInductionRecipe with scalar steps, as otherwise we may try to execute a VPWidenIntOrFpInductionRecipe with a scalar VF. Alternatively the patch could also split the range if needed. This fixes a crash exposed by D123720. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D128755	2022-06-30 09:11:48 +01:00
Florian Hahn	476f9c909c	[LV] Add test case showing dead recipe blocking region merging.	2022-06-29 16:34:12 +01:00
Philip Reames	f239cddbac	[RISCV] Pin two tests to fixed length vectorization to preserve test intent	2022-06-28 13:53:31 -07:00
Philip Reames	20dd3297b1	[LV] Allow scalable vectorization with vscale = 1 This change is a bit subtle. If we have a type like <vscale x 1 x i64>, the vectorizer will currently reject vectorization. The reason is that a type like <1 x i64> is likely to get simply rescalarized, and the vectorizer doesn't want to be in the game of simple unrolling. (I've given the example in terms of 1 x types which use a single register, but the same issue exists for any N x types which use N registers. e.g. RISCV LMULs.) This change distinguishes scalable types from fixed types under the reasoning that converting to a scalable type isn't unrolling. Because the actual vscale isn't known until runtime, using a vscale type is potentially very profitable. This makes an important, but unchecked, assumption. Specifically, the scalable type is assumed to only be legal per the cost model if there's actually a scalable register class which is distinct from the scalar domain. This is, to my knowledge, true for all targets which return non-invalid costs for scalable vector ops today, but in theory, we could have a target decide to lower scalable to fixed length vector or even scalar registers. If that ever happens, we'd need to revisit this code. In practice, this patch unblocks scalable vectorization for ELEN types on RISCV. Let me sketch one alternate implementation I considered. We could have restricted this to when we know a minimum value for vscale. Specifically, for the default +v extension for RISCV, we actually know that vscale >= 2 for ELEN types. However, doing it this way means we can't generate scalable vectors when using the various embedded vector extensions which have a minimum vscale of 1. Differential Revision: https://reviews.llvm.org/D128542	2022-06-27 13:38:57 -07:00
Philip Reames	9803b0d1e7	[RISCV] Implement getVScaleForTuning and thus prefer scalable vectorization when enabled LoopVectorizer uses getVScaleForTuning for deciding how to discount the cost of a potential vector factor by the amount of work performed. Without the callback implemented, the vectorizer was defaulting to an estimated vscale of 1. This results in fixed vectorization looking falsely profitable (since it used the command line VLEN). The test change is pretty limited since a) we don't have much coverage of the vectorizer with scalable vectors at all, and b) what little coverage we have mostly uses i64 element types. There's a separate issue with <vscale x 1 x i64> which prevents us from getting to this stage of costing, and thus only the one test explicitly written to avoid that is visible in the diff. However, this is actually a very wide impact change, as it changes the practical vectorization result to scalable when both fixed and scalable vectorization are enabled. As an aside, I think the vectorizer is a little too strongly biased towards scalable when both are legal, but we can explore that separately. For now, let's just get the cost model working the way it was intended. Differential Revision: https://reviews.llvm.org/D128547	2022-06-25 11:25:23 -07:00
Philip Reames	ae8fac6f98	[LV][RISCV] Add coverage showing scalable codegen when etype != ELEN We currently have a costing bug around the etype == ELEN case, so add otherwise duplicate tests to show test diffs as I work on other parts of costing.	2022-06-24 11:38:54 -07:00
Philip Reames	056d63938a	[RISCV] Split a vectorizer test runline so that upcoming changes in defaults are visible	2022-06-24 08:48:11 -07:00
Philip Reames	adbe718675	[RISCV] Modify a test line so it exercises the intended configuration once we turn on scalable vectorization	2022-06-24 08:48:11 -07:00
Philip Reames	46ea4b5ea1	[LV] Avoid a crash when costing a uniform store which doesn't correspond to a legal scatter If we have an unaligned uniform store, then when costing a scalable VF we can't emit code to scalarize it. (Well, we could, but we haven't implemented that case.) This change replaces an assert with a cost-model bailout such that we reject vectorization with the scalable VF instead of crashing.	2022-06-23 12:41:09 -07:00
Florian Hahn	569d84fe99	[VPlan] Remove dead recipes across whole plan. This extends removeDeadRecipe to remove recipes across the whole plan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D127580	2022-06-23 13:36:02 +02:00
Serguei Katkov	8f891b7c39	[LoopVectorize] Uninitialized phi node leads to a crash in SSAUpdater. createInductionResumeValues creates a placeholder phi node without filling in its incoming values, and only then generates the incoming values. Generating them can trigger the SCEV expander, which may invoke SSAUpdater. SSAUpdater has an optimization that determines the number of predecessors from the incoming values of an existing phi node. If the phi node has not been filled with incoming values, the number of predecessors is computed as 0, which leads to a segmentation fault. In other words, SSAUpdater expects the phi to be well-formed, while LoopVectorizer breaks this requirement. The fix is to prepare all incoming values first and only then build the phi node. Reviewed By: fhahn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D128033	2022-06-22 10:49:27 +07:00
Philip Reames	8ae0664282	[LoopVect, tests] Add some basic coverage for scalable costing of scatter/gather patterns on RISCV This just adds some very basic vectorizer testing with both fixed and scalable vectorization enabled.	2022-06-21 13:54:53 -07:00
Philip Reames	2cf320d41e	[LoopVect, tests] Add some basic coverage for scalable costing on RISCV This just adds some very basic vectorizer testing with both fixed and scalable vectorization enabled. For context, I just yesterday fixed a crash in costing of the splat_ptr example - see bbf3fd.	2022-06-21 13:35:38 -07:00
Florian Hahn	88ce403c6a	[LV] Add new block to place recurrence splice, if needed. In some cases, a recurrence splice instruction needs to be inserted between two regions, for example if the regions get re-arranged during sinking. Fixes #56146.	2022-06-21 21:54:37 +02:00
Florian Hahn	e9cced2739	Recommit "[LAA] Initial support for runtime checks with pointer selects." This reverts commit `7aa8a67882`. This version includes fixes to address issues uncovered after the commit landed and discussed at D11448. Those include: * Limit select-traversal to selects inside the loop. * Freeze pointers resulting from looking through selects to avoid branch-on-poison.	2022-06-17 21:06:26 +02:00
Malhar Jajoo	6bb40552f2	[LoopVectorize] Add support for invariant stores of ordered reductions Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D126772	2022-06-17 14:56:21 +01:00
Tiehu Zhang	b329156f4f	[AArch64][LV] AArch64 does not prefer vectorized addressing TTI::prefersVectorizedAddressing() controls whether the vectorizer tries to vectorize the addresses that lead to loads. For AArch64, only gather/scatter (supported by SVE) can deal with vectors of addresses. This patch specializes the hook for AArch64 to return true only when SVE is enabled. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D124612	2022-06-17 18:32:50 +08:00
Florian Hahn	467491202e	[LV] Update test to use GEP so it is not dead. The test should use the GEP for the store, so it is not dead.	2022-06-12 16:57:47 +01:00
Philip Reames	f7bb691d61	[RISCV] Implement isElementTypeLegalForScalableVector TTI hook This brings us into alignment with AArch64, and in the process fixes a compiler crash bug in uniform store handling in the vectorizer. Before the recent invalid cost bailout work, this would have also avoided crashes on invalid costs in some cases. I honestly think the vectorizer should gracefully bail out on uniform stores it can't use a scatter for, but it doesn't, so let's take the path of least resistance here. It's also possible that there are other vectorizer bugs AArch64 isn't seeing because of this hook; we don't want to be finding them either. Differential Revision: https://reviews.llvm.org/D127514	2022-06-10 13:20:58 -07:00
Philip Reames	0e29a80fdc	[RISCV] Add cost model for reverse shuffle The majority of the cost appears to be forming the indices vector. Differential Revision: https://reviews.llvm.org/D127141	2022-06-09 07:21:40 -07:00
Florian Hahn	20d798bd47	Recommit "[SCEV] Look through single value PHIs." (take 3) This reverts commit `1fbdbb5595`. All known issues surfaced by this patch should have been fixed now. The fixes included fixing issues with SCEV expansion in LV and DA's reliance on LCSSA phis.	2022-06-09 15:20:10 +01:00
Florian Hahn	85983ca42e	[VPlan] Replace remaining use of needsScalarIV. All information is already available in VPlan. Note that there are some test changes, because we now can correctly look through instructions like truncates to analyze the actual users. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D123541	2022-06-09 12:05:37 +01:00
Florian Hahn	3d663308a5	[LV] Add test that caused revert of D123720.	2022-06-08 12:25:17 +01:00
David Sherwood	997ecb0036	[LoopVectorize] Add FastMathFlags to the select used for reductions with tail-folding Based on reviewer comments on https://reviews.llvm.org/D126692 I've added FastMathFlags to the select instruction used when tail-folding with reductions. These flags can then be used by InstCombine to decide upon the optimal floating point identity value for fadd/fsub. Doing so unlocks further optimisations, such as folding selects into masked loads. Differential Revision: https://reviews.llvm.org/D126778	2022-06-07 10:21:31 +01:00
Philip Reames	6071de3db6	[RISCV] Autogen a test for ease of update	2022-06-06 12:44:34 -07:00
Florian Hahn	eaf48dd9b0	[VPlan] Replace BranchOnCount with BranchOnCond if TC <= UF * VF. Try to simplify BranchOnCount to `BranchOnCond true` if TC <= UF * VF. This is an alternative to D121899 which simplifies the VPlan directly instead of doing so late in code-gen. The potential benefit of doing this in VPlan is that this may help cost-modeling in the future. The reason this is done in prepareToExecute at the moment is that a single plan may be used for multiple VFs/UFs. There are further simplifications that can be applied as follow ups: 1. Replace inductions with constants 2. Replace vector region with regular block. Fixes #55354. Depends on D126679. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D126680	2022-06-06 09:38:53 +01:00
yanming	8d9d8f866a	[RISCV] Define RISC-V's own register class to model FP registers. The default RegisterClass is not enough to model RISC-V registers. We define RISC-V's own register class to model FP registers. This helps to better estimate register pressure in the loop vectorizer. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D126854	2022-06-06 14:43:52 +08:00
Florian Hahn	a5bb4a3b4d	[VPlan] Replace CondBit with BranchOnCond VPInstruction. This patch removes CondBit and Predicate from VPBasicBlock. To do so, the patch introduces a new branch-on-cond VPInstruction opcode to model a branch on a condition explicitly. This addresses a long-standing TODO/FIXME that blocks shouldn't be users of VPValues. Those extra users can cause issues for VPValue-based analyses that don't expect blocks. Addressing this fixme should allow us to re-introduce `266ea446ab`. The generic branch opcode can also be used in follow-up patches. Depends on D123005. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D126618	2022-06-03 11:48:31 +01:00
Florian Hahn	72aca94b90	[LV] Add additional tests for pointer select support. Additional test cases for D114487.	2022-06-01 21:19:03 +01:00
Florian Hahn	05776122b6	[VPlan] Use region for each loop in native path. This patch updates the VPlan native path to use VPRegionBlocks for all loops in a loop nest. Up to now, only the outermost loop used a region. This is a step towards unifying both paths and keeping things consistent between them. It also prepares various code-gen parts for modeling the pre-header in the inner loop vectorizer (D121624). Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D123005	2022-06-01 10:41:05 +01:00
Nikita Popov	03aceab08b	[ValueTracking] Enable -branch-on-poison-as-ub by default Now that SimpleLoopUnswitch and other transforms no longer introduce branch on poison, enable the -branch-on-poison-as-ub option by default. The practical impact of this is mostly better flag preservation in SCEV, and some freeze instructions no longer being necessary. Differential Revision: https://reviews.llvm.org/D125299	2022-06-01 10:46:06 +02:00
Nikita Popov	36cbdaa163	[InstCombine] Fix inbounds preservation when swapping GEPs (PR44206) When reassociating GEPs, we can only keep inbounds if both original GEPs were inbounds, and their offsets have the same sign. For the sake of simplicity, I only handle the case where both offsets are non-negative here. It would probably be fine to just not preserve inbounds at all here, but as I don't see a compile-time impact for adding the isKnownNonNegative() calls I went with this more conservative approach. Fixes https://github.com/llvm/llvm-project/issues/44206. Differential Revision: https://reviews.llvm.org/D126687	2022-05-31 15:45:02 +02:00
Florian Hahn	b7d2b160c3	[VPlan] Add test for printing VPlan for outer loop vectorization. Test coverage for D123005.	2022-05-30 18:19:52 +01:00
Nikita Popov	a770f534e6	[InstCombine] When swapping GEPs, only keep inbounds if both are If only one of the GEPs is inbounds, then after swapping, there is no guarantee that one of them will be inbounds as well (see e.g. https://alive2.llvm.org/ce/z/agaCnp). This is only a partial fix, because even if both are inbounds, the result is not necessarily inbounds (if the offsets have different signs).	2022-05-30 17:04:42 +02:00
Liqin.Weng	a84026821b	[RISCV] Add test for experimental.vector.reverse ``` void vector_reverse_i64(int *A, int *B, int n) { #pragma clang loop vectorize_width(4, scalable) for (int i = n-1; i >= 0; i--) A[i] = B[i] + 1; } ``` When scalable vectorization is enabled (via the scalable-vectorization option or #pragma clang loop vectorize_width(elements, scalable)), loops with reverse iterators cannot be vectorized as <vscale x elements x elementType>. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125866	2022-05-27 06:30:07 +00:00
Ivan Kosarev	ad1d60c3be	[FileCheck] Catch misspelled directives. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D125604	2022-05-26 11:37:19 +01:00
David Green	75631438e3	[AArch64] Costmodel tests for llvm.vscale intrinsics. NFC These show that the cost of a @llvm.vscale is indeed 1, not 10.	2022-05-26 10:16:21 +01:00
David Sherwood	87936c7b13	[LoopVectorize] Fix assertion failure in fixReduction when tail-folding When compiling the attached new test in scalable-reductions-tf.ll we were hitting this assertion in fixReduction: Assertion `isa<PHINode>(U) && "Reduction exit must feed Phi's or select"` The loop contains a reduction and an intermediate store of the reduction value. When vectorising with tail-folding, the value 'U' in the assertion above happened to be a scatter_store. It turns out that we were still creating a widen recipe for the invariant store, despite knowing that we can actually sink it. The simplest fix is to change buildVPlanWithVPRecipes so that we look for invariant stores before attempting to widen them. Differential Revision: https://reviews.llvm.org/D126295	2022-05-25 11:46:32 +01:00
Jingu Kang	bb82f74612	Revert "Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"" This reverts commit `42ebfa8269`. The commit from https://reviews.llvm.org/D125918 has fixed the stage 2 build failure. Differential Revision: https://reviews.llvm.org/D118979	2022-05-23 16:15:45 +01:00
Peter Waller	ade47bdc31	[LV] Improve register pressure estimate at high VFs Previously, `getRegUsageForType` was implemented using `getTypeLegalizationCost`. `getRegUsageForType` is used by the loop vectorizer to estimate the register pressure caused by using a vector type. However, `getTypeLegalizationCost` currently only appears to understand splitting and not scalarization, so it significantly underestimates the register requirements. Instead, use `getNumRegisters`, which understands when scalarization can occur (via computeRegisterProperties). This was discovered while investigating D118979 (Set maximum VF with shouldMaximizeVectorBandwidth), where under fixed-length 512-bit SVE the loop vectorizer previously ended up costing a v128i1 as 2 v64i* registers where it actually occupies 128 i32 registers. I'm sending this patch early for comment; I'm still doing some sanity checking with LNT. I note that getRegisterClassForType appears to return VectorRC even though the types in question (large vNi1 types) end up occupying scalar registers. That might be worth fixing too. Differential Revision: https://reviews.llvm.org/D125918	2022-05-23 07:57:45 +00:00
Florian Hahn	419e49621f	[LV] Add check line to test interleaving only with induction cast. Also simplify the value names a bit in the test.	2022-05-22 20:11:47 +01:00
Florian Hahn	145fe57106	[LV] Use exiting block instead of latch in addUsersInExitBlock. The latch may not be the exiting block. Use the exiting block instead when looking up the incoming value of the LCSSA phi node. This fixes a crash with early-exit loops.	2022-05-22 18:27:41 +01:00
Florian Hahn	c230ab6db8	[LV] Re-generate check lines for loop-form.ll test.	2022-05-22 18:20:33 +01:00
Florian Hahn	97590baead	[LV] Widen ptr-inductions with scalar uses for scalable VFs. Current codegen only supports scalarization of pointer inductions for scalable VFs if they are uniform. After `3bebec659` we now may enter the scalarization code path in VPWidenPointerInductionRecipe::execute for scalable vectors. Fall back to widening for scalable vectors if necessary. This should fix a build failure when bootstrapping LLVM with SVE, e.g. https://lab.llvm.org/buildbot/#/builders/176/builds/1723	2022-05-22 16:24:13 +01:00
Florian Hahn	3bebec6592	[VPlan] Model first exit values using VPLiveOut. This patch introduces a new VPLiveOut subclass of VPUser to model exit values explicitly. The initial version handles exit values that are neither part of induction or reduction chains nor first order recurrence phis. Fixes #51366, #54867, #55167, #55459 Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D123537	2022-05-21 16:01:38 +01:00
Florian Hahn	a84896f270	[LV] Precommit test for PR55167. Test for #55167.	2022-05-21 16:01:33 +01:00
Florian Hahn	cd61d4bd2f	[LV] Do not LoopSimplify/LCSSA after generating main vector loop. At the moment LV runs LoopSimplify and reconstructs LCSSA form after generating the main vector loop and before generating the epilogue vector loop. In practice, this adds a new exit block for the scalar loop because the middle block now also branches to the original exit block of the scalar loop. It also requires adding a new LCSSA phi in the newly created exit block. This complicates things when modeling exit values in VPlan, because we would need to update the VPlan for the epilogue loop to update the newly created LCSSA phi node. But none of that should be necessary, as all analysis requiring loop-simplify form is already done at this point and LCSSA form of the original loop is not broken. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D125810	2022-05-20 09:58:40 +01:00
Florian Hahn	c90235f0ef	[LV] Drop wrap flags for reductions using VP def-use chain. Update clearReductionWrapFlags to use the VPlan def-use chain from the reduction phi recipe to drop reduction wrap flags. This addresses an existing FIXME and fixes a crash when instructions in the reduction chain are not used and have been removed before VPlan code generation. Fixes #55540.	2022-05-19 20:36:46 +01:00
Tiehu Zhang	3ed9f603fd	[LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold The runtime check threshold should also restrict interleave count. Otherwise, too many runtime checks will be generated for some cases. Reviewed By: fhahn, dmgreen Differential Revision: https://reviews.llvm.org/D122126	2022-05-19 23:29:00 +08:00
Tiehu Zhang	94a2bd5a27	[LoopVectorize] Precommit a test for D122126	2022-05-19 23:28:39 +08:00
lizhijin	90ea81fcb2	[LV] Widen freeze instead of scalarizing it This patch changes the strategy for vectorizing freeze instructions from replicating them multiple times to widening them according to the selected VF. Fixes #54992 Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D125016	2022-05-19 12:28:01 +08:00
Florian Hahn	d92cec4c96	[LV] Regenerate check lines for some tests. Make sure the auto-generated check lines are up-to-date for some files, to reduce the test diff in upcoming changes	2022-05-17 17:45:01 +01:00
Nikita Popov	356d47ccb9	[ValueTracking] Handle and/or on RHS of isImpliedCondition() isImpliedCondition() currently handles and/or on the LHS, but not on the RHS, resulting in asymmetric behavior. This patch adds two new implication rules: * LHS ==> (RHS1 || RHS2) if LHS ==> RHS1 or LHS ==> RHS2 * LHS ==> !(RHS1 && RHS2) if LHS ==> !RHS1 or LHS ==> !RHS2 Differential Revision: https://reviews.llvm.org/D125551	2022-05-16 16:30:26 +02:00
Florian Hahn	b7315ffc3c	[LAA,LV] Add initial support for pointer-diff memory checks. This patch adds initial support for a pointer diff based runtime check scheme for vectorization. This scheme requires fewer computations and checks than the existing full overlap checking, if it is applicable. The main idea is to only check if source and sink of a dependency are far enough apart so the accesses won't overlap in the vector loop. To do so, it is sufficient to compute the difference and compare it to `VF * UF * AccessSize`. It is sufficient to check `(Sink - Src) <u VF * UF * AccessSize` to rule out a backwards dependence in the vector loop with the given VF and UF. If Src >=u Sink, there is no dependence preventing vectorization, hence the overflow should not matter and using the ULT should be sufficient. Note that the initial version is restricted in multiple ways: 1. Pointers must only either be read or written, by a single instruction (this allows re-constructing source/sink for dependences with the available information) 2. Source and sink pointers must be add-recs, with matching steps 3. The step must be a constant. 4. abs(step) == AccessSize. Most of those restrictions can be relaxed in the future. See https://github.com/llvm/llvm-project/issues/53590. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D119078	2022-05-16 15:27:22 +01:00
David Sherwood	befc952045	[LoopVectorize] Permit tail-folding for low trip counts using scalable vectors When the loop vectoriser encounters a known low trip count it tries to create a single predicated loop in order to get the benefit of vectorisation and eliminate the scalar tail. However, until now the vectoriser prevented the use of scalable vectors in this case due to concerns in the past about stability. I believe that tail-folded loops using scalable vectors are now sufficiently well tested that we can enable this. For the same reason I've also enabled it when optimising for code size too. Tests added here: Transforms/LoopVectorize/AArch64/sve-low-trip-count.ll Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll Transforms/LoopVectorize/RISCV/low-trip-count.ll Differential Revision: https://reviews.llvm.org/D121595	2022-05-16 09:14:24 +01:00
Florian Hahn	8b7c3d2179	[LV] Set SCEVCheckCond to nullptr whenever it was used. Under some circumstances, SCEVExpander will insert new instructions when expanding a predicate, but the final result of the expansion can be a false constant. In those cases, the expanded instructions may later be used by other expansions, e.g. the trip count. This may trigger an assertion during SCEVExpander cleanup. To avoid this, always mark the result as used. Fixes #55100.	2022-05-15 21:52:07 +01:00
Florian Hahn	39552964e1	[VPlan] Improve printing of VPReplicateRecipe with calls. Suggested as part of D124718.	2022-05-15 15:51:26 +01:00
Nikita Popov	0c00dbb975	[LoopVectorize] Regenerate test checks (NFC)	2022-05-13 16:41:48 +02:00
David Sherwood	92c645b5c1	[LoopVectorize] Add overflow checks when tail-folding with scalable vectors In InnerLoopVectorizer::getOrCreateVectorTripCount there is an assert that the known minimum value for the VF is a power of 2 when tail-folding is enabled. However, for scalable vectors the value of vscale may not be a power of 2, which means we have to worry about the possibility of overflow. I have solved this problem by adding preheader checks that prevent us from entering the vector body if the canonical IV would overflow, i.e. if ((IntMax - TripCount) < (VF * UF)) ... skip vector loop ... Differential Revision: https://reviews.llvm.org/D125235	2022-05-13 14:09:43 +01:00
Florian Hahn	38189438b6	[LV] Add crashing test from #55096 .	2022-05-12 22:40:28 +01:00
Florian Hahn	635b752211	[VPlan] VPInterleaveRecipe only uses first lane if op not stored. With opaque pointers, both the stored value and the address can be the same. Only treat the recipe as using just the first lane if the address is not also stored. Fixes #55375.	2022-05-11 11:24:56 +01:00
Florian Hahn	e79c1962b9	[LV] Add opaque pointer test for #55375 .	2022-05-11 11:24:52 +01:00
Nikita Popov	ff20ee32d8	[LoopVectorize] Remove incorrect nuw flag from test (NFC) nuw does not make sense for reverse iteration.	2022-05-10 12:17:09 +02:00
David Sherwood	45f2e92d97	[NFC][LoopVectorize] Add SVE test for tail-folding combined with interleaving Differential Revision: https://reviews.llvm.org/D125001	2022-05-09 13:08:25 +01:00
Simon Pilgrim	cbfa857346	[CostModel][X86] Adjust 128-bit select costs to account for slow BLENDV op Based off the script from D103695 - Jaguar, Bulldozer, Silvermont (et al) and Haswell all have slow BLENDV ops, so adjust the worst-case cost values	2022-05-06 13:07:34 +01:00
Florian Hahn	ff8d0b338f	[VPlan] Add test for printing plan with an exit value. Test for printing plan with additions from D123537.	2022-05-04 17:19:02 +01:00
Igor Kirillov	4e5e042d9a	[LoopVectorize] Support reductions that store intermediary result Adds ability to vectorize loops containing a store to a loop-invariant address as part of a reduction that isn't converted to SSA form due to lack of aliasing info. Runtime checks are generated to ensure the store does not alias any other accesses in the loop. Ordered fadd reductions are not yet supported. Differential Revision: https://reviews.llvm.org/D110235	2022-05-03 10:12:30 +01:00
David Green	6f81903e89	[LV][SLP] Mark fptosi_sat as vectorizable This adds fptosi_sat and fptoui_sat to the list of trivially vectorizable functions, mainly so that the loop vectorizer can vectorize the instruction. Marking them as trivially vectorizable also allows them to be SLP vectorized and scalarized. The signature of an fptosi_sat requires two type overrides (@llvm.fptosi.sat.v2i32.v2f32), unlike other intrinsics that often only take a single one. This patch alters hasVectorInstrinsicOverloadedScalarOpd to isVectorIntrinsicWithOverloadTypeAtArg, so that it can mark the first operand of the intrinsic as an overloaded (but not scalar) operand. Differential Revision: https://reviews.llvm.org/D124358	2022-05-03 09:32:34 +01:00
Florian Hahn	0ef8ca6d88	[VPlan] Do not create VPWidenCall recipes for scalar vector factors. 'Widen' recipes are only used when actual vector values are generated. Fix tryToWidenCall to not create VPWidenCallRecipes for scalar vector factors. This was exposed by D123720, because the widened recipes are considered vector users. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D124718	2022-05-02 19:40:33 +01:00
David Green	c7d39fd61a	[LV][SLP] Add tests for vectorizing fptoi_sat intrinsics. NFC	2022-05-02 15:11:44 +01:00
Simon Pilgrim	cff0afc184	[LoopVectorize][X86] Regenerate invariant-store-vectorization.ll	2022-05-01 13:04:24 +01:00
Simon Pilgrim	c2964746e3	[CostModel][X86] Reduce cost of vector selects on SSE2/AVX1 targets Based off the script from D103695, we were exaggerating the cost of the OR(AND(X,M),AND(Y,~M)) expansion using instruction count instead of effective throughput	2022-05-01 09:32:14 +01:00
Florian Hahn	841fffa745	[LV] Add test for interleaving multiple iterations with call.	2022-04-30 20:43:22 +01:00
Bjorn Pettersson	2e14900db9	[test][NewPM] Use -passes=loop-vectorize instead of -loop-vectorize Update a bunch of loop-vectorize regression tests to use the new PM syntax (opt -passes=loop-vectorize) instead of the deprecated legacy PM syntax (opt -loop-vectorize).	2022-04-28 16:46:00 +02:00
Florian Hahn	bea69b232f	[VPlan] Initial modeling of middle block in VPlan. This patch extends the scope of VPlan to also include the exit (aka middle) block. For now, the exit block remains empty, but handling of exit values will subsequently be moved to VPlan, by adding recipes to model exit values in the exit block. As a first step, this will allow fixing #51366. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D123457	2022-04-20 19:34:41 +01:00
Florian Hahn	a65f2730d2	[VPlan] Expand induction step in VPlan pre-header. This patch moves SCEV expansion of steps used by VPWidenIntOrFpInductionRecipes to the pre-header using VPExpandSCEVRecipe. This ensures that those steps are expanded while the CFG is in a valid state. Previously, SCEV expansion may happen during vector body code-generation, during which the CFG may be invalid, causing issues with SCEV expansion. Depends on D122095. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D122096	2022-04-19 13:06:39 +02:00
Craig Topper	ac8c720d48	[IR] Allow constant folding (insertelement <vscale x 2 x i32> zeroinitializer, i32 0, i32 0). Most of insertelement constant folding is blocked if the vector type is scalable. I believe we can make an exception for inserting null into an all zeros vector. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123413	2022-04-15 17:44:32 -07:00
Florian Hahn	73f5d7d0d6	[VPlan] Handle equal address and store ops in onlyFirstLaneDemanded. With opaque pointers, the stored value and address can be the same. Previously the code in VPWidenMemoryInstructionRecipe::onlyFirstLaneDemanded incorrectly considered stores with matching store and pointer operands as only demanding the first lane, causing a crash.	2022-04-15 22:53:33 +02:00
Muhammad Omair Javaid	42ebfa8269	Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth" This reverts commit `64b6192e81`. This broke LLVM AArch64 buildbot clang-aarch64-sve-vls-2stage: https://lab.llvm.org/buildbot/#/builders/176/builds/1515 llvm-tblgen crashes after applying this patch.	2022-04-13 04:53:07 +05:00
Simon Pilgrim	431e93f4f5	[InstCombine] Fold sub(add(x,y),min/max(x,y)) -> max/min(x,y) (PR38280). As discussed on Issue #37628, we can flip a min/max node if we're subtracting it from the sum of the node's operands. Alive2: https://alive2.llvm.org/ce/z/W_KXfy Differential Revision: https://reviews.llvm.org/D123399	2022-04-11 11:32:56 +01:00
Florian Hahn	5f1eb74850	[VPlan] Place VPExpandSCEVRecipe in pre-header. After D121624 models the pre-header in VPlan, VPExpandSCEVRecipes can be placed there. This ensures SCEV expansion happens before the CFG is modified during VPlan execution, while the CFG is incomplete. Depends on D121624. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D122095	2022-04-10 10:26:20 +02:00
Florian Hahn	256c6b0ba1	[VPlan] Model pre-header explicitly. This patch extends the scope of VPlan to also model the pre-header. The pre-header can be used to place recipes that should be code-gen'd outside the loop, like SCEV expansion. Depends on D121623. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D121624	2022-04-09 14:19:47 +02:00
Simon Pilgrim	450f0d76b4	[LoopVectorize] Regenerate first-order-recurrence.ll	2022-04-09 10:33:03 +01:00
Stanislav Mekhanoshin	fced87d457	[AMDGPU] Fix regression with vectorization limiting. D67148 removed TTI::getNumberOfRegisters(bool Vector) and started calling TTI::getNumberOfRegisters(unsigned ClassID) from LoopVectorize. This resulted in unrestricted vectorization on AMDGPU, blowing up register pressure. Differential Revision: https://reviews.llvm.org/D122850	2022-04-08 17:46:49 -07:00
Florian Hahn	467dbcd9f1	[LV] Set debug loc after setting insert point. This fixes the code to actually use the location of the instruction, if available. Previously, SetInsertPoint would overwrite the insert point set from the instruction.	2022-04-08 20:34:40 +02:00
Florian Hahn	4c0d5db9c9	[LV] Add test case for wrong debug location with replicate recipe.	2022-04-08 20:34:16 +02:00
Florian Hahn	29fe998eaa	[VPlan] Preserve debug location when creating branch. Update createEmptyBasicBlock to preserve the debug location of the previous terminator.	2022-04-08 17:22:53 +02:00
Florian Hahn	547567fe2b	[LV] Add test for missing debug info on branch in vector loop. Adds a test case where currently no debug location is added to branches in the vector body.	2022-04-08 17:22:53 +02:00
Florian Hahn	631016a853	[LV] Add test case for PR54427. Reduced test for #54427.	2022-04-07 23:21:21 +02:00
Jingu Kang	64b6192e81	[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth. Set the maximum VF on AArch64 to 128 / the size of the smallest type in the loop. Differential Revision: https://reviews.llvm.org/D118979	2022-04-05 13:16:52 +01:00
Florian Hahn	1ff022e21b	[LV] Add vector.body block to parent loop during skeleton creation. When creating induction resume values, SCEV queries may rely on LoopInfo. Make sure vector.body gets added to the loop of the pre-header during skeleton construction. %vector.body will be moved to the vector preheader during VPlan execution. Fixes #54745.	2022-04-05 11:54:17 +01:00
Florian Hahn	368d35a894	[LV] Add additional tests for pointer difference memory checks. Additional tests for D119078.	2022-04-04 17:58:48 +01:00
Philip Reames	88de27e3fd	[LV] Handle non-integral types when considering interleave widening legality. In general, anywhere we might need to insert a blind bitcast, we need to make sure the types are losslessly convertible. This fixes pr54634.	2022-04-03 20:16:20 -07:00
Dávid Bolvanský	872f7000fc	Revert "[NFCI] Regenerate SROA/LoopVectorize test checks" This reverts commit `14e3450fb5`.	2022-04-04 01:15:30 +02:00
Dávid Bolvanský	a113a582b1	[NFCI] Regenerate LoopVectorize test checks	2022-04-03 21:56:24 +02:00
Florian Hahn	95b2aa511e	[VPlan] Set VPlan header block name to vector.body. This brings the VPlan block naming in line with the naming of the generated basic blocks.	2022-04-02 19:34:32 +01:00
Florian Hahn	a08c90a402	[LV] Re-use TripCount from EPI.TripCount. During skeleton construction for the epilogue vector loop, generic helpers use getOrCreateTripCount, which will re-expand the trip count computation. Instead, re-use the TripCount created during main loop vectorization.	2022-04-01 13:47:34 +01:00
David Green	b65267ca7b	[LV] Invalidate widening decisions after maximizing vector bandwidth. When MaximizeVectorBandwidth is enabled, we can end up (via calls to collectUniformsAndScalars/setCostBasedWideningDecision through calculateRegisterUsage) making widening decisions before we have decided whether to fold the tail by masking. These decisions will be wrong if we later decide to fold the tail, for example when the trip count is very low. They use incorrect costs for loads that should get masked, using standard memory operation costs instead. This still uses the EmulatedMaskMemRefHack costs at the moment (a bit unfortunately), but without this change the old costs were 1, leading to overly optimistic vectorization. This slightly changes the way the MaximizeVectorBandwidth option works to make it easier to test, always honouring the option if it is set. Differential Revision: https://reviews.llvm.org/D120215	2022-03-31 09:19:31 +01:00
Florian Hahn	ecb4171dcb	[LV] Handle zero cost loops in selectInterleaveCount. In some cases, like in the added test case, we can reach selectInterleaveCount with loops that actually have a cost of 0. Unfortunately, a loop cost of 0 is also used to communicate that the cost has not been computed yet. To resolve the crash, bail out if the cost remains zero after computing it. This seems like the best option, as there are multiple code paths that return a cost of 0 to force a computation in selectInterleaveCount. Computing the cost up front at multiple places would unnecessarily complicate the logic. Fixes #54413.	2022-03-29 22:52:43 +01:00

1772 Commits