llvm-project

Commit Graph

Author	SHA1	Message	Date
Florian Hahn	2c692d891e	[LV] Update handling of scalable pointer inductions after b73d2c8. The dependent code has been changed quite a lot since `151c144` which b73d2c8 effectively reverts. Now we run into a case where lowering didn't expect/support the behavior pre `151c144` any longer. Update the code dealing with scalable pointer inductions to also check for uniformity in combination with isScalarAfterVectorization. This should ensure scalable pointer inductions are handled properly during epilogue vectorization. Fixes #57912.	2022-09-23 18:23:02 +01:00
Florian Hahn	17167005d5	[LV] Add test for #57912. Add test showing miscompilation during epilogue vectorization with SVE.	2022-09-23 11:49:55 +01:00
Florian Hahn	05b3493819	[LV] Convert sve-epilog-vect.ll to use opaque pointers.	2022-09-23 10:24:19 +01:00
Philip Reames	32dc1151e2	[VPlan] Only generate single instr for unpredicated stores of varying value to invariant address This extends the previously added uniform store case to handle stores of loop varying values to a loop invariant address. Note that the placement of this code only allows unpredicated stores; this is important for correctness. (That is "IsPredicated" is always false at this point in the function.) This patch does not include scalable types. The diff felt "large enough" as it were; I'll handle that in a separate patch. (It requires some changes to cost modeling.) Differential Revision: https://reviews.llvm.org/D133580	2022-09-22 08:53:46 -07:00
Simon Pilgrim	e030be64d8	[CostModel][X86] Add partial CostKinds handling for funnelshifts/rotates This mainly just adds costs for the targets where we have actual funnelshift/rotate instructions (VBMI2/XOP etc.) - the cases where we expand still need addressing, although for many the default shift+or expansion, especially for uniform cases, isn't that bad. This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-22 11:24:11 +01:00
Simon Pilgrim	b2cd8118d0	[CostModel][X86] Add CostKinds handling for smax/smin/umax/umin instructions This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-22 10:19:23 +01:00
Philip Reames	8c46881a53	[TTI] Recognize fp constants in getOperandInfo We were recognizing vectors of floats, but not scalars. That's a tad odd.	2022-09-21 14:34:34 -07:00
Graham Hunter	7b420a4a8b	[NFC][LV] Scalarizing test for masked vector calls	2022-09-21 15:43:25 +01:00
Simon Pilgrim	71162ad957	[LoopVectorize] Fix test name - the test is for fshl not cttz intrinsic costs	2022-09-21 15:24:43 +01:00
Sanjay Patel	0f32a5dea0	[InstCombine] don't canonicalize shl+sub to mul+add This stops Negator from transforming: `C1 - shl X, C2 --> mul X, (1<<C2) + C1` ...in the general case. There does not seem to be any analysis benefit to using mul in IR, and there's definitely downside in codegen (particularly when the multiply has to be expanded). If `C1` is 0, then there's a stronger argument that the single mul is a better canonicalization than negate-of-shl, but we may want to remove that too. This was noted as a potential conflict for D133667. Differential Revision: https://reviews.llvm.org/D134310	2022-09-21 08:39:07 -04:00
Simon Pilgrim	09cb9fdef9	[InstCombine] Fold ult(add(x,-1),c) -> ule(x,c) iff x != 0 (PR57635) Alive2: https://alive2.llvm.org/ce/z/sZ6wwS As detailed on Issue #57635 and #37628 - for unsigned comparisons, we can compare prior to a decrement iff the value is known never to be zero. Differential Revision: https://reviews.llvm.org/D134172	2022-09-20 16:44:41 +01:00
Vitaly Buka	bbef90ace4	[IRBuilder] Use PoisonValue in CreateMasked* Followup to `72b776168c` Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D133967	2022-09-19 11:01:41 -07:00
Florian Hahn	582f8ef19f	[LV] Keep track of cost-based ScalarAfterVec in VPWidenPointerInd. Epilogue vectorization uses isScalarAfterVectorization to check if widened versions for inductions need to be generated and bails out in those cases. At the moment, there are scenarios where isScalarAfterVectorization returns true but VPWidenPointerInduction::onlyScalarsGenerated would return false, causing widening. This can lead to widened phis with incorrect start values being created in the epilogue vector body. This patch addresses the issue by storing the cost-model decision in VPWidenPointerInductionRecipe and restoring the behavior before `151c144`. This effectively reverts `151c144`, but the long-term fix is to properly support widened inductions during epilogue vectorization. Fixes #57712.	2022-09-19 18:14:35 +01:00
Sebastian Peryt	99c9b37d11	[NFC][1/n] Remove -enable-new-pm=0 flags from lit tests This is the first patch in a series intended to remove the -enable-new-pm=0 flag from lit tests. This is part of a bigger effort to completely remove legacy code related to the legacy pass manager in favor of the currently default new pass manager. In this patch the flag has been removed only from tests where no significant change was required because the checks have been duplicated for both PMs. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D134150	2022-09-19 09:57:37 -07:00
Florian Hahn	f02ff5348f	[LV] Move new epilog-vectorization-widen-inductions.ll to AArch64 dir. The test requires the AArch64 backend, so move it to the right subdir.	2022-09-19 17:13:06 +01:00
Florian Hahn	6087b6386e	[LV] Add tests for epilogue vectorization with widened inductions. Includes a test for the miscompile in #57712.	2022-09-19 17:10:41 +01:00
Simon Pilgrim	393cc6a354	[LoopVectorize] Regenerate runtime-check.ll	2022-09-19 10:25:48 +01:00
Simon Pilgrim	7e626d7a89	[LoopVectorize][X86] Use quotes around the pass list to appease DOS cmd evaluation DOS can't handle -passes='default<O3>' correctly	2022-09-19 10:24:37 +01:00
Sanjay Patel	d6498abc24	[InstCombine] remove multi-use add demanded constant fold This was originally part of D133788. There are no visible regressions. All of the diffs show a large unsigned constant becoming a small negative constant. This should be better for analysis (and slightly less compile-time) and codegen.	2022-09-18 14:23:43 -04:00
Vitaly Buka	ed188b39ab	[test] Regenerate a few tests	2022-09-15 12:36:32 -07:00
Simon Pilgrim	0ec028fe10	[CostModel][X86] Add CostKinds handling for vector shift by uniform/constuniform ops Vector shift by const uniform is the cheapest shift instruction we have, non-const uniform have a marginally higher cost - some targets 'splat' the amount internally to use the shift-per-element instruction, others see a higher cost for the explicit zeroing of the upper bits for the (64-bit) shift amount. This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)	2022-09-15 14:05:30 +01:00
jacquesguan	ecf327f154	[RISCV] Add cost model for vector insert/extract element. This patch adds cost model for vector insert/extract element instructions. In RVV, we could use vector scalar move instruction to insert or extract the first element, and use vslide to move it. But for mask vector or i64 vector in i32 target, we need special instructions to make it. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133007	2022-09-14 11:10:18 +08:00
Simon Pilgrim	8ae9cf550b	[LoopVectorize][X86] Add uniform shift costs checks for VF=1/2/4	2022-09-13 13:46:52 +01:00
Philip Reames	4e295cb1ce	[LV] Autogen a test for ease of update	2022-09-09 08:16:22 -07:00
Philip Reames	edb26268ce	[VPlan] Only generate single instr for stores uniform across all parts. Extend the approach taken by D133019 to store instructions. Differential Revision: https://reviews.llvm.org/D133497	2022-09-09 07:15:12 -07:00
Graham Hunter	1f639d1bd2	[NFC][LV] Convert masked call tests to use update script	2022-09-09 10:07:39 +01:00
Craig Topper	5f3a8b585b	[RISCV] Add RecurKind::FMulAdd to isLegalToVectorizeReduction for scalable vectors. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133511	2022-09-08 12:34:59 -07:00
Philip Reames	4c4c0d2c06	[LV] Use safe-divisor lowering for fixed vectors if profitable This extends the safe-divisor widening scheme recently added for scalable vectors to handle fixed vectors as well. Differential Revision: https://reviews.llvm.org/D132591	2022-09-08 09:15:54 -07:00
Florian Hahn	422cf99161	[VPlan] Only generate single instr for loads uniform across all parts. VPReplicateRecipe::isUniform actually means uniform-per-parts, hence a scalar instruction is generated per-part. This is a potential alternative to D132892. For now the current patch only catches cases where the address is trivially invariant (defined outside VPlan), while D132892 catches any address that is considered invariant by SCEV AFAICT. It should be possible to hoist fully invariant recipes feeding loads out of the vector loop region as well, but in practice LICM should do that already. This version of the patch artificially limits this to loads to make it easier to compare, but this restriction should be easily liftable. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133019	2022-09-08 14:27:58 +01:00
Florian Hahn	ba3d29f871	[LCSSA] Update unreachable uses with poison. Users of LCSSA may not expect non-phi uses when checking the uses outside a loop, which may cause crashes. This is due to the fact that we do not update uses in unreachable blocks. To ensure all reachable uses outside the loop are phis, update uses in unreachable blocks to use poison in dead code. Fixes #57508.	2022-09-04 22:26:18 +01:00
Florian Hahn	a10d42dd45	[LV] Update test to use opaque pointers, regenerate checks. Modernize the test to make it easier to extend in a follow-up patch.	2022-09-04 22:26:18 +01:00
Florian Hahn	fc444ddc77	[VPlan] Add field to track if intrinsic should be used for call. (NFC) This patch moves the cost-based decision whether to use an intrinsic or library call to the point where the recipe is created. This untangles code-gen from the cost model and also avoids doing some extra work as the information is already computed at construction. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D132585	2022-09-01 13:14:40 +01:00
Florian Hahn	faad567589	[LV] Add test case where SCEV is needed to remove vector backedge. Test case mentioned in the discussion for D115261.	2022-08-31 14:01:42 +01:00
Florian Hahn	1ed555a62b	[LV] Fix test cases where vector loop never executed. It looks like the vector loops in the modified test cases unintentionally never get executed. Update the exit condition to ensure it does to avoid them getting optimized away in upcoming changes.	2022-08-31 13:24:49 +01:00
Philip Reames	4c10646367	[LV] Refresh autogen tests to reflect naming changes [nfc] Purely so that these can be easily autogened without spurious diffs	2022-08-29 14:16:54 -07:00
Florian Hahn	005d1a8ff5	[LV] Add test where either a libfunc or intrinsic is chosen. In the newly added test either a libfunc (VF=2) or an intrinsic (VF=4) can be chosen. Test coverage for D132585.	2022-08-29 10:51:20 +01:00
Philip Reames	b45a262679	[RISCV] Enable fixed length vectors and loop vectorization with same This change enables the use of RISCV's variable length vector registers for fixed length vectors in the IR, and implicitly enables various IR transforms which generate fixed length vectors if legal (e.g. LoopVectorize). Specifically, this enables fixed length vectors which are known to be inbounds of the underlying variable hardware size. For context, remember that the +V extension provides a minimum VLEN of 128. The embedded variants provide lower minimums. The analogy here is essentially vectorizing for SSE on a machine which may or may not include AVX2/AVX512. We won't get full utilization by default, but we will get some benefit. And of course, with an explicit mcpu we can vectorize to the exact target hardware. The LV impact is mostly related to vectorizer robustness. In cases we haven't yet fully implemented scalable vectorization support, we can fall back to fixed length vectorization. SLP has been disabled for now, even when fixed vectors are enabled. See `a310637` and associated review. There are a few additional code quality issues which need to be worked through before turning SLP on would be reasonable. Differential Revision: https://reviews.llvm.org/D131508	2022-08-26 14:45:23 -07:00
Florian Hahn	9405af1c85	[LAA] Require AddRecs to be in the innermost loop for diff-checks. The simpler diff-checks require pointers with add-recs from the same innermost loop, but this property wasn't checked completely. Add the missing check to ensure both addrecs are in the innermost loop. Fixes #57315.	2022-08-26 20:39:52 +01:00
Florian Hahn	e117137af0	[LV] Add another test for incorrect runtime check generation. Add a variation of @nested_loop_outer_iv_addrec_invariant_in_inner with the dependence sink and source swapped to extend test coverage. Also simplifies the test by removing an unneeded reduction.	2022-08-26 17:28:55 +01:00
Florian Hahn	6e56779e6b	[LV] Add test for incorrect runtime check generation #57315. Test for PR57315 based on a test provided by @kpdev42.	2022-08-26 16:29:20 +01:00
Florian Hahn	3b135ef446	[LV] Convert runtime diff check test to use opaque pointers. Modernize the test to make it easier to extend with up-to-date IR.	2022-08-26 16:02:38 +01:00
Philip Reames	86b67a310d	[LAA] Prune dependencies with distance larger than access implied by trip count When we have a dependency with a dependence distance which can only be hit on an iteration beyond the actual trip count of the loop, we can ignore that dependency when analyzing said loop. We already had this code, but had restricted it solely to unknown dependence distances. This change applies it to all dependence distances. Without this code, we relied on the vectorizer reducing VF such that our infeasible dependence was respected. This usually worked out to about the same result, but not always. For fixed length vectorization, this could mean a smaller VF than optimal being chosen or additional runtime checks. For scalable vectorization - where the bounds on access implied by VF are broader - we could often not find a feasible VF at all. Differential Revision: https://reviews.llvm.org/D131924	2022-08-25 14:24:13 -07:00
Florian Hahn	637da77e66	[LV] Add additional test coverage for SCEVexp and LCSSA interaction. Also converts the test to use opaque pointers while I am here.	2022-08-25 20:59:47 +01:00
Philip Reames	190cdf51ff	[RISCV][LV] Add predicated div/rem test for fixed length vectorization	2022-08-24 11:24:22 -07:00
Philip Reames	b20104f644	[LV] Update a test which appears to have been edited without regen [nfc]	2022-08-24 11:05:49 -07:00
Philip Reames	f79214d1e1	[LV] Support predicated div/rem operations via safe-divisor select idiom This patch adds support for vectorizing conditionally executed div/rem operations via a variant of widening. The existing support for predicated divrem in the vectorizer requires scalarization which we can't do for scalable vectors. The basic idea is that we can always divide (take remainder) by 1 without executing UB. As such, we can use the active lane mask to conditionally select either the actual divisor for active lanes, or a constant one for inactive lanes. We already account for the cost of the active lane mask, so the only additional cost is a splat of one and the vector select. This is one of several possible approaches to this problem; see the review thread for discussion on some of the others. This one was chosen mostly because it was straightforward, and none of the others seemed obviously better. I enabled the new code only for scalable vectors. We could also legally enable it for fixed vectors as well, but I haven't thought through the cost tradeoffs between widening and scalarization enough to know if that's profitable. This will be explored in future patches. Differential Revision: https://reviews.llvm.org/D130164	2022-08-24 10:07:59 -07:00
David Green	8d830f8d68	[LV] Replace fixed-order cost model with a SK_Splice shuffle The existing cost model for fixed-order recurrences models the phi as an extract shuffle of a v1 vector. The shuffle produced should be a splice, as it takes two vector inputs and extracts from a subset of the lanes. On certain architectures the existing cost model can drastically under-estimate the correct cost for the shuffle, so this changes it to a SK_Splice and passes a correct Mask through to the getShuffleCost call. I believe this might be the first use of a SK_Splice shuffle cost model outside of scalable vectors, and some targets may require additions to the cost-model to correctly account for them. In tree targets appear to all have been updated where needed. Differential Revision: https://reviews.llvm.org/D132308	2022-08-24 13:00:32 +01:00
David Green	e29f9f7572	[AArch64][X86] Add some fixed-order-recurrence tests to check the costmodel of fixed order recurrences. NFC	2022-08-24 08:18:01 +01:00
Graham Hunter	14212c968f	[NFC][LoopVectorize] Precommit masked vector function call tests	2022-08-23 09:47:10 +01:00
Jay Foad	2754ff883d	[InstCombine] Try not to demand low order bits for Add Don't demand low order bits from the LHS of an Add if: - they are not demanded in the result, and - they are known to be zero in the RHS, so they can't possibly overflow and affect higher bit positions This is intended to avoid a regression from a future patch to change the order of canonicalization of ADD and AND. Differential Revision: https://reviews.llvm.org/D130075	2022-08-22 20:03:53 +01:00
David Green	04a68fce13	[ARM] Add a couple of MVE fixed-order-reduction tests. NFC	2022-08-22 10:58:14 +01:00
Sanjay Patel	15e3d86911	[InstCombine] reassociate bitwise logic chains based on uses (X op Y) op Z --> (Y op Z) op X This isn't a complete solution (see TODO tests for possible refinements), but it shows some nice wins and doesn't seem to cause any harm. I think the most potential danger is from conflicting with other folds and causing an infinite loop - that's the reason for avoiding patterns with constant operands. Alternatively, we could try this in the reassociate pass, but we would not immediately see all of the logic folds that instcombine provides. I also looked at improving ValueTracking's isImpliedCondition() (and we should still add some enhancements there), but that would not work in general for bitwise logic reduction. The tests that reduce completely to 0/-1 are motivated by issue #56653. Differential Revision: https://reviews.llvm.org/D131356	2022-08-21 09:42:14 -04:00
David Sherwood	666d2a925f	[SVE][LoopVectorize][NFC] Tidy up some tests Whilst writing a patch to add extra tail-folding RUN lines to existing tests I noticed a few areas where they can be cleaned up a little: 1. scalable-reductions.ll: fmin_fast does not mark fcmp as fast. 2. sve-inductions-unusual-types.ll: remove direct references to SSA variable names. 3. sve-strict-fadd-cost.ll: don't force vector width so we see costs for different VFs in one go. This will be important for the follow-on patch. 4. sve-vector-reverse.ll,vector-reverse-mask4.ll: add noalias keyword to simplify IR. 5. sve-widen-gep.ll,sve-widen-phi.ll: regenerate using script. These changes will make the subsequent patch adding RUN lines much easier to review! Differential Revision: https://reviews.llvm.org/D132219	2022-08-19 15:12:58 +01:00
Philip Reames	4d87591028	[RISCV] Use VScaleForTuning in costing of operations whose cost depends on VL On known hardware, reductions, gather, and scatter operations have execution latencies which correlate with the vector length (VL) of the operation. Most other operations (e.g. simple arithmetic) don't correlate in this way, and instead have essentially fixed cost as VL varies. When I'd implemented initial scalable cost model support for reductions, gather, and scatter operations, I had used an upper bound on the statically unknown VL. The argument at the time was that this prevented falsely low costs, and biased the vectorizer away from generating bad (on some hardware) code. Unfortunately, practical experience shows we were a bit too effective at that goal, and the high costs de facto prevent vectorization using these constructs at all. This patch reverses course, and ties the returned cost not to the maximum possible VL, but the VL which would correspond to VScaleForTuning. This parameter is the same one the vectorizer uses when normalizing loop costs, so the term effectively cancels out. The result is that the vectorizer now sees these constructs as comparable in cost to their fixed length variants. This does introduce the possibility of the cost for these operations being a significant underestimate on platforms where actual VLEN is far from that implied by VScaleForTuning. On such platforms, we might make poor heuristic choices. Probably not in LV itself (due to the cancellation mentioned above), but possibly during e.g. lowering. I'm not currently aware of any concrete examples of this, but this patch does open a concern which did not previously exist. Previously, we had the problem of overestimating costs causing the same problem on machines much closer to default values for vscale for tuning. With this patch, we still have that problem potentially if vscale for tuning is set high (manually), and then the code is run on a narrow VLEN machine. Differential Revision: https://reviews.llvm.org/D131519	2022-08-18 13:10:03 -07:00
Florian Hahn	b8709a9d03	[LV] Support fixed order recurrences. If the incoming previous value of a fixed-order recurrence is a phi in the header, go through incoming values from the latch until we find a non-phi value. Use this as the new Previous, all uses in the header will be dominated by the original phi, but need to be moved after the non-phi previous value. At the moment, fixed-order recurrences are modeled as a chain of first-order recurrences. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D119661	2022-08-18 19:15:52 +01:00
Philip Reames	531dd3634d	[LV] Restructure isPredicatedInst and isScalarWithPredication (w/a fix for uniform mem ops) This change reorganizes the code and comments to make the expected semantics of these routines more clear. However, this is not an NFC change. The functional change is having isScalarWithPredication return false if the instruction does not need predication. Specifically, for the case of a uniform memory operation we were previously considering it not to be a predicated instruction, but were considering it to be scalar with predication. As can be seen with the test changes, this causes uniform memory ops which should have been lowered as uniform-per-parts values to instead be lowered via naive scalarization or, if scalarization is infeasible (i.e. scalable vectors), aborted entirely. I also don't trust the code to bail out correctly 100% of the time, so it's possible we had a crash or miscompile from trying to scalarize something which isn't scalarizable. I haven't found a concrete example here, but I am suspicious. Differential Revision: https://reviews.llvm.org/D131093	2022-08-18 07:14:04 -07:00
Florian Hahn	a34428f07d	[LV] Use variables instead of hard-coded metadata IDs in tests.	2022-08-16 12:21:49 +01:00
Zain Jaffal	94d21a94d9	[AArch64] Add tests to check for loop vectorization of non temporal loads Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D131899	2022-08-16 09:40:51 +01:00
Philip Reames	33e7a0a33b	[RISCV][LV] Add test coverage for upcoming dependence distance handling change	2022-08-15 15:20:36 -07:00
Florian Hahn	4f04be5649	[LV] Add tests for vectorizing select of minimum idx idiom. Test cases for selecting the index with the minimum value.	2022-08-14 17:44:11 +01:00
Martin Sebor	0dcfe7aa35	[InstCombine] Tighten up known library function signature tests (PR #56463) Replace a switch statement used to validate arguments to known library functions with a more consistent table-driven approach and tighten it up.	2022-08-10 14:15:46 -06:00
Dinar Temirbulatov	cab6cd6834	[AArch64][LoopVectorize] Introduce trip count minimal value threshold to ignore tail-folding. After D121595 was committed, I noticed regressions associated with small trip counts when vectorising by tail folding with scalable vectors. As a solution for those issues I propose to introduce the minimal trip count threshold value. Differential Revision: https://reviews.llvm.org/D130755	2022-08-09 22:10:17 +01:00
jacquesguan	45bae1be90	[RISCV][test] Add inloop reduction vectorize test. NFC	2022-08-04 15:06:44 +08:00
Philip Reames	0b47615fcf	[LV] Recognize store of invariant value to invariant address as uniform This extends the handling of uniform memory operations to handle the case where a store is storing a loop invariant value. Unlike the general case of a store to an invariant address where we must use the last active lane, in this case we can use any lane since all lanes must produce the same result. For context, the basic structure of the existing code and how the change fits in: * First, we select a widening strategy. (The result is irrelevant for this patch.) * Then we determine if a computation is uniform within all lanes of VF. (Note this is the uniform-per-part definition, not LAI's uniform across all unrolled iterations definition.) * If it is, we overrule the widening strategy, and unconditionally scalarize. * VPReplicateRecipe - which is what actually does the scalarization - knows how to handle uniform-per-part values including for scalable vectors. However, we do need to know that the expression is safe to execute without predication - e.g. the uniform mem op was unconditional in the original loop. (This part was split off and already landed.) An obvious question is why not simply implement the generic case? The answer is that I'm going to, but doing so without a canonicalization towards uniform causes regressions due to bad interaction with scalarization/uniformity of values feeding the uniform mem-op. This patch is needed to avoid those regressions. Differential Revision: https://reviews.llvm.org/D130364	2022-08-02 08:09:49 -07:00
David Sherwood	4ef9cb6c17	[AArch64][LoopVectorize] Disable tail-folding for SVE when loop has interleaved accesses If we have interleave groups in the loop we want to vectorise then we should fall back on normal vectorisation with a scalar epilogue. In such cases when tail-folding is enabled we'll almost certainly go on to create vplans with very high costs for all vector VFs and fall back on VF=1 anyway. This is likely to be worse than if we'd just used an unpredicated vector loop in the first place. Once the vectoriser has proper support for analysing all the costs for each combination of VF and vectorisation style, then we should be able to remove this. Added an extra test here: Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll Differential Revision: https://reviews.llvm.org/D128342	2022-08-02 09:52:33 +01:00
Florian Hahn	ff5ae948a7	[LV] Add variation of test cases with order of phis flipped. Additional tests with integer and pointer inductions for D119661.	2022-08-01 11:38:16 +01:00
Florian Hahn	6e1ba62d0d	[LV] Add additional tests with multiple chained recurrences. Adds more extra tests for D119661. Also update the test to use opaque pointers.	2022-08-01 10:01:19 +01:00
Philip Reames	82c1b136db	[LV] Don't predicate uniform mem op stores unnecessarily We already had the reasoning about uniform mem op loads; if the address is accessed at least once, we know the instruction doesn't need predication to ensure fault safety. For stores, we do need to ensure that the values visible in memory are the same with and without predication. The easiest sub-case to check for is that all the values being stored are the same. Since we know that at least one lane is active, this tells us that the value must be visible. Warning on confusing terminology: "uniform" vs "uniform mem op" mean two different things here, and this patch is specific to the latter. It would not be legal to make this same change for merely "uniform" operations. Differential Revision: https://reviews.llvm.org/D130637	2022-07-28 08:55:52 -07:00
Philip Reames	15c645f7ee	[RISCV] Enable (scalable) vectorization by default This change enables vectorization (using scalable vectorization only, fixed vectors are not yet enabled) for RISCV when vector instructions are available for the target configuration. At this point, the resulting configuration should be both stable (e.g. no crashes), and profitable (i.e. few cases where scalar loops beat vector ones), but is not going to be particularly well tuned (i.e. we do not yet emit the best possible vector loop). The goal of this change is to align testing across organizations and ensure the default configuration matches what downstreams are using as closely as possible. This exposes a large amount of code which hasn't otherwise been on by default, and thus may not have been fully exercised. Given that, having issues fall out is not unexpected. If you find issues, please make sure to include as much information as you can when reverting this change. Differential Revision: https://reviews.llvm.org/D129013	2022-07-27 12:36:04 -07:00
Philip Reames	e8ceadd0ce	[LV][RISCV] Add a test case for a quality problem mixing vector index and data types The problem here is target independent, but particularly painful on RISCV. If we choose to vectorize such that vscale x 2 x i32 is our widest type and fits in a register, a naive expansion of i64 comparisons results in comparisons and index types at <vscale x 2 x i64>. This requires both an LMUL of 2, and a VSETVLI toggle in the loop. Note that we could have used <vscale x 2 x i32> for the comparisons legally given the range of the trip count.	2022-07-27 11:42:28 -07:00
Florian Hahn	16e0620d6d	[VPlan] Mark VPPredInstPHIRecipe as not having side-effects. Now that all uses of VPPredInstPHIRecipes are properly modeled, they can be treated as not having side-effects, enabling removal.	2022-07-27 19:29:26 +01:00
Philip Reames	43b5e12159	[LV] Refresh an autogened test to pickup naming changes	2022-07-27 10:54:15 -07:00
Philip Reames	ebee4fbb34	[RISCV][LV] Add basic tests for default configuration All of our other tests are functionality tests constrained to some specific configuration. This one is intended to float with the default configuration so that changes in that default are visible in reviews. Note that our current default does not enable vectorization at all; thus the current output is unvectorized.	2022-07-27 09:16:44 -07:00
Florian Hahn	a8fdc247e9	[LV] Add missing uses to test to make them more robust. The changes ensure the VPPredInstPHIRecipes are actually used and cannot be removed by VP-DCE.	2022-07-27 16:06:52 +01:00
Sanjay Patel	bfb9b8e075	[Passes] add a tail-call-elim pass near the end of the opt pipeline We call tail-call-elim near the beginning of the pipeline, but that is too early to annotate calls that get added later. In the motivating case from issue #47852, the missing 'tail' on memset leads to sub-optimal codegen. I experimented with removing the early instance of tail-call-elim instead of just adding another pass, but that appears to be slightly worse for compile-time: +0.15% vs. +0.08% time. "tailcall" shows adding the pass; "tailcall2" shows moving the pass to later, then adding the original early pass back (so 1596886802 is functionally equivalent to 180b0439dc): https://llvm-compile-time-tracker.com/index.php?config=NewPM-O3&stat=instructions&remote=rotateright Note that there was an effort to split the tail call functionality into 2 passes - that could help reduce compile-time if we find that this change costs more in compile-time than expected based on the preliminary testing: D60031 Differential Revision: https://reviews.llvm.org/D130374	2022-07-25 15:25:47 -04:00
Nuno Lopes	a30e77b6f6	fix tests for commit `9df0b254d2`	2022-07-23 22:32:30 +01:00
Nuno Lopes	9df0b254d2	[NFC] Switch a few uses of undef to poison as placeholders for unreachable code	2022-07-23 21:50:11 +01:00
Philip Reames	bd75350180	[LV] Fix a conceptual mistake around meaning of uniform in isPredicatedInst This code confuses LV's "Uniform" and LVL/LAI's "Uniform". Despite the common name, these are different. * LV's notion means that only the first lane of each unrolled part is required. That is, lanes within a single unroll factor are considered uniform. This allows e.g. widenable memory ops to be considered uses of uniform computations. * LVL and LAI's notion refers to all lanes across all unrollings. IsUniformMem is in turn defined in terms of LAI's notion. Thus a UniformMemOp is a memory operation with a loop invariant address. This means the same address is accessed in every iteration. The tweaked piece of code was trying to match a uniform mem op (i.e. fully loop invariant address), but instead checked for LV's notion of uniformity. In theory, this meant with UF > 1, we could speculate a load which wasn't safe to execute. This ends up being mostly silent in current code as it is nearly impossible to create the case where this difference is visible. The closest I've come is the test case from 54cb87, but even then, the incorrect result is only visible in the vplan debug output; before this change we sink the unsafely speculated load back into the user's predicate blocks before emitting IR. Both before and after IR are correct so the differences aren't "interesting". The other test changes are uninteresting. They're cases where LV's uniform analysis is slightly weaker than SCEV isLoopInvariant.	2022-07-21 15:44:34 -07:00
Philip Reames	54cb87964d	[LV] Add a load focused version of the r45679 test This is a reproducer for a bug in predicated instruction handling. The final result code is correct, but the reasoning by which we get there isn't.	2022-07-21 15:33:42 -07:00
Philip Reames	83993d666b	[LV][SVE] Autogen a test for ease of update	2022-07-21 13:12:53 -07:00
Philip Reames	27945f9282	[RISCV][LV] Split coverage of uniform load with outside use Turns out this has a large effect on tail folding, so split out a single test to cover that case and remove it from the others.	2022-07-21 12:07:26 -07:00
Philip Reames	bb5dc2918f	[RISCV][LV] Add tail folding coverage of uniform load store cases	2022-07-21 11:15:36 -07:00
Philip Reames	56a25ed208	[RISCV][LV] Add a test for uniform store of a loop varying value	2022-07-21 11:15:36 -07:00
Philip Reames	0ae46693f0	[RISCV][LV] Split out and expand tests for uniform loads and stores	2022-07-21 10:42:18 -07:00
David Sherwood	f15b6b2907	[AArch64] Add target hook for preferPredicateOverEpilogue This patch adds the AArch64 hook for preferPredicateOverEpilogue, which currently returns true if SVE is enabled and one of the following conditions (non-exhaustive) is met: 1. The "sve-tail-folding" option is set to "all", or 2. The "sve-tail-folding" option is set to "all+noreductions" and the loop does not contain reductions, 3. The "sve-tail-folding" option is set to "all+norecurrences" and the loop has no first-order recurrences. Currently the default option is "disabled", but this will be changed in a later patch. I've added new tests to show the options behave as expected here: Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll Differential Revision: https://reviews.llvm.org/D129560	2022-07-21 17:20:06 +01:00
David Sherwood	ceb6c23b70	[NFC][LoopVectorize] Explicitly disable tail-folding on some SVE tests This patch is in preparation for enabling vectorisation with tail-folding by default for SVE targets. Once we do that many existing tests will break that depend upon having normal unpredicated vector loops. For all such tests I have added the flag: -prefer-predicate-over-epilogue=scalar-epilogue Differential Revision: https://reviews.llvm.org/D129137	2022-07-21 15:23:00 +01:00
Philip Reames	f934b9b073	[LV] Refresh a couple of autogen tests for naming change These appear to just be changes in temporary identifiers; a bit surprising we have so many.	2022-07-20 14:47:52 -07:00
Philip Reames	1a73ef75fa	[LV] Autogen a test for ease of update	2022-07-20 08:19:38 -07:00
Philip Reames	be25f52fec	[LV] Autogen several tests for ease of update in upcoming change	2022-07-20 07:17:51 -07:00
Philip Reames	523a526a02	[LV] Fix miscompile due to srem/sdiv speculation safety condition An srem or sdiv has two cases which can cause undefined behavior, not just one. The existing code did not account for this, and as a result, we miscompiled when we encountered e.g. a srem i64 %v, -1 in a conditional block. Instead of hand rolling the logic, just use the utility function which exists exactly for this purpose. Differential Revision: https://reviews.llvm.org/D130106	2022-07-20 05:35:23 -07:00
David Sherwood	79660d339e	[LoopVectorize][AArch64] Add TTI hook preferPredicatedReductionSelect By default if SVE is enabled we want the select instruction used for reductions to be inside the loop, rather than outside. This makes it possible for the backend to fold the select into the operation to produce a single predicated add, fadd, etc. Differential Revision: https://reviews.llvm.org/D129763	2022-07-20 09:33:29 +01:00
Philip Reames	f1243fa193	[LV] Autogen a partially autogened test for ease of update	2022-07-19 14:18:53 -07:00
Philip Reames	8353403f08	[LV] Add test for generic predicated sdiv	2022-07-19 12:33:36 -07:00
Philip Reames	2247fe856a	[LV] Add test coverage for a bug in srem handling	2022-07-19 11:29:17 -07:00
Philip Reames	b7d3ba4bdb	[LV] Add test coverage for scalable div/rem patterns	2022-07-19 11:02:14 -07:00
David Sherwood	34f81cfa3d	[LoopVectorize][NFC] Split reductions out from sve-tail-folding into new file In sve-tail-folding-reductions.ll I've also added an extra RUN line to test normal reductions, i.e. not in-loop. This patch is a pre-commit in preparation for a follow-on patch that changes how reduction selects are generated in the vector loop. Differential Revision: https://reviews.llvm.org/D129761	2022-07-18 13:56:39 +01:00
David Sherwood	1e77b0c871	[AArch64][NFC] Simplify loop vectoriser tail-folding tests I've simplified all of the SVE vectoriser tail-folding tests to only care about testing the flag: -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue In practice we always want to fall back on unpredicated vector loops if tail-folding is not possible. Differential Revision: https://reviews.llvm.org/D129843	2022-07-18 13:37:29 +01:00
Graham Hunter	db8fcb2c25	[LAA] Add recursive IR walker for forked pointers This builds on the previous forked pointers patch, which only accepted a single select as the pointer to check. A recursive function to walk through IR has been added, which searches for either a loop-invariant or addrec SCEV. This will only handle a single fork at present, so selects of selects or a GEP with a select for both the base and offset will be rejected. There is also a recursion limit with a cli option to change it. Reviewed By: fhahn, david-arm Differential Revision: https://reviews.llvm.org/D108699	2022-07-18 12:06:17 +01:00
Florian Hahn	105032f549	[LV] Use PHI recipe instead of PredRecipe for subsequent uses. At the moment, the VPPredInstPHIRecipe is not used in subsequent uses of the predicate recipe. This incorrectly models the def-use chains, as all later uses should use the phi recipe. Fix that by delaying recording of the recipe. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D129436	2022-07-18 09:35:34 +01:00
Florian Hahn	6813b41d57	[LV] Avoid creating new run-time VF expression for each runtime check. At the moment, the cost of runtime checks for scalable vectors is overestimated due to creating separate vscale * VF expressions for each check. Instead re-use the first expression.	2022-07-16 17:24:07 +01:00
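
Commit 523a526a02 in the list above notes that a signed division or remainder has two cases that cause undefined behavior, not one. The sketch below illustrates those two cases in plain C++ rather than LLVM IR. It is not the code from that commit: `isSafeSignedDivRem`, `checkedSRem`, and `guardedSRem` are hypothetical names introduced only for this illustration, and the commit itself says it uses an existing LLVM utility function, which is not named in the message.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not an LLVM API): returns true when `n / d` and
// `n % d` are well defined for signed 64-bit operands.
bool isSafeSignedDivRem(std::int64_t n, std::int64_t d) {
  // Case 1: division by zero.
  if (d == 0)
    return false;
  // Case 2: signed overflow, INT64_MIN / -1 is not representable, so both
  // the quotient and the remainder are undefined.
  if (n == std::numeric_limits<std::int64_t>::min() && d == -1)
    return false;
  return true;
}

// Hypothetical helper: remainder with the precondition made explicit.
std::int64_t checkedSRem(std::int64_t n, std::int64_t d) {
  assert(isSafeSignedDivRem(n, d) && "srem would be undefined");
  return n % d;
}

// Hypothetical helper: models a remainder that sits behind a condition.
// The operation is executed only when it is safe; otherwise `fallback`
// is returned, so nothing undefined is ever evaluated.
std::int64_t guardedSRem(std::int64_t n, std::int64_t d,
                         std::int64_t fallback) {
  return isSafeSignedDivRem(n, d) ? checkedSRem(n, d) : fallback;
}
```

A check that only rejects a zero divisor would accept a divisor of -1, which is the `srem i64 %v, -1` situation the commit message describes: when the dividend is unknown, hoisting the operation out of its guarding condition is safe only if both cases are excluded.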


1896 Commits