llvm-project


Author	SHA1	Message	Date
Juneyoung Lee	e639bccefd	run update_test_checks.py for the tests in D101191 (NFC) This is an NFC that reruns update_test_checks.py on the tests that are going to be updated in D101191.	2021-05-02 13:11:57 +09:00
Sander de Smalen	51d648c119	Revert "[LV] Calculate max feasible scalable VF." Temporarily reverting this patch due to some unexpected issue found by one of the PPC buildbots. This reverts commit `584e9b6e4b`.	2021-04-29 16:04:37 +01:00
Sjoerd Meijer	837fded984	Follow up of rGddb3b26a1269: added 'requires asserts' to test case.	2021-04-29 08:34:24 +01:00
Bardia Mahjour	ddb3b26a12	[LV] Consider Loop Unroll Hints When Making Interleave Decisions This patch causes the loop vectorizer to not interleave loops that have nounroll loop hints (llvm.loop.unroll.disable and llvm.loop.unroll_count(1)). Note that if a particular interleave count is being requested (through llvm.loop.interleave_count), it will still be honoured, regardless of the presence of nounroll hints. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D101374	2021-04-28 17:27:52 -04:00
Florian Hahn	1ed7f8ede5	[LAA] Support pointer phis in loop by analyzing each incoming pointer. SCEV does not look through non-header PHIs inside the loop. Such phis can be analyzed by adding separate accesses for each incoming pointer value. This results in 2 more loops vectorized in SPEC2000/186.crafty and avoids regressions when sinking instructions before vectorizing. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D101286	2021-04-28 20:19:40 +01:00
David Sherwood	00e65f3345	[LoopVectorize][SVE] Fix crash when vectorising FP negation This patch fixes a crash encountered when vectorising the following loop: void foo(float *dst, float *src, long long n) { for (long long i = 0; i < n; i++) dst[i] = -src[i]; } using scalable vectors. I've added a test to Transforms/LoopVectorize/AArch64/sve-basic-vec.ll as well as cleaned up the other tests in the same file. Differential Revision: https://reviews.llvm.org/D98054	2021-04-28 15:22:35 +01:00
David Sherwood	6998f8ae2d	[LoopVectorize] Simplify scalar cost calculation in getInstructionCost This patch simplifies the calculation of certain costs in getInstructionCost when isScalarAfterVectorization() returns a true value. There are a few places where we multiply a cost by a number N, i.e. unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1; return N * TTI.getArithmeticInstrCost(... After some investigation it seems that there are only these cases that occur in practice: 1. VF is a scalar, in which case N = 1. 2. VF is a vector. We can only get here if: a) the instruction is a GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction variable that remains scalar. I have changed the code so that N is assumed to always be 1. For GEPs the cost is always 0, since this is calculated later on as part of the load/store cost. PHI nodes are costed separately and were never previously multiplied by VF. For all other cases I have added an assert that none of the users needs scalarising, which didn't fire in any unit tests. Only one test required fixing and I believe the original cost for the scalar add instruction to have been wrong, since only one copy remains after vectorisation. I have also added a new test for the case when a pointer PHI feeds directly into a store that will be scalarised as we were previously never testing it. Differential Revision: https://reviews.llvm.org/D99718	2021-04-28 13:41:07 +01:00
Sander de Smalen	584e9b6e4b	[LV] Calculate max feasible scalable VF. This patch also refactors the way the feasible max VF is calculated, although this is NFC for fixed-width vectors. After this change scalable VF hints are no longer truncated/clamped to a shorter scalable VF, nor does it drop the 'scalable flag' from the suggested VF to vectorize with a similar VF that is fixed. Instead, the hint is ignored which means the vectorizer is free to find a more suitable VF, using the CostModel to determine the best possible VF. Reviewed By: c-rhodes, fhahn Differential Revision: https://reviews.llvm.org/D98509	2021-04-28 12:30:00 +01:00
Kerry McLaughlin	9cc217ab36	[LoopVectorize] Prevent multiple Phis being generated with in-order reductions When using the -enable-strict-reductions flag where UF>1 we generate multiple Phi nodes, though only one of these is used as an input to the vector.reduce.fadd intrinsics. The unused Phi nodes are removed later by instcombine. This patch changes widenPHIInstruction/fixReduction to only generate one Phi, and adds an additional test for unrolling to strict-fadd.ll Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D100570	2021-04-28 11:29:01 +01:00
David Sherwood	6968520c3b	Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost" This reverts commit `4afeda9157`.	2021-04-27 15:46:03 +01:00
David Sherwood	4afeda9157	[LoopVectorize] Simplify scalar cost calculation in getInstructionCost This patch simplifies the calculation of certain costs in getInstructionCost when isScalarAfterVectorization() returns a true value. There are a few places where we multiply a cost by a number N, i.e. unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1; return N * TTI.getArithmeticInstrCost(... After some investigation it seems that there are only these cases that occur in practice: 1. VF is a scalar, in which case N = 1. 2. VF is a vector. We can only get here if: a) the instruction is a GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction variable that remains scalar. I have changed the code so that N is assumed to always be 1. For GEPs the cost is always 0, since this is calculated later on as part of the load/store cost. PHI nodes are costed separately and were never previously multiplied by VF. For all other cases I have added an assert that none of the users needs scalarising, which didn't fire in any unit tests. Only one test required fixing and I believe the original cost for the scalar add instruction to have been wrong, since only one copy remains after vectorisation. I have also added a new test for the case when a pointer PHI feeds directly into a store that will be scalarised as we were previously never testing it. Differential Revision: https://reviews.llvm.org/D99718	2021-04-27 15:26:15 +01:00
Florian Hahn	a950f66de2	[LV,LAA] Add test cases with pointer phis in loops. Pre-commits tests for D101286.	2021-04-27 13:49:32 +01:00
David Sherwood	cf7276820c	[NFC] Add scalable vectorisation tests for int/FP <> int/FP conversions We can already vectorize loops that involve int<>int, fp<>fp, int<>fp and fp<>int conversions, however we didn't previously have any tests for them. This patch adds some tests for each conversion type. Differential Revision: https://reviews.llvm.org/D99951	2021-04-26 11:01:14 +01:00
David Sherwood	a458b7855e	[AArch64] Add AArch64TTIImpl::getMaskedMemoryOpCost function When vectorising for AArch64 targets if you specify the SVE attribute we automatically then treat masked loads and stores as legal. Also, since we have no cost model for masked memory ops we believe it's cheap to use the masked load/store intrinsics even for fixed width vectors. This can lead to poor code quality as the intrinsics will currently be scalarised in the backend. This patch adds a basic cost model that marks fixed-width masked memory ops as significantly more expensive than for scalable vectors. Tests for the cost model are added here: Transforms/LoopVectorize/AArch64/masked-op-cost.ll Differential Revision: https://reviews.llvm.org/D100745	2021-04-26 11:00:03 +01:00
Joe Ellis	2c551aedcf	[LoopVectorize] Fix bug where predicated loads/stores were dropped This commit fixes a bug where the loop vectoriser fails to predicate loads/stores when interleaving for targets that support masked loads and stores. Code such as: void foo(int *restrict data1, int *restrict data2) { int counter = 1024; while (counter--) if (data1[counter] > data2[counter]) data1[counter] = data2[counter]; } ... could previously be transformed in such a way that the predicated store implied by: if (data1[counter] > data2[counter]) data1[counter] = data2[counter]; ... was lost, resulting in miscompiles. This bug was causing some tests in llvm-test-suite to fail when built for SVE. Differential Revision: https://reviews.llvm.org/D99569	2021-04-22 15:05:54 +00:00
Alexey Bataev	673e2f1b70	[COST][AARCH64] Improve cost of reverse shuffles for AArch64. Introduced the cost of the reverse shuffles for AArch64, currently just copied the costs for PermuteSingleSrc. Differential Revision: https://reviews.llvm.org/D100871	2021-04-20 13:47:56 -07:00
Roman Lebedev	a1d283b71e	[NFC][LoopVectorize] Autogenerate check lines in pr45259.ll We might as well test all of the codegen here.	2021-04-20 21:29:21 +03:00
Alexey Bataev	683dc41695	Update tests checks, NFC.	2021-04-20 10:20:15 -07:00
Sander de Smalen	86729538bd	[LV] Let selectVectorizationFactor reason directly on VectorizationFactor. Rather than maintaining two separate values, a `float` for the per-lane cost and a Width for the VF, maintain a single VectorizationFactor which comprises the two and also removes the need for converting an integer value to float. This simplifies the query when asking if one VF is more profitable than another when we want to extend this for scalable vectors (which may require additional options to determine if e.g. a scalable VF is more profitable than a fixed VF of the same cost). The patch isn't entirely NFC because it also fixes an issue in selectEpilogueVectorizationFactor, where the floating-point cost passed to ProfitableVFs was truncated from `float` to `unsigned` before the calculation was performed on the truncated value. It now does a cost comparison with the correct precision. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100121	2021-04-20 09:54:45 +01:00
Roman Lebedev	df9597cf5a	[X86][CostModel] X86TTIImpl::getShuffleCost(): subvector insertions are cheap This is similar to the subvector extractions, except that the 0'th subvector isn't free to insert, because we generally don't know whether or not the upper elements need to be preserved: https://godbolt.org/z/rsxP5W4sW This is needed to avoid regressions in D100684 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100698	2021-04-19 13:24:58 +03:00
Kerry McLaughlin	62ee638a87	[NFC] Add tests for scalable vectorization of loops with in-order reductions D98435 added support for in-order reductions and included tests for fixed-width vectorization with the -enable-strict-reductions flag. This patch adds similar tests to verify support for scalable vectorization of loops with in-order reductions. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D100385	2021-04-19 11:15:55 +01:00
Roman Lebedev	f3953a8aba	[NFC][LoopVectorize] Autogenerate check lines in X86/gather_scatter.ll test	2021-04-18 10:26:16 +03:00
Philip Reames	ff55d01a8e	[nofree] Restrict semantics to memory visible to caller This patch clarifies the semantics of the nofree function attribute to make clear that it provides an "as if" semantic. That is, a nofree function is guaranteed not to free memory which existed before the call, but might allocate and then deallocate that same memory within the lifetime of the callee. This is the result of the discussion on llvm-dev under the thread "Ambiguity in the nofree function attribute". The most important part of this change is the LangRef wording. The rest is minor comment changes to emphasize the new semantics where code was accidentally consistent, and fix one place which wasn't consistent. That one place is currently narrowly used as it is primarily part of the ongoing (and not yet enabled) deref-at-point semantics work. Differential Revision: https://reviews.llvm.org/D100141	2021-04-16 11:38:55 -07:00
Kerry McLaughlin	93f54fae9d	[NFC] Remove the -instcombine flag from strict-fadd.ll This also fixes a CHECK line in @fadd_strict_unroll which ensures the changes made to fixReduction() to support in-order reductions with unrolling are being tested correctly.	2021-04-15 15:10:48 +01:00
David Sherwood	ea14df695e	[SVE][LoopVectorize] Fix crash in InnerLoopVectorizer::widenPHIInstruction There were a few places in widenPHIInstruction where calculations of offsets were failing to take the runtime calculation of VF into account for scalable vectors. I've fixed those cases in this patch as well as adding an assert that we should not be scalarising for scalable vectors. Tests are added here: Transforms/LoopVectorize/AArch64/sve-widen-phi.ll Differential Revision: https://reviews.llvm.org/D99254	2021-04-15 10:51:49 +01:00
Roman Lebedev	a36bb7fd76	[InstCombine] (X | Op01C) + Op1C --> X + (Op01C + Op1C) iff the or is actually an add https://alive2.llvm.org/ce/z/Coc5yf	2021-04-11 18:08:08 +03:00
Roman Lebedev	d1ebdbff12	[NFC][LoopVectorize] Autogenerate interleaved-accesses.ll	2021-04-11 18:08:08 +03:00
Thomas Preud'homme	623475248a	[test, LoopVectorize] Fix use of var defined in CHECK-NOT LLVM test Transforms/LoopVectorize/pr34681.ll tries to check for the absence of a sequence of instructions with several CHECK-NOT directives, one of which uses a variable defined in another. However, CHECK-NOT directives are checked independently, so this uses a variable defined in a pattern that should not occur in the input. This commit only checks for the absence of icmp ne 1, which rules out the presence of the whole sequence and does not involve an undefined variable. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D99582	2021-04-09 10:01:57 +01:00
David Green	8675ef100f	[LV] Logical and/or select costs D99674 stopped the folding of certain select operations into and/or, due to incorrect folding in the presence of poison. D97360 added some costs to attempt to account for the change, but only worked at the getUserCost level, not the getCmpSelInstrCost that the vectorizer will use directly. This adds similar logic into the vectorizer to handle these logical and/or selects, treating them like and/or directly. This fixes 60% performance regressions from code like the attached test case. Differential Revision: https://reviews.llvm.org/D99884	2021-04-08 10:39:47 +01:00
David Green	1a4d3d0bca	[LV] Add a logical and/or select cost test. NFC	2021-04-08 10:27:06 +01:00
Sander de Smalen	672f673004	[SVE] Remove checks for warnings in scalable-vector tests. After D98856 these tests will by default break (fatal_error) if any of the wrong interfaces are used, so there's no longer a need to have a RUN line that checks for a warning message emitted by the compiler.	2021-04-07 15:59:32 +01:00
Kerry McLaughlin	7344f3d39a	[LoopVectorize] Add strict in-order reduction support for fixed-width vectorization Previously we could only vectorize FP reductions if fast math was enabled, as this allows us to reorder FP operations. However, it may still be beneficial to vectorize the loop by moving the reduction inside the vectorized loop and making sure that the scalar reduction value be an input to the horizontal reduction, e.g: %phi = phi float [ 0.0, %entry ], [ %reduction, %vector_body ] %load = load <8 x float> %reduction = call float @llvm.vector.reduce.fadd.v8f32(float %phi, <8 x float> %load) This patch adds a new flag (IsOrdered) to RecurrenceDescriptor and makes use of the changes added by D75069 as much as possible, which already teaches the vectorizer about in-loop reductions. For now in-order reduction support is off by default and controlled with the `-enable-strict-reductions` flag. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D98435	2021-04-06 14:45:34 +01:00
Kerry McLaughlin	857b8a73da	[LoopVectorize] Change the identity element for FAdd Changes getRecurrenceIdentity to always return a neutral value of -0.0 for FAdd. Reviewed By: dmgreen, spatel Differential Revision: https://reviews.llvm.org/D98963	2021-04-06 12:13:43 +01:00
Sanjay Patel	7a4abc07dd	[LoopVectorize] auto-generate complete checks; NFC We can't see how much overhead/redundancy is being created with the partial checks. To make it smaller and easier to read, I reduced the vectorization factor because that does not add new information - it just duplicates things.	2021-04-01 11:55:41 -04:00
Philip Reames	e2c6621e63	[deref-at-point] restrict inference of dereferenceability based on allocsize attribute Support deriving dereferenceability facts from allocation sites with known object sizes while correctly accounting for any possible frees between allocation and use site. (At the moment, we're conservative and only allow it in functions where we know we can't free.) This is part of the work on deref-at-point semantics. I'm making the change unconditional as the miscompile in this case is way too easy to trip by accident, and the optimization was only recently added (by me). There will be a follow up patch wiring through TLI since that should now be doable without introducing widespread miscompiles. Differential Revision: https://reviews.llvm.org/D95815	2021-04-01 08:34:40 -07:00
David Sherwood	e3a13304fc	[NFC] Add tests for scalable vectorization of loops with large stride accesses This patch just adds tests that we can vectorize loops such as these: for (i = 0; i < n; i++) dst[i * 7] += 1; and for (i = 0; i < n; i++) if (cond[i]) dst[i * 7] += 1; using scalable vectors, where we expect to use gathers and scatters in the vectorized loop. The vector of pointers used for the gather is identical to those used for the scatter so there should be no memory dependences. Tests are added here: Transforms/LoopVectorize/AArch64/sve-large-strides.ll Differential Revision: https://reviews.llvm.org/D99192	2021-04-01 10:25:06 +01:00
Sander de Smalen	7108b2dec1	[SVE] Fix LoopVectorizer test scalable-call.ll This marks FSIN and other operations as EXPAND for scalable vectors, so that they are not assumed to be legal by the cost-model. Depends on D97470 Reviewed By: dmgreen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D97471	2021-03-31 14:52:49 +01:00
Thomas Preud'homme	8b5b03c279	[test, LoopVectorize] Fix use of var defined in CHECK-NOT LLVM test Transforms/LoopVectorize/X86/x86-pr39099.ll tries to check for the absence of a sequence of instructions with several CHECK-NOT directives, one of which uses a variable defined in another. However, CHECK-NOT directives are checked independently, so this uses a variable defined in a pattern that should not occur in the input. This commit only checks for the absence of a widened load, which rules out the presence of the whole sequence and does not involve an undefined variable. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D99583	2021-03-30 15:32:30 +01:00
David Sherwood	a08c7736a7	[LoopVectorize] Add support for scalable vectorization of induction variables This patch adds support for the vectorization of induction variables when using scalable vectors, which required the following changes: 1. Removed assert from InnerLoopVectorizer::getStepVector. 2. Modified InnerLoopVectorizer::createVectorIntOrFpInductionPHI to use a runtime determined value for VF and removed an assert. 3. Modified InnerLoopVectorizer::buildScalarSteps to work for scalable vectors. I did this by calculating the full vector value for each Part of the unroll factor (UF) and caching this in the VP state. This means that we are always able to extract an arbitrary element from the vector if necessary. In addition to this, I also permitted the caching of the individual lane values themselves for the known minimum number of elements in the same way we do for fixed width vectors. This is a further optimisation that improves the code quality since it avoids unnecessary extractelement operations when extracting the first lane. 4. Added an assert to InnerLoopVectorizer::widenPHIInstruction, since while testing some code paths I noticed this is currently broken for scalable vectors. Various tests to support different cases have been added here: Transforms/LoopVectorize/AArch64/sve-inductions.ll Differential Revision: https://reviews.llvm.org/D98715	2021-03-30 11:13:31 +01:00
Florian Hahn	c773d0f973	Recommit "[LV] Move runtime pointer size check to LVP::plan()." Re-apply `25fbe803d4`, with a small update to emit the right remark class. Original message: [LV] Move runtime pointer size check to LVP::plan(). This removes the need for the remaining doesNotMeet check and instead directly checks if there are too many runtime checks for vectorization in the planner. A subsequent patch will adjust the logic used to decide whether to vectorize with runtime to consider their cost more accurately. Reviewed By: lebedev.ri	2021-03-29 16:14:27 +01:00
Florian Hahn	485c8ce733	Revert "[LV] Move runtime pointer size check to LVP::plan()." This reverts commit `25fbe803d4`. This breaks a clang test which filters for the wrong remark type.	2021-03-29 14:41:53 +01:00
Florian Hahn	25fbe803d4	[LV] Move runtime pointer size check to LVP::plan(). This removes the need for the remaining doesNotMeet check and instead directly checks if there are too many runtime checks for vectorization in the planner. A subsequent patch will adjust the logic used to decide whether to vectorize with runtime to consider their cost more accurately. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98634	2021-03-29 14:12:29 +01:00
David Sherwood	c39460cc4f	Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost" This reverts commit `240aa96cf2`.	2021-03-26 11:36:53 +00:00
David Sherwood	240aa96cf2	[LoopVectorize] Simplify scalar cost calculation in getInstructionCost This patch simplifies the calculation of certain costs in getInstructionCost when isScalarAfterVectorization() returns a true value. There are a few places where we multiply a cost by a number N, i.e. unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1; return N * TTI.getArithmeticInstrCost(... After some investigation it seems that there are only these cases that occur in practice: 1. VF is a scalar, in which case N = 1. 2. VF is a vector. We can only get here if: a) the instruction is a GEP/bitcast with scalar uses, or b) this is an update to an induction variable that remains scalar. I have changed the code so that N is assumed to always be 1. For GEPs the cost is always 0, since this is calculated later on as part of the load/store cost. For all other cases I have added an assert that none of the users needs scalarising, which didn't fire in any unit tests. Only one test required fixing and I believe the original cost for the scalar add instruction to have been wrong, since only one copy remains after vectorisation. Differential Revision: https://reviews.llvm.org/D98512	2021-03-26 11:27:12 +00:00
Philip Reames	67e28173f1	Autogen test to account for tool output format change	2021-03-25 14:41:08 -07:00
Kerry McLaughlin	1f46499690	[SVE][LoopVectorize] Verify support for vectorizing loops with invariant loads D95598 added a cost model for broadcast shuffle, which should enable loops such as the following to vectorize, where the load of b[42] is invariant and can be done using a scalar load + splat: for (int i=0; i<n; ++i) a[i] = b[i] + b[42]; This patch adds tests to verify that we can vectorize such loops. Reviewed By: joechrisellis Differential Revision: https://reviews.llvm.org/D98506	2021-03-25 14:10:21 +00:00
Craig Topper	512bae81cc	[RISCV] Add basic cost modelling for fixed vector gather/scatter. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D99142	2021-03-24 11:14:14 -07:00
Florian Hahn	7fb6d9f958	[LV] Add 'fast' flag to test to make sure it will be vectorized. This makes the test more robust with respect to when LV checks if the floating point instructions in a loop can be vectorized.	2021-03-23 15:32:23 +00:00
Florian Hahn	f759d512c8	[VPlan] Include name when printing after `93a9d2de8f`. The name is included when printing in DOT mode. Also print it in non-DOT mode after `93a9d2de8f`. This will become more important to distinguish different plans once VPlans are gradually refined.	2021-03-23 09:50:14 +00:00
Florian Hahn	42ec7a6f08	[VPlan] Add CHECK-LABEL to test/Transforms/LoopVectorize/vplan-printing.ll. This patch adds CHECK-LABEL lines to llvm/test/Transforms/LoopVectorize/vplan-printing.ll in order to make failures slightly easier to diagnose.	2021-03-22 18:29:38 +00:00


1269 Commits
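Commit 857b8a73da (D98963) changes the FAdd identity returned by getRecurrenceIdentity to -0.0. The reason can be checked in a few lines of C. The check assumes IEEE 754 binary32 floats, the default round-to-nearest mode, and no fast-math flags. Under those assumptions, adding -0.0 leaves every input unchanged, including +0.0. Adding +0.0, in contrast, turns a -0.0 input into +0.0. The helper names below are hypothetical illustrations, not code from the patch.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if identity + x equals x and keeps x's sign,
   i.e. the addition is a no-op even when x is a signed zero. */
static int is_neutral_for(float identity, float x) {
    float r = identity + x;
    return r == x && (signbit(r) != 0) == (signbit(x) != 0);
}

/* Checks the claim on a few representative inputs, including both zeros. */
static void check_fadd_identity(void) {
    const float xs[] = { 0.0f, -0.0f, 1.5f, -2.25f, INFINITY };
    for (size_t i = 0; i < sizeof xs / sizeof xs[0]; i++)
        assert(is_neutral_for(-0.0f, xs[i])); /* -0.0 never changes x */
    assert(!is_neutral_for(0.0f, -0.0f));     /* +0.0 + -0.0 rounds to +0.0 */
}
```

A reduction that starts from +0.0 would therefore return +0.0 for a loop that only adds -0.0 values, while the original scalar code would return -0.0.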
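Commit 7344f3d39a (D98435, `-enable-strict-reductions`) vectorizes FP reductions without fast math. It does this by feeding the running scalar into each ordered vector.reduce.fadd. The sketch below models that idea in plain C. It contrasts that approach with the reassociated partial-sum form that fast math allows. The names and the fixed width VF = 4 are assumptions for illustration only. The sketch is not the vectorizer's code.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4 /* illustrative vector width (an assumption, not from the patch) */

/* Scalar reference: strict left-to-right summation, as written in source. */
static float sum_scalar(const float *a, size_t n) {
    float acc = -0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i];
    return acc;
}

/* Models an ordered reduce.fadd(start, <VF x float>): the start value is
   added first, then each lane in order, with no reassociation. */
static float reduce_fadd_ordered(float start, const float *lanes) {
    for (size_t l = 0; l < VF; l++)
        start += lanes[l];
    return start;
}

/* In-order "vectorized" loop: each VF-wide chunk is folded into the running
   scalar, so the rounding sequence matches sum_scalar exactly. */
static float sum_inorder(const float *a, size_t n) {
    float acc = -0.0f;
    size_t i = 0;
    for (; i + VF <= n; i += VF)
        acc = reduce_fadd_ordered(acc, &a[i]);
    for (; i < n; i++) /* scalar remainder */
        acc += a[i];
    return acc;
}

/* Fast-math style: VF independent partial sums combined after the loop.
   This reassociates the additions and may round differently. */
static float sum_reassociated(const float *a, size_t n) {
    float part[VF];
    for (size_t l = 0; l < VF; l++)
        part[l] = -0.0f;
    size_t i = 0;
    for (; i + VF <= n; i += VF)
        for (size_t l = 0; l < VF; l++)
            part[l] += a[i + l];
    float acc = -0.0f;
    for (size_t l = 0; l < VF; l++)
        acc += part[l];
    for (; i < n; i++) /* scalar remainder */
        acc += a[i];
    return acc;
}
```

The input {1e8f, 1, 1, 1, -1e8f, 1, 1, 1} shows the difference. In the strict order, each 1.0 added to 1e8f is absorbed, because it is below half an ulp (the ulp there is 8). Both sum_scalar and sum_inorder therefore return 3, while sum_reassociated returns 6.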
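Commit a36bb7fd76 rewrites (X | Op01C) + Op1C to X + (Op01C + Op1C) when "the or is actually an add". That is the case when X and Op01C share no set bits, so the or produces no carries. The commit links an Alive2 proof. The exhaustive 8-bit check below is only an illustration, not a proof for every width. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Exhaustively checks (X | C1) + C2 == X + (C1 + C2) over all 8-bit values
   where X and C1 have no set bits in common, so the or equals an add.
   Results are truncated to uint8_t, i.e. arithmetic wraps modulo 2^8. */
static int or_add_fold_holds_u8(void) {
    for (unsigned x = 0; x < 256; x++) {
        for (unsigned c1 = 0; c1 < 256; c1++) {
            if (x & c1)
                continue; /* precondition fails: the or is not an add */
            for (unsigned c2 = 0; c2 < 256; c2++) {
                uint8_t lhs = (uint8_t)((uint8_t)(x | c1) + c2);
                uint8_t rhs = (uint8_t)(x + (uint8_t)(c1 + c2));
                if (lhs != rhs)
                    return 0;
            }
        }
    }
    return 1;
}

/* The fold holds under the precondition and fails without it. */
static void check_or_add_fold(void) {
    assert(or_add_fold_holds_u8());
    /* X = C1 = 1, C2 = 0: (1 | 1) + 0 == 1, but 1 + (1 + 0) == 2. */
    assert(((1u | 1u) + 0u) != (1u + (1u + 0u)));
}
```

The same argument holds at any width. With disjoint bits, X | C1 equals X + C1, and wrapping addition is associative.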
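Commit 86729538bd (D100121) fixes a precision loss: a `float` per-lane cost was truncated to `unsigned` before candidates were compared. The sketch below shows how truncation can make two different VFs look equally profitable. It also shows one exact alternative, which cross-multiplies total costs by widths. The struct and function names are hypothetical. Cross-multiplication is an assumed technique chosen for illustration, not necessarily what the patch does.

```c
#include <assert.h>

/* Hypothetical stand-in for a VF choice: vector width and total cost. */
typedef struct {
    unsigned width;
    unsigned cost;
} vf_candidate;

/* Truncating comparison: per-lane cost is rounded down to an integer first,
   which can make two different candidates look equally profitable. */
static int cheaper_truncated(vf_candidate a, vf_candidate b) {
    return a.cost / a.width < b.cost / b.width;
}

/* Exact comparison of a.cost / a.width < b.cost / b.width, done by
   cross-multiplying so neither division nor float conversion is needed. */
static int cheaper_exact(vf_candidate a, vf_candidate b) {
    return (unsigned long long)a.cost * b.width <
           (unsigned long long)b.cost * a.width;
}

/* Per-lane costs 1.25 vs 1.75 both truncate to 1. */
static void check_truncation_example(void) {
    vf_candidate a = { 4, 5 }, b = { 4, 7 };
    assert(!cheaper_truncated(a, b)); /* 1 < 1 is false: tie after truncation */
    assert(cheaper_exact(a, b));      /* 5 * 4 = 20 < 7 * 4 = 28 */
}
```

Widening the products to unsigned long long keeps the cross-multiplication from overflowing for 32-bit costs and widths.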