llvm-project

Commit Graph

Author	SHA1	Message	Date
David Sherwood	3ce8c31eb8	[NFC] Add extra RUN line to strict reduction tests I have added RUN lines to both: Transforms/LoopVectorize/AArch64/strict-fadd.ll Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll to show the default behaviour is to not vectorise when the following flag is unset: -force-ordered-reductions	2021-08-10 14:48:38 +01:00
David Sherwood	8439415333	[IR] Let ConstantVector::getSplat use poison instead of undef This patch updates ConstantVector::getSplat to use poison instead of undef when using insertelement/shufflevector to splat. This follows on from D93793. Differential Revision: https://reviews.llvm.org/D107751	2021-08-10 08:27:43 +01:00
Sander de Smalen	3e47f009ff	[LV] Consider ExtractValue as uniform. Since all operands to ExtractValue must be loop-invariant when we deem the loop vectorizable, we can consider ExtractValue to be uniform. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D107286	2021-08-05 16:20:50 +01:00
Sander de Smalen	8d08a84745	[LV] Remove a change that was added in D106164. This change wasn't strictly necessary for D106164 and could be removed. This patch addresses the post-commit comments from @fhahn on D106164, and also changes sve-widen-gep.ll to use the same IR test as shown in pointer-induction.ll. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D106878	2021-08-05 14:44:53 +01:00
David Sherwood	0156f91f3b	[NFC] Rename enable-strict-reductions to force-ordered-reductions I'm renaming the flag because a future patch will add a new enableOrderedReductions() TTI interface and so the meaning of this flag will change to be one of forcing the target to enable/disable them. Also, since other places in LoopVectorize.cpp use the word 'Ordered' instead of 'strict' I changed the flag to match. Differential Revision: https://reviews.llvm.org/D107264	2021-08-03 09:33:01 +01:00
James Y Knight	3d272eea08	Fix test/Transforms/LoopVectorize/AArch64/strict-fadd-vf1.ll. It was writing to the source directory (which may not be writeable), rather than using %t. Fixes: `a5dd6c6cf9` ("[LoopVectorize] Don't interleave scalar ordered reductions for inner loops")	2021-07-27 18:32:29 -04:00
David Sherwood	a5dd6c6cf9	[LoopVectorize] Don't interleave scalar ordered reductions for inner loops Consider the following loop: void foo(float dst, float src, int N) { for (int i = 0; i < N; i++) { dst[i] = 0.0; for (int j = 0; j < N; j++) { dst[i] += src[(i * N) + j]; } } } When we are not building with -Ofast we may attempt to vectorise the inner loop using ordered reductions instead. In addition we also try to select an appropriate interleave count for the inner loop. However, when choosing a VF=1 the inner loop will be scalar and there is existing code in selectInterleaveCount that limits the interleave count to 2 for reductions due to concerns about increasing the critical path. For ordered reductions this problem is even worse due to the additional data dependency, and so I've added code to simply disable interleaving for scalar ordered reductions for now. Test added here: Transforms/LoopVectorize/AArch64/strict-fadd-vf1.ll Differential Revision: https://reviews.llvm.org/D106646	2021-07-27 17:41:01 +01:00
Sander de Smalen	d7dd12aee3	[LV] Disable Scalable VFs when tail folding is enabled b/c of low tripcount. The loop vectorizer may decide to use tail folding when the trip-count is low. When that happens, scalable VFs are no longer a candidate, since tail folding/predication is not yet supported for scalable vectors. This can be re-enabled in a future patch. Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D106657	2021-07-27 11:37:21 +01:00
Sander de Smalen	13ccb09725	[LV] Don't let ForceTargetInstructionCost override Invalid cost. Invalid costs can be used to avoid vectorization with a given VF, which is used for scalable vectors to avoid things that the code-generator cannot handle. If we override the cost using the -force-target-instruction-cost option of the LV, we would override this mechanism, rendering the flag useless. This change ensures the cost is only overriden when the original cost that was calculated is valid. That allows the flag to be used in combination with the -scalable-vectorization option. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D106677	2021-07-26 20:27:49 +01:00
Sander de Smalen	e745277012	[AArch64] NFC: Make some AArch64-SVE LoopVectorize tests generic. This change moves most of `sve-inductions.ll` to non-AArch64 specific LV tests using the `-target-supports-scalable-vectors` flag, because they're not explicitly AArch64-specific. One test builds on AArch64-specific knowledge regarding masked loads/stores, and remains in sve-inductions.ll.	2021-07-26 20:27:48 +01:00
Sander de Smalen	b9051ba848	[LV] Remove assert that VF cannot be scalable in setCostBasedWideningDecision. Scalarization for scalable vectors is not (yet) supported, so the LV discards a VF when scalarization is chosen as the widening decision. It should therefore not assert that the VF is not scalable when it computes the decision to scalarize. The code can get here when both the interleave-cost, gather/scatter cost and scalarization-cost are all illegal. This may e.g. happen for SVE when the VF=1, to avoid generating `<vscale x 1 x eltty>` types that the code-generator cannot yet handle. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D106656	2021-07-26 17:11:45 +01:00
Sander de Smalen	981e9dce54	[LV] Don't assume isScalarAfterVectorization if one of the uses needs widening. This fixes an issue that was found in D105199, where a GEP instruction is used both as the address of a store, as well as the value of a store. For the former, the value is scalar after vectorization, but the latter (as value) requires widening. Other code in that function seems to prevent similar cases from happening, but it seems this case was missed. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D106164	2021-07-26 16:01:55 +01:00
Florian Hahn	93664503be	[LV] Add test to store a first-order rec via interleave group. This is a reduced version of the reproducer from https://bugs.chromium.org/p/chromium/issues/detail?id=1232798#c2	2021-07-26 15:20:04 +01:00
David Sherwood	b2a5f0029f	Fix test failures caused by `0aff1798b5`	2021-07-26 11:40:26 +01:00
David Sherwood	0aff1798b5	[Analysis] Add simple cost model for strict (in-order) reductions I have added a new FastMathFlags parameter to getArithmeticReductionCost to indicate what type of reduction we are performing: 1. Tree-wise. This is the typical fast-math reduction that involves continually splitting a vector up into halves and adding each half together until we get a scalar result. This is the default behaviour for integers, whereas for floating point we only do this if reassociation is allowed. 2. Ordered. This now allows us to estimate the cost of performing a strict vector reduction by treating it as a series of scalar operations in lane order. This is the case when FP reassociation is not permitted. For scalable vectors this is more difficult because at compile time we do not know how many lanes there are, and so we use the worst case maximum vscale value. I have also fixed getTypeBasedIntrinsicInstrCost to pass in the FastMathFlags, which meant fixing up some X86 tests where we always assumed the vector.reduce.fadd/mul intrinsics were 'fast'. New tests have been added here: Analysis/CostModel/AArch64/reduce-fadd.ll Analysis/CostModel/AArch64/sve-intrinsics.ll Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll Differential Revision: https://reviews.llvm.org/D105432	2021-07-26 10:26:06 +01:00
Caroline Concatto	5a4de84d55	[LoopVectorize] Fix crash for predicated instruction with scalable VF This patch avoids computing discounts for predicated instructions when the VF is scalable. There is no support for vectorization of loops with division because the vectorizer cannot guarantee that zero divisions will not happen. This loop now does not use VF scalable ``` for (long long i = 0; i < n; i++) if (cond[i]) a[i] /= b[i]; ``` Differential Revision: https://reviews.llvm.org/D101916	2021-07-22 12:48:27 +01:00
Simon Pilgrim	1c9bec727a	[InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069) As noticed on D106352, after we've folded "(select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0))" if the inner Ptr was also a (now one use) gep we could then merge the geps, using the sum of the indices instead. I've limited this to basic 2-op geps - a more general case further down InstCombinerImpl.visitGetElementPtrInst doesn't have the one-use limitation but only creates the add if it can be created via SimplifyAddInst. https://alive2.llvm.org/ce/z/f8pLfD (Thanks Roman!) Differential Revision: https://reviews.llvm.org/D106450	2021-07-22 10:58:51 +01:00
Simon Pilgrim	ca9b60f9de	[LoopVectorize] Regenerate sve-vector-reverse.ll test checks	2021-07-21 15:14:04 +01:00
Kerry McLaughlin	49d73130ca	[LV] Avoid scalable vectorization for loops containing alloca This patch returns an Invalid cost from getInstructionCost() for alloca instructions if the VF is scalable, as otherwise loops which contain these instructions will crash when attempting to scalarize the alloca. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D105824	2021-07-16 11:47:13 +01:00
Sander de Smalen	239d01fa88	Reland "[LV] Print remark when loop cannot be vectorized due to invalid costs." The original patch was: https://reviews.llvm.org/D105806 There were some issues with undeterministic behaviour of the sorting function, which led to scalable-call.ll passing and/or failing. This patch fixes the issue by numbering all instructions in the array first, and using that number as the order, which should provide a consistent ordering. This reverts commit `a607f64118`.	2021-07-16 10:52:01 +01:00
Sander de Smalen	a607f64118	Revert "[LV] Print remark when loop cannot be vectorized due to invalid costs." This reverts commit `efaf3099c8`. This reverts commit `dc7bdc1e71`. Reverting patches due to buildbot failures.	2021-07-15 15:21:57 +01:00
Sander de Smalen	dc7bdc1e71	[LV] Fix determinism for failing scalable-call.ll test. The sort function for emitting an OptRemark was not deterministic, which caused scalable-call.ll to fail on some buildbots. This patch fixes that. This patch also fixes an issue where `Instruction::comesBefore()` is called when two Instructions are in different basic blocks, which would otherwise cause an assertion failure.	2021-07-15 13:16:59 +01:00
Sander de Smalen	efaf3099c8	[LV] Print remark when loop cannot be vectorized due to invalid costs. This patch emits remarks for instructions that have invalid costs for a given set of vectorization factors. Some example output: t.c:4:19: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): load dst[i] = sinf(src[i]); ^ t.c:4:14: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2, vscale x 4): call to llvm.sin.f32 dst[i] = sinf(src[i]); ^ t.c:4:12: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): store dst[i] = sinf(src[i]); ^ Reviewed By: fhahn, kmclaughlin Differential Revision: https://reviews.llvm.org/D105806	2021-07-14 17:11:33 +01:00
Sander de Smalen	eac1670739	[CostModel][AArch64] Make loads/stores of <vscale x 1 x eltty> invalid. At the moment, <vscale x 1 x eltty> are not yet fully handled by the code-generator, so to avoid vectorizing loops with that VF, we mark the cost for these types as invalid. The reason for not adding a new "TTI::getMinimumScalableVF" is because the type is supposed to be a type that can be legalized. It partially is, although the support for these types need some more work. Reviewed By: paulwalker-arm, dmgreen Differential Revision: https://reviews.llvm.org/D103882	2021-07-14 16:44:22 +01:00
Sander de Smalen	d2e4ccc790	[LV] Ignore candidate VFs with invalid costs. This follows on from discussion on the mailing-list: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151047.html to interpret an Invalid cost as 'infinitely expensive', as this simplifies some of the legalization issues with scalable vectors. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D105473	2021-07-12 09:58:22 +01:00
Kerry McLaughlin	a7512401e5	[LV] Prevent vectorization with unsupported element types. This patch adds a TTI function, isElementTypeLegalForScalableVector, to query whether it is possible to vectorize a given element type. This is called by isLegalToVectorizeInstTypesForScalable to reject scalable vectorization if any of the instruction types in the loop are unsupported, e.g: int foo(__int128_t* ptr, int N) #pragma clang loop vectorize_width(4, scalable) for (int i=0; i<N; ++i) ptr[i] = ptr[i] + 42; This example currently crashes if we attempt to vectorize since i128 is not a supported type for scalable vectorization. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D102253	2021-07-06 13:06:21 +01:00
Sjoerd Meijer	ee752134ac	[AArch64] Cost-model i8 vector loads/stores Loads of <4 x i8> vectors were modeled as extremely expensive. And while we don't have a load instruction that supports this, it isn't that expensive to create a vector of i8 elements. The codegen for this was fixed/optimised in D105110. This now tweaks the cost model and enables SLP vectorisation of my motivating case loadi8.ll. Differential Revision: https://reviews.llvm.org/D103629	2021-07-05 11:25:10 +01:00
David Sherwood	303b6d5e98	[LoopVectorize] Add support for scalable vectorization of invariant stores Previously in setCostBasedWideningDecision if we encountered an invariant store we just assumed that we could scalarize the store and called getUniformMemOpCost to get the associated cost. However, for scalable vectors this is not an option because it is not currently possibly to scalarize the store. At the moment we crash in VPReplicateRecipe::execute when trying to scalarize the store. Therefore, I have changed setCostBasedWideningDecision so that if we are storing a scalable vector out to a uniform address and the target supports scatter instructions, then we should use those instead. Tests have been added here: Transforms/LoopVectorize/AArch64/sve-inv-store.ll Differential Revision: https://reviews.llvm.org/D104624	2021-06-29 11:56:09 +01:00
Kerry McLaughlin	f99672568f	[LoopVectorize] Fix strict reductions where VF = 1 Currently we will allow loops with a fixed width VF of 1 to vectorize if the -enable-strict-reductions flag is set. However, the loop vectorizer will not use ordered reductions if `VF.isScalar()` and the resulting vectorized loop will be out of order. This patch removes `VF.isVector()` when checking if ordered reductions should be used. Also, instead of converting the FAdds to reductions if the VF = 1, operands of the FAdds are changed such that the order is preserved. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D104533	2021-06-28 11:27:10 +01:00
Kerry McLaughlin	5db52751a5	[CostModel] Return an invalid cost for memory ops with unsupported types Fixes getTypeConversion to return `TypeScalarizeScalableVector` when a scalable vector type cannot be legalized by widening/splitting. When this is the method of legalization found, getTypeLegalizationCost will return an Invalid cost. The getMemoryOpCost, getMaskedMemoryOpCost & getGatherScatterOpCost functions already call getTypeLegalizationCost and will now also return an Invalid cost for unsupported types. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D102515	2021-06-08 12:07:36 +01:00
Kerry McLaughlin	14eeccfe9a	[LoopVectorize] Don't use strict reductions when reordering is allowed If the `-enable-strict-reductions` flag is set to true, then currently we will always choose to vectorize the loop with strict in-order reductions. This is not necessary where we allow the reordering of FP operations, such as when loop hints are passed via metadata. This patch moves useOrderedReductions so that we can also check whether loop hints allow reordering, in which case we should use the default behaviour of vectorizing with unordered reductions. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D103814	2021-06-08 10:39:29 +01:00
Florian Hahn	23c2f2e6b2	[LV] Mark increment of main vector loop induction variable as NUW. This patch marks the induction increment of the main induction variable of the vector loop as NUW when not folding the tail. If the tail is not folded, we know that End - Start >= Step (either statically or through the minimum iteration checks). We also know that both Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV + %Step == %End. Hence we must exit the loop before %IV + %Step unsigned overflows and we can mark the induction increment as NUW. This should make SCEV return more precise bounds for the created vector loops, used by later optimizations, like late unrolling. At the moment quite a few tests still need to be updated, but before doing so I'd like to get initial feedback to make sure I am not missing anything. Note that this could probably be further improved by using information from the original IV. Attempt of modeling of the assumption in Alive2: https://alive2.llvm.org/ce/z/H_DL_g Part of a set of fixes required for PR50412. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D103255	2021-06-07 10:47:52 +01:00
Sander de Smalen	d41cb6bb26	[LV] Build and cost VPlans for scalable VFs. This patch uses the calculated maximum scalable VFs to build VPlans, cost them and select a suitable scalable VF. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D98722	2021-06-02 14:47:47 +01:00
Kerry McLaughlin	9f76a85260	[LoopVectorize] Enable strict reductions when allowReordering() returns false When loop hints are passed via metadata, the allowReordering function in LoopVectorizationLegality will allow the order of floating point operations to be changed: bool allowReordering() const { // When enabling loop hints are provided we allow the vectorizer to change // the order of operations that is given by the scalar loop. This is not // enabled by default because can be unsafe or inefficient. The -enable-strict-reductions flag introduced in D98435 will currently only vectorize reductions in-loop if hints are used, since canVectorizeFPMath() will return false if reordering is not allowed. This patch changes canVectorizeFPMath() to query whether it is safe to vectorize the loop with ordered reductions if no hints are used. For testing purposes, an additional flag (-hints-allow-reordering) has been added to disable the reordering behaviour described above. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D101836	2021-05-26 13:59:12 +01:00
Kerry McLaughlin	6b0fe3c63b	[NFC] Add CHECK lines for unordered FP reductions An additional RUN line has been added to both strict-fadd.ll & scalable-strict-fadd.ll to ensure the correct behaviour of these tests where `-enable-strict-reductions` is false. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D103015	2021-05-26 11:00:20 +01:00
serge-sans-paille	4ab3041acb	Revert "[NFC] remove explicit default value for strboolattr attribute in tests" This reverts commit `bda6e5bee0`. See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance	2021-05-24 19:43:40 +02:00
serge-sans-paille	bda6e5bee0	[NFC] remove explicit default value for strboolattr attribute in tests Since `d6de1e1a71`, no attributes is quivalent to setting attribute to false. This is a preliminary commit for https://reviews.llvm.org/D99080	2021-05-24 19:31:04 +02:00
Sander de Smalen	1e6630311c	NFC: cleaned up and renamed scalable-vf-analysis.ll -> scalable-vectorization.ll * Removes unnecessary loop hints. * Use RUN line with '-scalable-vectorization=preferred' instead of 'on' for the maximize-bandwidth behaviour. This prepares the test for enabling scalable vectorization; With a forced instruction-cost of 1, 'on' will always favour fixed-width VF to be chosen, whereas with 'preferred' we can check that the maximize-bandwidth option in combination with scalable-vectorization=preferred actually picks a scalable VF. * Renamed to scalable-vectorization.ll, because a follow-up patch will test more than just analysis.	2021-05-23 19:53:51 +01:00
Sander de Smalen	4f86aa650c	[LV] Add -scalable-vectorization=<option> flag. This patch adds a new option to the LoopVectorizer to control how scalable vectors can be used. Initially, this suggests three levels to control scalable vectorization, although other more aggressive options can be added in the future. The possible options are: - Disabled: Disables vectorization with scalable vectors. - Enabled: Vectorize loops using scalable vectors or fixed-width vectors, but favors fixed-width vectors when the cost is a tie. - Preferred: Like 'Enabled', but favoring scalable vectors when the cost-model is inconclusive. Reviewed By: paulwalker-arm, vkmr Differential Revision: https://reviews.llvm.org/D101945	2021-05-19 10:40:56 +01:00
Sander de Smalen	81fdc73e5d	[LV] Return both fixed and scalable Max VF from computeMaxVF. This patch introduces a new class, MaxVFCandidates, that holds the maximum vectorization factors that have been computed for both scalable and fixed-width vectors. This patch is intended to be NFC for fixed-width vectors, although considering a scalable max VF (which is disabled by default) pessimises tail-loop elimination, since it can no longer determine if any chosen VF (less than fixed/scalable MaxVFs) is guaranteed to handle all vector iterations if the trip-count is known. This issue will be addressed in a future patch. Reviewed By: fhahn, david-arm Differential Revision: https://reviews.llvm.org/D98721	2021-05-18 08:03:48 +01:00
David Sherwood	b7a11274f9	[LoopVectorize] Fix scalarisation crash in widenPHIInstruction for scalable vectors In InnerLoopVectorizer::widenPHIInstruction there are cases where we have to scalarise a pointer induction variable after vectorisation. For scalable vectors we already deal with the case where the pointer induction variable is uniform, but we currently crash if not uniform. For fixed width vectors we calculate every lane of the scalarised pointer induction variable for a given VF, however this cannot work for scalable vectors. In this case I have added support for caching the whole vector value for each unrolled part so that we can always extract an arbitrary element. Additionally, we still continue to cache the known minimum number of lanes too in order to improve code quality by avoiding an extractelement operation. I have adapted an existing test `pointer_iv_mixed` from the file: Transforms/LoopVectorize/consecutive-ptr-uniforms.ll and added it here for scalable vectors instead: Transforms/LoopVectorize/AArch64/sve-widen-phi.ll Differential Revision: https://reviews.llvm.org/D101294	2021-05-12 11:02:11 +01:00
Florian Hahn	93a9a8a8d9	[VecLib] Add support for vector fns from Darwin's libsystem. This patch adds support for Darwin's libsystem math vector functions to TLI. Darwin's libsystem provides a range of vector functions for libm functions. This initial patch only adds the 2 x double and 4 x float versions, which are available on both X86 and ARM64. On X86, wider vector versions are supported as well. Reviewed By: jroelofs Differential Revision: https://reviews.llvm.org/D101856	2021-05-10 21:19:58 +01:00
Kerry McLaughlin	8c9742bd23	[SVE][LoopVectorize] Add support for scalable vectorization of first-order recurrences Adds support for scalable vectorization of loops containing first-order recurrences, e.g: ``` for(int i = 0; i < n; i++) b[i] = a[i] + a[i - 1] ``` This patch changes fixFirstOrderRecurrence for scalable vectors to take vscale into account when inserting into and extracting from the last lane of a vector. CreateVectorSplice has been added to construct a vector for the recurrence, which returns a splice intrinsic for scalable types. For fixed-width the behaviour remains unchanged as CreateVectorSplice will return a shufflevector instead. The tests included here are the same as test/Transform/LoopVectorize/first-order-recurrence.ll Reviewed By: david-arm, fhahn Differential Revision: https://reviews.llvm.org/D101076	2021-05-06 11:35:39 +01:00
Sander de Smalen	9931ae645e	Reland "[LV] Calculate max feasible scalable VF." Relands https://reviews.llvm.org/D98509 This reverts commit `51d648c119`.	2021-05-04 15:44:41 +01:00
Sander de Smalen	51d648c119	Revert "[LV] Calculate max feasible scalable VF." Temporarily reverting this patch due to some unexpected issue found by one of the PPC buildbots. This reverts commit `584e9b6e4b`.	2021-04-29 16:04:37 +01:00
David Sherwood	00e65f3345	[LoopVectorize][SVE] Fix crash when vectorising FP negation This patch fixes a crash encountered when vectorising the following loop: void foo(float dst, float src, long long n) { for (long long i = 0; i < n; i++) dst[i] = -src[i]; } using scalable vectors. I've added a test to Transforms/LoopVectorize/AArch64/sve-basic-vec.ll as well as cleaned up the other tests in the same file. Differential Revision: https://reviews.llvm.org/D98054	2021-04-28 15:22:35 +01:00
David Sherwood	6998f8ae2d	[LoopVectorize] Simplify scalar cost calculation in getInstructionCost This patch simplifies the calculation of certain costs in getInstructionCost when isScalarAfterVectorization() returns a true value. There are a few places where we multiply a cost by a number N, i.e. unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1; return N * TTI.getArithmeticInstrCost(... After some investigation it seems that there are only these cases that occur in practice: 1. VF is a scalar, in which case N = 1. 2. VF is a vector. We can only get here if: a) the instruction is a GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction variable that remains scalar. I have changed the code so that N is assumed to always be 1. For GEPs the cost is always 0, since this is calculated later on as part of the load/store cost. PHI nodes are costed separately and were never previously multiplied by VF. For all other cases I have added an assert that none of the users needs scalarising, which didn't fire in any unit tests. Only one test required fixing and I believe the original cost for the scalar add instruction to have been wrong, since only one copy remains after vectorisation. I have also added a new test for the case when a pointer PHI feeds directly into a store that will be scalarised as we were previously never testing it. Differential Revision: https://reviews.llvm.org/D99718	2021-04-28 13:41:07 +01:00
Sander de Smalen	584e9b6e4b	[LV] Calculate max feasible scalable VF. This patch also refactors the way the feasible max VF is calculated, although this is NFC for fixed-width vectors. After this change scalable VF hints are no longer truncated/clamped to a shorter scalable VF, nor does it drop the 'scalable flag' from the suggested VF to vectorize with a similar VF that is fixed. Instead, the hint is ignored which means the vectorizer is free to find a more suitable VF, using the CostModel to determine the best possible VF. Reviewed By: c-rhodes, fhahn Differential Revision: https://reviews.llvm.org/D98509	2021-04-28 12:30:00 +01:00
Kerry McLaughlin	9cc217ab36	[LoopVectorize] Prevent multiple Phis being generated with in-order reductions When using the -enable-strict-reductions flag where UF>1 we generate multiple Phi nodes, though only one of these is used as an input to the vector.reduce.fadd intrinsics. The unused Phi nodes are removed later by instcombine. This patch changes widenPHIInstruction/fixReduction to only generate one Phi, and adds an additional test for unrolling to strict-fadd.ll Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D100570	2021-04-28 11:29:01 +01:00
David Sherwood	6968520c3b	Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost" This reverts commit `4afeda9157`.	2021-04-27 15:46:03 +01:00
David Sherwood	4afeda9157	[LoopVectorize] Simplify scalar cost calculation in getInstructionCost This patch simplifies the calculation of certain costs in getInstructionCost when isScalarAfterVectorization() returns a true value. There are a few places where we multiply a cost by a number N, i.e. unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1; return N * TTI.getArithmeticInstrCost(... After some investigation it seems that there are only these cases that occur in practice: 1. VF is a scalar, in which case N = 1. 2. VF is a vector. We can only get here if: a) the instruction is a GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction variable that remains scalar. I have changed the code so that N is assumed to always be 1. For GEPs the cost is always 0, since this is calculated later on as part of the load/store cost. PHI nodes are costed separately and were never previously multiplied by VF. For all other cases I have added an assert that none of the users needs scalarising, which didn't fire in any unit tests. Only one test required fixing and I believe the original cost for the scalar add instruction to have been wrong, since only one copy remains after vectorisation. I have also added a new test for the case when a pointer PHI feeds directly into a store that will be scalarised as we were previously never testing it. Differential Revision: https://reviews.llvm.org/D99718	2021-04-27 15:26:15 +01:00
David Sherwood	cf7276820c	[NFC] Add scalable vectorisation tests for int/FP <> int/FP conversions We can already vectorize loops that involve int<>int, fp<>fp, int<>fp and fp<>int conversions, however we didn't previously have any tests for them. This patch adds some tests for each conversion type. Differential Revision: https://reviews.llvm.org/D99951	2021-04-26 11:01:14 +01:00
David Sherwood	a458b7855e	[AArch64] Add AArch64TTIImpl::getMaskedMemoryOpCost function When vectorising for AArch64 targets if you specify the SVE attribute we automatically then treat masked loads and stores as legal. Also, since we have no cost model for masked memory ops we believe it's cheap to use the masked load/store intrinsics even for fixed width vectors. This can lead to poor code quality as the intrinsics will currently be scalarised in the backend. This patch adds a basic cost model that marks fixed-width masked memory ops as significantly more expensive than for scalable vectors. Tests for the cost model are added here: Transforms/LoopVectorize/AArch64/masked-op-cost.ll Differential Revision: https://reviews.llvm.org/D100745	2021-04-26 11:00:03 +01:00
Joe Ellis	2c551aedcf	[LoopVectorize] Fix bug where predicated loads/stores were dropped This commit fixes a bug where the loop vectoriser fails to predicate loads/stores when interleaving for targets that support masked loads and stores. Code such as: 1 void foo(int restrict data1, int restrict data2) 2 { 3 int counter = 1024; 4 while (counter--) 5 if (data1[counter] > data2[counter]) 6 data1[counter] = data2[counter]; 7 } ... could previously be transformed in such a way that the predicated store implied by: if (data1[counter] > data2[counter]) data1[counter] = data2[counter]; ... was lost, resulting in miscompiles. This bug was causing some tests in llvm-test-suite to fail when built for SVE. Differential Revision: https://reviews.llvm.org/D99569	2021-04-22 15:05:54 +00:00
Alexey Bataev	673e2f1b70	[COST][AARCH64] Improve cost of reverse shuffles for AArch64. Introduced the cost of thre reverse shuffles for AArch64, currently just copied the costs for PermuteSingleSrc. Differential Revision: https://reviews.llvm.org/D100871	2021-04-20 13:47:56 -07:00
Alexey Bataev	683dc41695	Update tests checks, NFC.	2021-04-20 10:20:15 -07:00
Kerry McLaughlin	62ee638a87	[NFC] Add tests for scalable vectorization of loops with in-order reductions D98435 added support for in-order reductions and included tests for fixed-width vectorization with the -enable-strict-reductions flag. This patch adds similar tests to verify support for scalable vectorization of loops with in-order reductions. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D100385	2021-04-19 11:15:55 +01:00
Kerry McLaughlin	93f54fae9d	[NFC] Remove the -instcombine flag from strict-fadd.ll This also fixes a CHECK line in @fadd_strict_unroll which ensures the changes made to fixReduction() to support in-order reductions with unrolling are being tested correctly.	2021-04-15 15:10:48 +01:00
David Sherwood	ea14df695e	[SVE][LoopVectorize] Fix crash in InnerLoopVectorizer::widenPHIInstruction There were a few places in widenPHIInstruction where calculations of offsets were failing to take the runtime calculation of VF into account for scalable vectors. I've fixed those cases in this patch as well as adding an assert that we should not be scalarising for scalable vectors. Tests are added here: Transforms/LoopVectorize/AArch64/sve-widen-phi.ll Differential Revision: https://reviews.llvm.org/D99254	2021-04-15 10:51:49 +01:00
Sander de Smalen	672f673004	[SVE] Remove checks for warnings in scalable-vector tests. After D98856 these tests will by default break (fatal_error) if any of the wrong interfaces are used, so there's no longer a need to have a RUN line that checks for a warning message emitted by the compiler.	2021-04-07 15:59:32 +01:00
Kerry McLaughlin	7344f3d39a	[LoopVectorize] Add strict in-order reduction support for fixed-width vectorization Previously we could only vectorize FP reductions if fast math was enabled, as this allows us to reorder FP operations. However, it may still be beneficial to vectorize the loop by moving the reduction inside the vectorized loop and making sure that the scalar reduction value be an input to the horizontal reduction, e.g: %phi = phi float [ 0.0, %entry ], [ %reduction, %vector_body ] %load = load <8 x float> %reduction = call float @llvm.vector.reduce.fadd.v8f32(float %phi, <8 x float> %load) This patch adds a new flag (IsOrdered) to RecurrenceDescriptor and makes use of the changes added by D75069 as much as possible, which already teaches the vectorizer about in-loop reductions. For now in-order reduction support is off by default and controlled with the `-enable-strict-reductions` flag. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D98435	2021-04-06 14:45:34 +01:00
David Sherwood	e3a13304fc	[NFC] Add tests for scalable vectorization of loops with large stride acesses This patch just adds tests that we can vectorize loop such as these: for (i = 0; i < n; i++) dst[i * 7] += 1; and for (i = 0; i < n; i++) if (cond[i]) dst[i * 7] += 1; using scalable vectors, where we expect to use gathers and scatters in the vectorized loop. The vector of pointers used for the gather is identical to those used for the scatter so there should be no memory dependences. Tests are added here: Transforms/LoopVectorize/AArch64/sve-large-strides.ll Differential Revision: https://reviews.llvm.org/D99192	2021-04-01 10:25:06 +01:00
Sander de Smalen	7108b2dec1	[SVE] Fix LoopVectorizer test scalalable-call.ll This marks FSIN and other operations to EXPAND for scalable vectors, so that they are not assumed to be legal by the cost-model. Depends on D97470 Reviewed By: dmgreen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D97471	2021-03-31 14:52:49 +01:00
David Sherwood	a08c7736a7	[LoopVectorize] Add support for scalable vectorization of induction variables This patch adds support for the vectorization of induction variables when using scalable vectors, which required the following changes: 1. Removed assert from InnerLoopVectorizer::getStepVector. 2. Modified InnerLoopVectorizer::createVectorIntOrFpInductionPHI to use a runtime determined value for VF and removed an assert. 3. Modified InnerLoopVectorizer::buildScalarSteps to work for scalable vectors. I did this by calculating the full vector value for each Part of the unroll factor (UF) and caching this in the VP state. This means that we are always able to extract an arbitrary element from the vector if necessary. In addition to this, I also permitted the caching of the individual lane values themselves for the known minimum number of elements in the same way we do for fixed width vectors. This is a further optimisation that improves the code quality since it avoids unnecessary extractelement operations when extracting the first lane. 4. Added an assert to InnerLoopVectorizer::widenPHIInstruction, since while testing some code paths I noticed this is currently broken for scalable vectors. Various tests to support different cases have been added here: Transforms/LoopVectorize/AArch64/sve-inductions.ll Differential Revision: https://reviews.llvm.org/D98715	2021-03-30 11:13:31 +01:00
David Sherwood	c39460cc4f	Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost" This reverts commit `240aa96cf2`.	2021-03-26 11:36:53 +00:00
David Sherwood	240aa96cf2	[LoopVectorize] Simplify scalar cost calculation in getInstructionCost This patch simplifies the calculation of certain costs in getInstructionCost when isScalarAfterVectorization() returns a true value. There are a few places where we multiply a cost by a number N, i.e. unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1; return N * TTI.getArithmeticInstrCost(... After some investigation it seems that there are only these cases that occur in practice: 1. VF is a scalar, in which case N = 1. 2. VF is a vector. We can only get here if: a) the instruction is a GEP/bitcast with scalar uses, or b) this is an update to an induction variable that remains scalar. I have changed the code so that N is assumed to always be 1. For GEPs the cost is always 0, since this is calculated later on as part of the load/store cost. For all other cases I have added an assert that none of the users needs scalarising, which didn't fire in any unit tests. Only one test required fixing and I believe the original cost for the scalar add instruction to have been wrong, since only one copy remains after vectorisation. Differential Revision: https://reviews.llvm.org/D98512	2021-03-26 11:27:12 +00:00
Kerry McLaughlin	1f46499690	[SVE][LoopVectorize] Verify support for vectorizing loops with invariant loads D95598 added a cost model for broadcast shuffle, which should enable loops such as the following to vectorize, where the load of b[42] is invariant and can be done using a scalar load + splat: for (int i=0; i<n; ++i) a[i] = b[i] + b[42]; This patch adds tests to verify that we can vectorize such loops. Reviewed By: joechrisellis Differential Revision: https://reviews.llvm.org/D98506	2021-03-25 14:10:21 +00:00
Caroline Concatto	3c03635d53	[SVE][LoopVectorize] Add support for scalable vectorization of loops with vector reverse This patch adds support for reverse loop vectorization. It is possible to vectorize the following loop: ``` for (int i = n-1; i >= 0; --i) a[i] = b[i] + 1.0; ``` with fixed or scalable vector. The loop-vectorizer will use 'reverse' on the loads/stores to make sure the lanes themselves are also handled in the right order. This patch adds support for scalable vector on IRBuilder interface to create a reverse vector. The IR function CreateVectorReverse lowers to experimental.vector.reverse for scalable vector and keedp the original behavior for fixed vector using shuffle reverse. Differential Revision: https://reviews.llvm.org/D95363	2021-03-16 07:51:59 +00:00
David Sherwood	de3185647d	[LoopVectorize][SVE] Add tests for vectorising conditional loads of invariant addresses For loops of the form: void foo(int a, int cond, short inv, long long n) { for (long long i=0; i<n; ++i) { if (cond[i]) a[i] = inv; } } we can vectorise for SVE using masked gather loads where the array of pointers is simply a vector splat of 'inv' and the mask comes from the condition 'cond[i] != 0'. This patch simply adds tests upstream to defend this capability. Differential Revision: https://reviews.llvm.org/D98043	2021-03-08 08:38:31 +00:00
David Sherwood	fec0a0adac	[SVE][LoopVectorize] Add support for extracting the last lane of a scalable vector There are certain loops like this below: for (int i = 0; i < n; i++) { a[i] = b[i] + 1; *inv = a[i]; } that can only be vectorised if we are able to extract the last lane of the vectorised form of 'a[i]'. For fixed width vectors this already works since we know at compile time what the final lane is, however for scalable vectors this is a different story. This patch adds support for extracting the last lane from a scalable vector using a runtime determined lane value. I have added support to VPIteration for runtime-determined lanes that still permit the caching of values. I did this by introducing a new class called VPLane, which describes the lane we're dealing with and provides interfaces to get both the compile-time known lane and the runtime determined value. Whilst doing this work I couldn't find any explicit tests for extracting the last lane values of fixed width vectors so I added tests for both scalable and fixed width vectors. Differential Revision: https://reviews.llvm.org/D95139	2021-03-05 09:57:56 +00:00
Florian Hahn	0cb9d8acbc	[LV] Add test cases that require a larger number of RT checks. Precommit tests cases for D75981.	2021-03-02 10:49:38 +00:00
David Green	dd2dbf7ee2	[TTI] Change getOperandsScalarizationOverhead to take Type args As a followup to D95291, getOperandsScalarizationOverhead was still using a VF as a vector factor if the arguments were scalar, and would assert on certain matrix intrinsics with differently sized vector arguments. This patch removes the VF arg, instead passing the Types through directly. This should allow it to more accurately compute the cost without having to guess at which operands will be vectorized, something difficult with more complex intrinsics. This adjusts one SVE test as it is now calling the wrong intrinsic vs veccall. Without invalid InstructCosts the cost of the scalarized intrinsic is too low. This should get fixed when the cost of scalarization is accounted for with scalable types. Differential Revision: https://reviews.llvm.org/D96287	2021-02-23 13:04:59 +00:00
David Green	bd4b61efbd	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying work with those two values separately this removes the VF parameter, widening the RetTy/ArgTys by VF used called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. Most backends look like this will be an improvement (or were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-23 13:03:26 +00:00
Florian Hahn	15a74b64df	[VPlan] Manage pairs of incoming (VPValue, VPBB) in VPWidenPHIRecipe. This patch extends VPWidenPHIRecipe to manage pairs of incoming (VPValue, VPBasicBlock) in the VPlan native path. This is made possible because we now directly manage defined VPValues for recipes. By keeping both the incoming value and block in the recipe directly, code-generation in the VPlan native path becomes independent of the predecessor ordering when fixing up non-induction phis, which currently can cause crashes in the VPlan native path. This fixes PR45958. Reviewed By: sguggill Differential Revision: https://reviews.llvm.org/D96773	2021-02-22 09:44:25 +00:00
Kerry McLaughlin	ba1e150d03	[SVE] Add support for scalable vectorization of loops with int/fast FP reductions This patch enables scalable vectorization of loops with integer/fast reductions, e.g: ``` unsigned sum = 0; for (int i = 0; i < n; ++i) { sum += a[i]; } ``` A new TTI interface, isLegalToVectorizeReduction, has been added to prevent reductions which are not supported for scalable types from vectorizing. If the reduction is not supported for a given scalable VF, computeFeasibleMaxVF will fall back to using fixed-width vectorization. Reviewed By: david-arm, fhahn, dmgreen Differential Revision: https://reviews.llvm.org/D95245	2021-02-16 13:50:06 +00:00
Kerry McLaughlin	fea06efe7c	[SVE][LoopVectorize] Support for vectorization of loops with function calls Changes `getScalarizationOverhead` to return an invalid cost for scalable VFs and adds some simple tests for loops containing a function for which there is a vectorized variant available. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D96356	2021-02-12 13:47:43 +00:00
Sanjay Patel	79b1b4a581	[Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics The vector reduction intrinsics started life as experimental ops, so backend support was lacking. As part of promoting them to 1st-class intrinsics, however, codegen support was added/improved: D58015 D90247 So I think it is safe to now remove this complication from IR. Note that we still have an IR-level codegen expansion pass for these as discussed in D95690. Removing that is another step in simplifying the logic. Also note that x86 was already unconditionally forming reductions in IR, so there should be no difference for x86. I spot checked a couple of the tests here by running them through opt+llc and did not see any asm diffs. If we do find functional differences for other targets, it should be possible to (at least temporarily) restore the shuffle IR with the ExpandReductions IR pass. Differential Revision: https://reviews.llvm.org/D96552	2021-02-12 08:13:50 -05:00
Florian Hahn	d5387ec267	[LV] Add tests showing suboptimal vectorization for narrow types. This patch adds additional test cases showing missing/sub-optimal vectorization for loops which contain small and wider memory ops on AArch64.	2021-02-11 17:24:28 +00:00
Jinsong Ji	9202806241	Revert "[CostModel] Remove VF from IntrinsicCostAttributes" This reverts commit `502a67dd7f`. This expose a failure in test-suite build on PowerPC, revert to unblock buildbot first, Dave will re-commit in https://reviews.llvm.org/D96287. Thanks Dave.	2021-02-09 02:14:14 +00:00
David Green	502a67dd7f	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying work with those two values separately this removes the VF parameter, widening the RetTy/ArgTys by VF used called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. Most backends look like this will be an improvement (or were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-05 09:34:24 +00:00
David Sherwood	d4d4ceeb8f	[SVE][LoopVectorize] Add masked load/store and gather/scatter support for SVE This patch updates IRBuilder::CreateMaskedGather/Scatter to work with ScalableVectorType and adds isLegalMaskedGather/Scatter functions to AArch64TargetTransformInfo. In addition I've fixed up isLegalMaskedLoad/Store to return true for supported scalar types, since this is what the vectorizer asks for. In LoopVectorize.cpp I've changed LoopVectorizationCostModel::getInterleaveGroupCost to return an invalid cost for scalable vectors, since currently this relies upon using shuffle vector for reversing vectors. In addition, in LoopVectorizationCostModel::setCostBasedWideningDecision I have assumed that the cost of scalarising memory ops is infinitely expensive. I have added some simple masked load/store and gather/scatter tests, including cases where we use gathers and scatters for conditional invariant loads and stores. Differential Revision: https://reviews.llvm.org/D95350	2021-02-02 09:52:39 +00:00
David Green	4cc94b7313	[CostModel] Tests for showing the cost of intrinsics from the vectorizer. NFC	2021-01-24 14:47:15 +00:00
David Sherwood	2e080eb00a	[SVE] Add support for scalable vectorization of loops with selects and cmps I have removed an unnecessary assert in LoopVectorizationCostModel::getInstructionCost that prevented a cost being calculated for select instructions when using scalable vectors. In addition, I have changed AArch64TTIImpl::getCmpSelInstrCost to only do special cost calculations for fixed width vectors and fall back to the base version for scalable vectors. I have added a simple cost model test for cmps and selects: test/Analysis/CostModel/sve-cmpsel.ll and some simple tests that show we vectorize loops with cmp and select: test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll Differential Revision: https://reviews.llvm.org/D95039	2021-01-22 09:48:13 +00:00
Cullen Rhodes	1e7efd397a	[LV] Legalize scalable VF hints In the following loop: void foo(int a, int b, int N) { for (int i=0; i<N; ++i) a[i + 4] = a[i] + b[i]; } The loop dependence constrains the VF to a maximum of (4, fixed), which would mean using <4 x i32> as the vector type in vectorization. Extending this to scalable vectorization, a VF of (4, scalable) implies a vector type of <vscale x 4 x i32>. To determine if this is legal vscale must be taken into account. For this example, unless max(vscale)=1, it's unsafe to vectorize. For SVE, the number of bits in an SVE register is architecturally defined to be a multiple of 128 bits with a maximum of 2048 bits, thus the maximum vscale is 16. In the loop above it is therefore unfeasible to vectorize with SVE. However, in this loop: void foo(int a, int b, int N) { #pragma clang loop vectorize_width(X, scalable) for (int i=0; i<N; ++i) a[i + 32] = a[i] + b[i]; } As long as max(vscale) multiplied by the number of lanes 'X' doesn't exceed the dependence distance, it is safe to vectorize. For SVE a VF of (2, scalable) is within this constraint, since a vector of <16 x 2 x 32> will have no dependencies between lanes. For any number of lanes larger than this it would be unsafe to vectorize. This patch extends 'computeFeasibleMaxVF' to legalize scalable VFs specified as loop hints, implementing the following behaviour: * If the backend does not support scalable vectors, ignore the hint. * If scalable vectorization is unfeasible given the loop dependence, like in the first example above for SVE, then use a fixed VF. * Accept scalable VFs if it's safe to do so. * Otherwise, clamp scalable VFs that exceed the maximum safe VF. Reviewed By: sdesmalen, fhahn, david-arm Differential Revision: https://reviews.llvm.org/D91718	2021-01-08 10:49:44 +00:00
Peter Waller	3e357ecd44	[llvm][NFC] Disallow all warnings in TypeSize tests This is a follow-up to a request from a reviewer [0]. The text may change in the future and these tests should not produce any warning output. [0] https://reviews.llvm.org/D91806#inline-879243 Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D94161	2021-01-06 17:17:07 +00:00
Juneyoung Lee	4a8e6ed2f7	[SLP,LV] Use poison constant vector for shufflevector/initial insertelement This patch makes SLP and LV emit operations with initial vectors set to poison constant instead of undef. This is a part of efforts for using poison vector instead of undef to represent "doesn't care" vector. The goal is to make nice shufflevector optimizations valid that is currently incorrect due to the tricky interaction between undef and poison (see https://bugs.llvm.org/show_bug.cgi?id=44185 ). Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94061	2021-01-06 11:22:50 +09:00
Juneyoung Lee	278aa65cc4	[IR] Let IRBuilder's CreateVectorSplat/CreateShuffleVector use poison as placeholder This patch updates IRBuilder to create insertelement/shufflevector using poison as a placeholder. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93793	2020-12-30 04:21:04 +09:00
Roman Lebedev	49dac4aca0	[SimplifyCFG] MergeBlockIntoPredecessor() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater it into it. Fixes DomTree preservation for a large number of tests, all of which are marked as such so that they do not regress.	2020-12-17 01:03:49 +03:00
Cullen Rhodes	cba4accda0	[LV] Clamp VF hint when unsafe In the following loop the dependence distance is 2 and can only be vectorized if the vector length is no larger than this. void foo(int a, int b, int N) { #pragma clang loop vectorize(enable) vectorize_width(4) for (int i=0; i<N; ++i) { a[i + 2] = a[i] + b[i]; } } However, when specifying a VF of 4 via a loop hint this loop is vectorized. According to [1][2], loop hints are ignored if the optimization is not safe to apply. This patch introduces a check to bail of vectorization if the user specified VF is greater than the maximum feasible VF, unless explicitly forced with '-force-vector-width=X'. [1] https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave [2] https://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations Reviewed By: sdesmalen, fhahn, Meinersbur Differential Revision: https://reviews.llvm.org/D90687	2020-12-01 11:30:34 +00:00
Sjoerd Meijer	f44ba25135	ExtractValue instruction costs Instruction ExtractValue wasn't handled in LoopVectorizationCostModel::getInstructionCost(). As a result, it was modeled as a mul which is not really accurate. Since it is free (most of the times), this now gets a cost of 0 using getInstructionCost. This is a follow-up of D92208, that required changing this regression test. In a follow up I will look at InsertValue which also isn't handled yet. Differential Revision: https://reviews.llvm.org/D92317	2020-12-01 10:42:23 +00:00
Sjoerd Meijer	5110ff0817	[AArch64][CostModel] Fix cost for mul <2 x i64> This was modeled to have a cost of 1, but since we do not have a MUL.2d this is scalarized into vector inserts/extracts and scalar muls. Motivating precommitted test is test/Transforms/SLPVectorizer/AArch64/mul.ll, which we don't want to SLP vectorize. Test Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll unfortunately needed changing, but the reason is documented in LoopVectorize.cpp:6855: // The cost of executing VF copies of the scalar instruction. This opcode // is unknown. Assume that it is the same as 'mul'. which I will address next as a follow up of this. Differential Revision: https://reviews.llvm.org/D92208	2020-11-30 11:36:55 +00:00
Joe Ellis	462dd4f803	[SVE][AArch64] Improve specificity of vectorization legality TypeSize test The test was using -O2, where -loop-vectorize will suffice. Reviewed By: fpetrogalli Differential Revision: https://reviews.llvm.org/D90685	2020-11-10 10:55:25 +00:00
Florian Hahn	f067bc3c0a	[LoopRotation] Allow loop header duplication if vectorization is forced. -Oz normally does not allow loop header duplication so this loop wouldn't be vectorized. However the vectorization pragma should override this and allow for loop rotation. rdar://problem/49281061 Original patch by Adam Nemet. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D59832	2020-10-27 09:28:01 +00:00
Joe Ellis	467e5cf40f	[SVE][AArch64] Fix TypeSize warning in loop vectorization legality The warning would fire when calling isDereferenceableAndAlignedInLoop with a scalable load. Calling isDereferenceableAndAlignedInLoop with a scalable load would result in the use of the now deprecated implicit cast of TypeSize to uint64_t through the overloaded operator. This patch fixes this issue by: - no longer considering vector loads as candidates in canVectorizeWithIfConvert. This doesn't make sense in the context of identifying scalar loads to vectorize. - making use of getFixedSize inside isDereferenceableAndAlignedInLoop -- this removes the dependency on the deprecated interface, and will trigger an assertion error if the function is ever called with a scalable type. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D89798	2020-10-26 17:40:04 +00:00
Florian Hahn	1747aae9fc	[LV] Add cost-model test for AArch64 select costs. Currently, the cost of some compare/select patterns is overestimated on AArch64.	2020-10-26 13:43:31 +00:00
David Green	6d8eea61b1	[AArch64][LV] Move vectorizer test to Transforms/LoopVectorize/AArch64. NFC	2020-10-10 10:15:43 +01:00
Amara Emerson	322d0afd87	[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics. This change renames the intrinsics to not have "experimental" in the name. The autoupgrader will handle legacy intrinsics. Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html Differential Revision: https://reviews.llvm.org/D88787	2020-10-07 10:36:44 -07:00
Florian Hahn	d4ddf63fc4	[SCEV] Use loop guard info when computing the max BE taken count in howFarToZero. For some expressions, we can use information from loop guards when we are looking for a maximum. This patch applies information from loop guards to the expression used to compute the maximum backedge taken count in howFarToZero. It currently replaces an unknown expression X with UMin(X, Y), if the loop is guarded by X ult Y. This patch is minimal in what conditions it applies, and there are a few TODOs to generalize. This partly addresses PR40961. We will also need an update to LV to address it completely. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D67178	2020-09-24 11:06:55 +01:00
Florian Hahn	98db27711d	[LV] Do not check widening decision for instrs outside of loop. No widening decisions will be computed for instructions outside the loop. Do not try to get a widening decision. The load/store will be just a scalar load, so treating at as normal should be fine I think. Fixes PR46950. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D85087	2020-08-03 10:09:24 +01:00
Florian Hahn	1ccc49924a	[AArch64] Add getCFInstrCost, treat branches as free for throughput. D79164/2596da31740f changed getCFInstrCost to return 1 per default. AArch64 did not have its own implementation, hence the throughput cost of CFI instructions is overestimated. On most cores, most branches should be predicated and essentially free throughput wise. This restores a 9% performance regression on a SPEC2006 benchmark on AArch64 with -O3 LTO & PGO. This patch effectively restores pre `2596da3174` behavior for AArch64 and undoes the AArch64 test changes of the patch. Reviewers: samparker, dmgreen, anemet Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D82755	2020-06-30 20:34:04 +01:00

1 2 3 4 5 ...

264 Commits