llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Devereau	f526c600c0	[AArch64][SVE] Instcombine SVE LD1/ST1 to stock LLVM IR InstCombine AArch64 LD1/ST1 to llvm.masked.load/llvm.masked.store and LD1/ST1 to load/store when a ptrue all predicate pattern operand is present. This allows existing IR optimizations such as dead-load removal to occur. Differential Revision: https://reviews.llvm.org/D113489	2021-11-16 11:10:23 +00:00
Matt	4a59694ba1	[AArch64][SVE] Combine FADD and FMUL aarch64 intrinsics to FMLA This is a refinement to the work in https://reviews.llvm.org/D111638 Fold (fadd p a (fmul p b c)) into (fma p a b c) Differential Revision: https://reviews.llvm.org/D113095	2021-11-08 12:22:38 +00:00
Peter Waller	7a34145f40	Reland "[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store" This reverts commit `753eba6421`. Contiguous gather => masked load: (sve.ld1.gather.index Mask BasePtr (sve.index IndexBase 1)) => (masked.load (gep BasePtr IndexBase) Align Mask undef) Contiguous scatter => masked store: (sve.ld1.scatter.index Value Mask BasePtr (sve.index IndexBase 1)) => (masked.store Value (gep BasePtr IndexBase) Align Mask) Tests with <vscale x 2 x double>: [Gather, Scatter] for each [Positive test (index=1), Negative test (index=2), Alignment propagation]. Differential Revision: https://reviews.llvm.org/D112076	2021-11-03 13:42:14 +00:00
Peter Waller	753eba6421	Revert "[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store" This reverts commit `1febf42f03`, which has a use-of-uninitialized-memory bug. See: https://reviews.llvm.org/D112076	2021-11-03 13:39:38 +00:00
Peter Waller	1febf42f03	[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store Contiguous gather => masked load: (sve.ld1.gather.index Mask BasePtr (sve.index IndexBase 1)) => (masked.load (gep BasePtr IndexBase) Align Mask undef) Contiguous scatter => masked store: (sve.ld1.scatter.index Value Mask BasePtr (sve.index IndexBase 1)) => (masked.store Value (gep BasePtr IndexBase) Align Mask) Tests with <vscale x 2 x double>: [Gather, Scatter] for each [Positive test (index=1), Negative test (index=2), Alignment propagation]. Differential Revision: https://reviews.llvm.org/D112076	2021-11-03 11:02:44 +00:00
Matt	895145aacb	Revert "[AArch64][SVE] Combine predicated FMUL/FADD into FMA" This reverts commit `fc28a2f8ce`.	2021-11-02 14:56:01 +00:00
Bradley Smith	13faa5f440	[AArch64][SVE] Generate SVE >1 element structured load/stores from fixed types This adds support for SVE structured loads/stores to the relevant target hooks, such that we can support these instructions in the InterleavedAccess pass. Depends on D112078 Differential Revision: https://reviews.llvm.org/D112303	2021-10-29 09:35:57 +00:00
Matt	fc28a2f8ce	[AArch64][SVE] Combine predicated FMUL/FADD into FMA Combine FADD and FMUL intrinsics into FMA when the result of the FMUL is an FADD operand with one only use and both use the same predicate. Differential Revision: https://reviews.llvm.org/D111638	2021-10-27 11:41:23 +00:00
David Sherwood	9448cdc900	[SVE][Analysis] Tune the cost model according to the tune-cpu attribute This patch introduces a new function: AArch64Subtarget::getVScaleForTuning that returns a value for vscale that can be used for tuning the cost model when using scalable vectors. The VScaleForTuning option in AArch64Subtarget is initialised according to the following rules: 1. If the user has specified the CPU to tune for we use that, else 2. If the target CPU was specified we use that, else 3. The tuning is set to "generic". For CPUs of type "generic" I have assumed that vscale=2. New tests added here: Analysis/CostModel/AArch64/sve-gather.ll Analysis/CostModel/AArch64/sve-scatter.ll Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll Differential Revision: https://reviews.llvm.org/D110259	2021-10-21 09:33:50 +01:00
David Sherwood	26b7d9d622	[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns This patch adds further support for vectorisation of loops that involve selecting an integer value based on a previous comparison. Consider the following C++ loop: int r = a; for (int i = 0; i < n; i++) { if (src[i] > 3) { r = b; } src[i] += 2; } We should be able to vectorise this loop because all we are doing is selecting between two states - 'a' and 'b' - both of which are loop invariant. This just involves building a vector of values that contain either 'a' or 'b', where the final reduced value will be 'b' if any lane contains 'b'. The IR generated by clang typically looks like this: %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ] ... %pred = icmp ugt i32 %val, i32 3 %phi.update = select i1 %pred, i32 %b, i32 %phi We already detect min/max patterns, which also involve a select + cmp. However, with the min/max patterns we are selecting loaded values (and hence loop variant) in the loop. In addition we only support certain cmp predicates. This patch adds a new pattern matching function (isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp. We only support selecting values that are integer and loop invariant, however we can support any kind of compare - integer or float. Tests have been added here: Transforms/LoopVectorize/AArch64/sve-select-cmp.ll Transforms/LoopVectorize/select-cmp-predicated.ll Transforms/LoopVectorize/select-cmp.ll Differential Revision: https://reviews.llvm.org/D108136	2021-10-11 09:41:38 +01:00
Matthew Devereau	2ac1999937	[AArch64][SVE] Propagate math flags from intrinsics to instructions Retain floating-point math flags inside instCombineSVEVectorBinOp	2021-10-05 15:39:13 +01:00
Kazu Hirata	c1e32b3fc0	[Target] Migrate from getNumArgOperands to arg_size (NFC) Note that getNumArgOperands is considered a legacy name. See llvm/include/llvm/IR/InstrTypes.h for details.	2021-10-02 12:06:29 -07:00
Matthew Devereau	f085a9db8b	[AArch64][SVE] Replace fmul, fadd and fsub LLVM IR instrinsics with LLVM IR binary ops Replacing fmul and fadd instrinsics with their binary ops results more succinct AArch64 SVE output, e.g.: 4: 65428041 fmul z1.h, p0/m, z1.h, z2.h 8: 65408020 fadd z0.h, p0/m, z0.h, z1.h -> 4: 65620020 fmla z0.h, p0/m, z1.h, z2.h	2021-10-01 11:24:46 +01:00
Krasimir Georgiev	685f1bfd0a	Revert "[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns" It appears to cause stage2 clang build failures, e.g., https://lab.llvm.org/buildbot/#/builders/74/builds/7145. This reverts commit `1fb37334bd`.	2021-10-01 11:39:43 +02:00
David Sherwood	1fb37334bd	[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns This patch adds further support for vectorisation of loops that involve selecting an integer value based on a previous comparison. Consider the following C++ loop: int r = a; for (int i = 0; i < n; i++) { if (src[i] > 3) { r = b; } src[i] += 2; } We should be able to vectorise this loop because all we are doing is selecting between two states - 'a' and 'b' - both of which are loop invariant. This just involves building a vector of values that contain either 'a' or 'b', where the final reduced value will be 'b' if any lane contains 'b'. The IR generated by clang typically looks like this: %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ] ... %pred = icmp ugt i32 %val, i32 3 %phi.update = select i1 %pred, i32 %b, i32 %phi We already detect min/max patterns, which also involve a select + cmp. However, with the min/max patterns we are selecting loaded values (and hence loop variant) in the loop. In addition we only support certain cmp predicates. This patch adds a new pattern matching function (isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp. We only support selecting values that are integer and loop invariant, however we can support any kind of compare - integer or float. Tests have been added here: Transforms/LoopVectorize/AArch64/sve-select-cmp.ll Transforms/LoopVectorize/select-cmp-predicated.ll Transforms/LoopVectorize/select-cmp.ll Differential Revision: https://reviews.llvm.org/D108136	2021-10-01 08:41:03 +01:00
Simon Pilgrim	676f2809b5	[CostModel][AArch64] Don't dereference CostTblEntry before null check. Fix static analysis warning that we check for null Entry after dereferencing it. I don't think this can actually happen as i8/i16 should legalize to use the i32 path which should return a cost - but I'd rather play it safe that rely on an implicit type legalization.	2021-09-29 16:35:29 +01:00
Usman Nadeem	3b12282b0e	[AArch64][SVE][InstCombine] Eliminate redundant chains of tuple get/set Differential Revision: https://reviews.llvm.org/D109667 Change-Id: I06a3c28e3658ecda109a3a1b73265828274ab2ea	2021-09-22 20:59:46 -07:00
David Spickett	92c9b28347	Revert "[AArch64][SVE] Teach cost model that masked loads/stores are cheap" This reverts commit `734708e04f`. Due to build failures on the 2 stage SVE VLS bot. https://lab.llvm.org/buildbot/#/builders/176/builds/908/steps/11/logs/stdio	2021-09-20 08:45:18 +00:00
Usman Nadeem	757384abff	[AArch64][SVE][InstCombine] Fold redundant zip1/2(uzp1/2) operations zip1(uzp1(A, B), uzp2(A, B)) --> A zip2(uzp1(A, B), uzp2(A, B)) --> B Differential Revision: https://reviews.llvm.org/D109666 Change-Id: I4a6578db2fcef9ff71ad0e77b9fe08354e6dbfcd	2021-09-17 15:24:46 -07:00
Usman Nadeem	ab111e982f	Revert "Revert "[AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation"" This reverts commit `eee7d225de`. Effectively relanding `98c37247d8` after fixing the failing tests. Change-Id: I5d7461aeb820a2d5f1895457d824a8de4d316ee5	2021-09-10 18:11:24 -07:00
Usman Nadeem	eee7d225de	Revert "[AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation" This reverts commit `98c37247d8`.	2021-09-10 13:01:48 -07:00
Usman Nadeem	98c37247d8	[AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation Differential Revision: https://reviews.llvm.org/D109118 Change-Id: I47adc1984a54bea02bf5a0a767b765afe7e16aa3	2021-09-10 12:52:14 -07:00
David Sherwood	d581d94385	[SVE] Fix the FP arithmetic instruction costs for SVE Several FP instructions (fadd, fsub, etc.) were incorrectly assigned a higher cost for SVE because they have custom lowering, however we know they are legal. This patch explicitly assigns a cost of 2 to these opcodes. Tests added here: Analysis/CostModel/AArch64/arith-fp-sve.ll Differential Revision: https://reviews.llvm.org/D108993	2021-09-02 09:55:13 +01:00
Nikita Popov	c1b7540645	[TTI] Sink IVDescriptors.h include (NFC) Forward declare RecurrenceDescriptor and include IVDescritor.h only in implementation code that actually needs it.	2021-08-30 22:41:58 +02:00
Jun Ma	8c47103491	[AArch64][SVE] Add API for conversion between SVE predicate pattern and element number. NFC This patch solely moves convert operation between SVE predicate pattern and element number into two small functions. It's pre-commit patch for optimize pture with known sve register width. Differential Revision: https://reviews.llvm.org/D108705	2021-08-27 20:03:48 +08:00
Matthew Devereau	9b830c798e	[AArch64][SVE] Teach cost model masked gathers/scatters are cheap Tell the cost model to use the scalable calculation for non-neon fixed vector. This results in a cheaper cost for fixed-length SVE masked gathers/scatters allowing the vectorizor to emit them more frequently.	2021-08-26 11:17:47 +01:00
Jingu Kang	94c4952951	[AArch64] Enable Upper bound unrolling universally Differential Revision: https://reviews.llvm.org/D105996	2021-08-20 11:25:38 +01:00
Matthew Devereau	734708e04f	[AArch64][SVE] Teach cost model that masked loads/stores are cheap Reduce the cost of VLS masked loads/stores to make the vectorizor emit them more frequently.	2021-08-19 13:01:33 +01:00
David Sherwood	219d4518fc	[Analysis][AArch64] Make fixed-width ordered reductions slightly more expensive For tight loops like this: float r = 0; for (int i = 0; i < n; i++) { r += a[i]; } it's better not to vectorise at -O3 using fixed-width ordered reductions on AArch64 targets. Although the resulting number of instructions in the generated code ends up being comparable to not vectorising at all, there may be additional costs on some CPUs, for example perhaps the scheduling is worse. It makes sense to deter vectorisation in tight loops. Differential Revision: https://reviews.llvm.org/D108292	2021-08-18 17:01:56 +01:00
Dylan Fleming	ef198cd99e	[SVE] Remove usage of getMaxVScale for AArch64, in favour of IR Attribute Removed AArch64 usage of the getMaxVScale interface, replacing it with the vscale_range(min, max) IR Attribute. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D106277	2021-08-17 14:42:47 +01:00
Usman Nadeem	5420fc4a27	[AArch64][SVE][InstCombine] Unpack of a splat vector -> Scalar extend Replace vector unpack operation with a scalar extend operation. unpack(splat(X)) --> splat(extend(X)) If we have both, unpkhi and unpklo, for the same vector then we may save a register in some cases, e.g: Hi = unpkhi (splat(X)) Lo = unpklo(splat(X)) --> Hi = Lo = splat(extend(X)) Differential Revision: https://reviews.llvm.org/D106929 Change-Id: I77c5c201131e3a50de1cdccbdcf84420f5b2244b	2021-08-09 14:58:54 -07:00
Usman Nadeem	85bbc05154	[AArch64][SVE][InstCombine] Move last{a,b} before binop if one operand is a splat value Move the last{a,b} operation to the vector operand of the binary instruction if the binop's operand is a splat value. This essentially converts the binop to a scalar operation. Example: // If x and/or y is a splat value: lastX (binop (x, y)) --> binop(lastX(x), lastX(y)) Differential Revision: https://reviews.llvm.org/D106932 Change-Id: I93ff5302f9a7972405ee0d3854cf115f072e99c0	2021-08-09 14:48:41 -07:00
David Green	649cf4514d	[AArch64] Expand the SVE min/max reduction costs to NEON This takes the existing SVE costing for the various min/max reduction intrinsics and expands it to NEON, where I believe it applies equally well. In the process it changes the lowering to use min/max cost, as opposed to summing up the cost of ICmp+Select. Differential Revision: https://reviews.llvm.org/D106239	2021-08-05 23:23:24 +01:00
Roman Lebedev	6f6e9a867f	[BasicTTIImpl][LoopUnroll] getUnrollingPreferences(): emit ORE remark when advising against unrolling due to a call in a loop I'm not sure this is the best way to approach this, but the situation is rather not very detectable unless we explicitly call it out when refusing to advise to unroll. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D107271	2021-08-03 00:57:26 +03:00
Irina Dobrescu	b01417d3c5	[AArch64] Optimise min/max lowering in ISel Differential Revision: https://reviews.llvm.org/D106561	2021-08-02 13:40:21 +01:00
David Sherwood	0aff1798b5	[Analysis] Add simple cost model for strict (in-order) reductions I have added a new FastMathFlags parameter to getArithmeticReductionCost to indicate what type of reduction we are performing: 1. Tree-wise. This is the typical fast-math reduction that involves continually splitting a vector up into halves and adding each half together until we get a scalar result. This is the default behaviour for integers, whereas for floating point we only do this if reassociation is allowed. 2. Ordered. This now allows us to estimate the cost of performing a strict vector reduction by treating it as a series of scalar operations in lane order. This is the case when FP reassociation is not permitted. For scalable vectors this is more difficult because at compile time we do not know how many lanes there are, and so we use the worst case maximum vscale value. I have also fixed getTypeBasedIntrinsicInstrCost to pass in the FastMathFlags, which meant fixing up some X86 tests where we always assumed the vector.reduce.fadd/mul intrinsics were 'fast'. New tests have been added here: Analysis/CostModel/AArch64/reduce-fadd.ll Analysis/CostModel/AArch64/sve-intrinsics.ll Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll Differential Revision: https://reviews.llvm.org/D105432	2021-07-26 10:26:06 +01:00
David Green	38986c6782	[AArch64] Add worst case shuffle costs This adds some missing single source shuffle costs for AArch64, of i16 and i8 vectors. v4i16 are the same as v4i32 with a worse case cost of 3 coming from the perfect shuffle tables. The larger vector sizes expand into a constant pool, plus a load (and adrp) and a tbl. I arbitrarily chose 8 for the cost to be expensive but not too expensive. Differential Revision: https://reviews.llvm.org/D106241	2021-07-23 09:01:58 +01:00
David Green	c9cebda772	[AArch64] Adjust the cost of integer sum reductions This changes the cost to (LT.first-1) * cost(add) + 2, where the cost of an add is assumed to be 1. This brings it inline with the other reductions. Differential Revision: https://reviews.llvm.org/D106240	2021-07-22 18:19:54 +01:00
Bradley Smith	191f9fa5d2	[AArch64][SVE] Move instcombine like transforms out of SVEIntrinsicOpts Instead move them to the instcombine that happens in AArch64TargetTransformInfo. Differential Revision: https://reviews.llvm.org/D106144	2021-07-20 14:17:30 +00:00
Sander de Smalen	eb1a5120b8	[AArch64][SVE][InstCombine] last{a,b} of a splat vector Replace last{a,b}(splat(X)) with X, irrespective of the predicate. Patch by/Committing on behalf of: Usman Nadeem (mnadeem) Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D105520	2021-07-20 09:44:43 +01:00
Sander de Smalen	eac1670739	[CostModel][AArch64] Make loads/stores of <vscale x 1 x eltty> invalid. At the moment, <vscale x 1 x eltty> are not yet fully handled by the code-generator, so to avoid vectorizing loops with that VF, we mark the cost for these types as invalid. The reason for not adding a new "TTI::getMinimumScalableVF" is because the type is supposed to be a type that can be legalized. It partially is, although the support for these types need some more work. Reviewed By: paulwalker-arm, dmgreen Differential Revision: https://reviews.llvm.org/D103882	2021-07-14 16:44:22 +01:00
David Green	38c9a4068d	[TTI] Remove IsPairwiseForm from getArithmeticReductionCost This patch removes the IsPairwiseForm flag from the Reduction Cost TTI hooks, along with some accompanying code for pattern matching reductions from trees starting at extract elements. IsPairWise is now assumed to be false, which was the predominant way that the value was used from both the Loop and SLP vectorizers. Since the adjustments such as D93860, the SLP vectorizer has not relied upon this distinction between paiwise and non-pairwise reductions. This also removes some code that was detecting reductions trees starting from extract elements inside the costmodel. This case was double-counting costs though, adding the individual costs on the individual instruction _and_ the total cost of the reduction. Removing it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to not double count. The cost of reduction intrinsics is still tested through the various tests in llvm/test/Analysis/CostModel/X86/reduce-xyz.ll. Differential Revision: https://reviews.llvm.org/D105484	2021-07-09 11:51:16 +01:00
Kerry McLaughlin	a7512401e5	[LV] Prevent vectorization with unsupported element types. This patch adds a TTI function, isElementTypeLegalForScalableVector, to query whether it is possible to vectorize a given element type. This is called by isLegalToVectorizeInstTypesForScalable to reject scalable vectorization if any of the instruction types in the loop are unsupported, e.g: int foo(__int128_t* ptr, int N) #pragma clang loop vectorize_width(4, scalable) for (int i=0; i<N; ++i) ptr[i] = ptr[i] + 42; This example currently crashes if we attempt to vectorize since i128 is not a supported type for scalable vectorization. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D102253	2021-07-06 13:06:21 +01:00
Caroline Concatto	a2c5c56055	[AArch64][CostModel] Add cost model for experimental.vector.splice This patch adds a new ShuffleKind SK_Splice and then handle the cost in getShuffleCost, as in experimental.vector.reverse. Differential Revision: https://reviews.llvm.org/D104630	2021-07-05 14:30:24 +01:00
Sjoerd Meijer	ee752134ac	[AArch64] Cost-model i8 vector loads/stores Loads of <4 x i8> vectors were modeled as extremely expensive. And while we don't have a load instruction that supports this, it isn't that expensive to create a vector of i8 elements. The codegen for this was fixed/optimised in D105110. This now tweaks the cost model and enables SLP vectorisation of my motivating case loadi8.ll. Differential Revision: https://reviews.llvm.org/D103629	2021-07-05 11:25:10 +01:00
Jun Ma	ae5433945f	[AArch64][SVEIntrinsicOpts] Convect cntb/h/w/d to vscale intrinsic or constant. As is mentioned above Differential Revision: https://reviews.llvm.org/D104852	2021-07-01 10:09:47 +08:00
Rosie Sumpter	0c4651f0a8	[CostModel][AArch64] Improve cost model for vector reduction intrinsics OR, XOR and AND entries are added to the cost table. An extra cost is added when vector splitting occurs. This is done to address the issue of a missed SLP vectorization opportunity due to unreasonably high costs being attributed to the vector Or reduction (see: https://bugs.llvm.org/show_bug.cgi?id=44593). Differential Revision: https://reviews.llvm.org/D104538	2021-06-24 12:02:58 +01:00
Rosie Sumpter	d7c219a506	[CostModel][AArch64] Improve the cost estimate of CTPOP intrinsic Added a case for CTPOP to AArch64TTIImpl::getIntrinsicInstrCost so that the cost estimate matches the codegen in test/CodeGen/AArch64/arm64-vpopcnt.ll Differential Revision: https://reviews.llvm.org/D103952	2021-06-11 11:15:46 +01:00
Simon Pilgrim	5e6bfb661e	[Analysis] Pass RecurrenceDescriptor as const reference. NFCI. We were passing the RecurrenceDescriptor by value to most of the reduction analysis methods, despite it being rather bulky with TrackingVH members (that can be costly to copy). In all these cases we're only using the RecurrenceDescriptor for rather basic purposes (access to types/kinds etc.). Differential Revision: https://reviews.llvm.org/D104029	2021-06-11 10:24:14 +01:00
Benjamin Kramer	3dceffd0fd	[AArch64] Silence fallthrough warning. NFC. AArch64TargetTransformInfo.cpp:302:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] default: ^	2021-06-10 17:23:37 +02:00
Irina Dobrescu	de79919e9e	[AArch64] Add cost tests for bitreverse This patch includes cost tests for bit reverse as well as some adjustments to the cost model. Differential Revision: https://reviews.llvm.org/D102755	2021-06-10 14:51:33 +01:00
Kerry McLaughlin	5db52751a5	[CostModel] Return an invalid cost for memory ops with unsupported types Fixes getTypeConversion to return `TypeScalarizeScalableVector` when a scalable vector type cannot be legalized by widening/splitting. When this is the method of legalization found, getTypeLegalizationCost will return an Invalid cost. The getMemoryOpCost, getMaskedMemoryOpCost & getGatherScatterOpCost functions already call getTypeLegalizationCost and will now also return an Invalid cost for unsupported types. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D102515	2021-06-08 12:07:36 +01:00
Bradley Smith	60c9b5f35c	[AArch64][SVE] Improve codegen for dupq SVE ACLE intrinsics Use llvm.experimental.vector.insert instead of storing into an alloca when generating code for these intrinsics. This defers the codegen of the generated vector to instruction selection, allowing existing shufflevector style optimizations to apply. Additionally, introduce a new target transform that can recognise fixed predicate patterns in the svbool variants of these intrinsics. Differential Revision: https://reviews.llvm.org/D103082	2021-06-07 12:21:38 +01:00
Nicholas Guy	3043cbc436	[AArch64] Further enable UnrollAndJam Due to the dependency on runtime unrolling, UnJ is only enabled by default on in-order scheduling models, and if a cpu is specified through -mcpu. Differential Revision: https://reviews.llvm.org/D103604	2021-06-04 14:18:49 +01:00
Daniil Fukalov	e1cb98be2d	[TTI] NFC: Change getCostOfKeepingLiveOverCall to return InstructionCost. This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D102831	2021-05-21 15:18:12 +03:00
Peter Waller	2d574a1104	[CodeGen][AArch64][SVE] Canonicalize intrinsic rdffr{ => _z} Follow up to D101357 / `3fa6510f6`. Supersedes D102330. Goal: Use flags setting rdffrs instead of rdffr + ptest. Problem: RDFFR_P doesn't have have a flags setting equivalent. Solution: in instcombine, canonicalize to RDFFR_PP at the IR level, and rely on RDFFR_PP+PTEST => RDFFRS_PP optimization in AArch64InstrInfo::optimizePTestInstr. While here: * Test that rdffr.z+ptest generates a rdffrs. * Use update_{test,llc}_checks.py on the tests. * Use sve attribute on functions. Differential Revision: https://reviews.llvm.org/D102623	2021-05-20 16:22:50 +00:00
Caroline Concatto	9199b6535d	[CostModel][AArch64] Add missing costs for getShuffleCost with scalable vectors Differential Revision: https://reviews.llvm.org/D102490	2021-05-20 09:08:31 +01:00
Daniil Fukalov	3489c2d7b1	[TTI] NFC: Change getTypeLegalizationCost to return InstructionCost. This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: sdesmalen, kparzysz Differential Revision: https://reviews.llvm.org/D101533	2021-04-30 22:51:51 +03:00
Alexey Bataev	12c51f2358	[COST] Improve shuffle kind detection if shuffle mask is provided. Added an extra analysis for better choosing of shuffle kind in getShuffleCost functions for better cost estimation if mask was provided. Differential Revision: https://reviews.llvm.org/D100865	2021-04-29 12:48:00 -07:00
Alexey Bataev	6e859f3cd4	Revert "[COST] Improve shuffle kind detection if shuffle mask is provided." This reverts commit `9239932221` to fix a compiler crash on mask checks.	2021-04-29 12:40:33 -07:00
Alexey Bataev	9239932221	[COST] Improve shuffle kind detection if shuffle mask is provided. Added an extra analysis for better choosing of shuffle kind in getShuffleCost functions for better cost estimation if mask was provided. Differential Revision: https://reviews.llvm.org/D100865	2021-04-29 09:42:56 -07:00
Bradley Smith	89085bcc86	[AArch64][SVE] Convert svdup(vec, SV_VL1, elm) to insertelement(vec, elm, 0) By converting the SVE intrinsic to a normal LLVM insertelement we give the code generator a better chance to remove transitions between GPRs and VPRs Co-authored-by: Paul Walker <paul.walker@arm.com> Depends on D101302 Differential Revision: https://reviews.llvm.org/D101167	2021-04-29 12:17:42 +01:00
Bradley Smith	c8f20ed448	[AArch64][SVE] Move convert.{from,to}.svbool optimization into InstCombine As part of this the ptrue coalescing done in SVEIntrinsicOpts has been modified to not introduce redundant converts, since the convert removal will no longer run after that optimisation to clean up. Differential Revision: https://reviews.llvm.org/D101302	2021-04-29 12:17:42 +01:00
Nicholas Guy	2b6e0c90f9	[AArch64] Enable runtime unrolling for in-order sched models Differential Revision: https://reviews.llvm.org/D97947	2021-04-27 13:22:10 +01:00
David Sherwood	a458b7855e	[AArch64] Add AArch64TTIImpl::getMaskedMemoryOpCost function When vectorising for AArch64 targets if you specify the SVE attribute we automatically then treat masked loads and stores as legal. Also, since we have no cost model for masked memory ops we believe it's cheap to use the masked load/store intrinsics even for fixed width vectors. This can lead to poor code quality as the intrinsics will currently be scalarised in the backend. This patch adds a basic cost model that marks fixed-width masked memory ops as significantly more expensive than for scalable vectors. Tests for the cost model are added here: Transforms/LoopVectorize/AArch64/masked-op-cost.ll Differential Revision: https://reviews.llvm.org/D100745	2021-04-26 11:00:03 +01:00
Sander de Smalen	f9a50f04ba	[TTI] NFC: Change getIntImmCost[Inst\|Intrin] to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D100565	2021-04-23 16:06:36 +01:00
Sander de Smalen	e0edfa052f	[TTI] NFC: Change getAddressComputationCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D100561	2021-04-23 16:06:35 +01:00
David Sherwood	57ca65e21e	[AArch64] Add instruction costs for FP_TO_UINT and FP_TO_SINT with half types We were missing some instruction costs when converting vectors of floating point half types into integers, so I've added those here. I also manually generated assembly code for each FP->int case and looked at the number of instructions generated, which meant adjusting some of the existing costs too. I've updated an existing test to reflect the new costs: Analysis/CostModel/AArch64/sve-fptoi.ll Differential Revision: https://reviews.llvm.org/D99935	2021-04-21 09:39:45 +01:00
Alexey Bataev	673e2f1b70	[COST][AARCH64] Improve cost of reverse shuffles for AArch64. Introduced the cost of thre reverse shuffles for AArch64, currently just copied the costs for PermuteSingleSrc. Differential Revision: https://reviews.llvm.org/D100871	2021-04-20 13:47:56 -07:00
Joe Ellis	c91cd4f3bb	[AArch64][SVE][InstCombine] Replace last{a,b} intrinsics with extracts... when the predicate used by last{a,b} specifies a known vector length. For example: aarch64_sve_lasta(VL1, D) -> extractelement(D, #1) aarch64_sve_lastb(VL1, D) -> extractelement(D, #0) Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D100476	2021-04-20 10:01:33 +00:00
Florian Hahn	acd9cc7495	[AArch64] Use type-legalization cost for code size memop cost. At the moment, getMemoryOpCost returns 1 for all inputs if CostKind is CodeSize or SizeAndLatency. This fools LoopUnroll into thinking memory operations on large vectors have a cost of one, even if they will get expanded to a large number of memory operations in the backend. This patch updates getMemoryOpCost to return the cost for the type legalization for both CodeSize and SizeAndLatency. This should more accurately reflect the number of memory operations required. I am not sure how latency should properly be included in SizeAndLatency from the description, but returning the size cost should be clearly more accurate. This does not cause any binary changes when building MultiSource/SPEC2000/SPEC2006 with -O3 -flto for AArch64, likely because large vector memops are not really formed by code emitted from Clang. But using the C/C++ matrix extension can easily result in code with very large vector operations directly from Clang, e.g. https://clang.godbolt.org/z/6xzxcTGvb Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D100291	2021-04-15 10:11:05 +01:00
Sander de Smalen	4f42d873c2	[TTI] NFC: Change getArithmeticInstrCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100317	2021-04-14 17:20:36 +01:00
Sander de Smalen	1af35e77f4	[TTI] NFC: Change getVectorInstrCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100315	2021-04-14 17:20:35 +01:00
Sander de Smalen	174e8f6c5e	[TTI] NFC: Change getShuffleCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100314	2021-04-14 17:20:35 +01:00
Sander de Smalen	14b934f8a6	[TTI] NFC: Change getCFInstrCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D100313	2021-04-14 17:20:34 +01:00
Sander de Smalen	03f47bdcb1	[TTI] NFC: Change get[Interleaved]MemoryOpCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100205	2021-04-13 14:21:02 +01:00
Sander de Smalen	db134e2428	[TTI] NFC: Change getCmpSelInstrCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100203	2021-04-13 14:21:01 +01:00
Sander de Smalen	2285dfb73f	[TTI] NFC: Change getMinMaxReductionCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100202	2021-04-13 14:21:00 +01:00
Sander de Smalen	bd86824d98	[TTI] NFC: Change getArithmeticReductionCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html This patch is practically NFC, with the exception of an AArch64 SVE related cost-model change, where we can now return an Invalid cost instead of some bogus number. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100201	2021-04-13 14:20:59 +01:00
Sander de Smalen	fd1f8a5462	[TTI] NFC: Change getGatherScatterOpCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100200	2021-04-13 14:20:59 +01:00
Sander de Smalen	92d8421f49	[TTI] NFC: Change getCastInstrCost and getExtractWithExtendCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100199	2021-04-13 14:20:58 +01:00
dfukalov	8f4b7e94a2	[AMDGPU][CostModel] Refine cost model for control-flow instructions. Added cost estimation for switch instruction, updated costs of branches, fixed phi cost. Had to increase `-amdgpu-unroll-threshold-if` default value since conditional branch cost (size) was corrected to higher value. Test renamed to "control-flow.ll". Removed redundant code in `X86TTIImpl::getCFInstrCost()` and `PPCTTIImpl::getCFInstrCost()`. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D96805	2021-04-10 09:20:24 +03:00
Sander de Smalen	2f6f249a49	NFC: Change getIntrinsicInstrCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Depends on D97468 Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D97469	2021-03-31 14:04:41 +01:00
Nashe Mncube	19601a4c6c	[SVE][Analysis]Instruction costs for ops on scalable-vec The following operations have no associated cost for them when applied to scalable vectors, and as a consequence can trigger a crash when a call is made to AArch64TTIImpl::getCastInstrCost(): - fptrunc - trunc - fpext - fpto(u,s)i This patch adds costs for these operations and relevant regression tests. Differential Revision: https://reviews.llvm.org/D98934	2021-03-29 11:15:50 +01:00
David Sherwood	748ae5281d	[IR][SVE] Add new llvm.experimental.stepvector intrinsic This patch adds a new llvm.experimental.stepvector intrinsic, which takes no arguments and returns a linear integer sequence of values of the form <0, 1, ...>. It is primarily intended for scalable vectors, although it will work for fixed width vectors too. It is intended that later patches will make use of this new intrinsic when vectorising induction variables, currently only supported for fixed width. I've added a new CreateStepVector method to the IRBuilder, which will generate a call to this intrinsic for scalable vectors and fall back on creating a ConstantVector for fixed width. For scalable vectors this intrinsic is lowered to a new ISD node called STEP_VECTOR, which takes a single constant integer argument as the step. During lowering this argument is set to a value of 1. The reason for this additional argument at the codegen level is because in future patches we will introduce various generic DAG combines such as mul step_vector(1), 2 -> step_vector(2) add step_vector(1), step_vector(1) -> step_vector(2) shl step_vector(1), 1 -> step_vector(2) etc. that encourage a canonical format for all targets. This hopefully means all other targets supporting scalable vectors can benefit from this too. I've added cost model tests for both fixed width and scalable vectors: llvm/test/Analysis/CostModel/AArch64/neon-stepvector.ll llvm/test/Analysis/CostModel/AArch64/sve-stepvector.ll as well as codegen lowering tests for fixed width and scalable vectors: llvm/test/CodeGen/AArch64/neon-stepvector.ll llvm/test/CodeGen/AArch64/sve-stepvector.ll See this thread for discussion of the intrinsic: https://lists.llvm.org/pipermail/llvm-dev/2021-January/147943.html	2021-03-23 10:43:35 +00:00
David Green	e2935dcfc4	[TTI] Add a Mask to getShuffleCost This adds an Mask ArrayRef to getShuffleCost, so that if an exact mask can be provided a more accurate cost can be provided by the backend. For example VREV costs could be returned by the ARM backend. This should be an NFC until then, laying the groundwork for that to be added. Differential Revision: https://reviews.llvm.org/D98206	2021-03-17 17:46:26 +00:00
Caroline Concatto	f2b749be15	[CostModel][SVE] Add cost model for shuffle reverse with i1 and scalable vector This patch adds the cost model for experimental.vector.reverse with scalable vector types: nxv16i1, nxv8i1, nxv4i1 and nxv2i1. These types are missing from the previous cost model patch D95603. The cost model for experimental.vector.reverse with 1 bit mask is used by loop vectorization in the patch D95363 Differential Revision: https://reviews.llvm.org/D97758	2021-03-04 18:52:59 +00:00
Stelios Ioannou	30cb9c03b5	[AArch64] Add abs intrinsic costs This patch adds cost-modelling for abs vector intrinsic. Change-Id: I89007971bfb15f5b4a02a2eadfd43018e9a73976	2021-02-25 09:31:52 +00:00
Kerry McLaughlin	ba1e150d03	[SVE] Add support for scalable vectorization of loops with int/fast FP reductions This patch enables scalable vectorization of loops with integer/fast reductions, e.g: ``` unsigned sum = 0; for (int i = 0; i < n; ++i) { sum += a[i]; } ``` A new TTI interface, isLegalToVectorizeReduction, has been added to prevent reductions which are not supported for scalable types from vectorizing. If the reduction is not supported for a given scalable VF, computeFeasibleMaxVF will fall back to using fixed-width vectorization. Reviewed By: david-arm, fhahn, dmgreen Differential Revision: https://reviews.llvm.org/D95245	2021-02-16 13:50:06 +00:00
Caroline Concatto	b52e6c5891	[CostModel]Add cost model for experimental.vector.reverse This patch uses the function getShuffleCost with SK_Reverse to compute the cost for experimental.vector.reverse. For scalable vector type, it adds a table will the legal types on AArch64TTIImpl::getShuffleCost to not assert in BasicTTIImpl::getShuffleCost, and for fixed vector, it relies on the existing cost model in BasicTTIImpl. Depends on D94883 Differential Revision: https://reviews.llvm.org/D95603	2021-02-15 14:23:57 +00:00
Sanjay Patel	79b1b4a581	[Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics The vector reduction intrinsics started life as experimental ops, so backend support was lacking. As part of promoting them to 1st-class intrinsics, however, codegen support was added/improved: D58015 D90247 So I think it is safe to now remove this complication from IR. Note that we still have an IR-level codegen expansion pass for these as discussed in D95690. Removing that is another step in simplifying the logic. Also note that x86 was already unconditionally forming reductions in IR, so there should be no difference for x86. I spot checked a couple of the tests here by running them through opt+llc and did not see any asm diffs. If we do find functional differences for other targets, it should be possible to (at least temporarily) restore the shuffle IR with the ExpandReductions IR pass. Differential Revision: https://reviews.llvm.org/D96552	2021-02-12 08:13:50 -05:00
Caroline Concatto	2cbcf3e297	[AArch64][SVE]Add cost model for broadcast shuffle This patch adds a cost model for SK_Broadcast in AArch64TTIImpl::getShuffleCost with scalable vector. Without this patch, the scalable vector type relies on BasicTTIImpl cost implementation and assert. Differential Revision: https://reviews.llvm.org/D95598	2021-02-03 09:53:22 +00:00
David Green	0175cd00a1	[AArch64] Add vector saturating add intrinsic costs This adds sadd.sat, uadd.sat, ssub.sat and usub.sat costs for AArch64, similar to how they were recently added for ARM. Differential Revision: https://reviews.llvm.org/D95292	2021-01-27 10:38:32 +00:00
David Sherwood	2e080eb00a	[SVE] Add support for scalable vectorization of loops with selects and cmps I have removed an unnecessary assert in LoopVectorizationCostModel::getInstructionCost that prevented a cost being calculated for select instructions when using scalable vectors. In addition, I have changed AArch64TTIImpl::getCmpSelInstrCost to only do special cost calculations for fixed width vectors and fall back to the base version for scalable vectors. I have added a simple cost model test for cmps and selects: test/Analysis/CostModel/sve-cmpsel.ll and some simple tests that show we vectorize loops with cmp and select: test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll Differential Revision: https://reviews.llvm.org/D95039	2021-01-22 09:48:13 +00:00
Caroline Concatto	172f1f8952	[AArch64][SVE]Add cost model for vector reduce for scalable vector This patch computes the cost for vector.reduce<operand> for scalable vectors. The cost is split into two parts: the legalization cost and the horizontal reduction. Differential Revision: https://reviews.llvm.org/D93639	2021-01-19 11:54:16 +00:00
Caroline Concatto	01c190e907	[AArch64][CostModel]Fix gather scatter cost model This patch fixes a bug introduced in the patch: https://reviews.llvm.org/D93030 This patch pulls the test for scalable vector to be the first instruction to be checked. This avoids the Gather and Scatter cost model for AArch64 to compute the number of vector elements for something that is not a vector and therefore crashing.	2021-01-07 14:02:08 +00:00
Caroline Concatto	060cfd9795	[AArch64][SVE]Add cost model for masked gather and scatter for scalable vector. A new TTI interface has been added 'Optional <unsigned>getMaxVScale' that returns the maximum vscale for a given target. When known getMaxVScale is used to compute the cost of masked gather scatter for scalable vector. Depends on D92094 Differential Revision: https://reviews.llvm.org/D93030	2021-01-04 13:59:58 +00:00
Sjoerd Meijer	5110ff0817	[AArch64][CostModel] Fix cost for mul <2 x i64> This was modeled to have a cost of 1, but since we do not have a MUL.2d this is scalarized into vector inserts/extracts and scalar muls. Motivating precommitted test is test/Transforms/SLPVectorizer/AArch64/mul.ll, which we don't want to SLP vectorize. Test Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll unfortunately needed changing, but the reason is documented in LoopVectorize.cpp:6855: // The cost of executing VF copies of the scalar instruction. This opcode // is unknown. Assume that it is the same as 'mul'. which I will address next as a follow up of this. Differential Revision: https://reviews.llvm.org/D92208	2020-11-30 11:36:55 +00:00
Simon Pilgrim	969918e177	[DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal If usubsat() is legal, this is likely to result in smaller codegen expansion than the default cmp+select codegen expansion. Allows us to move the x86-specific lowering to the generic expansion code. Differential Revision: https://reviews.llvm.org/D92183	2020-11-27 11:18:58 +00:00
Caroline Concatto	6c4d8f4651	[AArch64] Add check for widening instruction for SVE. This patch fixes the function isWideningInstruction for scalable vectors. Now the cost model can check the widening pattern for SVE. Differential Revision: https://reviews.llvm.org/D91260	2020-11-16 12:30:08 +00:00

1 2 3 4 5 ...

283 Commits