llvm-project

Commit Graph

Author	SHA1	Message	Date
Vasileios Porpodas	39aa202aff	Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 3, fixed assertion crash. Original review: https://reviews.llvm.org/D121354 This reverts commit `e6ead19b77`.	2022-03-23 18:32:17 -07:00
Arthur Eubanks	e6ead19b77	Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash." This reverts commit `27bd8f9492`. Causes crashes, see comments in D121973	2022-03-23 10:57:45 -07:00
Vasileios Porpodas	27bd8f9492	Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash. Original review: https://reviews.llvm.org/D121354 This reverts commit `f7d7d2a08d`.	2022-03-22 16:41:55 -07:00
Arthur Eubanks	f7d7d2a08d	Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads."" This reverts commit `79613185d3`. Causes crashes, see comments in https://reviews.llvm.org/D121973.	2022-03-22 13:33:49 -07:00
Vasileios Porpodas	79613185d3	Recommit "[SLP] Fix lookahead operand reordering for splat loads." Original review: https://reviews.llvm.org/D121354 The original commit `9136145eb0` broke the build on several targets. Differential Revision: https://reviews.llvm.org/D121973	2022-03-21 15:57:32 -07:00
Matt Devereau	a9e08bc7c1	[AArch64][SVE] InstCombine llvm.aarch64.sve.sel to select InstCombine llvm.aarch64.sve.sel to select. This allows an existing instCombine added in `20b0fa91c9` to fire. Differential Revision: https://reviews.llvm.org/D121792	2022-03-17 16:20:48 +00:00
Florian Hahn	aa590e5823	[AArch64] Improve costs for some conversions to fp16. Currently the cost model under-estimates the cost of certain FP16 conversions. This patch updates getCastInstrCost to return more accurate costs for the cases improved in `c2ed9fd054`. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D113700	2022-03-11 10:27:39 +00:00
David Green	47f4cd9c3d	[AArch64] Update costs for some fp16 converts This updates the costs for FP16 converts, as some of them were pretty high. Differential Revision: https://reviews.llvm.org/D120771	2022-03-03 11:17:24 +00:00
David Green	65c0e45a37	[AArch64] Vector shifts cost 1 The cost of vector shifts was 2 as opposed to 1, as the nodes are marked custom. Fix this like the others and mark the nodes as cheap. Differential Revision: https://reviews.llvm.org/D120773	2022-03-03 10:42:57 +00:00
Sander de Smalen	0b41238ae7	[AArch64] Emit TBAA metadata for SVE load/store intrinsics In Clang we can attach TBAA metadata to the load/store intrinsics based on the operation's element type. This also contains changes to InstCombine where the AArch64-specific intrinsics are transformed into generic LLVM load/store operations, to ensure that all metadata is transferred to the new instruction. There will be some further work after this patch to also emit TBAA metadata for SVE's gather/scatter- and struct load/store intrinsics. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D119319	2022-02-11 09:00:29 +00:00
Nikita Popov	3196ef8ee2	[AArch64TargetTransformInfo] Avoid pointer element type access Use the element type of the gathered/scattered vector instead.	2022-02-08 15:18:18 +01:00
Florian Hahn	17ebd68ae6	[AArch64] Fix costs of float vector compare/selects pairs. The current cost-model overestimates the cost of vector compares & selects for ordered floating point compares. This patch fixes that by extending the existing logic for integer predicates. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D118256	2022-01-31 10:18:29 +00:00
Alban Bridonneau	2feddb37b4	Implement correct cost for SVE bitcasts We have some bitcasts which we know will be simplified, so their cost is zero. Reviewed By: david-arm, sdesmalen Differential Revision: https://reviews.llvm.org/D118019	2022-01-26 14:25:44 +00:00
Matt Devereau	cee8b255be	[AArch64][SVE] Remove Redundant aarch64.sve.convert.to.svbool Generated code resulted in redundant aarch64.sve.convert.to.svbool calls for AArch64 Binary Operations. Narrow the more precise operands instead of widening the less precise operands Differential Revision: https://reviews.llvm.org/D116730	2022-01-17 14:35:24 +00:00
David Green	61888d97f6	[AArch64] Basic demand elements for some intrinsics A lot of neon intrinsics work lane-wise, meaning that elements not demanded in the output are also not demanded in the input. This teaches that to AArch64TTIImpl::simplifyDemandedVectorEltsIntrinsic for some simple single-input truncate intrinsics, which can help remove unnecessary instructions. Differential Revision: https://reviews.llvm.org/D117097	2022-01-13 11:53:12 +00:00
David Sherwood	ef1ca4d3e9	[AArch64] Fix incorrect use of MVT::getVectorNumElements in AArch64TTIImpl::getVectorInstrCost If we are inserting into or extracting from a scalable vector we do not know the number of elements at runtime, so we can only let the index wrap for fixed-length vectors. Tests added here: Analysis/CostModel/AArch64/sve-insert-extract.ll Differential Revision: https://reviews.llvm.org/D117099	2022-01-13 09:27:14 +00:00
David Green	bc615e436c	[AArch64] Update addo and subo costs Similar to D116732, this adds basic scalar sadd_with_overflow, uadd_with_overflow, ssub_with_overflow and usub_with_overflow costs for aarch64, which are usually quite efficiently lowered. Differential Revision: https://reviews.llvm.org/D116734	2022-01-07 16:20:23 +00:00
David Green	c65270cf96	[AArch64] Add basic umulo and smulo costs This adds some AArch64 specific smul_with_overflow and umul_with_overflow costs, overriding the default costs. The code generation for these mul with overflow intrinsics is usually better than the default expansion on AArch64. The costs come from https://godbolt.org/z/zEzYhMWqo with various types, or llvm/test/CodeGen/AArch64/arm64-xaluo.ll. Differential Revision: https://reviews.llvm.org/D116732	2022-01-06 17:22:47 +00:00
Matthew Devereau	e00f22c1b1	[AArch64][SVE] Teach cost model that masked loads/stores are cheap Reduce the cost of VLS masked loads/stores to make the vectorizer emit them more frequently.	2021-12-17 15:04:45 +00:00
Matt Devereau	fb47725d14	[AArch64][SVE] Instcombine SDIV to ASRD Instcombine SDIV to ASRD when the third operand of SDIV is a power of 2 Differential Revision: https://reviews.llvm.org/D115448	2021-12-14 15:58:28 +00:00
David Sherwood	8b0448ce5d	[AArch64][Analysis] Add on overhead costs for SVE gathers and scatters This patch adds on an overhead cost for gathers and scatters, which is a rough estimate based on performance investigations I have performed on SVE hardware for various micro-benchmarks. Differential Revision: https://reviews.llvm.org/D115143	2021-12-09 16:02:59 +00:00
Paul Walker	01bc67e449	[SVE][InstCombine] Support more cases where ld1/st1 can be lowered to load/store instructions. This patch extends the "is all active predicate" check to cover cases where the predicate is casted but in a way that doesn't change its "all active" status. Differential Revision: https://reviews.llvm.org/D115047	2021-12-08 11:01:33 +00:00
Igor Kirillov	08d45e6f4d	[AArch64][SVEIntrinsicOpts] Fix: predicated SVE mul/fmul are not commutative We can not swap multiplicand and multiplier because the sve intrinsics are predicated. Imagine lanes in vectors having the following values: pg = 0 multiplicand = 1 (from dup) multiplier = 2 The resulting value should be 1, but if we swap multiplicand and multiplier it will become 2, which is incorrect. Differential Revision: https://reviews.llvm.org/D114577	2021-11-26 12:41:27 +00:00
Rosie Sumpter	c2441b6b89	[LoopVectorize] Add vector reduction support for fmuladd intrinsic Enables LoopVectorize to handle reduction patterns involving the llvm.fmuladd intrinsic. Differential Revision: https://reviews.llvm.org/D111555	2021-11-24 08:50:04 +00:00
Matt Devereau	f526c600c0	[AArch64][SVE] Instcombine SVE LD1/ST1 to stock LLVM IR InstCombine AArch64 LD1/ST1 to llvm.masked.load/llvm.masked.store and LD1/ST1 to load/store when a ptrue all predicate pattern operand is present. This allows existing IR optimizations such as dead-load removal to occur. Differential Revision: https://reviews.llvm.org/D113489	2021-11-16 11:10:23 +00:00
Matt	4a59694ba1	[AArch64][SVE] Combine FADD and FMUL aarch64 intrinsics to FMLA This is a refinement to the work in https://reviews.llvm.org/D111638 Fold (fadd p a (fmul p b c)) into (fma p a b c) Differential Revision: https://reviews.llvm.org/D113095	2021-11-08 12:22:38 +00:00
Peter Waller	7a34145f40	Reland "[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store" This reverts commit `753eba6421`. Contiguous gather => masked load: (sve.ld1.gather.index Mask BasePtr (sve.index IndexBase 1)) => (masked.load (gep BasePtr IndexBase) Align Mask undef) Contiguous scatter => masked store: (sve.ld1.scatter.index Value Mask BasePtr (sve.index IndexBase 1)) => (masked.store Value (gep BasePtr IndexBase) Align Mask) Tests with <vscale x 2 x double>: [Gather, Scatter] for each [Positive test (index=1), Negative test (index=2), Alignment propagation]. Differential Revision: https://reviews.llvm.org/D112076	2021-11-03 13:42:14 +00:00
Peter Waller	753eba6421	Revert "[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store" This reverts commit `1febf42f03`, which has a use-of-uninitialized-memory bug. See: https://reviews.llvm.org/D112076	2021-11-03 13:39:38 +00:00
Peter Waller	1febf42f03	[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store Contiguous gather => masked load: (sve.ld1.gather.index Mask BasePtr (sve.index IndexBase 1)) => (masked.load (gep BasePtr IndexBase) Align Mask undef) Contiguous scatter => masked store: (sve.ld1.scatter.index Value Mask BasePtr (sve.index IndexBase 1)) => (masked.store Value (gep BasePtr IndexBase) Align Mask) Tests with <vscale x 2 x double>: [Gather, Scatter] for each [Positive test (index=1), Negative test (index=2), Alignment propagation]. Differential Revision: https://reviews.llvm.org/D112076	2021-11-03 11:02:44 +00:00
Matt	895145aacb	Revert "[AArch64][SVE] Combine predicated FMUL/FADD into FMA" This reverts commit `fc28a2f8ce`.	2021-11-02 14:56:01 +00:00
Bradley Smith	13faa5f440	[AArch64][SVE] Generate SVE >1 element structured load/stores from fixed types This adds support for SVE structured loads/stores to the relevant target hooks, such that we can support these instructions in the InterleavedAccess pass. Depends on D112078 Differential Revision: https://reviews.llvm.org/D112303	2021-10-29 09:35:57 +00:00
Matt	fc28a2f8ce	[AArch64][SVE] Combine predicated FMUL/FADD into FMA Combine FADD and FMUL intrinsics into FMA when the result of the FMUL is an FADD operand with only one use and both use the same predicate. Differential Revision: https://reviews.llvm.org/D111638	2021-10-27 11:41:23 +00:00
David Sherwood	9448cdc900	[SVE][Analysis] Tune the cost model according to the tune-cpu attribute This patch introduces a new function: AArch64Subtarget::getVScaleForTuning that returns a value for vscale that can be used for tuning the cost model when using scalable vectors. The VScaleForTuning option in AArch64Subtarget is initialised according to the following rules: 1. If the user has specified the CPU to tune for we use that, else 2. If the target CPU was specified we use that, else 3. The tuning is set to "generic". For CPUs of type "generic" I have assumed that vscale=2. New tests added here: Analysis/CostModel/AArch64/sve-gather.ll Analysis/CostModel/AArch64/sve-scatter.ll Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll Differential Revision: https://reviews.llvm.org/D110259	2021-10-21 09:33:50 +01:00
David Sherwood	26b7d9d622	[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns This patch adds further support for vectorisation of loops that involve selecting an integer value based on a previous comparison. Consider the following C++ loop: int r = a; for (int i = 0; i < n; i++) { if (src[i] > 3) { r = b; } src[i] += 2; } We should be able to vectorise this loop because all we are doing is selecting between two states - 'a' and 'b' - both of which are loop invariant. This just involves building a vector of values that contain either 'a' or 'b', where the final reduced value will be 'b' if any lane contains 'b'. The IR generated by clang typically looks like this: %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ] ... %pred = icmp ugt i32 %val, i32 3 %phi.update = select i1 %pred, i32 %b, i32 %phi We already detect min/max patterns, which also involve a select + cmp. However, with the min/max patterns we are selecting loaded values (and hence loop variant) in the loop. In addition we only support certain cmp predicates. This patch adds a new pattern matching function (isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp. We only support selecting values that are integer and loop invariant, however we can support any kind of compare - integer or float. Tests have been added here: Transforms/LoopVectorize/AArch64/sve-select-cmp.ll Transforms/LoopVectorize/select-cmp-predicated.ll Transforms/LoopVectorize/select-cmp.ll Differential Revision: https://reviews.llvm.org/D108136	2021-10-11 09:41:38 +01:00
Matthew Devereau	2ac1999937	[AArch64][SVE] Propagate math flags from intrinsics to instructions Retain floating-point math flags inside instCombineSVEVectorBinOp	2021-10-05 15:39:13 +01:00
Kazu Hirata	c1e32b3fc0	[Target] Migrate from getNumArgOperands to arg_size (NFC) Note that getNumArgOperands is considered a legacy name. See llvm/include/llvm/IR/InstrTypes.h for details.	2021-10-02 12:06:29 -07:00
Matthew Devereau	f085a9db8b	[AArch64][SVE] Replace fmul, fadd and fsub LLVM IR intrinsics with LLVM IR binary ops Replacing fmul and fadd intrinsics with their binary ops results in more succinct AArch64 SVE output, e.g.: 4: 65428041 fmul z1.h, p0/m, z1.h, z2.h 8: 65408020 fadd z0.h, p0/m, z0.h, z1.h -> 4: 65620020 fmla z0.h, p0/m, z1.h, z2.h	2021-10-01 11:24:46 +01:00
Krasimir Georgiev	685f1bfd0a	Revert "[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns" It appears to cause stage2 clang build failures, e.g., https://lab.llvm.org/buildbot/#/builders/74/builds/7145. This reverts commit `1fb37334bd`.	2021-10-01 11:39:43 +02:00
David Sherwood	1fb37334bd	[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns This patch adds further support for vectorisation of loops that involve selecting an integer value based on a previous comparison. Consider the following C++ loop: int r = a; for (int i = 0; i < n; i++) { if (src[i] > 3) { r = b; } src[i] += 2; } We should be able to vectorise this loop because all we are doing is selecting between two states - 'a' and 'b' - both of which are loop invariant. This just involves building a vector of values that contain either 'a' or 'b', where the final reduced value will be 'b' if any lane contains 'b'. The IR generated by clang typically looks like this: %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ] ... %pred = icmp ugt i32 %val, i32 3 %phi.update = select i1 %pred, i32 %b, i32 %phi We already detect min/max patterns, which also involve a select + cmp. However, with the min/max patterns we are selecting loaded values (and hence loop variant) in the loop. In addition we only support certain cmp predicates. This patch adds a new pattern matching function (isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp. We only support selecting values that are integer and loop invariant, however we can support any kind of compare - integer or float. Tests have been added here: Transforms/LoopVectorize/AArch64/sve-select-cmp.ll Transforms/LoopVectorize/select-cmp-predicated.ll Transforms/LoopVectorize/select-cmp.ll Differential Revision: https://reviews.llvm.org/D108136	2021-10-01 08:41:03 +01:00
Simon Pilgrim	676f2809b5	[CostModel][AArch64] Don't dereference CostTblEntry before null check. Fix static analysis warning that we check for null Entry after dereferencing it. I don't think this can actually happen as i8/i16 should legalize to use the i32 path which should return a cost - but I'd rather play it safe than rely on an implicit type legalization.	2021-09-29 16:35:29 +01:00
Usman Nadeem	3b12282b0e	[AArch64][SVE][InstCombine] Eliminate redundant chains of tuple get/set Differential Revision: https://reviews.llvm.org/D109667 Change-Id: I06a3c28e3658ecda109a3a1b73265828274ab2ea	2021-09-22 20:59:46 -07:00
David Spickett	92c9b28347	Revert "[AArch64][SVE] Teach cost model that masked loads/stores are cheap" This reverts commit `734708e04f`. Due to build failures on the 2 stage SVE VLS bot. https://lab.llvm.org/buildbot/#/builders/176/builds/908/steps/11/logs/stdio	2021-09-20 08:45:18 +00:00
Usman Nadeem	757384abff	[AArch64][SVE][InstCombine] Fold redundant zip1/2(uzp1/2) operations zip1(uzp1(A, B), uzp2(A, B)) --> A zip2(uzp1(A, B), uzp2(A, B)) --> B Differential Revision: https://reviews.llvm.org/D109666 Change-Id: I4a6578db2fcef9ff71ad0e77b9fe08354e6dbfcd	2021-09-17 15:24:46 -07:00
Usman Nadeem	ab111e982f	Revert "Revert "[AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation"" This reverts commit `eee7d225de`. Effectively relanding `98c37247d8` after fixing the failing tests. Change-Id: I5d7461aeb820a2d5f1895457d824a8de4d316ee5	2021-09-10 18:11:24 -07:00
Usman Nadeem	eee7d225de	Revert "[AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation" This reverts commit `98c37247d8`.	2021-09-10 13:01:48 -07:00
Usman Nadeem	98c37247d8	[AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation Differential Revision: https://reviews.llvm.org/D109118 Change-Id: I47adc1984a54bea02bf5a0a767b765afe7e16aa3	2021-09-10 12:52:14 -07:00
David Sherwood	d581d94385	[SVE] Fix the FP arithmetic instruction costs for SVE Several FP instructions (fadd, fsub, etc.) were incorrectly assigned a higher cost for SVE because they have custom lowering, however we know they are legal. This patch explicitly assigns a cost of 2 to these opcodes. Tests added here: Analysis/CostModel/AArch64/arith-fp-sve.ll Differential Revision: https://reviews.llvm.org/D108993	2021-09-02 09:55:13 +01:00
Nikita Popov	c1b7540645	[TTI] Sink IVDescriptors.h include (NFC) Forward declare RecurrenceDescriptor and include IVDescriptors.h only in implementation code that actually needs it.	2021-08-30 22:41:58 +02:00
Jun Ma	8c47103491	[AArch64][SVE] Add API for conversion between SVE predicate pattern and element number. NFC This patch solely moves the conversion between SVE predicate pattern and element number into two small functions. It is a pre-commit patch for optimizing ptrue with a known SVE register width. Differential Revision: https://reviews.llvm.org/D108705	2021-08-27 20:03:48 +08:00
Matthew Devereau	9b830c798e	[AArch64][SVE] Teach cost model masked gathers/scatters are cheap Tell the cost model to use the scalable calculation for non-neon fixed vectors. This results in a cheaper cost for fixed-length SVE masked gathers/scatters, allowing the vectorizer to emit them more frequently.	2021-08-26 11:17:47 +01:00
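
The argument in commit `08d45e6f4d` (predicated SVE mul/fmul are not commutative) may be easier to follow with a small scalar emulation. This is only a sketch: it assumes merging predication, where inactive lanes keep the value of the first data operand, as the commit message's example implies. The lane count `N` and the names `Vec`, `Pred` and `predicatedMul` are hypothetical and are not LLVM or ACLE APIs.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed lane count, chosen only for illustration.
constexpr std::size_t N = 4;
using Vec = std::array<int, N>;
using Pred = std::array<bool, N>;

// Emulates a merging predicated multiply: active lanes compute a * b,
// inactive lanes pass through the first data operand unchanged.
Vec predicatedMul(const Pred &pg, const Vec &a, const Vec &b) {
  Vec out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = pg[i] ? a[i] * b[i] : a[i];
  return out;
}
```

With `pg = 0`, multiplicand `1` and multiplier `2` in a lane, `predicatedMul(pg, multiplicand, multiplier)` keeps `1` in that lane, while the swapped call keeps `2`, so the two operand orders are not interchangeable.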

257 Commits
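
The select(cmp(), X, Y) reduction described in commits `1fb37334bd` and `26b7d9d622` can be sketched in plain C++. The first function is the scalar loop from the commit message; the second emulates the lane-wise idea of keeping a vector of values that hold either 'a' or 'b' and reducing to 'b' if any lane holds 'b'. This is an illustration under stated assumptions, not the LoopVectorize implementation: the vector width `VF` and the names `selectScalar` and `selectLanes` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Scalar loop from the commit message: r ends up as 'b' if any element
// of src is greater than 3, otherwise it stays 'a'.
int selectScalar(std::vector<int> &src, int a, int b) {
  int r = a;
  for (std::size_t i = 0; i < src.size(); ++i) {
    if (src[i] > 3)
      r = b;
    src[i] += 2;
  }
  return r;
}

// Lane-wise emulation with a hypothetical vector width of 4.
int selectLanes(std::vector<int> &src, int a, int b) {
  constexpr std::size_t VF = 4; // assumed width, for illustration only
  std::array<int, VF> lanes;
  lanes.fill(a); // every lane starts in state 'a'

  std::size_t i = 0;
  for (; i + VF <= src.size(); i += VF) {
    for (std::size_t l = 0; l < VF; ++l) {
      if (src[i + l] > 3)
        lanes[l] = b; // per-lane select between the two invariant values
      src[i + l] += 2;
    }
  }

  // Final reduction: the result is 'b' if any lane left state 'a'.
  int r = a;
  for (int lane : lanes)
    if (lane != a)
      r = b;

  // Scalar remainder for elements that do not fill a whole vector.
  for (; i < src.size(); ++i) {
    if (src[i] > 3)
      r = b;
    src[i] += 2;
  }
  return r;
}
```

Both functions return the same value for the same input and leave `src` in the same state, which is why the loop can be vectorised: 'a' and 'b' are loop invariant, so only the "did any lane select 'b'" information has to be carried across iterations.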