Commit Graph

288 Commits

Author SHA1 Message Date
Fangrui Song de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
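
A minimal before/after illustration of the replacement (standalone C++17):

  int classify(int c) {
    switch (c) {
    case ' ':
      [[fallthrough]]; // was: LLVM_FALLTHROUGH;
    case '\t':
      return 1;
    default:
      return 0;
    }
  }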
2022-08-08 11:24:15 -07:00
David Sherwood 4ef9cb6c17 [AArch64][LoopVectorize] Disable tail-folding for SVE when loop has interleaved accesses
If we have interleave groups in the loop we want to vectorise then
we should fall back on normal vectorisation with a scalar epilogue. In
such cases when tail-folding is enabled we'll almost certainly go on to
create VPlans with very high costs for all vector VFs and fall back on
VF=1 anyway. This is likely to be worse than if we'd just used an
unpredicated vector loop in the first place.

Once the vectoriser has proper support for analysing all the costs
for each combination of VF and vectorisation style, then we should
be able to remove this.

Added an extra test here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll

Differential Revision: https://reviews.llvm.org/D128342
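
A standalone sketch of the bail-out described above; the types and names are illustrative, not the real vectoriser API:

  struct LoopSummary {
    bool SVEEnabled;
    bool HasInterleaveGroups;
  };

  // Prefer a predicated (tail-folded) vector loop only when there are no
  // interleave groups; otherwise fall back on an unpredicated vector loop
  // with a scalar epilogue.
  bool preferTailFolding(const LoopSummary &L) {
    return L.SVEEnabled && !L.HasInterleaveGroups;
  }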
2022-08-02 09:52:33 +01:00
Vasileios Porpodas f669030373 [TTI][AArch64][SLP] Sets the cost of an ADD reduction 2xi64 to 2.
2xi64 is the legalized type for wide reductions (like 16xi64) and setting the
cost to 2 makes `load-reduce` and `load-zext-reduce` patterns profitable.

The few performance measurements that I did on an AArch64 machine confirm that
these patterns are actually faster when vectorized.

Differential Revision: https://reviews.llvm.org/D130740
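
A standalone sketch of the pricing change (illustrative, not the actual TTI code):

  // 2xi64 is what wider i64 ADD reductions legalize to; pricing it at 2
  // makes the load-reduce and load-zext-reduce patterns profitable.
  unsigned addReductionCost(unsigned NumElts, unsigned EltBits,
                            unsigned DefaultCost) {
    if (NumElts == 2 && EltBits == 64)
      return 2;
    return DefaultCost;
  }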
2022-08-01 13:03:14 -07:00
chendewen 7eeb468ae5 [AArch64] Add cost for missing extensions.
This patch adds a cost estimate for some missing sign extensions.
ref: https://reviews.llvm.org/D14730

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D130565
2022-07-28 17:34:00 +08:00
David Sherwood f15b6b2907 [AArch64] Add target hook for preferPredicateOverEpilogue
This patch adds the AArch64 hook for preferPredicateOverEpilogue,
which currently returns true if SVE is enabled and one of the
following conditions (non-exhaustive) is met:

1. The "sve-tail-folding" option is set to "all", or
2. The "sve-tail-folding" option is set to "all+noreductions"
and the loop does not contain reductions, or
3. The "sve-tail-folding" option is set to "all+norecurrences"
and the loop has no first-order recurrences.

Currently the default option is "disabled", but this will be
changed in a later patch.

I've added new tests to show the options behave as expected here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll

Differential Revision: https://reviews.llvm.org/D129560
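
A standalone sketch of the option handling described above (illustrative names; the real hook consults loop analyses rather than booleans):

  enum class TailFolding { Disabled, All, AllNoReductions, AllNoRecurrences };

  bool preferPredicateOverEpilogue(TailFolding Opt, bool HasSVE,
                                   bool HasReductions, bool HasRecurrences) {
    if (!HasSVE)
      return false;
    switch (Opt) {
    case TailFolding::All:
      return true;
    case TailFolding::AllNoReductions:
      return !HasReductions;
    case TailFolding::AllNoRecurrences:
      return !HasRecurrences;
    case TailFolding::Disabled:
      return false;
    }
    return false;
  }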
2022-07-21 17:20:06 +01:00
Cullen Rhodes 7c3cda551a [AArch64][SVE] Prefer SIMD&FP variant of clast[ab]
The scalar variant with GPR source/dest has considerably higher latency
than the SIMD&FP scalar variant across a variety of micro-architectures:

  Core           Scalar    SIMD&FP
  --------------------------------
  Neoverse V1     9 cyc      3 cyc
  Neoverse N2     8 cyc      3 cyc
  Cortex A510     8 cyc      4 cyc
  A64FX          29 cyc      6 cyc
2022-07-13 08:53:36 +00:00
Bradley Smith a83aa33d1b [IR] Move vector.insert/vector.extract out of experimental namespace
These intrinsics are now fundamental for SVE code generation and have been
present for a year and a half, hence move them out of the experimental
namespace.

Differential Revision: https://reviews.llvm.org/D127976
2022-06-27 10:48:45 +00:00
David Green fb4d3d238f [AArch64] Remove unnecessary funnel shift sve costs.
D127680 added some unnecessary funnel shift costs for AArch64 to "match
the legacy behaviour". The default costs are closer to the correct
values and line up with the scalar/neon costs better. Remove the lines
again to clean up the code, they can be added back at a later date with
better values if needed.
2022-06-21 12:21:37 +01:00
Philip Reames db85345f2d [BasicTTI] Allow generic handling of scalable vector fshr/fshl
This change removes an explicit scalable vector bailout for fshl and fshr. This bailout was added in 60e4698b9a, when sinking an unconditional bailout for all intrinsics into selected cases. It's not clear whether the bailout was originally unneeded, or whether our cost model infrastructure has simply matured in the meantime. Either way, the generic code appears to handle scalable vectors without issue.

Note that the RISC-V cost model changes here aren't particularly interesting. They probably match the current lowering better, but the main point is to have coverage of the BasicTTI path and simply show the lack of crashing.

AArch64 costing was changed to preserve legacy behavior.  There will most likely be an upcoming change to use the generic costs there too, but I didn't want to make that change, not being particularly familiar with the target.

Differential Revision: https://reviews.llvm.org/D127680
2022-06-20 10:38:51 -07:00
Tiehu Zhang b329156f4f [AArch64][LV] AArch64 does not prefer vectorized addressing
TTI::prefersVectorizedAddressing() indicates whether to try to vectorize the addresses that lead to loads.
On AArch64, only gather/scatter (supported by SVE) can deal with vectors of addresses.
This patch specializes the hook for AArch64 to return true only when SVE is enabled.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D124612
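
The shape of the specialized hook is simple; a hedged sketch mirroring the description above, not necessarily the exact patch:

  bool AArch64TTIImpl::prefersVectorizedAddressing() const {
    // Only gather/scatter, an SVE feature, can consume a vector of
    // addresses; without SVE, vectorizing addresses buys nothing.
    return ST->hasSVE();
  }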
2022-06-17 18:32:50 +08:00
Jingu Kang bb82f74612 Revert "Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth""
This reverts commit 42ebfa8269.

The commit from https://reviews.llvm.org/D125918 has fixed the stage 2 build
failure.

Differential Revision: https://reviews.llvm.org/D118979
2022-05-23 16:15:45 +01:00
Bradley Smith 5f4541fefb [AArch64][SVE] Convert SRSHL to LSL when fed from an ABS intrinsic
Differential Revision: https://reviews.llvm.org/D125233
2022-05-19 14:07:59 +00:00
Florian Hahn 17a73992dd [AArch64] Remove redundant f{min,max}nm intrinsics.
The patch extends AArch64TTIImpl::instCombineIntrinsic to simplify
llvm.aarch64.neon.f{min,max}nm(a, a) -> a.

This helps with simplifying code written using the ACLE, e.g.
see https://godbolt.org/z/jYxsoc89c

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D125234
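
An example of the redundant pattern in ACLE code (compile for AArch64); after this patch the call folds away and the function returns `a` directly:

  #include <arm_neon.h>

  float32x4_t redundant_maxnm(float32x4_t a) {
    return vmaxnmq_f32(a, a); // llvm.aarch64.neon.fmaxnm(a, a) -> a
  }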
2022-05-10 19:57:43 +01:00
David Green dccc69a38d [AArch64] Add extra reverse costs.
This adds some extra costs for reverse shuffles under AArch64, filling
in the i16/f16/i8 gaps in the cost model.

Differential Revision: https://reviews.llvm.org/D124786
2022-05-06 18:23:36 +01:00
David Green 2dcb2d8562 [AArch64] Cost modelling for fptoi_sat
This builds on top of the target-independent cost model added in D124269
to add aarch64 specific costs for fptoui_sat and fptosi_sat intrinsics.
For many common types they will be legal instructions as the AArch64
instructions will saturate naturally. For unsupported pairs of integer
and floating point types, an additional min/max clamp is needed.

Differential Revision: https://reviews.llvm.org/D124357
2022-05-02 11:36:05 +01:00
David Kreitzer 6918a15f43 Test commit. Fixed a typo in a comment. 2022-04-29 16:18:09 -07:00
David Green 46cef9a82d [AArch64] Attempt to fix bots by ensuring legalized type is a vector 2022-04-27 15:36:15 +01:00
David Green 8e2a0e61f5 [AArch64] Break up larger shuffle-masks into legal sizes in getShuffleCost
Given a larger-than-legal shuffle mask, the final codegen will split
into multiple sub-vectors. This attempts to model that in
AArch64TTIImpl::getShuffleCost, splitting masks up according to the size
of the legalized vectors. If the sub-masks have at most 2 input sources
we can call getShuffleCost on them and sum the costs, to get a more
accurate final cost for the entire shuffle. The call to
improveShuffleKindFromMask helps to improve the shuffle kind for the
sub-mask cost call.

Differential Revision: https://reviews.llvm.org/D123414
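
A standalone sketch of the splitting idea (illustrative; the real code also tracks input sources and shuffle kinds per sub-mask):

  #include <algorithm>
  #include <cstddef>
  #include <vector>

  unsigned splitShuffleCost(const std::vector<int> &Mask,
                            std::size_t LegalWidth,
                            unsigned (*subMaskCost)(const std::vector<int> &)) {
    unsigned Cost = 0;
    for (std::size_t I = 0; I < Mask.size(); I += LegalWidth) {
      std::size_t End = std::min(I + LegalWidth, Mask.size());
      std::vector<int> Sub(Mask.begin() + I, Mask.begin() + End);
      Cost += subMaskCost(Sub); // cost each legal-sized sub-shuffle, then sum
    }
    return Cost;
  }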
2022-04-27 13:51:50 +01:00
David Green d6327050e0 [AArch64] Use PerfectShuffle costs in AArch64TTIImpl::getShuffleCost
Given a shuffle with 4 elements of size 16 or 32 bits, we can use the costs
directly from the PerfectShuffle tables to get a slightly more accurate
cost for the resulting shuffle.

Differential Revision: https://reviews.llvm.org/D123409
2022-04-27 12:09:01 +01:00
Vasileios Porpodas fa8a9fea47 Recommit "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"
This reverts commit 6a9bbd9f20.

Code review: https://reviews.llvm.org/D124202
2022-04-26 14:02:40 -07:00
Vasileios Porpodas 6a9bbd9f20 Revert "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"
This reverts commit 55ce296d6f.
2022-04-26 11:25:26 -07:00
Vasileios Porpodas 55ce296d6f [SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`
Before this patch `Args` was used by SLP to pass a broadcast's arguments.
This patch changes this. `Args` is now used for passing the operands of
the shuffle.

Differential Revision: https://reviews.llvm.org/D124202
2022-04-26 11:11:29 -07:00
Vasileios Porpodas 4e971efad4 Recommit "[SLP][AArch64] Implement lookahead operand reordering score of splat loads for AArch64"
This reverts commit 7052a0ad68.
2022-04-22 15:44:02 -07:00
Vasileios Porpodas 7052a0ad68 Revert "[SLP][AArch64] Implement lookahead operand reordering score of splat loads for AArch64"
This reverts commit 7ba702644b.
2022-04-22 08:24:04 -07:00
Vasileios Porpodas 7ba702644b [SLP][AArch64] Implement lookahead operand reordering score of splat loads for AArch64
The original patch (https://reviews.llvm.org/D121354) targets x86 and adjusts
the lookahead score of splat loads, as they can be done by the `movddup`
instruction that combines the load and the broadcast and is cheap to execute.

A similar issue shows up on AArch64. The `ld1r` instruction performs a broadcast
load and is cheap to execute.

This patch implements the TargetTransformInfo hooks for AArch64.

Differential Revision: https://reviews.llvm.org/D123638
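
An example of the pattern being scored (ACLE, compile for AArch64): the load and the broadcast combine into a single `ld1r`:

  #include <arm_neon.h>

  float32x4_t splat_load(const float *p) {
    return vld1q_dup_f32(p); // one ld1r: load + broadcast in one instruction
  }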
2022-04-22 07:29:58 -07:00
Muhammad Omair Javaid 42ebfa8269 Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"
This reverts commit 64b6192e81.

This broke LLVM AArch64 buildbot clang-aarch64-sve-vls-2stage:

https://lab.llvm.org/buildbot/#/builders/176/builds/1515

llvm-tblgen crashes after applying this patch.
2022-04-13 04:53:07 +05:00
David Green fa784f6382 [AArch64] Insert subvector costs
An insert subvector under aarch64 can often be done as a single lane mov
operation. For example a v4i8 inserted into a v16i8 is an s-reg mov, so
long as the index is a multiple of 4. This teaches the cost model that,
using code copied over from the X86 backend.

Some of the costs (v16i16_4_0) are still high because they get matched
as a SK_Select, not an SK_InsertSubvector. D120879 has some codegen
tests for inserting subvectors, which were added as
llvm/test/CodeGen/AArch64/insert-subvector.ll.

Differential Revision: https://reviews.llvm.org/D120880
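
An illustration of the cheap case (ACLE, with a hypothetical helper name): with the index a multiple of 4, a v4i8-into-v16i8 insert is one 32-bit lane mov once both sides are viewed as i32 lanes:

  #include <arm_neon.h>

  uint8x16_t insert_v4i8_at_4(uint8x16_t big, uint8x8_t small) {
    uint32x4_t b = vreinterpretq_u32_u8(big);
    uint32x2_t s = vreinterpret_u32_u8(small);
    b = vsetq_lane_u32(vget_lane_u32(s, 0), b, 1); // ins v0.s[1], v1.s[0]
    return vreinterpretq_u8_u32(b);
  }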
2022-04-07 19:27:41 +01:00
Jingu Kang 64b6192e81 [AArch64] Set maximum VF with shouldMaximizeVectorBandwidth
Set the maximum VF for AArch64 to 128 / the size of the smallest type in the loop.

Differential Revision: https://reviews.llvm.org/D118979
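
A worked example of the formula, assuming a 128-bit vector register:

  // Maximum VF = 128 / smallest element size in bits.
  unsigned maxVF(unsigned SmallestEltBits) {
    return 128 / SmallestEltBits; // i8 -> 16, i16 -> 8, i32 -> 4, i64 -> 2
  }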
2022-04-05 13:16:52 +01:00
David Green 750bf3582a [AArch64] Increase cost of v2i64 multiplies
The cost of a v2i64 multiply was special cased in D92208 as scalarized
into 4*extract + 2*insert + 2*mul. Scalarizing to/from GPR registers is
expensive though, and the cost wasn't high enough to prevent vectorizing
in places where it can be detrimental to performance. This patch increases it
so that the costs of copying to/from GPRs are 2 each, with
the total cost increasing to 14. So long as umull/smull are handled
correctly (as in D123006) this seems to lead to better vectorization
factors and better performance.

Differential Revision: https://reviews.llvm.org/D123007
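
The arithmetic implied by the description above:

  // 4 extracts and 2 inserts at GPR-copy cost 2 each, plus 2 scalar
  // multiplies at cost 1 each: 4*2 + 2*2 + 2*1 = 14.
  unsigned v2i64MulCost() {
    const unsigned CopyCost = 2, MulCost = 1;
    return 4 * CopyCost + 2 * CopyCost + 2 * MulCost;
  }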
2022-04-04 17:42:20 +01:00
David Green 2abaa027d9 [AArch64] Teach the costmodel about widening muls
A vector mul(sext, sext) or mul(zext, zext) will be code generated as a
single smull or umull instruction. This most notably affects v2i64
multiplies, which are otherwise not legal and need to be expanded.

The oneuse check has also been slightly changed, as it is already
checked from the use of isWideningInstruction in getCastInstrCost.

Differential Revision: https://reviews.llvm.org/D123006
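
An example of the pattern (ACLE, compile for AArch64): the sign-extends and the multiply become one `smull`:

  #include <arm_neon.h>

  int64x2_t widening_mul(int32x2_t a, int32x2_t b) {
    return vmull_s32(a, b); // mul(sext, sext) -> single smull
  }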
2022-04-04 12:45:04 +01:00
David Green 3c88ff44c5 [AArch64] Remove unused WideningBaseCost. NFC
The WideningBaseCost is always 0. This removes it to clean up the code.
2022-04-03 22:16:39 +01:00
Vasileios Porpodas 39aa202aff Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 3, fixed assertion crash.
Original review: https://reviews.llvm.org/D121354

This reverts commit e6ead19b77.
2022-03-23 18:32:17 -07:00
Arthur Eubanks e6ead19b77 Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash."
This reverts commit 27bd8f9492.

Causes crashes, see comments in D121973
2022-03-23 10:57:45 -07:00
Vasileios Porpodas 27bd8f9492 Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash.
Original review: https://reviews.llvm.org/D121354

This reverts commit f7d7d2a08d.
2022-03-22 16:41:55 -07:00
Arthur Eubanks f7d7d2a08d Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads.""
This reverts commit 79613185d3.

Causes crashes, see comments in https://reviews.llvm.org/D121973.
2022-03-22 13:33:49 -07:00
Vasileios Porpodas 79613185d3 Recommit "[SLP] Fix lookahead operand reordering for splat loads."
Original review: https://reviews.llvm.org/D121354

The original commit 9136145eb0 broke the build on several targets.

Differential Revision: https://reviews.llvm.org/D121973
2022-03-21 15:57:32 -07:00
Matt Devereau a9e08bc7c1 [AArch64][SVE] InstCombine llvm.aarch64.sve.sel to select
InstCombine llvm.aarch64.sve.sel to select. This allows an existing instCombine
added in 20b0fa91c9 to fire.

Differential Revision: https://reviews.llvm.org/D121792
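
An abbreviated LLVM-style sketch of the combine (hedged; helper name is illustrative): the intrinsic is a predicated select, so it can be rewritten as a plain IR select that later combines can match:

  #include "llvm/IR/Instructions.h"
  #include "llvm/IR/IntrinsicInst.h"
  using namespace llvm;

  // llvm.aarch64.sve.sel(pg, a, b) is just a predicated select.
  static Instruction *combineSVESel(IntrinsicInst &II) {
    Value *Pg = II.getArgOperand(0);
    Value *A = II.getArgOperand(1);
    Value *B = II.getArgOperand(2);
    return SelectInst::Create(Pg, A, B);
  }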
2022-03-17 16:20:48 +00:00
Florian Hahn aa590e5823 [AArch64] Improve costs for some conversions to fp16.
Currently the cost model under-estimates the cost of certain
FP16 conversions.

This patch updates getCastInstrCost to return more accurate costs for
the cases improved in c2ed9fd054.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D113700
2022-03-11 10:27:39 +00:00
David Green 47f4cd9c3d [AArch64] Update costs for some fp16 converts
This updates the costs for FP16 converts, as some of them were pretty
high.

Differential Revision: https://reviews.llvm.org/D120771
2022-03-03 11:17:24 +00:00
David Green 65c0e45a37 [AArch64] Vector shifts cost 1
The costs of vector shifts were 2 as opposed to 1, as the nodes are
marked custom. Fix this like the others and mark the nodes as cheap.

Differential Revision: https://reviews.llvm.org/D120773
2022-03-03 10:42:57 +00:00
Sander de Smalen 0b41238ae7 [AArch64] Emit TBAA metadata for SVE load/store intrinsics
In Clang we can attach TBAA metadata to the load/store intrinsics
based on the operation's element type.

This also contains changes to InstCombine where the AArch64-specific
intrinsics are transformed into generic LLVM load/store operations,
to ensure that all metadata is transferred to the new instruction.

There will be some further work after this patch to also emit TBAA
metadata for SVE's gather/scatter- and struct load/store intrinsics.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D119319
2022-02-11 09:00:29 +00:00
Nikita Popov 3196ef8ee2 [AArch64TargetTransformInfo] Avoid pointer element type access
Use the element type of the gathered/scattered vector instead.
2022-02-08 15:18:18 +01:00
Florian Hahn 17ebd68ae6 [AArch64] Fix costs of float vector compare/select pairs.
The current cost-model overestimates the cost of vector compares &
selects for ordered floating point compares. This patch fixes that by
extending the existing logic for integer predicates.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D118256
2022-01-31 10:18:29 +00:00
Alban Bridonneau 2feddb37b4 Implement correct cost for SVE bitcasts
We have some bitcasts which we know will be simplified,
so their cost is zero.

Reviewed By: david-arm, sdesmalen

Differential Revision: https://reviews.llvm.org/D118019
2022-01-26 14:25:44 +00:00
Matt Devereau cee8b255be [AArch64][SVE] Remove Redundant aarch64.sve.convert.to.svbool
Generated code resulted in redundant aarch64.sve.convert.to.svbool
calls for AArch64 binary operations. Narrow the more precise operands
instead of widening the less precise operands.

Differential Revision: https://reviews.llvm.org/D116730
2022-01-17 14:35:24 +00:00
David Green 61888d97f6 [AArch64] Basic demand elements for some intrinsics
A lot of NEON intrinsics work lane-wise, meaning that elements not
demanded in the inputs are not demanded in the output. This teaches that to
AArch64TTIImpl::simplifyDemandedVectorEltsIntrinsic for some simple
single-input truncate intrinsics, which can help remove unnecessary
instructions.

Differential Revision: https://reviews.llvm.org/D117097
2022-01-13 11:53:12 +00:00
David Sherwood ef1ca4d3e9 [AArch64] Fix incorrect use of MVT::getVectorNumElements in AArch64TTIImpl::getVectorInstrCost
If we are inserting into or extracting from a scalable vector we do
not know the number of elements at runtime, so we can only let the
index wrap for fixed-length vectors.

Tests added here:

  Analysis/CostModel/AArch64/sve-insert-extract.ll

Differential Revision: https://reviews.llvm.org/D117099
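
A standalone sketch of the fixed logic (illustrative): the element count is only a compile-time constant for fixed-length vectors, so only then can an out-of-range index be wrapped:

  #include <optional>

  std::optional<unsigned> wrapIndex(unsigned Index, bool IsScalable,
                                    unsigned MinNumElts) {
    if (IsScalable)
      return std::nullopt; // element count unknown until runtime
    return Index % MinNumElts;
  }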
2022-01-13 09:27:14 +00:00
David Green bc615e436c [AArch64] Update addo and subo costs
Similar to D116732, this adds basic scalar sadd_with_overflow,
uadd_with_overflow, ssub_with_overflow and usub_with_overflow costs for
aarch64, which are usually quite efficiently lowered.

Differential Revision: https://reviews.llvm.org/D116734
2022-01-07 16:20:23 +00:00
David Green c65270cf96 [AArch64] Add basic umulo and smulo costs
This adds some AArch64 specific smul_with_overflow and umul_with_overflow
costs, overriding the default costs. The code generation for these mul
with overflow intrinsics is usually better than the default expansion on
AArch64. The costs come from https://godbolt.org/z/zEzYhMWqo with various
types, or llvm/test/CodeGen/AArch64/arm64-xaluo.ll.

Differential Revision: https://reviews.llvm.org/D116732
2022-01-06 17:22:47 +00:00
Matthew Devereau e00f22c1b1 [AArch64][SVE] Teach cost model that masked loads/stores are cheap
Reduce the cost of VLS masked loads/stores to make the vectorizer emit them more frequently.
2021-12-17 15:04:45 +00:00