Commit Graph

283 Commits

Author SHA1 Message Date
Cullen Rhodes 7c3cda551a [AArch64][SVE] Prefer SIMD&FP variant of clast[ab]
The scalar variant with GPR source/dest has considerably higher latency
than the SIMD&FP scalar variant across a variety of micro-architectures:

  Core           Scalar    SIMD&FP
  --------------------------------
  Neoverse V1     9 cyc      3 cyc
  Neoverse N2     8 cyc      3 cyc
  Cortex A510     8 cyc      4 cyc
  A64FX          29 cyc      6 cyc
2022-07-13 08:53:36 +00:00
Bradley Smith a83aa33d1b [IR] Move vector.insert/vector.extract out of experimental namespace
These intrinsics are now fundemental for SVE code generation and have been
present for a year and a half, hence move them out of the experimental
namespace.

Differential Revision: https://reviews.llvm.org/D127976
2022-06-27 10:48:45 +00:00
David Green fb4d3d238f [AArch64] Remove unnecessary funnel shift sve costs.
D127680 added some unnecessary funnel shift costs for AArch64 to "match
the legacy behaviour". The default costs are closer to the correct
values and line up with the scalar/neon costs better. Remove the lines
again to clean up the code, they can be added back at a later date with
better values if needed.
2022-06-21 12:21:37 +01:00
Philip Reames db85345f2d [BasicTTI] Allow generic handling of scalable vector fshr/fshl
This change removes an explicit scalable vector bailout for fshl and fshr. This bailout was added in 60e4698b9a, when sinking a unconditional bailout for all intrinsics into selected cases. Its not clear if the bailout was originally unneeded, or if our cost model infrastructure has simply matured in the meantime. Either way, the generic code appears to handle scalable vectors without issue.

Note that the RISC-V cost model changes here aren't particularly interesting. They do probably better match the current lowering, but the main point is to have coverage of the BasicTTI path and simply show lack of crashing.

AArch64 costing was changed to preserve legacy behavior.  There will most likely be an upcoming change to use the generic costs there too, but I didn't want to make that change not being particularly familiar with the target.

Differential Revision: https://reviews.llvm.org/D127680
2022-06-20 10:38:51 -07:00
Tiehu Zhang b329156f4f [AArch64][LV] AArch64 does not prefer vectorized addressing
TTI::prefersVectorizedAddressing() try to vectorize the addresses that lead to loads.
For aarch64, only gather/scatter (supported by SVE) can deal with vectors of addresses.
This patch specializes the hook for AArch64, to return true only when we enable SVE.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D124612
2022-06-17 18:32:50 +08:00
Jingu Kang bb82f74612 Revert "Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth""
This reverts commit 42ebfa8269.

The commmit from https://reviews.llvm.org/D125918 has fixed the stage 2 build
failure.

Differential Revision: https://reviews.llvm.org/D118979
2022-05-23 16:15:45 +01:00
Bradley Smith 5f4541fefb [AArch64][SVE] Convert SRSHL to LSL when the fed from an ABS intrinsic
Differential Revision: https://reviews.llvm.org/D125233
2022-05-19 14:07:59 +00:00
Florian Hahn 17a73992dd
[AArch64] Remove redundant f{min,max}nm intrinsics.
The patch extends AArch64TTIImpl::instCombineIntrinsic to simplify
llvm.aarch64.neon.f{min,max}nm(a, a) -> a.

This helps with simplifying code written using the ACLE, e.g.
see https://godbolt.org/z/jYxsoc89c

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D125234
2022-05-10 19:57:43 +01:00
David Green dccc69a38d [AArch64] Add extra reverse costs.
This adds some extra costs for reverse shuffles under AArch64, filling
in the i16/f16/i8 gaps in the cost model.

Differential Revision: https://reviews.llvm.org/D124786
2022-05-06 18:23:36 +01:00
David Green 2dcb2d8562 [AArch64] Cost modelling for fptoi_sat
This builds on top of the target-independent cost model added in D124269
to add aarch64 specific costs for fptoui_sat and fptosi_sat intrinsics.
For many common types they will be legal instructions as the AArch64
instructions will saturate naturally. For unsupported pairs of integer
and floating point types, an additional min/max clamp is needed.

Differential Revision: https://reviews.llvm.org/D124357
2022-05-02 11:36:05 +01:00
David Kreitzer 6918a15f43 Test commit. Fixed a typo in a comment. 2022-04-29 16:18:09 -07:00
David Green 46cef9a82d [AArch64] Attempt to fix bots by ensuring legalized type is a vector 2022-04-27 15:36:15 +01:00
David Green 8e2a0e61f5 [AArch64] Break up larger shuffle-masks into legal sizes in getShuffleCost
Given a larger-than-legal shuffle mask, the final codegen will split
into multiple sub-vectors. This attempts to model that in
AArch64TTIImpl::getShuffleCost, splitting masks up according to the size
of the legalized vectors. If the sub-masks have at most 2 input sources
we can call getShuffleCost on them and sum the costs, to get a more
accurate final cost for the entire shuffle. The call to
improveShuffleKindFromMask helps to improve the shuffle kind for the
sub-mask cost call.

Differential Revision: https://reviews.llvm.org/D123414
2022-04-27 13:51:50 +01:00
David Green d6327050e0 [AArch64] Use PerfectShuffle costs in AArch64TTIImpl::getShuffleCost
Given a shuffle with 4 elements size 16 or 32, we can use the costs
directly from the PerfectShuffle tables to get a slightly more accurate
cost for the resulting shuffle.

Differential Revision: https://reviews.llvm.org/D123409
2022-04-27 12:09:01 +01:00
Vasileios Porpodas fa8a9fea47 Recommit "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"
This reverts commit 6a9bbd9f20.

Code review: https://reviews.llvm.org/D124202
2022-04-26 14:02:40 -07:00
Vasileios Porpodas 6a9bbd9f20 Revert "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"
This reverts commit 55ce296d6f.
2022-04-26 11:25:26 -07:00
Vasileios Porpodas 55ce296d6f [SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`
Before this patch `Args` was used to pass a broadcat's arguments by SLP.
This patch changes this. `Args` is now used for passing the operands of
the shuffle.

Differential Revision: https://reviews.llvm.org/D124202
2022-04-26 11:11:29 -07:00
Vasileios Porpodas 4e971efad4 Recommit "[SLP][AArch64] Implement lookahead operand reordering score of splat loads for AArch64"
This reverts commit 7052a0ad68.
2022-04-22 15:44:02 -07:00
Vasileios Porpodas 7052a0ad68 Revert "[SLP][AArch64] Implement lookahead operand reordering score of splat loads for AArch64"
This reverts commit 7ba702644b.
2022-04-22 08:24:04 -07:00
Vasileios Porpodas 7ba702644b [SLP][AArch64] Implement lookahead operand reordering score of splat loads for AArch64
The original patch (https://reviews.llvm.org/D121354) targets x86 and adjusts
the lookahead score of splat loads ad they can be done by the `movddup`
instruction that combines the load and the broadcast and is cheap to execute.

A similar issue shows up on AArch64. The `ld1r` instruction performs a broadcast
load and is cheap to execute.

This patch implements the TargetTransformInfo hooks for AArch64.

Differential Revision: https://reviews.llvm.org/D123638
2022-04-22 07:29:58 -07:00
Muhammad Omair Javaid 42ebfa8269 Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"
This reverts commit 64b6192e81.

This broke LLVM AArch64 buildbot clang-aarch64-sve-vls-2stage:

https://lab.llvm.org/buildbot/#/builders/176/builds/1515

llvm-tblgen crashes after applying this patch.
2022-04-13 04:53:07 +05:00
David Green fa784f6382 [AArch64] Insert subvector costs
An insert subvector under aarch64 can often be done as a single lane mov
operation. For example a v4i8 inserted into a v16i8 is a s-reg mov, so
long as the index is a multiple of 4. This teaches the cost model that,
using code copied over from the X86 backend.

Some of the costs (v16i16_4_0) are still high because they get matched
as a SK_Select, not an SK_InsertSubvector. D120879 has some codegen
tests for inserting subvectors, which I were added as
llvm/test/CodeGen/AArch64/insert-subvector.ll.

Differential Revision: https://reviews.llvm.org/D120880
2022-04-07 19:27:41 +01:00
Jingu Kang 64b6192e81 [AArch64] Set maximum VF with shouldMaximizeVectorBandwidth
Set the maximum VF of AArch64 with 128 / the size of smallest type in loop.

Differential Revision: https://reviews.llvm.org/D118979
2022-04-05 13:16:52 +01:00
David Green 750bf3582a [AArch64] Increase cost of v2i64 multiplies
The cost of a v2i64 multiply was special cased in D92208 as scalarized
into 4*extract + 2*insert + 2*mul. Scalarizing to/from gpr registers are
expensive though, and the cost wasn't high enough to prevent vectorizing
in places where it can be detrimental for performance. This increases it
so that the costs of copying to/from GPRs is increased to 2 each, with
the total cost increasing to 14. So long as umull/smull are handled
correctly (as in D123006) this seems to lead to better vectorization
factors and better performance.

Differential Revision: https://reviews.llvm.org/D123007
2022-04-04 17:42:20 +01:00
David Green 2abaa027d9 [AArch64] Teach the costmodel about widening muls
A vector mul(sext, sext) or mul(zext, zext) will be code generated as a
single smull or umull instruction. This most notably effects v2i64
multiplies, which are otherwise not legal and need to be expanded.

The oneuse check has also been slightly changed, as it is already
checked from the use of isWideningInstruction in getCastInstrCost.

Differential Revision: https://reviews.llvm.org/D123006
2022-04-04 12:45:04 +01:00
David Green 3c88ff44c5 [AArch64] Remove unsued WideningBaseCost. NFC
The WideningBaseCost is always 0. This removes it to clean up the code.
2022-04-03 22:16:39 +01:00
Vasileios Porpodas 39aa202aff Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 3, fixed assertion crash.
Original review: https://reviews.llvm.org/D121354

This reverts commit e6ead19b77.
2022-03-23 18:32:17 -07:00
Arthur Eubanks e6ead19b77 Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash."
This reverts commit 27bd8f9492.

Causes crashes, see comments in D121973
2022-03-23 10:57:45 -07:00
Vasileios Porpodas 27bd8f9492 Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash.
Original review: https://reviews.llvm.org/D121354

This reverts commit f7d7d2a08d.
2022-03-22 16:41:55 -07:00
Arthur Eubanks f7d7d2a08d Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads.""
This reverts commit 79613185d3.

Causes crashes, see comments in https://reviews.llvm.org/D121973.
2022-03-22 13:33:49 -07:00
Vasileios Porpodas 79613185d3 Recommit "[SLP] Fix lookahead operand reordering for splat loads."
Original review: https://reviews.llvm.org/D121354

The original commit 9136145eb0 broke the build on several targets.

Differential Revision: https://reviews.llvm.org/D121973
2022-03-21 15:57:32 -07:00
Matt Devereau a9e08bc7c1 [AArch64][SVE] InstCombine llvm.aarch64.sve.sel to select
InstCombine llvm.aarch64.sve.sel to select. This allows an existing instCombine
added in 20b0fa91c9 to fire.

Differential Revision: https://reviews.llvm.org/D121792
2022-03-17 16:20:48 +00:00
Florian Hahn aa590e5823
[AArch64] Improve costs for some conversions to fp16.
Currently the cost model under-estimates the cost of certain
FP16 conversions.

This patch updates getCastInstrCost to return more accurate costs for
the cases improved in c2ed9fd054.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D113700
2022-03-11 10:27:39 +00:00
David Green 47f4cd9c3d [AArch64] Update costs for some fp16 converts
This updates the costs for FP16 converts, as some of them were pretty
high.

Differential Revision: https://reviews.llvm.org/D120771
2022-03-03 11:17:24 +00:00
David Green 65c0e45a37 [AArch64] Vector shifts cost 1
The costs of vector shifts was 2 as opposed to 1, as the nodes are
marked custom. Fix this like the others and mark the nodes as cheap.

Differential Revision: https://reviews.llvm.org/D120773
2022-03-03 10:42:57 +00:00
Sander de Smalen 0b41238ae7 [AArch64] Emit TBAA metadata for SVE load/store intrinsics
In Clang we can attach TBAA metadata based on the load/store intrinsics
based on the operation's element type.

This also contains changes to InstCombine where the AArch64-specific
intrinsics are transformed into generic LLVM load/store operations,
to ensure that all metadata is transferred to the new instruction.

There will be some further work after this patch to also emit TBAA
metadata for SVE's gather/scatter- and struct load/store intrinsics.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D119319
2022-02-11 09:00:29 +00:00
Nikita Popov 3196ef8ee2 [AArch64TargetTransformInfo] Avoid pointer element type access
Use the element type of the gathered/scattered vector instead.
2022-02-08 15:18:18 +01:00
Florian Hahn 17ebd68ae6
[AArch64] Fix costs of float vector compare/selects pairs.
The current cost-model overestimates the cost of vector compares &
selects for ordered floating point compares. This patch fixes that by
extending the existing logic for integer predicates.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D118256
2022-01-31 10:18:29 +00:00
Alban Bridonneau 2feddb37b4 Implement correct cost for SVE bitcasts
We have some bitcasts which we know will be simplified,
so their cost is zero.

Reviewed By: david-arm, sdesmalen

Differential Revision: https://reviews.llvm.org/D118019
2022-01-26 14:25:44 +00:00
Matt Devereau cee8b255be [AArch64][SVE] Remove Redundant aarch64.sve.convert.to.svbool
Generated code resulted in redundant aarch64.sve.convert.to.svbool
calls for AArch64 Binary Operations. Narrow the more precise operands
instead of widening the less precise operands

Differential Revision: https://reviews.llvm.org/D116730
2022-01-17 14:35:24 +00:00
David Green 61888d97f6 [AArch64] Basic demand elements for some intrinsics
A lot of neon intrinsics work lane-wise, meaning that non-demanded
elements in and not demanded out. This teaches that to
AArch64TTIImpl::simplifyDemandedVectorEltsIntrinsic for some simple
single-input truncate intrinsics, which can help remove unnecessary
instructions.

Differential Revision: https://reviews.llvm.org/D117097
2022-01-13 11:53:12 +00:00
David Sherwood ef1ca4d3e9 [AArch64] Fix incorrect use of MVT::getVectorNumElements in AArch64TTIImpl::getVectorInstrCost
If we are inserting into or extracting from a scalable vector we do
not know the number of elements at runtime, so we can only let the
index wrap for fixed-length vectors.

Tests added here:

  Analysis/CostModel/AArch64/sve-insert-extract.ll

Differential Revision: https://reviews.llvm.org/D117099
2022-01-13 09:27:14 +00:00
David Green bc615e436c [AArch64] Update addo and subo costs
Similar to D116732, this adds basic scalar sadd_with_overflow,
uadd_with_overflow, ssub_with_overflow and usub_with_overflow costs for
aarch64, which are usually quite efficiently lowered.

Differential Revision: https://reviews.llvm.org/D116734
2022-01-07 16:20:23 +00:00
David Green c65270cf96 [AArch64] Add basic umulo and smulo costs
This adds some AArch64 specific smul_with_overflow and umul_with_overflow
costs, overriding the default costs. The code generation for these mul
with overflow intrinsics is usually better than the default expansion on
AArch64. The costs come from https://godbolt.org/z/zEzYhMWqo with various
types, or llvm/test/CodeGen/AArch64/arm64-xaluo.ll.

Differential Revision: https://reviews.llvm.org/D116732
2022-01-06 17:22:47 +00:00
Matthew Devereau e00f22c1b1 [AArch64][SVE] Teach cost model that masked loads/stores are cheap
Reduce the cost of VLS masked loads/stores to make the vectorizor emit them more frequently.
2021-12-17 15:04:45 +00:00
Matt Devereau fb47725d14 [AArch64][SVE] Instcombine SDIV to ASRD
Instcombine SDIV to ASRD when the third operand of SDIV is a power of 2

Differential Revision: https://reviews.llvm.org/D115448
2021-12-14 15:58:28 +00:00
David Sherwood 8b0448ce5d [AArch64][Analysis] Add on overhead costs for SVE gathers and scatters
This patch adds on an overhead cost for gathers and scatters, which
is a rough estimate based on performance investigations I have
performed on SVE hardware for various micro-benchmarks.

Differential Revision: https://reviews.llvm.org/D115143
2021-12-09 16:02:59 +00:00
Paul Walker 01bc67e449 [SVE][InstCombine] Support more cases where ld1/st1 can be lowered to load/store instructions.
This patch extends the "is all active predicate" check to cover
cases where the predicate is casted but in a way that doesn't
change its "all active" status.

Differential Revision: https://reviews.llvm.org/D115047
2021-12-08 11:01:33 +00:00
Igor Kirillov 08d45e6f4d [AArch64][SVEIntrinsicOpts] Fix: predicated SVE mul/fmul are not commutative
We can not swap multiplicand and multiplier because the sve intrinsics
are predicated. Imagine lanes in vectors having the following values:
         pg = 0
         multiplicand = 1 (from dup)
         multiplier = 2
The resulting value should be 1, but if we swap multiplicand and multiplier it will become 2,
which is incorrect.

Differential Revision: https://reviews.llvm.org/D114577
2021-11-26 12:41:27 +00:00
Rosie Sumpter c2441b6b89 [LoopVectorize] Add vector reduction support for fmuladd intrinsic
Enables LoopVectorize to handle reduction patterns involving the
llvm.fmuladd intrinsic.

Differential Revision: https://reviews.llvm.org/D111555
2021-11-24 08:50:04 +00:00