This takes the existing SVE costing for the various min/max reduction
intrinsics and expands it to NEON, where I believe it applies equally
well.
In the process it changes the lowering to use min/max cost, as opposed
to summing up the cost of ICmp+Select.
Differential Revision: https://reviews.llvm.org/D106239
This adds some missing single source shuffle costs for AArch64, of i16
and i8 vectors. v4i16 are the same as v4i32 with a worse case cost of 3
coming from the perfect shuffle tables. The larger vector sizes expand
into a constant pool, plus a load (and adrp) and a tbl. I arbitrarily
chose 8 for the cost to be expensive but not too expensive.
Differential Revision: https://reviews.llvm.org/D106241
This changes the cost to (LT.first-1) * cost(add) + 2, where the cost of
an add is assumed to be 1. This brings it inline with the other
reductions.
Differential Revision: https://reviews.llvm.org/D106240
This patch uses the mechanism from D62995 to strengthen the
definitions of the reduction intrinsics by letting the scalar
result/accumulator type be overloaded from the vector element type.
For example:
; The LLVM LangRef specifies that the scalar result must equal the
; vector element type, but this is not checked/enforced by LLVM.
declare i32 @llvm.experimental.vector.reduce.or.i32.v4i32(<4 x i32> %a)
This patch changes that into:
declare i32 @llvm.experimental.vector.reduce.or.v4i32(<4 x i32> %a)
Which has the type-constraint more explicit and causes LLVM to check
the result type with the vector element type.
Reviewers: RKSimon, arsenm, rnk, greened, aemerson
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D62996
llvm-svn: 363240
Summary: The comment says we need 3 extracts and a select at the end. But didn't we just account for the select in the vector cost above. Aren't we just extracting the single element after taking the min/max in the vector register?
Reviewers: RKSimon, spatel, ABataev
Reviewed By: RKSimon
Subscribers: javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D55480
llvm-svn: 348739
We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type.
For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed.
Fixes PR37731
Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997
Differential Revision: https://reviews.llvm.org/D54585
llvm-svn: 348076
This reverts commit r346970.
It was causing PR39774, a crash in slp-vectorizer on a rather simple loop
with just a bunch of 'and's in the body.
llvm-svn: 347541
We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type.
For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed.
Fixes PR37731
Differential Revision: https://reviews.llvm.org/D54585
llvm-svn: 346970
Correct costings of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector.
Unlike the rest of the shuffle kinds this means that the main Ty argument represents the source vector type not the destination!
I've done my best to fix a number of vectorizer uses:
SLP - the reduction epilogue costs should be using a SK_PermuteSingleSrc shuffle as these all occur at the hardware vector width - we're not extracting (illegal) subvector types. This is causing the cost model diffs as SK_ExtractSubvector costs are poorly handled and tend to just return 1 at the moment.
LV - I'm not clear on what the SK_ExtractSubvector should represents for recurrences - I've used a <1 x ?> subvector extraction as that seems to match the VF delta.
Differential Revision: https://reviews.llvm.org/D53573
llvm-svn: 345617
This patch provides an implementation of getArithmeticReductionCost for
AArch64. We can specialize the cost of add reductions since they are computed
using the 'addv' instruction.
Differential Revision: https://reviews.llvm.org/D44490
llvm-svn: 327702
This patch considers the experimental vector reduce intrinsics in the default
implementation of getIntrinsicInstrCost. The cost of these intrinsics is
computed with getArithmeticReductionCost and getMinMaxReductionCost. This patch
also adds a test case for AArch64 that indicates the costs we currently compute
for vector reduce intrinsics. These costs are inaccurate and will be updated in
a follow-on patch.
Differential Revision: https://reviews.llvm.org/D44489
llvm-svn: 327698