The getOrderedReductionCost implementation introduced in D105432 calls the CRTP base version getArithmeticInstrCost instead of the redirecting to the target version.
Differential Revision: https://reviews.llvm.org/D106795
I have added a new FastMathFlags parameter to getArithmeticReductionCost
to indicate what type of reduction we are performing:
1. Tree-wise. This is the typical fast-math reduction that involves
continually splitting a vector up into halves and adding each
half together until we get a scalar result. This is the default
behaviour for integers, whereas for floating point we only do this
if reassociation is allowed.
2. Ordered. This now allows us to estimate the cost of performing
a strict vector reduction by treating it as a series of scalar
operations in lane order. This is the case when FP reassociation
is not permitted. For scalable vectors this is more difficult
because at compile time we do not know how many lanes there are,
and so we use the worst case maximum vscale value.
I have also fixed getTypeBasedIntrinsicInstrCost to pass in the
FastMathFlags, which meant fixing up some X86 tests where we always
assumed the vector.reduce.fadd/mul intrinsics were 'fast'.
New tests have been added here:
Analysis/CostModel/AArch64/reduce-fadd.ll
Analysis/CostModel/AArch64/sve-intrinsics.ll
Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll
Differential Revision: https://reviews.llvm.org/D105432
I'm not sure if/how this ever worked, but it must not be tested
currently because the basic tests added here were crashing as
noted in the post-review comments for 1c83716 (which reverted
another cost-model fix in 22d10b8ab4).