The scalarization overhead was set deliberately high for MVE, whilst the
codegen was new. It helps protect us against the negative ramifications
of mixing scalar and vector instructions. This decreases that,
especially for floating point where the cost of extracting/inserting
lane elements can be low. For integer the cost is still fairly high due
to the cross-register-bank copy, but is no longer n^2 in the length of
the vector.
In general, this will decrease the cost of scalarizing floats and long
integer vectors. i64 increase in cost, having a high cost before and
after this patch. For floats this allows up to start doing things like
vectorizing fdiv instructions, even if they are scalarized.
Differential Revision: https://reviews.llvm.org/D98245
This adds basic MVE costs for SMIN/SMAX/UMIN/UMAX, as well as MINNUM and
MAXNUM representing fmin and fmax. It tightens up the costs, not using a
ICmp+Select cost.
Differential Revision: https://reviews.llvm.org/D96603