This is based on the same idea that I am using for the basic model implementation
and what I have partly already done for x86: throughput cost is number of
instructions/uops, so size/blended costs are identical except in special cases
(for example, fdiv or other known-expensive machine instructions or things like
MVE that may require cracking into >1 uop)).
Differential Revision: https://reviews.llvm.org/D90692
This is a modified 2nd try of 22d10b8ab4
(reverted by 1c8371692d because it managed
to expose an existing crashing bug that should be fixed by
74ffc823 ).
Original commit message:
This is similar in spirit to 01ea93d85d (memcpy) except that
here the underlying caller assumptions were created for vectorizer
use (throughput) rather than other passes.
That meant targets could have an enormous throughput cost with no
corresponding size, latency, or blended cost increase.
The ARM costs show a small difference between throughput and
size because there's an underlying difference in cmp/sel
costs that is also predicated on cost-kind.
Paraphrasing from the previous commits:
This may not make sense for some callers, but at least now the
costs will be consistently wrong instead of mysteriously wrong.
Targets should provide better overrides if the current modeling
is not accurate.
This is similar in spirit to 01ea93d85d (memcpy) except that
here the underlying caller assumptions were created for vectorizer
use (throughput) rather than other passes.
That meant targets could have an enormous throughput cost with no
corresponding size, latency, or blended cost increase.
The ARM costs show a small difference between throughput and
size because there's an underlying difference in cmp/sel
costs that is also predicated on cost-kind.
Paraphrasing from the previous commits:
This may not make sense for some callers, but at least now the
costs will be consistently wrong instead of mysteriously wrong.
Targets should provide better overrides if the current modeling
is not accurate.
As with other targets, set the throughput cost of control-flow
instructions to free so that we don't miss out of vectorization
opportunities.
Differential Revision: https://reviews.llvm.org/D85283
Have BasicTTI call the base implementation so that both agree on the
default behaviour, which the default being a cost of '1'. This has
required an X86 specific implementation as it seems to be very
reliant on those instructions being free. Changes are also made to
AMDGPU so that their implementations distinguish between cost kinds,
so that the unrolling isn't affected. PowerPC also has its own
implementation to prevent changes to the reg-usage vectorizer test.
The cost model test changes now reflect that ret instructions are not
generally free.
Differential Revision: https://reviews.llvm.org/D79164