We seem to be inheriting the cost from SSE4.1. But if we have 256-bit registers, we should be able to do this with just one extract to split the v16i16 and two v8i16->v8i32 operations, so our cost should be 3, not 4.

Differential Revision: https://reviews.llvm.org/D73646
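The cost argument above can be sketched as simple instruction counting. This is only an illustration of the arithmetic in the commit message, not LLVM's actual cost-table code: the `ExtendLowering` struct, the `cost` function, and the constant names are hypothetical, and the model assumes each extract or extend counts as one unit.

```cpp
// Hypothetical model of the reasoning in the commit message.
// None of these names come from LLVM; each instruction is assumed to cost 1.
struct ExtendLowering {
    int extracts;  // subvector extracts needed to split the v16i16 source
    int extends;   // v8i16 -> v8i32 extend operations
};

// Total cost is the number of instructions in the lowering.
constexpr int cost(ExtendLowering lowering) {
    return lowering.extracts + lowering.extends;
}

// Value the commit message says was being inherited from SSE4.1.
constexpr int kInheritedCost = 4;

// With 256-bit registers: one extract to split the source, two extends.
constexpr ExtendLowering kWith256BitRegs{1, 2};

static_assert(cost(kWith256BitRegs) == 3,
              "one extract plus two extends should cost 3");
static_assert(cost(kWith256BitRegs) < kInheritedCost,
              "the proposed cost is lower than the inherited one");
```

The two `static_assert` lines check the claim at compile time: the proposed lowering counts three instructions, one fewer than the inherited value of 4.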
| Name |
|---|
| AArch64 |
| AMDGPU |
| ARM |
| PowerPC |
| RISCV |
| SystemZ |
| X86 |
| no_info.ll |