Based on the Agner and AMD SoG tables, the XOP VPHADD/VPHSUB unary horizontal ops are as fast as basic arithmetic ops, unlike the slower SSSE3 binary horizontal add/sub ops. This also matches what the bdver2 model already lists. Noticed while investigating reduction add optimizations.