Matches znver1/2 numbers from AMD SoG + Agner - no additional uops for folded instructions, and znver1 double-pumps 256-bit vectors.

Matches skylake/icelake throughput numbers from Intel AoM + Agner/instlatx64.

Noticed while adding fdiv CostKinds support.