Summary of changes:
- Remove dst for global_atomic_add_f32, global_atomic_pk_add_f16.
- Make vdata input-only for buffer_atomic_add_f32, buffer_atomic_pk_add_f16.
- Other minor improvements.