Our actual lowering for i1 reductions uses ctpop combined with possibly a vector negate and possibly a logic op afterwards. I believe ctpop to be low cost on all reasonable hardware.
The default costing implementation here was returning quite inconsistent costs. and/or were returning very high costs (because we seem to think moving into scalar registers is very expensive?) and others were returning lower but still too high (because of the assumed tree reduce strategy). While we should probably improve the generic costing strategy for i1 vectors, let's start by fixing the immediate problem.
Differential Revision: https://reviews.llvm.org/D127511
Apparently, update_analyze_test_checks.sh does *not* warn on conflicting CHECKs, it just silently drops those lines from the generated test. That is.. less than helpful.
The generic cost of logical or/and reductions should be cost of bitcast
<ReduxWidth x i1> to iReduxWidth + cmp eq|ne iReduxWidth.
Differential Revision: https://reviews.llvm.org/D97961