llvm-project

Sanjay Patel 05aadf885d [InstCombine] reverse 'trunc X to <N x i1>' canonicalization; 2nd try

Re-trying r344082 because it unintentionally included extra diffs.

Original commit message:
icmp ne (and X, 1), 0 --> trunc X to N x i1

Ideally, we'd do the same for scalars, but there will likely be
regressions unless we add more trunc folds as we're doing here
for vectors.

The motivating vector case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

  define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {
    %c = fcmp ole <4 x float> %x, %y
    %s = sext <4 x i1> %c to <4 x i32>
    %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
    %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
    %cond = or <4 x i32> %s1, %s2
    %condtr = trunc <4 x i32> %cond to <4 x i1>
    %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
    ret <4 x float> %r
  }

Here's a sampling of the vector codegen for that case using
mask+icmp (current behavior) vs. trunc (with this patch):

AVX before:

  vcmpleps %xmm1, %xmm0, %xmm0
  vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
  vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
  vorps %xmm0, %xmm1, %xmm0
  vandps LCPI0_0(%rip), %xmm0, %xmm0
  vxorps %xmm1, %xmm1, %xmm1
  vpcmpeqd %xmm1, %xmm0, %xmm0
  vblendvps %xmm0, %xmm3, %xmm2, %xmm0

AVX after:

  vcmpleps %xmm1, %xmm0, %xmm0
  vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
  vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
  vorps %xmm0, %xmm1, %xmm0
  vblendvps %xmm0, %xmm2, %xmm3, %xmm0

AVX512f before:

  vcmpleps %xmm1, %xmm0, %xmm0
  vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
  vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
  vorps %xmm0, %xmm1, %xmm0
  vpbroadcastd LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
  vptestnmd %zmm1, %zmm0, %k1
  vblendmps %zmm3, %zmm2, %zmm0 {%k1}

AVX512f after:

  vcmpleps %xmm1, %xmm0, %xmm0
  vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
  vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
  vorps %xmm0, %xmm1, %xmm0
  vpslld $31, %xmm0, %xmm0
  vptestmd %zmm0, %zmm0, %k1
  vblendmps %zmm2, %zmm3, %zmm0 {%k1}

AArch64 before:

  fcmge v0.4s, v1.4s, v0.4s
  zip1 v1.4s, v0.4s, v0.4s
  zip2 v0.4s, v0.4s, v0.4s
  orr v0.16b, v1.16b, v0.16b
  movi v1.4s, #1
  and v0.16b, v0.16b, v1.16b
  cmeq v0.4s, v0.4s, #0
  bsl v0.16b, v3.16b, v2.16b

AArch64 after:

  fcmge v0.4s, v1.4s, v0.4s
  zip1 v1.4s, v0.4s, v0.4s
  zip2 v0.4s, v0.4s, v0.4s
  orr v0.16b, v1.16b, v0.16b
  bsl v0.16b, v2.16b, v3.16b

PowerPC-le before:

  xvcmpgesp 34, 35, 34
  vspltisw 0, 1
  vmrglw 3, 2, 2
  vmrghw 2, 2, 2
  xxlor 0, 35, 34
  xxlxor 35, 35, 35
  xxland 34, 0, 32
  vcmpequw 2, 2, 3
  xxsel 34, 36, 37, 34

PowerPC-le after:

  xvcmpgesp 34, 35, 34
  vmrglw 3, 2, 2
  vmrghw 2, 2, 2
  xxlor 0, 35, 34
  xxsel 34, 37, 36, 0

Differential Revision: https://reviews.llvm.org/D52747

llvm-svn: 344181

2018-10-10 20:47:46 +00:00
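
The fold named in the commit message, `icmp ne (and X, 1), 0 --> trunc X to N x i1`, rests on both forms reading only the lowest bit of each lane. The following is a minimal Python sketch of that equivalence, not LLVM code: the function names are hypothetical, and Python integers stand in for fixed-width lanes.

```python
from typing import List


def icmp_ne_and_one(x: int) -> bool:
    """Model of `icmp ne (and X, 1), 0`: mask the low bit, compare with zero."""
    return (x & 1) != 0


def trunc_to_i1(x: int) -> bool:
    """Model of `trunc X to i1`: reduce modulo 2**1, keeping only the low bit."""
    return x % (1 << 1) == 1


def per_lane(fn, lanes: List[int]) -> List[bool]:
    """Apply a scalar model to every lane, as a vector operation would."""
    return [fn(lane) for lane in lanes]


# Lanes produced by `sext <4 x i1>` are either 0 or all-ones (-1),
# so both forms recover the original i1 condition.
lanes = [-1, 0, -1, 0]
mask_form = per_lane(icmp_ne_and_one, lanes)
trunc_form = per_lane(trunc_to_i1, lanes)
```

Both lists come out identical for any integer input, which is why the two IR forms are interchangeable; the codegen samples in the commit message show that the trunc form gives the backends less work to undo.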
already-vectorized.ll	…
assume.ll	…
avx1.ll	…
avx512.ll	[X86] Add support for passing 'prefer-vector-width' function attribute into X86Subtarget and exposing via X86's getRegisterWidth TTI interface.	2018-01-20 00:26:08 +00:00
consecutive-ptr-cg-bug.ll	[SCEV] Smart range calculation for SCEVUnknown Phis	2018-03-01 06:56:48 +00:00
consecutive-ptr-uniforms.ll	…
constant-fold.ll	[LV] Do not create SCEVs on broken IR in emitTransformedIndex. PR39160	2018-10-08 05:46:29 +00:00
constant-vector-operand.ll	…
conversion-cost.ll	…
cost-model.ll	…
float-induction-x86.ll	…
force-ifcvt.ll	…
fp32_to_uint32-cost-model.ll	…
fp64_to_uint32-cost-model.ll	…
fp_to_sint8-cost-model.ll	…
funclet.ll	…
gather-cost.ll	…
gather-vs-interleave.ll	…
gather_scatter.ll	[LV] Fix code gen for conditionally executed loads and stores	2018-09-07 15:53:48 +00:00
gcc-examples.ll	…
illegal-parallel-loop-uniform-write.ll	…
imprecise-through-phis.ll	revert r325515: [TTI CostModel] change default cost of FP ops to 1 (PR36280)	2018-02-21 01:42:52 +00:00
int128_no_gather.ll	…
interleaving.ll	…
invariant-load-gather.ll	[LV] Fix code gen for conditionally executed loads and stores	2018-09-07 15:53:48 +00:00
invariant-store-vectorization.ll	[LV][LAA] Vectorize loop invariant values stored into loop invariant address	2018-09-25 20:57:20 +00:00
lit.local.cfg	…
masked_load_store.ll	[InstCombine] reverse 'trunc X to <N x i1>' canonicalization; 2nd try	2018-10-10 20:47:46 +00:00
max-mstore.ll	…
metadata-enable.ll	[LV] Preserve inbounds on created GEPs	2018-05-01 15:35:08 +00:00
min-trip-count-switch.ll	…
mul_slm_16bit.ll	…
no-vector.ll	…
no_fpmath.ll	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.	2018-05-09 02:40:45 +00:00
no_fpmath_with_hotness.ll	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.	2018-05-09 02:40:45 +00:00
parallel-loops-after-reg2mem.ll	…
parallel-loops.ll	…
powof2div.ll	…
pr23997.ll	[X86][CET] Changing -fcf-protection behavior to comply with gcc (LLVM part)	2018-05-18 11:58:25 +00:00
pr34438.ll	…
pr35432.ll	Revert "[SCEV][NFC] Check NoWrap flags before lexicographical comparison of SCEVs"	2018-08-27 21:41:37 +00:00
pr36524.ll	Revert r325687 (workaround for PR36032).	2018-03-22 22:04:39 +00:00
pr39160.ll	[LV] Move test for r343954 into x86 subdirectory	2018-10-09 22:40:04 +00:00
propagate-metadata.ll	…
ptr-indvar-crash.ll	…
rauw-bug.ll	…
reduction-crash.ll	…
reduction-small-size.ll	[LV] Ignore the cost of values that will not appear in the vectorized loop	2017-12-12 08:57:43 +00:00
redundant-vf2-cost.ll	Move redundant-vf2-cost.ll test to X86 directory	2018-06-15 18:46:03 +00:00
reg-usage-debug.ll	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.	2018-05-09 02:40:45 +00:00
reg-usage.ll	…
register-assumption.ll	…
scatter_crash.ll	…
slm-no-vectorize.ll	…
small-size.ll	…
strided_load_cost.ll	…
struct-store.ll	…
svml-calls-finite.ll	…
svml-calls.ll	[TargetLibraryInfo] add mappings from LLVM sin/cos intrinsics to SVML calls	2018-06-07 18:21:24 +00:00
tripcount.ll	…
uint64_to_fp64-cost-model.ll	…
uniform-phi.ll	[LV] First order recurrence phis should not be treated as uniform	2018-09-04 22:12:23 +00:00
uniform_load.ll	…
uniformshift.ll	…
unroll-pm.ll	…
unroll-small-loops.ll	…
unroll_selection.ll	…
veclib-calls.ll	…
vect.omp.force.ll	…
vect.omp.force.small-tc.ll	…
vector-scalar-select-cost.ll	…
vector_max_bandwidth.ll	NFC - Various typo fixes in tests	2018-07-04 13:28:39 +00:00
vector_ptr_load_store.ll	…
vectorization-remarks-loopid-dbg.ll	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.	2018-05-09 02:40:45 +00:00
vectorization-remarks-missed.ll	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.	2018-05-09 02:40:45 +00:00
vectorization-remarks-profitable.ll	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.	2018-05-09 02:40:45 +00:00
vectorization-remarks.ll	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.	2018-05-09 02:40:45 +00:00
vectorize-only-for-real.ll	…
x86-pr39099.ll	[IAI,LV] Avoid creating interleave-groups for predicated accesse	2018-10-07 06:57:25 +00:00
x86-predication.ll	…
x86_fp80-vector-store.ll	…