Commit Graph

8184 Commits

Author SHA1 Message Date
Phoebe Wang 02fe96b240 [X86][FP16] Do not split FP64->FP16 to FP64->FP32->FP16
Truncation from double to half is not always identical to truncating to float first and then to half. https://godbolt.org/z/56s9517hd
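
A minimal C++ sketch of the double-rounding hazard (assuming a compiler with `_Float16` support on x86, e.g. recent Clang; the constant below is an illustrative example, not taken from the godbolt link):

```
#include <cstdio>

int main() {
  // Half-precision values around 2048 are spaced 2 apart, so 2049 is a tie point.
  double D = 2049.0000000001; // just above the tie

  _Float16 Direct = (_Float16)D;          // nearest half value is 2050
  _Float16 ViaFloat = (_Float16)(float)D; // float rounds to 2049.0f, then ties-to-even gives 2048

  std::printf("direct=%g via-float=%g\n", (double)Direct, (double)ViaFloat);
  return 0;
}
```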

On the other hand, expanding to float and then to double is always identical to expanding to double directly. https://godbolt.org/z/Ye8vbYPnY

Reviewed By: RKSimon, skan

Differential Revision: https://reviews.llvm.org/D130151
2022-07-22 08:36:05 +08:00
Bing1 Yu e01bf5a3e2 [X86] Promote v32f16's fadd into v32f32's fadd when it is avx512 without avx512fp16
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D130059
2022-07-19 14:37:50 +08:00
Benjamin Kramer 9234a7c0df [X86][FP16] Don't crash when lowering SELECT on fp16 vectors
This is a regression from f187948162
2022-07-18 13:41:00 +02:00
Phoebe Wang f187948162 [X86][FP16] Enable vector support for FP16 emulation
This is a follow-up of D107082, which enables vector support according to the psABI.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D127982
2022-07-16 09:38:58 +08:00
Simon Pilgrim 66bfd1ba8c [X86] Move isInRange(ArrayRef<int>) inside assert to fix NDEBUG builds. NFC.
Fix unused static function warning introduced by D129207
2022-07-12 21:51:07 +01:00
Xiang1 Zhang a45dd3d814 [X86] Support -mstack-protector-guard-symbol
Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D129346
2022-07-12 10:17:00 +08:00
Xiang1 Zhang 643786213b Revert "[X86] Support -mstack-protector-guard-symbol"
This reverts commit efbaad1c4a due to missing review info.
2022-07-12 10:14:32 +08:00
Xiang1 Zhang efbaad1c4a [X86] Support -mstack-protector-guard-symbol 2022-07-12 10:13:48 +08:00
Simon Pilgrim 97868fb972 [X86] isTargetShuffleEquivalent - attempt to match SM_SentinelZero shuffle mask elements using known bits
If the combined shuffle mask requires zero elements, we don't currently have much chance of matching them against the expected source vector. This patch uses the SelectionDAG::MaskedVectorIsZero wrapper to attempt to determine if the expected element we want to use is already known to be zero.

I've also tightened up the ExpectedMask assertion to always be in range - we're never giving it a target shuffle mask that has sentinels at all - allowing us to remove some of the confusing bounds checks.

This attempts to address some of the regressions uncovered by D129150 where we more aggressively fold shuffles as AND / 'clear' masks which results in more combined shuffles using SM_SentinelZero.

Differential Revision: https://reviews.llvm.org/D129207
2022-07-11 15:29:44 +01:00
Phoebe Wang 8fb083d33e [X86][FP16] Add constrained FP support for scalar emulation
This is a follow up patch to support constrained FP in FP16 emulation.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D128114
2022-07-08 20:33:42 +08:00
Phoebe Wang 6c535f9f1b [X86][FP16] Fix crash when lowering copysign for f16
This is to address the assertion fail reported in https://reviews.llvm.org/D107082#3635612
Not sure if it is a problem of promoting FCOPYSIGN + libcall FP_ROUND.
The promotion sets the rounding mode to 1: a442c62888/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp (L4810-L4814)
while the libcall cannot handle a rounding mode equal to 1: a442c62888/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp (L4324-L4328)
So change the action to Expand to work around the problem.

Reviewed By: clementval, MaskRay

Differential Revision: https://reviews.llvm.org/D129294
2022-07-07 19:17:26 -07:00
Simon Pilgrim fbb51ac0ba [X86] LowerShift - lower some shuffles directly to X86ISD::PSHUFLW nodes.
These are expected to lower to X86ISD::PSHUFLW but we were seeing some regressions in D129150 because it'd managed to exploit the masking of the shift amounts to create unintended clear masks instead.
2022-07-06 18:01:03 +01:00
Shilei Tian 1023ddaf77 [LLVM] Add the support for fmax and fmin in atomicrmw instruction
This patch adds the support for `fmax` and `fmin` operations in `atomicrmw`
instruction. For now (at least in this patch), the instruction will be expanded
to a CAS loop. There are already a couple of targets supporting the feature; I'll
create follow-up patches to enable them accordingly.
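
As a rough C++ illustration of the CAS-loop expansion (a hand-written equivalent, not the actual generated code):

```
#include <atomic>
#include <cmath>

// Returns the previous value, as atomicrmw does.
float atomic_fmax(std::atomic<float> &Addr, float Val) {
  float Old = Addr.load();
  // compare_exchange_weak refreshes Old with the current value on failure,
  // so we retry until the max is installed atomically.
  while (!Addr.compare_exchange_weak(Old, std::fmax(Old, Val)))
    ;
  return Old;
}
```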

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D127041
2022-07-06 10:57:53 -04:00
Paul Robinson 08e4fe6c61 [X86] Add RDPRU instruction
Add support for the RDPRU instruction on Zen2 processors.

User-facing features:

- Clang option -m[no-]rdpru to enable/disable the feature
- Support is implicit for znver2/znver3 processors
- Preprocessor symbol __RDPRU__ to indicate support
- Header rdpruintrin.h to define intrinsics
- "rdpru" mnemonic supported for assembler code

Internal features:

- Clang builtin __builtin_ia32_rdpru
- IR intrinsic @llvm.x86.rdpru

Differential Revision: https://reviews.llvm.org/D128934
2022-07-06 07:17:47 -07:00
Craig Topper 2bfca35614 [X86] Disable combineVectorSizedSetCCEquality for soft float.
The vector types aren't legal with soft float.
Also disable under NoImplicitFloat for good measure.

Fixes PR56351.

Differential Revision: https://reviews.llvm.org/D129060
2022-07-04 08:33:30 -07:00
Simon Pilgrim 26708fa166 Revert rG057db2002bb3: [X86] combineAndnp - constant fold ANDNP(C,X) -> AND(~C,X)
If the LHS op has a single use then using the more general AND op is likely to allow commutation, load folding, generic folds etc.

Reverted due to reports from @alexfh about it causing an infinite loop (repro still pending).
2022-07-01 10:36:09 +01:00
Simon Pilgrim e961e05d59 [SLP][X86] Add 32-bit vector stores to help vectorization opportunities
Building on the work on D124284, this patch tags v4i8 and v2i16 vector loads as custom, enabling SLP to try to vectorize these types ending in a partial store (using the SSE MOVD instruction) - we already do something similar for 64-bit vector types.

Differential Revision: https://reviews.llvm.org/D127604
2022-06-30 20:25:50 +01:00
Craig Topper 3706bdad4a [X86] Remove unnecessary COPY from EmitLoweredCascadedSelect.
I believe we already checked that the destination of the first
CMOV is only used by the second CMOV so I don't think there is any
reason we need the PHI to write the register that was used by the
first CMOV. We can directly use the second CMOV destination and
avoid the copy.

This may be a left over from when the cascaded select handling
was part of the main algorithm before it was refactored in D35685.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D128124
2022-06-28 09:33:33 -07:00
Simon Pilgrim 0b998053db [X86] combineConcatVectorOps - IsConcatFree must check extraction index
Identified in the regression reported by @alexfh on rGb5d7beeb9792 - IsConcatFree wasn't ensuring the subvector extraction index matched the position it would be concatenated back into.
2022-06-27 11:46:49 +01:00
Simon Pilgrim ac4cb1775b [X86] fold (and (mul x, c1), c2) -> (mul x, (and c1, c2)) iff c2 is all/no bits mask
Noticed on D128216 - if we're zeroing out vector elements of a mul/mulh result then see if we can merge the and-mask into the mul by just multiplying by zero.

Ideally we'd make this generic (similar to the existing foldSelectWithIdentityConstant?), but these cases are appearing very late, after the constants have been lowered to constant-pool loads.
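
A small C++ sketch of the per-element identity this relies on, where each lane of c2 is either all-bits or no-bits (values here are illustrative):

```
#include <cassert>
#include <cstdint>

int main() {
  uint16_t X[4]  = {3, 100, 7, 42};
  uint16_t C1[4] = {5, 9, 11, 2};
  uint16_t C2[4] = {0xFFFF, 0, 0xFFFF, 0}; // each lane is an all/no bits mask

  for (int I = 0; I < 4; ++I) {
    uint16_t Lhs = (uint16_t)((X[I] * C1[I]) & C2[I]); // and(mul(x,c1),c2)
    uint16_t Rhs = (uint16_t)(X[I] * (C1[I] & C2[I])); // mul(x,and(c1,c2))
    assert(Lhs == Rhs); // all-ones lane: both x*c1; zero lane: both 0
  }
  return 0;
}
```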
2022-06-21 15:10:43 +01:00
Simon Pilgrim 057db2002b [X86] combineAndnp - constant fold ANDNP(C,X) -> AND(~C,X)
If the LHS op has a single use then using the more general AND op is likely to allow commutation, load folding, generic folds etc.
2022-06-21 12:31:01 +01:00
Simon Pilgrim 843d43e62a [X86] computeKnownBitsForTargetNode - add X86ISD::VBROADCAST_LOAD handling
This requires us to override the isTargetCanonicalConstantNode callback introduced in D128144, so we can recognise the various cases where a VBROADCAST_LOAD constant is being reused at different vector widths to prevent infinite loops.
2022-06-21 11:48:01 +01:00
Simon Pilgrim 8254966062 [X86] LowerINSERT_VECTOR_ELT - always lower v32i8/v16i16 allones insertions on AVX1 as OR ops
v32i8/v16i16 blend shuffles on AVX1 will expand to OR(AND,ANDN) patterns which can be easily broken by other combines
2022-06-20 18:43:03 +01:00
Simon Pilgrim e4a124dda5 [DAG] Fold (srl (shl x, c1), c2) -> and(shl/srl(x, c3), m)
Similar to the existing (shl (srl x, c1), c2) fold

Part of the work to fix the regressions in D77804
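
A quick C++ check of the underlying bit identity (constants chosen only for illustration):

```
#include <cassert>
#include <cstdint>

int main() {
  for (uint32_t X : {0u, 1u, 0x12345678u, 0xFFFFFFFFu}) {
    // c1 > c2: (x << 5) >> 3  ==  (x << 2) & (~0u >> 3)
    assert(((X << 5) >> 3) == ((X << 2) & (0xFFFFFFFFu >> 3)));
    // c1 < c2: (x << 3) >> 5  ==  (x >> 2) & (~0u >> 5)
    assert(((X << 3) >> 5) == ((X >> 2) & (0xFFFFFFFFu >> 5)));
  }
  return 0;
}
```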

Differential Revision: https://reviews.llvm.org/D125836
2022-06-20 08:37:38 +01:00
Simon Pilgrim ba3f2667b6 [DAG] Add MaskedVectorIsZero helper
Equivalent to MaskedValueIsZero, except it checks whether all of the demanded vector elements are known to be zero.
2022-06-19 17:56:30 +01:00
Simon Pilgrim 41455dd1dc [X86] Remove isTargetShuffleSplat and just use SelectionDAG::isSplatValue
shuffle(splat(x)) -> splat(x); it doesn't have to be a target-specific broadcast
2022-06-19 11:22:57 +01:00
Simon Pilgrim ac3f967382 [X86] canonicalizeShuffleWithBinOps - merge shuffles across binops if either source op is a known splat
The shuffle of a splat (with no undefs) should always be removed
2022-06-18 17:14:00 +01:00
Simon Pilgrim f42f2b7005 [X86] canonicalizeShuffleWithBinOps - merge unary shuffles across binops if either source op is a foldable load
This mostly handles folding of constants that have already become loads, but we expose some generic load cases as well.

This also exposes the chance to merge unary shuffles across X86ISD::ANDNP nodes with different scalar widths
2022-06-18 15:58:54 +01:00
Simon Pilgrim 3c9123af9f [X86] isShuffleFoldableLoad - ensure the load has one use.
We'll only fold the load if it has one use. Makes no difference to existing tests but will be necessary for an upcoming patch to improve load folding as part of canonicalizeShuffleWithBinOps.
2022-06-18 14:51:55 +01:00
Phoebe Wang 655ba9c8a1 Reland "Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""""
This resolves problems reported in commit 1a20252978.
1. Promote XINT_TO_FP nodes to float lowering
2. Bail f16 out of the shuffle combine because the vector type is not legal in this version
2022-06-17 21:34:05 +08:00
Benjamin Kramer 1a20252978 Revert "Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""""
This reverts commit 04a3d5f3a1.

I see two more issues:

- uitofp/sitofp from i32/i64 to half now generate
  __floatsihf/__floatdihf, which exist in neither compiler-rt nor
  libgcc

- This crashes when legalizing the bitcast:
```
; RUN: llc < %s -mcpu=skx
define void @main.45(ptr nocapture readnone %retval, ptr noalias nocapture readnone %run_options, ptr noalias nocapture readnone %params, ptr noalias nocapture readonly %buffer_table, ptr noalias nocapture readnone %status, ptr noalias nocapture readnone %prof_counters) local_unnamed_addr {
entry:
  %fusion = load ptr, ptr %buffer_table, align 8
  %0 = getelementptr inbounds ptr, ptr %buffer_table, i64 1
  %Arg_1.2 = load ptr, ptr %0, align 8
  %1 = getelementptr inbounds ptr, ptr %buffer_table, i64 2
  %Arg_0.1 = load ptr, ptr %1, align 8
  %2 = load half, ptr %Arg_0.1, align 8
  %3 = bitcast half %2 to i16
  %4 = and i16 %3, 32767
  %5 = icmp eq i16 %4, 0
  %6 = and i16 %3, -32768
  %broadcast.splatinsert = insertelement <4 x half> poison, half %2, i64 0
  %broadcast.splat = shufflevector <4 x half> %broadcast.splatinsert, <4 x half> poison, <4 x i32> zeroinitializer
  %broadcast.splatinsert9 = insertelement <4 x i16> poison, i16 %4, i64 0
  %broadcast.splat10 = shufflevector <4 x i16> %broadcast.splatinsert9, <4 x i16> poison, <4 x i32> zeroinitializer
  %broadcast.splatinsert11 = insertelement <4 x i16> poison, i16 %6, i64 0
  %broadcast.splat12 = shufflevector <4 x i16> %broadcast.splatinsert11, <4 x i16> poison, <4 x i32> zeroinitializer
  %broadcast.splatinsert13 = insertelement <4 x i16> poison, i16 %3, i64 0
  %broadcast.splat14 = shufflevector <4 x i16> %broadcast.splatinsert13, <4 x i16> poison, <4 x i32> zeroinitializer
  %wide.load = load <4 x half>, ptr %Arg_1.2, align 8
  %7 = fcmp uno <4 x half> %broadcast.splat, %wide.load
  %8 = fcmp oeq <4 x half> %broadcast.splat, %wide.load
  %9 = bitcast <4 x half> %wide.load to <4 x i16>
  %10 = and <4 x i16> %9, <i16 32767, i16 32767, i16 32767, i16 32767>
  %11 = icmp eq <4 x i16> %10, zeroinitializer
  %12 = and <4 x i16> %9, <i16 -32768, i16 -32768, i16 -32768, i16 -32768>
  %13 = or <4 x i16> %12, <i16 1, i16 1, i16 1, i16 1>
  %14 = select <4 x i1> %11, <4 x i16> %9, <4 x i16> %13
  %15 = icmp ugt <4 x i16> %broadcast.splat10, %10
  %16 = icmp ne <4 x i16> %broadcast.splat12, %12
  %17 = or <4 x i1> %15, %16
  %18 = select <4 x i1> %17, <4 x i16> <i16 -1, i16 -1, i16 -1, i16 -1>, <4 x i16> <i16 1, i16 1, i16 1, i16 1>
  %19 = add <4 x i16> %18, %broadcast.splat14
  %20 = select i1 %5, <4 x i16> %14, <4 x i16> %19
  %21 = select <4 x i1> %8, <4 x i16> %9, <4 x i16> %20
  %22 = bitcast <4 x i16> %21 to <4 x half>
  %23 = select <4 x i1> %7, <4 x half> <half 0xH7E00, half 0xH7E00, half 0xH7E00, half 0xH7E00>, <4 x half> %22
  store <4 x half> %23, ptr %fusion, align 16
  ret void
}
```

llc: llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:977: void (anonymous namespace)::SelectionDAGLegalize::LegalizeOp(llvm::SDNode *): Assertion `(TLI.getTypeAction(*DAG.getContext(), Op.getValueType()) == TargetLowering::TypeLegal || Op.getOpcode() == ISD::TargetConstant || Op.getOpcode() == ISD::Register) && "Unexpected illegal type!"' failed.
2022-06-17 09:43:07 +02:00
Phoebe Wang 04a3d5f3a1 Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"""
Fix the crash on lowering X86ISD::FCMP.
2022-06-17 12:12:17 +08:00
Paul Robinson ff0122dcce [PS5] Emit ud2 for ubsan trap 2022-06-16 11:20:10 -07:00
Frederik Gossen 3cd5696a33 Revert "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"""
This reverts commit e1c5afa47d.

This introduces crashes in the JAX backend on CPU. A reproducer in LLVM is
below. Let me know if you have trouble reproducing this.

; ModuleID = '__compute_module'
source_filename = "__compute_module"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

@0 = private unnamed_addr constant [4 x i8] c"\00\00\00?"
@1 = private unnamed_addr constant [4 x i8] c"\1C}\908"
@2 = private unnamed_addr constant [4 x i8] c"?\00\\4"
@3 = private unnamed_addr constant [4 x i8] c"%ci1"
@4 = private unnamed_addr constant [4 x i8] zeroinitializer
@5 = private unnamed_addr constant [4 x i8] c"\00\00\00\C0"
@6 = private unnamed_addr constant [4 x i8] c"\00\00\00B"
@7 = private unnamed_addr constant [4 x i8] c"\94\B4\C22"
@8 = private unnamed_addr constant [4 x i8] c"^\09B6"
@9 = private unnamed_addr constant [4 x i8] c"\15\F3M?"
@10 = private unnamed_addr constant [4 x i8] c"e\CC\\;"
@11 = private unnamed_addr constant [4 x i8] c"d\BD/>"
@12 = private unnamed_addr constant [4 x i8] c"V\F4I="
@13 = private unnamed_addr constant [4 x i8] c"\10\CB,<"
@14 = private unnamed_addr constant [4 x i8] c"\AC\E3\D6:"
@15 = private unnamed_addr constant [4 x i8] c"\DC\A8E9"
@16 = private unnamed_addr constant [4 x i8] c"\C6\FA\897"
@17 = private unnamed_addr constant [4 x i8] c"%\F9\955"
@18 = private unnamed_addr constant [4 x i8] c"\B5\DB\813"
@19 = private unnamed_addr constant [4 x i8] c"\B4W_\B2"
@20 = private unnamed_addr constant [4 x i8] c"\1Cc\8F\B4"
@21 = private unnamed_addr constant [4 x i8] c"~3\94\B6"
@22 = private unnamed_addr constant [4 x i8] c"3Yq\B8"
@23 = private unnamed_addr constant [4 x i8] c"\E9\17\17\BA"
@24 = private unnamed_addr constant [4 x i8] c"\F1\B2\8D\BB"
@25 = private unnamed_addr constant [4 x i8] c"\F8t\C2\BC"
@26 = private unnamed_addr constant [4 x i8] c"\82[\C2\BD"
@27 = private unnamed_addr constant [4 x i8] c"uB-?"
@28 = private unnamed_addr constant [4 x i8] c"^\FF\9B\BE"
@29 = private unnamed_addr constant [4 x i8] c"\00\00\00A"

; Function Attrs: uwtable
define void @main.158(ptr %retval, ptr noalias %run_options, ptr noalias %params, ptr noalias %buffer_table, ptr noalias %status, ptr noalias %prof_counters) #0 {
entry:
  %fusion.invar_address.dim.1 = alloca i64, align 8
  %fusion.invar_address.dim.0 = alloca i64, align 8
  %0 = getelementptr inbounds ptr, ptr %buffer_table, i64 1
  %Arg_0.1 = load ptr, ptr %0, align 8, !invariant.load !0, !dereferenceable !1, !align !2
  %1 = getelementptr inbounds ptr, ptr %buffer_table, i64 0
  %fusion = load ptr, ptr %1, align 8, !invariant.load !0, !dereferenceable !1, !align !2
  store i64 0, ptr %fusion.invar_address.dim.0, align 8
  br label %fusion.loop_header.dim.0

return:                                           ; preds = %fusion.loop_exit.dim.0
  ret void

fusion.loop_header.dim.0:                         ; preds = %fusion.loop_exit.dim.1, %entry
  %fusion.indvar.dim.0 = load i64, ptr %fusion.invar_address.dim.0, align 8
  %2 = icmp uge i64 %fusion.indvar.dim.0, 3
  br i1 %2, label %fusion.loop_exit.dim.0, label %fusion.loop_body.dim.0

fusion.loop_body.dim.0:                           ; preds = %fusion.loop_header.dim.0
  store i64 0, ptr %fusion.invar_address.dim.1, align 8
  br label %fusion.loop_header.dim.1

fusion.loop_header.dim.1:                         ; preds = %fusion.loop_body.dim.1, %fusion.loop_body.dim.0
  %fusion.indvar.dim.1 = load i64, ptr %fusion.invar_address.dim.1, align 8
  %3 = icmp uge i64 %fusion.indvar.dim.1, 1
  br i1 %3, label %fusion.loop_exit.dim.1, label %fusion.loop_body.dim.1

fusion.loop_body.dim.1:                           ; preds = %fusion.loop_header.dim.1
  %4 = getelementptr inbounds [3 x [1 x half]], ptr %Arg_0.1, i64 0, i64 %fusion.indvar.dim.0, i64 0
  %5 = load half, ptr %4, align 2, !invariant.load !0, !noalias !3
  %6 = fpext half %5 to float
  %7 = call float @llvm.fabs.f32(float %6)
  %constant.121 = load float, ptr @29, align 4
  %compare.2 = fcmp ole float %7, %constant.121
  %8 = zext i1 %compare.2 to i8
  %constant.120 = load float, ptr @0, align 4
  %multiply.95 = fmul float %7, %constant.120
  %constant.119 = load float, ptr @5, align 4
  %add.82 = fadd float %multiply.95, %constant.119
  %constant.118 = load float, ptr @4, align 4
  %multiply.94 = fmul float %add.82, %constant.118
  %constant.117 = load float, ptr @19, align 4
  %add.81 = fadd float %multiply.94, %constant.117
  %multiply.92 = fmul float %add.82, %add.81
  %constant.116 = load float, ptr @18, align 4
  %add.79 = fadd float %multiply.92, %constant.116
  %multiply.91 = fmul float %add.82, %add.79
  %subtract.87 = fsub float %multiply.91, %add.81
  %constant.115 = load float, ptr @20, align 4
  %add.78 = fadd float %subtract.87, %constant.115
  %multiply.89 = fmul float %add.82, %add.78
  %subtract.86 = fsub float %multiply.89, %add.79
  %constant.114 = load float, ptr @17, align 4
  %add.76 = fadd float %subtract.86, %constant.114
  %multiply.88 = fmul float %add.82, %add.76
  %subtract.84 = fsub float %multiply.88, %add.78
  %constant.113 = load float, ptr @21, align 4
  %add.75 = fadd float %subtract.84, %constant.113
  %multiply.86 = fmul float %add.82, %add.75
  %subtract.83 = fsub float %multiply.86, %add.76
  %constant.112 = load float, ptr @16, align 4
  %add.73 = fadd float %subtract.83, %constant.112
  %multiply.85 = fmul float %add.82, %add.73
  %subtract.81 = fsub float %multiply.85, %add.75
  %constant.111 = load float, ptr @22, align 4
  %add.72 = fadd float %subtract.81, %constant.111
  %multiply.83 = fmul float %add.82, %add.72
  %subtract.80 = fsub float %multiply.83, %add.73
  %constant.110 = load float, ptr @15, align 4
  %add.70 = fadd float %subtract.80, %constant.110
  %multiply.82 = fmul float %add.82, %add.70
  %subtract.78 = fsub float %multiply.82, %add.72
  %constant.109 = load float, ptr @23, align 4
  %add.69 = fadd float %subtract.78, %constant.109
  %multiply.80 = fmul float %add.82, %add.69
  %subtract.77 = fsub float %multiply.80, %add.70
  %constant.108 = load float, ptr @14, align 4
  %add.68 = fadd float %subtract.77, %constant.108
  %multiply.79 = fmul float %add.82, %add.68
  %subtract.75 = fsub float %multiply.79, %add.69
  %constant.107 = load float, ptr @24, align 4
  %add.67 = fadd float %subtract.75, %constant.107
  %multiply.77 = fmul float %add.82, %add.67
  %subtract.74 = fsub float %multiply.77, %add.68
  %constant.106 = load float, ptr @13, align 4
  %add.66 = fadd float %subtract.74, %constant.106
  %multiply.76 = fmul float %add.82, %add.66
  %subtract.72 = fsub float %multiply.76, %add.67
  %constant.105 = load float, ptr @25, align 4
  %add.65 = fadd float %subtract.72, %constant.105
  %multiply.74 = fmul float %add.82, %add.65
  %subtract.71 = fsub float %multiply.74, %add.66
  %constant.104 = load float, ptr @12, align 4
  %add.64 = fadd float %subtract.71, %constant.104
  %multiply.73 = fmul float %add.82, %add.64
  %subtract.69 = fsub float %multiply.73, %add.65
  %constant.103 = load float, ptr @26, align 4
  %add.63 = fadd float %subtract.69, %constant.103
  %multiply.71 = fmul float %add.82, %add.63
  %subtract.67 = fsub float %multiply.71, %add.64
  %constant.102 = load float, ptr @11, align 4
  %add.62 = fadd float %subtract.67, %constant.102
  %multiply.70 = fmul float %add.82, %add.62
  %subtract.66 = fsub float %multiply.70, %add.63
  %constant.101 = load float, ptr @28, align 4
  %add.61 = fadd float %subtract.66, %constant.101
  %multiply.68 = fmul float %add.82, %add.61
  %subtract.65 = fsub float %multiply.68, %add.62
  %constant.100 = load float, ptr @27, align 4
  %add.60 = fadd float %subtract.65, %constant.100
  %subtract.64 = fsub float %add.60, %add.62
  %multiply.66 = fmul float %subtract.64, %constant.120
  %constant.99 = load float, ptr @6, align 4
  %divide.4 = fdiv float %constant.99, %7
  %add.59 = fadd float %divide.4, %constant.119
  %multiply.65 = fmul float %add.59, %constant.118
  %constant.98 = load float, ptr @3, align 4
  %add.58 = fadd float %multiply.65, %constant.98
  %multiply.64 = fmul float %add.59, %add.58
  %constant.97 = load float, ptr @7, align 4
  %add.57 = fadd float %multiply.64, %constant.97
  %multiply.63 = fmul float %add.59, %add.57
  %subtract.63 = fsub float %multiply.63, %add.58
  %constant.96 = load float, ptr @2, align 4
  %add.56 = fadd float %subtract.63, %constant.96
  %multiply.62 = fmul float %add.59, %add.56
  %subtract.62 = fsub float %multiply.62, %add.57
  %constant.95 = load float, ptr @8, align 4
  %add.55 = fadd float %subtract.62, %constant.95
  %multiply.61 = fmul float %add.59, %add.55
  %subtract.61 = fsub float %multiply.61, %add.56
  %constant.94 = load float, ptr @1, align 4
  %add.54 = fadd float %subtract.61, %constant.94
  %multiply.60 = fmul float %add.59, %add.54
  %subtract.60 = fsub float %multiply.60, %add.55
  %constant.93 = load float, ptr @10, align 4
  %add.53 = fadd float %subtract.60, %constant.93
  %multiply.59 = fmul float %add.59, %add.53
  %subtract.59 = fsub float %multiply.59, %add.54
  %constant.92 = load float, ptr @9, align 4
  %add.52 = fadd float %subtract.59, %constant.92
  %subtract.58 = fsub float %add.52, %add.54
  %multiply.58 = fmul float %subtract.58, %constant.120
  %9 = call float @llvm.sqrt.f32(float %7)
  %10 = fdiv float 1.000000e+00, %9
  %multiply.57 = fmul float %multiply.58, %10
  %11 = trunc i8 %8 to i1
  %12 = select i1 %11, float %multiply.66, float %multiply.57
  %13 = fptrunc float %12 to half
  %14 = getelementptr inbounds [3 x [1 x half]], ptr %fusion, i64 0, i64 %fusion.indvar.dim.0, i64 0
  store half %13, ptr %14, align 2, !alias.scope !3
  %invar.inc1 = add nuw nsw i64 %fusion.indvar.dim.1, 1
  store i64 %invar.inc1, ptr %fusion.invar_address.dim.1, align 8
  br label %fusion.loop_header.dim.1

fusion.loop_exit.dim.1:                           ; preds = %fusion.loop_header.dim.1
  %invar.inc = add nuw nsw i64 %fusion.indvar.dim.0, 1
  store i64 %invar.inc, ptr %fusion.invar_address.dim.0, align 8
  br label %fusion.loop_header.dim.0

fusion.loop_exit.dim.0:                           ; preds = %fusion.loop_header.dim.0
  br label %return
}

; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
declare float @llvm.fabs.f32(float %0) #1

; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
declare float @llvm.sqrt.f32(float %0) #1

attributes #0 = { uwtable "denormal-fp-math"="preserve-sign" "no-frame-pointer-elim"="false" }
attributes #1 = { nocallback nofree nosync nounwind readnone speculatable willreturn }

!0 = !{}
!1 = !{i64 6}
!2 = !{i64 8}
!3 = !{!4}
!4 = !{!"buffer: {index:0, offset:0, size:6}", !5}
!5 = !{!"XLA global AA domain"}
2022-06-15 18:04:42 -04:00
Phoebe Wang e1c5afa47d Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""
Fixed the missing SQRT promotion and added several missing operations too.
2022-06-15 23:00:18 +08:00
Thomas Joerg 37455b1f71 Revert "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""
This reverts commit 6e02e27536.

This introduces a crash in the backend. Reproducer in MLIR's LLVM
dialect follows. Let me know if you have trouble reproducing this.

module {
  llvm.func @malloc(i64) -> !llvm.ptr<i8>
  llvm.func @_mlir_ciface_tf_report_error(!llvm.ptr<i8>, i32, !llvm.ptr<i8>)
  llvm.mlir.global internal constant @error_message_2208944672953921889("failed to allocate memory at loc(\22-\22:3:8)\00")
  llvm.func @_mlir_ciface_tf_alloc(!llvm.ptr<i8>, i64, i64, i32, i32, !llvm.ptr<i32>) -> !llvm.ptr<i8>
  llvm.func @Rsqrt_CPU_DT_HALF_DT_HALF(%arg0: !llvm.ptr<i8>, %arg1: i64, %arg2: !llvm.ptr<i8>) -> !llvm.struct<(i64, ptr<i8>)> attributes {llvm.emit_c_interface, tf_entry} {
    %0 = llvm.mlir.constant(8 : i32) : i32
    %1 = llvm.mlir.constant(8 : index) : i64
    %2 = llvm.mlir.constant(2 : index) : i64
    %3 = llvm.mlir.constant(dense<0.000000e+00> : vector<4xf16>) : vector<4xf16>
    %4 = llvm.mlir.constant(dense<[0, 1, 2, 3]> : vector<4xi32>) : vector<4xi32>
    %5 = llvm.mlir.constant(dense<1.000000e+00> : vector<4xf16>) : vector<4xf16>
    %6 = llvm.mlir.constant(false) : i1
    %7 = llvm.mlir.constant(1 : i32) : i32
    %8 = llvm.mlir.constant(0 : i32) : i32
    %9 = llvm.mlir.constant(4 : index) : i64
    %10 = llvm.mlir.constant(0 : index) : i64
    %11 = llvm.mlir.constant(1 : index) : i64
    %12 = llvm.mlir.constant(-1 : index) : i64
    %13 = llvm.mlir.null : !llvm.ptr<f16>
    %14 = llvm.getelementptr %13[%9] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16>
    %15 = llvm.ptrtoint %14 : !llvm.ptr<f16> to i64
    %16 = llvm.alloca %15 x f16 {alignment = 32 : i64} : (i64) -> !llvm.ptr<f16>
    %17 = llvm.alloca %15 x f16 {alignment = 32 : i64} : (i64) -> !llvm.ptr<f16>
    %18 = llvm.mlir.null : !llvm.ptr<i64>
    %19 = llvm.getelementptr %18[%arg1] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    %20 = llvm.ptrtoint %19 : !llvm.ptr<i64> to i64
    %21 = llvm.alloca %20 x i64 : (i64) -> !llvm.ptr<i64>
    llvm.br ^bb1(%10 : i64)
  ^bb1(%22: i64):  // 2 preds: ^bb0, ^bb2
    %23 = llvm.icmp "slt" %22, %arg1 : i64
    llvm.cond_br %23, ^bb2, ^bb3
  ^bb2:  // pred: ^bb1
    %24 = llvm.bitcast %arg2 : !llvm.ptr<i8> to !llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64)>>
    %25 = llvm.getelementptr %24[%10, 2] : (!llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64)>>, i64) -> !llvm.ptr<i64>
    %26 = llvm.add %22, %11  : i64
    %27 = llvm.getelementptr %25[%26] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    %28 = llvm.load %27 : !llvm.ptr<i64>
    %29 = llvm.getelementptr %21[%22] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    llvm.store %28, %29 : !llvm.ptr<i64>
    llvm.br ^bb1(%26 : i64)
  ^bb3:  // pred: ^bb1
    llvm.br ^bb4(%10, %11 : i64, i64)
  ^bb4(%30: i64, %31: i64):  // 2 preds: ^bb3, ^bb5
    %32 = llvm.icmp "slt" %30, %arg1 : i64
    llvm.cond_br %32, ^bb5, ^bb6
  ^bb5:  // pred: ^bb4
    %33 = llvm.bitcast %arg2 : !llvm.ptr<i8> to !llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64)>>
    %34 = llvm.getelementptr %33[%10, 2] : (!llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64)>>, i64) -> !llvm.ptr<i64>
    %35 = llvm.add %30, %11  : i64
    %36 = llvm.getelementptr %34[%35] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    %37 = llvm.load %36 : !llvm.ptr<i64>
    %38 = llvm.mul %37, %31  : i64
    llvm.br ^bb4(%35, %38 : i64, i64)
  ^bb6:  // pred: ^bb4
    %39 = llvm.bitcast %arg2 : !llvm.ptr<i8> to !llvm.ptr<ptr<f16>>
    %40 = llvm.getelementptr %39[%11] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>>
    %41 = llvm.load %40 : !llvm.ptr<ptr<f16>>
    %42 = llvm.getelementptr %13[%11] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16>
    %43 = llvm.ptrtoint %42 : !llvm.ptr<f16> to i64
    %44 = llvm.alloca %7 x i32 : (i32) -> !llvm.ptr<i32>
    llvm.store %8, %44 : !llvm.ptr<i32>
    %45 = llvm.call @_mlir_ciface_tf_alloc(%arg0, %31, %43, %8, %7, %44) : (!llvm.ptr<i8>, i64, i64, i32, i32, !llvm.ptr<i32>) -> !llvm.ptr<i8>
    %46 = llvm.bitcast %45 : !llvm.ptr<i8> to !llvm.ptr<f16>
    %47 = llvm.icmp "eq" %31, %10 : i64
    %48 = llvm.or %6, %47  : i1
    %49 = llvm.mlir.null : !llvm.ptr<i8>
    %50 = llvm.icmp "ne" %45, %49 : !llvm.ptr<i8>
    %51 = llvm.or %50, %48  : i1
    llvm.cond_br %51, ^bb7, ^bb13
  ^bb7:  // pred: ^bb6
    %52 = llvm.urem %31, %9  : i64
    %53 = llvm.sub %31, %52  : i64
    llvm.br ^bb8(%10 : i64)
  ^bb8(%54: i64):  // 2 preds: ^bb7, ^bb9
    %55 = llvm.icmp "slt" %54, %53 : i64
    llvm.cond_br %55, ^bb9, ^bb10
  ^bb9:  // pred: ^bb8
    %56 = llvm.mul %54, %11  : i64
    %57 = llvm.add %56, %10  : i64
    %58 = llvm.add %57, %10  : i64
    %59 = llvm.getelementptr %41[%58] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16>
    %60 = llvm.bitcast %59 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>>
    %61 = llvm.load %60 {alignment = 2 : i64} : !llvm.ptr<vector<4xf16>>
    %62 = "llvm.intr.sqrt"(%61) : (vector<4xf16>) -> vector<4xf16>
    %63 = llvm.fdiv %5, %62  : vector<4xf16>
    %64 = llvm.getelementptr %46[%58] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16>
    %65 = llvm.bitcast %64 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>>
    llvm.store %63, %65 {alignment = 2 : i64} : !llvm.ptr<vector<4xf16>>
    %66 = llvm.add %54, %9  : i64
    llvm.br ^bb8(%66 : i64)
  ^bb10:  // pred: ^bb8
    %67 = llvm.icmp "ult" %53, %31 : i64
    llvm.cond_br %67, ^bb11, ^bb12
  ^bb11:  // pred: ^bb10
    %68 = llvm.mul %53, %12  : i64
    %69 = llvm.add %31, %68  : i64
    %70 = llvm.mul %53, %11  : i64
    %71 = llvm.add %70, %10  : i64
    %72 = llvm.trunc %69 : i64 to i32
    %73 = llvm.mlir.undef : vector<4xi32>
    %74 = llvm.insertelement %72, %73[%8 : i32] : vector<4xi32>
    %75 = llvm.shufflevector %74, %73 [0 : i32, 0 : i32, 0 : i32, 0 : i32] : vector<4xi32>, vector<4xi32>
    %76 = llvm.icmp "slt" %4, %75 : vector<4xi32>
    %77 = llvm.add %71, %10  : i64
    %78 = llvm.getelementptr %41[%77] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16>
    %79 = llvm.bitcast %78 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>>
    %80 = llvm.intr.masked.load %79, %76, %3 {alignment = 2 : i32} : (!llvm.ptr<vector<4xf16>>, vector<4xi1>, vector<4xf16>) -> vector<4xf16>
    %81 = llvm.bitcast %16 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>>
    llvm.store %80, %81 : !llvm.ptr<vector<4xf16>>
    %82 = llvm.load %81 {alignment = 2 : i64} : !llvm.ptr<vector<4xf16>>
    %83 = "llvm.intr.sqrt"(%82) : (vector<4xf16>) -> vector<4xf16>
    %84 = llvm.fdiv %5, %83  : vector<4xf16>
    %85 = llvm.bitcast %17 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>>
    llvm.store %84, %85 {alignment = 2 : i64} : !llvm.ptr<vector<4xf16>>
    %86 = llvm.load %85 : !llvm.ptr<vector<4xf16>>
    %87 = llvm.getelementptr %46[%77] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16>
    %88 = llvm.bitcast %87 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>>
    llvm.intr.masked.store %86, %88, %76 {alignment = 2 : i32} : vector<4xf16>, vector<4xi1> into !llvm.ptr<vector<4xf16>>
    llvm.br ^bb12
  ^bb12:  // 2 preds: ^bb10, ^bb11
    %89 = llvm.mul %2, %1  : i64
    %90 = llvm.mul %arg1, %2  : i64
    %91 = llvm.add %90, %11  : i64
    %92 = llvm.mul %91, %1  : i64
    %93 = llvm.add %89, %92  : i64
    %94 = llvm.alloca %93 x i8 : (i64) -> !llvm.ptr<i8>
    %95 = llvm.bitcast %94 : !llvm.ptr<i8> to !llvm.ptr<ptr<f16>>
    llvm.store %46, %95 : !llvm.ptr<ptr<f16>>
    %96 = llvm.getelementptr %95[%11] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>>
    llvm.store %46, %96 : !llvm.ptr<ptr<f16>>
    %97 = llvm.getelementptr %95[%2] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>>
    %98 = llvm.bitcast %97 : !llvm.ptr<ptr<f16>> to !llvm.ptr<i64>
    llvm.store %10, %98 : !llvm.ptr<i64>
    %99 = llvm.bitcast %94 : !llvm.ptr<i8> to !llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64, i64)>>
    %100 = llvm.getelementptr %99[%10, 3] : (!llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64, i64)>>, i64) -> !llvm.ptr<i64>
    %101 = llvm.getelementptr %100[%arg1] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    %102 = llvm.sub %arg1, %11  : i64
    llvm.br ^bb14(%102, %11 : i64, i64)
  ^bb13:  // pred: ^bb6
    %103 = llvm.mlir.addressof @error_message_2208944672953921889 : !llvm.ptr<array<42 x i8>>
    %104 = llvm.getelementptr %103[%10, %10] : (!llvm.ptr<array<42 x i8>>, i64, i64) -> !llvm.ptr<i8>
    llvm.call @_mlir_ciface_tf_report_error(%arg0, %0, %104) : (!llvm.ptr<i8>, i32, !llvm.ptr<i8>) -> ()
    %105 = llvm.mul %2, %1  : i64
    %106 = llvm.mul %2, %10  : i64
    %107 = llvm.add %106, %11  : i64
    %108 = llvm.mul %107, %1  : i64
    %109 = llvm.add %105, %108  : i64
    %110 = llvm.alloca %109 x i8 : (i64) -> !llvm.ptr<i8>
    %111 = llvm.bitcast %110 : !llvm.ptr<i8> to !llvm.ptr<ptr<f16>>
    llvm.store %13, %111 : !llvm.ptr<ptr<f16>>
    %112 = llvm.getelementptr %111[%11] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>>
    llvm.store %13, %112 : !llvm.ptr<ptr<f16>>
    %113 = llvm.getelementptr %111[%2] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>>
    %114 = llvm.bitcast %113 : !llvm.ptr<ptr<f16>> to !llvm.ptr<i64>
    llvm.store %10, %114 : !llvm.ptr<i64>
    %115 = llvm.call @malloc(%109) : (i64) -> !llvm.ptr<i8>
    "llvm.intr.memcpy"(%115, %110, %109, %6) : (!llvm.ptr<i8>, !llvm.ptr<i8>, i64, i1) -> ()
    %116 = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
    %117 = llvm.insertvalue %10, %116[0] : !llvm.struct<(i64, ptr<i8>)>
    %118 = llvm.insertvalue %115, %117[1] : !llvm.struct<(i64, ptr<i8>)>
    llvm.return %118 : !llvm.struct<(i64, ptr<i8>)>
  ^bb14(%119: i64, %120: i64):  // 2 preds: ^bb12, ^bb15
    %121 = llvm.icmp "sge" %119, %10 : i64
    llvm.cond_br %121, ^bb15, ^bb16
  ^bb15:  // pred: ^bb14
    %122 = llvm.getelementptr %21[%119] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    %123 = llvm.load %122 : !llvm.ptr<i64>
    %124 = llvm.getelementptr %100[%119] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    llvm.store %123, %124 : !llvm.ptr<i64>
    %125 = llvm.getelementptr %101[%119] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    llvm.store %120, %125 : !llvm.ptr<i64>
    %126 = llvm.mul %120, %123  : i64
    %127 = llvm.sub %119, %11  : i64
    llvm.br ^bb14(%127, %126 : i64, i64)
  ^bb16:  // pred: ^bb14
    %128 = llvm.call @malloc(%93) : (i64) -> !llvm.ptr<i8>
    "llvm.intr.memcpy"(%128, %94, %93, %6) : (!llvm.ptr<i8>, !llvm.ptr<i8>, i64, i1) -> ()
    %129 = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
    %130 = llvm.insertvalue %arg1, %129[0] : !llvm.struct<(i64, ptr<i8>)>
    %131 = llvm.insertvalue %128, %130[1] : !llvm.struct<(i64, ptr<i8>)>
    llvm.return %131 : !llvm.struct<(i64, ptr<i8>)>
  }
  llvm.func @_mlir_ciface_Rsqrt_CPU_DT_HALF_DT_HALF(%arg0: !llvm.ptr<struct<(i64, ptr<i8>)>>, %arg1: !llvm.ptr<i8>, %arg2: !llvm.ptr<struct<(i64, ptr<i8>)>>) attributes {llvm.emit_c_interface, tf_entry} {
    %0 = llvm.load %arg2 : !llvm.ptr<struct<(i64, ptr<i8>)>>
    %1 = llvm.extractvalue %0[0] : !llvm.struct<(i64, ptr<i8>)>
    %2 = llvm.extractvalue %0[1] : !llvm.struct<(i64, ptr<i8>)>
    %3 = llvm.call @Rsqrt_CPU_DT_HALF_DT_HALF(%arg1, %1, %2) : (!llvm.ptr<i8>, i64, !llvm.ptr<i8>) -> !llvm.struct<(i64, ptr<i8>)>
    llvm.store %3, %arg0 : !llvm.ptr<struct<(i64, ptr<i8>)>>
    llvm.return
  }
}
2022-06-15 13:24:24 +02:00
Benjamin Kramer fb34d531af Promote bf16 to f32 when the target doesn't support it
This is modeled after the half-precision fp support. Two new nodes are
introduced for casting from and to bf16. Since casting from bf16 is a
simple operation I opted to always directly lower it to integer
arithmetic. The other way round is more complicated if you want to
preserve IEEE semantics, so it's handled by a new __truncsfbf2
compiler-rt builtin.
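
A minimal C++ sketch of the cheap direction, bf16 -> float via integer arithmetic (illustrative, not the actual lowering code):

```
#include <cstdint>
#include <cstring>

// bf16 is the top 16 bits of an IEEE-754 binary32, so widening is just a shift.
float bf16_to_float(uint16_t Bits) {
  uint32_t Wide = (uint32_t)Bits << 16;
  float F;
  std::memcpy(&F, &Wide, sizeof(F));
  return F;
}
```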

This is of course very bare bones, but sufficient to get a semi-softened
fadd on x86.

Possible future improvements:
 - Targets with bf16 conversion instructions can now make fp_to_bf16 legal
 - The software conversion to bf16 can be replaced by a trivial
   implementation under fast math.

Differential Revision: https://reviews.llvm.org/D126953
2022-06-15 12:56:31 +02:00
Simon Pilgrim 4fd561415e [X86] needCarryOrOverflowFlag/onlyZeroFlagUsed - merge identical switch cases. NFCI.
Makes it easier to grok and fixes various bugprone-branch-clone warnings.
2022-06-15 10:40:22 +01:00
Phoebe Wang 6e02e27536 Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"
Disabled 2 MLIR tests because the runtime doesn't support `_Float16`; see
the issue here: https://github.com/llvm/llvm-project/issues/55992
2022-06-15 09:15:31 +08:00
Simon Pilgrim 64eea34420 [X86] combineEXTEND_VECTOR_INREG - don't attempt to shuffle combine ANY_EXTEND_VECTOR_INREG without SSE41
Without SSE41, ANY_EXTEND_VECTOR_INREG nodes are likely to be prematurely combined to a target shuffle preventing generic sign extension folds.

Fixes a number of sign-extend regressions in D127115.
2022-06-13 17:42:04 +01:00
Mehdi Amini 5d8298a768 Revert "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"
This reverts commit 2d2da259c8.

This breaks MLIR integration test (JIT crashing), reverting in the
meantime.
2022-06-12 15:14:37 +00:00
Simon Pilgrim b5d7beeb97 [X86] combineConcatVectorOps - add support for concatenation of VSELECT/BLENDV nodes (REAPPLIED)
If the LHS/RHS selection operands can be cheaply concatenated back together then replace 2 x 128-bit selection nodes with 1 x 256-bit node

Addresses the regression introduced in the bug fix from rGd5af6a38082b39ae520a328e44dc29ebcb036bb2

REAPPLIED with a fix for the bug identified in rGea8fb3b60196
2022-06-12 15:40:36 +01:00
Phoebe Wang 2d2da259c8 [X86][RFC] Enable `_Float16` type support on X86 following the psABI
GCC and Clang/LLVM will support `_Float16` on X86 in C/C++, following
the latest X86 psABI. (https://gitlab.com/x86-psABIs)

_Float16 arithmetic will be performed using native half-precision. If
native arithmetic instructions are not available, it will be performed
at a higher precision (currently always float) and then truncated down
to _Float16 immediately after each single arithmetic operation.
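
In source terms, a hedged sketch of what that means for a single operation:

```
// With native FP16 (AVX512-FP16) this is one half-precision add; without it,
// the operation is performed as (float)A + (float)B and the result is
// truncated straight back to _Float16 after this single operation.
_Float16 add_halves(_Float16 A, _Float16 B) {
  return A + B;
}
```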

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D107082
2022-06-12 11:40:00 +08:00
Simon Pilgrim 7841d09449 [X86][AVX512] Retain pmuldq broadcast loads on 32-bit targets
Don't demand just the lower 32-bits on 32-bit AVX512 targets, to preserve 64-bit broadcast load patterns
2022-06-11 19:30:00 +01:00
Simon Pilgrim 6eaea225c7 [X86] combineTargetShuffle - break if-else chain. NFC.
(style) Both cases always continue.
2022-06-11 09:16:39 +01:00
Simon Pilgrim 89d2b1e4f7 [X86] emitOrXorXorTree - break if-else chain. NFC.
(style) Both cases always return.
2022-06-11 09:16:38 +01:00
Simon Pilgrim 5acbb2dda2 [X86] combineMulToPMADDWD - don't bitcast the source ops before splitting to ensure we split the build vectors early
Fixes a regression on D127115 - splitting was creating extract_subvector(bitcast(build_vector())) patterns which prevented the build vectors being split before being bitcast to vXi16 types, resulting in various issues with further folding of the (now legal) build vectors
2022-06-10 13:44:49 +01:00
Simon Pilgrim 7ac33b8aac [X86] Remove !VT.is128BitVector() check. NFCI.
The code is inside an if(VT.is256BitVector() || VT.is512BitVector()) condition
2022-06-09 21:39:45 +01:00
Simon Pilgrim 72a049d778 [X86][AVX2] LowerINSERT_VECTOR_ELT - support v4i64 insertion as BLENDI(X, SCALAR_TO_VECTOR(Y)) 2022-06-09 21:18:10 +01:00
Simon Pilgrim 1a02db9882 [X86] canonicalizeShuffleWithBinOps - add TODO for X86ISD::ANDNP bitwise handling
Its just as safe to move shuffles across X86ISD::ANDNP as any other logical bitop, they just tend to appear too late to matter.

Noticed while triaging D127115 regressions.
2022-06-09 12:18:26 +01:00
Simon Pilgrim 9a76337fee [X86] combineMOVMSK - constant fold with getTargetConstantBitsFromNode not just BUILD_VECTOR
Help avoid a regression in D127115
2022-06-08 17:48:55 +01:00
Guillaume Chatelet 0788186182 [Alignment][NFC] Remove usage of MemSDNode::getAlignment
I can't remove the function just yet as it is used in the generated .inc files.
I would also like to provide a way to compare alignment with TypeSize since it came up a few times.

Differential Revision: https://reviews.llvm.org/D126910
2022-06-07 13:52:20 +00:00
Simon Pilgrim f5507978a3 [X86] getFauxShuffleMask - add VSELECT/BLENDV handling
First step towards enabling shuffle combining starting from VSELECT/BLENDV nodes - this should eventually help improve the codegen reported at Issue #54819
2022-06-07 14:46:25 +01:00
Simon Pilgrim 1b6d3bdc82 [X86] foldMaskedMergeImpl - pass SDLoc by const reference not value. 2022-06-07 12:36:30 +01:00
Simon Pilgrim 63e3035dbe [X86] LowerGC_TRANSITION - remove redundant SDLoc(). 2022-06-07 10:57:58 +01:00
Shilei Tian 0c3e6e5717 [NFC] Remove trailing whitespace 2022-06-06 18:59:13 -04:00
Eric Christopher 93cb6b9c83 Revert "[X86] combineConcatVectorOps - add support for concatenation VSELECT/BLENDV nodes"
See the original commit for a testcase.

This reverts commit ea8fb3b601.
2022-06-03 12:31:11 -07:00
Simon Pilgrim de2b543505 [X86] LowerVSETCC - merge getConstant() calls with flipped/unflipped sign masks. NFCI. 2022-06-01 15:09:48 +01:00
Sanjay Patel 3a503a4a9c [x86] fix miscompile from wrongly identified fneg
We may need to peek through a bitcast when identifying an fneg idiom
via its pool constant, but we can't allow a different-sized constant
in that match.

This is noted in issue #55758 with an example that needs fast-math,
but as the test here shows, this has potential to miscompile more
generally (no fast-math required).

Differential Revision: https://reviews.llvm.org/D126775
2022-06-01 09:56:33 -04:00
Simon Pilgrim f6dbb0b6fb [X86] Fix typo in extraction type introduced in rGed0303aa2251e4484a2b4ff7f236c9f7cdfb2092
It doesn't look like we have test coverage for this at the moment :(
2022-06-01 12:31:27 +01:00
Simon Pilgrim ea8fb3b601 [X86] combineConcatVectorOps - add support for concatenation VSELECT/BLENDV nodes
If the LHS/RHS selection operands can be cheaply concatenated back together then replace 2 x 128-bit selection nodes with 1 x 256-bit node

Addresses the regression introduced in the bug fix from rGd5af6a38082b39ae520a328e44dc29ebcb036bb2
2022-06-01 10:46:06 +01:00
Simon Pilgrim d5af6a3808 [X86] LowerMINMAX - split v4i64 types on AVX1 targets (Issue #55648)
Originally we tried to use default expansion for v4i64 types to make it easier to concatenate the results back together, but this can cause infinite loop issues with existing VSELECT splitting code in narrowExtractedVectorSelect if we have other uses of the VSELECT results (e.g. reduction patterns).

To fix the infinite loop, this patch always splits MIN/MAX v4i64 nodes during lowering and I've added a TODO for combineConcatVectorOps to investigate when we can cheaply concatenate VSELECT/BLENDV nodes together.

Fixes #55648 - regression test case will be added in a follow up.
2022-05-31 17:28:56 +01:00
Simon Pilgrim af0113cf77 [X86] combineEXTRACT_SUBVECTOR - pull out repeated getVectorNumElements() calls. NFC. 2022-05-31 16:13:54 +01:00
Simon Pilgrim b9443cb6fa [X86] narrowExtractedVectorSelect - don't peek through bitcasts to find source vector
We don't seem to need this for any test coverage and it was making tracking of the uses() of the source vector more difficult

Noticed while investigating Issue #55648
2022-05-31 14:57:18 +01:00
Simon Pilgrim ed0303aa22 [X86] LowerTRUNCATE - avoid creating extract_subvector(bitcast(vec)) patterns
We have a generic DAG combine to attempt to fold extract_subvector(bitcast(vec)) -> bitcast(extract_subvector(vec)) but if we create these patterns late in lowering then we often miss them.

Noticed while investigating Issue #55648 which gets caught in an infinite loop trying to split extract_subvector(bitcast(vselect()) patterns - this doesn't fix the issue yet but reduces the regressions from the WIP fix.
2022-05-31 14:30:56 +01:00
Zongwei Lan ad73ce318e [Target] use getSubtarget<> instead of static_cast<>(getSubtarget())
Differential Revision: https://reviews.llvm.org/D125391
2022-05-26 11:22:41 -07:00
Craig Topper 06fee478d2 [X86] Add isSimple check to the load combine in combineExtractVectorElt.
I think we need to be sure the load isn't volatile before we
duplicate and shrink it.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D126353
2022-05-25 09:11:11 -07:00
Jay Foad 6bec3e9303 [APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf
Most clients only used these methods because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.
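
For example (a sketch against the APInt API as described above):

```
#include "llvm/ADT/APInt.h"

void example() {
  llvm::APInt X(16, 42);
  llvm::APInt Widened = X.zext(32); // genuine extension
  llvm::APInt Same = X.zext(16);    // same width: now a no-op, previously needed zextOrSelf
  (void)Widened;
  (void)Same;
}
```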

The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.

Differential Revision: https://reviews.llvm.org/D125557
2022-05-19 11:23:13 +01:00
Simon Pilgrim 320545b577 [X86] Rename combineCONCAT_VECTORS\INSERT_SUBVECTOR\EXTRACT_SUBVECTOR to match Opcode name. NFCI.
It's a lot easier to quickly search for the combine when it actually contains the name of the opcode it combines.
2022-05-17 18:37:53 +01:00
Simon Pilgrim c64f5d44ad [X86] Attempt to fold EFLAGS into X86ISD::ADD/SUB ops
We already use combineAddOrSubToADCOrSBB to fold extended EFLAGS results into ISD::ADD/SUB ops as X86ISD::ADC/SBB carry ops.

This patch extends this to also try to fold EFLAGS results with X86ISD::ADD/SUB ops
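
For flavor, one C++ pattern of the kind this family of folds targets (an assumed example, not necessarily the commit's motivating case): the compare's carry can be folded into the addition as an ADC rather than materialised with SETCC first.

```
#include <cstdint>

// cmp A, B sets the carry flag when A < B (unsigned); folding that flag into
// the add lets the backend emit cmp + adc instead of cmp + setc + add.
uint64_t add_carry(uint64_t X, uint64_t A, uint64_t B) {
  return X + (A < B);
}
```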

Differential Revision: https://reviews.llvm.org/D125642
2022-05-17 10:59:24 +01:00
Simon Pilgrim b3077f563d [X86] Move combineAddOrSubToADCOrSBB earlier. NFC.
Make it easier to reuse in X86 ADD/SUB combines in an upcoming patch.
2022-05-15 22:06:33 +01:00
Simon Pilgrim fd1f0c51ef [X86] lowerShuffleAsLanePermuteAndSHUFP always succeeds, so just return the result. NFC. 2022-05-15 15:53:36 +01:00
Simon Pilgrim c0f59be358 [X86] Pull out repeated isShuffleMaskInputInPlace calls. NFC. 2022-05-15 15:35:09 +01:00
Simon Pilgrim 32162cf291 [X86] lowerV4I64Shuffle - try harder to lower to PERMQ(BLENDD(V1,V2)) pattern 2022-05-15 14:57:58 +01:00
Simon Pilgrim bc90bbb759 [X86] LowerAVG - fix cut+paste typo. NFC. 2022-05-14 17:42:09 +01:00
Simon Pilgrim 98f82d69bd [X86] LowerStore - use is64BitVector() wrapper. NFCI. 2022-05-13 15:30:18 +01:00
Matthias Braun cd19af74c0 Avoid 8 and 16bit switch conditions on x86
This adds a `TargetLoweringBase::getSwitchConditionType` callback to
give targets a chance to control the type used in
`CodeGenPrepare::optimizeSwitchInst`.

Implement callback for X86 to avoid i8 and i16 types where possible as
they often incur extra zero-extensions.

This is NFC for non-X86 targets.

Differential Revision: https://reviews.llvm.org/D124894
2022-05-10 10:00:10 -07:00
Simon Pilgrim 980f41d7c4 [X86] (style) Use auto for dyn_cast<> results 2022-05-01 17:15:18 +01:00
Simon Pilgrim d4f06ec874 [X86] (style) Don't use auto for non obvious types 2022-05-01 17:10:21 +01:00
Simon Pilgrim 92235e3bf4 [X86] lowerShuffleAsRepeatedMaskAndLanePermute - permit 32-bit sublane permute for unary v32i8 cases
Increase the likelihood that we can lower to a permd(pshufb()) pattern, but only after we've attempted with 64-bit sublane permutes first

Fixes #55066
2022-04-30 11:00:28 +01:00
Simon Pilgrim b424055b52 [X86] lowerShuffleAsRepeatedMaskAndLanePermute - move the sublane split code into a lambda helper. NFC.
This is a NFC cleanup as part of the work on #55066 - the idea being that we will be able to check for multiple sub lane scales.
2022-04-29 16:03:50 +01:00
Simon Pilgrim 3562f855b7 [X86] SimplifyDemandedVectorEltsForTargetNode - fold (uniform) shift(0,x) -> 0 2022-04-29 12:08:47 +01:00
Simon Pilgrim 336a1233b2 [X86] SimplifyDemandedVectorEltsForTargetNode - fold shift(0,x) -> 0 2022-04-29 11:32:54 +01:00
Simon Pilgrim 6c44e398ec [X86] combineShuffle - reuse SDLoc. NFCI. 2022-04-29 10:30:11 +01:00
Simon Pilgrim 2d7f0b1c22 [X86] Fold ANDNP(undef,x)/ANDNP(x,undef) -> 0
Matches the fold in DAGCombiner::visitANDLike.
2022-04-29 10:20:48 +01:00
Simon Pilgrim ab17ed0723 [X86] Don't fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) on BMI2 targets
With BMI2 we have SHRX which is a lot quicker than regular x86 shifts.

Fixes #55138
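
The pattern in question, roughly, in C++ (the instruction choice itself is the backend's; shown only as commentary):

```
#include <cstdint>

// Extracts bit Y of X. Without BMI2 this is attractive as bt + setc; with
// BMI2, SHRX takes its shift count from any register and does not touch
// EFLAGS, so keeping the shift + and form is preferred.
bool test_bit(uint32_t X, uint32_t Y) {
  return (X >> Y) & 1;
}
```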
2022-04-28 21:28:16 +01:00
Simon Pilgrim 9e3b7e8e65 [X86] getTargetVShiftByConstNode - use SelectionDAG::FoldConstantArithmetic to perform constant folding. NFCI.
Remove some unnecessary code duplication.
2022-04-28 17:10:20 +01:00
Simon Pilgrim de7cee24b6 [X86] getBT - attempt to peek through aext(and(trunc(x),c)) mask/modulo
Ideally we'd fold this with generic DAGCombiner, but that only works for !isTruncateFree cases - we might be able to adapt IsDesirableToPromoteOp to find truncated src ops in the future, but for now just use this peephole.

Noticed in Issue #55138
2022-04-28 16:10:26 +01:00
Simon Pilgrim ed8dffef4c [X86] getFauxShuffle - don't assume an UNDEF src element for AND/ANDNP results in an UNDEF shuffle mask index
The other src element might be zero, guaranteeing a zero result.

Fixes #55157
2022-04-28 12:32:58 +01:00
Simon Pilgrim e378577524 [X86] Use is128BitLaneRepeatedShuffleMask wrapper. NFC.
We don't need to know the actual repeated mask.
2022-04-27 21:09:57 +01:00
Simon Pilgrim 03482bccad [X86] collectConcatOps - add ability to collect from vector 'widening' patterns
Recognise insert_subvector(undef, x, lo/hi) patterns where we double the width of a vector - creating an UNDEF subvector on the fly.
2022-04-27 15:38:58 +01:00
Xiang1 Zhang c430f0f532 [X86] Add use condition for combineSetCCMOVMSK
Reviewed by RKSimon, LuoYuanke

Differential Revision: https://reviews.llvm.org/D123652
2022-04-26 16:42:50 +08:00
Simon Pilgrim e8305c0b8f [X86] combineX86ShuffleChain - don't fold to truncate(concat(V1,V2)) if it was already a PACK op
Fixes #55050
2022-04-25 17:13:44 +01:00
Craig Topper c6fdb1de47 [X86] Move some hasOneUse checks after checking what the opcode is.
Calling hasOneUse can be expensive on nodes with multiple results.
Especially when some results are Chains. By checking the opcode first,
we can avoid walking the uses if it isn't an interesting node,
and thus avoid calling hasOneUse on a node that might have many uses.

Found by profiling the IR given in D123857.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D123881
2022-04-16 14:18:58 -07:00
Craig Topper 9d86bf825c [X86] Move hasOneUse check after opcode check. NFC
Checking opcode is cheap. hasOneUse might not be if the node has
multiple results. By checking the opcode we can rule out nodes
with multiple results we aren't interested in.
2022-04-15 17:20:57 -07:00
Liu, Chen3 bf60a5af0a [X86] Convert unsigned int 0 to floating point with the FILD instruction.
unsigned int 0 will be converted to float/double -0.0 when the rounding
mode is set to 'FE_DOWNWARD'. Use the FILD instruction instead of SSE
instructions on 32-bit targets if strictfp is enabled.
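
A small C++ illustration of the sign-of-zero rule behind this (IEEE-754: an exact-zero difference is -0.0 under round-downward, which is why a bias-subtraction SSE lowering - an assumption about the exact sequence - returns -0.0 for an input of 0, while FILD of the integer 0 yields +0.0):

```
#include <cfenv>
#include <cmath>
#include <cstdio>

int main() {
  std::fesetround(FE_DOWNWARD);
  volatile double Magic = 0x1p52; // the kind of bias constant a subtraction-based lowering uses
  double Zero = Magic - Magic;    // exact zero: rounds to -0.0 under FE_DOWNWARD
  std::printf("signbit=%d\n", std::signbit(Zero) ? 1 : 0); // prints 1 (build with -frounding-math)
  return 0;
}
```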

Differential Revision: https://reviews.llvm.org/D123660
2022-04-13 20:06:15 +08:00
Simon Pilgrim 0488c6638b [X86] getFauxShuffleMask - remove use DemandedElts TODO
Most of the getTargetShuffleInputs recursive calls have now gone and the remaining uses aren't likely to benefit from a DemandedElts mask
2022-04-12 15:36:30 +01:00
Simon Pilgrim 1e803d305a Revert rG88ff6f70c45f2767576c64dde28cbfe7a90916ca "[X86] Extend vselect(cond, pshufb(x), pshufb(y)) -> or(pshufb(x), pshufb(y)) to include inner or(pshufb(x), pshufb(y)) chains"
Reverting while I investigate reports of internal test regressions/failures
2022-04-11 10:42:43 +01:00
Simon Pilgrim 88ff6f70c4 [X86] Extend vselect(cond, pshufb(x), pshufb(y)) -> or(pshufb(x), pshufb(y)) to include inner or(pshufb(x), pshufb(y)) chains 2022-04-10 13:04:53 +01:00
Simon Pilgrim c74d729bd6 [X86] combineExtractSubvector - fold extract_subvector(insert_subvector(V,X,C1),C1)
extract_subvector(insert_subvector(V,X,C1),C1) -> insert_subvector(extract_subvector(V,C1),X,0)

More aggressively attempt to reduce the width of an extract_subvector source - we currently only do this if we're inserting into a zero vector (i.e. canonicalizing to the AVX implicit zero upper elts pattern).

But if we're extracting from the same point as the inner insert_subvector then the fold is still relatively trivial - we can probably do even better if we can ensure the subvector isn't badly split.
2022-04-10 11:03:08 +01:00