llvm-project

Commit Graph

Author	SHA1	Message	Date
Eli Friedman	cfd2c5ce58	Untangle the mess which is MachineBasicBlock::hasAddressTaken(). There are two different senses in which a block can be "address-taken". There can be a BlockAddress involved, which means we need to map the IR-level value to some specific block of machine code. Or there can be constructs inside a function which involve using the address of a basic block to implement certain kinds of control flow. Mixing these together causes a problem: if target-specific passes are marking random blocks "address-taken", if we have a BlockAddress, we can't actually tell which MachineBasicBlock corresponds to the BlockAddress. So split this into two separate bits: one for BlockAddress, and one for the machine-specific bits. Discovered while trying to sort out related stuff on D102817. Differential Revision: https://reviews.llvm.org/D124697	2022-08-16 16:15:44 -07:00
Edd Barrett	fa250250b2	Migrate llvm.experimental.patchpoint() to ptr. This intrinsic used a typed pointer for a call target operand. This change updates the operand to be an opaque pointer and updates all pointers in all test files that use the intrinsic. Differential revision: https://reviews.llvm.org/D131261	2022-08-10 13:18:02 +01:00
Jon Chesterfield	3a20597776	[amdgpu] Implement lds kernel id intrinsic Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it. There are a number of implicit arguments accessed by intrinsic already so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue. It is necessary in the general case to put variables at different addresses such that they can be compactly allocated and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which an integer can be written by the kernel, in this implementation based on metadata associated with that kernel, which is then passed on to indirect call sites is sufficient to determine the variable address. The intent is to emit a __const array of LDS addresses and index into it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D125060	2022-07-19 17:46:19 +01:00
Matt Arsenault	97ed2fbc5f	MIR: Fix parse error on empty CustomRegMask	2022-06-27 08:50:35 -04:00
Phoebe Wang	655ba9c8a1	Reland "Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"""" This resolves problems reported in commit `1a20252978`. 1. Promote to float lowering for nodes XINT_TO_FP 2. Bail out f16 from shuffle combine due to vector type is not legal in the version	2022-06-17 21:34:05 +08:00
Benjamin Kramer	1a20252978	Revert "Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"""" This reverts commit `04a3d5f3a1`. I see two more issues: - uitofp/sitofp from i32/i64 to half now generates __floatsihf/__floatdihf, which exists in neither compiler-rt nor libgcc - This crashes when legalizing the bitcast: ``` ; RUN: llc < %s -mcpu=skx define void @main.45(ptr nocapture readnone %retval, ptr noalias nocapture readnone %run_options, ptr noalias nocapture readnone %params, ptr noalias nocapture readonly %buffer_table, ptr noalias nocapture readnone %status, ptr noalias nocapture readnone %prof_counters) local_unnamed_addr { entry: %fusion = load ptr, ptr %buffer_table, align 8 %0 = getelementptr inbounds ptr, ptr %buffer_table, i64 1 %Arg_1.2 = load ptr, ptr %0, align 8 %1 = getelementptr inbounds ptr, ptr %buffer_table, i64 2 %Arg_0.1 = load ptr, ptr %1, align 8 %2 = load half, ptr %Arg_0.1, align 8 %3 = bitcast half %2 to i16 %4 = and i16 %3, 32767 %5 = icmp eq i16 %4, 0 %6 = and i16 %3, -32768 %broadcast.splatinsert = insertelement <4 x half> poison, half %2, i64 0 %broadcast.splat = shufflevector <4 x half> %broadcast.splatinsert, <4 x half> poison, <4 x i32> zeroinitializer %broadcast.splatinsert9 = insertelement <4 x i16> poison, i16 %4, i64 0 %broadcast.splat10 = shufflevector <4 x i16> %broadcast.splatinsert9, <4 x i16> poison, <4 x i32> zeroinitializer %broadcast.splatinsert11 = insertelement <4 x i16> poison, i16 %6, i64 0 %broadcast.splat12 = shufflevector <4 x i16> %broadcast.splatinsert11, <4 x i16> poison, <4 x i32> zeroinitializer %broadcast.splatinsert13 = insertelement <4 x i16> poison, i16 %3, i64 0 %broadcast.splat14 = shufflevector <4 x i16> %broadcast.splatinsert13, <4 x i16> poison, <4 x i32> zeroinitializer %wide.load = load <4 x half>, ptr %Arg_1.2, align 8 %7 = fcmp uno <4 x half> %broadcast.splat, %wide.load %8 = fcmp oeq <4 x half> %broadcast.splat, %wide.load %9 = bitcast <4 x half> %wide.load to <4 x i16> %10 = and <4 x i16> %9, <i16 32767, i16 32767, i16 32767, i16 32767> %11 = icmp eq <4 x i16> %10, zeroinitializer %12 = and <4 x i16> %9, <i16 -32768, i16 -32768, i16 -32768, i16 -32768> %13 = or <4 x i16> %12, <i16 1, i16 1, i16 1, i16 1> %14 = select <4 x i1> %11, <4 x i16> %9, <4 x i16> %13 %15 = icmp ugt <4 x i16> %broadcast.splat10, %10 %16 = icmp ne <4 x i16> %broadcast.splat12, %12 %17 = or <4 x i1> %15, %16 %18 = select <4 x i1> %17, <4 x i16> <i16 -1, i16 -1, i16 -1, i16 -1>, <4 x i16> <i16 1, i16 1, i16 1, i16 1> %19 = add <4 x i16> %18, %broadcast.splat14 %20 = select i1 %5, <4 x i16> %14, <4 x i16> %19 %21 = select <4 x i1> %8, <4 x i16> %9, <4 x i16> %20 %22 = bitcast <4 x i16> %21 to <4 x half> %23 = select <4 x i1> %7, <4 x half> <half 0xH7E00, half 0xH7E00, half 0xH7E00, half 0xH7E00>, <4 x half> %22 store <4 x half> %23, ptr %fusion, align 16 ret void } ``` llc: llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:977: void (anonymous namespace)::SelectionDAGLegalize::LegalizeOp(llvm::SDNode ): Assertion `(TLI.getTypeAction(DAG.getContext(), Op.getValueType()) == TargetLowering::TypeLegal \|\| Op.getOpcode() == ISD::TargetConstant \|\| Op.getOpcode() == ISD::Register) && "Unexpected illegal type!"' failed.	2022-06-17 09:43:07 +02:00
Phoebe Wang	04a3d5f3a1	Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""" Fix the crash on lowering X86ISD::FCMP.	2022-06-17 12:12:17 +08:00
Frederik Gossen	3cd5696a33	Revert "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""" This reverts commit `e1c5afa47d`. This introduces crashes in the JAX backend on CPU. A reproducer in LLVM is below. Let me know if you have trouble reproducing this. ; ModuleID = '__compute_module' source_filename = "__compute_module" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-grtev4-linux-gnu" @0 = private unnamed_addr constant [4 x i8] c"\00\00\00?" @1 = private unnamed_addr constant [4 x i8] c"\1C}\908" @2 = private unnamed_addr constant [4 x i8] c"?\00\\4" @3 = private unnamed_addr constant [4 x i8] c"%ci1" @4 = private unnamed_addr constant [4 x i8] zeroinitializer @5 = private unnamed_addr constant [4 x i8] c"\00\00\00\C0" @6 = private unnamed_addr constant [4 x i8] c"\00\00\00B" @7 = private unnamed_addr constant [4 x i8] c"\94\B4\C22" @8 = private unnamed_addr constant [4 x i8] c"^\09B6" @9 = private unnamed_addr constant [4 x i8] c"\15\F3M?" @10 = private unnamed_addr constant [4 x i8] c"e\CC\\;" @11 = private unnamed_addr constant [4 x i8] c"d\BD/>" @12 = private unnamed_addr constant [4 x i8] c"V\F4I=" @13 = private unnamed_addr constant [4 x i8] c"\10\CB,<" @14 = private unnamed_addr constant [4 x i8] c"\AC\E3\D6:" @15 = private unnamed_addr constant [4 x i8] c"\DC\A8E9" @16 = private unnamed_addr constant [4 x i8] c"\C6\FA\897" @17 = private unnamed_addr constant [4 x i8] c"%\F9\955" @18 = private unnamed_addr constant [4 x i8] c"\B5\DB\813" @19 = private unnamed_addr constant [4 x i8] c"\B4W_\B2" @20 = private unnamed_addr constant [4 x i8] c"\1Cc\8F\B4" @21 = private unnamed_addr constant [4 x i8] c"~3\94\B6" @22 = private unnamed_addr constant [4 x i8] c"3Yq\B8" @23 = private unnamed_addr constant [4 x i8] c"\E9\17\17\BA" @24 = private unnamed_addr constant [4 x i8] c"\F1\B2\8D\BB" @25 = private unnamed_addr constant [4 x i8] c"\F8t\C2\BC" @26 = private unnamed_addr constant [4 x i8] c"\82[\C2\BD" @27 = private unnamed_addr constant [4 x i8] c"uB-?" @28 = private unnamed_addr constant [4 x i8] c"^\FF\9B\BE" @29 = private unnamed_addr constant [4 x i8] c"\00\00\00A" ; Function Attrs: uwtable define void @main.158(ptr %retval, ptr noalias %run_options, ptr noalias %params, ptr noalias %buffer_table, ptr noalias %status, ptr noalias %prof_counters) #0 { entry: %fusion.invar_address.dim.1 = alloca i64, align 8 %fusion.invar_address.dim.0 = alloca i64, align 8 %0 = getelementptr inbounds ptr, ptr %buffer_table, i64 1 %Arg_0.1 = load ptr, ptr %0, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %1 = getelementptr inbounds ptr, ptr %buffer_table, i64 0 %fusion = load ptr, ptr %1, align 8, !invariant.load !0, !dereferenceable !1, !align !2 store i64 0, ptr %fusion.invar_address.dim.0, align 8 br label %fusion.loop_header.dim.0 return: ; preds = %fusion.loop_exit.dim.0 ret void fusion.loop_header.dim.0: ; preds = %fusion.loop_exit.dim.1, %entry %fusion.indvar.dim.0 = load i64, ptr %fusion.invar_address.dim.0, align 8 %2 = icmp uge i64 %fusion.indvar.dim.0, 3 br i1 %2, label %fusion.loop_exit.dim.0, label %fusion.loop_body.dim.0 fusion.loop_body.dim.0: ; preds = %fusion.loop_header.dim.0 store i64 0, ptr %fusion.invar_address.dim.1, align 8 br label %fusion.loop_header.dim.1 fusion.loop_header.dim.1: ; preds = %fusion.loop_body.dim.1, %fusion.loop_body.dim.0 %fusion.indvar.dim.1 = load i64, ptr %fusion.invar_address.dim.1, align 8 %3 = icmp uge i64 %fusion.indvar.dim.1, 1 br i1 %3, label %fusion.loop_exit.dim.1, label %fusion.loop_body.dim.1 fusion.loop_body.dim.1: ; preds = %fusion.loop_header.dim.1 %4 = getelementptr inbounds [3 x [1 x half]], ptr %Arg_0.1, i64 0, i64 %fusion.indvar.dim.0, i64 0 %5 = load half, ptr %4, align 2, !invariant.load !0, !noalias !3 %6 = fpext half %5 to float %7 = call float @llvm.fabs.f32(float %6) %constant.121 = load float, ptr @29, align 4 %compare.2 = fcmp ole float %7, %constant.121 %8 = zext i1 %compare.2 to i8 %constant.120 = load float, ptr @0, align 4 %multiply.95 = fmul float %7, %constant.120 %constant.119 = load float, ptr @5, align 4 %add.82 = fadd float %multiply.95, %constant.119 %constant.118 = load float, ptr @4, align 4 %multiply.94 = fmul float %add.82, %constant.118 %constant.117 = load float, ptr @19, align 4 %add.81 = fadd float %multiply.94, %constant.117 %multiply.92 = fmul float %add.82, %add.81 %constant.116 = load float, ptr @18, align 4 %add.79 = fadd float %multiply.92, %constant.116 %multiply.91 = fmul float %add.82, %add.79 %subtract.87 = fsub float %multiply.91, %add.81 %constant.115 = load float, ptr @20, align 4 %add.78 = fadd float %subtract.87, %constant.115 %multiply.89 = fmul float %add.82, %add.78 %subtract.86 = fsub float %multiply.89, %add.79 %constant.114 = load float, ptr @17, align 4 %add.76 = fadd float %subtract.86, %constant.114 %multiply.88 = fmul float %add.82, %add.76 %subtract.84 = fsub float %multiply.88, %add.78 %constant.113 = load float, ptr @21, align 4 %add.75 = fadd float %subtract.84, %constant.113 %multiply.86 = fmul float %add.82, %add.75 %subtract.83 = fsub float %multiply.86, %add.76 %constant.112 = load float, ptr @16, align 4 %add.73 = fadd float %subtract.83, %constant.112 %multiply.85 = fmul float %add.82, %add.73 %subtract.81 = fsub float %multiply.85, %add.75 %constant.111 = load float, ptr @22, align 4 %add.72 = fadd float %subtract.81, %constant.111 %multiply.83 = fmul float %add.82, %add.72 %subtract.80 = fsub float %multiply.83, %add.73 %constant.110 = load float, ptr @15, align 4 %add.70 = fadd float %subtract.80, %constant.110 %multiply.82 = fmul float %add.82, %add.70 %subtract.78 = fsub float %multiply.82, %add.72 %constant.109 = load float, ptr @23, align 4 %add.69 = fadd float %subtract.78, %constant.109 %multiply.80 = fmul float %add.82, %add.69 %subtract.77 = fsub float %multiply.80, %add.70 %constant.108 = load float, ptr @14, align 4 %add.68 = fadd float %subtract.77, %constant.108 %multiply.79 = fmul float %add.82, %add.68 %subtract.75 = fsub float %multiply.79, %add.69 %constant.107 = load float, ptr @24, align 4 %add.67 = fadd float %subtract.75, %constant.107 %multiply.77 = fmul float %add.82, %add.67 %subtract.74 = fsub float %multiply.77, %add.68 %constant.106 = load float, ptr @13, align 4 %add.66 = fadd float %subtract.74, %constant.106 %multiply.76 = fmul float %add.82, %add.66 %subtract.72 = fsub float %multiply.76, %add.67 %constant.105 = load float, ptr @25, align 4 %add.65 = fadd float %subtract.72, %constant.105 %multiply.74 = fmul float %add.82, %add.65 %subtract.71 = fsub float %multiply.74, %add.66 %constant.104 = load float, ptr @12, align 4 %add.64 = fadd float %subtract.71, %constant.104 %multiply.73 = fmul float %add.82, %add.64 %subtract.69 = fsub float %multiply.73, %add.65 %constant.103 = load float, ptr @26, align 4 %add.63 = fadd float %subtract.69, %constant.103 %multiply.71 = fmul float %add.82, %add.63 %subtract.67 = fsub float %multiply.71, %add.64 %constant.102 = load float, ptr @11, align 4 %add.62 = fadd float %subtract.67, %constant.102 %multiply.70 = fmul float %add.82, %add.62 %subtract.66 = fsub float %multiply.70, %add.63 %constant.101 = load float, ptr @28, align 4 %add.61 = fadd float %subtract.66, %constant.101 %multiply.68 = fmul float %add.82, %add.61 %subtract.65 = fsub float %multiply.68, %add.62 %constant.100 = load float, ptr @27, align 4 %add.60 = fadd float %subtract.65, %constant.100 %subtract.64 = fsub float %add.60, %add.62 %multiply.66 = fmul float %subtract.64, %constant.120 %constant.99 = load float, ptr @6, align 4 %divide.4 = fdiv float %constant.99, %7 %add.59 = fadd float %divide.4, %constant.119 %multiply.65 = fmul float %add.59, %constant.118 %constant.98 = load float, ptr @3, align 4 %add.58 = fadd float %multiply.65, %constant.98 %multiply.64 = fmul float %add.59, %add.58 %constant.97 = load float, ptr @7, align 4 %add.57 = fadd float %multiply.64, %constant.97 %multiply.63 = fmul float %add.59, %add.57 %subtract.63 = fsub float %multiply.63, %add.58 %constant.96 = load float, ptr @2, align 4 %add.56 = fadd float %subtract.63, %constant.96 %multiply.62 = fmul float %add.59, %add.56 %subtract.62 = fsub float %multiply.62, %add.57 %constant.95 = load float, ptr @8, align 4 %add.55 = fadd float %subtract.62, %constant.95 %multiply.61 = fmul float %add.59, %add.55 %subtract.61 = fsub float %multiply.61, %add.56 %constant.94 = load float, ptr @1, align 4 %add.54 = fadd float %subtract.61, %constant.94 %multiply.60 = fmul float %add.59, %add.54 %subtract.60 = fsub float %multiply.60, %add.55 %constant.93 = load float, ptr @10, align 4 %add.53 = fadd float %subtract.60, %constant.93 %multiply.59 = fmul float %add.59, %add.53 %subtract.59 = fsub float %multiply.59, %add.54 %constant.92 = load float, ptr @9, align 4 %add.52 = fadd float %subtract.59, %constant.92 %subtract.58 = fsub float %add.52, %add.54 %multiply.58 = fmul float %subtract.58, %constant.120 %9 = call float @llvm.sqrt.f32(float %7) %10 = fdiv float 1.000000e+00, %9 %multiply.57 = fmul float %multiply.58, %10 %11 = trunc i8 %8 to i1 %12 = select i1 %11, float %multiply.66, float %multiply.57 %13 = fptrunc float %12 to half %14 = getelementptr inbounds [3 x [1 x half]], ptr %fusion, i64 0, i64 %fusion.indvar.dim.0, i64 0 store half %13, ptr %14, align 2, !alias.scope !3 %invar.inc1 = add nuw nsw i64 %fusion.indvar.dim.1, 1 store i64 %invar.inc1, ptr %fusion.invar_address.dim.1, align 8 br label %fusion.loop_header.dim.1 fusion.loop_exit.dim.1: ; preds = %fusion.loop_header.dim.1 %invar.inc = add nuw nsw i64 %fusion.indvar.dim.0, 1 store i64 %invar.inc, ptr %fusion.invar_address.dim.0, align 8 br label %fusion.loop_header.dim.0 fusion.loop_exit.dim.0: ; preds = %fusion.loop_header.dim.0 br label %return } ; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn declare float @llvm.fabs.f32(float %0) #1 ; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn declare float @llvm.sqrt.f32(float %0) #1 attributes #0 = { uwtable "denormal-fp-math"="preserve-sign" "no-frame-pointer-elim"="false" } attributes #1 = { nocallback nofree nosync nounwind readnone speculatable willreturn } !0 = !{} !1 = !{i64 6} !2 = !{i64 8} !3 = !{!4} !4 = !{!"buffer: {index:0, offset:0, size:6}", !5} !5 = !{!"XLA global AA domain"}	2022-06-15 18:04:42 -04:00
Phoebe Wang	e1c5afa47d	Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"" Fixed the missing SQRT promotion. Adding several missing operations too.	2022-06-15 23:00:18 +08:00
Thomas Joerg	37455b1f71	Revert "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"" This reverts commit `6e02e27536`. This introduces a crash in the backend. Reproducer in MLIR's LLVM dialect follows. Let me know if you have trouble reproducing this. module { llvm.func @malloc(i64) -> !llvm.ptr<i8> llvm.func @_mlir_ciface_tf_report_error(!llvm.ptr<i8>, i32, !llvm.ptr<i8>) llvm.mlir.global internal constant @error_message_2208944672953921889("failed to allocate memory at loc(\22-\22:3:8)\00") llvm.func @_mlir_ciface_tf_alloc(!llvm.ptr<i8>, i64, i64, i32, i32, !llvm.ptr<i32>) -> !llvm.ptr<i8> llvm.func @Rsqrt_CPU_DT_HALF_DT_HALF(%arg0: !llvm.ptr<i8>, %arg1: i64, %arg2: !llvm.ptr<i8>) -> !llvm.struct<(i64, ptr<i8>)> attributes {llvm.emit_c_interface, tf_entry} { %0 = llvm.mlir.constant(8 : i32) : i32 %1 = llvm.mlir.constant(8 : index) : i64 %2 = llvm.mlir.constant(2 : index) : i64 %3 = llvm.mlir.constant(dense<0.000000e+00> : vector<4xf16>) : vector<4xf16> %4 = llvm.mlir.constant(dense<[0, 1, 2, 3]> : vector<4xi32>) : vector<4xi32> %5 = llvm.mlir.constant(dense<1.000000e+00> : vector<4xf16>) : vector<4xf16> %6 = llvm.mlir.constant(false) : i1 %7 = llvm.mlir.constant(1 : i32) : i32 %8 = llvm.mlir.constant(0 : i32) : i32 %9 = llvm.mlir.constant(4 : index) : i64 %10 = llvm.mlir.constant(0 : index) : i64 %11 = llvm.mlir.constant(1 : index) : i64 %12 = llvm.mlir.constant(-1 : index) : i64 %13 = llvm.mlir.null : !llvm.ptr<f16> %14 = llvm.getelementptr %13[%9] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16> %15 = llvm.ptrtoint %14 : !llvm.ptr<f16> to i64 %16 = llvm.alloca %15 x f16 {alignment = 32 : i64} : (i64) -> !llvm.ptr<f16> %17 = llvm.alloca %15 x f16 {alignment = 32 : i64} : (i64) -> !llvm.ptr<f16> %18 = llvm.mlir.null : !llvm.ptr<i64> %19 = llvm.getelementptr %18[%arg1] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64> %20 = llvm.ptrtoint %19 : !llvm.ptr<i64> to i64 %21 = llvm.alloca %20 x i64 : (i64) -> !llvm.ptr<i64> llvm.br ^bb1(%10 : i64) ^bb1(%22: i64): // 2 preds: ^bb0, ^bb2 %23 = llvm.icmp "slt" %22, %arg1 : i64 llvm.cond_br %23, ^bb2, ^bb3 ^bb2: // pred: ^bb1 %24 = llvm.bitcast %arg2 : !llvm.ptr<i8> to !llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64)>> %25 = llvm.getelementptr %24[%10, 2] : (!llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64)>>, i64) -> !llvm.ptr<i64> %26 = llvm.add %22, %11 : i64 %27 = llvm.getelementptr %25[%26] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64> %28 = llvm.load %27 : !llvm.ptr<i64> %29 = llvm.getelementptr %21[%22] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64> llvm.store %28, %29 : !llvm.ptr<i64> llvm.br ^bb1(%26 : i64) ^bb3: // pred: ^bb1 llvm.br ^bb4(%10, %11 : i64, i64) ^bb4(%30: i64, %31: i64): // 2 preds: ^bb3, ^bb5 %32 = llvm.icmp "slt" %30, %arg1 : i64 llvm.cond_br %32, ^bb5, ^bb6 ^bb5: // pred: ^bb4 %33 = llvm.bitcast %arg2 : !llvm.ptr<i8> to !llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64)>> %34 = llvm.getelementptr %33[%10, 2] : (!llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64)>>, i64) -> !llvm.ptr<i64> %35 = llvm.add %30, %11 : i64 %36 = llvm.getelementptr %34[%35] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64> %37 = llvm.load %36 : !llvm.ptr<i64> %38 = llvm.mul %37, %31 : i64 llvm.br ^bb4(%35, %38 : i64, i64) ^bb6: // pred: ^bb4 %39 = llvm.bitcast %arg2 : !llvm.ptr<i8> to !llvm.ptr<ptr<f16>> %40 = llvm.getelementptr %39[%11] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>> %41 = llvm.load %40 : !llvm.ptr<ptr<f16>> %42 = llvm.getelementptr %13[%11] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16> %43 = llvm.ptrtoint %42 : !llvm.ptr<f16> to i64 %44 = llvm.alloca %7 x i32 : (i32) -> !llvm.ptr<i32> llvm.store %8, %44 : !llvm.ptr<i32> %45 = llvm.call @_mlir_ciface_tf_alloc(%arg0, %31, %43, %8, %7, %44) : (!llvm.ptr<i8>, i64, i64, i32, i32, !llvm.ptr<i32>) -> !llvm.ptr<i8> %46 = llvm.bitcast %45 : !llvm.ptr<i8> to !llvm.ptr<f16> %47 = llvm.icmp "eq" %31, %10 : i64 %48 = llvm.or %6, %47 : i1 %49 = llvm.mlir.null : !llvm.ptr<i8> %50 = llvm.icmp "ne" %45, %49 : !llvm.ptr<i8> %51 = llvm.or %50, %48 : i1 llvm.cond_br %51, ^bb7, ^bb13 ^bb7: // pred: ^bb6 %52 = llvm.urem %31, %9 : i64 %53 = llvm.sub %31, %52 : i64 llvm.br ^bb8(%10 : i64) ^bb8(%54: i64): // 2 preds: ^bb7, ^bb9 %55 = llvm.icmp "slt" %54, %53 : i64 llvm.cond_br %55, ^bb9, ^bb10 ^bb9: // pred: ^bb8 %56 = llvm.mul %54, %11 : i64 %57 = llvm.add %56, %10 : i64 %58 = llvm.add %57, %10 : i64 %59 = llvm.getelementptr %41[%58] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16> %60 = llvm.bitcast %59 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>> %61 = llvm.load %60 {alignment = 2 : i64} : !llvm.ptr<vector<4xf16>> %62 = "llvm.intr.sqrt"(%61) : (vector<4xf16>) -> vector<4xf16> %63 = llvm.fdiv %5, %62 : vector<4xf16> %64 = llvm.getelementptr %46[%58] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16> %65 = llvm.bitcast %64 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>> llvm.store %63, %65 {alignment = 2 : i64} : !llvm.ptr<vector<4xf16>> %66 = llvm.add %54, %9 : i64 llvm.br ^bb8(%66 : i64) ^bb10: // pred: ^bb8 %67 = llvm.icmp "ult" %53, %31 : i64 llvm.cond_br %67, ^bb11, ^bb12 ^bb11: // pred: ^bb10 %68 = llvm.mul %53, %12 : i64 %69 = llvm.add %31, %68 : i64 %70 = llvm.mul %53, %11 : i64 %71 = llvm.add %70, %10 : i64 %72 = llvm.trunc %69 : i64 to i32 %73 = llvm.mlir.undef : vector<4xi32> %74 = llvm.insertelement %72, %73[%8 : i32] : vector<4xi32> %75 = llvm.shufflevector %74, %73 [0 : i32, 0 : i32, 0 : i32, 0 : i32] : vector<4xi32>, vector<4xi32> %76 = llvm.icmp "slt" %4, %75 : vector<4xi32> %77 = llvm.add %71, %10 : i64 %78 = llvm.getelementptr %41[%77] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16> %79 = llvm.bitcast %78 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>> %80 = llvm.intr.masked.load %79, %76, %3 {alignment = 2 : i32} : (!llvm.ptr<vector<4xf16>>, vector<4xi1>, vector<4xf16>) -> vector<4xf16> %81 = llvm.bitcast %16 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>> llvm.store %80, %81 : !llvm.ptr<vector<4xf16>> %82 = llvm.load %81 {alignment = 2 : i64} : !llvm.ptr<vector<4xf16>> %83 = "llvm.intr.sqrt"(%82) : (vector<4xf16>) -> vector<4xf16> %84 = llvm.fdiv %5, %83 : vector<4xf16> %85 = llvm.bitcast %17 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>> llvm.store %84, %85 {alignment = 2 : i64} : !llvm.ptr<vector<4xf16>> %86 = llvm.load %85 : !llvm.ptr<vector<4xf16>> %87 = llvm.getelementptr %46[%77] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16> %88 = llvm.bitcast %87 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>> llvm.intr.masked.store %86, %88, %76 {alignment = 2 : i32} : vector<4xf16>, vector<4xi1> into !llvm.ptr<vector<4xf16>> llvm.br ^bb12 ^bb12: // 2 preds: ^bb10, ^bb11 %89 = llvm.mul %2, %1 : i64 %90 = llvm.mul %arg1, %2 : i64 %91 = llvm.add %90, %11 : i64 %92 = llvm.mul %91, %1 : i64 %93 = llvm.add %89, %92 : i64 %94 = llvm.alloca %93 x i8 : (i64) -> !llvm.ptr<i8> %95 = llvm.bitcast %94 : !llvm.ptr<i8> to !llvm.ptr<ptr<f16>> llvm.store %46, %95 : !llvm.ptr<ptr<f16>> %96 = llvm.getelementptr %95[%11] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>> llvm.store %46, %96 : !llvm.ptr<ptr<f16>> %97 = llvm.getelementptr %95[%2] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>> %98 = llvm.bitcast %97 : !llvm.ptr<ptr<f16>> to !llvm.ptr<i64> llvm.store %10, %98 : !llvm.ptr<i64> %99 = llvm.bitcast %94 : !llvm.ptr<i8> to !llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64, i64)>> %100 = llvm.getelementptr %99[%10, 3] : (!llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64, i64)>>, i64) -> !llvm.ptr<i64> %101 = llvm.getelementptr %100[%arg1] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64> %102 = llvm.sub %arg1, %11 : i64 llvm.br ^bb14(%102, %11 : i64, i64) ^bb13: // pred: ^bb6 %103 = llvm.mlir.addressof @error_message_2208944672953921889 : !llvm.ptr<array<42 x i8>> %104 = llvm.getelementptr %103[%10, %10] : (!llvm.ptr<array<42 x i8>>, i64, i64) -> !llvm.ptr<i8> llvm.call @_mlir_ciface_tf_report_error(%arg0, %0, %104) : (!llvm.ptr<i8>, i32, !llvm.ptr<i8>) -> () %105 = llvm.mul %2, %1 : i64 %106 = llvm.mul %2, %10 : i64 %107 = llvm.add %106, %11 : i64 %108 = llvm.mul %107, %1 : i64 %109 = llvm.add %105, %108 : i64 %110 = llvm.alloca %109 x i8 : (i64) -> !llvm.ptr<i8> %111 = llvm.bitcast %110 : !llvm.ptr<i8> to !llvm.ptr<ptr<f16>> llvm.store %13, %111 : !llvm.ptr<ptr<f16>> %112 = llvm.getelementptr %111[%11] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>> llvm.store %13, %112 : !llvm.ptr<ptr<f16>> %113 = llvm.getelementptr %111[%2] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>> %114 = llvm.bitcast %113 : !llvm.ptr<ptr<f16>> to !llvm.ptr<i64> llvm.store %10, %114 : !llvm.ptr<i64> %115 = llvm.call @malloc(%109) : (i64) -> !llvm.ptr<i8> "llvm.intr.memcpy"(%115, %110, %109, %6) : (!llvm.ptr<i8>, !llvm.ptr<i8>, i64, i1) -> () %116 = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)> %117 = llvm.insertvalue %10, %116[0] : !llvm.struct<(i64, ptr<i8>)> %118 = llvm.insertvalue %115, %117[1] : !llvm.struct<(i64, ptr<i8>)> llvm.return %118 : !llvm.struct<(i64, ptr<i8>)> ^bb14(%119: i64, %120: i64): // 2 preds: ^bb12, ^bb15 %121 = llvm.icmp "sge" %119, %10 : i64 llvm.cond_br %121, ^bb15, ^bb16 ^bb15: // pred: ^bb14 %122 = llvm.getelementptr %21[%119] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64> %123 = llvm.load %122 : !llvm.ptr<i64> %124 = llvm.getelementptr %100[%119] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64> llvm.store %123, %124 : !llvm.ptr<i64> %125 = llvm.getelementptr %101[%119] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64> llvm.store %120, %125 : !llvm.ptr<i64> %126 = llvm.mul %120, %123 : i64 %127 = llvm.sub %119, %11 : i64 llvm.br ^bb14(%127, %126 : i64, i64) ^bb16: // pred: ^bb14 %128 = llvm.call @malloc(%93) : (i64) -> !llvm.ptr<i8> "llvm.intr.memcpy"(%128, %94, %93, %6) : (!llvm.ptr<i8>, !llvm.ptr<i8>, i64, i1) -> () %129 = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)> %130 = llvm.insertvalue %arg1, %129[0] : !llvm.struct<(i64, ptr<i8>)> %131 = llvm.insertvalue %128, %130[1] : !llvm.struct<(i64, ptr<i8>)> llvm.return %131 : !llvm.struct<(i64, ptr<i8>)> } llvm.func @_mlir_ciface_Rsqrt_CPU_DT_HALF_DT_HALF(%arg0: !llvm.ptr<struct<(i64, ptr<i8>)>>, %arg1: !llvm.ptr<i8>, %arg2: !llvm.ptr<struct<(i64, ptr<i8>)>>) attributes {llvm.emit_c_interface, tf_entry} { %0 = llvm.load %arg2 : !llvm.ptr<struct<(i64, ptr<i8>)>> %1 = llvm.extractvalue %0[0] : !llvm.struct<(i64, ptr<i8>)> %2 = llvm.extractvalue %0[1] : !llvm.struct<(i64, ptr<i8>)> %3 = llvm.call @Rsqrt_CPU_DT_HALF_DT_HALF(%arg1, %1, %2) : (!llvm.ptr<i8>, i64, !llvm.ptr<i8>) -> !llvm.struct<(i64, ptr<i8>)> llvm.store %3, %arg0 : !llvm.ptr<struct<(i64, ptr<i8>)>> llvm.return } }	2022-06-15 13:24:24 +02:00
Phoebe Wang	6e02e27536	Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI" Disabled 2 mlir tests due to the runtime doesn't support `_Float16`, see the issue here https://github.com/llvm/llvm-project/issues/55992	2022-06-15 09:15:31 +08:00
Mehdi Amini	5d8298a768	Revert "[X86][RFC] Enable `_Float16` type support on X86 following the psABI" This reverts commit `2d2da259c8`. This breaks MLIR integration test (JIT crashing), reverting in the meantime.	2022-06-12 15:14:37 +00:00
Phoebe Wang	2d2da259c8	[X86][RFC] Enable `_Float16` type support on X86 following the psABI GCC and Clang/LLVM will support `_Float16` on X86 in C/C++, following the latest X86 psABI. (https://gitlab.com/x86-psABIs) _Float16 arithmetic will be performed using native half-precision. If native arithmetic instructions are not available, it will be performed at a higher precision (currently always float) and then truncated down to _Float16 immediately after each single arithmetic operation. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D107082	2022-06-12 11:40:00 +08:00
Matt Arsenault	56303223ac	llvm-reduce: Don't assert on functions which don't track liveness Use the query that doesn't assert if TracksLiveness isn't set, which needs to always be available. We also need to start printing liveins regardless of TracksLiveness.	2022-06-07 10:00:25 -04:00
Nikita Popov	41d5033eb1	[IR] Enable opaque pointers by default This enabled opaque pointers by default in LLVM. The effect of this is twofold: * If IR that contains neither explicit ptr nor %T* types is passed to tools, we will now use opaque pointer mode, unless -opaque-pointers=0 has been explicitly passed. * Users of LLVM as a library will now default to opaque pointers. It is possible to opt-out by calling setOpaquePointers(false) on LLVMContext. A cmake option to toggle this default will not be provided. Frontends or other tools that want to (temporarily) keep using typed pointers should disable opaque pointers via LLVMContext. Differential Revision: https://reviews.llvm.org/D126689	2022-06-02 09:40:56 +02:00
Scott Linder	2d43955cec	[AMDGPU][NFC] Refactor AMDGPUCallingConv.td Rename CalleeSavedRegs defs to avoid being overly specific: * CSR_AMDGPU_AGPRs_32_255 => CSR_AMDGPU_AGPRs * CSR_AMDGPU_SGPRs_30_31 + CSR_AMDGPU_SGPRs_32_105 => CSR_AMDGPU_SGPRs * CSR_AMDGPU_SI_Gfx_SGPRs_4_29 + CSR_AMDGPU_SI_Gfx_SGPRs_64_105 => CSR_AMDGPU_SI_Gfx_SGPRs * CSR_AMDGPU_HighRegs => CSR_AMDGPU * CSR_AMDGPU_HighRegs_With_AGPRs => CSR_AMDGPU_GFX90AInsts * CSR_AMDGPU_SI_Gfx_With_AGPRs => CSR_AMDGPU_SI_Gfx_GFX90AInsts Introduce a class RegMask to mark the cases where we use the CalleeSavedRegs class purely as an expedient way to produce a mask. Update the names of these masks to not mention "CSR". Other targets also seem to do this, so a reasonable alternative is to actually update table-gen to include a new class to do this explicitly, but the current approach seems harmless so I opted to just make it more explicit. Reviewed By: arsenm, sebastian-ne Differential Revision: https://reviews.llvm.org/D109008	2022-06-01 16:24:09 +00:00
Ivan Kosarev	86803008ea	[MIR] Provide location of extra instruction operand when diagnosing it. Also resolves misspelled FileCheck directives caught with D125604. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D125965	2022-05-20 05:56:25 +01:00
Shengchen Kan	6a6b0e4a63	[X86] Check the address in machine verifier 1. The scale factor must be 1, 2, 4, 8 2. The displacement must fit in 32-bit signed integer Noticed by: https://github.com/llvm/llvm-project/issues/55091 Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D124455	2022-04-28 10:05:39 +08:00
Matt Arsenault	9c122537cd	MIR: Serialize FunctionContextIdx in MachineFrameInfo	2022-04-22 11:07:41 -04:00
Matt Arsenault	987df725ac	AMDGPU: Serialize VGPRForAGPRCopy	2022-04-19 22:14:52 -04:00
Matt Arsenault	b5ec131267	AMDGPU: Fix allocating GDS globals to LDS offsets These don't seem to be very well used or tested, but try to make the behavior a bit more consistent with LDS globals. I'm not sure what the definition for amdgpu-gds-size is supposed to mean. For now I assumed it's allocating a static size at the beginning of the allocation, and any known globals are allocated after it.	2022-04-19 22:14:48 -04:00
Matt Arsenault	378bb8014d	AMDGPU: Serialize a few more MachineFunctionInfo fields in MIR	2022-04-19 22:12:59 -04:00
Matt Arsenault	f90f4884c8	AMDGPU: Serialize gds size in MIR	2022-04-19 22:12:59 -04:00
Matt Arsenault	5cd17f9d43	AMDGPU: Serialize WWM registers	2022-04-19 21:44:43 -04:00
Matt Arsenault	b8033de063	MIR: Serialize a few bool function fields	2022-04-15 20:31:07 -04:00
Matt Arsenault	5a5034d508	GlobalISel: Verify atomic load/store ordering restriction Reject acquire stores and release loads. This matches the restriction imposed by the LLParser and IR verifier.	2022-04-11 20:12:22 -04:00
Matt Arsenault	7e8ff962b3	AArch64/GlobalISel: Regenerate mir test checks Minimizes the test diffs in future changes from introduction of -NEXT.	2022-04-11 20:12:22 -04:00
Kito Cheng	5286c7aef8	[RISCV][NFC] Add missing lit.local.cfg in test/CodeGen/MIR/RISCV/	2022-04-08 12:10:20 +08:00
Kito Cheng	690085c9b7	[RISCV] Store/restore RISCVMachineFunctionInfo into MIR YAML file RISCVMachineFunctionInfo has some fields like VarArgsFrameIndex and VarArgsSaveSize are calculated at ISel lowering stage, those info are not contained in MIR files, that cause test cases rely on those field can't not reproduce correctly by MIR dump files. This patch adding the MIR read/write for those fields. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D123178	2022-04-08 11:55:48 +08:00
Matt Arsenault	ced1250b0f	MIRParser: Fix asserting with invalid flags on machine operands Constructing an operand with kills on defs and deads on uses asserts in the constructor, so diagnose these.	2022-04-05 21:46:26 -04:00
Venkata Ramanaiah Nalamothu	04fff547e2	[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls. But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvment of understanding the control flow. This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering. And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`. As an added benefit, this patch simplifies overall return instruction handling. Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code. Reviewed By: arsenm, ronlieb Differential Revision: https://reviews.llvm.org/D114652	2022-03-09 12:18:02 +05:30
Jay Foad	719bac55df	[MIRParser] Diagnose too large align values in MachineMemOperands When parsing MachineMemOperands, MIRParser treated the "align" keyword the same as "basealign". Really "basealign" should specify the alignment of the MachinePointerInfo base value, and "align" should specify the alignment of that base value plus the offset. This worked OK when the specified alignment was no larger than the alignment of the offset, but in cases like this it just caused confusion: STW killed %18, 4, %stack.1.ap2.i.i :: (store (s32) into %stack.1.ap2.i.i + 4, align 8) MIRPrinter would never have printed this, with an offset of 4 but an align of 8, so it must have been written by hand. MIRParser would interpret "align 8" as "basealign 8", but I think it is better to give an error and force the user to write "basealign 8" if that is what they really meant. Differential Revision: https://reviews.llvm.org/D120400 Change-Id: I7eeeefc55c2df3554ba8d89f8809a2f45ada32d8	2022-02-24 15:32:08 +00:00
Matt Arsenault	9c7ca51b2c	MIR: Start diagnosing too many operands on an instruction Previously this would just assert which was annoying and didn't point to the specific instruction/operand.	2022-02-21 10:36:39 -05:00
Jay Foad	ddd3807e69	[AMDGPU] Use new target MMO flag MONoClobber This allows us to set the noclobber flag on (the MMO of) a load instruction instead of on the pointer. This fixes a bug where noclobber was being applied to all loads from the same pointer, even if some of them were clobbered. Differential Revision: https://reviews.llvm.org/D118775	2022-02-02 17:12:36 +00:00
Ron Lieberman	09b53296cf	Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range" This reverts commit `9075009d1f`. Failed amdgpu runtime buildbot # 3514	2021-12-22 11:39:28 -05:00
RamNalamothu	9075009d1f	[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range Currently the return address ABI registers s[30:31], which fall in the call clobbered register range, are added as a live-in on the function entry to preserve its value when we have calls so that it gets saved and restored around the calls. But the DWARF unwind information (CFI) needs to track where the return address resides in a frame and the above approach makes it difficult to track the return address when the CFI information is emitted during the frame lowering, due to the involvment of understanding the control flow. This patch moves the return address ABI registers s[30:31] into callee saved registers range and stops adding live-in for return address registers, so that the CFI machinery will know where the return address resides when CSR save/restore happen during the frame lowering. And doing the above poses an issue that now the return instruction uses undefined register `sgpr30_sgpr31`. This is resolved by hiding the return address register use by the return instruction through the `SI_RETURN` pseudo instruction, which doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the `S_SETPC_B64_return` during the `expandPostRAPseudo()`. As an added benefit, this patch simplifies overall return instruction handling. Note: The AMDGPU CFI changes are there only in the downstream code and another version of this patch will be posted for review for the downstream code. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D114652	2021-12-22 20:51:12 +05:30
Matt Arsenault	729bf9b26b	AMDGPU: Enable fixed function ABI by default Code using indirect calls is broken without this, and there isn't really much value in supporting the old attempt to vary the argument placement based on uses. This resulted in more argument shuffling code anyway. Also have the option stop implying all inputs need to be passed. This will no rely on the amdgpu-no-* attributes to avoid passing unnecessary values.	2021-12-04 10:49:18 -05:00
Jeremy Morse	fc9dae420c	[DebugInfo][InstrRef][NFC] "Final" x86 test cleanup These are some final test changes for using instruction referencing on X86: * Most of these tests just have the flag switched so that they run with instr-ref, and just work: these tests were fixed by earlier patches. * There are some spurious differences in textual outputs, * A few have different temporary labels in the output because more MCSymbols are printed to the output. Differential Revision: https://reviews.llvm.org/D114588	2021-11-29 22:56:09 +00:00
Jeremy Morse	1dc0e47cb9	[DebugInfo][NFC] Force some tests to not use instruction-referencing There are various tests that need to be adjusted to test the right thing with instruction referencing -- usually because the internal representation of variables is different, sometimes that location lists change. This patch makes a bunch of tests explicitly not use instruction referencing, so that a check-llvm test with instruction referencing on for x86_64 doesn't fail. I'll then convert the tests to have instr-ref CHECK lines, and similar. Differential Revision: https://reviews.llvm.org/D113194	2021-11-17 11:51:29 +00:00
Simon Pilgrim	d391e4fe84	[X86] Update RET/LRET instruction to use the same naming convention as IRET (PR36876). NFC Be more consistent in the naming convention for the various RET instructions to specify in terms of bitwidth. Helps prevent future scheduler model mismatches like those that were only addressed in D44687. Differential Revision: https://reviews.llvm.org/D113302	2021-11-07 15:06:54 +00:00
David Blaikie	f6a561c4d6	DebugInfo: Use clang's preferred names for integer types This reverts `c7f16ab3e3` / r109694 - which suggested this was done to improve consistency with the gdb test suite. Possible that at the time GCC did not canonicalize integer types, and so matching types was important for cross-compiler validity, or that it was only a case of over-constrained test cases that printed out/tested the exact names of integer types. In any case neither issue seems to exist today based on my limited testing - both gdb and lldb canonicalize integer types (in a way that happens to match Clang's preferred naming, incidentally) and so never print the original text name produced in the DWARF by GCC or Clang. This canonicalization appears to be in `integer_types_same_name_p` for GDB and in `TypeSystemClang::GetBasicTypeEnumeration` for lldb. (I tested this with one translation unit defining 3 variables - `long`, `long ()()`, and `int ()()`, and another translation unit that had main, and a function that took `long ()()` as a parameter - then compiled them with mismatched compilers (either GCC+Clang, or Clang+(Clang with this patch applied)) and no matter the combination, despite the debug info for one CU naming the type "long int" and the other naming it "long", both debuggers printed out the name as "long" and were able to correctly perform overload resolution and pass the `long int ()()` variable to the `long (*)()` function parameter) Did find one hiccup, identified by the lldb test suite - that CodeView was relying on these names to map them to builtin types in that format. So added some handling for that in LLVM. (these could be split out into separate patches, but seems small enough to not warrant it - will do that if there ends up needing any reverti/revisiting) Differential Revision: https://reviews.llvm.org/D110455	2021-10-06 16:02:34 -07:00
Arthur Eubanks	05392466f0	Reland [IR] Increase max alignment to 4GB Currently the max alignment representable is 1GB, see D108661. Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945. This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits. We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now. The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field. Updating clang's max allowed alignment will come in a future patch. Reviewed By: hans Differential Revision: https://reviews.llvm.org/D110451	2021-10-06 13:29:23 -07:00
Arthur Eubanks	569346f274	Revert "Reland [IR] Increase max alignment to 4GB" This reverts commit `8d64314ffe`.	2021-10-06 11:38:11 -07:00
Arthur Eubanks	8d64314ffe	Reland [IR] Increase max alignment to 4GB Currently the max alignment representable is 1GB, see D108661. Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945. This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits. We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now. The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field. Updating clang's max allowed alignment will come in a future patch. Reviewed By: hans Differential Revision: https://reviews.llvm.org/D110451	2021-10-06 11:03:51 -07:00
Arthur Eubanks	72cf8b6044	Revert "[IR] Increase max alignment to 4GB" This reverts commit `df84c1fe78`. Breaks some bots	2021-10-06 10:21:35 -07:00
Arthur Eubanks	df84c1fe78	[IR] Increase max alignment to 4GB Currently the max alignment representable is 1GB, see D108661. Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945. This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits. We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now. The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field. Updating clang's max allowed alignment will come in a future patch. Reviewed By: hans Differential Revision: https://reviews.llvm.org/D110451	2021-10-06 09:54:14 -07:00
Nikita Popov	80110aafa0	[Tests] Fix incorrect noalias metadata Mostly this fixes cases where !noalias or !alias.scope were passed a scope rather than a scope list. In some cases I opted to drop the metadata entirely instead, because it is not really relevant to the test.	2021-09-18 20:51:00 +02:00
Matt Arsenault	722b8e0e5a	AMDGPU: Invert ABI attribute handling Previously we assumed all callable functions did not need any implicitly passed inputs, and added attributes to functions to indicate when they were necessary. Requiring attributes for correctness is pretty ugly, and it makes supporting indirect and external calls more complicated. This inverts the direction of the attributes, so an undecorated function is assumed to need all implicit imputs. This enables AMDGPUAttributor by default to mark when functions are proven to not need a given input. This strips the equivalent functionality from the legacy AMDGPUAnnotateKernelFeatures pass. However, AMDGPUAnnotateKernelFeatures is not fully removed at this point although it should be in the future. It is still necessary for the two hacky amdgpu-calls and amdgpu-stack-objects attributes, which would be better served by a trivial analysis on the IR during selection. Additionally, AMDGPUAnnotateKernelFeatures still redundantly handles the uniform-work-group-size attribute to be removed in a future commit. At this point when not using -amdgpu-fixed-function-abi, we are still modifying the ABI based on these newly negated attributes. In the future, this option will be removed and the locations for implicit inputs will always be fixed. We will then use the new attributes to avoid passing the values when unnecessary.	2021-09-09 18:24:28 -04:00
Roman Lebedev	564d85e090	The maximal representable alignment in LLVM IR is 1GiB, not 512MiB In LLVM IR, `AlignmentBitfieldElementT` is 5-bit wide But that means that the maximal alignment exponent is `(1<<5)-2`, which is `30`, not `29`. And indeed, alignment of `1073741824` roundtrips IR serialization-deserialization. While this doesn't seem all that important, this doubles the maximal supported alignment from 512MiB to 1GiB, and there's actually one noticeable use-case for that; On X86, the huge pages can have sizes of 2MiB and 1GiB (!). So while this doesn't add support for truly huge alignments, which i think we can easily-ish do if wanted, i think this adds zero-cost support for a not-trivially-dismissable case. I don't believe we need any upgrade infrastructure, and since we don't explicitly record the IR version, we don't need to bump one either. As @craig.topper speculates in D108661#2963519, this might be an artificial limit imposed by the original implementation of the `getAlignment()` functions. Differential Revision: https://reviews.llvm.org/D108661	2021-08-26 12:53:39 +03:00
Wang, Pengfei	6f7f5b54c8	[X86] AVX512FP16 instructions enabling 1/6 1. Enable FP16 type support and basic declarations used by following patches. 2. Enable new instructions VMOVW and VMOVSH. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105263	2021-08-10 12:46:01 +08:00

1 2 3 4 5 ...

577 Commits