llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	2e5f16516a	[CostModel][X86] Add CodeSize handling for fdiv ops Eventually this will be part of the cost table lookup	2022-08-25 14:08:03 +01:00
Matt Devereau	30b045aba6	[AArch64][SVE] Extend LD1RQ ISel patterns to cover missing addressing modes Add some missing patterns for ld1rq's scalar + scalar addressing mode. Also, adds the scalar + imm and scalar + scalar addressing modes for the patterns added in https://reviews.llvm.org/D130010 Differential Revision: https://reviews.llvm.org/D130993	2022-08-25 13:07:37 +00:00
Benjamin Kramer	3ccaabe051	[NVPTX] Lower llvm.roundeven to cvt.rni	2022-08-25 13:36:22 +02:00
Benjamin Kramer	a385abfeb7	[NVPTX] Factor rounding patterns into a multiclass. NFCI.	2022-08-25 13:36:21 +02:00
Simon Pilgrim	45846854a2	[CostModel][X86] Support cost kind specific look up tables Most of our cost model tables have been created assuming cost kind == recip-throughput. But we're starting to see passes wanting to get accurate costs for the other kinds as well. Some of these can be determined procedurally (e.g. codesize by default could just be the split count after type legalization), but others are going to need to be handled in cost tables - this is especially true for x86 which has so many ISA combinations. I've created a 'CostKindCosts' struct which can hold cost values for the 4 cost kinds, defaulting to -1U for unknown cost, this can be used with the existing CostTblEntryT/CostTableLookup template code. I've also added a [TargetCostKind] accessor to make it much easier to look up individual <Optional> costs. This just changes the ISD::SELECT costs to check the effect (and also to check that the ISD::SETCC are correctly handled for default/None cost kinds) - the plan would be to slowly extend this and move the CostKindTblEntry type somewhere generic to allow other targets to use it once it's matured. I'm also going to resurrect D103695 so that it can help with latency/codesize/sizelatency coverage testing. For sizelatency - IIRC the definition was vague to let it be target specific - I've tried to use typical uop counts so they're comparable to MicroOpBufferSize etc. Differential Revision: https://reviews.llvm.org/D132216	2022-08-25 12:23:36 +01:00
zhongyunde	3c8f327ce9	[AArch64] Fix sched model for tsv110 This makes three changes: 1. Split the Load/Store resources into two, Ld0St and Ld1, since only one of them is capable of stores. 2. Integer ADD and SUB instructions have different latencies and processor resource usage (pipeline) when they have a shift of zero vs. non-zero, refer to D8043. 3. Update the throughput of the scalar DIV instruction. Reviewed By: dmgreen, bryanpkc Differential Revision: https://reviews.llvm.org/D132529	2022-08-25 19:20:07 +08:00
Usman Nadeem	46768052e0	[AArch64][DAGCombine] Fix a bug in performBuildVectorCombine where it could produce an invalid EXTRACT_SUBVECTOR EXTRACT_SUBVECTOR requires that Idx be a constant multiple of ResultType's known minimum vector length. Something like this will produce an invalid extract_subvector: t1: v4i16 = ..... t2: i32 = extract_vector_elt t1, Constant:i64<1> t3: i32 = extract_vector_elt t1, Constant:i64<2> t4: v2i32 = BUILD_VECTOR t2, t3 // produces t5: v2i32 = extract_subvector t...., Constant:i64<1> Differential Revision: https://reviews.llvm.org/D132517 Change-Id: I7a5acf054edee3e89c0f85a28d8869256403ce08	2022-08-24 16:24:19 -07:00
Sami Tolvanen	cff5bef948	KCFI sanitizer The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a forward-edge control flow integrity scheme for indirect calls. It uses a !kcfi_type metadata node to attach a type identifier for each function and injects verification code before indirect calls. Unlike the current CFI schemes implemented in LLVM, KCFI does not require LTO, does not alter function references to point to a jump table, and never breaks function address equality. KCFI is intended to be used in low-level code, such as operating system kernels, where the existing schemes can cause undue complications because of the aforementioned properties. However, unlike the existing schemes, KCFI is limited to validating only function pointers and is not compatible with executable-only memory. KCFI does not provide runtime support, but always traps when a type mismatch is encountered. Users of the scheme are expected to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi` operand bundle to indirect calls, and LLVM lowers this to a known architecture-specific sequence of instructions for each callsite to make runtime patching easier for users who require this functionality. A KCFI type identifier is a 32-bit constant produced by taking the lower half of xxHash64 from a C++ mangled typename. If a program contains indirect calls to assembly functions, they must be manually annotated with the expected type identifiers to prevent errors. To make this easier, Clang generates a weak SHN_ABS `__kcfi_typeid_<function>` symbol for each address-taken function declaration, which can be used to annotate functions in assembly as long as at least one C translation unit linked into the program takes the function address. For example on AArch64, we might have the following code: ``` .c: int f(void); int (*p)(void) = f; p(); .s: .4byte __kcfi_typeid_f .global f f: ... ``` Note that X86 uses a different preamble format for compatibility with Linux kernel tooling. 
See the comments in `X86AsmPrinter::emitKCFITypeId` for details. As users of KCFI may need to locate trap locations for binary validation and error handling, LLVM can additionally emit the locations of traps to a `.kcfi_traps` section. Similarly to other sanitizers, KCFI checking can be disabled for a function with a `no_sanitize("kcfi")` function attribute. Relands `67504c9549` with a fix for 32-bit builds. Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay Differential Revision: https://reviews.llvm.org/D119296	2022-08-24 22:41:38 +00:00
Philip Reames	03798f268b	[RISCV] Backout cttz/ctlz instruction costs Craig points out correctly in post-commit review that these depend on the availability of floating point extensions.	2022-08-24 15:40:48 -07:00
Philip Reames	d4d6e71ea2	[RISCV] Add empirical costs for bswap/bitreverse/ctpop/ctlz/cttz If anyone is looking for a source of ideas on vector codegen improvements, the lowerings for several of these seem to include pretty obvious fixits.	2022-08-24 15:09:21 -07:00
Philip Reames	42af1a776a	[RISCV] Add empirically measured vector sqrt intrinsic costs	2022-08-24 14:27:57 -07:00
Philip Reames	4d3134866f	[RISCV] Add vector fabs intrinsic costs We have a fabs vector instruction, and are using it for current lowering.	2022-08-24 14:09:51 -07:00
Ilia Diachkov	f61eb41623	[SPIRV] support builtin functions The patch adds support for OpenCL and SPIR-V built-in functions. Their detection and properties are implemented using TableGen. Five tests are added to demonstrate the improvement. Differential Revision: https://reviews.llvm.org/D132024 Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com> Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com> Co-authored-by: Andrey Tretyakov <andrey1.tretyakov@intel.com> Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>	2022-08-25 00:30:33 +03:00
Saleem Abdulrasool	8f45b5a7a9	RISCV: permit unaligned nop-slide padding emission We may be requested to emit an unaligned nop sequence (e.g. 7-bytes or 3-bytes). These should be 0-filled even though that is not a valid instruction. This matches the behaviour on other architectures like ARM, X86, and MIPS. When a custom section is emitted, it may be classified as text even though it may be a data section or we may be emitting data into a text segment (e.g. a literal pool). In such cases, we should be resilient to the emission request. This was originally identified by the Linux kernel build and reported on D131270 by Nathan Chancellor. Differential Revision: https://reviews.llvm.org/D132482 Reviewed By: luismarques Tested By: Nathan Chancellor	2022-08-24 20:26:48 +00:00
Sami Tolvanen	a79060e275	Revert "KCFI sanitizer" This reverts commit `67504c9549` as using PointerEmbeddedInt to store 32 bits breaks 32-bit arm builds.	2022-08-24 19:30:13 +00:00
Michael Liao	dda3878653	[LoongArch] Fix build due to TLI interface changes. NFC. - isCheapToSpeculateCttz/isCheapToSpeculateCtlz have one type operand after https://reviews.llvm.org/D132520	2022-08-24 15:17:38 -04:00
Sami Tolvanen	67504c9549	KCFI sanitizer The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a forward-edge control flow integrity scheme for indirect calls. It uses a !kcfi_type metadata node to attach a type identifier for each function and injects verification code before indirect calls. Unlike the current CFI schemes implemented in LLVM, KCFI does not require LTO, does not alter function references to point to a jump table, and never breaks function address equality. KCFI is intended to be used in low-level code, such as operating system kernels, where the existing schemes can cause undue complications because of the aforementioned properties. However, unlike the existing schemes, KCFI is limited to validating only function pointers and is not compatible with executable-only memory. KCFI does not provide runtime support, but always traps when a type mismatch is encountered. Users of the scheme are expected to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi` operand bundle to indirect calls, and LLVM lowers this to a known architecture-specific sequence of instructions for each callsite to make runtime patching easier for users who require this functionality. A KCFI type identifier is a 32-bit constant produced by taking the lower half of xxHash64 from a C++ mangled typename. If a program contains indirect calls to assembly functions, they must be manually annotated with the expected type identifiers to prevent errors. To make this easier, Clang generates a weak SHN_ABS `__kcfi_typeid_<function>` symbol for each address-taken function declaration, which can be used to annotate functions in assembly as long as at least one C translation unit linked into the program takes the function address. For example on AArch64, we might have the following code: ``` .c: int f(void); int (*p)(void) = f; p(); .s: .4byte __kcfi_typeid_f .global f f: ... ``` Note that X86 uses a different preamble format for compatibility with Linux kernel tooling. 
See the comments in `X86AsmPrinter::emitKCFITypeId` for details. As users of KCFI may need to locate trap locations for binary validation and error handling, LLVM can additionally emit the locations of traps to a `.kcfi_traps` section. Similarly to other sanitizers, KCFI checking can be disabled for a function with a `no_sanitize("kcfi")` function attribute. Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay Differential Revision: https://reviews.llvm.org/D119296	2022-08-24 18:52:42 +00:00
Simon Pilgrim	f9de13232f	[X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis. For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling. Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, although there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU. Differential Revision: https://reviews.llvm.org/D132520	2022-08-24 17:28:18 +01:00
Kito Cheng	8e8a62006e	[RISCV][NFC] Minor cleanup in RISCVInstrInfo::getOutliningType The only use of TM is checking the result of TargetMachine::getFunctionSections, so check that directly instead of introducing a local variable.	2022-08-24 23:42:34 +08:00
Phoebe Wang	12b203ea7c	[X86][FP16] Add the missing legal action for EXTRACT_SUBVECTOR Fixes #57340 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D132563	2022-08-24 23:25:07 +08:00
Simon Pilgrim	3cf48963ff	[AMDGPU] Remove old isCheapToSpeculateCttz FIXME As confirmed on D132520 - this should always return true	2022-08-24 15:53:38 +01:00
Pierre van Houtryve	59cf9dd923	[AMDGPU][GISel] Enable Selection of ADD3 for G_PTR_ADD Allows things like `(G_PTR_ADD (G_PTR_ADD a, b), c)` to be simplified into a single ADD3 instruction instead of two adds. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D131254	2022-08-24 14:44:19 +00:00
Alex Richardson	38107171ed	[RegisterInfoEmitter] Generate isConstantPhysReg(). NFCI This commit moves the information on whether a register is constant into the Tablegen files to allow generating the implementation of isConstantPhysReg(). I've marked isConstantPhysReg() as final in this generated file to ensure that changes are made to tablegen instead of overriding this function, but if that turns out to be too restrictive, we can remove the qualifier. This should be pretty much NFC, but I did notice that e.g. the AMDGPU generated file also includes the LO16/HI16 registers now. The new isConstant flag will also be used by D131958 to ensure that constant registers are marked as call-preserved. Differential Revision: https://reviews.llvm.org/D131962	2022-08-24 14:16:20 +00:00
Kito Cheng	96c85f80f0	[RISCV] Don't outline pcrel-lo operand. This issue was found by building llvm-testsuite with `-Oz`: the linker complains `dangerous relocation: %pcrel_lo missing matching %pcrel_hi`. This turned out to be caused by outlining pcrel-lo while leaving pcrel-hi in place. That is not a problem in general, but here the two end up in different sections, and a pcrel-hi/pcrel-lo pair (e.g. AUIPC+ADDI) MUST be present in the same section due to the implementation. The outlined function is put into the .text section, but the source functions are put in .text.<function-name> if function-section is enabled or the function has the `comdat` attribute. There are a few solutions for this issue: 1. Always disallow instructions with pcrel-lo flags. 2. Only disallow instructions with pcrel-lo flags when function-section is enabled or the function has the `comdat` attribute. 3. Check whether the corresponding pcrel-high instruction is also included in the outlining candidate sequence, and allow outlining only when it is. The first is the most conservative and might lose some optimization opportunities; the second preserves those opportunities; the last is hard to implement and has no benefit, since each pcrel-high uses a different label even when accessing the same symbol. Using a custom section name might also cause this problem, but that case is already filtered out by RISCVInstrInfo::isFunctionSafeToOutlineFrom. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D132528	2022-08-24 21:47:46 +08:00
Hassnaa Hamdi	d8f63382e8	AArch64 SVE Add SVE patterns to make use of predicated smin, umin, smax, and umax instructions, add sve-min-max-pred.ll test file for the new patterns Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D132122	2022-08-24 11:09:22 +00:00
MarkGoncharovAl	8c1f18bd3e	[RISCV] : Add support for immediate operands. llvm-exegesis uses operand type information provided in tablegen files to initialize immediate arguments of the instruction. Some of them simply don't have such information. Thus we should set into relevant immediate operands their specific type. Also create verification methods for them. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D131771	2022-08-24 17:48:39 +08:00
Dmitry Vassiliev	9174a5e9a8	[NVPTX] SHL.64 $r, 31 cannot be converted to a mulwide.s32 In order to convert to mulwide.s32, we compute the 2nd operand as MulWide.32 $r, (1 << 31). (1 << 31) is interpreted as a negative number, and is not equivalent to the original instruction. The code `int64_t r = (int64_t)a << 31;` incorrectly compiled to `mul.wide.s32 %rd7, %r1, -2147483648;` Reviewed By: jchlanda Differential Revision: https://reviews.llvm.org/D132516	2022-08-24 11:39:41 +02:00
gonglingqin	9046ef6f2f	[LoongArch] Implement TargetLowering::hasAndNot() for more optimization chances Differential Revision: https://reviews.llvm.org/D132282	2022-08-24 17:29:18 +08:00
Jay Foad	1bca81c12e	[AMDGPU] Remove unused S_ADD_U64_CO_PSEUDO and S_SUB_U64_CO_PSEUDO	2022-08-24 10:28:35 +01:00
Alex	07a700f814	[RISCV] Add zihintntl compressed instructions Add zihintntl compressed instructions and some files related to zihintntl. This patch is based on {D121670}. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D121779	2022-08-24 14:29:02 +08:00
esmeyi	dfe55cc1cd	[AIX] use the original name as the input to create the new symbol for TLS symbol. Summary: Currently, an error is reported when a thread local symbol has an invalid name. D100956 creates a new symbol to prefix the TLS symbol name with a dot. When the symbol is renamed, the error occurs. This patch uses the original symbol name (name in the symbol table) as the input for the symbol for TOC entry. Reviewed By: shchenz, lkail Differential Revision: https://reviews.llvm.org/D132348	2022-08-24 01:36:40 -04:00
ZHU Zijia	9c85382ade	[RISCV] Handle register spill in branch relaxation In branch relaxation pass, `j`'s with offset over 1MiB will be relaxed to `jump` pseudo-instructions. This patch allocates a stack slot for functions with a size greater than 1MiB. If the register scavenger cannot find a scratch register for `jump`, spill a register to the slot before the jump and restore it after the jump. .mbb: foo j .dest_bb bar bar bar .dest_bb: baz The above code will be relaxed to the following code. .mbb: foo sd s11, 0(sp) jump .restore_bb, s11 bar bar bar j .dest_bb .restore_bb: ld s11, 0(sp) .dest_bb: baz Depends on D129999. Reviewed By: StephenFan Differential Revision: https://reviews.llvm.org/D130560	2022-08-24 13:27:56 +08:00
Simon Pilgrim	9317e6311f	[TTI] Add SK_Splice shuffle mask detection and X86 costs Enables fixed sized vectors to detect SK_Splice shuffle patterns and provides basic X86 cost support Differential Revision: https://reviews.llvm.org/D132374	2022-08-23 20:07:30 +01:00
Stefan Pintilie	e329788bf8	[NFC][PowerPC] Clean up a couple of lambdas from the PPCMIPeephole. There were two sections of code that had a lot of lambdas and in the patch D40554 it was suggested that we clean them up as a follow-up NFC patch. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D132394	2022-08-23 13:09:00 -05:00
Raghav	79d2529c10	AMDGPU/MetaData: Restrict address space key to only be emitted for "global_buffer" and "dynamic_shared_pointer" This matches .address_space docs at https://llvm.org/docs/AMDGPUUsage.html#amdgpu-amdhsa-code-object-kernel-argument-metadata-map-table-v3 Differential Revision: https://reviews.llvm.org/D132145	2022-08-23 14:01:01 -04:00
Thomas Symalla	5ee0fb7ed2	[NFC][AMDGPU] Some cleanups in the SIOptimizeExecMasking pass. Fix typos and remove an unused argument. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D132292	2022-08-23 18:16:47 +02:00
Philip Reames	df20ff9ae2	[TTI] Kill last couple uses of OperandValueKind in targets [nfc] Use the accessor methods on the containing class instead so that we can change the representation.	2022-08-23 08:54:41 -07:00
Jakub Kuderski	6fa87ec10f	[ADT] Deprecate is_splat and replace all uses with all_equal See the discussion thread for more details: https://discourse.llvm.org/t/adt-is-splat-and-empty-ranges/64692 Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D132335	2022-08-23 11:36:27 -04:00
Michael Liao	796124f004	[SPIRV] Fix the wrong patch from https://reviews.llvm.org/D131886 - The body of that predicate lambda is removed by mistake.	2022-08-23 11:18:35 -04:00
Philip Reames	c9608d57b8	[TTI] Plumb through OperandValueInfo in getMemoryOpCost [NFC] This has the effect of exposing the power-of-two property for use in memory op costing, but no target actually uses it yet. The main point of this change is simple consistency with the recently changed getArithmeticInstrCost, and to remove the last (interface) use of OperandValueKind.	2022-08-23 07:55:42 -07:00
Lucas Prates	d1922c9862	[AArch64] Fix list of features for Cortex-X1C This patch fixes the list of subtarget features enabled for the Cortex-X1C processor, including the following: * Fix incorrect version used for FeatureRCPC: * Use FEAT_LRCPC2 instead of FEAT_LRCPC. * Add missing v8.4-A features included in the TRM: * Flag Manipulation Instructions - FeatureFlagM (FEAT_FlagM) * Large System Extension 2 - FeatureLSE2 (FEAT_LSE2) Reviewed By: vhscampos Differential Revision: https://reviews.llvm.org/D132120	2022-08-23 11:35:23 +01:00
gonglingqin	e9a4b8e397	[LoongArch] Optimize the atomic store with amswap_db.[w/d] When AtomicOrdering is release or stronger, use amswap_db.[w/d] $zero, $a1, $a0 instead of dbar 0 st.[w/d] $a0, $a1, 0 Thanks to @xry111 for the suggestion: https://reviews.llvm.org/D128901#3626635 Differential Revision: https://reviews.llvm.org/D129838	2022-08-23 17:11:57 +08:00
Kazushi (Jam) Marukawa	b88aba9d7d	[VE] Support inlineasm memory operand Support inline asm memory operand for VE. Add regression tests also. Reviewed By: efocht Differential Revision: https://reviews.llvm.org/D132380	2022-08-23 13:44:03 +09:00
liqinweng	9181ab9223	[NFC] Use llvm::all_of instead of std::all_of Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D131886	2022-08-23 12:21:53 +08:00
Chris Bieneman	eebf84c5b3	[DirectX] Remove broken assert This assert always fails. It is unclear to me what it was attempting to test, but removing it gets our tests passing, so it clearly isn't checking the right thing.	2022-08-22 17:25:16 -05:00
Philip Reames	104fa367ee	[TTI] Use OperandValueInfo in getArithmeticInstrCost implementation [NFC] This change completes the process of replacing OperandValueKind and OperandValueProperties which were previously passed independently in this API with a single container class which contains both. This is the change which motivated the whole sequence which preceded it. In an original spike version of this change, I'd noticed a nasty bug: I'd changed the signature without changing names, and as a result, we silently passed additional information through a callsite which previously dropped the power-of-two fact. This might be harmless in most cases, but at least a couple clearly depended for correctness on not passing that property through. I did my best to split off prior changes which reduced the scope of this one, and which made it possible to use compiler assistance. For instance, every parameter which changes type in this change also changes name. This was intentional to make sure that every call site possibly affected must show up in the diff. This let me audit each one closely.	2022-08-22 15:16:39 -07:00
Philip Reames	478cf94378	[X86][AArch64][WebAsm][RISCV] Query operand properties instead of using enums directly [nfc] This is part of an ongoing transition to use OperandValueInfo which combines OperandValueKind and OperandValueProperties. This change adds some accessor methods and uses them to simplify backend code. The primary motivation of doing so is removing uses of the parameters so that an upcoming api change is less error prone.	2022-08-22 13:37:59 -07:00
Philip Reames	5e87a020a5	[X86][TTI] Rename OpNInfo to OpNKind [nfc] Both are reasonable names; this is solely that an upcoming change can use the OpNInfo name, and the compiler can tell me if I forgot to update something (instead of silently passing along properties that might not hold.)	2022-08-22 13:37:59 -07:00
Alan Zhao	8c8cfaaf0a	Revert "[ARM] Use getSymbolPreferLocal() in GetARMGVSymbol" This reverts commit `6db15a82cc`. Reverted because this breaks official Chrome builds targeting Android on arm: https://crbug.com/1354305 Repro: https://drive.google.com/file/d/1pgQI2adwx3DJJqIYvMY4i249ouHU0rmu/view?usp=sharing	2022-08-22 16:16:37 -04:00
David Penry	ced705c440	[ModuloSchedule] Add interface call to accept/reject SMS schedules This interface allows a target to reject a proposed SMS schedule. For Hexagon/PowerPC, all schedules are accepted, leaving behavior unchanged. For ARM, schedules which exceed register pressure limits are rejected. Also, two RegisterPressureTracker methods now need to be public so that register pressure can be computed by more callers. Reapplication of D128941/(reversion:D132037) with small fix. Differential Revision: https://reviews.llvm.org/D132170	2022-08-22 12:10:13 -07:00
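
The NVPTX fix in 9174a5e9a8 above rests on a small piece of arithmetic: `1 << 31` does not fit in a signed 32-bit operand, so a signed widening multiply by that constant negates the result instead of shifting it. The following is a minimal sketch of that arithmetic only, not of the NVPTX backend; the helper names `as_s32`, `as_s64`, `shl64`, and `mul_wide_s32` are hypothetical and exist just for this illustration.

```python
def as_s32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def as_s64(value: int) -> int:
    """Interpret the low 64 bits of value as a signed 64-bit integer."""
    value &= 0xFFFFFFFFFFFFFFFF
    return value - (1 << 64) if value & (1 << 63) else value


def shl64(a: int, shift: int) -> int:
    """Reference semantics: sign-extend a to 64 bits, then shift left."""
    return as_s64(as_s32(a) << shift)


def mul_wide_s32(a: int, b: int) -> int:
    """Model of a signed 32x32 -> 64 widening multiply (both operands s32)."""
    return as_s64(as_s32(a) * as_s32(b))


if __name__ == "__main__":
    a = 3
    # Shift by 30: the multiplier 1 << 30 is a valid positive s32 value.
    print(shl64(a, 30), mul_wide_s32(a, 1 << 30))
    # Shift by 31: the multiplier wraps to -2147483648, so the sign flips.
    print(as_s32(1 << 31))
    print(shl64(a, 31), mul_wide_s32(a, 1 << 31))
```

Under this model the two forms agree for shift amounts up to 30 and differ at 31, which is consistent with the commit's statement that `SHL.64 $r, 31` cannot be converted to `mul.wide.s32`.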

68575 Commits