llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	a9e9dd9a3a	[AArch64] Add bf16 select handling A bfloat select operation will currently crash, but is allowed from C. This adds handling for the operation, turning it into a FCSELHrrr if fullfp16 is present, or converting it to a FCSELSrrr if not. The FCSELSrrr is created via using INSERT_SUBREG/EXTRACT_SUBREG to convert the bf16 to a f32 and using the f32 pattern for FCSELSrrr. (I originally attempted to do this via a tablegen pattern, but it appears that the nzcv glue is placed onto the wrong node, causing it to be forgotten and incorrect scheduling to be emitted). The FCSELSrrr can also be used for fp16 selects when +fullfp16 is not present, which helps avoid an unnecessary promotion to f32. Differential Revision: https://reviews.llvm.org/D131253	2022-08-11 14:20:36 +01:00
Simon Pilgrim	5dcf0c342b	[X86] lowerShuffleWithVPMOV - remove oneuse constraints on shuffle(trunc(x),undef) -> vpmov(x) lowering These were added in rG057bdd63 but shuffle combining has gotten a lot better at folding different vector widths since then.	2022-08-11 14:06:42 +01:00
David Stuttard	1d1cc05539	AMDGPU: mbcnt allow for non-zero src1 for known-bits Src1 for mbcnt can be a non-zero literal or register. Take this into account when calculating known bits. Differential Revision: https://reviews.llvm.org/D131478	2022-08-11 13:23:43 +01:00
Yeting Kuo	875694089d	[RISCV] Peephole optimization to fold merge.vvm and unmasked intrinsics. The patch uses a peephole to fold merge.vvm and unmasked intrinsics to masked intrinsics. A peephole is used instead of tablegen patterns to avoid a large amount of auto-generated code. Note: The patch ignores segment loads since I don't know how to test them. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D130442	2022-08-11 17:58:11 +08:00
Jonas Hahnfeld	940733d6a0	[RISCV] Re-enable JIT support Commit `8922adf646` recently made JITTargetMachineBuilder honor the hasJIT property of the target. LLVM supports just-in-time compilation on RISC-V, so set the flag. Differential Revision: https://reviews.llvm.org/D131617	2022-08-11 11:41:02 +02:00
Fangrui Song	96850003d2	[PowerPC] Change a double Log2 for localentry to integral Log2. NFC	2022-08-10 23:45:13 -07:00
aqjune	02e56e2533	[CodeGen] Generate efficient assembly for freeze(poison) version of `mm_cast` intel intrinsics This patch makes the variants of `mm_cast` intel intrinsics that use `shufflevector(freeze(poison), ..)` emit efficient assembly. (These intrinsics are planned to use `shufflevector(freeze(poison), ..)` after shufflevector's semantics update; relevant thread: D103874) To do so, this patch 1. Updates `LowerAVXCONCAT_VECTORS` in X86ISelLowering.cpp to recognize `FREEZE(UNDEF)` operand of `CONCAT_VECTOR` in addition to `UNDEF` 2. Updates X86InstrVecCompiler.td to recognize `insert_subvector` of `FREEZE(UNDEF)` vector as its first operand. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D130339	2022-08-11 13:36:21 +09:00
wanglei	c437412fbc	[LoongArch] Override TargetLowering::isOffsetFoldingLegal() This patch disables GlobalAddress+Offset folding. Differential Revision: https://reviews.llvm.org/D131491	2022-08-11 11:26:54 +08:00
jacquesguan	21bf59c92a	[RISCV] Add cost model for mask vector extend and truncate instruction. As extending from or truncating to a mask vector does not use the same instructions as a normal cast, this patch changes the cost to 2, which is the number of instructions used. Differential Revision: https://reviews.llvm.org/D131552	2022-08-11 10:55:43 +08:00
WANG Xuerui	ed078c48f0	[LoongArch] Add insn aliases `jr` and `ret` Differential Revision: https://reviews.llvm.org/D131512	2022-08-11 10:02:45 +08:00
WANG Xuerui	326f7aed38	[LoongArch] Add codegen support for bitreverse Differential Revision: https://reviews.llvm.org/D131378	2022-08-11 08:55:14 +08:00
Evgenii Stepanov	8ea1cf3111	Revert "[AMDGPU] SIFixSGPRCopies refactoring" Breaks ASan tests. This reverts commit `3f8ae7efa8`.	2022-08-10 11:32:46 -07:00
Venkata Ramanaiah Nalamothu	486594119d	[AMDGPU] Fix prologue/epilogue markers in .debug_line table for trivial functions All the prologue instructions should have unknown source location co-ordinates while the epilogue instructions should have the source location of the last non-debug instruction after which the epilogue instructions are inserted. This ensures the prologue/epilogue markers are generated correctly in the line table. Changes are brought in from the downstream CFI patches. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D131485	2022-08-10 23:00:19 +05:30
Umesh Kalappa	9757f4f2dd	[PowerPC] Don't use the S30 and S31 regs for the pic code These changes address issue https://github.com/llvm/llvm-project/issues/55857. Since R30/S30 is used as pointer (32 bits) for GOT Table in the ppc32 ABI, remove it from the SPE callee save register when PIC is enabled. This prevents emitting the SPE load and store for S30 and S31 regs. Differential revision: https://reviews.llvm.org/D127495	2022-08-10 10:31:27 -05:00
Amaury Séchet	9bceb8981d	[X86] (0 - SetCC) | C -> (zext (not SetCC)) * (C + 1) - 1 if we can get a LEA out of it. This addresses various regressions in D131260, and is a useful optimization in itself. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D131358	2022-08-10 15:12:00 +00:00
Justin Hibbits	f43b228581	PowerPC: Don't hoist float multiply + add to fused operation on SPE SPE doesn't have a fmadd instruction, so don't bother hoisting a multiply and add sequence to this, as it'd become just a library call. Hoisting happens too late for the CTR usability test to veto using the CTR in a loop, and results in an assert "Invalid PPC CTR loop!".	2022-08-10 11:04:27 -04:00
Simon Pilgrim	b92c7dc211	[X86] Use DAG.getFreeze() to create freeze node. NFC.	2022-08-10 15:03:56 +01:00
David Truby	b1b9c39629	[AArch64][SVE] Use SVE for VLS fcopysign for wide vectors Currently fcopysign for VLS vectors lowers through NEON even when the vector width is wider than a NEON vector, causing bad codegen as the vectors are split. This patch causes SVE to be used for these vectors instead, giving much better codegen on wide VLS vectors. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D128642	2022-08-10 10:17:19 +00:00
Alex Bradbury	47b1f8362a	[RISCV] Implement isUsedByReturnOnly TargetLowering hook in order to tailcall more libcalls Prior to this patch, libcalls inserted by the SelectionDAG legalizer could never be tailcalled. The eligibility of libcalls for tail calling is partly determined by checking TargetLowering::isInTailCallPosition and comparing the return type of the libcall and the caller. isInTailCallPosition in turn calls TargetLowering::isUsedByReturnOnly (which always returns false if not implemented by the target). This patch provides a minimal implementation of TargetLowering::isUsedByReturnOnly - enough to support tail calling libcalls on hard float ABIs. Soft-float ABIs are left for a follow on patch. libcall-tail-calls.ll also shows missed opportunities to tail call integer libcalls, but this is due to issues outside of the isUsedByReturnOnly hook. Differential Revision: https://reviews.llvm.org/D131087	2022-08-10 10:50:29 +01:00
Alex Bradbury	7e7860c5d7	[X86][NFCI] Remove target-specific branch optimisation that's handled in BranchFolding This specific optimisation is handled in OptimizeBlock in BranchFolding so is redundant. As discussed on the review thread, I've verified that we have test coverage for that optimisation within test/CodeGen/X86 by disabling the BranchFolding version of this transform after applying this patch and rerunning the test suite. Differential Revision: https://reviews.llvm.org/D129204	2022-08-10 10:35:31 +01:00
Alex Bradbury	104a24ec8b	[WebAssembly] Produce error when encountering unlowerable Wasm global accesses WebAssembly globals are represented as IR globals with the wasm_var address space (AS1). Prior to this patch, a wasm global load that isn't lowerable will produce a failure to select, while a wasm global store will produce incorrect code. This patch ensures we consistently produce a clear error. As noted in the test cases, it's conceivable that a frontend or an optimisation pass could produce similar IR even in the presence of the semantic restrictions on pointers to Wasm globals in the frontend, which is a separate problem to address. Differential Revision: https://reviews.llvm.org/D131387	2022-08-10 10:34:10 +01:00
jacquesguan	b6b1c0d1c4	[RISCV] Add cost model for fp-mask cast op. The cost of converting from or to a mask vector is different from other cases, so PowDiff cannot be used to calculate it. This patch sets it to 3, as 3 instructions are used for the conversion. Differential Revision: https://reviews.llvm.org/D131149	2022-08-10 17:14:37 +08:00
Phoebe Wang	c7ec6e19d5	[X86][BF16] Make backend type bf16 to follow the psABI X86 psABI has updated to support __bf16 type, the ABI of which is the same as FP16. See https://discourse.llvm.org/t/patch-add-optional-bfloat16-support/63149 This is an alternative of D129858, which has less code modification and supports the vector type as well. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D130832	2022-08-10 08:58:56 +08:00
alex-t	3f8ae7efa8	[AMDGPU] SIFixSGPRCopies refactoring This change finalizes the series of patches aiming to replace the old strategy of VGPR to SGPR copies lowering. Following https://reviews.llvm.org/D128252 and https://reviews.llvm.org/D130367, code parts that are no longer used were removed. The pass main loop is no longer used for the MIR changes but collects information for further analysis. Actual MIR lowering happens later, according to the analysis result, in a set of separate functions. Another important change concerns the order of lowering: VGPR to SGPR copies lowering is done first to have priority over the rest of the MIR changes. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D131246	2022-08-10 00:51:57 +02:00
Pengxuan Zheng	9bb6622423	[ARM] Do not use LOAD_STACK_GUARD with ROPI/RWPI ROPI/RWPI are not supported with LOAD_STACK_GUARD currently. Reviewed By: nickdesaulniers, rengolin Differential Revision: https://reviews.llvm.org/D131427	2022-08-09 14:59:08 -07:00
Dinar Temirbulatov	cab6cd6834	[AArch64][LoopVectorize] Introduce trip count minimal value threshold to ignore tail-folding. After D121595 was committed, I noticed regressions associated with the vectorisation of small trip counts by tail folding with scalable vectors. As a solution for those issues I propose to introduce the minimal trip count threshold value. Differential Revision: https://reviews.llvm.org/D130755	2022-08-09 22:10:17 +01:00
Archibald Elliott	b20fe2c25b	[docs][AArch64] Label Features with Arm ARM Names This patch adds the names of the Arm Architecture Reference Manual (ARM) features to the corresponding Subtarget Features in the AArch64 backend and target parser. The aim of this is to make it clearer what architectural features a subtarget feature might enable (so, which features a CPU must provide to support that subtarget feature), and so make it easier to add new CPUs in the future. Differential Revision: https://reviews.llvm.org/D131257	2022-08-09 18:45:50 +01:00
Yaxun (Sam) Liu	e780648a15	[AMDGPU] Unify unreachable intrinsics si-annotate-control-flow does depth first traversal of BB's of a function to insert amdgcn if intrinsics for conditional branches so that isel can generate correct instructions later. si-annotate-control-flow checks whether the successor BB for the 'else' branch of a conditional branch has been visited. If it has been visited, si-annotate-control-flow assumes the conditional branch has been handled and will not try to insert if intrinsic for it. This assumption is not correct when the IR contains multiple unreachable BB's. Then 'if' intrinsics are not inserted and incorrect ISA is generated. This patch fixes the issue by letting amdgpu-unify-divergent-exit-nodes unify unreachables even if they are uniformly reached. In this way the IR will not contain multiple exits, and structurizer is able to structurize the IR containing one unified exit. Reviewed by: Ruiling Song, Matt Arsenault Differential Revision: https://reviews.llvm.org/D131181 Fixes: SWDEV-343244	2022-08-09 10:23:32 -04:00
Nikita Popov	f5ed0cb217	[RISCV] Add target feature to force-enable atomics This adds a +forced-atomics target feature with the same semantics as +atomics-32 on ARM (D130480). For RISCV targets without the +a extension, this forces LLVM to assume that lock-free atomics (up to 32/64 bits for riscv32/64 respectively) are available. This means that atomic load/store are lowered to a simple load/store (and fence as necessary), as these are guaranteed to be atomic (as long as they're aligned). Atomic RMW/CAS are lowered to __sync (rather than __atomic) libcalls. Responsibility for providing the __sync libcalls lies with the user (for privileged single-core code they can be implemented by disabling interrupts). Code using +forced-atomics and -forced-atomics are not ABI compatible if atomic variables cross the ABI boundary. For context, the difference between __sync and __atomic is that the former are required to be lock-free, while the latter requires a shared global lock provided by a shared object library. See https://llvm.org/docs/Atomics.html#libcalls-atomic for a detailed discussion on the topic. This target feature will be used by Rust's riscv32i target family to support the use of atomic load/store without atomic RMW/CAS. Differential Revision: https://reviews.llvm.org/D130621	2022-08-09 16:04:46 +02:00
gonglingqin	cf75ef460c	[LoongArch] Add codegen support for ISD::ROTL and ISD::ROTR Differential Revision: https://reviews.llvm.org/D131231	2022-08-09 19:39:17 +08:00
WANG Xuerui	7d48a9e1ae	[LoongArch] Support register-register-addressed GPR loads/stores Differential Revision: https://reviews.llvm.org/D131380	2022-08-09 19:13:36 +08:00
Alex Richardson	6db15a82cc	[ARM] Use getSymbolPreferLocal() in GetARMGVSymbol This allows relaxing some relocations to STT_SECTION symbol+offset instead of emitting a relocation against a symbol. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D131433	2022-08-09 09:53:47 +00:00
Alex Richardson	9a2b14afa0	[ARM] Emit local aliases (.Lfoo$local) for functions ARMAsmPrinter::emitFunctionEntryLabel() was not calling the base class function so the $local alias was not being emitted. This should not have any function effect right now since ARM does not generate different code for the $local symbols, but it could be improved in the future. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D131392	2022-08-09 09:53:47 +00:00
gonglingqin	d3580c2eb6	[LoongArch] Add codegen support for not Differential Revision: https://reviews.llvm.org/D131384	2022-08-09 14:05:09 +08:00
wanglei	8716513e65	[LoongArch] Implement branch analysis This allows a number of optimisation passes to work. E.g. BranchFolding and MachineBlockPlacement. Differential Revision: https://reviews.llvm.org/D131316	2022-08-09 14:03:09 +08:00
WANG Xuerui	f35cb7ba34	[LoongArch] Add codegen support for bswap Differential Revision: https://reviews.llvm.org/D131352	2022-08-09 13:42:03 +08:00
Luo, Yuanke	aaf6c7b05c	[globalisel] Select register bank for DBG_VALUE The register operand of DBG_VALUE is not selected to a proper register bank in both AArch64 and X86. This would cause getRegClass crash after global ISel. After discussion, we think the MIR should assume that all virtual registers are assigned a proper register class after global ISel, so this patch fixes the gap for DBG_VALUE on AArch64 and X86. Differential Revision: https://reviews.llvm.org/D129037	2022-08-09 13:11:51 +08:00
Chen Zheng	22e475f5ac	[NFC] fix warning	2022-08-09 00:22:01 -04:00
Yuta Mukai	3f561996bf	[AArch64] Fix and add A64FX scheduling resource/latency info 1. Missing instruction information (FTSSEL, FMSB, PFIRST and RDFFR) is added and CompleteModel is set to one. 2. Information for pseudo SVE instructions is added. Those instructions are present at the time of scheduling. 3. Resource and latency information for SVE instructions is modified to be more accurate. For example, the description for CMPEQ, which consumes one cycle each of unit FLA and PPR, is as follows. ``` Previous: def A64FXGI01 : ProcResGroup<[A64FXIPFLA, A64FXIPPR]>; def A64FXWrite_4Cyc_GI01 : SchedWriteRes<[A64FXGI01]> {... Modified: def A64FXGI0 : ProcResGroup<[A64FXIPFLA]>; def A64FXGI1 : ProcResGroup<[A64FXIPPR]>; def A64FXWrite_CMP : SchedWriteRes<[A64FXGI0, A64FXGI1]> {... ``` Reference: A64FX Microarchitecture Manual (Table 16-3) https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Microarchitecture_Manual_en_1.7.pdf Reviewed By: dmgreen, kawashima-fj Differential Revision: https://reviews.llvm.org/D131165	2022-08-09 10:53:40 +09:00
Chen Zheng	d9004dfbab	[PowerPC] mapping hardware loop intrinsics to powerpc pseudo Map hardware loop intrinsics loop_decrement and set_loop_iteration to the new PowerPC pseudo instructions, so that the hardware loop intrinsics will be expanded to normal cmp+branch form or ctrloop form based on the CTR register usage on MIR level. Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D123366	2022-08-08 21:34:20 -04:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Craig Topper	e2bfbed2bb	[RISCV] Add ReadFStoreData as a SchedRead. The floating point stores use a different register class, it probably makes sense to have a different SchedRead. Reviewed By: monkchiang Differential Revision: https://reviews.llvm.org/D131379	2022-08-08 09:33:19 -07:00
Simon Pilgrim	9ea54ac9ce	[X86] X86ISelDAGToDAG.cpp - use auto for all values derived from cast/dyn_cast (style). NFC.	2022-08-08 14:35:06 +01:00
Simon Tatham	72017e9b16	[llvm-objdump,ARM] Fix big-endian AArch32 disassembly. The ABI for big-endian AArch32, as specified by AAELF32, is above-averagely complicated. Relocatable object files are expected to store instruction encodings in byte order matching the ELF file's endianness (so, big-endian for a BE ELF file). But executable images can //either// do that //or// store instructions little-endian regardless of data and ELF endianness (to support BE32 and BE8 platforms respectively). They signal the latter by setting the EF_ARM_BE8 flag in the ELF header. (In the case of the Thumb instruction set, this all means that each 16-bit halfword of a Thumb instruction is stored in one or other endianness. The two halfwords of a 32-bit Thumb instruction must appear in the same order no matter what, because the first halfword is the one that must avoid overlapping the encoding of any 16-bit Thumb instruction.) llvm-objdump was unconditionally expecting Arm instructions to be stored little-endian. So it would correctly disassemble a BE8 image, but if you gave it a BE32 image or a BE object file, it would retrieve every instruction in byte-swapped form and disassemble it to nonsense. (Even an object file output by LLVM itself, because ARMMCCodeEmitter outputs instructions big-endian in big-endian mode, which is correct for writing an object file.) This patch allows llvm-objdump to correctly disassemble all three of those classes of Arm ELF file. It does it by introducing a new SubtargetFeature for big-endian instructions, setting it from the ELF image type and flags during llvm-objdump setup, and teaching both ARMDisassembler and llvm-objdump itself to pay attention to it when retrieving instruction data from a section being disassembled. Differential Revision: https://reviews.llvm.org/D130902	2022-08-08 10:49:51 +01:00
Cullen Rhodes	a6dec9f5b2	[AArch64][SVE] Add patterns to select masked FP arith Add patterns to select predicated instructions when lowering: fadd(a, select(mask, b, splat(0))) fsub(a, select(mask, b, splat(0))) 'fadd' is unsafe unless no-signed zeros fast-math flag is set, since -0.0 + 0.0 = 0.0 changes the sign. Alive2: https://alive2.llvm.org/ce/z/wbhJh_ Also adds FMA patterns for: fadd(a, select(mask, mul(b, c), splat(0))) -> fmla(a, mask, b, c) fsub(a, select(mask, mul(b, c), splat(0))) -> fmla(a, mask, b, c) These patterns require the 'contract' fast-math flag to be set, and the fadd 'nsz' as above. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D130564	2022-08-08 08:44:13 +00:00
Kazu Hirata	e20d210eef	[llvm] Qualify auto (NFC) Identified with readability-qualified-auto.	2022-08-07 23:55:27 -07:00
wanglei	0c2b738f8f	[LoongArch] Support for varargs This patch ensures the `$fp` always points to the bottom of the vararg spill region. Includes support for expand `ISD::DYNAMIC_STACKALLOC`. Differential Revision: https://reviews.llvm.org/D130250	2022-08-08 14:01:24 +08:00
Sheng	64d326c33c	[M68k] Add MC support for link/unlk Reviewers: myhsu Differential Revision: https://reviews.llvm.org/D125444	2022-08-08 11:00:11 +08:00
Kazu Hirata	ba0407ba86	[llvm] Use range-based for loops (NFC)	2022-08-07 00:16:21 -07:00
Kazu Hirata	54199d805a	[x86] Remove unused declaration processWaitCnt (NFC) The declaration was introduced without a corresponding definition on Jan 2, 2022 in commit `85e6e748d4`.	2022-08-07 00:16:19 -07:00
Kazu Hirata	d0ec61c9ff	[Target] Remove unused forward declarations (NFC)	2022-08-07 00:16:16 -07:00
Kazu Hirata	a2d4501718	[llvm] Fix comment typos (NFC)	2022-08-07 00:16:14 -07:00
Krzysztof Parzyszek	2bc390bdd6	[RDF] Use default TargetOperandInfo if not given in constructor All current in-tree users use the default implementation.	2022-08-06 14:32:52 -05:00
Chen Zheng	ef60e44fe8	[PowerPC] fix stack size allocated for float point argument This is for https://github.com/llvm/llvm-project/issues/56469 Allocate 4 bytes for float point arguments on PPC32. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D129558	2022-08-06 08:38:52 -04:00
Markus Böck	f7b73b7e8e	[llvm] Remove uses of deprecated `std::iterator` std::iterator has been deprecated in C++17 and some standard library implementations such as MS STL or libc++ emit deprecation messages when using the class. Since LLVM has now switched to C++17 these will emit warnings on these implementations, or worse, errors in build configurations using -Werror. This patch fixes these issues by replacing them with LLVM's own llvm::iterator_facade_base which offers a superset of functionality of std::iterator. Differential Revision: https://reviews.llvm.org/D131320	2022-08-06 14:07:37 +02:00
Leon Clark	6a275cd53c	Transform illegal intrinsics to V_ILLEGAL Related tasks: - SWDEV-240194 - SWDEV-309417 - SWDEV-334876 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D123693	2022-08-06 08:59:00 +01:00
Craig Topper	75c64c7c4e	[RISCV] Don't use li+sh3add for constants that can use lui+add. If we're adding a constant that can't use addi we try a few tricks, one of which is using li+sh3add. We should not do this if lui+add would work. For example adding 8192. Using sh3add prevents folding a sext.w to form addw, thus increasing instruction count.	2022-08-05 12:47:03 -07:00
Philip Reames	9a9848f4b9	[RISCVInsertVSETVLI] Remove an unsound optimization This fixes a bug reported privately by @craig.topper. Here's an example which illustrates the problem: vsetivli a1, a0, e32, m1, ta, mu # both DefInfo and PrevInfo vsetivli a2, a1, e32, m4, ta, mu With the unsound result being: vsetivli a1, a0, e32, m1, ta, mu vsetivli a2, a0, e32, m4, ta, mu Consider the case where this is running on a machine with VLEN=512. For this case, the VLMAXs are 16 and 64 respectively. Consider a0 = 33. The correct result is: a1 = 16, and a2 = 16 After the unsound optimization: a1 = 16 and a2 = 33 This particular example used VLMAXs which differed by more than a power of two. With a difference of only one power of two, there's another form of this bug which involves the AVL < 2 x VLMAX special case, but that one's more complicated to construct as many examples turn out accidentally sound. This patch takes the approach of simply removing the unsound optimization, but there are multiple sound sub-cases of it. I plan to return to at least a couple of them, but figured it was cleaner to remove the unsound optimization (for ease of backporting), and then review the new optimizations on their own. Differential Revision: https://reviews.llvm.org/D131264	2022-08-05 12:13:08 -07:00
Paul Walker	0533c39a76	[SVE] Expand DUPM patterns to handle all integer vector types. NOTE: i8 vector splats are ignored because the immediate range of DUP already has full coverage. Differential Revision: https://reviews.llvm.org/D131078	2022-08-05 16:00:08 +00:00
Mirko Brkusanin	19bb535ed9	[AMDGPU] Remove unused MIMG tablegen variants There are no AMDGPUSampleVariant versions for _G16, it is treated more like a modifier for derivatives (_D) (also for intrinsics where it is overloaded type instead of part of intrinsic name) so we ended up making more variants for these instructions than we actually needed. 32-bit derivatives need 6 dwords at most, while 16-bit need 4 at most. Using same AMDGPUSampleVariant for both, we ended up creating 2 extra variants per instruction than were necessary. In total this deletes 260 unused tablegen records. Differential Revision: https://reviews.llvm.org/D131252	2022-08-05 15:30:47 +02:00
Dawid Jurczak	1bd31a6898	[NFC] Add SmallVector constructor to allow creation of SmallVector<T> from ArrayRef of items convertible to type T Extracted from https://reviews.llvm.org/D129781 and address comment: https://reviews.llvm.org/D129781#3655571 Differential Revision: https://reviews.llvm.org/D130268	2022-08-05 13:35:41 +02:00
Phoebe Wang	2312b747b8	[X86] Move getting module flag into `runOnMachineFunction` to reduce compile-time. NFCI Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D131245	2022-08-05 01:58:17 -07:00
wanglei	57eb77d411	[LoongArch] Implement more of the ABI According to the description of the LoongArch abi documentation, (https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html) the calling convention of LoongArch is almost the same as the RISCV's (except for the vector part), so we borrow the implementation of RISCV. This patch only guarantees the correctness of lp64d, because only the part of lp64d is described in detail in the documentation. Differential Revision: https://reviews.llvm.org/D130249	2022-08-05 15:14:16 +08:00
David Green	38c2366b3f	[AArch64][GlobalISel] Recognise some CCMPri This is a simple addition to emitConditionalComparison, to match CCMP with immediates using getIConstantVRegValWithLookThrough, letting it select the CCMPri variants of the instructions. Differential Revision: https://reviews.llvm.org/D131073	2022-08-05 07:48:42 +01:00
Phoebe Wang	7f648d27a8	Reland "[X86][MC] Always emit `rep` prefix for `bsf`" `BMI` new instruction `tzcnt` has better performance than `bsf` on new processors. Its encoding has a mandatory prefix '0xf3' compared to `bsf`. If we force emit `rep` prefix for `bsf`, we will gain better performance when the same code run on new processors. GCC has already done this way: https://c.godbolt.org/z/6xere6fs1 Fixes #34191 Reviewed By: craig.topper, skan Differential Revision: https://reviews.llvm.org/D130956	2022-08-05 10:22:48 +08:00
Craig Topper	12a1ca9c42	[RISCV] Relax another one use restriction in performSRACombine. When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), i32), C) it's possible that the add is used by multiple sras. We should allow the combine if all the SRAs will eventually be updated. After transforming all of the sras, the shls will share a single (sext_inreg (add X, C1), i32). This pattern occurs if an sra with 32 is used as index in multiple GEPs with different scales. The shl from the GEPs will be combined with the sra before we get a chance to match the sra pattern.	2022-08-04 14:32:31 -07:00
Mingming Liu	bc8f2f3649	[AArch64][TTI][NFC] Overload method 'getVectorInstrCost' to provide vector instruction itself, as a context information for cost estimation. 1) Overloaded (instruction-based) method is a wrapper around the current (opcode-based) method. 2) This patch also changes a few callsites (VectorCombine.cpp, SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method. 3) This is a split of D128302. Differential Revision: https://reviews.llvm.org/D131114	2022-08-04 12:58:25 -07:00
Craig Topper	a2de12c987	[RISCV] Relax a one use restriction performSRACombine When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), C) ignore the use count on the (shl X, 32). The sext_inreg after the transform is free. So we're only making 2 new instructions, the add and the shl. So we only need to be concerned with replacing the original sra+add. The original shl can have other uses. This helps if there are multiple different constants being added to the same shl.	2022-08-04 11:25:08 -07:00
Joshua Cranmer	2138c90645	[IR] Move support for dxil::TypedPointerType to LLVM core IR. This allows the construct to be shared between different backends. However, it still remains illegal to use TypedPointerType in LLVM IR--the type is intended to remain an auxiliary type, not a real LLVM type. So no support is provided for LLVM-C, nor bitcode, nor LLVM assembly (besides the bare minimum needed to make Type->dump() work properly). Reviewed By: beanz, nikic, aeubanks Differential Revision: https://reviews.llvm.org/D130592	2022-08-04 10:41:11 -04:00
jacquesguan	b61cfc91ea	[RISCV] Add cost modelling for vector widening reduction. In RVV, we use vwredsum.vs and vwredsumu.vs for vecreduce.add(ext(Ty A)) if the result type's width is twice the input vector's SEW-width. In this situation, the cost of an extended add reduction should be the same as a single-width add reduction. The same applies to the vector float widening reduction. Differential Revision: https://reviews.llvm.org/D129994	2022-08-04 15:31:31 +08:00
Phoebe Wang	6f867f9102	[X86] Support ``-mindirect-branch-cs-prefix`` for call and jmp to indirect thunk This is to address feature request from https://github.com/ClangBuiltLinux/linux/issues/1665 Reviewed By: nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D130754	2022-08-04 15:12:15 +08:00
Thomas Lively	b19de814ad	[WebAssembly] Improve codegen for v128.bitselect Add patterns selecting ((v1 ^ v2) & c) ^ v2 and ((v1 ^ v2) & ~c) ^ v2 to v128.bitselect. Resolves #56827. Reviewed By: aheejin Differential Revision: https://reviews.llvm.org/D131131	2022-08-03 23:28:37 -07:00
Craig Topper	91e8079cd5	[X86] Teach PostprocessISelDAG to fold ANDrm+TESTrr when chain result is used. The isOnlyUserOf prevented the fold if the chain result had any users. What we really care about is that the data result from the AND is only used by the TEST, and the flags results from the ANDs aren't used at all. It's ok if the chain has users, we just need to replace those users with the chain from the TESTrm. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D131117	2022-08-03 21:00:22 -07:00
Craig Topper	53d560b22f	[RISCV] Prevent infinite loop after D129980. D129980 converts (seteq (i64 (and X, 0xffffffff)), C1) into (seteq (i64 (sext_inreg X, i32)), C1). If bit 31 of X is 0, it will be turned back into an 'and' by SimplifyDemandedBits which can cause an infinite loop. To prevent this, check if bit 31 is 0 with computeKnownBits before doing the transformation. Fixes PR56905. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D131113	2022-08-03 15:19:07 -07:00
Craig Topper	84e9194828	Revert "[X86][MC] Always emit `rep` prefix for `bsf`" This reverts commit `c2066d19cd`. It's causing failures on the build bots.	2022-08-03 14:51:34 -07:00
Chris Bieneman	ce0bb316eb	[DX] [NFC] Move hasSection check up Jumping out earlier if the global doesn't have a section is just a cleaner early out.	2022-08-03 15:54:53 -05:00
Craig Topper	ff91b2d9df	[X86] Promote i16 CTTZ/CTTZ_ZERO_UNDEF always. If we're going to emit a rep prefix before bsf as proposed in D130956, it makes sense to promote i16 operations to i32 to avoid the false dependency of tzcntw. Reviewed By: skan, pengfei Differential Revision: https://reviews.llvm.org/D130995	2022-08-03 13:12:20 -07:00
David Truby	9a976f3661	[llvm] Always use TargetConstant for FP_ROUND ISD Nodes This patch ensures consistency in the construction of FP_ROUND nodes such that they always use ISD::TargetConstant instead of ISD::Constant. This additionally fixes a bug in the AArch64 SVE backend where patterns were matching against TargetConstant nodes and sometimes failing when passed a Constant node. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D130370	2022-08-03 14:02:11 +01:00
Alex Bradbury	28f12a09ae	[RISCV] Teach ComputeNumSignBitsForTargetNode about masked atomic intrinsics An unnecessary sext.w is generated when masking the result of the riscv_masked_cmpxchg_i64 intrinsic. Implementing handling of the intrinsic in ComputeNumSignBitsForTargetNode allows it to be removed. Although this isn't a particularly important optimisation, removing the sext.w simplifies implementation of an additional cmpxchg-related optimisation in D130192. Although I can't produce a test with different codegen for the other atomics intrinsics, these are added as well for completeness. Differential Revision: https://reviews.llvm.org/D130191	2022-08-03 13:41:58 +01:00
Dmitry Preobrazhensky	05b3aadfff	[AMDGPU][MC][GFX11] Correct v_dot2_f16_f16 and v_dot2_bf16_bf16 Enable SGPRs for the following operands of these opcodes: - src operands of VOP3 variant. - src2 operand of DPP variants. Differential Revision: https://reviews.llvm.org/D130989	2022-08-03 15:08:23 +03:00
Dmitry Preobrazhensky	ae553f9e49	[AMDGPU][MC][GFX10] Correct encoding of VOP3 v_cmpx* opcodes Encode dst=EXEC but allow the disassembler to accept any dst value. Differential Revision: https://reviews.llvm.org/D130978	2022-08-03 15:03:44 +03:00
Fraser Cormack	646e2f4803	[VP] Rename VP int<->float conversion ISD opcodes These should be named like the non-VP versions for consistency. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D130967	2022-08-03 10:04:38 +01:00
Phoebe Wang	c2066d19cd	[X86][MC] Always emit `rep` prefix for `bsf` The new `BMI` instruction `tzcnt` has better performance than `bsf` on new processors. Its encoding has a mandatory prefix '0xf3' compared to `bsf`. If we force emit the `rep` prefix for `bsf`, we will gain better performance when the same code runs on new processors. GCC already does this: https://c.godbolt.org/z/6xere6fs1 Fixes #34191 Reviewed By: skan Differential Revision: https://reviews.llvm.org/D130956	2022-08-03 17:09:36 +08:00
Liu, Chen3	5bbb0a831f	[X86] Using `X86MemOperand` instead of `Operand` for `i32mem_TC` and `i64mem_TC` To fix build fail when X86_GEN_FOLD_TABLES is enabled. Differential Revision: https://reviews.llvm.org/D131049	2022-08-03 16:17:51 +08:00
Nikita Popov	b128e057c1	[AA] Make ModRefInfo a bitmask enum (NFC) Mark ModRefInfo as a bitmask enum, which allows using normal & and | operators on it. This supersedes various functions like unionModRef() and intersectModRef(). I think this makes the code cleaner than going through helper functions... Differential Revision: https://reviews.llvm.org/D130870	2022-08-03 10:05:55 +02:00
Craig Topper	f19497f7b0	[RISCV] Use InstVisitor in RISCVCodeGenPrepare. NFC Makes it easy to add new instructions to look at without dispatching manually.	2022-08-02 21:19:30 -07:00
Paul Kirth	d434e40f39	[llvm][NFC] Refactor code to use ProfDataUtils In this patch we replace common code patterns with the use of utility functions for dealing with profiling metadata. There should be no change in functionality, as the existing checks should be preserved in all cases. Reviewed By: bogner, davidxl Differential Revision: https://reviews.llvm.org/D128860	2022-08-03 00:09:45 +00:00
Austin Kerbow	3dfa562643	[AMDGPU] Add CL option for max-ilp scheduler. When compiling for multiple targets the scheduler that is selected via the -misched option is applied globally. This patch adds a target CL option instead. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D131022	2022-08-02 16:52:14 -07:00
Craig Topper	a5605f1f68	[RISCV] Fix operand number in debug message in RISCVMergeBaseOffset. This used to print from the ADDI where the operand number was correct. It recently changed to print from the LUI or AUIPC which needs to use operand 1 instead of 2. This shows up as a crash with -debug.	2022-08-02 15:27:23 -07:00
Austin Kerbow	40eec27618	[AMDGPU] Add llvm_unreachable to switch statement added in `d7100b398`.	2022-08-02 13:45:38 -07:00
Austin Kerbow	d7100b398b	[AMDGPU] Add GCNMaxILPSchedStrategy Creates a new scheduling strategy that attempts to maximize ILP for a single wave. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D130869	2022-08-02 13:21:24 -07:00
Xiang Li	20f7f9b709	[NFC][DirectX backend] Fix crash when emit_obj for DirectX backend. When emitting an object file from clang directly, the DirectX backend hits an assert caused by not initializing passes for AsmPrinter. The fix initializes the passes by calling createPassConfig. Also ignore global variables that have no section in DXILAsmPrinter::emitGlobalVariable to avoid hitting llvm_unreachable in DXILTargetObjectFile::SelectSectionForGlobal. Reviewed By: beanz Differential Revision: https://reviews.llvm.org/D130856	2022-08-02 12:09:07 -07:00
Vladislav Dzhidzhoev	71aecbb75c	[AArch64] Treat x18 as callee-saved in functions with Windows calling convention on Darwin rGcf97e0ec42b8 makes $x18 to be treated as callee-saved in functions with Windows calling convention on non-Windows OSes. Here we mark $x18 as callee-saved for functions with Windows calling convention on Darwin, as well as on other non-Windows platforms, in order to prevent some miscompilations (like miscompilation of win64cc-darwin-backup-x18.ll). Since getCalleeSavedRegs doesn't return x18 in list of callee-saved registers, assignCalleeSavedSpillSlots and determineCalleeSaves consider different sets of registers as callee-saved. It causes an error: ``` Assertion failed: ((!HasCalleeSavedStackSize || getCalleeSavedStackSize() == Size) && "Invalid size calculated for callee saves"), function getCalleeSavedStackSize, file AArch64MachineFunctionInfo.h, line 292. ``` Differential Revision: https://reviews.llvm.org/D130676	2022-08-02 20:33:42 +03:00
Guozhi Wei	85a6dd50ad	[MIPS] Expose the ZERO register as a constant physical register The ZERO register should be exposed as a constant physical register through the interface TargetRegisterInfo::isConstantPhysReg. Differential Revision: https://reviews.llvm.org/D130932	2022-08-02 17:04:52 +00:00
Craig Topper	ae6877836e	[RISCV] Add scheduler classes to PseudoVMV*R_V. I think these pseudos will exist when the post-RA scheduler runs so they should have sched classes. Reviewed By: monkchiang Differential Revision: https://reviews.llvm.org/D130945	2022-08-02 09:38:32 -07:00
Craig Topper	2e5c516a3d	[RISCV] Add scheduler class to PseudoReadVLENB. Reviewed By: monkchiang Differential Revision: https://reviews.llvm.org/D130938	2022-08-02 09:38:32 -07:00
Alexander Timofeev	a321d95b59	[AMDGPU] avoid blind converting to VALU REG_SEQUENCE and PHIs In `2e29b0138c` we introduced a specific solving algorithm that analyzes the VGPR to SGPR copies use chains and either lowers the copy to v_readfirstlane_b32 or converts the whole chain to VALU forms. At the same time we still have the code that blindly converts to VALU REG_SEQUENCE and PHIs in case they produce SGPR but have VGPR input operands. In case the REG_SEQUENCE and PHIs are in the VGPR to SGPR copy use chain, and this chain was considered long enough to convert the copy to v_readfirstlane_b32, further lowering them to VALU leads to several kinds of issues. First, we have a v_readfirstlane_b32 which is completely useless because most parts of its use chain were moved to VALU forms. Second, we may encounter subtle bugs related to the EXEC-dependent CF because of the weird mixing of SALU and VALU instructions. This change removes the code that moves REG_SEQUENCE and PHIs to VALU. Instead, we use the fact that both REG_SEQUENCE and PHIs have copy semantics. That is, if they define SGPR but have VGPR inputs, we insert VGPR to SGPR copies to make them pure SGPR. Then, the new copies are processed by the common VGPR to SGPR lowering algorithm. This is Part 2 in the series of commits aiming at the massive refactoring of the SIFixSGPRCopies pass. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D130367	2022-08-02 18:37:57 +02:00
Jay Foad	e301e071ba	[AMDGPU] Remove IR SpeculativeExecution pass from codegen pipeline This pass seems to have very little effect because all it does is hoist some instructions, but it is followed later in the codegen pipeline by the IR CodeSinking pass which does the opposite. Differential Revision: https://reviews.llvm.org/D130258	2022-08-02 17:35:20 +01:00
Jay Foad	c24d68fff1	[AMDGPU] Take advantage of VOP3 literals in convertToThreeAddress This improves a corner case where v_fmac can be converted to v_fma on GFX10+ even if it has a literal operand. Differential Revision: https://reviews.llvm.org/D130992	2022-08-02 17:27:11 +01:00
Phoebe Wang	23021d4d8c	[X86][FP16] Fix vector_shuffle and lowering without f16c feature problems The problem Alexander reported on D127982 was caused by an optimization for AVX512-FP16 instructions. We must limit it to cases where the feature is enabled. During the investigation, I found we didn't expand fp_round/fp_extend without F16C. This may result in a runtime crash, so change them too. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130817	2022-08-02 22:26:41 +08:00

68457 Commits