llvm-project

Commit Graph

Author	SHA1	Message	Date
Jon Roelofs	6611fbc62a	[AArch64] Dump a little more info about unimplemented reg-to-reg copies. NFC	2021-07-12 15:37:11 -07:00
Eli Friedman	6c04b7dd4f	[AArch64] Optimize overflow checks for [s\|u]mul.with.overflow.i32. Saves one instruction for signed, uses a cheaper instruction for unsigned. Differential Revision: https://reviews.llvm.org/D105770	2021-07-12 15:30:42 -07:00
Amy Kwan	35909ff6cf	[PowerPC] Fix the splat immediate in PPCMIPeephole depending on if we have an Altivec and VSX splat instruction. An assertion of the following can occur because Altivec and VSX splats use a different operand number for the immediate: ``` int64_t llvm::MachineOperand::getImm() const: Assertion `isImm() && "Wrong MachineOperand accessor"' failed. ``` This patch updates PPCMIPeephole.cpp assign the correct splat immediate. Differential Revision: https://reviews.llvm.org/D105790	2021-07-12 16:20:11 -05:00
Wouter van Oortmerssen	1689d14ed1	[WebAssembly] fix typo in range check for Asm locals	2021-07-12 13:07:11 -07:00
Simon Pilgrim	ae0d73ac3b	[CostModel][X86] Adjust fptosi/fptoui SSE/AVX legalized costs based on llvm-mca reports. Update (mainly) vXf32/vXf64 -> vXi8/vXi16 fptosi/fptoui costs based on the worst case costs from the script in D103695. Move to using legalized types wherever possible, which allows us to prune the cost tables.	2021-07-12 20:38:25 +01:00
Thomas Johnson	6b3eba7c28	[ARC] Add disassembly for the conditioned move immediate instruction This change is a step towards implementing codegen for __builtin_clz(). Full support for CLZ with a regression test will follow shortly. Differential Revision: https://reviews.llvm.org/D105560	2021-07-12 12:35:56 -07:00
Jinsong Ji	2377eca93c	[PowerPC] Custom Lowering BUILD_VECTOR for v2i64 for P7 as well The lowering for v2i64 is now guarded with hasDirectMove, however, the current lowering can handle the pattern correctly, only lowering it when there is efficient patterns and corresponding instructions. The original guard was added in D21135, and was for Legal action. The code has evloved now, this guard is not necessary anymore. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D105596	2021-07-12 17:56:10 +00:00
Thomas Lively	cbabfc63b1	[WebAssembly] Custom combines for f32x4.demote_zero_f64x2 Replace the clang builtin function and LLVM intrinsic for f32x4.demote_zero_f64x2 with combines from normal SDNodes. Also add missing combines for i32x4.trunc_sat_zero_f64x2_{s,u}, which share the same pattern. Differential Revision: https://reviews.llvm.org/D105755	2021-07-12 10:32:18 -07:00
Craig Topper	d5c97f4bf0	[X86] Teach X86FloatingPoint's handleCall to only erase the FP stack if there is a regmask operand that clobbers the FP stack. There are some calls to functions like `__alloca` that are missing a regmask operand. Lack of a regmask operand means that all registers that aren't mentioned by def operands are preserved. __alloca only updates EAX and ESP and has def operands for them so this is ok. Because there is no regmask the register allocator won't spill the FP registers across the call. Assuming we want to keep the FP stack untoched across these calls, we need to handle this is in the FP stackifier. We might want to add a proper regmask operand to the code that creates these calls to indicate all registers are preserved, but we'd still need this change to the FP stackifier to know to preserve the FP stack for such a regmask. The test is kind of long, but bugpoint wasn't able to reduce it any further. Fixes PR50782 Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D105762	2021-07-12 10:15:38 -07:00
Albion Fung	ef49d925e2	[PowerPC] Implement trap and conversion builtins for XL compatibility This patch implements trap and FP to and from double conversions. The builtins generate code that mirror what is generated from the XL compiler. Intrinsics are named conventionally with builtin_ppc, but are aliased to provide the same builtin names as the XL compiler. Differential Revision: https://reviews.llvm.org/D103668	2021-07-12 11:04:17 -05:00
Benjamin Kramer	0da3573a9e	[AArch64] Silence unused variable warning. NFC. AArch64ISelLowering.cpp:15167:8: warning: unused variable 'OpCode' [-Wunused-variable] auto OpCode = N->getOpcode(); ^	2021-07-12 16:01:11 +02:00
Cullen Rhodes	9e42675103	[AArch64] Add target features for Armv9-A Scalable Matrix Extension (SME) First patch in a series adding MC layer support for the Arm Scalable Matrix Extension. This patch adds the following features: sme, sme-i64, sme-f64 The sme-i64 and sme-f64 flags are for the optional I16I64 and F64F64 features. If a target supports I16I64 then the following instructions are implemented: * 64-bit integer ADDHA and ADDVA variants (D105570). * SMOPA, SMOPS, SUMOPA, SUMOPS, UMOPA, UMOPS, USMOPA, and USMOPS instructions that accumulate 16-bit integer outer products into 64-bit integer tiles. If a target supports F64F64 then the FMOPA and FMOPS instructions that accumulate double-precision floating-point outer products into double-precision tiles are implemented. Outer products are implemented in D105571. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06 Reviewed By: CarolineConcatto Differential Revision: https://reviews.llvm.org/D105569	2021-07-12 13:28:10 +00:00
Michael Liao	8253fa2298	Fix warning '-Wparentheses'. NFC.	2021-07-12 09:25:30 -04:00
Jonas Paulsson	96421af5f8	[SystemZ] Bugfix for the 'N' code for inline asm operand. Don't use a local MachineOperand copy in SystemZAsmPrinter::PrintAsmOperand() and change the register as it may break the MRI tracking of register uses. Use an MCOperand instead. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D105757	2021-07-12 15:04:08 +02:00
Simon Pilgrim	96b4117d51	[CostModel][X86] Adjust truncate SSE/AVX legalized costs based on llvm-mca reports. Update truncation costs based on the worst case costs from the script in D103695. Move to using legalized types wherever possible, which allows us to prune the cost tables.	2021-07-12 13:50:43 +01:00
David Green	f73334c46d	[AArch64] Set the latency of Cortex-A55 stores to 1 This sets the latency of stores to 1 in the Cortex-A55 scheduling model, to better match the values given in the software optimization guide. The latency of a store in normal llvm scheduling does not appear to have a lot of uses. If the store has no outputs then the latency is somewhat meaningless (and pre/post increment update operands use the WriteAdr write for those operands instead). The one place it does alter things is the latency between a store and the end of the scheduling region, which can in turn have an effect on the critical path length. As a result a latency of 1 is more correct and offers ever-so-slightly better scheduling of instructions near the end of the block. They are marked as RetireOOO to keep the llvm-mca from introducing stalls where non would exist. Differential Revision: https://reviews.llvm.org/D105541	2021-07-12 13:39:35 +01:00
David Truby	c305557acd	[llvm][sve] Lowering for VLS truncating stores This adds custom lowering for truncating stores when operating on fixed length vectors in SVE. It also includes a DAG combine to fold extends followed by truncating stores into non-truncating stores in order to prevent this pattern appearing once truncating stores are supported. Currently truncating stores are not used in certain cases where the size of the vector is larger than the target vector width. Differential Revision: https://reviews.llvm.org/D104471	2021-07-12 11:14:17 +01:00
Simon Pilgrim	e4aa6ad132	[X86][SSE] X86ISD::FSETCC nodes (cmpss/cmpsd) return a 0/-1 allbits signbits result Annoyingly, i686 cmpsd handling still fails to remove the unnecessary neg(and(x,1))	2021-07-12 09:56:59 +01:00
Fangrui Song	57503524b1	[AArch64] De-capitalize some Emit* functions AsmParser/AsmPrinter/Streamer are mostly consistent on emit* functions now.	2021-07-11 22:05:39 -07:00
Daniel Egger	98c2e4115d	[ARM] Add lowering of uadd_sat to uq{add\|sub}8 and uq{add\|sub}16 This follow the lead of https://reviews.llvm.org/D68974 to add lowering of unsigned saturated addition/subtraction. Differential Revision: https://reviews.llvm.org/D105413	2021-07-11 15:58:11 +01:00
Amara Emerson	97c426394a	[AArch64][GlobalISel] Implement moreElements legalization for G_SHUFFLE_VECTOR. Differential Revision: https://reviews.llvm.org/D103301	2021-07-10 00:25:26 -07:00
Amara Emerson	58a2cb5143	[GlobalISel] Add a new artifact combiner for unmerge which looks through general artifact expressions. The original motivation for this was to implement moreElementsVector of shuffles on AArch64, which resulted in complex sequences of artifacts like unmerge(unmerge(concat...)) which the combiner couldn't handle. It seemed here that the better option, instead of writing ever-more-complex combines, was to have a way to find the original "non-artifact" source registers for a given definition, walking through arbitrary expressions of unmerge/concat/insert. As long as the bits aren't extended or truncated, this is a pretty simple algorithm that avoids the need for lots of combines and instead jumps straight to the final result we want. I've only used this new technique in 2 places within tryCombineUnmerge, using it in more general situations resulted in infinite loops in AMDGPU. So for now it's used when we would otherwise fail to combine and that seems to work. In order to support looking through G_INSERTs, I also had to add it as an artifact in isArtifact(), which caused a whole lot of issues in tests. AMDGPU started infinite looping since full legalization of G_INSERT doensn't seem to be there. To work around this, I've temporarily added a CLI option to use the old behaviour so that the MIR tests will still run and terminate. Other minor changes include no longer making >128b G_MERGE/UNMERGE legal. We never had isel support for that anyway and it was a remnant of the legacy legalizer rules. However being legal prevented the combiner from checking if it was dead and deleting them. Differential Revision: https://reviews.llvm.org/D104355	2021-07-09 22:35:00 -07:00
Thomas Lively	e5220104d0	[WebAssembly] Custom combines for f64x2.promote_low_f32x4 Replace the clang builtin function and LLVM intrinsic previously used to select the f64x2.promote_low_f32x4 instruction with custom combines from standard SelectionDAG nodes. Implement the new combines to share code with the similar combines for f64x2.convert_low_i32x4_{s,u}. Resolves PR50232. Differential Revision: https://reviews.llvm.org/D105675	2021-07-09 18:59:29 -07:00
Derek Schuff	ac02baab48	WebAssembly: Update datalayout to match fp128 ABI change This fix goes along with `d1a96e906c` and makes the fp128 alignment match clang's long double alignment. Differential Revision: https://reviews.llvm.org/D105749	2021-07-09 16:51:36 -07:00
Kazu Hirata	5f306feb4d	[WebAssembly] Fix warnings	2021-07-09 16:40:01 -07:00
Wouter van Oortmerssen	f3e6c3f327	[WebAssembly] Fixed 2 warnings in Asm Type Checker	2021-07-09 14:38:52 -07:00
Wouter van Oortmerssen	9647a6f719	[WebAssembly] Added initial type checker to MC Assembler This to protect against non-sensical instruction sequences being assembled, which would either cause asserts/crashes further down, or a Wasm module being output that doesn't validate. Unlike a validator, this type checker is able to give type-errors as part of the parsing process, which makes the assembler much friendlier to be used by humans writing manual input. Because the MC system is single pass (instructions aren't even stored in MC format, they are directly output) the type checker has to be single pass as well, which means that from now on .globaltype and .functype decls must come before their use. An extra pass is added to Codegen to collect information for this purpose, since AsmPrinter is normally single pass / streaming as well, and would otherwise generate this information on the fly. A `-no-type-check` flag was added to llvm-mc (and any other tools that take asm input) that surpresses type errors, as a quick escape hatch for tests that were not intended to be type correct. This is a first version of the type checker that ignores control flow, i.e. it checks that types are correct along the linear path, but not the branch path. This will still catch most errors. Branch checking could be added in the future. Differential Revision: https://reviews.llvm.org/D104945	2021-07-09 14:07:25 -07:00
Stanislav Mekhanoshin	4a3b055653	[AMDGPU] Fix flags of V_MOV_B64_PSEUDO In particular it was not rematerializable. Differential Revision: https://reviews.llvm.org/D105724	2021-07-09 12:49:28 -07:00
Graham Yiu	ecd15fbf6b	[ARC][NFC] Include file re-ordering - Sort includes in alphabetical order via clang-format	2021-07-09 12:20:32 -07:00
Jeremy Morse	30cce54dad	[X86] Return src/dest register from stack spill/restore recogniser LLVM provides target hooks to recognise stack spill and restore instructions, such as isLoadFromStackSlot, and it also provides post frame elimination versions such as isLoadFromStackSlotPostFE. These are supposed to return the store-source and load-destination registers; unfortunately on X86, the PostFE recognisers just return "1", apparently to signify "yes it's a spill/load". This patch alters the hooks to correctly return the store-source and load-destination registers: This is really useful for debug-info as we it helps follow variable values as they move on/off the stack. There should be no codegen changes: the only other users of these PostFE target hooks are MachineInstr::getRestoreSize and MachineInstr::getSpillSize, which don't attempt to interpret the returned register location. While we're here, delete the (InstrRef) LiveDebugValues heuristic that tries to find the spill source register by looking for a killed reg -- we should be able to rely on the target hooks for that. This involves temporarily turning off a n InstrRef LivedDebugValues test on aarch64 (patch to re-enable it is in D104521). Differential Revision: https://reviews.llvm.org/D105428	2021-07-09 18:12:30 +01:00
Sylvestre Ledru	0ac7532cc1	m86k: adjust the usage of ArgInfo after change 9b057f647d70fc958d4a1a7a00e2deba65 Fails with: ``` /build/llvm-toolchain-snapshot-13~++20210709092633+88326bbce38c/llvm/lib/Target/M68k/GlSel/M68kCallLowering.cpp: In member function 'virtual bool llvm::M68kCallLowering::lowerReturn(llvm::MachineIRBuilder&, const llvm::Value*, llvm::ArrayRef<llvm::Register>, llvm::FunctionLoweringInfo&, llvm::Register) const': /build/llvm-toolchain-snapshot-13~++20210709092633+88326bbce38c/llvm/lib/Target/M68k/GlSel/M68kCallLowering.cpp:71:42: error: no matching function for call to 'llvm::CallLowering::ArgInfo::ArgInfo(<brace-enclosed initializer list>)' ArgInfo OrigArg{VRegs, Val->getType()}; ``` Differential Revision: https://reviews.llvm.org/D105689	2021-07-09 18:56:49 +02:00
zhijian	841077a7e9	[AIX][XCOFF] Use bit order of has_vec and longtbtable bits as defined in AIX header debug.h Summary: The bit order of the has_vec and longtbtable bits in the traceback table generated by the XL compiler flipped at some point after v12.1. This is different from the definition is the AIX header debug.h. The change in the XL compiler that caused the deviation from the OS header definition was unintentional. Since both orderings are extant and the XL compiler runtime also expects the ordering defined by the OS, we will correct the output from LLVM to match the defined ordering given by the OS (which is also consistent with the Assembler Language Reference). Mitigation for traceback tables encoded with the wrong ordering is required for either ordering. Reviewers: XingXue, HubertTong Differential Revision: https://reviews.llvm.org/D105487	2021-07-09 11:06:46 -04:00
Simon Pilgrim	9dbeac16ba	[X86] ReplaceNodeResults - fp_to_sint/uint - manually widen v2i32 results to let us add AssertSext/AssertZext Its proving tricky to move this to the generic legalizer code, so manually insert the v2i32 subvector into v4i32, insert the AssertSext/AssertZext node, then extract the subvector again. This avoids masks in the truncation/pack code, which means we avoid a PSHUFB in the fp_to_sint/uint code for sub-128 bit types (specific targets can still combine the packs to a pshufb if they have fast variable per-lane shuffles). This was noticed when I was trying to improve fp_to_sint/uint costs with D103695 (and some targets had very high fp_to_sint costs due to the PSHUFB), so we can then update the fp_to_uint codegen from D89697.	2021-07-09 12:07:33 +01:00
David Green	38c9a4068d	[TTI] Remove IsPairwiseForm from getArithmeticReductionCost This patch removes the IsPairwiseForm flag from the Reduction Cost TTI hooks, along with some accompanying code for pattern matching reductions from trees starting at extract elements. IsPairWise is now assumed to be false, which was the predominant way that the value was used from both the Loop and SLP vectorizers. Since the adjustments such as D93860, the SLP vectorizer has not relied upon this distinction between paiwise and non-pairwise reductions. This also removes some code that was detecting reductions trees starting from extract elements inside the costmodel. This case was double-counting costs though, adding the individual costs on the individual instruction _and_ the total cost of the reduction. Removing it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to not double count. The cost of reduction intrinsics is still tested through the various tests in llvm/test/Analysis/CostModel/X86/reduce-xyz.ll. Differential Revision: https://reviews.llvm.org/D105484	2021-07-09 11:51:16 +01:00
Kai Luo	55bd12d4b7	[PowerPC] Remove implicit use register after transformToImmFormFedByLI() When the instruction has imm form and fed by LI, we can remove the redundat LI instruction. Below is an example: ``` renamable $x5 = LI8 2 renamable $x4 = exact SRD killed renamable $x4, killed renamable $r5, implicit $x5 ``` will be converted to: ``` renamable $x5 = LI8 2 renamable $x4 = exact RLDICL killed renamable $x4, 62, 2, implicit killed $x5 ``` But when we do this optimization, we forget to remove implicit killed $x5 This bug has caused a lnt case error. This patch is to fix above bug. Reviewed By: #powerpc, shchenz Differential Revision: https://reviews.llvm.org/D85288	2021-07-09 04:42:54 +00:00
Muhammad Omair Javaid	932e3d9960	Revert "GlobalISel/AArch64: don't optimize away redundant branches at -O0" This reverts commit `458c230b5e`. This broke LLDB buildbot testcase where breakpoint set at start of loop failed to hit. https://lab.llvm.org/buildbot/#/builders/96/builds/9404 https://github.com/llvm/llvm-project/blob/main/lldb/test/API/commands/process/attach/main.cpp#L15 Differential Revision: https://reviews.llvm.org/D105238	2021-07-09 08:23:36 +05:00
Stanislav Mekhanoshin	e5b0fe1b83	[AMDGPU] Mark more SOP instructions as rematerializable The rest of the SOP instructions implicitly set SCC and not suitable for the rematerialization. Differential Revision: https://reviews.llvm.org/D105670	2021-07-08 16:00:45 -07:00
Craig Topper	631516301e	[ARM] Pass 2 instead of 0 to PHINode::Create in MVEGatherScatterLowering. NFC This parameter controls how much space is reserved for incoming values. There are always going to be 2 incoming values in this case. While there remove the unused std::vector right below. Found while looking at porting this code to RISCV.	2021-07-08 15:59:33 -07:00
Thomas Lively	3dd75f5371	[WebAssembly] Scalarize extract_vector_elt of binops Override the `shouldScalarizeBinop` target lowering hook using the same implementation used in the x86 backend. This causes `extract_vector_elt`s of vector binary ops to be scalarized if the scalarized version would be supported. Differential Revision: https://reviews.llvm.org/D105646	2021-07-08 14:31:53 -07:00
Nikita Popov	9e225a2a71	[AMDGPU] Simplify GEP construction (NFC) Noticed while making a related change. This code was doing something really peculiar: Creating an APInt by parsing a string. And then creating a SmallVector with one element to create the GEP. Instead create the APInt from integers and directly pass the single index to GetElementPtrInst::Create().	2021-07-08 21:21:43 +02:00
Nikita Popov	cfb94212d4	[AMDGPU] Pass explicit GEP type in printf transform (NFC) This code is working on an i8*. Avoid nullptr element type in preparation for removing support.	2021-07-08 21:21:43 +02:00
Nikita Popov	b5a7da4391	[NVPTX] Pass explicit GEP type (NFC) Use source element type of original GEP, as we're just changing the address space.	2021-07-08 21:21:43 +02:00
Alexey Bataev	0d74fd3fdf	[SLP][COST][X86]Improve cost model for masked gather. Revived D101297 in its original form + added some changes in X86 legalization cehcking for masked gathers. This solution is the most stable and the most correct one. We have to check the legality before trying to build the masked gather in SLP. Without this check we have incorrect cost (for SLP) in case if the masked gather is not legal/slower than the gather. And we're missing some vectorization opportunities. This can be fixed in the cost model, but in this case we need to add special checks for the cost of GEPs for ScatterVectorize node, add special check for small trees, etc., i.e. there are a lot of corner cases here and there, which insrease code base and make it harder to maintain the code. > Can't we rely on cost model to deal with this? This can be profitable for futher vectorization, when we can start from such gather loads as seed. The question from D101297. Actually, no, it can't. Actually, simple gather may give us better result, especially after we started vectorization of insertelements. Plus, like I said before, the cost for non-legal masked gathers leads to missed vectorization opportunities. Differential Revision: https://reviews.llvm.org/D105042	2021-07-08 11:53:30 -07:00
Craig Topper	6dd94cbff5	[ARM] Use matchSimpleRecurrence to simplify some code in MVEGatherScatterLowering. NFCI Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D105262	2021-07-08 11:42:56 -07:00
Matt Arsenault	43f25e61ce	Mips/GlobalISel: Remove custom splitToValueTypes	2021-07-08 13:39:06 -04:00
Matt Arsenault	9b057f647d	GlobalISel: Track original argument index in ArgInfo SelectionDAG's equivalents in ISD::InputArg/OutputArg track the original argument index. Mips relies on this, and its currently reinventing its own parallel CallLowering infrastructure which tracks these indexes on the side. Add this to help move towards deleting the custom mips handling.	2021-07-08 13:39:02 -04:00
Matt Arsenault	2f9504aa41	Mips/GlobalISel: Use correct callee calling convention This was using the convention from the calling function.	2021-07-08 13:38:57 -04:00
Simon Pilgrim	8ef67fa9d2	[CostModel][X86] Account for older SSE targets with slow fp->int conversions Both the conversion cost and the xmm->gpr transfer cost tend to be a lot higher on early SSE targets	2021-07-08 18:08:24 +01:00
Stanislav Mekhanoshin	74a5760d35	[AMDGPU] Set LoopInfo as preserved by SIAnnotateControlFlow The pass does not change loops, it just adds calls. Differential Revision: https://reviews.llvm.org/D105583	2021-07-08 09:34:43 -07:00
Jeremy Morse	63cc251eb9	[DebugInfo][InstrRef][4/4] Support DBG_INSTR_REF through all backend passes This is a cleanup patch -- we're now able to support all flavours of variable location in instruction referencing mode. This patch updates various tests for debug instructions to be broader: numerous code paths try to ignore debug isntructions, and they now have to ignore the additional DBG_PHI and DBG_INSTR_REFs that we can generate. A small amount of rework happens for LiveDebugVariables: as we don't need to track live intervals through regalloc any more, we can get away with unlinking debug instructions before regalloc, then re-inserting them after. Note that this isn't (yet) true of DBG_VALUE_LISTs, they still have to go through live interval tracking. In SelectionDAG, add a helper lambda that emits half-formed DBG_INSTR_REFs for arguments in instr-ref mode, DBG_VALUE otherwise. This is one of the final locations where DBG_VALUEs are emitted for vreg arguments. X86InstrInfo now un-sets the debug instr number on SUB instructions that get mutated into CMP instructions. As the instruction no longer computes a subtraction, we can't use it for variable locations. Differential Revision: https://reviews.llvm.org/D88898	2021-07-08 16:42:24 +01:00

1 2 3 4 5 ...

63362 Commits