VarLoc based LiveDebugValues will abandon variable location propagation if
there are too many blocks and variable assignments in the function. If it
didn't, and we had (say) 1000 blocks and 1000 variables in scope, we'd end
up with 1 million DBG_VALUEs just at the start of blocks.
Instruction-referencing LiveDebugValues should honour this limitation too
(because the same limitation applies to it). Hoist the relevant command
line options into LiveDebugValues.cpp and pass them down into the
implementation classes as an argument to ExtendRanges. I've duplicated all
the run-lines in live-debug-values-cutoffs.mir to have an
instruction-referencing flavour.
Differential Revision: https://reviews.llvm.org/D107823
In streaming mode most of the NEON instruction set is illegal, so disable
NEON when compiling with `+streaming-sve`, unless NEON is explicitly
requested.
Subsequent patches will add support for the small subset of NEON
instructions that are legal in streaming mode.
Reviewed By: paulwalker-arm, david-arm
Differential Revision: https://reviews.llvm.org/D107902
Reset cl::Positional, cl::Sink and cl::ConsumeAfter options as well in cl::ResetCommandLineParser().
Reviewed By: rriddle, sammccall
Differential Revision: https://reviews.llvm.org/D103356
This enables printing of the mnemonics that contain the predicate
in the Intel printer. This requires accounting for the memory size
that is explicitly printed in Intel syntax. Those changes have been
synced to the ATT printer as well.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D108093
This allows commuting any immediate value. The previous code only
commuted equality immediates. This was inherited from an earlier
version of VCMPSSZrm/VCMPSDZrm.
This ensures that debug_types references aren't looked for in
the debug_info section.
Behavior is still going to be questionable in an unlinked object file,
since cross-CU references could refer to symbols in another .debug_info
(or, in theory, .debug_types) chunk. But if a producer only uses
ref_addr to refer to things within the same .debug_info chunk in an
object file (e.g. whole program optimization/LTO producing two CUs into
a single .debug_info section in an object file), the ref_addrs there
could be resolved relative to that .debug_info chunk, without needing to
consider comdat (DWARFv5 type units or other creatures) chunks of
.debug_info, etc.
They were already added to findCommuteOpIndices, but they also
need to be in X86InstrInfo::commuteInstructionImpl in order
to adjust the immediate control.
visitEXTRACT_SUBVECTOR can sometimes create illegal BITCASTs when
removing "redundant" INSERT_SUBVECTOR operations. This patch adds
an extra check to ensure such combines only occur after operation
legalisation if any resulting BITCAST is itself legal.
Differential Revision: https://reviews.llvm.org/D108086
This adds a call to matchSAddSubSat from the smin/smax intrinsics, allowing
the same patterns to match if the canonical form of a min/max is an
intrinsic rather than an icmp/select.
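As a hedged illustration (value names and the exact clamp idiom are mine, not from the patch), the intrinsic form that can now be matched looks roughly like this, and is expected to become a call to @llvm.sadd.sat.i16:
```
define i16 @sadd_sat_via_intrinsics(i16 %a, i16 %b) {
  %ax = sext i16 %a to i32
  %bx = sext i16 %b to i32
  %add = add i32 %ax, %bx
  ; Clamp the widened sum to the i16 range using the min/max intrinsics.
  %lo = call i32 @llvm.smax.i32(i32 %add, i32 -32768)
  %hi = call i32 @llvm.smin.i32(i32 %lo, i32 32767)
  %res = trunc i32 %hi to i16
  ret i16 %res
}
declare i32 @llvm.smin.i32(i32, i32)
declare i32 @llvm.smax.i32(i32, i32)
```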
Differential Revision: https://reviews.llvm.org/D108077
Currently/previously, while SCEV guaranteed that it produces the same value,
the way it was produced may be illegal IR, so we have an ugly check that
the replacement is valid.
But now that the SCEV strictness wrt the pointer/integer types has been improved,
I believe this invariant is already upheld by the SCEV itself, natively.
I think we should add an assertion, wait for a week, and then, if all is good,
rip out all this checking.
Or we could just do the latter directly, I guess.
This reverts commit rL127839.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D108043
After recent changes, exit counts and BE taken counts are always
integers, so convert these to assertions.
While here, also convert the loop invariance checks to asserts.
Exit counts are always loop invariant.
libgcc and libunwind have different flavours of __register_frame. Both
flavours are already correctly handled, except that the code to handle
the libunwind flavour is guarded by __APPLE__. This change uses the
presence of __unw_add_dynamic_fde in libunwind instead to detect whether
libunwind is used, rather than hardcoding it as Apple vs. non-Apple.
Fixes PR44074.
Thanks to Albert Jin <albert.jin@gmail.com> and Chris Schafmeister
<chris.schaf@verizon.net> for identifying the problem.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D106129
Previously we emitted a "does not support scalable vectors"
remark for all targets whenever vectorisation is attempted. This
pollutes the output for architectures that don't support scalable
vectors and is likely confusing to the user.
Instead this patch introduces a debug message that reports when
scalable vectorisation is allowed by the target and only issues
the previous remark when scalable vectorisation is specifically
requested, for example:
#pragma clang loop vectorize_width(2, scalable)
Differential Revision: https://reviews.llvm.org/D108028
This is a non-intrusive fix for
https://bugs.llvm.org/show_bug.cgi?id=51476 intended for backport
to the 13.x release branch. It expands on the current hack by
distinguishing between CmpValue of 0, 1 and 2, where 0 and 1 have
the obvious meaning and 2 means "anything else". The new optimization
from D98564 should only be performed for CmpValue of 0 or 1.
For main, I think we should switch the analyzeCompare() and
optimizeCompare() APIs to use int64_t instead of int, which is in
line with MachineOperand's notion of an immediate, and avoids this
problem altogether.
Differential Revision: https://reviews.llvm.org/D108076
This patch unifies optimizeELF_x86_64_GOTAndStubs and optimizeMachO_x86_64_GOTAndStubs into a single generic optimize_x86_64_GOTAndStubs.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D108025
The current LIR does not deal with a runtime-determined memset size. This patch
uses SCEV to check whether the PointerStrideSCEV and the MemsetSizeSCEV are equal.
Before the comparison, the pass tries to fold the expression that is already
protected by the loop guard.
Testcase files `memset-runtime.ll` and `memset-runtime-debug.ll` are added.
This patch deals with the proper loop idiom. A follow-up patch will deal with SCEVs
that are unequal after folding with the loop guards.
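A minimal sketch of the kind of loop being targeted (names and types are illustrative, not taken from the testcases): the per-iteration memset size equals the pointer stride, so the whole loop can collapse into a single memset of %n * %m bytes.
```
define void @nested_memset(i8* %p, i64 %n, i64 %m) {
entry:
  br label %loop
loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %off = mul i64 %i, %m
  %dst = getelementptr inbounds i8, i8* %p, i64 %off
  ; Runtime-sized memset whose size matches the stride between iterations.
  call void @llvm.memset.p0i8.i64(i8* %dst, i8 0, i64 %m, i1 false)
  %i.next = add nuw i64 %i, 1
  %cmp = icmp slt i64 %i.next, %n
  br i1 %cmp, label %loop, label %exit
exit:
  ret void
}
declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg)
```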
Reviewed By: lebedev.ri, Whitney
Differential Revision: https://reviews.llvm.org/D107353
Another set of simple cleanups in DSE. CheckCache was removed in 1f1145006b and, as a consequence, KnownNoReads is useless.
Also update the description of MemorySSAScanLimit, whose default value is 150 instead of 100.
Differential Revision: https://reviews.llvm.org/D107812
This is a fairly common pattern:
```
%mask = G_CONSTANT iN <mask val>
%add = G_ADD %lhs, %rhs
%and = G_AND %add, %mask
```
We have combines to eliminate G_AND with a mask that does nothing.
If we combined the above to this:
```
%mask = G_CONSTANT iN <mask val>
%narrow_lhs = G_TRUNC %lhs
%narrow_rhs = G_TRUNC %rhs
%narrow_add = G_ADD %narrow_lhs, %narrow_rhs
%ext = G_ZEXT %narrow_add
%and = G_AND %ext, %mask
```
We'd be able to take advantage of those combines using the trunc + zext.
For this to work (or be beneficial in the best case)
- The operation we want to narrow then widen must only be used by the G_AND
- The G_TRUNC + G_ZEXT must be free
- Performing the operation at a narrower width must not produce a different
value than performing it at the original width *after masking.*
Example comparison between SDAG + GISel: https://godbolt.org/z/63jzb1Yvj
At -Os for AArch64, this is a 0.2% code size improvement on CTMark/pairlocalign.
Differential Revision: https://reviews.llvm.org/D107929
Unfortunately Mesa is still using amdgcn-- as the triple for OpenGL,
so we still have the awkward unknown OS case to deal with. Previously
if the HSA ABI intrinsics appeared, we would not add the ABI
registers to the function. We would emit an error later, but we still
need to produce some compile result. Start adding the registers to any
compute function, regardless of the OS. This keeps the internal state
more consistent, and will help avoid numerous test crashes in a future
patch which starts assuming the ABI inputs are present on functions by
default.
Previously we would allow promotion even if the byval/inalloca
attributes on the call and the callee didn't match.
It's ok if the byval/inalloca types aren't the same. For example, LTO
importing may rename types.
Fixes PR51397.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D107998
AttributeList::hasAttribute() is confusing. In an attempt to change the
name to something that suggests using other methods, fix up some
existing uses.
AttributeList::hasAttribute() is confusing; use clearer methods like
hasParamAttr()/hasRetAttr().
Add hasRetAttr() since it was missing from AttributeList.
It is possible to generate the llvm.fmuladd.ppcf128 intrinsic, and there is no actual
FMA instruction that corresponds to this intrinsic call for ppcf128. Thus, this
intrinsic needs to remain as a call as it cannot be lowered to any instruction, which
also means we need to disable CTR loop generation for fma involving the ppcf128 type.
This patch accomplishes this behaviour.
Differential Revision: https://reviews.llvm.org/D107914
These are lowered, matching SDAG behaviour. (See
llvm/test/CodeGen/AArch64/ssub_sat.ll and llvm/test/CodeGen/AArch64/sadd_sat.ll)
These fall back ~159 times on a build of clang with GISel enabled.
Differential Revision: https://reviews.llvm.org/D107777
Summary:
The assertion that both functions were not missing was incorrect and would
fail when one of the functions was missing. Fixed it and moved the
assertion earlier to check the input parameters to better capture
first-failure. Added lit test.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D107989
Since then, the SCEV pointer handling has been improved,
so the assertion should now hold.
This reverts commit b96114c1e1,
relanding the assertion from commit 141e845da5.
It might have changed the condition of a branch into a constant,
so we should restart and constant-fold the terminator,
instead of continuing with the tautological "conditional" branch.
This fixes the issue reported at https://reviews.llvm.org/rGf30a7dff8a5b32919951dcbf92e4a9d56c4679ff
There is an assertion failure in computeOverflowForUnsignedMul
(used in checkOverflow) due to the inner and outer trip counts
having different types. This occurs when the IV has been widened,
but the loop components are not successfully rediscovered.
This is fixed by some refactoring of the code in findLoopComponents
which identifies the trip count of the loop.
This patch uses a switch statement to map the ELF_x86_64 edge kinds to generic edge kinds, and merges the ELF_x86_64 applyFixup function into the x86_64 applyFixup function. Some edge kinds did not have corresponding generic edge kinds, so I added three generic edge kinds as follows:
1. RequestGOTAndTransformToDelta64, which is similar to RequestGOTAndTransformToDelta32.
2. GOTDelta64. This generic kind is similar to Delta64, except that GOTDelta64 computes the delta relative to GOTSymbol.
3. RequestGOTAndTransformToGOTDelta64. This edge kind is used to deal with ELF_x86_64's GOT64 edge kind; it requests the fixGOTEdge function to change the target to the GOT entry and set the edge kind to the generic edge kind GOTDelta64.
These added generic edge kinds may be named haphazardly, or may not express their meaning well.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D107967
Add in-source documentation on how CanonicalLoopInfo is intended to be used. In particular, clarify what parts of a CanonicalLoopInfo is considered part of the loop, that those parts must be side-effect free, and that InsertPoints to instructions outside those parts can be expected to be preserved after method calls implementing loop-associated directives.
A CanonicalLoopInfo is now invalidated once it no longer describes a canonical loop, and it asserts when used afterwards.
In addition, rename `createXYZWorkshareLoop` to `applyXYZWorkshareLoop` and remove the update location, to avoid the impression that they insert something from scratch at that location when in reality their InsertPoint is ignored. createStaticWorkshareLoop does not return a CanonicalLoopInfo anymore. First, it was not a canonical loop in the clarified sense (containing side-effects in the form of calls to the OpenMP runtime). Second, it is ambiguous which of the two possible canonical loops it should actually return. It will not be needed before a feature expected to be introduced in OpenMP 6.0.
Also see discussion in D105706.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D107540
Besides SPMDization, other analyses and optimizations for original, frontend-generated SPMD regions use information from the AAKernelInfoFunction attribute. This fix makes sure that disabling SPMDization through the corresponding option applies only to generic-mode regions, which should not be SPMDized, while leaving the attribute state of original SPMD regions unaffected.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D108001
We may use several COPY instructions to copy the needed sub-registers
during split. But the way we split the lanes during the COPYs may be
different from the subranges of the old register. This would fail when we
extend the subranges of the new register because the LaneMasks do not
match exactly between subranges of new register and old register.
Since we are bundling the COPYs, I think there is no need to further refine the
subranges of the new register based on the set of LaneMasks of the inserted COPYs.
I am not sure if there will be further breaking cases. But as the subranges of the
new register are created based on the LaneMasks of the subranges of the old register,
it is highly likely that we will always find an exact LaneMask match.
We can think about how to make the extendPHIKillRanges() work for
subrange mask mismatch case if we meet more such cases in the future.
The test case was from D105065 by @arsenm.
Differential Revision: https://reviews.llvm.org/D107829
For SjLj, we allocate a table to record setjmp buffer info in the entry
of each setjmp-calling function by inserting a `malloc` call, and insert
a `free` call to free the buffer before each `ret` instruction.
But this is not sufficient; we have to free the buffer before we throw.
In SjLj handling, normal functions that can possibly throw or longjmp
are wrapped with an invoke and caught within the function so they don't
end up escaping the function. But three functions throw and escape the
function:
- `__resumeException` (Emscripten library function used for Emscripten
EH)
- `emscripten_longjmp` (Emscripten library function used for Emscripten
SjLj)
- `__cxa_throw` (libc++abi function called for the C++ `throw` keyword)
The first two functions are used to rethrow the current
exception/longjmp when the caught exception/longjmp is not for the
current function. `__cxa_throw` is used for exceptions, and because we
consider it a function that cannot longjmp, it escapes the function
right away, before which we should free the buffer.
Currently `lsan.test_longjmp3` and `lsan.test_exceptions_longjmp3` fail
in Emscripten; this CL fixes these.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D107852
Currently, when Wasm EH is used with Emscripten SjLj, Emscripten SjLj
cannot handle `invoke` instructions - it assumes all `invoke`s have been
lowered away with Emscripten EH. But in Wasm EH they are lowered in
instruction selection, so they are still present in the IR stage. This
happens when
1. Wasm EH and Emscripten SjLj are used together
2. A function that calls `setjmp` uses exceptions, i.e., has `invoke`s
We were already erroring out with an assertion failure in this case, but
this CL makes it error out more properly with a valid error message.
Wasm EH + Wasm SjLj will not have this restriction. (It will have
another restriction though, e.g., `setjmp` cannot be called within
`catch`. But why would anyone do that..)
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D107687
Add builtin and intrinsic for `__addex`.
This patch is part of a series of patches to provide builtins for
compatibility with the XL compiler.
Reviewed By: stefanp, nemanjai, NeHuang
Differential Revision: https://reviews.llvm.org/D107002
This is a re-try of 6de1dbbd09 which was reverted because
it missed a null check. Extra test for that failure added.
Original commit message:
This is an adaptation of D41603 and another step on the way
to canonicalizing to the intrinsic forms of min/max.
See D98152 for status.
This reverts the revert 28c04794df.
The failing MLIR test that caused the revert should be fixed in this
version.
Also includes a PPC test fix previously in 1f87c7c478.
For unit-stride and strided load/stores we set the SEW operand of
the pseudo instruction equal to the EEW in the opcode. The LMUL
of the pseudo instruction is the LMUL we want.
These instructions calculate EMUL=(EEW/SEW) * LMUL. We can use
this to avoid changing vtype if the SEW/LMUL of the previous
vtype matches the EEW/EMUL ratio we need for the instruction.
Due to how the global analysis works, we can only do this
optimization when the previous vsetvli was produced in the block
containing the store. We need to know in the first phase if the
vsetvli will be inserted so we can propagate information to
the successors in the second phase correctly. This means we can't
depend on predecessors.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D106601
We really shouldn't deal with a conditional branch that can be trivially
constant-folded into an unconditional branch.
Indeed, barring failure to trigger BB reprocessing, that should be true,
so let's assert as much, and hope the assertion never fires.
If it does, we have a bug to fix.
Mainly, I want to add an assertion that `SimplifyCFGOpt::simplifyCondBranch()`
doesn't get asked to deal with non-unconditional branches,
and if I do that, then said assertion fires on existing tests,
and this is what prevents it from firing.
This is a direct translation of the select folds added with
D53033 / D53036 and another step towards canonicalization
using the intrinsics (see D98152).
Currently, the LNICM pass does not support sinking instructions out of a loop nest.
This patch enables LNICM to sink down as many instructions as possible to the exit block of the outermost loop.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D107219
Given a constant operand, the MVE and DAGCombine combines could fight,
each redistributing in the opposite order. Add a guard to the MVE
vecreduce distribution to prevent that.
When depth > 0, the callee frame address is used to compute the return address of
the callee, producing an improper return address. This patch adds a fix to use the
caller frame address to compute the return address of the callee.
Reviewed By: nemanjai, #powerpc
Differential revision: https://reviews.llvm.org/D107646
This patch adjusts the intrinsics definition of
llvm.matrix.column.major.load and llvm.matrix.column.major.store to
allow overloading the type of the stride. The bitwidth of the stride is
used to perform the offset computation.
This fixes a crash when using __builtin_matrix_column_major_load or
__builtin_matrix_column_major_store on 32 bit platforms. The stride argument
of the builtins is defined as `size_t`, which is 32 bits wide on 32 bit
platforms.
Note that we still perform offset computations with 64 bit width on 32
bit platforms for accesses that do not take a user-specified stride.
This can be fixed separately.
Fixes PR51304.
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D107349
This patch enables extending loads for fixed length SVE code generation.
There is a slight regression here in the mulh tests; since these tests
load the parameter and then extend it, these are treated as extending
loads which are merged, preventing the mulh instruction from being
generated. As this affects scalable SVE codegen as well this should be
addressed in a separate patch.
Reviewed By: bsmith
Differential Revision: https://reviews.llvm.org/D107057
The register classes are generated by TableGen, use them instead of
handwritten tables.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D107763
Originally committed as ffc3fb665d
Reverted in fcf2d5f402 due to an
assertion failure.
Original commit message:
Allow the folding even if there is an
intervening bitcast.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D106667
These rules were originally written when the new predicate based legalizer
was introduced in an attempt to preserve existing behaviour. It wasn't
properly kept up to date as things like vector support were split out into
G_CONCAT_VECTORS, and frankly, even if it was, it was too complex.
It's much easier to start from scratch with what we can actually support,
which is just a few type combinations. Anything illegal should either be
legalized or eliminated as a side effect of artifact combination.
Differential Revision: https://reviews.llvm.org/D107937
The assertion can fail in some cases when an i16 constant is promoted
to i32.
e.g. in the added test case the value `i16 -32768` is within the range
of i16 but the assert fails when the constant is promoted to positive
`i32 32768` by an earlier call to DAG.getConstant().
Differential Revision: https://reviews.llvm.org/D107880
release call
findSafeStoreForStoreStrongContraction checks whether it's safe to move
the release call to the store by inspecting all instructions between the
two, but was ignoring retain instructions. This was causing objects to
be released and deallocated before they were retained.
rdar://81668577
This patch adds Pass1 of MIRAddFSDiscriminatorsPass before register
allocation, and Pass2 of MIRAddFSDiscriminatorsPass before
block placement. This is still under the --enable-fs-discriminator
option (default false).
This would reduce the turn-around time for FSAFDO transition.
Differential Revision: https://reviews.llvm.org/D104579
This is a quick fix for a motivating case that looks like this:
https://godbolt.org/z/GeMqzMc38
As noted, we might be able to restore the min/max patterns
with select folds, or we just wait for this to become easier
with canonicalization to min/max intrinsics.
When enabling CSPGO with ThinLTO, there are profile cfg mismatch warnings that will cause lld-link errors (with /WX)
due to source changes (e.g. `#if` code runs for profile generation but not for profile use).
To disable them we have to use an internal "/mllvm:-no-pgo-warn-mismatch" option.
In contrast, clang uses the option "-Wno-backend-plugin" to avoid such warnings, and gcc has an explicit "-Wno-coverage-mismatch" option.
Add an "lto-pgo-warn-mismatch" option to lld COFF/ELF to help turn the profile mismatch warnings on/off explicitly when building with ThinLTO and CSPGO.
Differential Revision: https://reviews.llvm.org/D104431
We are running into more and more cases where the liveouts of low
overhead loops do not validate. Add some extra debug messages to make it
clearer why.
The introduction of `SHF_GNU_RETAIN` has caused massive problems on Solaris.
Initially, as reported in Bug 49437, it caused dozens of testsuite failures
on both sparc and x86. The objects were marked as `ELFOSABI_NONE`, but
`SHF_GNU_RETAIN` is a GNU extension. In the native Solaris ABI, that flag
(in the range for OS-specific values) is `SHF_SUNW_ABSENT` with a
completely different semantics, which confuses Solaris `ld` very much.
Later, the objects became (correctly) marked `ELFOSABI_GNU`, which Solaris
`ld` doesn't support, causing it to SEGV and break the build. The linker
is currently being hardened to not accept non-native OS ABIs to avoid this.
The need for linker support is already documented in
`clang/include/clang/Basic/AttrDocs.td`, but not currently checked.
This patch avoids all this by not emitting `SHF_GNU_RETAIN` on Solaris at all.
Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and
`x86_64-pc-linux-gnu`.
Differential Revision: https://reviews.llvm.org/D107747
When enabling CSPGO with ThinLTO, there are profile cfg mismatch warnings that will cause lld-link errors (with /WX).
To disable them we have to use an internal "/mllvm:-no-pgo-warn-mismatch" option.
In contrast, clang uses the option "-Wno-backend-plugin" to avoid such warnings, and gcc has an explicit "-Wno-coverage-mismatch" option.
Add this "lto-pgo-warn-mismatch" option to lld to help turn the profile mismatch warnings on/off explicitly when building with ThinLTO and CSPGO.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D104431
When none of the translation units in the binary have been instrumented
we shouldn't need to link the profile runtime. However, because we pass
-u__llvm_profile_runtime on Linux and Fuchsia, the runtime would still
be pulled in and incur some overhead. On Fuchsia which uses runtime
counter relocation, it also means that we cannot reference the bias
variable unconditionally.
This change modifies the InstrProfiling pass to pull in the profile
runtime only when needed by declaring the __llvm_profile_runtime symbol
in the translation unit only when needed. For now we restrict this only
for Fuchsia, but this can be later expanded to other platforms. This
approach was already used prior to 9a041a7522, but we changed it
to always generate the __llvm_profile_runtime due to a TAPI limitation,
but that limitation may no longer apply, and it certainly doesn't apply
on platforms like Fuchsia.
Differential Revision: https://reviews.llvm.org/D98061
PHI nodes are not pass-through but change their value; we have to
account for that to avoid missing stores.
Follow up for D107798 to fix PR51249 for good.
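A minimal made-up illustration of why a PHI is not pass-through here: the PHI selects between two different offsets into the same object, so the store must be accounted for at both offsets.
```
define void @phi_offsets(i1 %c, i64* %base) {
entry:
  %p0 = getelementptr inbounds i64, i64* %base, i64 0
  %p1 = getelementptr inbounds i64, i64* %base, i64 1
  br i1 %c, label %a, label %b
a:
  br label %join
b:
  br label %join
join:
  ; The PHI changes the pointer value (offset 0 vs. offset 8 bytes).
  %p = phi i64* [ %p0, %a ], [ %p1, %b ]
  store i64 0, i64* %p
  ret void
}
```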
Differential Revision: https://reviews.llvm.org/D107808
AAPointerInfoFloating needs to visit all uses and some multiple times if
we go through PHI nodes. Attributor::checkForAllUses keeps a visited set
so we don't recurse endlessly. We now allow recursion for non-phi uses so
we track all pointer offsets via PHI nodes properly without endless
recursion.
This replaces the first attempt D107579.
Differential Revision: https://reviews.llvm.org/D107798
To avoid simplification with wrong constants we need to make sure we
know that we won't perform specific optimizations based on the user's
request. The non-SPMDization and non-CustomStateMachine flags only
prevented the final transformation but allowed value simplification
to go ahead.
Differential Revision: https://reviews.llvm.org/D107862
We were calling find and then using operator[]. Instead keep the
iterator from find and use it to get the value.
Just happened to notice while investigating how we decide what extends
to use between basic blocks.
Some files still contained the old University of Illinois Open Source
Licence header. This patch replaces that with the Apache 2 with LLVM
Exception licence.
Differential Revision: https://reviews.llvm.org/D107528
Pseudo probe descriptors are created very early in the pipeline, where function names just come from the front end and are not yet decorated. So calling getCanonicalFnName on the function names in the probe descriptors is basically a no-op, which also adds a dependency from MC to ProfileData unnecessarily.
Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D107838
This patch removes the hand-rolled implementation of salvageDebugInfo
for cast and GEPs and replaces it with a call into
llvm::salvageDebugInfoImpl().
A side-effect of this is that additional redundant convert operations
are introduced, but those don't have any negative effect on the
resulting DWARF expression.
rdar://80227769
Differential Revision: https://reviews.llvm.org/D107384
This patch refactors / simplifies salvageDebugInfoImpl(). The goal
here is to simplify the implementation of coro::salvageDebugInfo() in
a followup patch.
1. Change the return value to I.getOperand(0). Currently users of
salvageDebugInfoImpl() assume that the first operand is
I.getOperand(0). This patch makes this information explicit. A
nice side-effect of this change is that it allows us to salvage
expressions such as add i8 1, %a in the future.
2. Factor out the creation of a DIExpression and return an array of
DIExpression operations instead. This change allows users that
call salvageDebugInfoImpl() in a loop to avoid the costly
creation of temporary DIExpressions and to defer the creation of
a DIExpression until the end.
This patch does not change any functionality.
rdar://80227769
Differential Revision: https://reviews.llvm.org/D107383
When converting a store into a memset, we currently insert the new
MemoryDef after the store MemoryDef, which requires all uses to be
renamed to the new def using a whole block scan. Instead, we can
insert the new MemoryDef before the store and not rename uses,
because we know that the location is immediately overwritten, so
all uses should still refer to the old MemoryDef. Those uses will
get renamed when the old MemoryDef is actually dropped, which is
efficient.
I expect something similar can be done for some of the other MSSA
updates in MemCpyOpt. This is an alternative to D107513, at least
for this particular case.
Differential Revision: https://reviews.llvm.org/D107702
Clang diagnostics refer to identifier names in quotes.
This patch makes inline remarks conform to the convention.
New behavior:
```
% clang -O2 -Rpass=inline -Rpass-missed=inline -S a.c
a.c:4:25: remark: 'foo' inlined into 'bar' with (cost=-30, threshold=337) at callsite bar:0:25; [-Rpass=inline]
int bar(int a) { return foo(a); }
^
```
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D107791
The intrinsics have an extra chunk of known bits logic
compared to the normal cmp+select idiom. That allows
folding the icmp in each case to something better, but
that then opposes the canonical form of min/max that
we try to form for a select.
I'm carving out a narrow exception to preserve all
existing regression tests while avoiding the inf-loop.
It seems unlikely that this is the only bug like this
left, but this should fix:
https://llvm.org/PR51419
We may call lowerRelativeReference in MC to determine whether target
supports this lowering. We should return nullptr instead of crashing
when we haven't implemented the real lowering.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D107830
The requested register class priorities weren't respected
globally. Not sure why this is a target option, and not just the
expected behavior (recently added in
1a6dc92be7). This avoids an allocation
failure when many wide tuple spills are introduced. I think this is a
workaround since I would not expect the allocation priority to be
required, and only a performance hint. The allocator should be smarter
about when only a subregister needs to be spilled and restored.
This does regress a couple of degenerate store stress lit tests which
shouldn't be too important.
Similar for sub except sub isn't commutative.
Modify the existing and/or/xor folds to also work on ISD::SELECT
and not just RISCVISD::SELECT_CC. This is needed to make sure
we do this transform before type legalization turns i32 add/sub
into add/sub+sign_extend_inreg on RV64. If we don't do this before
that, the sign_extend_inreg will still be after the select.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D107603
This is already done within InstCombine:
https://alive2.llvm.org/ce/z/MiGE22
...but leaving it out of analysis makes it
harder to avoid infinite loops there.
This was checking extends as shuffles, whereas we should be checking
the operands. This helps sink the shuffles, creating more addl/subl
instructions.
Differential Revision: https://reviews.llvm.org/D107623
If a G_SHL is fed by a G_CONSTANT, the lower and upper bits of the source can be
shifted individually by the constant shift amount.
However, in case the shift amount came from a G_TRUNC(G_CONSTANT), the generic shift legalization
code was used, producing intermediate shifts that are potentially illegal on some targets.
This change teaches narrowScalarShift to look through G_TRUNCs and G_*EXTs.
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D89100
Avoid stack overflow errors on systems with small stack sizes
by removing recursion in FoldCondBranchOnPHI.
This is a simple change as the recursion was only iteratively
calling the function again on the same arguments.
Ideally this would be compiled to a tail call, but there is
no guarantee.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D107803
This changes a couple of calls to LiveRegs.contains to
!LiveRegs.available: one in Thumb1FrameLoweringInfo (which modifies a
test to look more correct to me, given that r7 should be the frame pointer and so
is not available), and another in the ARMLoadStoreOptimizer, which I
don't have a test for; it was just found by inspection.
Differential Revision: https://reviews.llvm.org/D107454
C++20 no longer requires the failure memory ordering to be no stronger than the
success memory ordering. Adjust the assert in the AMDGPU SIMemoryLegalizer, and merge
instruction memory orderings.
Add a common operation to merge memory orders that allows non-strict memory
orderings to be combined. Use it in SIMemoryLegalizer and
MachineMemOperand::getMergedOrdering.
Reviewed By: efriedma, rampitec
Differential Revision: https://reviews.llvm.org/D106729
I have updated cheapToScalarize to also consider the case when
extracting lanes from a stepvector intrinsic. This required removing
the existing 'bool IsConstantExtractIndex' and passing in the actual
index as a Value instead. We do this because we need to know if the
index is <= known minimum number of elements returned by the stepvector
intrinsic. Effectively, when extracting lane X from a stepvector we
know the value returned is also X.
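For illustration only (a sketch I added, not one of the new tests), the stepvector case looks like this when the lane index is below the known minimum element count:
```
define i64 @extract_lane_2() {
  %step = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
  ; Lane 2 is within the minimum of 4 elements, so the extract is just 2.
  %lane = extractelement <vscale x 4 x i64> %step, i64 2
  ret i64 %lane
}
declare <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
```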
New tests added here:
Transforms/InstCombine/vscale_extractelement.ll
Differential Revision: https://reviews.llvm.org/D106358
This patch updates ConstantVector::getSplat to use poison instead
of undef when using insertelement/shufflevector to splat.
This follows on from D93793.
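A minimal sketch of the insertelement/shufflevector splat idiom with poison in place of undef (shown on a runtime value purely for readability; the patch itself concerns the constant splat built by ConstantVector::getSplat):
```
define <vscale x 4 x i32> @splat_idiom(i32 %v) {
  ; The base vector and the second shuffle operand are now poison, not undef.
  %ins = insertelement <vscale x 4 x i32> poison, i32 %v, i64 0
  %splat = shufflevector <vscale x 4 x i32> %ins, <vscale x 4 x i32> poison,
                         <vscale x 4 x i32> zeroinitializer
  ret <vscale x 4 x i32> %splat
}
```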
Differential Revision: https://reviews.llvm.org/D107751
Replace vector unpack operation with a scalar extend operation.
unpack(splat(X)) --> splat(extend(X))
If we have both, unpkhi and unpklo, for the same vector then we may
save a register in some cases, e.g:
Hi = unpkhi (splat(X))
Lo = unpklo(splat(X))
--> Hi = Lo = splat(extend(X))
Differential Revision: https://reviews.llvm.org/D106929
Move the last{a,b} operation to the vector operand of the binary instruction if
the binop's operand is a splat value. This essentially converts the binop
to a scalar operation.
Example:
// If x and/or y is a splat value:
lastX (binop (x, y)) --> binop(lastX(x), lastX(y))
Differential Revision: https://reviews.llvm.org/D106932
I was originally going to try to implement this in target-independent
code, but it's actually sort of tricky to generate the correct sequence
for vectors like nxv2f32. So just stick this in target-specific code,
at least for now.
Differential Revision: https://reviews.llvm.org/D107608
We use the CurrentBlock to determine whether we have already processed a
block. Don't reuse this variable for setting where we should insert the
rematerialization. The rematerialization block is different to the
current block when we rematerialize for coro suspend block users.
Differential Revision: https://reviews.llvm.org/D107573
The patterns for fixed length gather/scatter with 32-bit offsets and
64-bit memory type are slightly different than the rest of the patterns,
as such the lowering needs to be slightly different to ensure the
correct types are used.
Differential Revision: https://reviews.llvm.org/D107576
This patch is a revert of e08f205f5c. In that patch, DW_TAG_subprograms
were permitted to be referenced across CU boundaries, to improve stack
trace construction using call site information. Unfortunately, as
documented in PR48790, the way that subprograms are "owned" by dwarf units
is sufficiently complicated that subprograms end up in unexpected units,
invalidating cross-unit references.
There's no obvious way to easily fix this, and several attempts have
failed. Revert this to ensure correct DWARF is always emitted.
Three tests change in addition to the reversion, but they're all very
light alterations.
Differential Revision: https://reviews.llvm.org/D107076
Shuffles which are broken into separate halves reveal splats in which
a half is accessed via one index; such operations can be optimized to
use "vrgather.vi".
This optimization could be achieved by adding extra patterns to match
`vrgather_vv_vl` which uses a splat as an index operand, but this patch
instead identifies splat earlier. This way, future optimizations can
build on top of the data gathered here, e.g., to splat-gather dominant
indices and insert any leftovers.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D107449
And assign RegClass (i.e. operand class for all GPR) as the super class
of ARegClass and DRegClass. Note that this is an NFC change because
actually we already had XRDReg to model either address or data register
operands (as well as test coverage for it). The new super class syntax
added here just makes the relations between the three RegClass-es more
explicit.
The decoder function and table are the same as FPR128, use that instead.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D107644
MIPS .debug_* sections should have SHT_MIPS_DWARF section type to
distinguish among sections containing DWARF and ECOFF debug formats, but in
assembly files these sections have SHT_PROGBITS (@progbits) type. Now the
assembler shows a 'changed section type for ...' error when parsing the
`.section .debug_*,"",@progbits` directive for MIPS targets.
The same problem exists for the x86-64 target, and this patch extends the
workaround implemented in D76151. The patch adds one more case
in which the assembler ignores a section type mismatch after a `SwitchSection()`
call.
Differential Revision: https://reviews.llvm.org/D107707
Previously we converted ISD condition codes to integers and stored
them directly in our MIR instructions. The ISD enum kind of belongs
to SelectionDAG so that seems like incorrect layering.
This patch instead uses a CondCode node on RISCV::SELECT_CC until
isel and then converts it from ISD encoding to a RISCV specific value.
This value can be converted to/from the RISCV branch opcodes in the
RISCV namespace.
My larger motivation is to possibly support a microarchitectural
feature of some CPUs where a short forward branch over a single
instruction can be predicated internally. This will require a new
pseudo instruction for select that needs to carry a branch condition
and live probably until RISCVExpandPseudos. At that point it can be
expanded to control flow without other instructions ending up in the
predicated basic block. Using an ISD encoding in RISCVExpandPseudos
doesn't seem like correct layering.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D107400
This is the data to be stored so it should be an input.
To keep operand order similar between loads and stores, move the temp
register to the first dest operand of floating point loads. Rework
the assembler code accordingly.
This doesn't have any functional effect because this Pseudo is only
used by the assembler which doesn't use ins/outs.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D107309
Teach LV to use masked-store to support interleave-store-group with
gaps (instead of scatters/scalarization).
The symmetric case of using masked-load to support
interleaved-load-group with gaps was introduced a while ago, by
https://reviews.llvm.org/D53668; This patch completes the store-scenario
leftover from D53668, and solves PR50566.
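A minimal sketch of an interleaved store group with a gap (illustrative only, not taken from the tests): each iteration writes member 0 of a two-member group and leaves member 1 untouched, which previously required scatters or scalarization and can now use a masked store.
```
define void @store_with_gap(i32* %p, i32 %v, i64 %n) {
entry:
  br label %loop
loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  ; Stride-2 access writing only even elements; odd elements form the gap.
  %idx = shl i64 %i, 1
  %addr = getelementptr inbounds i32, i32* %p, i64 %idx
  store i32 %v, i32* %addr
  %i.next = add nuw i64 %i, 1
  %cond = icmp eq i64 %i.next, %n
  br i1 %cond, label %exit, label %loop
exit:
  ret void
}
```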
Reviewed by: Ayal Zaks
Differential Revision: https://reviews.llvm.org/D104750
Previously ADD & ADDA (as well as SUB & SUBA) instructions were mixed
together, which not only violated Motorola assembly's syntax but also
made asm parsing more difficult. This patch separates these two kinds of
instructions and migrates the rest of the tests from
test/CodeGen/M68k/Encoding/Arithmetic to test/MC/M68k/Arithmetic.
Note that we observed minor regressions on codegen quality: Sometimes
isel uses ADD instead of ADDA even when the latter can lead to a shorter
sequence of code. This issue implies that some isel patterns might need
to be updated.
The fcvt fp to integer instructions saturate if their input is
infinity or out of range, but the instructions produce a maximum
integer for nan instead of the 0 required by the ISD opcodes.
This means we can use the instructions to do the saturating
conversion, but we'll need to fix up the nan case at the end.
We can probably improve the i8 and i16 default codegen as well,
but I'll leave that for a follow up.
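For reference, a hedged sketch of the IR-level form this lowers: the saturating conversion intrinsics, which the instructions implement except for the NaN case that needs the extra fixup.
```
declare i32 @llvm.fptosi.sat.i32.f64(double)
define i32 @cvt(double %x) {
  ; Saturates for out-of-range and infinite inputs; a NaN input must yield 0,
  ; which is the part the fcvt result has to be fixed up for.
  %r = call i32 @llvm.fptosi.sat.i32.f64(double %x)
  ret i32 %r
}
```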
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D107230
The MemorySSA-based implementation has been enabled for a few months
(since D94376). This patch drops the old MDA-based implementation
entirely.
I've kept this to only the basic cleanup of dropping various
conditions -- the code could be further cleaned up now that there
is only one implementation.
Differential Revision: https://reviews.llvm.org/D102113
In this patch, the "nnan" requirement is removed for the canonicalization of select with fcmp to fabs.
(i) FSub logic: Remove check for nnan flag presence in fsub. Example: https://alive2.llvm.org/ce/z/751svg (fsub).
(ii) FNeg logic: Remove check for the presence of nnan and nsz flag in fneg. Example: https://alive2.llvm.org/ce/z/a_fsdp (fneg).
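An illustrative sketch of the fsub flavour, written here rather than copied from the Alive2 links (assuming nsz is still carried on the select for signed-zero correctness; the nnan requirement is what this patch drops):
```
define double @fabs_via_select(double %x) {
  %cmp = fcmp olt double %x, 0.0
  %neg = fsub double -0.0, %x
  ; Expected to canonicalize to: call double @llvm.fabs.f64(double %x)
  %res = select nsz i1 %cmp, double %neg, double %x
  ret double %res
}
```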
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D106872
The IR for pmuldq/pmuludq intrinsics uses a sext_inreg/zext_inreg
pattern on the inputs. Ideally we pattern match these away during
isel. It is possible for LICM or other middle end optimizations
to separate the extend from the mul. This prevents SelectionDAG
from removing it or depending on how the extend is lowered, we
may not be able to generate an AssertSExt/AssertZExt in the
mul basic block. This will prevent pmuldq/pmuludq from being
formed at all.
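As a hedged illustration (sext_inreg written as shl+ashr; not the exact IR from the intrinsic headers), the signed case roughly has this shape, and the pattern match breaks if LICM hoists the shifts out of the block containing the mul:
```
define <2 x i64> @pmuldq_shape(<2 x i64> %a, <2 x i64> %b) {
  ; Sign-extend the low 32 bits of each 64-bit lane in place.
  %a.lo = shl <2 x i64> %a, <i64 32, i64 32>
  %a.se = ashr <2 x i64> %a.lo, <i64 32, i64 32>
  %b.lo = shl <2 x i64> %b, <i64 32, i64 32>
  %b.se = ashr <2 x i64> %b.lo, <i64 32, i64 32>
  %mul = mul <2 x i64> %a.se, %b.se
  ret <2 x i64> %mul
}
```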
This patch teaches shouldSinkOperands to recognize this so
that CodeGenPrepare will clone the extend into the same basic
block as the mul.
Fixes PR51371.
Differential Revision: https://reviews.llvm.org/D107689
Both patterns are equivalent (https://alive2.llvm.org/ce/z/jfCViF),
so we should have a preference. It seems like mask+negation is better
than two shifts.
After refactoring the phi recipes, we can now iterate over all header
phis in a VPlan to detect reductions when it comes to fixing them up
when tail folding.
This reduces the coupling with the cost model & legal by using the
information directly available in VPlan. It also removes a call to
getOrAddVPValue, which references the original IR value which may
become outdated after VPlan transformations.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D100102
We should use MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval()
instead of eraseFromParent().
We should probably use that in other places too, but for now fix this issue,
which affects clang bootstrap builds.
This commit adds the isnan intrinsic and provides a default expansion
for it in the SDAG. However, it makes the assumption that types
it operates on are IEEE-compliant types. This is not always the case.
An example of that is PPC "double double" which has a representation
that
- Does not need to conform to IEEE requirements for isnan as it is
not an IEEE-compliant type
- Does not have a representation that allows for straightforward
reinterpreting as an integer and use of integer operations
The result was that this commit broke __builtin_isnan for ppc_fp128
making many valid numeric values report a NaN.
This patch simply changes the expansion to always expand to unordered
comparison (regardless of whether FP exceptions are tracked). This
is in line with previous semantics.
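Conceptually, the new expansion is just an unordered comparison at the IR level, e.g. (illustrative sketch):
```
define i1 @isnan_ppcf128(ppc_fp128 %x) {
  ; An unordered self-comparison is true iff %x is a NaN.
  %r = fcmp uno ppc_fp128 %x, %x
  ret i1 %r
}
```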
There may be some generalizations (see test comments) of these patterns,
but this should handle the cases motivated by:
https://llvm.org/PR51315 and https://llvm.org/PR51259
The backend may want to transform differently, but at least for
the x86 examples that I looked at, there does not appear to be
any significant perf diff either way.
Attempt to enable MemCpyOpt unconditionally in D104801 uncovered the fact that
there are users that do not expect LLVM to materialize `memset` intrinsic.
While other passes can do that, too, MemCpyOpt triggers it more frequently and
breaks sanitizers and some downstream users.
For now, introduce a flag to force-enable this behavior and opt in only CUDA
compilation with the NVPTX back-end.
Differential Revision: https://reviews.llvm.org/D106401
Fix an assertion due to mismatched types for Numerator and CacheLineSize in the loop cache analysis pass.
Reviewed By: bmahjour
Differential Revision: https://reviews.llvm.org/D107618
- Loads from constant memory (either an explicit constant load or the source
of a memory transfer intrinsic) won't alias any stores.
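A minimal illustration (names made up): the load from the constant global cannot alias the store, so it can be freely reordered or eliminated across it.
```
@g = constant i32 42
define i32 @no_alias(i32* %p) {
  store i32 1, i32* %p
  ; @g is constant memory, so this load cannot alias the store above.
  %v = load i32, i32* @g
  ret i32 %v
}
```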
Reviewed By: asbirlea, efriedma
Differential Revision: https://reviews.llvm.org/D107605
This isn't optimal, but prevents crashing when the libcall isn't
available. It just calculates the full product and makes sure the high bits
match the sign of the low half. Each of the pieces should go through their own
type legalization.
This can make D107420 unnecessary.
Needs tests, but I wanted to start discussion about D107420.
Reviewed By: FreddyYe
Differential Revision: https://reviews.llvm.org/D107581
Some of the Arm complex pattern functions call canExtractShiftFromMul,
which can modify the DAG in-place. For this to be valid and handled
successfully we need to define ComplexPatternFuncMutatesDAG.
Differential Revision: https://reviews.llvm.org/D107476
1) add some self-diagnosis (when asserts are enabled) to check that all
features have the same number of entries
2) avoid storing pointers to mutable fields because the proto API
contract doesn't actually guarantee those stay fixed even if no further
mutation of the object occurs.
Differential Revision: https://reviews.llvm.org/D107594
This patch introduces a new code object metadata field, ".kind"
which is used to add support for init and fini kernels.
HSAStreamer will use function attributes, "device-init" and
"device-fini" to distinguish between init and fini kernels from
the regular kernels and will emit metadata with ".kind" set to
"init" and "fini" respectively.
To reduce the number of init and fini kernels, the ctors and
dtors present in the llvm's global.ctors and global.dtors lists
are called from a single init and fini kernel respectively.
Reviewed by: yaxunl
Differential Revision: https://reviews.llvm.org/D105682
D107068 fixed the same problem on aarch64 but the arm variant wasn't exposed in existing test coverage.
I've copied the arm64-neon-copy tests (and stripped the intrinsic test from it) for testing on arm neon builds as well.
As reported on PR51281, an internal fuzz test encountered an issue when extracting constant bits from a SUBV_BROADCAST node from a constant pool source larger than the broadcasted subvector width.
The getTargetConstantBitsFromNode was assuming that the Constant would be the same size as the subvector, resulting in the incorrect packing of the per-element bits data.
This patch attempts to solve this by using the SUBV_BROADCAST node to determine the subvector width, and then ensuring we extract only the lowest bits from Constant of that subvector bitsize.
Differential Revision: https://reviews.llvm.org/D107158