llvm-project

Commit Graph

Author	SHA1	Message	Date
Jon Roelofs	b15f2bd3ad	[early-ifcvt] Add OptRemarks	2020-08-28 15:51:18 -06:00
Matt Arsenault	9145d75226	AMDGPU: Fix incorrectly deleting copies after spilling SGPR tuples The implicit def of the super register would appear to kill any live uses of components before the spill, and would be deleted by MachineCopyPropagation. We need to add implicit uses of the super register, similarly to what copyPhysReg does. VGPR tuples appear to be correctly handled already. I need to double check the SGPR->memory path.	2020-08-28 17:50:37 -04:00
Craig Topper	aab90384a3	[Attributes] Add a method to check if an Attribute has AttrKind None. Use instead of hasAttribute(Attribute::None) There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None. So if we just want to check for None its sufficient to just check that pImpl is null. Which can even be done inline. This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations. Differential Revision: https://reviews.llvm.org/D86744	2020-08-28 13:23:45 -07:00
Arthur Eubanks	cfde93e5d6	[ObjCARCOpt] Port objc-arc to NPM Since doInitialization() in the legacy pass modifies the module, the NPM pass is a Module pass. Reviewed By: ahatanak, ychen Differential Revision: https://reviews.llvm.org/D86178	2020-08-28 12:59:33 -07:00
Tyker	6d3657417e	[SROA] Improve handleling of assumes bundles by SROA This patch fixes this crash https://gcc.godbolt.org/z/Ps8d1e And gives SROA the ability to remove assumes if it allows promoting an alloca to register Without removing assumes when it can't promote to register. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86570	2020-08-28 21:55:45 +02:00
Nikita Popov	ffe05dd125	[InstCombine] usub.sat(a, b) + b => umax(a, b) (PR42178) Fixes https://bugs.llvm.org/show_bug.cgi?id=42178 by folding usub.sat(a, b) + b to umax(a, b). The backend will expand umax back to usubsat if that is profitable. We may also want to handle uadd.sat(a, b) - b in the future. Differential Revision: https://reviews.llvm.org/D63060	2020-08-28 21:52:29 +02:00
serge-sans-paille	2296182181	Skip analysis re-computation when no changes are reported This is a follow-up to https://reviews.llvm.org/D80707, generalized to CallGraphSCC, Loop and Region Differential Revision: https://reviews.llvm.org/D86442	2020-08-28 21:41:01 +02:00
Sjoerd Meijer	5f1cad4d29	[ARM] Skip combining base updates for vld1x NEON intrinsics Skip this for now, to avoid a backend crash in: UNREACHABLE executed at llvm/lib/Target/ARM/ARMISelLowering.cpp:13412 This should fix PR45824. Differential Revision: https://reviews.llvm.org/D86784	2020-08-28 20:29:15 +01:00
Benjamin Kramer	8782c72765	Strength-reduce SmallVectors to arrays. NFCI.	2020-08-28 21:14:20 +02:00
Benjamin Kramer	52cc97a0db	[CodeGenPrepare] Zap the argument of llvm.assume when deleting it We know that the argument is mostly likely dead, so we can purge it early. Otherwise it would make it to codegen, and can block further optimizations.	2020-08-28 20:52:22 +02:00
Snehasish Kumar	94faadaca4	[llvm][CodeGen] Machine Function Splitter We introduce a codegen optimization pass which splits functions into hot and cold parts. This pass leverages the basic block sections feature recently introduced in LLVM from the Propeller project. The pass targets functions with profile coverage, identifies cold blocks and moves them to a separate section. The linker groups all cold blocks across functions together, decreasing fragmentation and improving icache and itlb utilization. We evaluated the Machine Function Splitter pass on clang bootstrap and SPECInt 2017. For clang bootstrap we observe a mean 2.33% runtime improvement with a ~32% reduction in itlb and stlb misses. Additionally, L1 icache misses reduced by 9.5% while L2 instruction misses reduced by 20%. For SPECInt we report the change in IntRate the C/C++ benchmarks. All benchmarks apart from mcf and x264 improve, on average by 0.6% with the max for deepsjeng at 1.6%. Benchmark % Change 500.perlbench_r 0.78 502.gcc_r 0.82 505.mcf_r -0.30 520.omnetpp_r 0.18 523.xalancbmk_r 0.37 525.x264_r -0.46 531.deepsjeng_r 1.61 541.leela_r 0.83 557.xz_r 0.15 Differential Revision: https://reviews.llvm.org/D85368	2020-08-28 11:10:14 -07:00
Anna Welker	064981f0ce	[ARM][MVE] Enable MVE gathers and scatters by default Enable MVE gather/scatters by default, which requires some minor adaptations in some tests. Differential revision: https://reviews.llvm.org/D86776	2020-08-28 19:05:29 +01:00
David Green	4ca60915bc	[ARM] Correct predicate operand for offset gather/scatter These arm_mve_vldr_gather_offset_predicated and arm_mve_vstr_scatter_offset_predicated have some extra parameters meaning the predicate is at a later operand. If a loop contains _only_ those masked instructions, we would miss transforming the active lane mask. Differential Revision: https://reviews.llvm.org/D86791	2020-08-28 17:48:15 +01:00
Albion Fung	331dcc43ea	[PowerPC] Implemented Vector Load with Zero and Signed Extend Builtins This patch implements the builtins for Vector Load with Zero and Signed Extend Builtins (lxvr_x for b, h, w, d), and adds the appropriate test cases for these builtins. The builtins utilize the vector load instructions itnroduced with ISA 3.1. Differential Revision: https://reviews.llvm.org/D82502#inline-797941	2020-08-28 11:28:58 -05:00
Denis Antrushin	fabd4c1ae1	[Statepoint] Always spill base pointer. There is a subtle problem with new statepoint lowering scheme when base and pointers are the same (see PR46917 for more context): %1 = STATEPOINT ... %0, %0(tied-def 0)... if, for some reason, register allocator desides to put two instances of %0 into two different objects (registers or spill slots), we may end up with $reg3 = STATEPOINT ... $reg2, $reg1(tied-def 0)... and nothing will prevent later passes to sink uses of $reg2 below statepoint, which is incorrect. As a short term solution, always put base pointers on stack during lowering. A longer term solution may be to rework MIR statepoint format to avoid GC pointer duplication in statepoint argument list. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D86712	2020-08-28 23:22:07 +07:00
Yonghong Song	443d352a1c	[GlobalISel] fix a compilation error with gcc 6.3.0 With gcc 6.3.0, I hit the following compilation error: ../lib/CodeGen/GlobalISel/Combiner.cpp: In member function ‘bool llvm::Combiner::combineMachineInstrs(llvm::MachineFunction&, llvm::GISelCSEInfo*)’: ../lib/CodeGen/GlobalISel/Combiner.cpp:156:54: error: suggest parentheses around ‘&&’ within ‘\|\|’ [-Werror=parentheses] assert(!CSEInfo \|\| !errorToBool(CSEInfo->verify()) && ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~ "CSEInfo is not consistent. Likely missing calls to " ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ "observer on mutations"); Fix the code as suggested by the compiler.	2020-08-28 09:16:52 -07:00
QingShan Zhang	deb4b25807	[DAGCombine] Don't delete the node if it has uses immediately This is the follow up patch for https://reviews.llvm.org/D86183 as we miss to delete the node if NegX == NegY, which has use after we create the node. ``` if (NegX && (CostX <= CostY)) { Cost = std::min(CostX, CostZ); RemoveDeadNode(NegY); return DAG.getNode(Opcode, DL, VT, NegX, Y, NegZ, Flags); #<-- NegY is used here if NegY == NegX. } ``` Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86689	2020-08-28 16:13:43 +00:00
David Sherwood	f4257c5832	[SVE] Make ElementCount members private This patch changes ElementCount so that the Min and Scalable members are now private and can only be accessed via the get functions getKnownMinValue() and isScalable(). In addition I've added some other member functions for more commonly used operations. Hopefully this makes the class more useful and will reduce the need for calling getKnownMinValue(). Differential Revision: https://reviews.llvm.org/D86065	2020-08-28 14:43:53 +01:00
Xing GUO	f20e6c7253	[DWARFYAML] Abbrev codes in a new abbrev table should start from 1 (by default). The abbrev codes in a new abbrev table should start from 1 (by default), rather than inherit the value from the code in the previous table. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D86545	2020-08-28 21:18:11 +08:00
Denis Antrushin	248a67f144	[Statepoint] Turn assert into check in foldPatchpoint. Original D81646 had check for tied regs in foldPatchpoint(). Due to unfortunate miscommunication with review comments and adressing some comments post commit, it turned into assertion. We had an offline talk and agreed that with current implementation this path is possible, so I'm changing it back to check. Note that this is workaround until ussues described in PR46917 are resolved.	2020-08-28 20:00:23 +07:00
Sam Parker	b30adfb529	[ARM][LowOverheadLoops] Liveouts and reductions Remove the code that tried to look for reduction patterns, since the vectorizer and isel can now produce predicated arithmetic instructios within the loop body. This has required some reorganisation and fixes around live-out and predication checks, as well as looking for cases where an input/output is initialised to zero. Differential Revision: https://reviews.llvm.org/D86613	2020-08-28 13:56:16 +01:00
Benjamin Kramer	3524c23ff2	[SCCP] Use bulk-remove API to bulk-remove attributes. NFCI.	2020-08-28 14:44:14 +02:00
Benjamin Kramer	dce72dc870	[FunctionAttrs] Bulk remove attributes. NFC.	2020-08-28 12:56:19 +02:00
Ties Stuij	d678e14c55	[AArch64][CodeGen] Restrict bfloat vector operations to what's actually supported Previously in addTypeForNeon, we would set the operations for bfloat vectors like other generic types. But as bfloat is a storage-only type a number of operations shouldn't be set. This patch fixes that. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D85101	2020-08-28 11:44:37 +01:00
Florian Hahn	43aa7227df	[DSE,MemorySSA] Check if Current is valid for elimination first. This changes getDomMemoryDef to check if a Current is a valid candidate for elimination before checking for reads. Before the change, we were spending a lot of compile-time in checking for read accesses for Current that might not even be removable. This patch flips the logic, so we skip Current if they cannot be removed before checking all their uses. This is much more efficient in practice. It also adds a more aggressive limit for checking partially overlapping stores. The main problem with overlapping stores is that we do not know if they will lead to elimination until seeing all of them. This patch limits adds a new limit for overlapping store candidates, which keeps the number of modified overlapping stores roughly the same. This is another substantial compile-time improvement (while also increasing the number of stores eliminated). Geomean -O3 -0.67%, ReleaseThinLTO -0.97%. http://llvm-compile-time-tracker.com/compare.php?from=0a929b6978a068af8ddb02d0d4714a2843dd8ba9&to=2e630629b43f64b60b282e90f0d96082fde2dacc&stat=instructions Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D86487	2020-08-28 11:19:04 +01:00
Florian Hahn	fd6ebea50d	[MemLoc] Support memcmp in MemoryLocation::getForArgument. This patch adds support for memcmp in MemoryLocation::getForArgument. memcmp reads from the first 2 arguments up to the number of bytes of the third argument. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D86725	2020-08-28 10:19:54 +01:00
Florian Hahn	20e989e9de	[BuildLibCalls] Add argmemonly to more lib calls. strspn, strncmp, strcspn, strcasecmp, strncasecmp, memcmp, memchr, memrchr, memcpy, memmove, memcpy, mempcpy, strchr, strrchr, bcmp should all only access memory through their arguments. I broke out strcoll, strcasecmp, strncasecmp because the result depends on the locale, which might get accessed through memory. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86724	2020-08-28 09:50:38 +01:00
Martin Storsjö	db1ec04963	[ValueTracking] Remove a stray semicolon. NFC. This silences warnings when built with GCC at least.	2020-08-28 09:24:10 +03:00
Martin Storsjö	37ef743cbf	[MC] [Win64EH] Avoid producing malformed xdata records If there's no unwinding opcodes, omit writing the xdata/pdata records. Previously, this generated truncated xdata records, and llvm-readobj would error out when trying to print them. If writing of an xdata record is forced via the .seh_handlerdata directive, skip it if there's no info to make a sensible unwind info structure out of, and clearly error out if such info appeared later in the process. Differential Revision: https://reviews.llvm.org/D86527	2020-08-28 09:05:36 +03:00
serge-sans-paille	b1f4e5979b	(Expensive) Check for Loop, SCC and Region pass return status This generalizes the logic introduced in https://reviews.llvm.org/D80916 to other passes. It's needed by https://reviews.llvm.org/D86442 to assert passes correctly report their status. Differential Revision: https://reviews.llvm.org/D86589	2020-08-28 07:56:35 +02:00
Kai Luo	cbea17568f	[PowerPC] PPCBoolRetToInt: Don't translate Constant's operands When collecting `i1` values via `findAllDefs`, ignore Constant's operands, since Constant's operands might not be `i1`. Fixes https://bugs.llvm.org/show_bug.cgi?id=46923 which causes ICE ``` llvm-project/llvm/lib/IR/Constants.cpp:1924: static llvm::Constant llvm::ConstantExpr::getZExt(llvm::Constant , llvm::Type *, bool): Assertion `C->getType()->getScalarSizeInBits() < Ty->getScalarSizeInBits()&& "SrcTy must be smaller than DestTy for ZExt!"' failed. ``` Differential Revision: https://reviews.llvm.org/D85007	2020-08-28 01:56:12 +00:00
Alina Sbirlea	d370836c20	[MemorySSA] Assert defining access is not a MemoryUse.	2020-08-27 18:21:10 -07:00
Harmen Stoppels	cdcb9ab10e	Revert "Use find_library for ncurses" The introduction of find_library for ncurses caused more issues than it solved problems. The current open issue is it makes the static build of LLVM fail. It is better to revert for now, and get back to it later. Revert "[CMake] Fix an issue where get_system_libname creates an empty regex capture on windows" This reverts commit `1ed1e16ab8`. Revert "Fix msan build" This reverts commit `34fe9613dd`. Revert "[CMake] Always mark terminfo as unavailable on Windows" This reverts commit `76bf26236f`. Revert "[CMake] Fix OCaml build failure because of absolute path in system libs" This reverts commit `8e4acb82f7`. Revert "[CMake] Don't look for terminfo libs when LLVM_ENABLE_TERMINFO=OFF" This reverts commit `495f91fd33`. Revert "Use find_library for ncurses" This reverts commit `a52173a3e5`. Differential revision: https://reviews.llvm.org/D86521	2020-08-27 17:57:26 -07:00
Matt Arsenault	5feca7c9c3	GlobalISel: Implement computeNumSignBits for G_SEXT_INREG	2020-08-27 19:44:37 -04:00
Matt Arsenault	af1c1e20f4	AMDGPU/GlobalISel: Implement computeKnownBits for groupstaticsize	2020-08-27 19:39:44 -04:00
Matt Arsenault	9d3dc276a6	AMDGPU: Fix broken switch braces	2020-08-27 19:39:39 -04:00
Matt Arsenault	f08bbde83f	Correctly revert "GlobalISel: Use & operator on KnownBits" I mis-resolved the revert through moving the code to another function.	2020-08-27 19:08:31 -04:00
Matt Arsenault	6cf4f25670	Revert "GlobalISel: Use & operator on KnownBits" This reverts commit `e53b799779`. Confusingly, this does not simply and the two sets of known bits, but implements known bits for the and operator.	2020-08-27 18:52:34 -04:00
Vitaly Buka	23524fdece	[ValueTracking] Replace recursion with Worklist Now findAllocaForValue can handle nontrivial phi cycles.	2020-08-27 14:44:49 -07:00
Brad Smith	d870e36326	[SSP] Restore setting the visibility of __guard_local to hidden for better code generation. Patch by: Philip Guenther	2020-08-27 17:17:38 -04:00
Shinji Okumura	50ebd1afa9	[Attributor] Do not manifest noundef for dead positions Even if noundef is deduced for a position, we should not manifest it when the position is dead. This is because the associated values with dead positions are replaced with undef values by AAIsDead. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86565	2020-08-28 05:58:18 +09:00
Matt Arsenault	abc99ab572	GlobalISel: Implement known bits for min/max	2020-08-27 16:56:17 -04:00
Matt Arsenault	ee679638d7	MIR: Infer not-SSA for subregister defs It's possible to have a single virtual register def with a subreg index that would pass the previous check, but it's not possible to have a subregister def in SSA. This is in preparation for adding stricter checks for SSA MIR.	2020-08-27 16:56:16 -04:00
Vitaly Buka	a40660551e	[StackSafety] Ignore allocas with partial lifetime markers Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D86672	2020-08-27 13:54:41 -07:00
Vitaly Buka	a6927c8621	[NFC][ValueTracking] Add OffsetZero into findAllocaForValue For StackLifetime after finding alloca we need to check that values ponting to the begining of alloca. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D86692	2020-08-27 13:46:22 -07:00
Matt Arsenault	a1bc37c9e5	AMDGPU: Use caller subtarget, not intrinsic declaration Intrinsic declarations use the default subtarget, but this should be using the subtarget for the calling function. I haven't been able to come up with a case where it matters though.	2020-08-27 16:42:09 -04:00
Krzysztof Parzyszek	4ef9275b9b	[Hexagon] Emit better 32-bit multiplication sequence for HVXv62+	2020-08-27 15:24:32 -05:00
Eli Friedman	8d21985a75	[RegisterScavenging] Delete dead function unprocess().	2020-08-27 13:19:32 -07:00
Roman Lebedev	b85f91fdce	[InstSimplify] SimplifyPHINode(): check that instruction is in basic block first As pointed out in post-commit review, this can legally be called on instructions that are not inserted into basic blocks, so don't blindly assume that there is basic block.	2020-08-27 22:32:03 +03:00
Christopher Tetreault	035833ae42	[SVE] Remove bad call to VectorType::getNumElements() from HeapProfiler Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D86727	2020-08-27 12:16:00 -07:00
Shinji Okumura	c5e6872ec6	[Attributor] Guarantee getAAFor not to update AA in the manifestation stage If we query an AA with `Attributor::getAAFor` in `AbstractAttribute::manifest`, the AA may be updated. This patch makes use of the phase flag in Attributor, and handle `getAAFor` behavior according to the flag. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86635	2020-08-28 04:07:42 +09:00
Christopher Tetreault	5e63083435	[SVE] Remove calls to VectorType::getNumElements from Transforms/Vectorize Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D82056	2020-08-27 12:02:20 -07:00
Christopher Tetreault	5a55e2781c	[SVE] Remove calls to VectorType::getNumElements from IR Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D81500	2020-08-27 11:16:10 -07:00
Matt Arsenault	e53b799779	GlobalISel: Use & operator on KnownBits Avoid repeating for zero and one	2020-08-27 14:07:18 -04:00
Matt Arsenault	531f7063ba	GlobalISel: Implement known bits for G_MERGE_VALUES	2020-08-27 14:07:18 -04:00
Mikhail Maltsev	ae1396c7d4	[ARM][BFloat16] Change types of some Arm and AArch64 bf16 intrinsics This patch adjusts the following ARM/AArch64 LLVM IR intrinsics: - neon_bfmmla - neon_bfmlalb - neon_bfmlalt so that they take and return bf16 and float types. Previously these intrinsics used <8 x i8> and <4 x i8> vectors (a rudiment from implementation lacking bf16 IR type). The neon_vbfdot[q] intrinsics are adjusted similarly. This change required some additional selection patterns for vbfdot itself and also for vector shuffles (in a previous patch) because of SelectionDAG transformations kicking in and mangling the original code. This patch makes the generated IR cleaner (less useless bitcasts are produced), but it does not affect the final assembly. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D86146	2020-08-27 18:43:16 +01:00
Craig Topper	ba852e1e19	[X86] Don't call hasFnAttribute and getFnAttribute for 'prefer-vector-width' and 'min-legal-vector-width' in getSubtargetImpl We only need to call getFnAttribute and then check if the Attribute is None or not.	2020-08-27 10:40:20 -07:00
Owen Anderson	e9d9a61208	Reapply D70800: Fix AArch64 AAPCS frame record chain Original Commit Message: After the commit r368987 (rG643adb55769e) was landed, the frame record (FP and LR register) may be placed in the middle of a stack frame if a function has both callee-saved general-purpose registers and floating point registers. This will break the stack unwinders that simply walk through the frame records (based on the guarantee from AAPCS64 "The Frame Pointer" section). This commit fixes the problem by adding the frame record offset. Patch By: logan Differential Revision: D70800	2020-08-27 17:29:41 +00:00
Teresa Johnson	5b9d462b7d	[HeapProf] Fix bot failures from instrumentation pass Fix bot failure from 7ed8124d46f94601d5f1364becee9cee8538265e: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/8533 Since we are always using dynamic shadow, insertDynamicShadowAtFunctionEntry should always return true for modifying the function.	2020-08-27 10:21:19 -07:00
Aditya Nandakumar	db464a3dbf	[GISel] Add new GISel combiners for G_SELECT https://reviews.llvm.org/D83833 Patch adds two new GICombinerRules for G_SELECT. The rules include: combining selects with undef comparisons into their first selectee value, and to combine away selects with constant comparisons. Patch additionally adds a new combiner test for the AArch64 target to test these new G_SELECT combiner rules and the existing select_same_val combiner rule. Patch by mkitzan	2020-08-27 09:40:15 -07:00
Simon Moll	c48b06c44f	[sda][nfc] clang-formatting	2020-08-27 18:27:44 +02:00
Shinji Okumura	7a68f0f1e0	[Attributor] Add a phase flag to Attributor Add a new flag that indicates which stage in the process we are in. This flag is introduced for handling behavior of `getAAFor` according to the stage. (discussed in D86635) Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86678	2020-08-28 01:16:38 +09:00
Aditya Nandakumar	5c2db1655b	[GISel]: Fix one more CSE Non determinism https://reviews.llvm.org/D86676 Sometimes we can have the following code x:gpr(s32) = G_OP Say we build G_OP2 to the same x and then delete the previous instruction. Using something like Register X = ...; auto NewMIB = CSEBuilder.buildOp2(X, ... args); Currently there's a mismatch in how NewMIB is profiled and inserted into the CSEMap (ie it doesn't consider register bank/register class along with type).Unify the profiling by refactoring and calling the common method. This was found by turning on the CSEInfo::verify in at the end of each of our GISel passes which turns inconsistent state/non determinism in CSEing into crashes which likely usually indicates missing calls to Observer on mutations (the most common case). Here non determinism usually means not cseing sometimes, but almost never about producing incorrect code. Also this patch adds this verification at the end of the combiners as well.	2020-08-27 09:06:21 -07:00
Lucas Prates	3d943bcd22	[CodeGen] Properly propagating Calling Convention information when lowering vector arguments When joining the legal parts of vector arguments into its original value during the lower of Formal Arguments in SelectionDAGBuilder, the Calling Convention information was not being propagated for the handling of each individual parts. The same did not happen when lowering calls, causing a mismatch. This patch fixes the issue by properly propagating the Calling Convention details. This fixes Bugzilla #47001. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D86715	2020-08-27 17:01:10 +01:00
Teresa Johnson	7ed8124d46	[HeapProf] Clang and LLVM support for heap profiling instrumentation See RFC for background: http://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html Note that the runtime changes will be sent separately (hopefully this week, need to add some tests). This patch includes the LLVM pass to instrument memory accesses with either inline sequences to increment the access count in the shadow location, or alternatively to call into the runtime. It also changes calls to memset/memcpy/memmove to the equivalent runtime version. The pass is modeled on the address sanitizer pass. The clang changes add the driver option to invoke the new pass, and to link with the upcoming heap profiling runtime libraries. Currently there is no attempt to optimize the instrumentation, e.g. to aggregate updates to the same memory allocation. That will be implemented as follow on work. Differential Revision: https://reviews.llvm.org/D85948	2020-08-27 08:50:35 -07:00
Roman Lebedev	6102310d81	[InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic block Apparently, we don't do this, neither in EarlyCSE, nor in InstSimplify, nor in (old) GVN, but do in NewGVN and SimplifyCFG of all places.. While i could teach EarlyCSE how to hash PHI nodes, we can't really do much (anything?) even if we find two identical PHI nodes in different basic blocks, same-BB case is the interesting one, and if we teach InstSimplify about it (which is what i wanted originally, https://reviews.llvm.org/D86530), we get EarlyCSE support for free. So i would think this is pretty uncontroversial. On vanilla llvm test-suite + RawSpeed, this has the following effects: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| \\|%\\| \| \|----------------------------------------------------\|-----------\|-----------\|-------:\|---------:\|---------:\| \| instsimplify.NumPHICSE \| 0 \| 23779 \| 23779 \| 0.00% \| 0.00% \| \| asm-printer.EmittedInsts \| 7942328 \| 7942392 \| 64 \| 0.00% \| 0.00% \| \| assembler.ObjectBytes \| 273069192 \| 273084704 \| 15512 \| 0.01% \| 0.01% \| \| correlated-value-propagation.NumPhis \| 18412 \| 18539 \| 127 \| 0.69% \| 0.69% \| \| early-cse.NumCSE \| 2183283 \| 2183227 \| -56 \| 0.00% \| 0.00% \| \| early-cse.NumSimplify \| 550105 \| 542090 \| -8015 \| -1.46% \| 1.46% \| \| instcombine.NumAggregateReconstructionsSimplified \| 73 \| 4506 \| 4433 \| 6072.60% \| 6072.60% \| \| instcombine.NumCombined \| 3640264 \| 3664769 \| 24505 \| 0.67% \| 0.67% \| \| instcombine.NumDeadInst \| 1778193 \| 1783183 \| 4990 \| 0.28% \| 0.28% \| \| instcount.NumCallInst \| 1758401 \| 1758799 \| 398 \| 0.02% \| 0.02% \| \| instcount.NumInvokeInst \| 59478 \| 59502 \| 24 \| 0.04% \| 0.04% \| \| instcount.NumPHIInst \| 330557 \| 330533 \| -24 \| -0.01% \| 0.01% \| \| instcount.TotalInsts \| 8831952 \| 8832286 \| 334 \| 0.00% \| 0.00% \| \| simplifycfg.NumInvokes \| 4300 \| 4410 \| 110 \| 2.56% \| 2.56% \| \| simplifycfg.NumSimpl \| 1019808 \| 999607 \| -20201 \| -1.98% \| 1.98% \| ``` I.e. it fires ~24k times, causes +110 (+2.56%) more `invoke` -> `call` transforms, and counter-intuitively results in more instructions total. That being said, the PHI count doesn't decrease that much, and looking at some examples, it seems at least some of them were previously getting PHI CSE'd in SimplifyCFG of all places.. I'm adjusting `Instruction::isIdenticalToWhenDefined()` at the same time. As a comment in `InstCombinerImpl::visitPHINode()` already stated, there are no guarantees on the ordering of the operands of a PHI node, so if we just naively compare them, we may false-negatively say that the nodes are not equal when the only difference is operand order, which is especially important since the fold is in InstSimplify, so we can't rely on InstCombine sorting them beforehand. Fixing this for the general case is costly (geomean +0.02%), and does not appear to catch anything in test-suite, but for the same-BB case, it's trivial, so let's fix at least that. As per http://llvm-compile-time-tracker.com/compare.php?from=04879086b44348cad600a0a1ccbe1f7776cc3cf9&to=82bdedb888b945df1e9f130dd3ac4dd3c96e2925&stat=instructions this appears to cause geomean +0.03% compile time increase (regression), but geomean -0.01%..-0.04% code size decrease (improvement).	2020-08-27 18:47:04 +03:00
Alexandre Ganea	a6a37a2fcd	[Support] On Windows, add optional support for {rpmalloc\|snmalloc\|mimalloc} This patch optionally replaces the CRT allocator (i.e., malloc and free) with rpmalloc (mixed public domain licence/MIT licence) or snmalloc (MIT licence) or mimalloc (MIT licence). Please note that the source code for these allocators must be available outside of LLVM's tree. To enable, use `cmake ... -DLLVM_INTEGRATED_CRT_ALLOC=D:/git/rpmalloc -DLLVM_USE_CRT_RELEASE=MT` where `D:/git/rpmalloc` has already been git clone'd from `https://github.com/mjansson/rpmalloc`. The same applies to snmalloc and mimalloc. When enabled, the allocator will be embeded (statically linked) into the LLVM tools & libraries. This currently only works with the static CRT (/MT), although using the dynamic CRT (/MD) could potentially work as well in the future. When enabled, this changes the memory stack from: new/delete -> MS VC++ CRT malloc/free -> HeapAlloc -> VirtualAlloc to: new/delete -> {rpmalloc\|snmalloc\|mimalloc} -> VirtualAlloc The goal of this patch is to bypass the application's global heap - which is thread-safe thus inducing locking - and instead take advantage of a modern lock-free, thread cache, allocator. On a 6-core Xeon Skylake we observe a 2.5x decrease in execution time when linking a large scale application with LLD and ThinLTO (12 min 20 sec -> 5 min 34 sec), when all hardware threads are being used (using LLD's flag /opt:lldltojobs=all). On a dual 36-core Xeon Skylake with all hardware threads used, we observe a 24x decrease in execution time (1 h 2 min -> 2 min 38 sec) when linking a large application with LLD and ThinLTO. Clang build times also see a decrease in the range 5-10% depending on the configuration. Differential Revision: https://reviews.llvm.org/D71786	2020-08-27 11:09:46 -04:00
diggerlin	6923b0a76e	Revert "[AIX][XCOFF] emit symbol visibility for xcoff object file." This reverts commit `a081868921`. Based on the Hubert Tong'comment https://reviews.llvm.org/D84265#inline-799085	2020-08-27 11:07:58 -04:00
Benjamin Kramer	b5924a8e27	[Hexagon] Fold another layer of single-use variable into assert. NFCI.	2020-08-27 16:52:34 +02:00
Benjamin Kramer	2b7df2707f	[Hexagon] Fold single-use variable into assert. NFCI.	2020-08-27 16:44:22 +02:00
Matt Arsenault	6c770a09be	AMDGPU: Hoist subtarget lookup	2020-08-27 10:27:56 -04:00
Krzysztof Parzyszek	154daf1f94	[Hexagon] Widen short vector stores to HVX vectors using masked stores Also invent a flag -hexagon-hvx-widen=N to set the minimum threshold for widening short vectors to HVX vectors.	2020-08-27 09:25:08 -05:00
Florian Hahn	419c6948df	[SimplifyLibCalls] Remove over-eager early return in strlen optzns. Currently we bail out early for strlen calls with a GEP operand, if none of the GEP specific optimizations fire. But there could be later optimizations that still apply, which we currently miss out on. An example is that we do not apply the following optimization strlen(x) == 0 --> *x == 0 Unless I am missing something, there seems to be no reason for bailing out early there. Fixes PR47149. Reviewed By: lebedev.ri, xbolva00 Differential Revision: https://reviews.llvm.org/D85886	2020-08-27 15:19:45 +01:00
Pavel Labath	9cb222e749	[cmake] Make gtest include directories a part of the library interface This applies the same fix that D84748 did for macro definitions. Appropriate include path is now automatically set for all libraries which link against gtest targets, which avoids the need to set include_directories in various parts of the project. Differential Revision: https://reviews.llvm.org/D86616	2020-08-27 15:35:57 +02:00
serge-sans-paille	4e29d25669	Fix OpenMP deduplicateRuntimeCalls return status Differential Revision: https://reviews.llvm.org/D86705	2020-08-27 15:01:04 +02:00
serge-sans-paille	5621571fc7	Fix Attributor return status Differential Revision: https://reviews.llvm.org/D86703	2020-08-27 15:01:04 +02:00
Jay Foad	45eeb8c2a9	[AMDGPU] Remove unused variable introduced in r251860	2020-08-27 13:28:32 +01:00
Drew Wock	0ec098e22b	[FPEnv] Allow fneg + strict_fadd -> strict_fsub in DAGCombiner This is the first of a set of DAGCombiner changes enabling strictfp optimizations. I want to test to waters with this to make sure changes like these are acceptable for the strictfp case- this particular change should preserve exception ordering and result precision perfectly, and many other possible changes appear to be able to as well. Copied from regular fadd combines but modified to preserve ordering via the chain, this change allows strict_fadd x, (fneg y) to become struct_fsub x, y and strict_fadd (fneg x), y to become strict_fsub y, x. Differential Revision: https://reviews.llvm.org/D85548	2020-08-27 08:17:01 -04:00
Florian Hahn	bb024c3c4e	[DSE,MemorySSA] Remove short-cut to check if all paths are covered. The post-order number early continue does not work in some cases, e.g. if a path from EarlierAccess to an exit includes a node that dominates EarlierAccess in a cycle. The short-cut only has very minor impact on compile-time, so it seems straight-forward to remove it for now: http://llvm-compile-time-tracker.com/compare.php?from=062412e79fcfedf2cf004433e42036b0333e3f83&to=d7386016a77ce1387bdbbf360f1de157faea9d31&stat=instructions Fixes PR47285.	2020-08-27 12:42:40 +01:00
OCHyams	b6cca0ec05	Revert "[DWARF] Add cuttoff guarding quadratic validThroughout behaviour" This reverts commit `b9d977b0ca`. This cutoff is no longer required. The commit 34ffa7fc501 (D86153) introduces a performance improvement which was tested against the motivating case for this patch. Discussed in differential revision: https://reviews.llvm.org/D86153	2020-08-27 11:52:30 +01:00
OCHyams	57d8acac64	[DwarfDebug] Improve validThroughout performance (4/4) Almost NFC (see end). The backwards scan in validThroughout significantly contributed to compile time for a pathological case, causing the 'X86 Assembly Printer' pass to account for roughly 70% of the run time. This patch guards the loop against running unnecessarily, bringing the pass contribution down to 4%. Almost NFC: There is a hack in validThroughout which promotes single constant value DBG_VALUEs in the prologue to be live throughout the function. We're more likely to hit this code path with this patch applied. Similarly to the parent patches there is a small coverage change reported in the order of 10s of bytes. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D86153	2020-08-27 11:52:30 +01:00
OCHyams	3c491881d2	[DwarfDebug] Improve multi-BB single location detection in validThroughout (3/4) With the changes introduced in D86151 we can now check for single locations which span multiple blocks for inlined scopes and blocks. D86151 introduced the InstructionOrdering parameter, replacing a scan through MBB instructions. The functionality to compare instruction positions across blocks was add there, and this patch just removes the exit checks that were previously (but no longer) required. CTMark shows a geomean binary size reduction of 2.2% for RelWithDebInfo builds. llvm-locstats (using D85636) shows a very small variable location coverage change in 5 of 10 binaries, but just like in D86151 it is only in the order of 10s of bytes. Reviewed By: djtodoro Differential Revision: https://reviews.llvm.org/D86152	2020-08-27 11:52:29 +01:00
OCHyams	0b5a8050ea	[DwarfDebug] Improve single location detection in validThroughout (2/4) With this patch we're now accounting for two more cases which should be considered 'valid throughout': First, where RangeEnd is ScopeEnd. Second, where RangeEnd comes before ScopeEnd when including meta instructions, but are both preceded by the same non-meta instruction. CTMark shows a geomean binary size reduction of 1.5% for RelWithDebInfo builds. `llvm-locstats` (using D85636) shows a very small variable location coverage change in 2 of 10 binaries, but it is in the order of 10s of bytes which lines up with my expectations. I've added a test which checks both of these new cases. The first check in the test isn't strictly necessary for this patch. But I'm not sure that it is explicitly tested anywhere else, and is useful for the final patch in the series. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D86151	2020-08-27 11:52:29 +01:00
OCHyams	e048ea7b1a	[NFC][DebugInfo] Create InstructionOrdering helper class (1/4) Group the map and methods used to query instruction ordering for trimVarLocs (D82129) into a class. This will make it easier to reuse the functionality upcoming patches. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D86150	2020-08-27 11:52:29 +01:00
Mikhail Maltsev	23d5e93f34	[AArch64] Optimize instruction selection for certain vector shuffles This patch adds code to recognize vector shuffles which can be represented as VDUP (splat) of a vector lane with of a different (wider) type than the original vector lane type. For example: shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1> is essentially: shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 0, i32 0> Such patterns are generated by the SelectionDAG machinery in some cases (see DAGCombiner::visitBITCAST in DAGCombiner.cpp, the "Remove double bitcasts from shuffles" part). Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D86225	2020-08-27 11:06:49 +01:00
Paul Walker	81337c915f	[SVE] Fallback to default expansion when lowering SIGN_EXTEN_INREG from non-byte based source. Differential Revision: https://reviews.llvm.org/D86394	2020-08-27 10:57:37 +01:00
Sander de Smalen	4e9b66de3f	[AArch64][SVE] Add missing debug info for ACLE types. This patch adds type information for SVE ACLE vector types, by describing them as vectors, with a lower bound of 0, and an upper bound described by a DWARF expression using the AArch64 Vector Granule register (VG), which contains the runtime multiple of 64bit granules in an SVE vector. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D86101	2020-08-27 10:56:42 +01:00
Alex Richardson	5ba4d0365b	[RISC-V] fmv.s/fmv.d should be as cheap as a move Since the canonical floatig-point move is fsgnj rd, rs, rs, we should handle this case in RISCVInstrInfo::isAsCheapAsAMove(). Reviewed By: lenary Differential Revision: https://reviews.llvm.org/D86518	2020-08-27 10:32:23 +01:00
Alex Richardson	a11eeb4d4a	[RISC-V] Mark C_MV as a move instruction Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D86517	2020-08-27 10:32:23 +01:00
Alex Richardson	2259ce8c91	[RISC-V] ADDI/ORI/XORI x, 0 should be as cheap as a move The isTriviallyRematerializable hook is only called for instructions that are tagged as isAsCheapAsAMove. Since ADDI 0 is used for "mv" it should definitely be marked with "isAsCheapAsAMove". This change avoids one stack spill in most of the atomic-rmw.ll tests functions. It also avoids stack spills in two of our out-of-tree CHERI tests. ORI/XORI with zero may or may not be the same as a move micro-architecturally, but since we are already doing it for register == x0, we might as well do the same if the immediate is zero. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D86480	2020-08-27 10:32:22 +01:00
Vitaly Buka	469debe027	[ValueTracking] Support select in findAllocaForValue	2020-08-27 02:13:52 -07:00
Florian Hahn	e717fdb0f1	[DSE,MemorySSA] Traverse use-def chain without MemSSA Walker. For DSE with MemorySSA it is beneficial to manually traverse the defining access, instead of using a MemorySSA walker, so we can better control the number of steps together with other limits and also weed out invalid/unprofitable paths early on. This patch requires a follow-up patch to be most effective, which I will share soon after putting this patch up. This temporarily XFAIL's the limit tests, because we now explore more MemoryDefs that may not alias/clobber the killing def. This will be improved/fixed by the follow-up patch. This patch also renames some `Dom` variables to `Earlier`, because the dominance relation is not really used/important here and potentially confusing. This patch allows us to aggressively cut down compile time, geomean -O3 -0.64%, ReleaseThinLTO -1.65%, at the expense of fewer stores removed. Subsequent patches will increase the number of removed stores again, while keeping compile-time in check. http://llvm-compile-time-tracker.com/compare.php?from=d8e3294118a8c5f3f97688a704d5a05b67646012&to=0a929b6978a068af8ddb02d0d4714a2843dd8ba9&stat=instructions Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D86486	2020-08-27 10:02:02 +01:00
Sjoerd Meijer	1d8af682ef	Revert "[Verifier] Additional check for intrinsic get.active.lane.mask" This reverts commit `8d5f64c4ed`. Thanks to Eli Friedma for pointing out that this check is not appropiate here, this check will be moved to the Lint pass.	2020-08-27 09:27:05 +01:00
Piotr Sobczak	4e9d207117	[AMDGPU] Preserve vcc_lo when shrinking V_CNDMASK There is no justification for changing vcc_lo to vcc when shrinking V_CNDMASK, and such a change could later confuse live variable analysis. Make sure the original register is preserved. Differential Revision: https://reviews.llvm.org/D86541	2020-08-27 10:22:50 +02:00
Shinji Okumura	6c25eca614	[Attributor] Add flag for undef value to the state of AAPotentialValues Currently, an undef value is reduced to 0 when it is added to a set of potential values. This patch introduces a flag for under values. By this, for example, we can merge two states `{undef}`, `{1}` to `{1}` (because we can reduce the undef to 1). Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D85592	2020-08-27 16:30:29 +09:00
Sam Parker	03141aa04a	[ARM] Enable outliner at -Oz for M-class Enable default outlining when the function has the minsize attribute and we're targeting an m-class core. Differential Revision: https://reviews.llvm.org/D82951	2020-08-27 08:02:56 +01:00
Martin Storsjö	04879086b4	Revert "Reapply D70800: Fix AArch64 AAPCS frame record chain" This reverts commit `9936455204`. That commit caused failed assertions e.g. like this: $ cat alloca.c a; b() { float c; d(); a = __builtin_alloca(d); c = e(); f(a); return c; } $ clang -target aarch64-linux-gnu -c alloca.c -O2 clang: ../lib/Target/AArch64/AArch64InstrInfo.cpp:3446: void llvm::emitFrameOffset(llvm::MachineBasicBlock&, llvm::MachineBasicBlock::iterator, const llvm::DebugLoc&, unsigned int, unsigned int, llvm::StackOffset, const llvm::TargetInstrInfo, llvm::MachineInstr::MIFlag, bool, bool, bool): Assertion `(DestReg != AArch64::SP \|\| Bytes % 16 == 0) && "SP increment/decrement not 16-byte aligned"' failed.	2020-08-27 09:39:56 +03:00
luxufan	888c02deee	[RISCV] add the MC layer support of riscv vector Zvamo extension Implements the assemble and disassemble support of RISCV Vector extension zvamo instructions, base on the 0.9 spec version. Reviewed by HsiangKai Differential Revision: https://reviews.llvm.org/D85069	2020-08-27 14:11:38 +08:00
Sam Parker	a3e41d4581	[ARM] Make MachineVerifier more strict about terminators Fix the ARM backend's analyzeBranch so it doesn't ignore predicated return instructions, and make the MachineVerifier rule more strict. Differential Revision: https://reviews.llvm.org/D40061	2020-08-27 07:10:20 +01:00
Amy Kwan	76b0f99ea8	[PowerPC] Implement Vector Multiply High/Divide Extended Builtins in LLVM/Clang This patch implements the function prototypes vec_mulh and vec_dive in order to utilize the vector multiply high (vmulh[s\|u][w\|d]) and vector divide extended (vdive[s\|u][w\|d]) instructions introduced in Power10. Differential Revision: https://reviews.llvm.org/D82609	2020-08-26 23:14:34 -05:00
Matt Arsenault	5207545a86	GlobalISel: IRTranslate minimum of pointer sizes on memcpy I forgot to squash this with `0b7f6cc71a`	2020-08-26 20:10:00 -04:00
Matt Arsenault	0b7f6cc71a	GlobalISel: Add generic instructions for memory intrinsics AArch64, X86 and Mips currently directly consumes these and custom lowering to produce a libcall, but really these should follow the normal legalization process through the libcall/lower action.	2020-08-26 20:08:45 -04:00
Lang Hames	605df8112c	[ORC][JITLink] Switch to unique ownership for EHFrameRegistrars. This will make stateful registrars (e.g. a future TargetProcessControl based registrar) easier to deal with.	2020-08-26 16:59:45 -07:00
Arthur Eubanks	486ed88533	[ConstProp] Remove ConstantPropagation As discussed in http://lists.llvm.org/pipermail/llvm-dev/2020-July/143801.html. Currently no users outside of unit tests. Replace all instances in tests of -constprop with -instsimplify. Notable changes in tests: * vscale.ll - @llvm.sadd.sat.nxv16i8 is evaluated by instsimplify, use a fake intrinsic instead * InsertElement.ll - insertelement undef is removed by instsimplify in @insertelement_undef llvm/test/Transforms/ConstProp moved to llvm/test/Transforms/InstSimplify/ConstProp Reviewed By: lattner, nikic Differential Revision: https://reviews.llvm.org/D85159	2020-08-26 15:51:30 -07:00
Craig Topper	92d3e70df3	[X86] Change pentium4 tuning settings and scheduler model back to their values before D83913. Clang now defaults to -march=pentium4 -mtune=generic so we don't need modern tune settings on pentium4.	2020-08-26 15:38:12 -07:00
Alina Sbirlea	0b34226304	Use properlyDominates in RDFLiveness when sorting on dominance. Summary: When looking for all reaching definitions, we sort basic blocks on dominance. When sorting looking for properlyDominates() handles the case A == B. Authored by: pranavb Differential Revision: https://reviews.llvm.org/D86661	2020-08-26 15:16:40 -07:00
Ahmed Bougacha	383f7c8858	[AArch64] Use CCAssignFnForReturn helper in more spots. NFC. It was added for GISel, but SDAG could use it too!	2020-08-26 14:39:11 -07:00
Nikita Popov	d7c119d89c	[InstSimplify] Fold min/max intrinsic based on icmp of operands This is a reboot of D84655, now performing the inner icmp simplification query without undef folds. It should be possible to handle the current foldMinMaxSharedOp() fold based on this, by moving the logic into icmp of min/max instead, making it more general. We can't drop the folds for constant operands, because those also allow undef, which we exclude here. The tests use assumes for exhaustive coverage, and have a few more examples of misc folds we get based on icmp simplification. Differential Revision: https://reviews.llvm.org/D85929	2020-08-26 22:02:57 +02:00
Muhammad Asif Manzoor	fd536eeed9	[AArch64][SVE] Add lowering for llvm fceil Add the functionality to lower fceil for passthru variant Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D84548	2020-08-26 15:59:44 -04:00
Owen Anderson	9936455204	Reapply D70800: Fix AArch64 AAPCS frame record chain Original Commit Message: After the commit r368987 (rG643adb55769e) was landed, the frame record (FP and LR register) may be placed in the middle of a stack frame if a function has both callee-saved general-purpose registers and floating point registers. This will break the stack unwinders that simply walk through the frame records (based on the guarantee from AAPCS64 "The Frame Pointer" section). This commit fixes the problem by adding the frame record offset. Patch By: logan	2020-08-26 19:38:38 +00:00
Sanjay Patel	54a5dd485c	[DAGCombiner] allow store merging non-i8 truncated ops We have a gap in our store merging capabilities for shift+truncate patterns as discussed in: https://llvm.org/PR46662 I generalized the code/comments for this function in earlier commits, so we only need ease the type restriction and adjust the address/endian checking to make this work. AArch64 lets us switch endian to make sure that patterns are matched either way. Differential Revision: https://reviews.llvm.org/D86420	2020-08-26 15:23:08 -04:00
Aleksandr Platonov	ceffd6993c	[Support][Windows] Fix incorrect GetFinalPathNameByHandleW() return value check in realPathFromHandle() `GetFinalPathNameByHandleW(,,N,)` returns: - `< N` on success (this value does not include the size of the terminating null character) - `>= N` if buffer is too small (this value includes the size of the terminating null character) So, when `N == Buffer.capacity() - 1`, we need to resize buffer if return value is > `Buffer.capacity() - 2`. Also, we can set `N` to `Buffer.capacity()`. Thus, without this patch `realPathFromHandle()` returns unfilled buffer when length of the final path of the file is equal to `Buffer.capacity()` or `Buffer.capacity() - 1`. Reviewed By: andrewng, amccarth Differential Revision: https://reviews.llvm.org/D86564	2020-08-26 22:11:44 +03:00
Arthur Eubanks	098d3f9827	[InstSimplify] Simplify to vector constants when possible InstSimplify should do all transformations that ConstProp does, but one thing that ConstProp does that InstSimplify wouldn't is inline vector instructions that are constants, e.g. into a ret. Previously vector instructions wouldn't be inlined in InstSimplify because llvm::Simplify*Instruction() would return nullptr for specific instructions, such as vector instructions that were actually constants, if it couldn't simplify them. This changes SimplifyInsertElementInst, SimplifyExtractElementInst, and SimplifyShuffleVectorInst to return a vector constant when possible. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85946	2020-08-26 11:40:36 -07:00
Francesco Petrogalli	61dfa00957	[MC][SVE] Fix data operand for instruction alias of `st1d`. The version of `st1d` that operates with vector plus immediate addressing mode uses the alias `st1d { <Zn>.d }, <Pg>, [<Za>.d]` for rendering `st1d { <Zn>.d }, <Pg>, [<Za>.d, #0]`. The disassembler was generating `<Zn>.s` instead of `<Zn>.d>`. Differential Revision: https://reviews.llvm.org/D86633	2020-08-26 18:22:17 +00:00
Steven Wu	476ca33089	[LTO] Don't apply LTOPostLink module flag during writeMergedModule For `ld64` which uses legacy LTOCodeGenerator, it relies on writeMergedModule to perform `ld -r` (generates a linked object file). If all the inputs to `ld -r` is fullLTO bitcode, `ld64` will linked the bitcode module, internalize all the symbols and write out another fullLTO bitcode object file. This bitcode file doesn't have all the bitcode inputs and it should not have LTOPostLink module flag. It will also cause error when this bitcode object file is linked with other LTO object file. Fix the issue by not applying LTOPostLink flag during writeMergedModule function. The flag should only be added when all the bitcode are linked and ready to be optimized. rdar://problem/58462798 Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D84789	2020-08-26 11:17:45 -07:00
Krzysztof Parzyszek	e15143d31b	[Hexagon] Implement llvm.masked.load and llvm.masked.store for HVX	2020-08-26 13:10:22 -05:00
Matt Arsenault	f78687df9b	AMDGPU: Don't assert on misaligned DS read2/write2 offsets This would assert with unaligned DS access enabled. The offset may not be aligned. Theoretically the pattern predicate should check the memory alignment, although it is possible to have the memory be aligned but not the immediate offset. In this case I would expect it to use ds_{read\|write}_b64 with unaligned access, but am not clear if there's a reason it doesn't.	2020-08-26 14:08:05 -04:00
Wei Mi	c67ccf5faf	[SampleFDO] Enhance profile remapping support for searching inline instance and indirect call promotion candidate. Profile remapping is a feature to match a function in the module with its profile in sample profile if the function name and the name in profile look different but are equivalent using given remapping rules. This is a useful feature to keep the performance stable by specifying some remapping rules when sampleFDO targets are going through some large scale function signature change. However, currently profile remapping support is only valid for outline function profile in SampleFDO. It cannot match a callee with an inline instance profile if they have different but equivalent names. We found that without the support for inline instance profile, remapping is less effective for some large scale change. To add that support, before any remapping lookup happens, all the names in the profile will be inserted into remapper and the Key to the name mapping will be recorded in a map called NameMap in the remapper. During name lookup, a Key will be returned for the given name and it will be used to extract an equivalent name in the profile from NameMap. So with the help of the NameMap, we can translate any given name to an equivalent name in the profile if it exists. Whenever we try to match a name in the module to a name in the profile, we will try the match with the original name first, and if it doesn't match, we will use the equivalent name got from remapper to try the match for another time. In this way, the patch can enhance the profile remapping support for searching inline instance and searching indirect call promotion candidate. In a planned large scale change of int64 type (long long) to int64_t (long), we found the performance of a google internal benchmark degraded by 2% if nothing was done. If existing profile remapping was enabled, the performance degradation dropped to 1.2%. If the profile remapping with the current patch was enabled, the performance degradation further dropped to 0.14% (Note the experiment was done before searching indirect call promotion candidate was added. We hope with the remapping support of searching indirect call promotion candidate, the degradation can drop to 0% in the end. It will be evaluated post commit). Differential Revision: https://reviews.llvm.org/D86332	2020-08-26 11:07:35 -07:00
Juneyoung Lee	684b43c0cf	[IR] Add NoUndef attribute to Intrinsics.td This patch adds NoUndef to Intrinsics.td. The attribute is attached to llvm.assume's operand, because llvm.assume(undef) is UB. It is attached to pointer operands of several memory accessing intrinsics as well. This change makes ValueTracking::getGuaranteedNonPoisonOps' intrinsic check unnecessary, so it is removed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86576	2020-08-27 02:54:48 +09:00
Craig Topper	09288bcbf5	[X86] Add assembler support for .d32 and .d8 mnemonic suffixes to control displacement size. This is an older syntax than the {disp32} and {disp8} pseudo prefixes that were added a few weeks ago. We can reuse most of the support for that to support .d32 and .d8 as well.	2020-08-26 10:45:50 -07:00
Roman Lebedev	95848ea101	[Value][InstCombine] Fix one-use checks in PHI-of-op -> Op-of-PHI[s] transforms to be one-user checks As FIXME said, they really should be checking for a single user, not use, so let's do that. It is not that unusual to have the same value as incoming value in a PHI node, not unlike how a PHI may have the same incoming basic block more than once. There isn't a nice way to do that, Value::users() isn't uniqified, and Value only tracks it's uses, not Users, so the check is potentially costly since it does indeed potentially involes traversing the entire use list of a value.	2020-08-26 20:20:41 +03:00
Owen Anderson	9061eb8245	Revert "Fix frame pointer layout on AArch64 Linux." This broke stage2 of clang-cmake-aarch64-full. This reverts commit `a0aed80b22`.	2020-08-26 17:17:14 +00:00
aartbik	72305a08ff	[llvm] [DAG] Fix bug in llvm.get.active.lane.mask lowering This intrinsic only accepted proper machine vector lengths. Fixed by this change. With unit tests. https://bugs.llvm.org/show_bug.cgi?id=47299 Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D86585	2020-08-26 10:16:31 -07:00
Steven Wu	34b289b6db	[ThinLTO][Legacy] Compute PreservedGUID based on IRName in Symtab Instead of computing GUID based on some assumption about symbol mangling rule from IRName to symbol name, lookup the IRName from all the symtabs from all the input files to see if there are any matching symbols entry provides the IRName for GUID computation. rdar://65853754 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D84803	2020-08-26 10:15:00 -07:00
jasonliu	413054400d	[XCOFF][AIX] Support relocation generation for large code model Summary: Support TOCU and TOCL relocation type for object file generation. Reviewed by: DiggerLin Differential Revision: https://reviews.llvm.org/D84549	2020-08-26 17:12:28 +00:00
Craig Topper	28bd47fc47	[LegalizeTypes] Remove WidenVecRes_Shift and just use WidenVecRes_Binary This function seems to allow for the shift amount to have a different type than the result, but I don't think we do that anywhere else for vector shifts. We also don't have any support for legalizing the shift amount alone if the result is legal and the shift amount type isn't. The code coverage report here shows this code as uncovered http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp.html Differential Revision: https://reviews.llvm.org/D86475	2020-08-26 09:57:41 -07:00
Kai Nacke	ed07e1fe0f	[SystemZ/ZOS] Add header file to encapsulate use of <sysexits.h> The non-standard header file `<sysexits.h>` provides some return values. `EX_IOERR` is used to as a special value to signal a broken pipe to the clang driver. On z/OS Unix System Services, this header file does not exists. This patch - adds a check for `<sysexits.h>`, removing the dependency on `LLVM_ON_UNIX` - adds a new header file `llvm/Support/ExitCodes`, which either includes `<sysexits.h>` or defines `EX_IOERR` - updates the users of `EX_IOERR` to include the new header file Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D83472	2020-08-26 12:44:30 -04:00
Owen Anderson	a0aed80b22	Fix frame pointer layout on AArch64 Linux. When floating point callee-saved registers were used, the frame pointer would incorrectly point to the bottom of the CSR space (containing saved floating-point registers), rather than to the frame record. While all frame offsets were calculated consistently, resulting in working code, this prevented stack walkers from being about to traverse the frame list.	2020-08-26 16:09:49 +00:00
Sjoerd Meijer	bda8fbe2d2	[LV] Fallback strategies if tail-folding fails This implements 2 different vectorisation fallback strategies if tail-folding fails: 1) don't vectorise at all, or 2) vectorise using a scalar epilogue. This can be controlled with option -prefer-predicate-over-epilogue, that has been changed to take a numeric value corresponding to the tail-folding preference and preferred fallback. Patch by: Pierre van Houtryve, Sjoerd Meijer. Differential Revision: https://reviews.llvm.org/D79783	2020-08-26 16:55:25 +01:00
Jay Foad	a75e67b3b4	[AMDGPU] Make more use of Subtarget reference in SIInstrInfo	2020-08-26 15:04:00 +01:00
Jay Foad	75d159f924	[LegalizeTypes] Add ROTL/ROTR to ScalarizeVectorResult. We can scalarize these just like any other binary operation. Fixes https://bugs.llvm.org/show_bug.cgi?id=47303 caused by D77152. Differential Revision: https://reviews.llvm.org/D86601	2020-08-26 14:42:57 +01:00
Dibya Ranjan Mishra	a7da7e421c	[Support] Allow printing the stack trace only for a given depth Differential Revision: https://reviews.llvm.org/D85458	2020-08-26 09:27:42 -04:00
Matt Arsenault	ff34116cf0	AMDGPU: Use Subtarget reference in SIInstrInfo	2020-08-26 09:18:41 -04:00
Matt Arsenault	21ccedc24f	AMDGPU/GlobalISel: Tolerate negated control flow intrinsic outputs If the condition output is negated, swap the branch targets. This is similar to what SelectionDAG does for when SelectionDAGBuilder decides to invert the condition and swap the branches. This is leaving behind a dead constant def for some reason.	2020-08-26 08:58:54 -04:00
Matt Arsenault	eb074088c9	GlobalISel: Combine G_ADD of G_PTRTOINT to G_PTR_ADD This produces less work for addressing mode matching. I think this is safe since I don't think machine IR is supposed to give the same aliasing properties as getelementptr in the IR.	2020-08-26 08:57:15 -04:00
Jay Foad	831457c6d5	[AMDGPU][GlobalISel] Eliminate barrier if workgroup size is not greater than wavefront size If a workgroup size is known to be not greater than wavefront size the s_barrier instruction is not needed since all threads are guaranteed to come to the same point at the same time. This is the same optimization that was implemented for SelectionDAG in D31731. Differential Revision: https://reviews.llvm.org/D86609	2020-08-26 13:47:51 +01:00
Xing GUO	8daa3264a3	[DWARFYAML] Make the unit_length and header_length fields optional. This patch makes the unit_length and header_length fields of line tables optional. yaml2obj is able to infer them for us. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D86590	2020-08-26 20:35:10 +08:00
QingShan Zhang	ebf3b188c6	[Scheduling] Implement a new way to cluster loads/stores Before calling target hook to determine if two loads/stores are clusterable, we put them into different groups to avoid fake cluster due to dependency. For now, we are putting the loads/stores into the same group if they have the same predecessor. We assume that, if two loads/stores have the same predecessor, it is likely that, they didn't have dependency for each other. However, one SUnit might have several predecessors and for now, we just pick up the first predecessor that has non-data/non-artificial dependency, which is too arbitrary. And we are struggling to fix it. So, I am proposing some better implementation. 1. Collect all the loads/stores that has memory info first to reduce the complexity. 2. Sort these loads/stores so that we can stop the seeking as early as possible. 3. For each load/store, seeking for the first non-dependency instruction with the sorted order, and check if they can cluster or not. Reviewed By: Jay Foad Differential Revision: https://reviews.llvm.org/D85517	2020-08-26 12:33:59 +00:00
David Green	677c1590c0	[ARM] Increase MVE gather/scatter cost by MVECostFactor. MVE Gather scatter codegeneration is looking a lot better than it used to, but still has some issues. The instructions we currently model as 1 cycle per element, which is a bit low for some cases. Increasing the cost by the MVECostFactor brings them in-line with our other instruction costs. This will have the effect of only generating then when the extra benefit is more likely to overcome some of the issues. Notably in running out of registers and vectorizing loops that could otherwise be SLP vectorized. In the short-term whilst we look at other ways of dealing with those more directly, we can increase the costs of gathers to make them more likely to be beneficial when created. Differential Revision: https://reviews.llvm.org/D86444	2020-08-26 13:03:46 +01:00
Sam Tebbs	85dd852a0d	[RDA] Don't visit the BB of the instruction in getReachingUniqueMIDef If the basic block of the instruction passed to getUniqueReachingMIDef is a transitive predecessor of itself and has a definition of the register, the function will return that definition even if it is after the instruction given to the function. This patch stops the function from scanning the instruction's basic block to prevent this. Differential Revision: https://reviews.llvm.org/D86607	2020-08-26 12:40:39 +01:00
Pierre Gousseau	cda6b09242	[X86] Make sure we do not clobber RBX with mwaitx when used as a base pointer. mwaitx uses EBX as one of its argument. Using this instruction clobbers RBX as it is defined to hold one of the input. When the backend uses dynamically allocated stack, RBX is used as a reserved register for the base pointer. This patch is adapted from @qcolombet patch for cmpxchg at r263325. This fixes PR43528. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D73475	2020-08-26 11:20:31 +01:00
Cullen Rhodes	1f44dfb640	[AArch64][AsmParser] Fix bug in operand printer The switch in AArch64Operand::print was changed in D45688 so the shift can be printed after printing the register. This is implemented with LLVM_FALLTHROUGH and was broken in D52485 when BTIHint was put between the register and shift operands. Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D86535	2020-08-26 09:31:36 +00:00
Sander de Smalen	5f47d4456d	[AArch64][SVE] Fix calculation restore point for SVE callee saves. This fixes an issue where the restore point of callee-saves in the function epilogues was incorrectly calculated when the basic block consisted of only a RET instruction. This caused dealloc instructions to be inserted in between the block of callee-save restore instructions, rather than before it. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D86099	2020-08-26 10:02:31 +01:00
Jan Kratochvil	b20a4e293c	[Support] Speedup llvm-dwarfdump 3.9x Currently `strace llvm-dwarfdump x.debug >/tmp/file`: ioctl(1, TCGETS, 0x7ffd64d7f340) = -1 ENOTTY (Inappropriate ioctl for device) write(1, " DW_AT_decl_line\t(89)\n"..., 4096) = 4096 ioctl(1, TCGETS, 0x7ffd64d7f400) = -1 ENOTTY (Inappropriate ioctl for device) ioctl(1, TCGETS, 0x7ffd64d7f410) = -1 ENOTTY (Inappropriate ioctl for device) ioctl(1, TCGETS, 0x7ffd64d7f400) = -1 ENOTTY (Inappropriate ioctl for device) After this patch: write(1, "0000000000001102 \"strlen\")\n "..., 4096) = 4096 write(1, "site\n DW_AT_low"..., 4096) = 4096 write(1, "d53)\n\n0x000e4d4d: DW_TAG_G"..., 4096) = 4096 The same speedup can be achieved by `--color=0` but that is not much convenient. This implementation has been suggested by Joerg Sonnenberger. Differential Revision: https://reviews.llvm.org/D86406	2020-08-26 10:29:46 +02:00
Jay Foad	b7e3599a22	[SelectionDAG] Handle non-power-of-2 bitwidths in expandROT Differential Revision: https://reviews.llvm.org/D86449	2020-08-26 09:20:46 +01:00
Shinji Okumura	3050713798	[Attributor] Provide an edge-based interface in AAIsDead This patch produces an edge-based interface in AAIsDead. By this, we can query a set of basic blocks that are directly reachable from a given basic block. This is specifically useful for implementation of AAReachability. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D85547	2020-08-26 16:57:52 +09:00
Roman Lebedev	1f90d45b9e	[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad While since D86306 we do it's sibling fold for `insertvalue`, we should also do this for `extractvalue`'s. And unlike that one, the results here are, quite honestly, shocking, as it can be observed here on vanilla llvm test-suite + RawSpeed results: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| \|%\| \| \|----------------------------------------------------\|-----------\|-----------\|--------:\|--------:\|-------:\| \| asm-printer.EmittedInsts \| 7945095 \| 7942507 \| -2588 \| -0.03% \| 0.03% \| \| assembler.ObjectBytes \| 273209920 \| 273069800 \| -140120 \| -0.05% \| 0.05% \| \| early-cse.NumCSE \| 2183363 \| 2183398 \| 35 \| 0.00% \| 0.00% \| \| early-cse.NumSimplify \| 541847 \| 550017 \| 8170 \| 1.51% \| 1.51% \| \| instcombine.NumAggregateReconstructionsSimplified \| 2139 \| 108 \| -2031 \| -94.95% \| 94.95% \| \| instcombine.NumCombined \| 3601364 \| 3635448 \| 34084 \| 0.95% \| 0.95% \| \| instcombine.NumConstProp \| 27153 \| 27157 \| 4 \| 0.01% \| 0.01% \| \| instcombine.NumDeadInst \| 1694521 \| 1765022 \| 70501 \| 4.16% \| 4.16% \| \| instcombine.NumPHIsOfExtractValues \| 0 \| 37546 \| 37546 \| 0.00% \| 0.00% \| \| instcombine.NumSunkInst \| 63158 \| 63686 \| 528 \| 0.84% \| 0.84% \| \| instcount.NumBrInst \| 874304 \| 871857 \| -2447 \| -0.28% \| 0.28% \| \| instcount.NumCallInst \| 1757657 \| 1758402 \| 745 \| 0.04% \| 0.04% \| \| instcount.NumExtractValueInst \| 45623 \| 11483 \| -34140 \| -74.83% \| 74.83% \| \| instcount.NumInsertValueInst \| 4983 \| 580 \| -4403 \| -88.36% \| 88.36% \| \| instcount.NumInvokeInst \| 61018 \| 59478 \| -1540 \| -2.52% \| 2.52% \| \| instcount.NumLandingPadInst \| 35334 \| 34215 \| -1119 \| -3.17% \| 3.17% \| \| instcount.NumPHIInst \| 344428 \| 331116 \| -13312 \| -3.86% \| 3.86% \| \| instcount.NumRetInst \| 100773 \| 100772 \| -1 \| 0.00% \| 0.00% \| \| instcount.TotalBlocks \| 1081154 \| 1077166 \| -3988 \| -0.37% \| 0.37% \| \| instcount.TotalFuncs \| 101443 \| 101442 \| -1 \| 0.00% \| 0.00% \| \| instcount.TotalInsts \| 8890201 \| 8833747 \| -56454 \| -0.64% \| 0.64% \| \| instsimplify.NumSimplified \| 75822 \| 75707 \| -115 \| -0.15% \| 0.15% \| \| simplifycfg.NumHoistCommonCode \| 24203 \| 24197 \| -6 \| -0.02% \| 0.02% \| \| simplifycfg.NumHoistCommonInstrs \| 48201 \| 48195 \| -6 \| -0.01% \| 0.01% \| \| simplifycfg.NumInvokes \| 2785 \| 4298 \| 1513 \| 54.33% \| 54.33% \| \| simplifycfg.NumSimpl \| 997332 \| 1018189 \| 20857 \| 2.09% \| 2.09% \| \| simplifycfg.NumSinkCommonCode \| 7088 \| 6464 \| -624 \| -8.80% \| 8.80% \| \| simplifycfg.NumSinkCommonInstrs \| 15117 \| 14021 \| -1096 \| -7.25% \| 7.25% \| ``` ... which tells us that this new fold fires whopping 38k times, increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time), decreasing total instruction count by -0.64% (-56454), and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less) and `extractvalue`'s (-74.83%, i.e. four times less). This causes geomean -0.01% binary size decrease http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions The other thing that tells is, is that while this is a massive win for `invoke`->`call` transform `InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold, which is supposed to be dealing with such aggregate reconstructions, fires a lot less now. There are two reasons why: 1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes. We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun. 2. But then, EarlyCSE not only manages to fold such redundant PHI's, it also sees that the extract-insert chain recreates the original aggregate, and replaces it with the original aggregate. The take-aways are 1. We maybe should do most trivial, same-BB PHI CSE in InstCombine 2. I need to check if what other patterns remain, and how they can be resolved. (i.e. i wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away) This is a reland of the original commit `fcb51d8c24`, because originally i forgot to ensure that the base aggregate types match. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86530	2020-08-26 09:57:50 +03:00
Roman Lebedev	c295c6f2c0	Revert "[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad" This reverts commit `fcb51d8c24`. As buildbots report, there's apparently some missing check to ensure that the types of incoming values match the type of PHI. Let's revert for a moment.	2020-08-26 09:23:22 +03:00
Roman Lebedev	fcb51d8c24	[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad While since D86306 we do it's sibling fold for `insertvalue`, we should also do this for `extractvalue`'s. And unlike that one, the results here are, quite honestly, shocking, as it can be observed here on vanilla llvm test-suite + RawSpeed results: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| \|%\| \| \|----------------------------------------------------\|-----------\|-----------\|--------:\|--------:\|-------:\| \| asm-printer.EmittedInsts \| 7945095 \| 7942507 \| -2588 \| -0.03% \| 0.03% \| \| assembler.ObjectBytes \| 273209920 \| 273069800 \| -140120 \| -0.05% \| 0.05% \| \| early-cse.NumCSE \| 2183363 \| 2183398 \| 35 \| 0.00% \| 0.00% \| \| early-cse.NumSimplify \| 541847 \| 550017 \| 8170 \| 1.51% \| 1.51% \| \| instcombine.NumAggregateReconstructionsSimplified \| 2139 \| 108 \| -2031 \| -94.95% \| 94.95% \| \| instcombine.NumCombined \| 3601364 \| 3635448 \| 34084 \| 0.95% \| 0.95% \| \| instcombine.NumConstProp \| 27153 \| 27157 \| 4 \| 0.01% \| 0.01% \| \| instcombine.NumDeadInst \| 1694521 \| 1765022 \| 70501 \| 4.16% \| 4.16% \| \| instcombine.NumPHIsOfExtractValues \| 0 \| 37546 \| 37546 \| 0.00% \| 0.00% \| \| instcombine.NumSunkInst \| 63158 \| 63686 \| 528 \| 0.84% \| 0.84% \| \| instcount.NumBrInst \| 874304 \| 871857 \| -2447 \| -0.28% \| 0.28% \| \| instcount.NumCallInst \| 1757657 \| 1758402 \| 745 \| 0.04% \| 0.04% \| \| instcount.NumExtractValueInst \| 45623 \| 11483 \| -34140 \| -74.83% \| 74.83% \| \| instcount.NumInsertValueInst \| 4983 \| 580 \| -4403 \| -88.36% \| 88.36% \| \| instcount.NumInvokeInst \| 61018 \| 59478 \| -1540 \| -2.52% \| 2.52% \| \| instcount.NumLandingPadInst \| 35334 \| 34215 \| -1119 \| -3.17% \| 3.17% \| \| instcount.NumPHIInst \| 344428 \| 331116 \| -13312 \| -3.86% \| 3.86% \| \| instcount.NumRetInst \| 100773 \| 100772 \| -1 \| 0.00% \| 0.00% \| \| instcount.TotalBlocks \| 1081154 \| 1077166 \| -3988 \| -0.37% \| 0.37% \| \| instcount.TotalFuncs \| 101443 \| 101442 \| -1 \| 0.00% \| 0.00% \| \| instcount.TotalInsts \| 8890201 \| 8833747 \| -56454 \| -0.64% \| 0.64% \| \| instsimplify.NumSimplified \| 75822 \| 75707 \| -115 \| -0.15% \| 0.15% \| \| simplifycfg.NumHoistCommonCode \| 24203 \| 24197 \| -6 \| -0.02% \| 0.02% \| \| simplifycfg.NumHoistCommonInstrs \| 48201 \| 48195 \| -6 \| -0.01% \| 0.01% \| \| simplifycfg.NumInvokes \| 2785 \| 4298 \| 1513 \| 54.33% \| 54.33% \| \| simplifycfg.NumSimpl \| 997332 \| 1018189 \| 20857 \| 2.09% \| 2.09% \| \| simplifycfg.NumSinkCommonCode \| 7088 \| 6464 \| -624 \| -8.80% \| 8.80% \| \| simplifycfg.NumSinkCommonInstrs \| 15117 \| 14021 \| -1096 \| -7.25% \| 7.25% \| ``` ... which tells us that this new fold fires whopping 38k times, increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time), decreasing total instruction count by -0.64% (-56454), and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less) and `extractvalue`'s (-74.83%, i.e. four times less). This causes geomean -0.01% binary size decrease http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions The other thing that tells is, is that while this is a massive win for `invoke`->`call` transform `InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold, which is supposed to be dealing with such aggregate reconstructions, fires a lot less now. There are two reasons why: 1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes. We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun. 2. But then, EarlyCSE not only manages to fold such redundant PHI's, it also sees that the extract-insert chain recreates the original aggregate, and replaces it with the original aggregate. The take-aways are 1. We maybe should do most trivial, same-BB PHI CSE in InstCombine 2. I need to check if what other patterns remain, and how they can be resolved. (i.e. i wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away) Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86530	2020-08-26 09:08:24 +03:00
Jianzhou Zhao	4784987027	Fix a 32-bit overflow issue when reading LTO-generated bitcode files whose strtab are of size > 2^29 This happens when using -flto and -Wl,--plugin-opt=emit-llvm to create a linked LTO bitcode file, and the bitcode file has a strtab with size > 2^29. All the issues relate to a pattern like this size_t x64 = y64 + z32 * C When z32 is >= (2^32)/C, z32 * C overflows. Reviewed-by: MaskRay Differential Revision: https://reviews.llvm.org/D86500	2020-08-26 05:47:22 +00:00
Xing GUO	75e0b58668	[DWARFYAML] Use writeDWARFOffset() to write the prologue_length field. NFC. Use writeDWARFOffset() to simplify the logic. NFC.	2020-08-26 12:34:02 +08:00
Adrien Guinet	c6f7ac0071	[llvm-lipo] Add support for bitcode files A Mach-O universal binary may contain bitcode as a slice. This diff adds proper handling of such binaries to llvm-lipo. Test plan: make check-all Differential revision: https://reviews.llvm.org/D85740	2020-08-25 21:11:18 -07:00
Mikhail R. Gadelha	30967e51da	Add Z3 to system libraries list if enabled Without this trying to link static LLVM libraries (built with Z3 enabled) fails because `llvm-config` doesn't print `-lz3`. We are already using this patch at MSYS2: https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-clang/0013-Add-Z3-to-system-libraries-list-if-enabled.patch Reviewed By: mikhail.ramalho Differential Revision: https://reviews.llvm.org/D85195	2020-08-25 22:32:36 -04:00
Craig Topper	1d1515a9e2	[X86] Add an isel pattern for (i8 (trunc (i16 (bitconvert (v16i1 X))))) to avoid an extra EXTRACT_SUBREG Since we can only copy to GR32 we had to EXTRACT from GR32, but we would first go to GR16 and then the truncate would extra again to GR8. This adds a special case to go directly from GR32 to GR8. This would eventually get cleaned up, but though maybe we should avoid doing it in the first place. Our k-register handling is weird and we could probably stand to have some more special ISD nodes for the conversions so the i32 type would be explicit.	2020-08-25 18:20:43 -07:00
Craig Topper	b8ec8f5776	[X86] Remove extra getOperand(0) call from recently introduced store(extract_element(vtrunc)) to truncated store combine. The IsExtractedElement already called getOperand(0) so Extract here is the source vector. We shouldn't call getOperand(0). This worked for the original test cases because the result was a bitcast so the getOperand(0) accidently peeked through the bitcast which is what we wanted. In the failing case here, the operand turns out to be undef so the getOperand(0) asserts because undef has no operands. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=25184 Differential Revision: https://reviews.llvm.org/D86428	2020-08-25 16:16:54 -07:00
Craig Topper	ba319ac47e	[X86] Remove a redundant COPY_TO_REGCLASS for VK16 after a KMOVWkr in an isel output pattern. KMOVWkr produces VK16, there's no reason to copy it to VK16 again. Test changes are presumably because we were scheduling based on the COPY that is no longer there.	2020-08-25 15:19:27 -07:00
Mircea Trofin	7cfcecece0	[MLInliner] Simplify TFUTILS_SUPPORTED_TYPES We only need the C++ type and the corresponding TF Enum. The other parameter was used for the output spec json file, but we can just standardize on the C++ type name there. Differential Revision: https://reviews.llvm.org/D86549	2020-08-25 14:19:39 -07:00
Stanislav Mekhanoshin	b7760c3e5d	[AMDGPU] Remove unsound dependency on ISA version in waitcnt Differential Revision: https://reviews.llvm.org/D86566	2020-08-25 14:01:42 -07:00
Fangrui Song	82d0749749	[TargetLoweringObjectFileImpl] Make .llvmbc and .llvmcmd non-SHF_ALLOC There are two ways .llvmbc can be produced: * clang -c -fembed-bitcode=all (which also produces .llvmcmd) * LTO backend: ld.lld -mllvm -lto-embed-bitcode or -plugin-opt=-lto-embed-bitcode .llvmbc and .llvmcmd have the SHF_ALLOC flag, so they can be dropped by --gc-sections. This patch sets SectionKind::Metadata to drop the SHF_ALLOC flag. This is conceptually correct: the two sections are not part of the process image, so SHF_ALLOC is not appropriate. `test/LTO/X86/embed-bitcode.ll`: changed `llvm-objcopy -O binary --only-section` to `llvm-objcopy --dump-section`. `-O binary` does not dump non-SHF_ALLOC sections. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D86374	2020-08-25 13:37:29 -07:00
Stanislav Mekhanoshin	817c831f02	[AMDGPU] Switch to named simm16 in vscnt insertion Differential Revision: https://reviews.llvm.org/D86568	2020-08-25 13:05:27 -07:00
Ankit Aggarwal	2da1eefb58	[Hexagon] Check if EVT is simple type in HVX lowering	2020-08-25 15:02:44 -05:00
Juneyoung Lee	f753f5b050	[ValueTracking] Let getGuaranteedNonPoisonOp find multiple non-poison operands This patch helps getGuaranteedNonPoisonOp find multiple non-poison operands. Instead of special-casing llvm.assume, I think it is also a viable option to add noundef to Intrinsics.td. If it makes sense, I'll make a patch for that. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86477	2020-08-26 04:40:21 +09:00
Nikita Popov	3a54b6a4b7	[MemDep] Use BatchAA when computing pointer dependencies We're not changing IR while running a single MemDep query, so it's safe to cache alias analysis results using BatchAA. This adds BatchAA usage to getSimplePointerDependencyFrom(), which is non-intrusive -- covering larger parts (like a whole processNonLocalLoad query) is also possible, but requires threading BatchAA through a bunch of APIs. For the ThinLTO configuration, this is a 1% geomean improvement on CTMark. Differential Revision: https://reviews.llvm.org/D85583	2020-08-25 21:34:34 +02:00
Wei Wang	ae90df8e5a	[FIX] Avoid creating BFI when emitting remarks for dead functions Dead function has its body stripped away, and can cause various analyses to panic. Also it does not make sense to apply analyses on such function. Reviewed By: xazax.hun, MaskRay, wenlei, hoy Differential Revision: https://reviews.llvm.org/D84715	2020-08-25 11:12:38 -07:00
Krzysztof Parzyszek	dcef5e0c37	[Hexagon] Remove (redundant) HexagonISelLowering::isHvxOperation(SDValue) Use isHvxOperation(SDNode*) instead.	2020-08-25 11:45:08 -05:00
Ta-Wei Tu	abbd652dd6	[LoopNest] False negative of `arePerfectlyNested` with LCSSA loops Summary: The LCSSA pass (required for all loop passes) sometimes adds additional blocks containing LCSSA variables, and checkLoopsStructure may return false even when the loops are perfectly nested in this case. This is because the successor of the exit block of the inner loop now points to the LCSSA block instead of the latch block of the outer loop. Examples are shown in the test nests-with-lcssa.ll. To fix the issue, the successor of the exit block of the inner loop can now point to a block in which all instructions are LCSSA phi node (except the terminator), and the sole successor of that block should point to the latch block of the outer loop. Reviewed By: Whitney, etiotto Differential Revision: https://reviews.llvm.org/D86133	2020-08-25 16:20:52 +00:00
Sanjay Patel	c4f0a0896f	[InstCombine] improve demanded element analysis for vector insert-of-extract (2nd try) The 1st attempt (rG557b890) was reverted because it caused miscompiles. That bug is avoided here by changing the order of folds and as verified in the new tests. Original commit message: InstCombine currently has odd rules for folding insert-extract chains to shuffles, so we miss collapsing seemingly simple cases as shown in the tests here. But poison makes this not quite as easy as we might have guessed. Alive2 tests to show the subtle difference (similar to the regression tests): https://alive2.llvm.org/ce/z/hp4hv3 (this is ok) https://alive2.llvm.org/ce/z/ehEWaN (poison leakage) SLP tends to create these patterns (as shown in the SLP tests), and this could help with solving PR16739. Differential Revision: https://reviews.llvm.org/D86460	2020-08-25 11:19:36 -04:00
Sjoerd Meijer	8d5f64c4ed	[Verifier] Additional check for intrinsic get.active.lane.mask This adapts the verifier checks for intrinsic get.active.lane.mask to the new semantics of it as described in D86147. I.e., the second argument %n, which corresponds to the loop tripcount, must be greater than 0 if it is a constant, so check that. Differential Revision: https://reviews.llvm.org/D86301	2020-08-25 15:44:33 +01:00
Xing GUO	1dc57ada0c	[DWARFYAML] Make the 'Attributes' field optional. This patch makes the 'Attributes' field optional. We don't need to explicitly specify the 'Attributes' field in the future. Reviewed By: jhenderson, grimar Differential Revision: https://reviews.llvm.org/D86537	2020-08-25 22:37:43 +08:00
Sjoerd Meijer	39522b1e10	[SelectionDAG] Legalize intrinsic get.active.lane.mask This adapts legalization of intrinsic get.active.lane.mask to the new semantics as described in D86147. Because the second argument is now the loop tripcount, we legalize this intrinsic to an 'icmp ULT' instead of an ULE when it was the backedge-taken count. Differential Revision: https://reviews.llvm.org/D86302	2020-08-25 15:00:10 +01:00
Jeremy Morse	121a49d839	[LiveDebugValues] Add switches for using instr-ref variable locations This patch adds the -Xclang option "-fexperimental-debug-variable-locations" and same LLVM CodeGen option, to pick which variable location tracking solution to use. Right now all the switch does is pick which LiveDebugValues implementation to use, the normal VarLoc one or the instruction referencing one in rGae6f78824031. Over time, the aim is to add fragments of support in aid of the value-tracking RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html also controlled by this command line switch. That will slowly move variable locations to be defined by an instruction calculating a value, and a DBG_INSTR_REF instruction referring to that value. Thus, this is going to grow into a "use the new kind of variable locations" switch, rather than just "use the new LiveDebugValues implementation". Differential Revision: https://reviews.llvm.org/D83048	2020-08-25 14:58:48 +01:00
Matt Arsenault	0d2fe90063	AMDGPU/GlobalISel: Use more accurate legality rules for merge/unmerge Most notably, we were incorrectly reporting <3 x s16> as a legal type for these. Make sure these aren't legal to help make progress on fixing the artifact combiner and vector legalizer rules. Unfortunately, this means spreading the -global-isel-abort=0 hack, although this doesn't change the legalizer result in any situation.	2020-08-25 09:40:20 -04:00
Sjoerd Meijer	c352e7fbda	[ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks This adapts tail-predication to the new semantics of get.active.lane.mask as defined in D86147. This means that: - we can remove the BTC + 1 overflow checks because now the loop tripcount is passed in to the intrinsic, - we can immediately use that value to setup a counter for the number of elements processed by the loop and don't need to materialize BTC + 1. Differential Revision: https://reviews.llvm.org/D86303	2020-08-25 14:38:03 +01:00
Matt Arsenault	ef8f3b5a78	AMDGPU/GlobalISel: Apply bitcast load/store hack to pointer vectors The selection patterns will currently fail on these.	2020-08-25 09:37:41 -04:00
Sjoerd Meijer	ae366479e8	[LV] get.active.lane.mask consuming tripcount instead of backedge-taken count This adapts LV to the new semantics of get.active.lane.mask as discussed in D86147, which means that the LV now emits intrinsic get.active.lane.mask with the loop tripcount instead of the backedge-taken count as its second argument. The motivation for this is described in D86147. Differential Revision: https://reviews.llvm.org/D86304	2020-08-25 13:49:19 +01:00
David Green	5b7e27a4db	[ARM][CGP] Fix scalar condition selects for MVE The arm backend does not handle select/select_cc on vectors with scalar conditions, preferring to expand them in codegenprepare instead. This usually works except when optimizing for size, where the optsize check would end up overruling the backend isSelectSupported check. We could handle the selects in ISel too, but this seems like smaller code than trying to splat the condition to all lanes. Differential Revision: https://reviews.llvm.org/D86433	2020-08-25 12:09:06 +01:00
Mikael Holmen	59e1fbe557	[PowerPC] Fix gcc warning [NFC] Without the fix gcc 7.4 warns with ../lib/Target/PowerPC/PPCAsmPrinter.cpp: In member function 'void {anonymous}::PPCAsmPrinter::EmitTlsCall(const llvm::MachineInstr*, llvm::MCSymbolRefExpr::VariantKind)': ../lib/Target/PowerPC/PPCAsmPrinter.cpp:525:53: warning: enumeral and non-enumeral type in conditional expression [-Wextra] MCInstBuilder(Subtarget->isPPC64() ? Opcode : PPC::BL_TLS) ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~	2020-08-25 12:58:38 +02:00
Shinji Okumura	05390440a2	[Attributor][NFC] Clang format	2020-08-25 19:32:58 +09:00
Paul Walker	73ac3c0ede	[SVE] Lower scalable vector ISD::FNEG operations. Also updates isConstOrConstSplatFP to allow the mul(A,-1) -> neg(A) transformation when -1 is expressed as an ISD::SPLAT_VECTOR. Differential Revision: https://reviews.llvm.org/D86415	2020-08-25 11:22:28 +01:00
Benjamin Kramer	c6fb72de4f	Revert "[InstCombine] improve demanded element analysis for vector insert-of-extract" This reverts commit `557b890ff4`. Causing miscompiles, test case is on llvm-commits.	2020-08-25 11:31:31 +02:00
Hans Wennborg	6da4f1199e	Revert "[CMake] Fix ncurses/zlib in LLVM_SYSTEM_LIBS for Windows GNU" It broke Chromium's llvm build: CMake Error at lib/Support/CMakeLists.txt:13 (string): string sub-command REGEX, mode REPLACE: regex "^()" matched an empty string. Call Stack (most recent call first): lib/Support/CMakeLists.txt:223 (get_system_libname) This reverts commit `2b3807d822` / https://reviews.llvm.org/D86434	2020-08-25 11:22:50 +02:00
David Sherwood	7b64765cd1	[SVE] Fix TypeSize related warnings with IR truncates of scalable vectors In getCastInstrCost when the instruction is a truncate we were relying upon the implicit TypeSize -> uint64_t cast when asking if a given type has the same size as a legal integer. I've changed the code to only ask the question if the type is fixed length. I have also changed InstCombinerImpl::SimplifyDemandedUseBits to bail out for now if the type is a scalable vector. I've added the following new tests: Analysis/CostModel/AArch64/sve-trunc.ll Transforms/InstCombine/AArch64/sve-trunc.ll for both of these fixes. Differential revision: https://reviews.llvm.org/D86432	2020-08-25 09:17:56 +01:00
Florian Hahn	e19ef1aab5	[DSE,MemorySSA] Cache accesses with/without reachable read-clobbers. Currently we repeatedly check the same uses for read clobbers in some cases. We can avoid unnecessary checks by keeping track of the memory accesses we already found read clobbers for. To do so, we just add memory access causing read-clobbers to a set. Note that marking all visited accesses as read-clobbers would be to pessimistic, as that might include accesses not on any path to the actual read clobber. If we do not find any read-clobbers, we can add all visited instructions to another set and use that to skip the same accesses in the next call. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D75025	2020-08-25 08:48:46 +01:00
Roman Lebedev	cdd339c568	[InstCombine] PHI-of-insertvalues -> insertvalue-of-PHI's As per statistic, this happens pretty exceedingly rare, but i have seen it in exactly the situations the Phi-aware aggregate reconstruction would have handled, eventually, and allowed invoke -> call fold later on. So while this might be something that other fold will have to learn about, i believe we should be doing this transform in general. Here, we are okay with adding two PHI's to get both the base aggregate, and the inserted value. I'm not sure it makes much sense to restrict it to a single phi (to just the inserted value?), because originally we'd be receiving the final aggregate already.. llvm test-suite + RawSpeed: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| \\|%\\| \| \|--------------------------------------------\|-----------\|-----------\|-----:\|-------:\|------:\| \| instcombine.NumPHIsOfInsertValues \| 0 \| 12 \| 12 \| 0.00% \| 0.00% \| \| asm-printer.EmittedInsts \| 8926643 \| 8926595 \| -48 \| 0.00% \| 0.00% \| \| instcombine.NumCombined \| 3846614 \| 3846640 \| 26 \| 0.00% \| 0.00% \| \| instcombine.NumConstProp \| 24302 \| 24293 \| -9 \| -0.04% \| 0.04% \| \| instcombine.NumDeadInst \| 1620140 \| 1620112 \| -28 \| 0.00% \| 0.00% \| \| instcount.NumBrInst \| 898466 \| 898464 \| -2 \| 0.00% \| 0.00% \| \| instcount.NumCallInst \| 1760819 \| 1760875 \| 56 \| 0.00% \| 0.00% \| \| instcount.NumExtractValueInst \| 45659 \| 45649 \| -10 \| -0.02% \| 0.02% \| \| instcount.NumInsertValueInst \| 4991 \| 4981 \| -10 \| -0.20% \| 0.20% \| \| instcount.NumIntToPtrInst \| 27084 \| 27087 \| 3 \| 0.01% \| 0.01% \| \| instcount.NumPHIInst \| 371435 \| 371429 \| -6 \| 0.00% \| 0.00% \| \| instcount.NumStoreInst \| 906011 \| 906019 \| 8 \| 0.00% \| 0.00% \| \| instcount.TotalBlocks \| 1105520 \| 1105518 \| -2 \| 0.00% \| 0.00% \| \| instcount.TotalInsts \| 9795737 \| 9795776 \| 39 \| 0.00% \| 0.00% \| \| simplifycfg.NumInvokes \| 2784 \| 2786 \| 2 \| 0.07% \| 0.07% \| \| simplifycfg.NumSimpl \| 1001840 \| 1001850 \| 10 \| 0.00% \| 0.00% \| \| simplifycfg.NumSinkCommonInstrs \| 15174 \| 15170 \| -4 \| -0.03% \| 0.03% \| ``` Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86306	2020-08-25 10:38:11 +03:00
Sam Parker	85a5c65f69	[NFC][RDA] Add explicit def check Explicitly check that there is a local def prior to the given instruction in getReachingLocalMIDef instead of just relying on a nullptr return from getInstFromId.	2020-08-25 08:37:45 +01:00
Freddy Ye	e02d081f2b	[X86] Support -march=sapphirerapids Support -march=sapphirerapids for x86. Compare with Icelake Server, it includes 14 more new features. They are amxtile, amxint8, amxbf16, avx512bf16, avx512vp2intersect, cldemote, enqcmd, movdir64b, movdiri, ptwrite, serialize, shstk, tsxldtrk, waitpkg. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D86503	2020-08-25 14:21:21 +08:00
Petr Hosek	2b3807d822	[CMake] Fix ncurses/zlib in LLVM_SYSTEM_LIBS for Windows GNU For the Windows GNU platform, CMAKE_FIND_LIBRARY_PREFIXES is a list containing an empty string, which ended up in a regex capturing group, which is invalid in CMake's regex engine. With this change, we get the following: set(CMAKE_FIND_LIBRARY_PREFIXES "lib" "") set(CMAKE_FIND_LIBRARY_SUFFIXES ".dll.a" ".a" ".lib") get_system_libname(path/to/libz.dll.a zlib) message("${zlib}") outputs z, as expected. Patch By: haampie Differential Revision: https://reviews.llvm.org/D86434	2020-08-24 23:00:54 -07:00
Mircea Trofin	8c63df2416	[MLInliner] Support training that doesn't require partial rewards If we use training algorithms that don't need partial rewards, we don't need to worry about an ir2native model. In that case, training logs won't contain a 'delta_size' feature either (since that's the partial reward). Differential Revision: https://reviews.llvm.org/D86481	2020-08-24 17:36:29 -07:00
Venkataramanan Kumar	62e91bf563	[DAGCombine]: Fold X/Sqrt(X) to Sqrt(X) With FMF ( "nsz" and " reassoc") fold X/Sqrt(X) to Sqrt(X). This is done after targets have the chance to produce a reciprocal sqrt estimate sequence because that expansion is probably more efficient than an expansion of a non-reciprocal sqrt. That is also why we deferred doing this transform in IR (D85709). Differential Revision: https://reviews.llvm.org/D86403	2020-08-24 18:16:13 -04:00
Matt Arsenault	77e5a195f8	AMDGPU/GlobalISel: Handle AGPRs used for SGPR operands. We would still need to waterfall if the value were somehow an AGPR, and also need to explicitly copy to a VGPR.	2020-08-24 17:54:34 -04:00
Nemanja Ivanovic	075a92dea1	[PowerPC] Do not use FISel for calls and TOC-based accesses with PC-Rel PC-Relative addressing introduces a fair bit of complexity for correctly eliminating TOC accesses. FastISel does not include any of that handling so we miscompile code with -mcpu=pwr10 -O0 if it includes an external call that FastISel does not handle followed by any of the following: Floating point constant materialization Materialization of a GlobalValue Call that FastISel does handle This patch switches to SDISel for any of the above. Differential revision: https://reviews.llvm.org/D86343	2020-08-24 16:51:44 -05:00
Craig Topper	f7c87b7e37	[X86] Copy the tuning features and scheduler model from pentium4/x86-64 to generic This is preparation for making clang default to -mtune=generic when no -march is specified. This will allow the default tuning to be "generic" even though our default march is "pentium4" or "x86-64". To avoid llc lit test regressions, if no mcpu is specified, I've defaulted tune to use i586 to match the old tuning settings of no CPU. Some tests explicitly used -mcpu=generic which I've removed so they instead get this default of architecture features from generic and tune from i586. I updated one llvm-mca test to check a different CPU since generic has a scheduler model now Differential Revision: https://reviews.llvm.org/D86312	2020-08-24 14:47:10 -07:00
Nemanja Ivanovic	c485343c83	[PowerPC] Handle SUBFIC in reg+reg -> reg+imm transformation We initially missed the subtract-immediate in this transformation. This patch just adds that. Differential revision: https://reviews.llvm.org/D84659	2020-08-24 16:22:59 -05:00
Sanjay Patel	557b890ff4	[InstCombine] improve demanded element analysis for vector insert-of-extract InstCombine currently has odd rules for folding insert-extract chains to shuffles, so we miss collapsing seemingly simple cases as shown in the tests here. But poison makes this not quite as easy as we might have guessed. Alive2 tests to show the subtle difference (similar to the regression tests): https://alive2.llvm.org/ce/z/hp4hv3 (this is ok) https://alive2.llvm.org/ce/z/ehEWaN (poison leakage) SLP tends to create these patterns (as shown in the SLP tests), and this could help with solving PR16739. Differential Revision: https://reviews.llvm.org/D86460	2020-08-24 17:00:16 -04:00
Bjorn Pettersson	fce44ff5da	[Scalarizer] Avoid updating the name of globals The "takeName" logic at the end of ScalarizerVisitor::finish could end up renaming global variables when having simplified and extractelement instruction to simply pick a single vector element. If the input vector to the extractelement instruction held pointers to global variables we ended up renaming the global variable. The patch make sure we only take the name of the replaced Op when we have added new instructions that might need a useful name. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D86472	2020-08-24 21:55:03 +02:00
Roman Lebedev	56c529300e	[NFC][InstCombine] Adjust naming for some methods to match coding standards Requested as preparatory cleanup in https://reviews.llvm.org/D86306#inline-799065	2020-08-24 22:39:34 +03:00
Roland Froese	b6d7ed469f	[PowerPC] Extend custom lower of vector truncate to handle wider input Current custom lowering of truncate vector handles a source of up to 128 bits, but that only uses one of the two shuffle vector operands. Extend it to use both operands to handle 256 bit sources. Differential Revision: https://reviews.llvm.org/D68035	2020-08-24 15:33:43 -04:00
Fangrui Song	44ee9d070a	Revert D85812 "[coroutine] should disable inline before calling coro split" This reverts commit `2e43acfed8`. LLVMCoroutines (the library which contains Coroutines.h) depends on LLVMipo (the library which contains SampleProfile.cpp). It is inappropriate for SampleProfile.cpp to depent on Coroutines.h (circular dependency). The test inverted dependencies as well: llvm/test/Transforms/Coroutines/coro-inline.ll uses -sample-profile.	2020-08-24 11:41:05 -07:00
Matt Arsenault	75e6f0b3d4	AMDGPU: Add flag to disable promotion of uniform i16 ops This interferes with GlobalISel's much better handling of the situation. This should really be disable for GlobalISel. However, the fallback only re-runs the selection passes, and doesn't go back and rerun any codegen IR passes. I haven't come up with a good solution to this problem.	2020-08-24 14:39:27 -04:00
Craig Topper	43465a4375	[LegalizeTypes][X86] Add ROTL/ROTR to WidenVectorResult. We can widen these just like any other binary operation. Added test cases for v2i32 for X86 for coverage. Fixes failures seen after D77152.	2020-08-24 10:10:20 -07:00
Jay Foad	a522067692	[SDAG] Convert FSHL <--> FSHR if the target only supports one of them D77152 tried to do this but got it wrong in the shift-by-zero case. D86430 reverted the wrong code. Reimplement the optimization with different code depending on whether the shift amount is known to be non-zero (modulo bitwidth). This improves code quality for fshl tests on AMDGPU, which only has an fshr instruction. Differential Revision: https://reviews.llvm.org/D86438	2020-08-24 17:47:10 +01:00
Florian Hahn	d1a1cce5b1	[DSE,MemorySSA] Do not use callCapturesBefore in isReadClobber. Using callCapturesBefore potentially improves the precision and the number of stores we can remove. But in practice, it seems to have very little impact in terms of stores removed. For example, for SPEC2000/SPEC2006/MultiSource with -O3 -flto, ~50 more stores are removed (out of ~26900 stores removed). But in terms of compile-time, it is very expensive and the patch gives substantial compile-time improvements: Geomean O3 -0.24%, ReleaseThinLTO -0.47%, ReleaseLTO-g -0.39%. http://llvm-compile-time-tracker.com/compare.php?from=612a0bff88ed906c83b82f079d4c49e5fecfb9d0&to=e6c86b96d20d97dd88e903a409bd8d39b6114312&stat=instructions	2020-08-24 16:19:42 +01:00
Matt Arsenault	62d1fb828f	AMDGPU/GlobalISel: Use unmerge instead of extract in addrspace queries This is a bit more consistent with regular operation legalization.	2020-08-24 11:07:51 -04:00
Thomas Preud'homme	2c9131665d	Test all CHECK-NOT in a block even if one fails This commit makes FileCheck print all CHECK-NOT directive failure in a CHECK-NOT block even if one fails. Prior to that, it would stop trying to match CHECK-NOT directive as soon as one in the block fails. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D86315	2020-08-24 15:45:05 +01:00
Baptiste Saleil	512e256c0d	[PowerPC] Add clang options to control MMA support This patch adds frontend and backend options to enable and disable the PowerPC MMA operations added in ISA 3.1. Instructions using these options will be added in subsequent patches. Differential Revision: https://reviews.llvm.org/D81442	2020-08-24 09:35:55 -05:00
dongAxis	2e43acfed8	[coroutine] should disable inline before calling coro split summary: When callee coroutine function is inlined into caller coroutine function before coro-split pass, llvm will emits "coroutine should have exactly one defining @llvm.coro.begin". It seems that coro-early pass can not handle this quiet well. So we believe that unsplited coroutine function should not be inlined. This patch fix such issue by not inlining function if it has attribute "coroutine.presplit" (it means the function has not been splited) to fix this issue TestPlan: check-llvm Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D85812	2020-08-24 22:22:08 +08:00
Matt Arsenault	517caca359	GlobalISel: Improve dead instruction debug printing This was printing the "Is dead" on a separate line from the instruction, which was harder to follow.	2020-08-24 10:12:00 -04:00
Francesco Petrogalli	5a34b3ab95	[llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI] Changes: * Change `ToVectorTy` to deal directly with `ElementCount` instances. * `VF == 1` replaced with `VF.isScalar()`. * `VF > 1` and `VF >=2` replaced with `VF.isVector()`. * `VF <=1` is replaced with `VF.isZero() \|\| VF.isScalar()`. * Replaced the uses of `llvm::SmallSet<ElementCount, ...>` with `llvm::SmallSetVector<ElementCount, ...>`. This avoids the need of an ordering function for the `ElementCount` class. * Bits and pieces around printing the `ElementCount` to string streams. To guarantee that this change is a NFC, `VF.Min` and asserts are used in the following places: 1. When it doesn't make sense to deal with the scalable property, for example: a. When computing unrolling factors. b. When shuffle masks are built for fixed width vector types In this cases, an assert(!VF.Scalable && "<mgs>") has been added to make sure we don't enter coepaths that don't make sense for scalable vectors. 2. When there is a conscious decision to use `FixedVectorType`. These uses of `FixedVectorType` will likely be removed in favour of `VectorType` once the vectorizer is generic enough to deal with both fixed vector types and scalable vector types. 3. When dealing with building constants out of the value of VF, for example when computing the vectorization `step`, or building vectors of indices. These operation _make sense_ for scalable vectors too, but changing the code in these places to be generic and make it work for scalable vectors is to be submitted in a separate patch, as it is a functional change. 4. When building the potential VFs in VPlan. Making the VPlan generic enough to handle scalable vectorization factors is a functional change that needs a separate patch. See for example `void LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned MaxVF)`. 5. The class `IntrinsicCostAttribute`: this class still uses `unsigned VF` as updating the field to use `ElementCount` woudl require changes that could result in changing the behavior of the compiler. Will be done in a separate patch. 7. When dealing with user input for forcing the vectorization factor. In this case, adding support for scalable vectorization is a functional change that migh require changes at command line. Note that in some places the idiom ``` unsigned VF = ... auto VTy = FixedVectorType::get(ScalarTy, VF) ``` has been replaced with ``` ElementCount VF = ... assert(!VF.Scalable && ...); auto VTy = VectorType::get(ScalarTy, VF) ``` The assertion guarantees that the new code is (at least in debug mode) functionally equivalent to the old version. Notice that this change had been possible because none of the methods that are specific to `FixedVectorType` were used after the instantiation of `VTy`. Reviewed By: rengolin, ctetreau Differential Revision: https://reviews.llvm.org/D85794	2020-08-24 13:54:03 +00:00
Matt Arsenault	70cd9f5b77	AMDGPU/GlobalISel: Start implementing computeKnownBitsForTargetInstr Handle workitem intrinsics. There isn't really away to adequately test this right now, since none of the known bits users are fine grained enough to test the edge conditions. This triggers a number of instances of the new 64-bit to 32-bit shift combine in the existing tests.	2020-08-24 09:53:27 -04:00
Francesco Petrogalli	bad7d6b373	Revert "[llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]" Reverting because the commit message doesn't reflect the one agreed on phabricator at https://reviews.llvm.org/D85794. This reverts commit `c8d2b065b9`.	2020-08-24 13:50:55 +00:00
Matt Arsenault	e1644a3779	GlobalISel: Reduce G_SHL width if source is extension shl ([sza]ext x, y) => zext (shl x, y). Turns expensive 64 bit shifts into 32 bit if it does not overflow the source type: This is a port of an AMDGPU DAG combine added in `5fa289f0d8`. InstCombine does this already, but we need to do it again here to apply it to shifts introduced for lowered getelementptrs. This will help matching addressing modes that use 32-bit offsets in a future patch. TableGen annoyingly assumes only a single match data operand, so introduce a reusable struct. However, this still requires defining a separate GIMatchData for every combine which is still annoying. Adds a morally equivalent function to the existing getShiftAmountTy. Without this, we would have to do try to repeatedly query the legalizer info and guess at what type to use for the shift.	2020-08-24 09:42:40 -04:00
Francesco Petrogalli	c8d2b065b9	[llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI] Changes: * Change `ToVectorTy` to deal directly with `ElementCount` instances. * `VF == 1` replaced with `VF.isScalar()`. * `VF > 1` and `VF >=2` replaced with `VF.isVector()`. * `VF <=1` is replaced with `VF.isZero() \|\| VF.isScalar()`. * Add `<` operator to `ElementCount` to be able to use `llvm::SmallSetVector<ElementCount, ...>`. * Bits and pieces around printing the ElementCount to string streams. * Added a static method to `ElementCount` to represent a scalar. To guarantee that this change is a NFC, `VF.Min` and asserts are used in the following places: 1. When it doesn't make sense to deal with the scalable property, for example: a. When computing unrolling factors. b. When shuffle masks are built for fixed width vector types In this cases, an assert(!VF.Scalable && "<mgs>") has been added to make sure we don't enter coepaths that don't make sense for scalable vectors. 2. When there is a conscious decision to use `FixedVectorType`. These uses of `FixedVectorType` will likely be removed in favour of `VectorType` once the vectorizer is generic enough to deal with both fixed vector types and scalable vector types. 3. When dealing with building constants out of the value of VF, for example when computing the vectorization `step`, or building vectors of indices. These operation _make sense_ for scalable vectors too, but changing the code in these places to be generic and make it work for scalable vectors is to be submitted in a separate patch, as it is a functional change. 4. When building the potential VFs in VPlan. Making the VPlan generic enough to handle scalable vectorization factors is a functional change that needs a separate patch. See for example `void LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned MaxVF)`. 5. The class `IntrinsicCostAttribute`: this class still uses `unsigned VF` as updating the field to use `ElementCount` woudl require changes that could result in changing the behavior of the compiler. Will be done in a separate patch. 7. When dealing with user input for forcing the vectorization factor. In this case, adding support for scalable vectorization is a functional change that migh require changes at command line. Differential Revision: https://reviews.llvm.org/D85794	2020-08-24 13:39:42 +00:00
Florian Hahn	b99a5eb659	[DSE,MemorySSA] Delay PointerMayBeCaptured calls until actually needed. Avoid computing InvisibleToCallerBefore/AfterRet up front. In most cases, this information is not really needed. Instead, introduce helper functions to compute and cache the result on demand. Notably, this also does not use PointerMayBeCapturedBefore for isInvisibleToCallerBeforeRet, as it requires the killing MemoryDef as starting instruction, making the caching ineffective. But it appears the use of PointerMayBeCapturedBefore has very limited benefits in practice (e.g. on SPEC2000/SPEC2006/MultiSource there are no binary changes with -O3 -flto). Refrain from using it for now, to limit-compile-time. This gives some nice compile-time improvements: http://llvm-compile-time-tracker.com/compare.php?from=db9345f6810f379a36752dc52caf5230585d0ebd&to=b4d091047e1b8a3d377d200137b79d03aca65663&stat=instructions	2020-08-24 14:05:44 +01:00
Anna Welker	8048068c3e	[ARM][MVE] Allow tail predication for strides !=1 with gather/scatters If gather/scatters are enabled, ARMTargetTransformInfo now allows tail predication for loops with a much wider range of strides, up to anything that is loop invariant. Differential Revision: https://reviews.llvm.org/D85410	2020-08-24 13:54:47 +01:00
Jonas Paulsson	8ac70694b9	[SystemZ] Preserve the MachineMemOperand in emitCondStore() in all cases. Review: Ulrich Weigand	2020-08-24 14:07:30 +02:00
Florian Hahn	2431b143ae	[DSE,MemorySSA] Limit elimination at end of function to single UO. Limit elimination of stores at the end of a function to MemoryDefs with a single underlying object, to save compile time. In practice, the case with multiple underlying objects seems not very important in practice. For -O3 -flto on MultiSource/SPEC2000/SPEC2006 this results in a total of 2 more stores being eliminated. We can always re-visit that in the future.	2020-08-24 13:00:17 +01:00
Sanjay Patel	6a44edb8da	[InstCombine] fold abs of select with negated op (PR39474) Similar to the existing transform - peek through a select to match a value and its negation. https://alive2.llvm.org/ce/z/MXi5KG define i8 @src(i1 %b, i8 %x) { %0: %neg = sub i8 0, %x %sel = select i1 %b, i8 %x, i8 %neg %abs = abs i8 %sel, 1 ret i8 %abs } => define i8 @tgt(i1 %b, i8 %x) { %0: %abs = abs i8 %x, 1 ret i8 %abs } Transformation seems to be correct!	2020-08-24 07:37:55 -04:00
Sam Parker	2e194fe73b	[SCEV] Still trying to fix windows buildbots	2020-08-24 10:26:48 +01:00
Julien Etienne	0f0be3fb8d	Add support for AVR attiny441 and attiny841 Reviewed By: dylanmckay Differential Revision: https://reviews.llvm.org/D85589 Patch by Julien Etienne	2020-08-24 20:28:32 +12:00
Sam Parker	8ce450da32	[NFCI][SimplifyCFG] Combine select costs and checks Combine the cost modelling and validity checks for the phi to select conversion in SpeculativelyExecuteBB, extracting the logic out into a function.	2020-08-24 09:16:11 +01:00
Bjorn Pettersson	7a4e26adc8	[SelectionDAG] Fix miscompile bug in expandFunnelShift This is a fixup of commit `0819a6416f` (D77152) which could result in miscompiles. The miscompile could only happen for targets where isOperationLegalOrCustom could return different values for FSHL and FSHR. The commit mentioned above added logic in expandFunnelShift to convert between FSHL and FSHR by swapping direction of the funnel shift. However, that transform is only legal if we know that the shift count (modulo bitwidth) isn't zero. Basically, since fshr(-1,0,0)==0 and fshl(-1,0,0)==-1 then doing a rewrite such as fshr(X,Y,Z) => fshl(X,Y,0-Z) would be incorrect if Z modulo bitwidth, could be zero. ``` $ ./alive-tv /tmp/test.ll ---------------------------------------- define i32 @src(i32 %x, i32 %y, i32 %z) { %0: %t0 = fshl i32 %x, i32 %y, i32 %z ret i32 %t0 } => define i32 @tgt(i32 %x, i32 %y, i32 %z) { %0: %t0 = sub i32 32, %z %t1 = fshr i32 %x, i32 %y, i32 %t0 ret i32 %t1 } Transformation doesn't verify! ERROR: Value mismatch Example: i32 %x = #x00000000 (0) i32 %y = #x00000400 (1024) i32 %z = #x00000000 (0) Source: i32 %t0 = #x00000000 (0) Target: i32 %t0 = #x00000020 (32) i32 %t1 = #x00000400 (1024) Source value: #x00000000 (0) Target value: #x00000400 (1024) ``` It could be possible to add back the transform, given that logic is added to check that (Z % BW) can't be zero. Since there were no test cases proving that such a transform actually would be useful I decided to simply remove the faulty code in this patch. Reviewed By: foad, lebedev.ri Differential Revision: https://reviews.llvm.org/D86430	2020-08-24 09:52:11 +02:00
Fangrui Song	fd485673da	[LiveDebugVariables] Internalize class DbgVariableValue. NFC	2020-08-23 22:53:46 -07:00
Qiu Chaofan	1bc45b2fd8	[PowerPC] Support lowering int-to-fp on ppc_fp128 D70867 introduced support for expanding most ppc_fp128 operations. But sitofp/uitofp is missing. This patch adds that after D81669. Reviewed By: uweigand Differntial Revision: https://reviews.llvm.org/D81918	2020-08-24 11:18:16 +08:00
Qiu Chaofan	fed6107dcb	[PowerPC] Allow constrained FP intrinsics in mightUseCTR We may meet Invalid CTR loop crash when there's constrained ops inside. This patch adds constrained FP intrinsics to the list so that CTR loop verification doesn't complain about it. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D81924	2020-08-24 11:09:58 +08:00
QingShan Zhang	960cbc53ca	[DAGCombine] Remove dead node when it is created by getNegatedExpression We hit the compiling time reported by https://bugs.llvm.org/show_bug.cgi?id=46877 and the reason is the same as D77319. So we need to remove the dead node we created to avoid increase the problem size of DAGCombiner. Reviewed By: Spatel Differential Revision: https://reviews.llvm.org/D86183	2020-08-24 02:50:58 +00:00
Qiu Chaofan	41ba9d7723	[PowerPC] Support constrained vector fp/int conversion This patch makes these operations legal, and add necessary codegen patterns. There's still some issue similar to D77033 for conversion from v1i128 type. But normal type tests synced in vector-constrained-fp-intrinsic are passed successfully. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D83654	2020-08-24 10:10:27 +08:00
Roman Lebedev	f6decfa36d	[InstCombine] Negator: freeze is freely negatible if it's operand is negatible	2020-08-23 23:28:19 +03:00
Fangrui Song	bef684154d	[X86][FastISel] Support materializing floating-point constants for large code model & PIC The following program miscompiles because rL216012 added static relocation model support but not for PIC. ``` // clang -fpic -mcmodel=large -O0 a.cc double foo() { return 42.0; } ``` This patch adds PIC support. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D86024	2020-08-23 08:36:18 -07:00
Florian Hahn	2843c9fe0a	[DSE,MemorySSA] Keep single DL instance in DSEState (NFC). Small cleanup, also removes one instance of getting DataLayout without using it later.	2020-08-23 15:56:38 +01:00
Sanjay Patel	1d0fa79824	[DAGCombiner] restrict store merge of truncs to early combining The pattern matching does not account for truncating stores, so it is unlikely to work at later stages. So we are likely wasting compile-time with no hope of improvement by running this later.	2020-08-23 10:44:23 -04:00
Sanjay Patel	79cb289a95	[DAGCombiner] add early exit for store merging of truncs This should be NFC in terms of output because the endian check further down would bail out too, but we are wasting time by waiting to that point to give up. If we generalize that function to deal with more than i8 types, we should not have to deal with the degenerate case.	2020-08-22 16:25:16 -04:00
Jeremy Morse	93af37043b	Follow-up build fix for rGae6f78824031 One of the bots objects to brace-initializing a tuple: http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/43595/steps/build%20stage%201/logs/stdio As the tuple constructor is apparently explicit. Fall back to the (not as pretty) explicit construction of a tuple. I'd thought this was permitted behaviour; will investigate why this fails later.	2020-08-22 19:09:30 +01:00
Fangrui Song	60bcec4eea	[LiveDebugValues] Delete unneeded copy constructor after D83047 It will suppress the implicitly-declared copy assignment operator in C++20.	2020-08-22 10:55:28 -07:00
Jeremy Morse	ae6f788240	[LiveDebugValues] Add instruction-referencing LDV implementation This patch imports the instruction-referencing implementation of LiveDebugValues proposed here: http://lists.llvm.org/pipermail/llvm-dev/2020-June/142368.html The new implementation is unreachable in this patch, it's the next patch that enables it behind a command line switch. Briefly, rather than tracking variable locations by just their location as the 'VarLoc' implementation does, this implementation does it by value: * Each value defined in a function is numbered, and propagated through dataflow, * Each DBG_VALUE reads a machine value number from a machine location, * Variable _values_ are propagated through dataflow, * Variable values are translated back into locations, DBG_VALUEs inserted to specify where those locations are. The ultimate aim of this is to enable referring to variable values throughout post-isel code, rather than locations. Those patches will build on top of this new LiveDebugValues implementation in later patches -- it can't be done with the VarLoc implementation as we don't have value information, only locations. Differential Revision: https://reviews.llvm.org/D83047	2020-08-22 18:31:08 +01:00
Matt Arsenault	901e3317fe	GlobalISel: Merge FewerElements for G_BUILD_VECTOR/G_CONCAT_VECTORS This switches from using G_EXTRACT in odd cases to widen with undef and unmerge.	2020-08-22 10:25:53 -04:00
Jeremy Morse	2d9be9e318	Fix some builds after `20bb9fe565` -Wsuggest-override indicates this VarLocBasedLDV method needs the override keyword.	2020-08-22 15:20:42 +01:00
Jeremy Morse	20bb9fe565	[LiveDebugValues] Install an implementation-picking LiveDebugValues pass This patch renames the current LiveDebugValues class to "VarLocBasedLDV" and removes the pass-registration code from it. It creates a separate LiveDebugValues class that deals with pass registration and management, that calls through to VarLocBasedLDV::ExtendRanges when runOnMachineFunction is called. This is done through the "LDVImpl" abstract class, so that a future patch can install the new instruction-referencing LiveDebugValues implementation and have it picked at runtime. No functional change is intended, just shuffling responsibilities. Differential Revision: https://reviews.llvm.org/D83046	2020-08-22 14:50:22 +01:00
Sanjay Patel	ec06b38130	[InstCombine] canonicalize 'not' ops before logical shifts This reverses the existing transform that would uniformly canonicalize any 'xor' after any shift. In the case of logical shifts, that turns a 'not' into an arbitrary 'xor' with constant, and that's probably not as good for analysis, SCEV, or codegen. The SCEV motivating case is discussed in: http://bugs.llvm.org/PR47136 There's an analysis motivating case at: http://bugs.llvm.org/PR38781 I did draft a patch that would do the same for 'ashr' but that's questionable because it's just swapping the position of a 'not' and uncovers at least 2 missing folds that we would probably need to deal with as preliminary steps. Alive proofs: https://rise4fun.com/Alive/BBV Name: shift right of 'not' Pre: C2 == (-1 u>> C1) %a = lshr i8 %x, C1 %r = xor i8 %a, C2 => %n = xor i8 %x, -1 %r = lshr i8 %n, C1 Name: shift left of 'not' Pre: C2 == (-1 << C1) %a = shl i8 %x, C1 %r = xor i8 %a, C2 => %n = xor i8 %x, -1 %r = shl i8 %n, C1 Name: ashr of 'not' %a = ashr i8 %x, C1 %r = xor i8 %a, -1 => %n = xor i8 %x, -1 %r = ashr i8 %n, C1 Differential Revision: https://reviews.llvm.org/D86243	2020-08-22 09:38:13 -04:00
Sanjay Patel	2fc7c85201	[DAGCombiner] clean up merge of truncated stores; NFC This code handles the special-case of i8 stores, but it could be generalized to deal with other types.	2020-08-22 09:23:32 -04:00
Jeremy Morse	fba06e3c85	[LiveDebugValues][NFC] Move LiveDebugValues source for refactor This is a pure file move of LiveDebugValues.cpp ahead of the pass being refactored, with an experimental new implementation to follow. The motivation for these changes can be found here: http://lists.llvm.org/pipermail/llvm-dev/2020-June/142368.html And the other related changes can be found in the phabricator stack for this revision: Differential Revision: https://reviews.llvm.org/D83304	2020-08-22 12:58:30 +01:00
Florian Hahn	5e7e2162d4	[DSE,MemorySSA] Use BatchAA for AA queries. We can use BatchAA to avoid some repeated AA queries. We only remove stores, so I think we will get away with using a single BatchAA instance for the complete run. The changes in AliasAnalysis.h mirror the changes in D85583. The change improves compile-time by roughly 1%. http://llvm-compile-time-tracker.com/compare.php?from=67ad786353dfcc7633c65de11601d7823746378e&to=10529e5b43809808e8c198f88fffd8f756554e45&stat=instructions This is part of the patches to bring down compile-time to the level referenced in http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D86275	2020-08-22 08:36:35 +01:00
Sourabh Singh Tomar	f91d18eaa9	[DebugInfo][flang]Added support for representing Fortran assumed length strings This patch adds support for representing Fortran `character(n)`. Primarily patch is based out of D54114 with appropriate modifications. Test case IR is generated using our downstream classic-flang. We're in process of upstreaming flang PR's but classic-flang has dependencies on llvm, so this has to get in first. Patch includes functional test case for both IR and corresponding dwarf, furthermore it has been manually tested as well using GDB. Source snippet: ``` program assumedLength call sub('Hello') call sub('Goodbye') contains subroutine sub(string) implicit none character(len=), intent(in) :: string print , string end subroutine sub end program assumedLength ``` GDB: ``` (gdb) ptype string type = character (5) (gdb) p string $1 = 'Hello' ``` Reviewed By: aprantl, schweitz Differential Revision: https://reviews.llvm.org/D86305	2020-08-22 10:13:40 +05:30
Alina Sbirlea	f55ad3973d	[DomTree] Extend update API to allow a post CFG view. Extend the `applyUpdates` in DominatorTree to allow a post CFG view, different from the current CFG. This patch implements the functionality of updating an already up to date DT, to the desired PostCFGView. Combining a set of updates towards an up to date DT and a PostCFGView is not yet supported. Differential Revision: https://reviews.llvm.org/D85472	2020-08-21 17:23:08 -07:00
Paul C. Anagnostopoulos	196e6f9f18	Replace TableGen range piece punctuator with '...' The TableGen range piece punctuator is currently '-' (e.g., {0-9}), which interacts oddly with the fact that an integer literal's sign is part of the literal. This patch replaces the '-' with the new punctuator '...'. The '-' punctuator is deprecated. Differential Revision: https://reviews.llvm.org/D85585 Change-Id: I3d53d14e23f878b142d8f84590dd465a0fb6c09c	2020-08-21 23:33:57 +02:00
Roman Lebedev	503deec218	Temporairly revert "[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline" As disscussed in post-commit review starting with https://reviews.llvm.org/D84108#2227365 while this appears to be mostly a win overall, especially code-size-wise, this appears to shake //certain// code pattens in a way that is extremely unfavorable for performance (+30% runtime regression) on certain CPU's (i personally can't reproduce). So until the behaviour is better understood, and a path forward is mapped, let's back this out for now. This reverts commit `1d51dc38d8`.	2020-08-22 00:33:22 +03:00
Nicolai Hähnle	17cd34409a	Fix two bugs in TGParser::ParseValue TGParser::ParseValue contains two recursive calls, one to parse the RHS of a list paste operator and one to parse the RHS of a paste operator in a class/def name. Both of these calls neglect to check the return value to see if it is null (because of some error). This causes a crash in the next line of code, which uses the return value. The code now checks for null returns. Differential Revision: https://reviews.llvm.org/D85852	2020-08-21 23:19:36 +02:00
Arthur Eubanks	b79889c2b1	[opt][NewPM] Add basic-aa in legacy PM compatibility mode The legacy PM alias analysis pipeline by default includes basic-aa. When running `opt -foo-pass` under the NPM and -disable-basic-aa is not specified, use basic-aa. This decreases the number of check-llvm failures under NPM from 913 to 752. Reviewed By: ychen, asbirlea Differential Revision: https://reviews.llvm.org/D86167	2020-08-21 14:05:07 -07:00
Nicolai Hähnle	b37db11d95	MachineSSAUpdater: Allow initialization with just a register class The register class is required for inserting PHIs, but the "current virtual register" isn't actually used for anything, so let's remove it while we're at it. Differential Revision: https://reviews.llvm.org/D85602 Change-Id: I1e647f31570ef21a7ea8e20db3454178e98a6a8b	2020-08-21 23:04:35 +02:00
kuterd	65fcc0ee31	[Attributor] Function seed allow list - Adds a command line option to seed only selected functions. - Makes seed allow listing exclusive to assertions enabled builds. Reviewed By: sstefan1 Differential Revision: https://reviews.llvm.org/D86129	2020-08-21 23:55:26 +03:00
Shinji Okumura	e21a22a7a8	[Attributor] fix AANoUndef initialization Currently, `AANoUndefImpl::initialize` mistakenly always indicates optimistic fixpoint for function returned position. This is because an associated value is `Function` in the case, and `isGuaranteedNotToBeUndefOrPoison` returns true for Function. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86361	2020-08-22 05:06:14 +09:00
Amy Huang	5e3fd471ac	[Cloning] Fix to cloning DISubprograms. When trying to enable -debug-info-kind=constructor there was an assert that occurs during debug info cloning ("mismatched subprogram between llvm.dbg.value variable and !dbg attachment"). It appears that during llvm::CloneFunctionInto, a DISubprogram could be duplicated when MapMetadata is called, and then added to the MD map again when DIFinder gets a list of subprograms. This results in two different versions of the DISubprogram. This patch switches the order so that the DIFinder subprograms are added before MapMetadata is called. Fixes https://bugs.llvm.org/show_bug.cgi?id=46784 Differential Revision: https://reviews.llvm.org/D86185	2020-08-21 11:54:56 -07:00
Stanislav Mekhanoshin	9a9a092e61	[AMDGPU] Avoid sorting stalls in regbank-reassign This is the slowest operation in the already slow pass. Instead of sorting just put a stall list into an ordered map. Differential Revision: https://reviews.llvm.org/D86253	2020-08-21 11:49:41 -07:00
Serguei Katkov	9e362bb0eb	[InstCombine] Remove unused entries in gc-live bundle of statepoint If some of gc live value are not used in gc.relocate we can remove them from gc-live bundle of statepoint instruction. Also the CL removes duplicated Values in gc-live bundle. Reviewers: reames, dantrushin Reviewed By: dantrushin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D85959	2020-08-22 01:36:22 +07:00
Fangrui Song	06cad825cd	PrintStackTrace: don't symbolize if LLVM_DISABLE_SYMBOLIZATION is set See http://lists.llvm.org/pipermail/llvm-dev/2017-June/113975.html for a related previous discussion. Many tools install signal handlers to print stack traces and optionally symbolize the addresses with an external program 'llvm-symbolizer' (when searching for 'llvm-symbolizer', the directory containg the executable is preferred over PATH). 'llvm-symbolizer' can be slow if the executable is large and/or if llvm-symbolizer' itself is under-optimized. For example, my 'llvm-lto2' from a -DCMAKE_BUILD_TYPE=Debug build is 443MiB. The 'llvm-symbolizer' from the same build takes ~2s to symbolize it. (An optimized 'llvm-symbolizer' takes 0.34s). A crashed clang may take more than 5s to symbolize a stack trace. If a test file has several `not --crash` RUN lines. It can be very slow in a Debug build. This patch makes `not --crash` set an environment variable to suppress symbolization. This is similar to D33804 which uses a command line option. I pick 'symbolization' instead of 'symbolication' because the former is used much more commonly and its stem matches 'llvm-symbolizer'. Also set LLVM_DISABLE_CRASH_REPORT=1, which is currently only applicable on `__APPLE__`. Reviewed By: dblaikie, aganea Differential Revision: https://reviews.llvm.org/D86170	2020-08-21 11:27:13 -07:00
Qiu Chaofan	a5b7b8cce0	[PowerPC] Support constrained scalar sitofp/uitofp This patch adds support for constrained scalar int to fp operations on PowerPC. Besides, this also fixes the FP exception bit of FCFID* instructions. Reviewed By: steven.zhang, uweigand Differential Revision: https://reviews.llvm.org/D81669	2020-08-22 02:10:29 +08:00
Serguei Katkov	63d9d56a55	[InstCombine] Move handling of gc.relocate in a gc.statepoint The only def for gc.relocate is a gc.statepoint. But real dependency is not described by def-use chain. Instead this dependency is encoded by indecies of operands in gc-live bundle of statepoint as integer constants in gc.relocate. InstCombine operates by def-use chain. As a result when value in gc-live bundle is simplified the gc.statepoint itself is not simplified but it might simplify dependent gc.relocates. To trigger the optimization of gc.relocate we now unconditionally trigger check of all dependent gc.relocates by adding them to worklist. This CL handles of gc.relocates as process of gc.statepoint optimization considering gc.statepoint and related gc.relocate as whole entity. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D85954	2020-08-21 23:44:23 +07:00
Florian Hahn	bc72a3ab94	[Constants] Handle FNeg in getWithOperands. Currently ConstantExpr::getWithOperands does not handle FNeg and subsequently treats FNeg as binary operator, leading to an assertion failure or segmentation fault if built without assertions. Originally I reproduced this with llvm-dis on a bitcode file, which I unfortunately cannot share and also cannot really reduce. But PR45426 describes the same issue and has a reproducer with Clang, so I'll go with that. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D86274	2020-08-21 16:50:56 +01:00
Kamau Bridgeman	365f861c45	[PowerPC][PCRelative] Thread Local Storage Support for Initial Exec This patch is the initial support for the Intial Exec Thread Local Local Storage model to produce code sequence and relocations correct to the ABI for the model when using PC relative memory operations. Reviewed By: stefanp Differential Revision: https://reviews.llvm.org/D81947	2020-08-21 10:13:11 -05:00
diggerlin	a081868921	[AIX][XCOFF] emit symbol visibility for xcoff object file. SUMMARY: Reviewers: Jason liu Differential Revision: https://reviews.llvm.org/D84265	2020-08-21 11:00:56 -04:00
Florian Hahn	8eded24bf4	Recommit "[SCEVExpander] Add helper to clean up instrs inserted while expanding." Recommit the patch after fixing an issue reported caused by the fact that re-used values are also added to InsertedValues. Additional tests have been added in `88818491b9` This reverts the revert commit `38884641f2`.	2020-08-21 15:04:17 +01:00
Cameron McInally	36dbb8fc97	[SVE] Lower fixed length UDIV to scalable Pretty much just a copy of the SDIV patches (D86114 and D85982) with string replacement. Differential Revision: https://reviews.llvm.org/D86316	2020-08-21 09:01:25 -05:00
Sam Parker	bfc6d8b59b	[NFC][SimplifyCFG] Formatting and variable rename	2020-08-21 13:11:17 +01:00
Xing GUO	f5643dc3dc	Recommit: [DWARFYAML] Add support for referencing different abbrev tables. The original commit (7ff0ace96db9164dcde232c36cab6519ea4fce8) was causing build failure and was reverted in `6d242a7326` ==================== Original Commit Message ==================== This patch adds support for referencing different abbrev tables. We use 'ID' to distinguish abbrev tables and use 'AbbrevTableID' to explicitly assign an abbrev table to compilation units. The syntax is: ``` debug_abbrev: - ID: 0 Table: ... - ID: 1 Table: ... debug_info: - ... AbbrevTableID: 1 ## Reference the second abbrev table. - ... AbbrevTableID: 0 ## Reference the first abbrev table. ``` Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D83116	2020-08-21 19:02:10 +08:00
lewis-revill	9e6c09c0d9	[RISCV] Fix inaccurate annotations on PseudoBRIND PseudoBRIND had seemingly inherited incorrect annotations denoting it as a call instruction and that it defines X1/ra. This caused excess save/restore code to be emitted for ra. Differential Revision: https://reviews.llvm.org/D86286	2020-08-21 11:38:42 +01:00
Mirko Brkusanin	0654ff703d	[AMDGPU] Use ds_read/write_b96/b128 when possible for SDag Do not break down local loads and stores so ds_read/write_b96/b128 in ISelLowering can be selected on subtargets that support them and if align requirements allow them. Differential Revision: https://reviews.llvm.org/D84403	2020-08-21 12:26:31 +02:00
Mirko Brkusanin	d17ea67b92	[AMDGPU][GlobalISel] Fix 96 and 128 local loads and stores Fix local ds_read/write_b96/b128 so they can be selected if the alignment allows. Otherwise, either pick appropriate ds_read2/write2 instructions or break them down. Differential Revision: https://reviews.llvm.org/D81638	2020-08-21 12:26:31 +02:00
Mirko Brkusanin	f5cd7ec9f3	[AMDGPU] Reorganize GCN subtarget features for unaligned access Features UnalignedBufferAccess and UnalignedDSAccess are now used to determine whether hardware supports such access. UnalignedAccessMode should be used to enable them. hasUnalignedBufferAccessEnabled() and hasUnalignedDSAccessEnabled() can be now used to quickly check both. Differential Revision: https://reviews.llvm.org/D84522	2020-08-21 12:26:31 +02:00
Mirko Brkusanin	5bd1febe21	[AMDGPU] Fix alignment requirements for 96bit and 128bit local loads and stores Adjust alignment requirements for ds_read/write_b96/b128. GFX9 and onwards allow misaligned access for reads and writes but only if SH_MEM_CONFIG.alignment_mode allows it. UnalignedDSAccess is set on GCN subtargets from GFX9 onward to let us know if we can relax alignment requirements. UnalignedAccessMode acts similary to UnalignedBufferAccess for DS instructions but only from GFX9 onward and is supposed to match alignment_mode. By default alignment of 4 is required. Differential Revision: https://reviews.llvm.org/D82788	2020-08-21 12:26:31 +02:00
Florian Hahn	9f7350672e	[DSE,MemorySSA] Handle atomicrmw/cmpxchg conservatively. This adds conservative handling of AtomicRMW/AtomicCmpXChg to isDSEBarrier, similar to atomic loads and stores.	2020-08-21 10:42:42 +01:00
Roman Lebedev	5d7c5a5e99	[NFC] Port InstCount pass to new pass manager	2020-08-21 12:39:42 +03:00
Jay Foad	0819a6416f	[SelectionDAG] Better legalization for FSHL and FSHR In SelectionDAGBuilder always translate the fshl and fshr intrinsics to FSHL and FSHR (or ROTL and ROTR) instead of lowering them to shifts and ORs. Improve the legalization of FSHL and FSHR to avoid code quality regressions. Differential Revision: https://reviews.llvm.org/D77152	2020-08-21 10:32:49 +01:00
Jay Foad	98de0d22f5	[AMDGPU] Apply llvm-prefer-register-over-unsigned from clang-tidy	2020-08-21 10:14:35 +01:00
Yevgeny Rouban	18bc400f97	[NewPM][PassInstrumentation] Add PreservedAnalyses parameter to AfterPass* callbacks Both AfterPass and AfterPassInvalidated pass instrumentation callbacks get additional parameter of type PreservedAnalyses. This patch was created by @fedor.sergeev. I have just slightly changed it. Reviewers: fedor.sergeev Differential Revision: https://reviews.llvm.org/D81555	2020-08-21 16:10:42 +07:00
Sam Parker	47251582f5	[SimplifyCFG] Cost required selects Before we speculatively execute a basic block, query the cost of inserting the necessary select instructions against the phi folding threshold. For non-trivial insertions, a more accurate decision can probably be made during machine if-conversion. With minsize we query the CodeSize cost, otherwise we use SizeAndLatency. Differential Revision: https://reviews.llvm.org/D82438	2020-08-21 09:52:52 +01:00
Florian Hahn	a0e92ffd0d	[DSE,MemorySSA] Split off partial tracking from isOverwite. When traversing memory uses to look for aliasing reads/writes, we only care about complete overwrites. This patch splits off the partial overwrite tracking from isOverwrite This avoids some unnecessary work when checking for read/write clobbers with MemorySSA-DSE. isOverwrite, which skips the partial overwrite tracking. This gives a relatively small improvement http://llvm-compile-time-tracker.com/compare.php?from=ef2a2f77f87553a0a4a39f518eb9ac86b756bda6&to=658f3905dd96d3415f3782adc712c79fa59a4665&stat=instructions This is part of the patches to bring down compile-time to the level referenced in http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D86280	2020-08-21 09:13:59 +01:00
Sam Parker	acf0bb41e4	[ARM][CostModel] Select instruction costs. Modify the ARM getCmpSelInstrCost implementation for the code size costs of selects. Now consider the legalization cost and increase the cost of i1 because those values wouldn't live in a general purpose register. We also make selects +1 more expensive to account for the IT instruction. Differential Revision: https://reviews.llvm.org/D82091	2020-08-21 08:49:56 +01:00
David Green	2b69efded0	[ARM][LV] Add a preferPredicatedReductionSelect target hook As part of D84741, this adds a target hook for the preferPredicatedReductionSelect option and makes use of it under MVE, allowing us to tail predicate most reduction loops. Differential Revision: https://reviews.llvm.org/D85980	2020-08-21 08:48:12 +01:00
Mehdi Amini	927da43ade	Allow multiple calls to InitLLVM() (NFC) In `e99dee82b0`, the "out_of_memory_new_handler" was changed to be explicitly initialized instead of relying on a global static constructor. However before this change, install_out_of_memory_new_handler could be called multiple times while it asserts right now. We can be more tolerant to calling multiple time InitLLVM without reintroducing a global constructor for this handler. Differential Revision: https://reviews.llvm.org/D86330	2020-08-21 06:13:00 +00:00
Xing GUO	6d242a7326	Revert "[DWARFYAML] Add support for referencing different abbrev tables." This reverts commit `f7ff0ace96`. This change is causing build failure. http://lab.llvm.org:8011/builders/clang-cmake-armv7-global-isel/builds/10400	2020-08-21 12:15:54 +08:00
Xing GUO	f7ff0ace96	[DWARFYAML] Add support for referencing different abbrev tables. This patch adds support for referencing different abbrev tables. We use 'ID' to distinguish abbrev tables and use 'AbbrevTableID' to explicitly assign an abbrev table to compilation units. The syntax is: ``` debug_abbrev: - ID: 0 Table: ... - ID: 1 Table: ... debug_info: - ... AbbrevTableID: 1 ## Reference the second abbrev table. - ... AbbrevTableID: 0 ## Reference the first abbrev table. ``` Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D83116	2020-08-21 11:44:25 +08:00
Xing GUO	290e399f96	[DWARFYAML] Add support for emitting multiple abbrev tables. This patch adds support for emitting multiple abbrev tables. Currently, compilation units will always reference the first abbrev table. Reviewed By: jhenderson, labath Differential Revision: https://reviews.llvm.org/D86194	2020-08-21 10:12:08 +08:00
Michael Liao	5257a60ee0	[amdgpu] Add codegen support for HIP dynamic shared memory. Summary: - HIP uses an unsized extern array `extern __shared__ T s[]` to declare the dynamic shared memory, which size is not known at the compile time. Reviewers: arsenm, yaxunl, kpyzhov, b-sumner Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82496	2020-08-20 21:29:18 -04:00
Kang Zhang	95e18b2d9d	[PowerPC] Fix a typo for InstAlias of mfsprg D77531 has a type for mfsprg, it should be mtsprg. This patch is to fix this typo.	2020-08-21 01:10:52 +00:00
Justin Bogner	1283dca007	[GISel] Correct the known bits of G_ANYEXT Known bits for G_ANYEXT was incorrectly using KnownBits::zext, causing us to treat the high bits as zero even though they're (by definition) unknown. Differential Revision: https://reviews.llvm.org/D86323	2020-08-20 17:17:04 -07:00
Jon Roelofs	74ca5275e9	Fix a couple of typos. NFC	2020-08-20 14:56:57 -06:00
Matt Arsenault	79ce9bb380	CodeGen: Don't drop AA metadata when splitting MachineMemOperands Assuming this is used to split a memory access into smaller pieces, the new access should still have the same aliasing properties as the original memory access. As far as I can tell, this wasn't intentionally dropped. It may be necessary to drop this if you are moving the operand outside of the bounds of the original object in such a way that it may alias another IR object, but I don't think any of the existing users are doing this. Some of the uses widen into unused alignment padding, which I think is OK.	2020-08-20 16:17:30 -04:00
Matt Arsenault	18b218007d	AMDGPU/GlobalISel: Legalize odd sized loads with widening Custom lower and widen odd sized loads up to the alignment. The default set of legalization actions doesn't have a way to represent this. This fixes naturally aligned <3 x s8> and <3 x s16> loads. This also starts moving towards eliminating the buggy and overcomplicated legalization rules for narrowing. All the memory size changes should be done in the lower or custom action, not NarrowScalar / FewerElements. These currently have redundant and ambiguous code with the lower action.	2020-08-20 16:15:53 -04:00
vnalamot	54d8ded4b1	allSGPRSpillsAreDead() should use actual FP/BP frame indices The SGPR spills happen in SILowerSGPRSpills() and allSGPRSpillsAreDead() make sure there are no SGPR spills pending during PEI. But the FP/BP spills happen during PEI and are exceptions. Use actual frame indices of FP/BP in allSGPRSpillsAreDead() to accommodate the exceptions. Differential Revision: https://reviews.llvm.org/D86291	2020-08-20 16:15:53 -04:00
Kamau Bridgeman	b74b80bb2d	[PowerPC][PCRelative] Thread Local Storage Support for General Dynamic This patch is the initial support for the General Dynamic Thread Local Local Storage model to produce code sequence and relocations correct to the ABI for the model when using PC relative memory operations. Patch by: NeHuang Reviewed By: stefanp Differential Revision: https://reviews.llvm.org/D82315	2020-08-20 15:08:13 -05:00
Cameron McInally	ac63959460	[SVE] Lower fixed length vXi8/vXi16 SDIV to scalable There are no nxv16i8/nxv8i16 SDIV instructions, so these fixed width operations must be promoted to nxv4i32. Differential Revision: https://reviews.llvm.org/D86114	2020-08-20 13:47:01 -05:00
Jessica Clarke	3149ec07c0	[RISCV] Enable MCCodeEmitter instruction predicate verifier This ensures that we never encode an instruction which is unavailable, such as if we explicitly insert a forbidden instruction when lowering. This is particularly important on RISC-V given its high degree of modularity, and will become increasingly important as new standard extensions appear. Reviewed By: asb, lenary Differential Revision: https://reviews.llvm.org/D85015	2020-08-20 18:36:54 +01:00
Jay Foad	3497860203	[AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister ... in favour of the isPhysical/isVirtual methods.	2020-08-20 17:59:11 +01:00
Mircea Trofin	364cd768a2	[NFC] Expose the -Oz module optimization pipeline to opt This exposes the module optimization pipeline as a pass that can be applied stand-alone when using 'opt'. This helps ml inliner training scenarios, where we start with IR captured right before inlining, perform the inlining (-scc-oz-module-inliner) and then want to continue and observe the final IR (where this patch comes into play). We can then apply llc on the resulting IR to continue compilation down to native. Differential Revision: https://reviews.llvm.org/D86224	2020-08-20 09:28:58 -07:00
Jay Foad	4aaf772542	[PeepholeOptimizer] Remove dead code At this point we have already ruled out all def operands, so we can't possibly see a dead implicit def operand.	2020-08-20 16:48:57 +01:00
David Green	816097e4e5	[LV] Allow tail folded reduction selects to remain in the loop The normal scheme for tail folding reductions is to use: loop: p = phi(0, a) mask = ... x = masked_load(..., mask) a = add(x, p) s = select(mask, a, p) This means we need to keep the register p and a alive out of the loop, plus the mask. On a target with predicated operations we can instead generate the phi as p = phi(0, s). This ensures the select in the loop and we can fold select(m, add(a, b), c) to something like a vaddt c, a, b using the m predicate. This in turn allows us to tail predicate the entire loop. Differential Revision: https://reviews.llvm.org/D84741	2020-08-20 14:31:14 +01:00
Bjorn Pettersson	ff107eed15	[AArch64] Update a code comment incorrectly referring to zero_reg. NFC The getSrcFromCopy helper nowadays return a MachineOperand pointer, so talking about zero_reg was incorrect as it nowadays return a nullptr when not finding a copy like instruction.	2020-08-20 14:36:59 +02:00
Shinji Okumura	835cfa5def	[Attributor] Handle CallBase case in AAValueConstantRange::initialize Currently, although we handle `CallBase` case in updateImpl, we give up in initialize in the case. That is problematic when we propagate a range from call site returned position to floating position. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86196	2020-08-20 20:15:19 +09:00
Paul Walker	0015b8db8e	[SVE] Add ISEL patterns for predicated shifts by an immediate. For scalable vector shifts the prediacte is typically all active, which gets selected to an unpredicated shift by immediate. When code generating for fixed length vectors the predicate is based on the vector length and so additional patterns are required to make use of SVE's predicated shift by immediate instructions. Differential Revision: https://reviews.llvm.org/D86204	2020-08-20 11:47:20 +01:00
David Stenberg	8206257cb8	[GlobalOpt] Fix an incorrect Modified status When removing a non-constant store to a global in CleanupPointerRootUsers(), the GlobalOpt pass could incorrectly return false. This was caught using the check introduced by D80916. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D86149	2020-08-20 11:52:09 +02:00
David Stenberg	7a1029fd1e	Reland "[LoopUnswitch] Fix incorrect Modified status" Relanded since the buildbot issue was unrelated to this commit. When hoisting simple values out from a loop, and an optsize attribute, a convergent call, or an invoke instruction hindered the pass from unswitching the loop, the pass would return an incorrect Modified status. This was caught using the check introduced by D80916. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D86085	2020-08-20 11:52:09 +02:00
Bjorn Pettersson	b43235a76c	[DebugInfo] Fix DwarfExpression::addConstantFP for float on big-endian The byte swapping, when dealing with 4 byte (float) FP constants in DwarfExpression::addConstantFP, added in commit `ef8992b9f0` was not correct. It always performed byte swapping using an uint64_t value. When dealing with 4 byte values the 4 interesting bytes ended up in the big end of the uint64_t, but later we emitted the 4 bytes at the little end. So we ended up with zeroes being emitted and faulty debug information. This patch simplifies things a bit, IMHO. Using the APInt representation throughout the function, instead of looking at the internal representation using getRawBytes and without using reinterpret_cast etc. And using API.byteSwap() should result in correct byte swapping independent of APInt being 4 or 8 bytes. Differential Revision: https://reviews.llvm.org/D86272	2020-08-20 11:48:05 +02:00
David Stenberg	ca688ae497	Revert "[LoopUnswitch] Fix incorrect Modified status" This reverts commit `dfd447c220`. After I pushed this commit, llvm-sphinx-docs started failing, due to: Warning, treated as error: extension 'recommonmark' has no setup() function; is it really a Sphinx extension module? I don't see how this commit may have caused that, but I'm still reverting it since I don't know how to proceed with that troubleshooting.	2020-08-20 11:14:23 +02:00
Evgeny Leviant	d5b701b972	[ThinLTO] Import globals recursively Differential revision: https://reviews.llvm.org/D73698	2020-08-20 12:13:43 +03:00
Sebastian Neubauer	b8d1994778	[AMDGPU] Add A16/G16 to InstCombine When sampling from images with coordinates that only have 16 bit accuracy, convert the image intrinsic call to use a16 or g16. This does only happen if the target hardware supports it. An alternative would be to always apply this combination, independent of the target hardware and extend 16 bit arguments to 32 bit arguments during legalization. To me, this sounds like an unnecessary roundtrip that could prevent some further InstCombine optimizations. Differential Revision: https://reviews.llvm.org/D85887	2020-08-20 10:51:49 +02:00
Konstantin Schwarz	7497b861f4	[GlobalISel][IRTranslator] Support PHI instructions in landingpad blocks The check for the landingpad instructions was overly restrictive. In optimimized builds PHI nodes can appear before the landingpad instructions, resulting in a fallback to SelectionDAG. This change relaxes the check to allow PHI nodes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D86141	2020-08-20 10:49:31 +02:00
Georgii Rymar	a6436b0b3a	[yaml2obj] - Make the 'Machine' key optional. Currently we have to set 'Machine' to something in our YAML descriptions. Usually we use 'EM_X86_64' for 64-bit targets and 'EM_386' for 32-bit targets. At the same time, in fact, in most cases our tests do not need a machine type and we can use 'EM_NONE'. This is cleaner, because avoids the need of using a particular machine. In this patch I've made the 'Machine' key optional (the default value, when it is not specified is `EM_NONE`) and removed it (where possible) from yaml2obj, obj2yaml and llvm-readobj tests. There are few tests left where I decided not to remove it, because I didn't want to touch CHECK lines or doing anything more complex than a removing a "Machine: *" line and formatting lines around. Differential revision: https://reviews.llvm.org/D86202	2020-08-20 11:40:51 +03:00
Bevin Hansson	1a995a0af3	[ADT] Move FixedPoint.h from Clang to LLVM. This patch moves FixedPointSemantics and APFixedPoint from Clang to LLVM ADT. This will make it easier to use the fixed-point classes in LLVM for constructing an IR builder for fixed-point and for reusing the APFixedPoint class for constant evaluation purposes. RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-August/144025.html Reviewed By: leonardchan, rjmccall Differential Revision: https://reviews.llvm.org/D85312	2020-08-20 10:29:45 +02:00
dfukalov	33e2f69a24	[AMDGPU][LoopUnroll] Increase BB size to analyze for complete unroll. The `UnrollMaxBlockToAnalyze` parameter is used at the stage when we have no information about a loop body BB cost. In some cases, e.g. for simple loop ``` for(int i=0; i<32; ++i){ D = Arr2[i8 + C1]; Arr1[i64 + C2] += C3 * D; Arr1[i64 + C2 + 2048] += C4 D; } ``` current default parameter value is not enough to run deeper cost analyze so the loop is not completely unrolled. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D86248	2020-08-20 10:41:47 +03:00
Yvan Roux	0459f29e8b	[ARM][MachineOutliner] Add default mode. Use the stack to save and restore the link register when there is no available register to do it. Differential Revision: https://reviews.llvm.org/D76069	2020-08-20 09:25:33 +02:00
David Stenberg	dfd447c220	[LoopUnswitch] Fix incorrect Modified status When hoisting simple values out from a loop, and an optsize attribute, a convergent call, or an invoke instruction hindered the pass from unswitching the loop, the pass would return an incorrect Modified status. This was caught using the check introduced by D80916. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D86085	2020-08-20 09:04:16 +02:00
Johannes Doerfert	012819f301	[Attributor][FIX] Update the call graph properly when internalizing functions The internal version is now part of the SCC, make sure to perform this update.	2020-08-20 01:44:58 -05:00
Johannes Doerfert	3edea15f9a	[Attributor] Simplify comparison against constant null pointer Comparison against null is a common pattern that usually is followed by error handling code and the likes. We now use AANonNull to simplify these comparisons optimistically in order to make more code dead early on. Reviewed By: uenoku Differential Revision: https://reviews.llvm.org/D86145	2020-08-20 01:44:58 -05:00
Johannes Doerfert	d01ad217ba	[Attributor][FIX] Do not use cyclic arguments for `nonnull` `AADereferenceable::getAssumedDereferenceableBytes()` is actually deducing `dereferenceable_or_null`. We should not use that information to deduce `nonnull`, since it doesn't imply `nonnull`.	2020-08-20 01:44:58 -05:00
Johannes Doerfert	a49dae0e38	[Attributor][AAIsDead][NFC] Skip uninteresting instructions early	2020-08-20 01:44:58 -05:00
Johannes Doerfert	08f33756e6	[Attributor][NFC] Extract functionality into own member	2020-08-20 01:44:58 -05:00
Qiu Chaofan	131b3b9ed4	[PowerPC] Support constrained scalar fptosi/fptoui This patch adds support for constrained scalar fp to int operations on PowerPC. Besides, this fixes the FP exception bit of quad-precision convert & truncate instructions. Reviewed By: steven.zhang, uweigand Differential Revision: https://reviews.llvm.org/D81537	2020-08-20 13:29:43 +08:00
Johannes Doerfert	1de70a724e	Revert "[OpenMPOpt] ICV tracking for calls" This commits breaks certain OpenMP codes (on power) because it expanded the Attributor scope without telling the Attributor about the SCC extend. See: https://reviews.llvm.org/D85544#2227611 This reverts commit `b0b32e6490`.	2020-08-20 00:00:35 -05:00
Craig Topper	8750d54cea	[X86][AutoUpgrade] Simplify string management in UpgradeDataLayoutString a bit. NFCI We don't need a std::string for a literal string, we can use a StringRef. The addition of StringRefs produces a Twine that we can just call str() without converting to a SmallString ourselves. Twine will do that internally.	2020-08-19 17:48:11 -07:00
Matt Arsenault	31adc28d24	GlobalISel: Implement fewerElementsVector for G_CONCAT_VECTORS sources This fixes <6 x s16> = G_CONCAT_VECTORS from <3 x s16> handling.	2020-08-19 18:53:24 -04:00
Petr Hosek	1ed1e16ab8	[CMake] Fix an issue where get_system_libname creates an empty regex capture on windows Fixes https://bugs.chromium.org/p/chromium/issues/detail?id=1119478 Patch By: haampie Differential Revision: https://reviews.llvm.org/D86245	2020-08-19 14:33:52 -07:00
Kyungwoo Lee	7a028fe702	Force Remove Attribute -force-attribute adds an attribute to function via command-line. However, there was no counter-part to remove an attribute. This patch adds -force-remove-attribute that removes an attribute from function. Differential Revision: https://reviews.llvm.org/D85586	2020-08-19 17:30:13 -04:00
Sanjay Patel	6f3511a01a	[ValueTracking] define/use max recursion depth in header There's a potential motivating case to increase this limit in PR47191: http://bugs.llvm.org/PR47191 But first we should make it less hacky. The limit in InstCombine is directly tied to this value because an increase there can cause asserts in the underlying value tracking calls if not changed together. The usage in VectorUtils is independent, but the comment suggests that we should use the same value unless there's a known reason to diverge. There are similar limits in codegen analysis, but I think we should leave those independent in case we intentionally want the optimization power/cost to be different there. Differential Revision: https://reviews.llvm.org/D86113	2020-08-19 16:56:59 -04:00
Hiroshi Yamauchi	28ccc52c40	[X86] Add feature for Fast Short REP MOV (FSRM) for Icelake or newer. Differential Revision: https://reviews.llvm.org/D85989	2020-08-19 13:39:42 -07:00
Sourabh Singh Tomar	ef8992b9f0	Re-apply "[DebugInfo] Emit DW_OP_implicit_value for Floating point constants" This patch was reverted in `7c182663a8` due to some failures observed on PCC based machines. Failures were due to Endianness issue and long double representation issues. Patch is revised to address Endianness issue. Furthermore, support for emission of `DW_OP_implicit_value` for `long double` has been removed (since it was unclean at the moment). Planning to handle this in a clean way soon! For more context, please refer to following review link. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D83560	2020-08-20 01:39:42 +05:30
Sourabh Singh Tomar	9937872c02	Revert "[DebugInfo] Emit DW_OP_implicit_value for Floating point constants" This reverts commit `15801f1619`. arc's land messed up! It removed the new commit message and took it from revision.	2020-08-20 01:28:03 +05:30
Raul Tambre	e887d0e89b	[AArch64][GlobalISel] Handle rtcGPR64RegClassID in AArch64RegisterBankInfo::getRegBankFromRegClass() TargetRegisterInfo::getMinimalPhysRegClass() returns rtcGPR64RegClassID for X16 and X17, as it's the last matching class. This in turn gets passed to AArch64RegisterBankInfo::getRegBankFromRegClass(), which hits an unreachable. It seems sensible to handle this case, so copies from X16 and X17 work. Copying from X17 is used in inline assembly in libunwind for pointer authentication. Differential Revision: https://reviews.llvm.org/D85720	2020-08-19 12:52:30 -07:00
Sourabh Singh Tomar	15801f1619	[DebugInfo] Emit DW_OP_implicit_value for Floating point constants llvm is missing support for DW_OP_implicit_value operation. DW_OP_implicit_value op is indispensable for cases such as optimized out long double variables. For intro refer: DWARFv5 Spec Pg: 40 2.6.1.1.4 Implicit Location Descriptions Consider the following example: ``` int main() { long double ld = 3.14; printf("dummy\n"); ld *= ld; return 0; } ``` when compiled with tunk `clang` as `clang test.c -g -O1` produces following location description of variable `ld`: ``` DW_AT_location (0x00000000: [0x0000000000201691, 0x000000000020169b): DW_OP_constu 0xc8f5c28f5c28f800, DW_OP_stack_value, DW_OP_piece 0x8, DW_OP_constu 0x4000, DW_OP_stack_value, DW_OP_bit_piece 0x10 0x40, DW_OP_stack_value) DW_AT_name ("ld") ``` Here one may notice that this representation is incorrect(DWARF4 stack could only hold integers(and only up to the size of address)). Here the variable size itself is `128` bit. GDB and LLDB confirms this: ``` (gdb) p ld $1 = <invalid float value> (lldb) frame variable ld (long double) ld = <extracting data from value failed> ``` GCC represents/uses DW_OP_implicit_value in these sort of situations. Based on the discussion with Jakub Jelinek regarding GCC's motivation for using this, I concluded that DW_OP_implicit_value is most appropriate in this case. Link: https://gcc.gnu.org/pipermail/gcc/2020-July/233057.html GDB seems happy after this patch:(LLDB doesn't have support for DW_OP_implicit_value) ``` (gdb) p ld p ld $1 = 3.14000000000000012434 ``` Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D83560	2020-08-20 01:20:40 +05:30
Hiroshi Yamauchi	ab401a8c8a	[PGO][PGSO][LV] Fix loop not vectorized issue under profile guided size opts. D81345 appears to accidentally disables vectorization when explicitly enabled. As PGSO isn't currently accessible from LoopAccessInfo, revert back to the vectorization with versioning-for-unit-stride for PGSO. Differential Revision: https://reviews.llvm.org/D85784	2020-08-19 12:13:34 -07:00
Matt Arsenault	adbcc8e733	GlobalISel: Add TargetLowering member to LegalizerHelper	2020-08-19 14:50:35 -04:00
Florian Hahn	c0cbe6453a	[DSE] Remove dead argument from removePartiallyOverlappedStores (NFC). The argument is unused and can be removed.	2020-08-19 19:33:52 +01:00
Matt Arsenault	d64ad3f051	GlobalISel: Don't check for verifier enforced constraint Loads are always required to have a single memory operand.	2020-08-19 14:15:38 -04:00
Matt Arsenault	9e8d59a9b8	AMDGPU/GlobalISel: Remove hack for combines forming illegal extloads Previously we weren't adding the LegalizerInfo to the post-legalizer combiner. Since that's fixed, we don't need to try to filter out the one case that was breaking.	2020-08-19 14:15:38 -04:00
Matt Arsenault	e95c08432a	GlobalISel: Use Register	2020-08-19 13:45:31 -04:00
Petr Hosek	8e4acb82f7	[CMake] Fix OCaml build failure because of absolute path in system libs D85820 introduced a full path in the LLVM_SYSTEM_LIBS property of the LLVMSupport target, which made the OCaml bindings fail to build, since they use -l [system_lib] flags for every lib in LLVM_SYSTEM_LIBS, which cannot work with absolute paths. This patch solves the issue in a similar vain as ZLIB does it: it adds the full library path to imported_libs, and adds a stripped down version without directories, lib prefix and lib suffix to system_libs In the future we should probably make some changes to LLVM_SYSTEM_LIBS, since both zlib and ncurses do not necessarily have to be system libs anymore due to the find_package / find_library bits introduced in D85820 and D79219. Patch By: haampie Differential Revision: https://reviews.llvm.org/D86134	2020-08-19 10:33:03 -07:00
Mehdi Amini	a407ec9b6d	Revert "Revert "[NFC][llvm] Make the contructors of `ElementCount` private."" Was reverted because MLIR/Flang builds were broken, these APIs have been fixed in the meantime.	2020-08-19 17:26:36 +00:00
Mehdi Amini	4fc56d70aa	Revert "[NFC][llvm] Make the contructors of `ElementCount` private." This reverts commit `264afb9e6a`. (and dependent `6b742cc48` and `fc53bd610f`) MLIR/Flang are broken.	2020-08-19 17:21:37 +00:00
Jessica Paquette	d25b12bdc3	[GlobalISel] Add combine for (x & mask) -> x when (x & mask) == x If we have a mask, and a value x, where (x & mask) == x, we can drop the AND and just use x. This is about a 0.4% geomean code size improvement on CTMark at -O3 for AArch64. In AArch64, this is most useful post-legalization. Patterns like this often show up when legalizing s1s, which must be extended to larger types. e.g. ``` %cmp:_(s32) = G_ICMP ... %and:_(s32) = G_AND %cmp, 1 ``` Since G_ICMP only produces a single bit, there's no reason to mask it with the G_AND. Differential Revision: https://reviews.llvm.org/D85463	2020-08-19 10:20:57 -07:00
Hamilton Tobon Mosquera	bd2fa1819b	[OpenMPOpt][HideMemTransfersLatency] Moving the 'wait' counterpart of __tgt_target_data_begin_mapper canBeMovedDownwards checks if the "wait" counterpart of __tgt_target_data_begin_mapper can be moved downwards, returning a pointer to the instruction that might require/modify the data transferred, and returning null it the movement is not possible or not worth it. The function splitTargetDataBeginRTC receives that returned instruction and instead of moving the "wait" it creates it at that point. Differential Revision: https://reviews.llvm.org/D86155	2020-08-19 11:42:22 -05:00
Francesco Petrogalli	264afb9e6a	[NFC][llvm] Make the contructors of `ElementCount` private. Differential Revision: https://reviews.llvm.org/D86120	2020-08-19 16:26:44 +00:00
Sanjay Patel	c8d711adae	[InstCombine] reduce code duplication; NFC	2020-08-19 12:05:12 -04:00
Benjamin Kramer	b98e25b6d7	Make helpers static. NFC.	2020-08-19 16:00:03 +02:00
Roman Lebedev	3d76a133c7	Revert "[InstCombine] Lower infinite combine loop detection thresholds" And as being reported by Florian Hahn, there's a hit in MultiSource/Benchmarks/mafft from the test-suite on X86 with -O3 -flto, so reverting until addressed. This reverts commit `71e0b82c9f`.	2020-08-19 16:53:30 +03:00
Simon Pilgrim	057bdd63a4	[X86][AVX] lowerShuffleWithVPMOV - minor refactor to more closely match lowerShuffleAsVTRUNC Replace isBuildVectorAllZeros check by using the Zeroable bitmask instead.	2020-08-19 14:34:32 +01:00
Simon Pilgrim	9fee2bad6d	[X86] lowerShuffleWithVPMOV - remove unnecessary shuffle commutation. NFCI. canonicalizeShuffleMaskWithCommute should have already ensured the lower elements are from V1, we do have test coverage for this already.	2020-08-19 13:28:59 +01:00
Simon Pilgrim	b61cef3a92	[X86][AVX] getAVX512TruncNode - don't truncate from illegal vector widths. Thanks to @fhahn for the test case.	2020-08-19 13:00:26 +01:00
Roman Lebedev	71e0b82c9f	[InstCombine] Lower infinite combine loop detection thresholds It's been a month since `2f3862eb9f`, and no new bug reports about the threshold were filled, so let's bump it again and wait again.	2020-08-19 14:37:57 +03:00
Simon Pilgrim	80a0dc59b7	[X86][AVX] computeKnownBitsForTargetNode - add VTRUNC/VTRUNCS/VTRUNCUS known zero upper elements handling. Like many of the AVX512 conversion ops, the VTRUNC ops guarantee the upper destination elements are zero.	2020-08-19 11:39:27 +01:00
Simon Pilgrim	46fc9a0dfc	[X86][AVX] Fold store(extract_element(vtrunc)) to truncated store Add handling for storing the extracted lower (truncated bits) element from a X86ISD::VTRUNC node - this can be lowered to a generic truncated store directly. Differential Revision: https://reviews.llvm.org/D86158	2020-08-19 11:10:20 +01:00
sstefan1	b0b32e6490	[OpenMPOpt] ICV tracking for calls Introduce two new AAs. AAICVTrackerFunctionReturned which checks if a function can have a unique ICV value after it is finished, and AAICVCallSiteReturned which checks AAICVTrackerFunctionReturned for a call site. This enables us to check the value of a call and if it changes the ICV. This also changes the approach in `getReplacementValues()` to a worklist-based approach so we can explore all relevant BBs. Differential Revision: https://reviews.llvm.org/D85544	2020-08-19 11:43:12 +02:00
Meera Nakrani	545de56f87	[ARM] Enabled VMLAV and Add instructions to use VMLAVA Used InstCombine to enable VMLAV and Add instructions to generate VMLAVA instead with tests.	2020-08-19 08:36:49 +00:00

... 5 6 7 8 9 ...

138742 Commits