Commit Graph

Wang, Pengfei 8dcf8d1da5 [msan] Fix bugs when instrumenting x86.avx512*_cvt* intrinsics.
Scalar intrinsics x86.avx512*_cvt* have an extra rounding-mode operand.
We can simply ignore it and reuse the SSE/AVX math.
This fixes https://bugs.llvm.org/show_bug.cgi?id=48298.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D92206
2020-11-27 16:33:14 +08:00
Markus Lavin 808fcfe594 Revert "[DebugInfo] Improve dbg preservation in LSR."
This reverts commit 06758c6a61.

Bug: https://bugs.llvm.org/show_bug.cgi?id=48166
Additional discussion in: https://reviews.llvm.org/D91711
2020-11-27 08:52:32 +01:00
Max Kazantsev faf183874c [IndVars] LCSSA Phi users should not prevent widening
When widening an IndVar that has LCSSA Phi users outside
the loop, we can safely widen it as usual and then truncate
the result outside the loop without hurting performance.
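
A minimal sketch of the idea in C++, with hypothetical helper and variable names (not the actual IndVars code):

```
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// After widening the IV, an LCSSA phi outside the loop still expects the
// narrow type; feed it a trunc of the wide IV placed in the exit block.
static void fixupLCSSAUser(PHINode *LCSSAPhi, Value *WideIV, Type *NarrowTy) {
  BasicBlock *ExitBB = LCSSAPhi->getParent();
  IRBuilder<> B(&*ExitBB->getFirstInsertionPt());
  Value *Trunc = B.CreateTrunc(WideIV, NarrowTy, "lcssa.trunc");
  LCSSAPhi->replaceAllUsesWith(Trunc);
}
```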

Differential Revision: https://reviews.llvm.org/D91593
Reviewed By: skatkov
2020-11-27 11:19:54 +07:00
Roman Lebedev f3abd54958
Revert "[SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions"
Many bots are unhappy; at the very least this missed a few codegen tests,
and possibly it has a logic hole inducing a miscompile
(it would be really awesome to have a ready reproducer..)

Need to investigate.

This reverts commit 2245fb8aaa.
2020-11-26 23:13:43 +03:00
Roman Lebedev 2245fb8aaa
[SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions
1. It doesn't make sense to enforce that the bonus instruction
   is only used once in its basic block. What matters is
   whether those user instructions fit within our budget, sure,
   but that is another question.
2. It doesn't make sense to enforce that said bonus instructions
   are only used within their basic block. Perhaps the branch
   condition isn't using the value computed by said bonus instruction,
   and said bonus instruction is simply being calculated
   to be used in successors?

So iff we can clone bonus instructions, to lift these restrictions,
we just need to carefully update their external uses
to use the new cloned instructions.

Notably, this transform (even without this change) appears to be
poison-unsafe as per alive2, but is otherwise (including the patch) legal.

We don't introduce any new PHI nodes, but only "move" the instructions
around, so I'm not really seeing much potential for extra cost modelling
for the transform, especially since now we allow at most one such
bonus instruction by default.

This causes the fold to fire +11.4% more (13216 -> 14725)
as of vanilla llvm test-suite + RawSpeed.

The motivational pattern is IEEE-754-2008 Binary16->Binary32
extension code:
ca57d77fb2/src/librawspeed/common/FloatingPoint.h (L115-L120)
^ that should be a switch, but it is not now: https://godbolt.org/z/bvja5v
That being said, even though it seemed like this would fix it: https://godbolt.org/z/xGq3TM
apparently that fold is happening somewhere else after all,
so something else also has a similar 'artificial' restriction.
2020-11-26 22:51:22 +03:00
Roman Lebedev 65db7d38e0
[NFC][SimplifyCFG] Add statistic to `FoldBranchToCommonDest()` fold 2020-11-26 22:51:21 +03:00
Nikita Popov 4df8efce80 [AA] Split up LocationSize::unknown()
Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).

This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.
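
For illustration, a minimal sketch using the two new factory functions introduced here (the wrapper helpers are hypothetical):

```
#include "llvm/Analysis/MemoryLocation.h"
using namespace llvm;

// Access of unknown size that starts at Ptr and only extends forward.
MemoryLocation accessAfter(const Value *Ptr) {
  return MemoryLocation(Ptr, LocationSize::afterPointer());
}

// Access that may touch bytes both before and after Ptr, anywhere
// within the underlying object.
MemoryLocation accessBeforeOrAfter(const Value *Ptr) {
  return MemoryLocation(Ptr, LocationSize::beforeOrAfterPointer());
}
```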

The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.

Differential Revision: https://reviews.llvm.org/D91649
2020-11-26 18:39:55 +01:00
Florian Hahn bd0b1311db
[VPlan] Turn VPReplicateRecipe into a VPValue.
Update VPReplicateRecipe to inherit from VPValue. This still does not
update scalarizeInstruction to set the result for the VPValue of
VPReplicateRecipe, because this first requires tracking scalar values in
VPTransformState.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D91500
2020-11-26 13:50:24 +00:00
David Stenberg 384996f9e1 [IndVarSimplify] Fix Modified status when handling dead PHI nodes
When bailing out in rewriteLoopExitValues() you could be left with PHI
nodes in the DeadInsts vector. Those would not be handled by the use of
RecursivelyDeleteTriviallyDeadInstructions() in IndVarSimplify. This
resulted in the IndVarSimplify pass returning an incorrect modified
status. This was caught by the expensive check introduced in D86589.

This patch changes IndVarSimplify so that it deletes those PHI nodes,
using RecursivelyDeleteDeadPHINode().

This fixes PR47486.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D91153
2020-11-26 14:28:21 +01:00
Zhengyang Liu 345fcccb33 Fix use-of-uninitialized-value in rG75f50e15bf8f
Differential Revision: https://reviews.llvm.org/D71126
2020-11-26 01:39:22 -07:00
Max Kazantsev 664e1da485 [LoopLoadElim] Make sure all loops are in simplify form. PR48150
LoopLoadElim may end up expanding an AddRec from a loop
which is not the current loop. This loop may not be in simplify
form. We figure this out after the point of no return, so we cannot
bail out in this case.

AddRec requires simplify form to expand. The only way to ensure
this does not crash is to simplify all loops beforehand.

The issue only exists in the new PM. The old PM requires the LoopSimplify
pass, which simplifies all loops before the opt begins.

Differential Revision: https://reviews.llvm.org/D91525
Reviewed By: asbirlea, aeubanks
2020-11-26 10:51:11 +07:00
Roman Lebedev a8d74517dc
[PassManager] Run Induction Variable Simplification pass *after* Recognize loop idioms pass, not before
Currently, `-indvars` runs first, and then immediately after `-loop-idiom` does.
I'm not really sure if `-loop-idiom` requires `-indvars` to run beforehand,
but I'm *very* sure that `-indvars` requires `-loop-idiom` to run afterwards,
as it can be seen in the phase-ordering test.

LoopIdiom runs on two types of loops: countable ones, and uncountable ones.
For uncountable ones, IndVars obviously didn't make any change to them,
since they are uncountable, so for them the order should be irrelevant.
For countable ones, well, they should have been countable before IndVars
for IndVars to make any change to them, and since SCEV is used on them,
it shouldn't matter if IndVars has already canonicalized them.
So I don't really see why we'd want the current ordering.

Should this cause issues, it will give us a reproducer test case
that shows flaws in this logic, and we could then adjust accordingly.

While this is quite likely beneficial in-the-wild already,
it's a required part for the full motivational pattern
behind `left-shift-until-bittest` loop idiom (D91038).

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D91800
2020-11-25 19:20:07 +03:00
Cullen Rhodes 1ba4b82f67 [LAA] NFC: Rename [get]MaxSafeRegisterWidth -> [get]MaxSafeVectorWidthInBits
MaxSafeRegisterWidth is a misnomer since it actually returns the maximum
safe vector width. 'Register' suggests it relates directly to a physical
register, whereas it could be a vector spanning one or more physical
registers.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91727
2020-11-25 13:06:26 +00:00
Florian Hahn ad5b83ddcf
[VPlan] Add VPReductionSC to VPUser::classof, unify VPValue IDs.
This is a follow-up to 00a6601136 to make
isa<VPReductionRecipe> work and unifies the VPValue ID names, by making
sure they all consistently start with VPV*.
2020-11-25 11:08:25 +00:00
David Green e0c479cd0e [VPlan] Switch VPWidenRecipe to be a VPValue
Similar to other patches, this makes VPWidenRecipe a VPValue. Because of
the way it interacts with the reduction code it also slightly alters the
way that VPValues are registered, removing the up-front NeedDef and
using getOrAddVPValue to create them on-demand if needed instead.

Differential Revision: https://reviews.llvm.org/D88447
2020-11-25 08:25:06 +00:00
David Green 00a6601136 [VPlan] Turn VPReductionRecipe into a VPValue
This converts the VPReductionRecipe into a VPValue, like other
VPRecipes, in preparation for traversing def-use chains. It also makes
it a VPUser, now storing the used VPValues as operands.

It doesn't yet change how the VPReductionRecipes are created. It will
need to call replaceAllUsesWith from the original recipe they replace,
but that is not done yet, as VPWidenRecipe needs to be created first.

Differential Revision: https://reviews.llvm.org/D88382
2020-11-25 08:25:05 +00:00
Kazu Hirata 1c82d32089 [CHR] Use pred_size (NFC) 2020-11-24 22:52:30 -08:00
Max Kazantsev 28d7ba1543 [IndVars] Use more precise context when eliminating narrowing
When deciding to widen a narrow use, we may need to prove some facts
about it. For the proof, a context instruction is used. Currently we take
the instruction being widened as the context.

However, we may be more precise here if we take as context the point that
dominates all users of the instruction being widened.

Differential Revision: https://reviews.llvm.org/D90456
Reviewed By: skatkov
2020-11-25 11:47:39 +07:00
Philip Reames 10ddb927c1 [SCEV] Use isa<> pattern for testing for CouldNotCompute [NFC]
Some older code - and code copied from older code - still directly tested against the singleton result of SE::getCouldNotCompute.  Using the isa<SCEVCouldNotCompute> form is both shorter and more readable.
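
A small before/after sketch (variable names hypothetical):

```
#include "llvm/Analysis/ScalarEvolution.h"
using namespace llvm;

bool hasComputableExitCount(ScalarEvolution &SE, const SCEV *ExitCount) {
  // Older style: compare against the singleton directly.
  //   return ExitCount != SE.getCouldNotCompute();
  // Preferred style from this commit: use the isa<> form.
  return !isa<SCEVCouldNotCompute>(ExitCount);
}
```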
2020-11-24 18:47:49 -08:00
Sanjay Patel 678b9c5dde [InstCombine] try difference-of-shifts factorization before negator
We need to preserve wrapping flags to allow better folds.
The cases with geps may be non-intuitive, but that appears to agree with Alive2:
https://alive2.llvm.org/ce/z/JQcqw7
We create 'nsw' ops independent from the original wrapping on the sub.
2020-11-24 13:56:30 -05:00
Philip Reames 075468621c [LoopVec] Add a minor clarifying comment 2020-11-24 10:45:06 -08:00
Teresa Johnson 6e4c1cf293 [ThinLTO/WPD] Enable -wholeprogramdevirt-skip in ThinLTO backends
Previously this option could be used to skip devirtualizations of the
given functions in regular LTO and in the ThinLTO indexing step. This
change allows them to be skipped in the backend as well, which is useful
when debugging WPD in a distributed ThinLTO backend.

Differential Revision: https://reviews.llvm.org/D91812
2020-11-24 09:35:07 -08:00
Ayal Zaks 32d9a386bf [LV] Keep Primary Induction alive when folding tail by masking
Fix PR47390.

The primary induction should be considered alive when folding tail by masking,
because it will be used by said masking; even when it may otherwise appear
useless: feeding only its own 'bump' (which is correctly considered dead) and
serving as the 'bump' of another induction variable, which may wrongfully want
to consider its bump (= the primary induction) dead.

Differential Revision: https://reviews.llvm.org/D92017
2020-11-24 15:12:54 +02:00
Arthur Eubanks 932e4f8815 [FunctionAttrs][NPM] Fix handling of convergent
The legacy pass didn't properly detect indirect calls.

We can still remove the convergent attribute when there are indirect
calls. The LangRef says:

> When it appears on a call/invoke, the convergent attribute indicates
that we should treat the call as though we’re calling a convergent
function. This is particularly useful on indirect calls; without this we
may treat such calls as though the target is non-convergent.

So don't skip handling of convergent when there are unknown calls.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89826
2020-11-23 21:09:41 -08:00
Philip Reames 1a9c72f8a8 [LoopVec] Reuse a lambda [NFC]
Minor code refactor to improve readability.
2020-11-23 21:07:34 -08:00
Philip Reames b06a2ad94f [LoopVectorizer] Lower uniform loads as a single load (instead of relying on CSE)
A uniform load is one which loads from a uniform address across all lanes. As currently implemented, we cost model such loads as if we did a single scalar load + a broadcast, but the actual lowering replicates the load once per lane.

This change tweaks the lowering to use the REPLICATE strategy by marking such loads (and the computation leading to their memory operand) as uniform after vectorization. This is a useful change in itself, but its real purpose is to pave the way for a following change which will generalize our uniformity logic.

In review discussion, there was an issue raised with coupling cost modeling with the lowering strategy for uniform inputs.  The discussion on that item remains unsettled and is pending larger architectural discussion.  We decided to move forward with this patch as is, and revise as warranted once the bigger picture design questions are settled.

Differential Revision: https://reviews.llvm.org/D91398
2020-11-23 15:32:17 -08:00
Sanjay Patel ab29f091eb [InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps
This is a retry of 324a53205. I cautiously reverted that at 6aa3fc4
because the rules about gep math were not clear. Since then, we
have added this line to LangRef for gep inbounds:
"The successive addition of offsets (without adding the base address)
does not wrap the pointer index type in a signed sense (nsw)."

See D90708 and post-commit comments on the revert patch for more details.
2020-11-23 16:50:09 -05:00
Arthur Eubanks 3c811ce4f3 [NPM] Share pass building options with legacy PM
We should share options when possible.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91741
2020-11-23 13:04:05 -08:00
Sjoerd Meijer 33b2c88fa8 [LoopFlatten] Widen IV, support ZExt.
I disabled the widening in fa5cb4b because it ran into an assert, which was
related to replacing values with different types. I forgot that an extend could
also be a zero-extend, which I have added now. This means that the approach now
is to create and insert a trunc value of the outer loop for each user, and use
that to replace IV values.

Differential Revision: https://reviews.llvm.org/D91690
2020-11-23 08:57:19 +00:00
Kazu Hirata df73b8c174 [ValueMapper] Remove unused declaration remapFunction (NFC)
The function declaration with two parameters was introduced on Apr 16
2016 in commit f0d73f95c1 without a
corresponding definition.
2020-11-22 21:52:03 -08:00
Kazu Hirata 186d129320 [hwasan] Remove unused declaration shadowBase (NFC)
The function was introduced on Jan 23, 2019 in commit
73078ecd38.

Its definition was removed on Oct 27, 2020 in commit
0930763b4b, leaving the declaration
unused.
2020-11-22 20:08:51 -08:00
Kazu Hirata def7cfb7ff [InstCombine] Use is_contained (NFC) 2020-11-21 15:47:11 -08:00
Alexey Bataev 0b420d674a [SLP][NFC] Fix assert condition in newTreeEntry, NFC. 2020-11-20 13:25:21 -08:00
Hongtao Yu f3c445697d [CSSPGO] IR intrinsic for pseudo-probe block instrumentation
This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persistent. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persistent, or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple of issues:

1. IR optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are in the IR till the very end of compilation, they can still prevent certain IR optimizations and result in lower code quality.
2. The counter atomics may not be fully cleaned up from the code stream eventually.
3. Extra work is needed for re-targeting.

We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that come with an atomic operation while blocking desired optimizations as little as possible. More specifically, the semantics associated with the new intrinsic enforce that a pseudo probe be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. The core flag given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect, so that it is not removable, but it does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to each intrinsic so that they are uniquely identified and not merged, in order to achieve good correlation quality.

Let's now look at an example. Given the following LLVM IR:

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   br i1 %cmp, label %bb1, label %bb2
bb1:
   br label %bb3
bb2:
   br label %bb3
bb3:
   ret void
}
```

The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86490
2020-11-20 10:39:24 -08:00
Jamie Schmeiser 7f6360cdc6 Reland: Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug.
Summary:
Expand existing loopsink testing to also test loopsinking using new pass
manager.  Enable memoryssa for loopsink with new pass manager.  This
combination exposed a bug that was previously fixed for loopsink
without memoryssa.  When sinking an instruction into a loop, the source
block may not be part of the loop but still needs to be checked for
pointer invalidation.  This is the fix for bugzilla #39695 (PR 54659)
expanded to also work with memoryssa.

Respond to review comments.  Enable Memory SSA in legacy Loop Sink pass
under EnableMSSALoopDependency option control.  Update tests accordingly.

Respond to review comments.  Add options controlling whether memoryssa is
used for loop sink, defaulting to off.  Expand testing based on these
options.

Respond to review comments.  Properly indicated preserved analyses.

This relanding addresses a compile-time performance problem by moving the
test for profile data earlier to avoid unnecessary computations.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: asbirlea (Alina Sbirlea)
Differential Revision: https://reviews.llvm.org/D90249
2020-11-20 10:26:33 -05:00
Arthur Eubanks b77436047a [PGO] Make -disable-preinline work with NPM
Fixes cspgo_profile_summary.ll under NPM.

Reviewed By: xur

Differential Revision: https://reviews.llvm.org/D91826
2020-11-19 22:58:55 -08:00
Arthur Eubanks 513d165b80 Port -lower-matrix-intrinsics-minimal to NPM
This reuses the existing lower-matrix-intrinsics pass rather than going
the legacy pass route of creating a new pass.

Use this new variant in the NPM -O0 pipeline.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91811
2020-11-19 17:42:48 -08:00
Florian Hahn 7fa14a7c69
[ConstraintElimination] Decompose GEP with arbitrary offsets.
This patch decomposes `GEP %x, %offset` as 0 + 1 * %x + 1 * %offset.
2020-11-19 22:49:21 +00:00
Geoffrey Martin-Noble b156514f8d Remove unused private fields
Unused since https://reviews.llvm.org/D91762 and triggering
-Wunused-private-field

```
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:365:13: error: private field 'GetArgTLS' is not used [-Werror,-Wunused-private-field]
  Constant *GetArgTLS;
            ^
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:366:13: error: private field 'GetRetvalTLS' is not used [-Werror,-Wunused-private-field]
  Constant *GetRetvalTLS;
```

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D91820
2020-11-19 13:54:54 -08:00
Roman Lebedev a91e96702a
[InstCombine] Fold `and(shl(zext(x), width(SIGNMASK) - width(%x)), SIGNMASK)` to `and(sext(%x), SIGNMASK)`
One less instruction, and it reduces the use count of the zext.
As alive2 confirms, we're fine with all the weird combinations of
undef elts in constants, but unless the shift amount was undef
for a lane, we must sanitize undef mask to zero, since sign bits
are no longer zeros.

https://rise4fun.com/Alive/d7r
```
----------------------------------------
Optimization: zz
Precondition: ((C1 == (width(%r) - width(%x))) && isSignBit(C2))
  %o0 = zext %x
  %o1 = shl %o0, C1
  %r = and %o1, C2
=>
  %n0 = sext %x
  %r = and %n0, C2

Done: 2016
Optimization is correct!
```
2020-11-20 00:31:27 +03:00
Jianzhou Zhao 6c1c308c0e Remove dead code from DFSanFunction::get*TLS*()
Clean up more dead code after D84704

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D91762
2020-11-19 21:10:37 +00:00
Nikita Popov 393b9e9db3 [MemLoc] Require LocationSize argument (NFC)
When constructing a MemoryLocation by hand, require that a
LocationSize is explicitly specified. D91649 will split up
LocationSize::unknown() into two different states, and callers
should make an explicit choice regarding the kind of MemoryLocation
they want to have.
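
A hypothetical call site after this change, with the size stated explicitly:

```
#include "llvm/Analysis/MemoryLocation.h"
using namespace llvm;

MemoryLocation locForStore(const Value *Ptr) {
  // Explicitly state the accessed size (8 bytes here, as an example)
  // rather than silently falling back to LocationSize::unknown().
  return MemoryLocation(Ptr, LocationSize::precise(8));
}
```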
2020-11-19 21:45:52 +01:00
Sander de Smalen 41c9f4c1ce [LoopVectorize] NFC: Fix unused variable warning for MaxSafeDepDist
rGf571fe6df585127d8b045f8e8f5b4e59da9bbb73 led to a warning of an unused
variable for MaxSafeDepDist (written but not used). It seems this
variable and assignment can be safely removed.
2020-11-19 17:41:35 +00:00
Joseph Huber da8bec47ab [OpenMP] Add Location Fields to Libomptarget Runtime for Debugging
Summary:
Add support for passing source locations to libomptarget runtime functions using the ident_t struct present in the rest of the libomp API. This will allow the runtime system to give much more insightful error messages and debugging values.

Reviewers: jdoerfert, grokos

Differential Revision: https://reviews.llvm.org/D87946
2020-11-19 12:01:53 -05:00
Simon Moll a1de391dae [LV][NFC-ish] Allow vector widths over 256 elements
The assertion that vector widths are <= 256 elements was hard-wired in the LV code. E.g., VE allows for vectors up to 512 elements. Test against the TTI vector register bit width instead - this is an NFC for non-asserting builds.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D91518
2020-11-19 10:58:29 +01:00
Max Kazantsev 515105f46b [NFC] Remove comment (committed ahead of time by mistake) 2020-11-19 16:28:34 +07:00
Max Kazantsev 7c601d09a7 [NFC] Move code earlier as preparation for further changes 2020-11-19 16:27:23 +07:00
Andrew Wei ea7ab5a42c [IndVarSimplify] Notify top most loop to drop cached exit counts
Some nested loops may share the same ExitingBB, so after we finish FoldExit,
we need to notify OuterLoop and SCEV to drop any stored trip count.

Patched by: guopeilin
Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D91325
2020-11-19 15:37:54 +08:00
Kazu Hirata 43c0e4f665 [Transforms] Use llvm::is_contained (NFC) 2020-11-18 20:42:22 -08:00
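A hedged sketch of the pattern behind this kind of cleanup (the helper is hypothetical):

```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/STLExtras.h"

// Before: std::find(Values.begin(), Values.end(), V) != Values.end()
// After: the shorter llvm::is_contained form.
template <typename T>
bool contains(llvm::ArrayRef<T> Values, const T &V) {
  return llvm::is_contained(Values, V);
}
```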
Jamie Schmeiser cff479b145 Revert "Revert "Revert "Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug."""
This reverts commit e29292969b.

This apparently causes a regression in compile time (i.e., it slows down).
2020-11-18 16:07:16 -05:00
Roman Lebedev 7bf89c2174
[NFC][Reassociate] Delay checking isLoadCombineCandidate() until after ShouldConvertOrWithNoCommonBitsToAdd() but before haveNoCommonBitsSet()
This appears to improve -O3 compile-time performance somewhat:
https://llvm-compile-time-tracker.com/compare.php?from=87369c626114ae17f4c637635c119e6de0856a9a&to=c04b8271e1609b0dfb20609b40844b0c4324517e&stat=instructions
It doesn't look like delaying it until after haveNoCommonBitsSet() is better:
https://llvm-compile-time-tracker.com/compare.php?from=c04b8271e1609b0dfb20609b40844b0c4324517e&to=b2943d450eaf41b5f76d2dc7350f0a279f64cd99&stat=instructions
2020-11-18 23:57:12 +03:00
Jamie Schmeiser e29292969b Revert "Revert "Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug.""
This reverts commit 562addba65.

Reverted the change too quickly; the failing test cases passed on the next build.
So reverting the revert (to include the changes).
2020-11-18 15:33:02 -05:00
Florian Hahn 2fead1ac61
[ConstraintElimination] Decompose add nuw/sub nuw.
Make use of the more flexible constraint handling added in
a8a79c9069 to decompose add nuw/sub nuw.
2020-11-18 20:29:30 +00:00
Joseph Huber 97e55cfef5 [OpenMP] Add Passing in Original Declaration Names To Mapper API
Summary:
This patch adds support for passing in the original declaration name in the source file to the libomptarget runtime. This will allow the runtime to provide more intelligent debugging messages. This patch takes the original expression parsed from the OpenMP map / update clause and provides a textual representation if it was explicitly mapped; otherwise it takes the name of the variable declaration as a fallback. The information is passed to the runtime in a global array of strings that matches the existing ident_t source location strings using ";name;filename;column;row;;"

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D89802
2020-11-18 15:28:39 -05:00
Nikita Popov f4a3969bff [Inline] Fix incorrectly dropped noalias metadata
This is the same fix as 23aeadb89d,
just for CloneScopedAliasMetadata rather than PropagateCallSiteMetadata.

In this case the previous outcome was incorrectly dropped metadata,
as it was not part of the computed metadata map.

The real change in the test is that the first load now retains
metadata, the rest of the changes are due to changes in metadata
numbering.
2020-11-18 21:22:50 +01:00
Jamie Schmeiser 562addba65 Revert "Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug."
This reverts commit d4ba28bddc.
2020-11-18 15:17:53 -05:00
Nikita Popov 23aeadb89d [Inline] Fix incorrect noalias metadata application (PR48209)
The VMap also contains a mapping from Argument => Instruction,
where the instruction is part of the original function, not the
inlined one. The code was assuming that all the instructions in
the VMap were inlined.

This was a pre-existing problem for the loop access metadata, but
was extended to the more common noalias metadata by
27f647d117, thus causing miscompiles.

There is a similar assumption inside CloneAliasScopeMetadata(), so
that one likely needs to be fixed as well.
2020-11-18 20:52:58 +01:00
Jamie Schmeiser d4ba28bddc Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug.
Summary:
Expand existing loopsink testing to also test loopsinking using new pass
manager.  Enable memoryssa for loopsink with new pass manager.  This
combination exposed a bug that was previously fixed for loopsink
without memoryssa.  When sinking an instruction into a loop, the source
block may not be part of the loop but still needs to be checked for
pointer invalidation.  This is the fix for bugzilla #39695 (PR 54659)
expanded to also work with memoryssa.

Respond to review comments.  Enable Memory SSA in legacy Loop Sink pass
under EnableMSSALoopDependency option control.  Update tests accordingly.

Respond to review comments.  Add options controlling whether memoryssa is
used for loop sink, defaulting to off.  Expand testing based on these
options.

Respond to review comments.  Properly indicated preserved analyses.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: asbirlea (Alina Sbirlea)
Differential Revision: https://reviews.llvm.org/D90249
2020-11-18 14:08:42 -05:00
Piotr Sobczak b3b9be4ae7 SpeculativeExecution: Allow speculating more instruction types
Support more instructions in the SpeculativeExecution pass (see the sketch after the list):
- ExtractValue
- InsertValue
- Trunc
- Freeze
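
A simplified sketch of the extended opcode check (the actual pass computes a speculation cost; this form is illustrative only):

```
#include "llvm/IR/Instruction.h"
using namespace llvm;

// Simplified: treat the newly supported opcodes as cheap to speculate.
static bool isNewlySupportedOpcode(const Instruction &I) {
  switch (I.getOpcode()) {
  case Instruction::ExtractValue:
  case Instruction::InsertValue:
  case Instruction::Trunc:
  case Instruction::Freeze:
    return true;
  default:
    return false;
  }
}
```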

Differential Revision: https://reviews.llvm.org/D91688
2020-11-18 17:00:19 +01:00
Roman Lebedev 34ff90ad5d
[Reassociate] Don't convert add-like-or's into add's if they appear to be part of load-combining idiom
As Wei Mi reported in post-commit review
  https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20201116/853479.html
teaching -reassociate about add-like-or's (70472f3) results in breaking apart
load widening patterns, and reassociating them.

For now, simply exclude any such `or` that appears to be a root of
load widening idiom from the or->add transformation.

Note that the heuristic is greedy; it doesn't ensure that loads
can *actually* be widened into a single load.
2020-11-18 17:55:02 +03:00
Florian Hahn a8a79c9069
[ConstraintElimination] Refactor constraint extraction (NFC).
This patch generalizes the extraction of a constraint for a given
condition. It allows decompose to return a vector of c * X pairs, which
allows decomposing multiple instructions in the future.

It also adds more clarifying comments.
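
A sketch of the generalized result shape (type and field names are hypothetical, not the patch's actual ones):

```
#include <cstdint>
#include <utility>
#include <vector>

namespace llvm { class Value; }

// A condition decomposes into Constant + sum(Coefficient * Value), so a
// `GEP %x, %offset` becomes 0 + 1 * %x + 1 * %offset.
struct Decomposition {
  int64_t Constant = 0;
  std::vector<std::pair<int64_t, llvm::Value *>> Terms;
};
```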
2020-11-18 13:59:18 +00:00
Benjamin Kramer 4dbe12e866 [SLP] Use the minimum alignment of the load bundle when forming a masked.gather
Instead of the first load. That works when vectorizing contiguous loads,
but not for gathers.

Fixes a miscompile introduced in fcad8d3635.
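
A minimal sketch of the idea (the helper is hypothetical; the real change lives in the SLP vectorizer):

```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Support/Alignment.h"
using namespace llvm;

// A masked.gather may load through any pointer in the bundle, so the
// safe alignment is the minimum over all loads, not the first load's.
static Align bundleAlignment(ArrayRef<LoadInst *> Bundle) {
  Align A = Bundle.front()->getAlign();
  for (LoadInst *LI : Bundle.drop_front())
    A = commonAlignment(A, LI->getAlign());
  return A;
}
```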
2020-11-18 12:53:39 +01:00
Max Kazantsev f33118c61c [IndVars] Support different types of ExitCount when optimizing exit conds
In some cases we can handle an IV and an iter count of different types. This is a typical
situation after the IV has been widened. This patch adds support for such cases, when legal.

Differential Revision: https://reviews.llvm.org/D88528
Reviewed By: skatkov
2020-11-18 18:20:05 +07:00
Piotr Sobczak c173f1b8eb SpeculativeExecution: Allow speculating more instruction types
Support more instructions in the SpeculativeExecution pass:
- ExtractElement
- InsertElement
- ShuffleVector

Differential Revision: https://reviews.llvm.org/D91633
2020-11-18 09:46:43 +01:00
Arthur Eubanks 9e3b4f4941 [JumpThreading] Make -print-lvi-after-jump-threading work with NPM 2020-11-17 23:15:20 -08:00
Arthur Eubanks ee7d315cd9 [DCE] Always get TargetLibraryInfo
I don't see any reason not to unconditionally retrieve TLI; it's fairly
cheap.

Fixes calls-errno.ll under NPM.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91476
2020-11-17 20:41:05 -08:00
Nick Desaulniers f4c6080ab8 Revert "[IR] add fn attr for no_stack_protector; prevent inlining on mismatch"
This reverts commit b7926ce6d7.

Going with a simpler approach.
2020-11-17 17:27:14 -08:00
Sanjay Patel 08834979e3 [SLP] avoid unreachable code crash/infloop
Example based on the post-commit comments for D88735.
2020-11-17 15:10:23 -05:00
Sanjay Patel 4a66a1d17a [InstCombine] allow vectors for masked-add -> xor fold
https://rise4fun.com/Alive/I4Ge

  Name: add with pow2 mask
  Pre: isPowerOf2(C2) && (C1 & C2) != 0 && (C1 & (C2-1)) == 0
  %a = add i8 %x, C1
  %r = and i8 %a, C2
  =>
  %n = and i8 %x, C2
  %r = xor i8 %n, C2
2020-11-17 13:36:08 -05:00
Simon Pilgrim f7ebdec987 [InstCombine] visitAnd - remove unnecessary Value *X, *Y shadow variables. NFCI.
Fixes a number of Wshadow warnings.
2020-11-17 17:59:21 +00:00
Simon Pilgrim abf29d9862 [InstCombine] visitAnd - use m_SpecificInt instead of m_APInt + comparison. NFCI.
m_SpecificInt has the same 'no undef element' behaviour as m_APInt so no change there, and anyway we have test coverage for undef elements in the fold.

Noticed while fixing a Wshadow warning about shadow Value *X, *Y variables.
2020-11-17 17:37:10 +00:00
Sanjay Patel f791ad7e1e [InstCombine] remove scalar constraint for mask-of-add fold
https://rise4fun.com/Alive/V6fP

  Name: add with low mask
  Pre: (C1 & (-1 u>> countLeadingZeros(C2))) == 0
  %a = add i8 %x, C1
  %r = and i8 %a, C2
  =>
  %r = and i8 %x, C2
2020-11-17 12:13:45 -05:00
Sanjay Patel 433696911a [InstCombine] relax constraints on mask-of-add
There are 2 changes:
1. Remove the unnecessary one-use check.
2. Remove the unnecessary power-of-2 check.

https://rise4fun.com/Alive/V6fP

  Name: add with low mask
  Pre: (C1 & (-1 u>> countLeadingZeros(C2))) == 0
  %a = add i8 %x, C1
  %r = and i8 %a, C2
  =>
  %r = and i8 %x, C2
2020-11-17 12:13:44 -05:00
Florian Hahn 52f3714dae [VPlan] Add VPDef class.
This patch introduces a new VPDef class, which can be used to
manage VPValues defined by recipes/VPInstructions.

The idea here is to mirror VPUser for values defined by a recipe. A
VPDef can produce either zero (e.g. a store recipe), one (most recipes)
or multiple (VPInterleaveRecipe) result VPValues.

To traverse the def-use chain from a VPDef to its users, one has to
traverse the users of all values defined by a VPDef.

VPValues now contain a pointer to their corresponding VPDef, if one
exists. To traverse the def-use chain upwards from a VPValue, we first
need to check if the VPValue is defined by a VPDef. If it does not have
a VPDef, this means we have a VPValue that is not directly defined
iniside the plan and we are done.

If we have a VPDef, it is defined inside the region by a recipe, which
is a VPUser, and the upwards def-use chain traversal continues by
traversing all its operands.

Note that we need to add an additional field to VPValue to link them
to their defs. The space increase is going to be offset by being able to
remove the SubclassID field in future patches.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D90558
2020-11-17 16:18:11 +00:00
Matt Arsenault c5ce6036c1 Linker: Fix linking of byref types
This wasn't properly remapping the type like with the other
attributes, so this would end up hitting a verifier error after
linking different modules using byref.
2020-11-17 11:02:04 -05:00
Anton Afanasyev 0a1d315f9f [SLPVectorizer] Fix assert 2020-11-17 18:46:31 +03:00
Anton Afanasyev fcad8d3635 [SLP] Make SLPVectorizer to use `llvm.masked.gather` intrinsic
For the scattered operands of load instructions it makes sense
to use the gathering load intrinsic, which can be lowered to a native instruction
for X86/AVX512 and ARM/SVE. This also enables building the
vectorization tree with entries containing scattered operands.
The next step is to add scattered store.

Fixes PR47629 and PR47623

Differential Revision: https://reviews.llvm.org/D90445
2020-11-17 18:11:45 +03:00
Florian Hahn 13042da5cb
[ConstraintElimination] Add support for And.
When processing conditional branches, if the condition is an AND of 2 compares
and the true successor only has the current block as predecessor, queue both
conditions for the true successor.
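
A hedged sketch of this handling (the worklist type and helper are hypothetical):

```
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

// For `br (and %a, %b), %T, %F`: if %T has the current block as its only
// predecessor, both %a and %b are known to hold on entry to %T.
static void queueConditions(
    BranchInst *Br, BasicBlock *BB,
    SmallVectorImpl<std::pair<BasicBlock *, Value *>> &Worklist) {
  Value *A, *B;
  BasicBlock *TrueBB = Br->getSuccessor(0);
  if (match(Br->getCondition(), m_And(m_Value(A), m_Value(B))) &&
      TrueBB->getSinglePredecessor() == BB) {
    Worklist.push_back({TrueBB, A});
    Worklist.push_back({TrueBB, B});
  }
}
```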
2020-11-17 14:12:15 +00:00
Sander de Smalen f571fe6df5 Reland [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost.
This relands https://reviews.llvm.org/D91059 and reverts commit
30fded75b4.

GetRegUsage now returns 0 when Ty is not a valid vector element type.
2020-11-17 13:45:10 +00:00
Yevgeny Rouban a57fe210ff [JumpThreading] Fix branch probabilities in DuplicateCondBranchOnPHIIntoPred()
When instructions are cloned from block BB to PredBB in the method
DuplicateCondBranchOnPHIIntoPred(), the number of successors of PredBB
changes from 1 to the number of successors of BB. So we have to copy
branch probabilities from BB to PredBB.

Reviewed By: Kazu Hirata

Differential Revision: https://reviews.llvm.org/D90841
2020-11-17 14:40:50 +07:00
Max Kazantsev 63dd1734b2 [NFC] Collect ext users into vector instead of finding them twice 2020-11-17 14:01:43 +07:00
Kazu Hirata 1da60f1d44 [Transforms] Use pred_empty (NFC) 2020-11-16 22:09:14 -08:00
Kazu Hirata 5935952c31 [SanitizerCoverage] Use [&] for lambdas (NFC) 2020-11-16 21:45:21 -08:00
Arthur Eubanks 7de6dcd246 [Debugify] Skip debugifying on special/immutable passes
With a function pass manager, debuginfo metadata would be inserted while
processing the pass manager itself, before getting to the function passes,
causing debugify to be skipped while running the function passes.

Skip special passes + verifier + printing passes. Compared to the legacy
implementation of -debugify-each, this additionally skips verifier
passes. Probably no need to update the legacy version since it will be
obsolete soon.

This fixes 2 instcombine tests using -debugify-each under NPM.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D91558
2020-11-16 20:39:46 -08:00
Sjoerd Meijer fa5cb4b936 [LoopFlatten] Disable IV widening
Disable widening of the IV in LoopFlatten while I investigate an assertion
failure. Please note that the pass is also not yet enabled by default.
2020-11-16 22:30:52 +00:00
Michael Liao f375885ab8 [InferAddrSpace] Teach to handle assumed address space.
- In certain cases, a generic pointer could be assumed as a pointer to
  the global memory space or other spaces. With a dedicated target hook
  to query that address space from a given value, the infer-address-space
  pass could infer and propagate it to all its users.

Differential Revision: https://reviews.llvm.org/D91121
2020-11-16 17:06:33 -05:00
Florian Hahn 5a4ca8b550 [ConstraintElimination] Add support for Or.
When processing conditional branches, if the condition is an OR of 2 compares
and the false successor only has the current block as predecessor, queue both
negated conditions for the false successor.
2020-11-16 21:48:38 +00:00
Philip Reames 2240d3d054 [LoopVec] Introduce an api for detecting uniform memory ops
Split off D91398 at request of reviewer.
2020-11-16 13:30:48 -08:00
Arnold Schwaighofer d861cc0e43 [coro] Async coroutines: Make sure we can handle control flow in suspend point dispatch function
Create a valid basic block with a terminator before we call
InlineFunction.

Differential Revision: https://reviews.llvm.org/D91547
2020-11-16 11:59:02 -08:00
Sanjay Patel 4e68bc0999 Revert "[InstCombine] add multi-use demanded bits fold for add with low-bit mask"
This reverts commit e56103d250.
There is a stage2 msan failure blamed on this commit:
http://lab.llvm.org:8011/#/builders/74/builds/888/steps/9/logs/stdio
2020-11-16 14:48:09 -05:00
Arthur Eubanks aeb0fdff35 [SimplifyCFG] Respect optforfuzzing in NPM pass
Regression caused by refactoring in
cdd006eec9.

See discussion in https://reviews.llvm.org/D89917.

Reviewed By: arsenm, morehouse

Differential Revision: https://reviews.llvm.org/D91473
2020-11-16 09:56:37 -08:00
Xun Li 985c524001 [Coroutine] Allocas used by StoreInst does not always escape
In the existing logic, for a given alloca, as long as its pointer value is stored into another location, it's considered as escaped.
This is a bit too conservative. Specifically, in non-optimized build mode, it's common to have patterns of code that first store an alloca somewhere and then load it right away.
These uses should be handled without conservatively marking them escaped.

This patch tracks how the memory location where an alloca pointer is stored into is being used. As long as we only try to load from that location and nothing else, we can still
consider the original alloca not escaping and keep it on the stack instead of putting it on the frame.

Differential Revision: https://reviews.llvm.org/D91305
2020-11-16 09:14:44 -08:00
Florian Hahn 8dbe44cb29 Add pass to add !annotate metadata from @llvm.global.annotations.
This patch adds a new pass to add !annotation metadata for entries in
@llvm.global.annotations, which is generated using
__attribute__((annotate("_name"))) on functions in Clang.

This has been discussed on llvm-dev as part of
    RFC: Combining Annotation Metadata and Remarks
    http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D91195
2020-11-16 14:57:11 +00:00
Benjamin Kramer 2e7455f00a [LoopFlatten] Fold variable into assert. NFC. 2020-11-16 11:51:39 +01:00
Sjoerd Meijer 9aa773381b [LoopFlatten] Widen the IV
Widen the IV to the widest available and legal integer type, which makes this
transformation always safe so that we can skip overflow checks.

Motivation is to let this pass trigger on 64-bit targets too, and this is the
last patch in a series to achieve this: D90402 moves pass LoopFlatten to just
before IndVarSimplify so that IVs are not already widened, D90421 factors out
widening from IndVarSimplify into Utils/SimplifyIndVar so that we can also use
it in LoopFlatten.

Differential Revision: https://reviews.llvm.org/D90640
2020-11-16 10:20:13 +00:00
Max Kazantsev b4624f65cf Recommit "[NFC] Move code between functions as a preparation step for further improvement"
The bug should be fixed now.
2020-11-16 14:30:34 +07:00
Kazu Hirata 147ccc848a [JumpThreading] Call eraseBlock when folding a conditional branch
This patch teaches the jump threading pass to call BPI->eraseBlock
when it folds a conditional branch.

Without this patch, BranchProbabilityInfo could end up with stale edge
probabilities for the basic block containing the conditional branch --
one edge probability with less than 1.0 and the other for a removed
edge.
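
A sketch of the call site this adds (surrounding code hypothetical):

```
#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/IR/BasicBlock.h"
using namespace llvm;

// After folding BB's conditional branch into an unconditional one,
// drop BPI's cached probabilities so no stale edge data survives.
static void notifyBPIAfterFold(BranchProbabilityInfo *BPI, BasicBlock *BB) {
  if (BPI)
    BPI->eraseBlock(BB);
}
```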

This patch is one of the steps before we can safely re-apply D91017.

Differential Revision: https://reviews.llvm.org/D91511
2020-11-15 22:29:30 -08:00
Kazu Hirata 0888eaf3fd [Loop Fusion] Use pred_empty and succ_empty (NFC) 2020-11-15 20:32:57 -08:00
Kazu Hirata 0c03d1328c [ADCE] Use succ_empty (NFC) 2020-11-15 19:52:59 -08:00
Kazu Hirata 43a6a1e928 [TRE] Use successors(BB) (NFC) 2020-11-15 19:12:49 -08:00
Kazu Hirata 918e3439e2 [SanitizerCoverage] Use llvm::all_of (NFC) 2020-11-15 19:01:20 -08:00
Serguei Katkov 400f6edce7 [IRCE] Use the same min runtime iteration threshold for BPI and BFI checks
In the last change to IRCE the BPI is ignored if BFI is present, however
BFI and BPI have different thresholds. Specifically, the BPI approach checks only
the latch exit probability, so it is expected that if the loop has only one exit block (the latch)
the behavior with BFI and BPI should be the same.

The BPI approach by default uses a threshold of 10, so a loop with an estimated
number of iterations below 10 is not considered for IRCE optimization.
The BFI approach uses the default value 3, and this is inconsistent.

The CL modifies the code to use the same threshold for both approaches.

The test is updated because it has two side exits (besides the latch), each with a
probability of 1/16, so BFI estimates the number of runtime iterations at about 7
(1/16 + 1/16 + some for the latch) and the test fails.

Reviewers: mkazantsev, ebrevnov
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D91230
2020-11-16 09:21:50 +07:00
Sanjay Patel 6ddc237766 [InstCombine] reduce code for flip of masked bit; NFC
There are 1-2 potential follow-up NFC commits to reduce
this further on the way to generalizing this for vectors.

The operand replacing path should be dead code because demanded
bits handles that more generally (D91415).
2020-11-15 15:43:34 -05:00
Sanjay Patel e56103d250 [InstCombine] add multi-use demanded bits fold for add with low-bit mask
I noticed an add example like the one from D91343, so here's a similar patch.
The logic is based on existing code for the single-use demanded bits fold.
But I only matched a constant instead of using computeKnownBits on the
operands because that was the motivating pattern that I noticed.

I think this will allow removing a special-case (but incomplete) dedicated
fold within visitAnd(), but I need to untangle the existing code to be sure.

https://rise4fun.com/Alive/V6fP

  Name: add with low mask
  Pre: (C1 & (-1 u>> countLeadingZeros(C2))) == 0
  %a = add i8 %x, C1
  %r = and i8 %a, C2
  =>
  %r = and i8 %x, C2

Differential Revision: https://reviews.llvm.org/D91415
2020-11-15 15:09:49 -05:00
Florian Hahn 0c119ba8a8 [VPlan] Use VPValue def for VPWidenGEPRecipe.
This patch turns VPWidenGEPRecipe into a VPValue and uses it
during VPlan construction and code generation instead of the plain IR
reference where possible.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84683
2020-11-15 15:12:47 +00:00
Arthur Eubanks 6e04da0a5a [DCE] Port -redundant-dbg-inst-elim to NPM
This is used to test RemoveRedundantDbgInstrs(), which is used by other
passes.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D91477
2020-11-14 16:55:20 -08:00
Florian Hahn a70b511e78 Recommit "[VPlan] Use VPValue def for VPWidenSelectRecipe."
This reverts the revert commit c8d73d939f.

It includes a fix for cases where we missed inserting VPValues
for some selects, which should fix PR48142.
2020-11-14 20:00:25 +00:00
Arnold Schwaighofer 8fb73cecfd [Coroutines] Make sure that async coroutine context size is a multiple of the alignment requirement
This simplifies the code the allocator has to execute.

Differential Revision: https://reviews.llvm.org/D91471
2020-11-14 04:56:56 -08:00
Roman Lebedev 6861d938e5
Revert "clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM"
See discussion in https://bugs.llvm.org/show_bug.cgi?id=45073 / https://reviews.llvm.org/D66324#2334485
the implementation is known-broken for certain inputs,
the bug report was open for a significant amount of time,
and there has been no activity to address it.
Therefore, just completely rip out all of misexpect handling.

I suspect, fixing it requires redesigning the internals of MD_misexpect.
Should anyone commit to fixing the implementation problem,
starting from clean slate may be better anyways.

This reverts commit 7bdad08429,
and some of its follow-ups that don't stand on their own.
2020-11-14 13:12:38 +03:00
Akira Hatanaka 2ed3a76745 [ObjC][ARC] Add and use a function which finds and returns the single
dependency. NFC

Use findSingleDependency in place of FindDependencies and stop passing a
set of Instructions around. Modify FindDependencies to return a boolean
flag which indicates whether the dependencies it has found are all
valid.
2020-11-13 14:02:58 -08:00
Akira Hatanaka 00d0974e62 Move variable declarations to functions in which they are used. NFC 2020-11-13 14:02:58 -08:00
Guozhi Wei a20220d25b [AlwaysInliner] Call mergeAttributesForInlining after inlining
As in inlineCallIfPossible and InlinerPass, mergeAttributesForInlining
should be called after inlining to merge the callee's attributes into the caller.
But it is not called in AlwaysInliner, which causes the caller's attributes to be
inconsistent with the inlined code.

The attached test case demonstrates that the attribute "min-legal-vector-width"="512" is
not merged into the caller without this patch, which causes a failure in SelectionDAG
when lowering the inlined AVX512 intrinsic.
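
A hedged sketch of the missing step (the helper and its shape are hypothetical; InlineFunction and mergeAttributesForInlining are the real APIs):

```
#include "llvm/IR/Attributes.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/Transforms/Utils/Cloning.h"
using namespace llvm;

// Sketch: after a successful inline, merge the callee's attributes
// (e.g. "min-legal-vector-width") into the caller.
static bool inlineAndMerge(CallBase &CB, Function &Caller, Function &Callee) {
  InlineFunctionInfo IFI;
  if (!InlineFunction(CB, IFI).isSuccess())
    return false;
  AttributeFuncs::mergeAttributesForInlining(Caller, Callee);
  return true;
}
```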

Differential Revision: https://reviews.llvm.org/D91446
2020-11-13 12:01:35 -08:00
Jianzhou Zhao 06c9b4aaa9 Extend the dfsan store/load callback with write/read address
This helped debugging.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D91236
2020-11-13 19:46:32 +00:00
Nikita Popov 02dda1c659 [Local] Clean up EmitGEPOffset
Handle the emission of the add in a single place, instead of three
different ones.

Don't emit an unnecessary add with zero to start with. It will get
dropped by InstCombine, but we may as well not create it in the
first place. This also means that InstCombine does not need to
specially handle this extra add.

This is conceptually NFC, but can affect worklist order etc.
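
A hedged sketch of single-place add emission (names hypothetical):

```
#include "llvm/IR/Constants.h"
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Accumulate offset terms, materializing an `add` only when needed and
// never emitting the initial add-with-zero.
static Value *accumulateOffset(IRBuilder<> &B, Value *Acc, Value *Term) {
  if (auto *C = dyn_cast<ConstantInt>(Term))
    if (C->isZero())
      return Acc; // nothing to add
  if (!Acc)
    return Term; // first term: no add needed yet
  return B.CreateAdd(Acc, Term);
}
```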
2020-11-13 18:30:56 +01:00
David Zarzycki 5a327f3337 Revert "[NFC] Move code between functions as a preparation step for further improvement"
This reverts commit 08016ac32b.

A bunch of tests are failing my local two stage builder.
2020-11-13 10:52:49 -05:00
serge-sans-paille 95537f4508 llvmbuildectomy - compatibility with ocaml bindings
Use exact component name in add_ocaml_library.
Make expand_topologically compatible with new architecture.
Fix quoting in is_llvm_target_library.
Fix LLVMipo component name.
Write release note.
2020-11-13 14:35:52 +01:00
Florian Hahn 8bb6347939
Add !annotation metadata and remarks pass.
This patch adds a new !annotation metadata kind which can be used to
attach annotation strings to instructions.

It also adds a new pass that emits summary remarks per function with the
counts for each annotation kind.

The intended use cases for this new metadata are annotating
'interesting' instructions, and the remarks should provide additional
insight into transformations applied to a program.

To motivate this, consider these specific questions we would like to get answered:

* How many stores added for automatic variable initialization remain after optimizations? Where are they?
* How many runtime checks inserted by a frontend could be eliminated? Where are the ones that did not get eliminated?

Discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)

Reviewed By: thegameg, jdoerfert

Differential Revision: https://reviews.llvm.org/D91188
2020-11-13 13:24:10 +00:00
Max Kazantsev 08016ac32b [NFC] Move code between functions as a preparation step for further improvement 2020-11-13 18:12:45 +07:00
Max Kazantsev 185cface2e [NFC] Refactor lambda into static function 2020-11-13 17:42:23 +07:00
Max Kazantsev 68490aec4e [NFC] Move lambdae into static functions 2020-11-13 17:07:25 +07:00
serge-sans-paille 9218ff50f9 llvmbuildectomy - replace llvm-build by plain cmake
No longer rely on an external tool to build the llvm component layout.

Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.

These functions store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.

Differential Revision: https://reviews.llvm.org/D90848
2020-11-13 10:35:24 +01:00
Max Kazantsev 9224d322a2 [IndVars] Fix branches exiting by true with invariant conditions
Forgot to invert the condition for them.
2020-11-13 15:52:00 +07:00
Max Kazantsev 0a1d394bf3 [NFC] Refactor loop-invariant getters to return Optional 2020-11-13 15:03:10 +07:00
Arthur Eubanks b9406121a0 [NFC] Removed unused variable
Obsolete as of https://reviews.llvm.org/D91046.
2020-11-12 22:24:57 -08:00
Akira Hatanaka 09266e4af0 [ObjC][ARC] Clear the lists of basic blocks and instructions before
continuing the loop

This fixes a bug introduced in c6f1713c46.
2020-11-12 22:20:02 -08:00
Max Kazantsev 77efb73c67 [IndVars] Replace checks with invariants if we cannot remove them
If we cannot prove that the check is trivially true, but can prove that it either
fails on the 1st iteration or never fails, we can replace it with a first-iteration check.

Differential Revision: https://reviews.llvm.org/D88527
Reviewed By: skatkov
2020-11-13 12:23:12 +07:00
Sanjay Patel 0abde4bc92 [InstCombine] fold sub of low-bit masked value from offset of same value
There might be some demanded/known bits way to generalize this,
but I'm not seeing it right now.

This came up as a regression when I was looking at a different
demanded bits improvement.

https://rise4fun.com/Alive/5fl

  Name: general
  Pre: ((-1 << countTrailingZeros(C1)) & C2) == 0
  %a1 = add i8 %x, C1
  %a2 = and i8 %x, C2
  %r = sub i8 %a1, %a2
  =>
  %r = and i8 %a1, ~C2

  Name: test 1
  %a1 = add i8 %x, 192
  %a2 = and i8 %x, 10
  %r = sub i8 %a1, %a2
  =>
  %r = and i8 %a1, -11

  Name: test 2
  %a1 = add i8 %x, -108
  %a2 = and i8 %x, 3
  %r = sub i8 %a1, %a2
  =>
  %r = and i8 %a1, -4
2020-11-12 20:10:28 -05:00
Jianzhou Zhao 2d96859ea6 [msan] Break the getShadow loop after matching an argument
Reviewed-by: eugenis

Differential Revision: https://reviews.llvm.org/D91320
2020-11-12 19:48:59 +00:00
Alexander Kornienko 76b6cb515b Fix unused variable warning in release builds 2020-11-12 18:14:06 +01:00
Jamie Schmeiser f79b483385 [NFC intended] Refactor SinkAndHoistLICMFlags to allow others to construct without exposing internals
Summary:
Refactor SinkAndHoistLICMFlags from a struct to a class with accessors and constructors to allow other
classes to construct flags with meaningful defaults while not exposing LICM internal details.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>

Reviewed By: asbirlea (Alina Sbirlea)

Differential Revision: https://reviews.llvm.org/D90482
2020-11-12 15:06:59 +00:00
Xun Li 94a45a8098 Revert "[Coroutine] Allocas used by StoreInst does not always escape"
This reverts commit 8bc7b9278e, which landed by accident.
2020-11-11 21:09:39 -08:00
Max Kazantsev d6dd938589 [IndVars] IV user should not prevent use widening
Sometimes an instruction we are trying to widen is used by the IV
(which means the instruction is the IV increment). Currently this may
prevent its widening. We should ignore such a user because it will be
dead once the transform is done anyway.

Differential Revision: https://reviews.llvm.org/D90920
Reviewed By: fhahn
2020-11-12 12:02:01 +07:00
Xun Li 8bc7b9278e [Coroutine] Allocas used by StoreInst does not always escape
In the existing logic, for a given alloca, as long as its pointer value is stored into another location, it's considered as escaped.
This is a bit too conservative. Specifically, in non-optimized build mode, it's common to have patterns of code that first store an alloca somewhere and then load it right away.
These uses should be handled without conservatively marking them escaped.

This patch tracks how the memory location where an alloca pointer is stored into is being used. As long as we only try to load from that location and nothing else, we can still
consider the original alloca not escaping and keep it on the stack instead of putting it on the frame.

Differential Revision: https://reviews.llvm.org/D91305
2020-11-11 20:53:51 -08:00
Max Kazantsev 2e01ceafaa [IndVars] Recognize 'sub nuw' expressed as 'add' for widening
InstCombine canonicalizes 'sub nuw' instructions to 'add' without the
`nuw` flag. The typical case where we see it is decrementing induction
variables. For them, IndVars fails to prove that it's legal to widen them,
and inserts unprofitable `zext`s.

This patch adds recognition of such pattern using SCEV.

Differential Revision: https://reviews.llvm.org/D89550
Reviewed By: fhahn, skatkov
2020-11-12 10:51:29 +07:00
Arnold Schwaighofer 431337662e [coro] Async coroutines: Allow more than 3 arguments in the dispatch function
We need to be able to call function pointers. Inline the dispatch
function.

Also inline the context projection function.

Transfer debug locations from the suspend point to the inlined functions.

Use the function argument index instead of the function argument in
coro.id.async. This solves any spurious use issues.

Coerce the arguments of the tail call function at a suspend point. The LLVM
optimizer seems to drop casts leading to a vararg intrinsic.

rdar://70097093

Differential Revision: https://reviews.llvm.org/D91098
2020-11-11 15:25:28 -08:00
Arthur Eubanks d9cbceb041 [CGSCC][Inliner] Handle new non-trivial edges in updateCGAndAnalysisManagerForPass
Previously the inliner did a bit of a hack by adding ref edges for all
new edges introduced by performing an inline before calling
updateCGAndAnalysisManagerForPass(). This was because
updateCGAndAnalysisManagerForPass() didn't handle new non-trivial call
edges.

This adds handling of non-trivial call edges to
updateCGAndAnalysisManagerForPass(). The inliner previously called
updateCGAndAnalysisManagerForFunctionPass() because it added the newly
introduced edges itself (so updateCGAndAnalysisManagerForPass() only had
to handle promotion), but now it needs to call
updateCGAndAnalysisManagerForCGSCCPass(), since
updateCGAndAnalysisManagerForPass() now handles the new call edges and
function passes cannot add new edges.

We follow the previous path of adding trivial ref edges then letting promotion
handle changing the ref edges to call edges and the CGSCC updates. So
this still does not allow adding call edges that result in an addition
of a non-trivial ref edge.

This is in preparation for better detecting devirtualization. Previously
since the inliner itself would add ref edges,
updateCGAndAnalysisManagerForPass() would think that promotion and thus
devirtualization had happened after any sort of inlining.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91046
2020-11-11 13:43:49 -08:00
Jianzhou Zhao 0dd87825db Add a flag to control whether to propagate labels from condition values to results
Before the change, DFSan always did the propagation. Without
origin tracking, such flows are harder to understand. After
the change, the flag is off by default.

Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D91234
2020-11-11 20:41:42 +00:00
Akira Hatanaka 5e85d00ed6 Move variable declarations to functions in which they are used. NFC 2020-11-11 10:58:43 -08:00
Sander de Smalen 30fded75b4 Revert "[LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost."
This reverts commits:
* [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost.
  b873aba394.
* [LoopVectorizer] Silence warning in GetRegUsage.
  9ff701100a.
2020-11-11 14:41:55 +00:00
Simon Pilgrim 1a62ca65c1 [KnownBits] Add KnownBits::commonBits helper. NFCI.
We have a frequent pattern where we merge two KnownBits to get the common/shared bits, and I just fell for the gotcha of trying to use the & operator to merge them.
2020-11-11 12:15:54 +00:00
Sander de Smalen 9ff701100a [LoopVectorizer] Silence warning in GetRegUsage.
This patch silences the warning:
	error: lambda capture 'DL' is not used [-Werror,-Wunused-lambda-capture]
	  auto GetRegUsage = [&DL, &TTI=TTI](Type *Ty, ElementCount VF) {
	                      ~^~~
	1 error generated.

Introduced in:
  https://reviews.llvm.org/rGb873aba3943c067a5efd5303cbdf5aeb0732cf88
2020-11-11 10:54:20 +00:00
Sander de Smalen b873aba394 [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost.
This is more accurate than dividing the bitwidth (derived from the element count) by the
maximum register size, as it can just reuse whatever has already been calculated for
the legalization of these types.

This change is also necessary when calculating register usage for scalable vectors, where
the legalization of these types cannot be done based on the widest register size, because
that does not take the 'vscale' component into account.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D91059
2020-11-11 10:18:50 +00:00
Sander de Smalen 0141f5a49d [LoopVectorizer] NFC: Return ElementCount from compute[Feasible]MaxVF
Interfaces changed to return `ElementCount`:
* LoopVectorizationCostModel::computeMaxVF
* LoopVectorizationCostModel::computeFeasibleMaxVF

This is NFC for fixed-width vectors.

Reviewed By: dmgreen, ctetreau

Differential Revision: https://reviews.llvm.org/D90880
2020-11-11 09:55:06 +00:00
Chen Zheng 4eb8359e74 [EarlyCSE] delete abs/nabs handling
Delete abs/nabs handling in the EarlyCSE pass to avoid bugs related to
hashing values. After abs/nabs are canonicalized to intrinsics in D87188,
we should get the CSE ability for abs/nabs back.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D90734
2020-11-10 21:10:58 -05:00
Florian Hahn c8d73d939f Revert "[VPlan] Use VPValue def for VPWidenSelectRecipe."
This reverts commit a8e50f1c6e.

This reportedly breaks building the Linux kernel.
  https://bugs.llvm.org/show_bug.cgi?id=48142
2020-11-10 22:50:46 +00:00
Bruno Cardoso Lopes dc14542a71 [Coroutines] Add missing llvm.dbg.declare's to cover for more allocas
Tracking local variables across suspend points is still somewhat incomplete.
Consider this coroutine snippet:

```
resumable foo() {
  int x[10] = {};
  int a = 3;
  co_await std::experimental::suspend_always();
  a++;
  x[0] = 1;
  a += 2;
  x[1] = 2;
  a += 3;
  x[2] = 3;
}
```

We can't print `a` or `x` if they turn out to be allocas during
CoroSplit (which happens if you build this code with `-O0` prior to this
commit):

```
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x0000000100003729 main-noprint`foo() at main-noprint.cpp:43:5
   40     co_await std::experimental::suspend_always();
   41     a++;
   42     x[0] = 1;
-> 43     a += 2;
   44     x[1] = 2;
   45     a += 3;
   46     x[2] = 3;
(lldb) p x
error: <user expression 21>:1:1: use of undeclared identifier 'x'
x
^
```

The generated IR contains a `llvm.dbg.declare` for `x` in its initialization
basic block. After CoroSplit, the `llvm.dbg.declare` might not dominate all of
`x`'s uses, and we lose debug-info quality.

Add `llvm.dbg.value`s to all relevant basic blocks so that, if later
transformations break the dominance, reliable debug info is already in
place. For instance, this BB:

```
await.ready:
  ...
  %arrayidx = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 0, !dbg !760
  ...
  %arrayidx19 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 1, !dbg !763
  ...
  %arrayidx21 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 2, !dbg !766
```

becomes:

```
await.ready:
  ...
  call void @llvm.dbg.value(metadata [10 x i32]* %x.reload.addr, metadata !751, metadata !DIExpression()), !dbg !753
  ...
  %arrayidx = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 0, !dbg !760
  ...
  %arrayidx19 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 1, !dbg !763
  ...
  %arrayidx21 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 2, !dbg !766
```

Differential Revision: https://reviews.llvm.org/D90772
2020-11-10 12:36:07 -08:00
Sjoerd Meijer 2ef47910d5 [LoopFlatten] Run it earlier, just before IndVarSimplify
This is a prep step for widening induction variables in LoopFlatten if this is
possible (D90640), to avoid having to perform certain overflow checks. Since
IndVarSimplify may already widen induction variables, we want to run
LoopFlatten just before IndVarSimplify. This is a minor reshuffle, as both
passes were already close to each other.

Differential Revision: https://reviews.llvm.org/D90402
2020-11-10 20:22:41 +00:00
Sjoerd Meijer 706ead0e87 [LoopFlatten] Make it a FunctionPass
This converts LoopFlatten from a LoopPass to a FunctionPass so that we don't
run into the problem of a loop pass deleting an (inner) loop.

Differential Revision: https://reviews.llvm.org/D90940
2020-11-10 20:03:31 +00:00
Florian Hahn a8e50f1c6e
[VPlan] Use VPValue def for VPWidenSelectRecipe.
This patch turns VPWidenSelectRecipe into a VPValue and uses it
during VPlan construction and code generation instead of the plain IR
reference where possible.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84682
2020-11-10 19:39:37 +00:00
Jonas Paulsson 89a1042b6a Make inferLibFuncAttributes() add SExt attribute on second arg to ldexp.
This was missing, as discovered by the SystemZ multistage bot
(http://lab.llvm.org:8011/#/builders/8), where wrong code resulted when this
extension was not performed.
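
Roughly what the inferred attribute looks like (illustrative IR):

```
; The i32 exponent argument is sign-extended to the width the callee's
; ABI expects.
declare double @ldexp(double, i32 signext)

define double @scale(double %x) {
  %r = call double @ldexp(double %x, i32 signext 8)
  ret double %r
}
```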

Thanks for review by Ulrich Weigand and Roman Lebedev.

Differential Revision: https://reviews.llvm.org/D90760
2020-11-10 18:32:15 +01:00
David Green c7e275388e [ARM] Don't aggressively unroll vector remainder loops
We already do not unroll loops with vector instructions under MVE, but
that does not include the remainder loops that the vectorizer produces.
These remainder loops will be rarely executed and are not worth
unrolling, as the trip count is likely to be low if they get executed at
all. Luckily they get llvm.loop.isvectorized to make recognizing them
simpler.

We have wanted to do this for a while but hit issues with low-overhead
loops being reverted due to difficult register allocation. With recent
changes that seems to be less of an issue now.

Differential Revision: https://reviews.llvm.org/D90055
2020-11-10 17:01:31 +00:00
Sanne Wouda dd03881bd5 Add loop distribution to the LTO pipeline
The LoopDistribute pass is missing from the LTO pipeline, so
-enable-loop-distribute has no effect during post-link. The pre-link
loop distribution doesn't seem to survive the LTO pipeline either.

With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43%
uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is
unaffected.

Differential Revision: https://reviews.llvm.org/D89896
2020-11-10 12:04:32 +00:00
Sander de Smalen f47573f9bf [LoopVectorizer] NFC: Propagate ElementCount to more interfaces.
Interfaces changed to take `ElementCount` as parameters:
* LoopVectorizationPlanner::buildVPlans
* LoopVectorizationPlanner::buildVPlansWithVPRecipes
* LoopVectorizationCostModel::selectVectorizationFactor

This patch is NFC for fixed-width vectors.

Reviewed By: dmgreen, ctetreau

Differential Revision: https://reviews.llvm.org/D90879
2020-11-10 11:11:02 +00:00
Max Kazantsev 25755a0159 [NFC] Add flag to disable IV widening in indvar instance
This allows us to have control over IV widening in the pipeline.
2020-11-10 15:10:44 +07:00
Arthur Eubanks 1cbf8e89b5 [NewPM] Port -separate-const-offset-from-gep
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91095
2020-11-09 17:42:36 -08:00
Michael Kruse e5dba2d7e5 [OMPIRBuilder] Start 'Create' methods with lower case. NFC.
For consistency with the IRBuilder, OpenMPIRBuilder has method names starting with 'Create'. However, the LLVM coding style has method names starting with lower-case letters, as all other OpenMPIRBuilder methods already do. The clang-tidy configuration used by Phabricator also warns about the naming violation, adding noise to the reviews.

This patch renames all `OpenMPIRBuilder::CreateXYZ` methods to `OpenMPIRBuilder::createXYZ`, and updates all in-tree callers.

I tested check-llvm, check-clang, check-mlir and check-flang to ensure that I did not miss a caller.

Reviewed By: mehdi_amini, fghanim

Differential Revision: https://reviews.llvm.org/D91109
2020-11-09 19:35:11 -06:00
Xun Li c2cb093d9b [Coroutine] Move all used local allocas to the .resume function
Prior to D89768, any alloca that's used after suspension points will be put on the coroutine frame, and hence will always be reloaded in the resume function.
However, D89768 introduced a more precise way to determine whether an alloca should live on the frame. Allocas that are only used within one suspension region (and hence do not need to live across suspension points) will not be put on the frame. They will remain local to the resume function.
When creating the new entry for the .resume function, the existing logic only moved the allocas from the old entry to the new entry. This covers every alloca from the old entry. However, allocas that are defined after coro.begin are put into a separate basic block during CoroSplit (the PostSpill basic block). We need to make sure these allocas are moved to the new entry as well if they are used.
This patch walks through all allocas and checks whether they are still used but not reachable from the new entry; if so, we move them to the new entry.

Differential Revision: https://reviews.llvm.org/D90977
2020-11-09 17:24:49 -08:00
Sjoerd Meijer e2dcea4489 [LoopFlatten] FlattenInfo bookkeeping. NFC.
Introduce struct FlattenInfo to group some of the bookkeeping. Besides this
being a bit of a clean-up, it is a prep step for upcoming additions (D90640). I
could take things a bit further, but thought this was a good first step that also
keeps this change from getting too large.

Differential Revision: https://reviews.llvm.org/D90408
2020-11-09 14:50:26 +00:00
Florian Hahn f0d76275cb
[VPlan] Print result value for loads in VPWidenMemoryInst (NFC).
For loads, print the result value.
2020-11-09 14:01:29 +00:00
Florian Hahn 537829f2a7
[VPlan] Add isStore helper to VPWidenMemoryInstructionRecipe (NFC).
Move logic to check if the recipe is a store to a helper for easier
reuse.
2020-11-09 14:01:29 +00:00
Florian Hahn fec64de261
[VPlan] Use VPValue def for VPWidenCall.
This patch turns VPWidenCall into a VPValue and uses it
during VPlan construction and codegeneration instead of the plain IR
reference where possible.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84681
2020-11-09 13:29:41 +00:00
Florian Hahn 091c5c9a18
[VPlan] Add printOperands helper to VPUser (NFC).
Factor out the code for printing operands of a VPUser so it can be
re-used when printing other recipes.
2020-11-09 12:30:57 +00:00
LemonBoy 42732d33cc
[InstCombine] Fix constant-folding of overflowing arithmetic ops on vectors
Feeding vector values to `InstCombiner::OptimizeOverflowCheck` produces a scalar boolean flag if it proves the overflow check can be eliminated.
This causes `InstCombiner::CreateOverflowTuple` to crash as it correctly expects a vector of i1 values instead.
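
A hedged sketch (assumed names) of the kind of vector overflow op involved:

```
define { <2 x i32>, <2 x i1> } @h(<2 x i32> %a) {
  ; For vector operands, the overflow member of the result must be a
  ; vector of i1, not a scalar boolean.
  %r = call { <2 x i32>, <2 x i1> } @llvm.sadd.with.overflow.v2i32(<2 x i32> %a, <2 x i32> <i32 1, i32 1>)
  ret { <2 x i32>, <2 x i1> } %r
}
declare { <2 x i32>, <2 x i1> } @llvm.sadd.with.overflow.v2i32(<2 x i32>, <2 x i32>)
```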

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D89628
2020-11-09 14:41:07 +03:00
Tim Northover f7fe7ea24d [MergeFunctions] fix function attribute comparison in FunctionComparator
The comparison of AttributeSets stopped after seeing a matching type attribute.
Subsequent mismatching attributes were not detected causing a crash.
2020-11-09 09:19:11 +00:00
Simon Pilgrim b11eaf5617 [DSE] Don't dereference a dyn_cast<> result - use cast<> instead. NFCI.
We were relying on the dyn_cast<> succeeding - better to use cast<> and have it assert that it's the correct type than dereference a null result.
2020-11-08 13:07:45 +00:00
Simon Pilgrim 0fe91ad463 [InstCombine] foldSelectFunnelShift - block poison in funnel shift value
As raised by @nlopes on D90382 - if this is not a rotate, then the select was blocking poison from the 'shift-by-zero' non-TVal, but a funnel shift won't, so freeze it.
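
A sketch of the resulting IR (illustrative, assuming the select has already been folded away):

```
define i8 @sel_fshl(i8 %x, i8 %y, i8 %sh) {
  ; Replaces: select (%sh == 0), %x, fshl(%x, %y, %sh).
  ; When %sh == 0 the select returned %x and blocked poison coming from
  ; %y; the plain funnel shift would not, so %y is frozen.
  %yf = freeze i8 %y
  %r = call i8 @llvm.fshl.i8(i8 %x, i8 %yf, i8 %sh)
  ret i8 %r
}
declare i8 @llvm.fshl.i8(i8, i8, i8)
```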
2020-11-08 12:58:30 +00:00
Florian Hahn e8dc17a2b7
[LoopInterchange] Skip non SCEV-able operands in cost function.
This fixes a crash when trying to get a SCEV expression for operands
that are not SCEV-able.
2020-11-08 11:41:19 +00:00
Pedro Tammela 5e8ecff0d8 [Reg2Mem] add support for the new pass manager
This patch refactors the pass to accommodate the new pass manager
boilerplate.

Differential Revision: https://reviews.llvm.org/D91005
2020-11-08 11:14:05 +00:00
Kazu Hirata 75e46c6328 [Mem2Reg] Use llvm::count instead of std::count (NFC) 2020-11-07 20:18:47 -08:00
Kazu Hirata c95fff5be7 [JumpThreading] Fix function names (NFC) 2020-11-07 19:35:03 -08:00
Atmn Patel 04a0896487 Revert "[LoopDeletion] Allows deletion of possibly infinite side-effect free loops"
This reverts commit 0b17c6e447. This patch
causes a compile-time error in SCEV.
2020-11-07 00:32:12 -05:00
Atmn Patel 0b17c6e447 [LoopDeletion] Allows deletion of possibly infinite side-effect free loops
From C11 and C++11 onwards, a forward-progress requirement has been
introduced for both languages. In the case of C, loops with non-constant
conditionals that do not have any observable side-effects (as defined by
6.8.5p6) can be assumed by the implementation to terminate, and in the
case of C++, this assumption extends to all functions. The clang
frontend will emit the `mustprogress` function attribute for C++
functions (D86233, D85393, D86841) and emit the loop metadata
`llvm.loop.mustprogress` for every loop in C11 or later that has a
non-constant conditional.

This patch modifies LoopDeletion so that only loops with
the `llvm.loop.mustprogress` metadata or loops contained in functions
that are required to make progress (`mustprogress` or `willreturn`) are
checked for observable side-effects. If these loops do not have an
observable side-effect, then we delete them.

Loops without observable side-effects that do not satisfy the above
conditions will not be deleted.
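
A minimal sketch (illustrative IR) of a loop this can now delete:

```
; No observable side effects, but termination is not provable (e.g. if
; %n == 0 the loop may never exit). The mustprogress attribute (or the
; llvm.loop.mustprogress metadata) now allows LoopDeletion to delete it.
define void @drop_me(i32 %n) mustprogress {
entry:
  br label %loop
loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %i.next = add i32 %i, %n
  %c = icmp ne i32 %i.next, 42
  br i1 %c, label %loop, label %exit
exit:
  ret void
}
```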

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86844
2020-11-06 22:06:58 -05:00
Atmn Patel babc224c5d [LoopDeletion] Remove dead loops with no exit blocks
Currently, LoopDeletion refuses to remove dead loops with no exit blocks
because it cannot statically determine the control flow after it removes
the block. This leads to miscompiles if the loop is an infinite loop and
should've been removed.

Differential Revision: https://reviews.llvm.org/D90115
2020-11-06 17:08:34 -05:00
Quentin Colombet a585228027 Prevent LICM and machineLICM from hoisting convergent operations
Results of convergent operations are implicitly affected by the
enclosing control flow and should not be hoisted out of arbitrary
loops.

Patch by Xiaoqing Wu <xiaoqing_wu@apple.com>

Differential Revision: https://reviews.llvm.org/D90361
2020-11-06 10:26:39 -08:00
Arnold Schwaighofer c6543cc6b8 llvm.coro.id.async lowering: Parameterize how to restore the current continuation's context and restart the pipeline after splitting
The `llvm.coro.suspend.async` intrinsic takes a function pointer as its
argument that describes how to restore the current continuation's
context from the context argument of the continuation function. Before,
we assumed that the current context could be restored by loading from the
context argument's first pointer field (`first_arg->caller_context`).

This allows for defining suspension points that reuse the current
context for example.

Also:

llvm.coro.id.async lowering: Add llvm.coro.prepare.async intrinsic

Blocks inlining until after the async coroutine was split.

Also, change the async function pointer's context size position

   struct async_function_pointer {
     uint32_t relative_function_pointer_to_async_impl;
     uint32_t context_size;
   }

And make the position of the `async context` argument configurable. The
position is specified by the `llvm.coro.id.async` intrinsic.

rdar://70097093

Differential Revision: https://reviews.llvm.org/D90783
2020-11-06 06:22:46 -08:00
Florian Hahn d8d1cc647d [SLP] Also try to vectorize incoming values of PHIs.
Currently we do not consider incoming values of PHIs as roots for SLP
vectorization. This means we miss scenarios like the one in the test
case and PR47670.

It appears quite straight-forward to consider incoming values of PHIs as
roots for vectorization, but I might be missing something that makes
this problematic.

In terms of vectorized instructions, this applies to quite a few
benchmarks across MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto

    Same hash: 185 (filtered out)
    Remaining: 52
    Metric: SLP.NumVectorInstructions

    Program                                        base    patch   diff
     test-suite...ProxyApps-C++/HPCCG/HPCCG.test     9.00   27.00  200.0%
     test-suite...C/CFP2000/179.art/179.art.test     8.00   22.00  175.0%
     test-suite...T2006/458.sjeng/458.sjeng.test    14.00   30.00  114.3%
     test-suite...ce/Benchmarks/PAQ8p/paq8p.test    11.00   18.00  63.6%
     test-suite...s/FreeBench/neural/neural.test    12.00   18.00  50.0%
     test-suite...rimaran/enc-3des/enc-3des.test    65.00   95.00  46.2%
     test-suite...006/450.soplex/450.soplex.test    63.00   89.00  41.3%
     test-suite...ProxyApps-C++/CLAMR/CLAMR.test   177.00  250.00  41.2%
     test-suite...nchmarks/McCat/18-imp/imp.test    13.00   18.00  38.5%
     test-suite.../Applications/sgefa/sgefa.test    26.00   35.00  34.6%
     test-suite...pplications/oggenc/oggenc.test   100.00  133.00  33.0%
     test-suite...6/482.sphinx3/482.sphinx3.test   103.00  134.00  30.1%
     test-suite...oxyApps-C++/miniFE/miniFE.test   169.00  213.00  26.0%
     test-suite.../Benchmarks/Olden/tsp/tsp.test    59.00   73.00  23.7%
     test-suite...TimberWolfMC/timberwolfmc.test   503.00  622.00  23.7%
     test-suite...T2006/456.hmmer/456.hmmer.test    65.00   79.00  21.5%
     test-suite...libquantum/462.libquantum.test    58.00   68.00  17.2%
     test-suite...ternal/HMMER/hmmcalibrate.test    84.00   98.00  16.7%
     test-suite...ications/JM/ldecod/ldecod.test   351.00  401.00  14.2%
     test-suite...arks/VersaBench/dbms/dbms.test    52.00   57.00   9.6%
     test-suite...ce/Benchmarks/Olden/bh/bh.test   118.00  128.00   8.5%
     test-suite.../Benchmarks/Bullet/bullet.test   6355.00 6880.00  8.3%
     test-suite...nsumer-lame/consumer-lame.test   480.00  519.00   8.1%
     test-suite...000/183.equake/183.equake.test   226.00  244.00   8.0%
     test-suite...chmarks/Olden/power/power.test   105.00  113.00   7.6%
     test-suite...6/471.omnetpp/471.omnetpp.test    92.00   99.00   7.6%
     test-suite...ications/JM/lencod/lencod.test   1173.00 1261.00  7.5%
     test-suite...0/253.perlbmk/253.perlbmk.test    55.00   59.00   7.3%
     test-suite...oxyApps-C/miniAMR/miniAMR.test    92.00   98.00   6.5%
     test-suite...chmarks/MallocBench/gs/gs.test   446.00  473.00   6.1%
     test-suite.../CINT2006/403.gcc/403.gcc.test   464.00  491.00   5.8%
     test-suite...6/464.h264ref/464.h264ref.test   998.00  1055.00  5.7%
     test-suite...006/453.povray/453.povray.test   5711.00 6007.00  5.2%
     test-suite...FreeBench/distray/distray.test   102.00  107.00   4.9%
     test-suite...:: External/Povray/povray.test   4184.00 4378.00  4.6%
     test-suite...DOE-ProxyApps-C/CoMD/CoMD.test   112.00  117.00   4.5%
     test-suite...T2006/445.gobmk/445.gobmk.test   104.00  108.00   3.8%
     test-suite...CI_Purple/SMG2000/smg2000.test   789.00  819.00   3.8%
     test-suite...yApps-C++/PENNANT/PENNANT.test   233.00  241.00   3.4%
     test-suite...marks/7zip/7zip-benchmark.test   417.00  428.00   2.6%
     test-suite...arks/mafft/pairlocalalign.test   627.00  643.00   2.6%
     test-suite.../Benchmarks/nbench/nbench.test   259.00  265.00   2.3%
     test-suite...006/447.dealII/447.dealII.test   4641.00 4732.00  2.0%
     test-suite...lications/ClamAV/clamscan.test   106.00  108.00   1.9%
     test-suite...CFP2000/177.mesa/177.mesa.test   1639.00 1664.00  1.5%
     test-suite...oxyApps-C/RSBench/rsbench.test    66.00   65.00  -1.5%
     test-suite.../CINT2000/252.eon/252.eon.test   3416.00 3444.00  0.8%
     test-suite...CFP2000/188.ammp/188.ammp.test   1846.00 1861.00  0.8%
     test-suite.../CINT2000/176.gcc/176.gcc.test   152.00  153.00   0.7%
     test-suite...CFP2006/444.namd/444.namd.test   3528.00 3544.00  0.5%
     test-suite...T2006/473.astar/473.astar.test    98.00   98.00   0.0%
     test-suite...frame_layout/frame_layout.test    NaN     39.00   nan%

On ARM64, there appears to be a slight regression on SPEC2006, which
might be interesting to investigate:

   test-suite...T2006/473.astar/473.astar.test   0.9%

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D88735
2020-11-06 12:50:32 +00:00
Sander de Smalen 4a3bb9ea6c [VPlan] NFC: Change VFRange to take ElementCount
This patch changes the type of Start, End in VFRange to be an ElementCount
instead of `unsigned`. This is done as preparation to make VPlans for
scalable vectors, but is otherwise NFC.

Reviewed By: dmgreen, fhahn, vkmr

Differential Revision: https://reviews.llvm.org/D90715
2020-11-06 09:50:20 +00:00
Roman Lebedev 8d0fdd36a3
[IR] CmpInst: Add getFlippedSignednessPredicate()
And refactor a few places to use it
2020-11-06 11:31:09 +03:00
Giorgis Georgakoudis 700d2417d8 [CodeExtractor] Replace uses of extracted bitcasts in out-of-region lifetime markers
CodeExtractor handles bitcasts in the extracted region that have
lifetime markers users in the outer region as outputs. That
creates unnecessary alloca/reload instructions and extra lifetime
markers. The patch identifies those cases, and replaces uses in
out-of-region lifetime markers with new bitcasts in the outer region.

**Example**
```
define void @foo() {
entry:
  %0 = alloca i32
  br label %extract

extract:
  %1 = bitcast i32* %0 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %1)
  call void @use(i32* %0)
  br label %exit

exit:
  call void @use(i32* %0)
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %1)
  ret void
}
```

**Current extraction**
```
define void @foo() {
entry:
  %.loc = alloca i8*, align 8
  %0 = alloca i32, align 4
  br label %codeRepl

codeRepl:                                         ; preds = %entry
  %lt.cast = bitcast i8** %.loc to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast)
  %lt.cast1 = bitcast i32* %0 to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast1)
  call void @foo.extract(i32* %0, i8** %.loc)
  %.reload = load i8*, i8** %.loc, align 8
  call void @llvm.lifetime.end.p0i8(i64 -1, i8* %lt.cast)
  br label %exit

exit:                                             ; preds = %codeRepl
  call void @use(i32* %0)
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %.reload)
  ret void
}

define internal void @foo.extract(i32* %0, i8** %.out) {
newFuncRoot:
  br label %extract

exit.exitStub:                                    ; preds = %extract
  ret void

extract:                                          ; preds = %newFuncRoot
  %1 = bitcast i32* %0 to i8*
  store i8* %1, i8** %.out, align 8
  call void @use(i32* %0)
  br label %exit.exitStub
}
```

**Extraction with patch**
```
define void @foo() {
entry:
  %0 = alloca i32, align 4
  br label %codeRepl

codeRepl:                                         ; preds = %entry
  %lt.cast1 = bitcast i32* %0 to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast1)
  call void @foo.extract(i32* %0)
  br label %exit

exit:                                             ; preds = %codeRepl
  call void @use(i32* %0)
  %lt.cast = bitcast i32* %0 to i8*
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %lt.cast)
  ret void
}

define internal void @foo.extract(i32* %0) {
newFuncRoot:
  br label %extract

exit.exitStub:                                    ; preds = %extract
  ret void

extract:                                          ; preds = %newFuncRoot
  %1 = bitcast i32* %0 to i8*
  call void @use(i32* %0)
  br label %exit.exitStub
}
```

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D90689
2020-11-05 17:01:08 -08:00
Sjoerd Meijer 7eb70158e4 [IndVarSimplify][SimplifyIndVar] Move WidenIV to Utils/SimplifyIndVar. NFCI.
This moves WidenIV from IndVarSimplify to Utils/SimplifyIndVar so that we have
createWideIV available as a generic helper utility. I.e., this is not only
useful in IndVarSimplify, but could be useful for loop transformations. For
example, motivation for this refactoring is the loop flatten transformation: if
induction variables in a loop nest can be widened, we can avoid having to
perform certain overflow checks, enabling this transformation.

Differential Revision: https://reviews.llvm.org/D90421
2020-11-05 16:52:47 +00:00
Florian Hahn be0578f0b4 [GVN] Fix MemorySSA update when replacing assume(false) with stores.
When replacing an assume(false) with a store, we have to be more careful
with the order we insert the new access. This patch updates the code to
look at the accesses in the block to find a suitable insertion point.

Alternatively we could check the defining access of the assume, but IIRC
there has been some discussion about making assume() readnone, so
looking at the access list might be more future-proof.

Fixes PR48072.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D90784
2020-11-05 12:09:32 +00:00
Simon Pilgrim 4b2be681f4 [InstCombine] Remove orphan InstCombinerImpl method declarations. NFCI. 2020-11-05 10:13:16 +00:00
Arnold Schwaighofer ea5989b43a Start of an llvm.coro.async implementation
This patch adds the `async` lowering of coroutines.

This will be used by the Swift frontend to lower async functions. In
contrast to the `retcon` lowering the frontend needs to be in control
over control-flow at suspend points as execution might be suspended at
these points.

This is very much work in progress and the implementation will change as
it evolves with the frontend. As such the documentation is lacking
detail as some of it might change.

rdar://70097093

Reapply with fix for memory sanitizer failure and sphinx failure.

Differential Revision: https://reviews.llvm.org/D90612
2020-11-04 10:29:21 -08:00
Arnold Schwaighofer 42f1916640 Revert "Start of an llvm.coro.async implementation"
This reverts commit ea606cced0.

This patch causes memory sanitizer failures sanitizer-x86_64-linux-fast.
2020-11-04 08:26:20 -08:00
Arnold Schwaighofer ea606cced0 Start of an llvm.coro.async implementation
This patch adds the `async` lowering of coroutines.

This will be used by the Swift frontend to lower async functions. In
contrast to the `retcon` lowering the frontend needs to be in control
over control-flow at suspend points as execution might be suspended at
these points.

This is very much work in progress and the implementation will change as
it evolves with the frontend. As such the documentation is lacking
detail as some of it might change.

rdar://70097093

Differential Revision: https://reviews.llvm.org/D90612
2020-11-04 07:32:29 -08:00
Roman Lebedev 93f3d7f7b3
[Reassociate] Guard `add`-like `or` conversion into an `add` with profitability check
This is slightly better compile-time wise,
since we avoid potentially-costly knownbits analysis that will
ultimately not allow us to actually do anything with said `add`.
2020-11-04 16:10:34 +03:00
Martin Storsjö 36cf1e7d0e Revert "[AggressiveInstCombine] Generalize foldGuardedRotateToFunnelShift to generic funnel shifts"
This reverts commit 59b22e495c.

That commit broke building for ARM and AArch64, reproducible like this:

$ cat apedec-reduced.c
a;
b(e) {
  int c;
  unsigned d = f();
  c = d >> 32 - e;
  return c;
}
g() {
  int h = i();
  if (a)
    h = h << a | b(a);
  return h;
}
$ clang -target aarch64-linux-gnu -w -c -O3 apedec-reduced.c
clang: ../lib/Transforms/InstCombine/InstructionCombining.cpp:3656: bool llvm::InstCombinerImpl::run(): Assertion `DT.dominates(BB, UserParent) && "Dominance relation broken?"' failed.

Same thing for e.g. an armv7-linux-gnueabihf target.
2020-11-04 08:39:32 +02:00
Xun Li 7f34aca083 [musttail] Unify musttail call preceding return checking
There is already an API in BasicBlock that checks and returns the musttail call if it precedes the return instruction.
Use it instead of manually checking in each place.

Differential Revision: https://reviews.llvm.org/D90693
2020-11-03 11:39:27 -08:00
Roman Lebedev 70472f34b2
[Reassociate] Convert `add`-like `or`'s into an `add`'s to allow reassociation
InstCombine is quite aggressive in doing the opposite transform,
folding an `add` of operands with no common bits set into an `or`,
and not many things support that new pattern.

In this case, teaching Reassociate about it is easy,
there's preexisting art for `sub`/`shl`:
just convert such an `or` into an `add`:
https://rise4fun.com/Alive/Xlyv
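
A tiny illustrative example of the conversion:

```
define i32 @or_to_add(i32 %x) {
  ; After the shift, the low bit of %t is known zero, so the 'or' sets
  ; no common bits and is equivalent to 'add i32 %t, 1', which
  ; Reassociate can now work with.
  %t = shl i32 %x, 1
  %r = or i32 %t, 1
  ret i32 %r
}
```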
2020-11-03 22:30:51 +03:00
Sanne Wouda 2ec26d3a23 Revert "Add loop distribution to the LTO pipeline"
This reverts commit 6e80318eec.
2020-11-03 19:29:27 +00:00
Sanne Wouda 6e80318eec Add loop distribution to the LTO pipeline
The LoopDistribute pass is missing from the LTO pipeline, so
-enable-loop-distribute has no effect during post-link. The pre-link
loop distribution doesn't seem to survive the LTO pipeline either.

With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43%
uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is
unaffected.

Differential Revision: https://reviews.llvm.org/D89896
2020-11-03 18:54:24 +00:00
Jameson Nash 59a6ab28c4 [GVN] small improvements to comments 2020-11-03 13:21:48 -05:00
Roman Lebedev c009d11bda
[InstCombine] Perform C-(X+C2) --> (C-C2)-X transform before using Negator
In particular, it makes it fire for C=0, because negator doesn't want
to perform that fold since in general it's not beneficial.
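
An illustrative instance of the fold:

```
define i32 @fold(i32 %x) {
  ; 10 - (%x + 3)  -->  7 - %x, done directly rather than via Negator.
  %a = add i32 %x, 3
  %r = sub i32 10, %a
  ret i32 %r
}
```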
2020-11-03 16:06:52 +03:00
Roman Lebedev e465f9c303
[InstCombine] Negator: - (C - %x) --> %x - C (PR47997)
This relaxes one-use restriction on that `sub` fold,
since apparently the addition of Negator broke
preexisting `C-(C2-X) --> X+(C-C2)` (with C=0) fold.
2020-11-03 16:06:51 +03:00
Florian Hahn d68bed0fa9 [SCCP] Handle bitcast of vector constants.
Vectors where all elements have the same known constant range are treated as a
single constant range in the lattice. When bitcasting such vectors, there is a
mismatch between the width of the lattice value (a single constant range) and
the original operand (a vector). Go to overdefined in that case.

Fixes PR47991.
2020-11-03 12:58:39 +00:00
Simon Pilgrim 59b22e495c [AggressiveInstCombine] Generalize foldGuardedRotateToFunnelShift to generic funnel shifts
The fold currently only handles rotation patterns, but with the maturation of backend funnel shift handling we can now realistically handle all funnel shift patterns.

This should allow us to begin resolving PR46896 et al.

Differential Revision: https://reviews.llvm.org/D90625
2020-11-03 10:49:49 +00:00
Florian Hahn d9cbf39a37 [SLP] Pass VecPred argument to getCmpSelInstrCost.
Check if all compares in VL have the same predicate and pass it to
getCmpSelInstrCost, to improve cost-modeling on targets that only
support compare/select combinations for certain uniform predicates.

This leads to additional vectorization in some cases

```
Same hash: 217 (filtered out)
Remaining: 19
Metric: SLP.NumVectorInstructions

Program                                        base    slp2    diff
 test-suite...marks/SciMark2-C/scimark2.test    11.00   26.00  136.4%
 test-suite...T2006/445.gobmk/445.gobmk.test    79.00  135.00  70.9%
 test-suite...ediabench/gsm/toast/toast.test    54.00   71.00  31.5%
 test-suite...telecomm-gsm/telecomm-gsm.test    54.00   71.00  31.5%
 test-suite...CI_Purple/SMG2000/smg2000.test   426.00  542.00  27.2%
 test-suite...ch/g721/g721encode/encode.test    30.00   24.00  -20.0%
 test-suite...000/186.crafty/186.crafty.test   116.00  138.00  19.0%
 test-suite...ications/JM/ldecod/ldecod.test   697.00  765.00   9.8%
 test-suite...6/464.h264ref/464.h264ref.test   822.00  886.00   7.8%
 test-suite...chmarks/MallocBench/gs/gs.test   154.00  162.00   5.2%
 test-suite...nsumer-lame/consumer-lame.test   621.00  651.00   4.8%
 test-suite...lications/ClamAV/clamscan.test   223.00  231.00   3.6%
 test-suite...marks/7zip/7zip-benchmark.test   680.00  695.00   2.2%
 test-suite...CFP2000/177.mesa/177.mesa.test   2121.00 2129.00  0.4%
 test-suite...:: External/Povray/povray.test   2406.00 2412.00  0.2%
 test-suite...TimberWolfMC/timberwolfmc.test   634.00  634.00   0.0%
 test-suite...CFP2006/433.milc/433.milc.test   1036.00 1036.00  0.0%
 test-suite.../Benchmarks/nbench/nbench.test   321.00  321.00   0.0%
 test-suite...ctions-flt/Reductions-flt.test    NaN      5.00   nan%
```

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D90124
2020-11-03 10:16:43 +00:00
Max Kazantsev 46b2e85f0f [NFC] Refactor code in IndVars, preparing for further improvement 2020-11-03 15:08:12 +07:00
Max Kazantsev a44b7322a2 [NFC] Split lambda into 2 parts for further reuse 2020-11-03 14:13:55 +07:00
Max Kazantsev f847094c24 [IndVars] Use knowledge about execution on last iteration when removing checks
If we know that some check will not be executed on the last iteration, we can use this
fact to eliminate the check.

Differential Revision: https://reviews.llvm.org/D88210
Reviewed By: ebrevnov
2020-11-03 13:38:58 +07:00
Alina Sbirlea f514b32a89 [LICM] Add assert of AST/MSSA exclusiveness.
The API `canSinkOrHoistInst` may be called by LoopSink. Add assert to
avoid having two analyses passed in.
2020-11-02 18:04:43 -08:00
Akira Hatanaka b0f1d7d562 Remove unused parameter 2020-11-02 17:40:06 -08:00
Ettore Tiotto 4274cbba1c [PartialInliner]: Handle code regions in switch stmt cases
This patch enhances computeOutliningColdRegionsInfo() to allow it to
consider regions containing a single basic block and a single
predecessor as candidate for partial inlining.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D89911
2020-11-02 14:32:45 -05:00
Simon Pilgrim 55f15f99cb [AggressiveInstCombine] foldGuardedRotateToFunnelShift - generalize rotation to funnel shift matcher.
Replace matchRotate with a more general matchFunnelShift - at the moment this is still just used for rotation patterns.
2020-11-02 17:09:17 +00:00
Fangrui Song 98b9338588 [Debugify] Port -debugify-each to NewPM
Preemptively switch 2 tests to the new PM

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D90365
2020-11-02 08:16:43 -08:00
Florian Hahn b3b993a7ad Reland "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts the revert commit 408c4408fa.

This version of the patch includes a fix for a crash caused by
treating ICmp/FCmp constant expressions as instructions.

Original message:

On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.
2020-11-02 15:39:29 +00:00
Teresa Johnson 0949f96dc6 [MemProf] Pass down memory profile name with optional path from clang
Similar to -fprofile-generate=, add -fmemory-profile= which takes a
directory path. This is passed down to LLVM via a new module flag
metadata. LLVM in turn provides this name to the runtime via the new
__memprof_profile_filename variable.

Additionally, always pass a default filename (in $cwd if a directory
name is not specified via the = form of the option). This is also
consistent with the behavior of the PGO instrumentation. Since the
memory profiles will generally be fairly large, it doesn't make sense to
dump them to stderr. Also, importantly, the memory profiles will
eventually be dumped in a compact binary format, which is another reason
why it does not make sense to send these to stderr by default.

Change the existing memprof tests to specify log_path=stderr when that
was being relied on.

Depends on D89086.

Differential Revision: https://reviews.llvm.org/D89087
2020-11-01 17:38:23 -08:00
Florian Hahn ca38652b9a [VPlan] Assert no users remaining when deleting a VPValue.
When deleting a VPValue, all users must already be deleted. Add an
assertion to make sure and catch violations.
2020-11-01 17:44:53 +00:00
Florian Hahn aab71d4443 [DSE] Use same logic as legacy impl to check if free kills a location.
This patch updates DSE + MemorySSA to use the same check as the legacy
implementation to determine if a location is killed by a free call.

This changes the existing behavior so that a free does not kill
locations before the start of the freed pointer.

This should fix PR48036.
2020-10-31 20:09:25 +00:00
Florian Hahn 799033d8c5 Reland "[SLP] Consider alternatives for cost of select instructions."
This reverts the revert commit a1b53db324.

This patch includes a fix for a reported issue, caused by
matchSelectPattern returning UMIN for selects of pointers in
some cases by looking through some connected casts.

For now, ensure integer intrinsics are only returned for selects of
ints or int vectors.
2020-10-31 16:52:36 +00:00
Simon Pilgrim 538fdb0189 [InstCombine] foldSelectRotate - generalize to foldSelectFunnelShift
This is the last of the rotate->funnel shift InstCombine generalizations for PR46896

We still have foldGuardedRotateToFunnelShift to deal with in AggressiveInstCombine

Differential Revision: https://reviews.llvm.org/D90382
2020-10-31 12:32:34 +00:00
Simon Pilgrim 4da6a48399 [CSE] Make some basic EarlyCSE::StackNode helper methods const. NFCI.
Fixes a number of cppcheck remarks.
2020-10-31 12:16:48 +00:00
Nikita Popov 27f647d117 [Inliner] Consistently apply callsite noalias metadata
Previously, !noalias and !alias.scope metadata on the call site was
applied as part of CloneAliasScopeMetadata(), which short-circuits
if the callee does not use any noalias metadata itself. However,
these two things have no relation to each other.

Consistently apply !noalias and !alias.scope metadata by integrating
this into an existing function that handled !llvm.access.group and
!llvm.mem.parallel_loop_access metadata. The handling for all of
these metadata kinds is essentially the same.
2020-10-31 10:54:45 +01:00
Arthur Eubanks 5c31b8b94f Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit 10f2a0d662.

More uint64_t overflows.
2020-10-31 00:25:32 -07:00
Florian Hahn a1b53db324 Revert "[SLP] Consider alternatives for cost of select instructions."
This reverts commit 1922570489.

This appears to cause a crash in the following example

 a, b, c;
 l() {
   int e = a, f = l, g, h, i, j;
   float *d = c, *k = b;
   for (;;)
     for (; g < f; g++) {
       k[h] = d[i];
       k[h - 1] = d[j];
       h += e << 1;
       i += e;
     }
 }

 clang -cc1 -triple i386-unknown-linux-gnu -emit-obj -target-cpu pentium-m -O1 -vectorize-loops -vectorize-slp reduced.c

 llvm::Type *llvm::Type::getWithNewBitWidth(unsigned int) const: Assertion `isIntOrIntVectorTy() && "Original type expected to be a vector of integers or a scalar integer."' failed.
2020-10-30 21:26:14 +00:00
Florian Hahn 408c4408fa Revert "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts commit 73f01e3df5.

This appears to break
http://lab.llvm.org:8011/#/builders/85/builds/383.
2020-10-30 21:26:14 +00:00
Peter Collingbourne 3d049bce98 hwasan: Support for outlined checks in the Linux kernel.
Add support for match-all tags and GOT-free runtime calls, which
are both required for the kernel to be able to support outlined
checks. This requires extending the access info to let the backend
know when to enable these features. To make the code easier to maintain
introduce an enum with the bit field positions for the access info.

Allow outlined checks to be enabled with -mllvm
-hwasan-inline-all-checks=0. Kernels that contain runtime support for
outlined checks may pass this flag. Kernels lacking runtime support
will continue to link because they do not pass the flag. Old versions
of LLVM will ignore the flag and continue to use inline checks.

With a separate kernel patch [1] I measured the code size of defconfig
+ tag-based KASAN, as well as boot time (i.e. time to init launch)
on a DragonBoard 845c with an Android arm64 GKI kernel. The results
are below:

         code size    boot time
before    92824064      6.18s
after     38822400      6.65s

[1] https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76

Depends on D90425

Differential Revision: https://reviews.llvm.org/D90426
2020-10-30 14:25:40 -07:00
Peter Collingbourne 0930763b4b hwasan: Move fixed shadow behind opaque no-op cast as well.
This is a workaround for poor heuristics in the backend where we can
end up materializing the constant multiple times. This is particularly
bad when using outlined checks because we materialize it for every call
(because the backend considers it trivial to materialize).

As a result the field containing the shadow base value will always
be set so simplify the code taking that into account.

Differential Revision: https://reviews.llvm.org/D90425
2020-10-30 13:23:52 -07:00
Arthur Eubanks 10f2a0d662 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-30 10:03:46 -07:00
Pedro Tammela 86e0c1acdb [NFC][Reg2Mem] modernize loop iterators
This patch updates the Reg2Mem loops to use more modern iterators.

Differential Revision: https://reviews.llvm.org/D90122
2020-10-30 16:50:07 +00:00
Pedro Tammela 70a495c7f0 [NFC][LoopSimplify] modernize for loops over LoopInfo
This patch modifies two for loops to use the range based syntax.
Since they are equivalent, this patch is tagged NFC.

Differential Revision: https://reviews.llvm.org/D90069
2020-10-30 16:50:07 +00:00
Michael Liao c82403d025 [gvn] PRE needs to skip convergent intrinsics/calls.
- As convergent intrinsics/calls can only be moved to
  control-equivalent blocks, or, more precisely, within the same
  divergent branch, PRE needs to skip them.

Differential Revision: https://reviews.llvm.org/D90391
2020-10-30 11:24:40 -04:00
Evgeniy Brevnov 3d31adaec4 [DSE] Improve partial overlap detection
Currently isOverwrite returns OW_MaybePartial even for accesses known not to overlap. This is not a big problem for the legacy implementation (since isPartialOverwrite follows isOverwrite and clarifies the result). By contrast, the SSA-based version does a lot of work only to later find out that the accesses don't overlap. Besides the negative impact on compile time, we quickly reach MemorySSAPartialStoreLimit and miss optimization opportunities.

Note: In fact, I think it would be a cleaner implementation if isOverwrite returned a fully clarified result in the first place, without the need to call isPartialOverwrite. This can be done as a follow-up. What do you think?

Reviewed By: fhahn, asbirlea

Differential Revision: https://reviews.llvm.org/D90371
2020-10-30 22:23:20 +07:00
Simon Pilgrim ed577892cf Use cast<> instead of dyn_cast<> as we dereference the pointers immediately. NFCI.
Fix clang static analyzer warnings - we're better off relying on cast<> asserting on failure rather than a null dereference crash.
2020-10-30 15:20:40 +00:00
Florian Hahn aa1a198a64 [VPlan] Use isa<> instead getVPRecipeID in getFirstNonPhi (NFC).
As per the comment in VPRecipeBase, clients should not rely on
getVPRecipeID, as it may change in the future. It should only be used in
classof implementations. Use isa instead in getFirstNonPhi.
2020-10-30 14:56:06 +00:00
Simon Pilgrim b7c91a9b8e [SCEV] SCEVExpander::InsertNoopCastOfTo - reduce scope of pointer type. NFCI.
By reducing the scope of the dyn_cast<PointerType> we can make this a cast<PointerType> and avoid clang static analyzer null deference warnings.
2020-10-30 14:55:09 +00:00
Florian Hahn 73f01e3df5 [TTI] Add VecPred argument to getCmpSelInstrCost.
On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.

Reviewed By: dmgreen, RKSimon

Differential Revision: https://reviews.llvm.org/D90070
2020-10-30 13:49:08 +00:00
Simon Pilgrim 9e154f1aca [SROA] Pass Twine by const reference. NFCI.
Fixes clang-tidy warnings.
2020-10-30 11:36:58 +00:00
Max Kazantsev bd341bafbf [NFC] Simplify code in IndVars 2020-10-30 17:49:32 +07:00
Florian Hahn 05e4f7bde9 [DSE] Remove noop stores after killing stores for a MemoryDef.
Currently we fail to eliminate some noop stores if there is a kill-able
store between the starting def and the load. This is because we
eliminate noop stores first.

In practice it seems like eliminating noop stores after the main
elimination for a def covers slightly more cases.

This patch improves the number of stores slightly in 2 cases for X86 -O3
-flto

Same hash: 235 (filtered out)
Remaining: 2
Metric: dse.NumRedundantStores

Program                                          base      patch diff
 test-suite...ce/Benchmarks/PAQ8p/paq8p.test     2.00   3.00 50.0%
 test-suite...006/453.povray/453.povray.test    18.00  21.00 16.7%

There might be other phase ordering issues, but it appears that they do
not show up in the test-suite/SPEC2000/SPEC2006. We can always tune the
ordering later.

Partly fixes PR47887.

Reviewed By: asbirlea, zoecarver

Differential Revision: https://reviews.llvm.org/D89650
2020-10-30 09:40:15 +00:00
Roman Lebedev 81fc53a36a
[SCEV] Introduce SCEVPtrToIntExpr (PR46786)
And use it to model LLVM IR's `ptrtoint` cast.

This is essentially an alternative to D88806, but with no chance for
all the problems it caused due to having the cast as implicit there.
(see rG7ee6c402474a2f5fd21c403e7529f97f6362fdb3)

As we've established by now, there are at least two reasons why we want this:
* It will allow SCEV to actually model the `ptrtoint` casts
  and their operands, instead of treating them as `SCEVUnknown`
* It should help with the initial problem of PR46786 - this should eventually allow us
  to not lose the pointer-ness of an expression in more cases (see the sketch below)
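
A tiny IR sketch (illustrative only) of the kind of expression this models:

```
define i64 @p2i(i8* %p) {
  ; SCEV can now model this as a SCEVPtrToIntExpr of %p instead of
  ; treating the whole expression as SCEVUnknown.
  %i = ptrtoint i8* %p to i64
  %r = add i64 %i, 8
  ret i64 %r
}
```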

As discussed in [[ https://bugs.llvm.org/show_bug.cgi?id=46786 | PR46786 ]], in principle,
we could just extend `SCEVUnknown` with a `is ptrtoint` cast, because `ScalarEvolution::getPtrToIntExpr()`
should sink the cast as far down into the expression as possible,
so in the end we should always end up with `SCEVPtrToIntExpr` of `SCEVUnknown`.

But I think that it isn't the best solution, because memory consumption doesn't
really matter here - there probably won't be *that* many `SCEVPtrToIntExpr`s
for it to matter - and a dedicated node allows for much better discoverability.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D89456
2020-10-30 11:13:35 +03:00
Vitaly Buka 36fa658db5 [NFC] Fix "ambiguous overload for ‘operator=’"
From D89768
2020-10-30 00:43:32 -07:00
Vitaly Buka 1455259546 [NFC] Fix "ambiguous overload for ‘operator=’" 2020-10-30 00:36:50 -07:00
Xun Li 9f5a2beadc [Coroutine] Properly determine whether an alloca should live on the frame
The existing logic for determining whether an alloca should live on the frame only looks at explicit def-use relationships. However, a value defined by an alloca may be implicitly needed across suspension points, either because an alias has an across-suspension-point def-use relationship, or because it is escaped by store/call/memory intrinsics. To properly handle all these cases, we have to properly visit the alloca pointer up-front. This patch extends the existing alloca use visitor to determine whether an alloca should live on the frame.

Differential Revision: https://reviews.llvm.org/D89768
2020-10-29 23:56:05 -07:00
Stefanos Baziotis a3345300b6 [LCSSA] Doc for special treatment of PHIs
Differential Revision: https://reviews.llvm.org/D89739
2020-10-29 22:50:07 +02:00
Nikita Popov 20b386aae0 [LoopUtils] Fix neutral value for vector.reduce.fadd
Use -0.0 instead of 0.0 as the start value. The previous use of 0.0
was fine for all existing uses of this function though, as it is
always generated with fast flags right now, and thus nsz.
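
An illustrative IR example of the new start value:

```
define float @red(<4 x float> %v) {
  ; -0.0 is the true neutral element for fadd: (-0.0) + x == x for
  ; every x, including x == -0.0, whereas 0.0 + (-0.0) yields +0.0.
  %r = call float @llvm.vector.reduce.fadd.v4f32(float -0.0, <4 x float> %v)
  ret float %r
}
declare float @llvm.vector.reduce.fadd.v4f32(float, <4 x float>)
```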
2020-10-29 21:45:13 +01:00
Florian Hahn 1922570489 [SLP] Consider alternatives for cost of select instructions.
Some architectures do not have general vector select instructions (e.g.
AArch64). But some cmp/select patterns can be vectorized using other
instructions/intrinsics.

One example is using min/max instructions for certain patterns.

This patch updates the cost calculations for selects in the SLP
vectorizer to consider using min/max intrinsics.

This patch does not change SLP vectorizer's codegen itself to actually
generate those intrinsics, but relies on the backends to lower the
vector cmps & selects. This keeps things simple on the SLP side and
works well in practice for AArch64.

This exposes additional SLP vectorization opportunities in some
benchmarks on AArch64 (-O3 -flto).

Metric: SLP.NumVectorInstructions

Program                                        base    slp     diff
 test-suite...ications/JM/ldecod/ldecod.test   502.00  697.00  38.8%
 test-suite...ications/JM/lencod/lencod.test   1023.00 1414.00 38.2%
 test-suite...-typeset/consumer-typeset.test    56.00   65.00  16.1%
 test-suite...6/464.h264ref/464.h264ref.test   804.00  822.00   2.2%
 test-suite...006/453.povray/453.povray.test   3335.00 3357.00  0.7%
 test-suite...CFP2000/177.mesa/177.mesa.test   2110.00 2121.00  0.5%
 test-suite...:: External/Povray/povray.test   2378.00 2382.00  0.2%

Reviewed By: RKSimon, samparker

Differential Revision: https://reviews.llvm.org/D89969
2020-10-29 20:39:50 +00:00
Dávid Bolvanský 7a2abf5aca [InferAttrs] Add nocapture/writeonly to string/mem libcalls
One step closer to fixing PR47644.
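
A hedged sketch of the kind of declaration this inference produces (the exact attribute set per libcall is in the patch, not this example):

```
; Illustrative only: destination is written, source is read and does
; not escape.
declare i8* @strcpy(i8* writeonly, i8* nocapture readonly)
```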

Differential Revision: https://reviews.llvm.org/D89645
2020-10-29 20:06:43 +01:00
Simon Pilgrim dcb3dc101d [InstCombine] visitShl - ensure inner shifts have inrange amounts
Noticed when fixing OSS Fuzz #26716
2020-10-29 15:28:15 +00:00
Max Kazantsev a5b2e795c3 [NFC][SCEV] Refactor monotonic predicate checks to return enums instead of bools
This patch gets rid of an output parameter which is not needed by most users
and prepares this API for further refactoring.
2020-10-29 16:01:25 +07:00
Johannes Doerfert d39f574dcc [Attributor][FIX] Properly promote arguments pointers to arrays
When promoting pointer arguments we computed a wrong offset and used
a wrong type for the array case.

Bug reported and reduced by Whitney Tsang <whitneyt@ca.ibm.com>.
2020-10-29 00:45:32 -05:00
Fangrui Song 39856d5d0b [Debugify] Move global namespace functions into llvm::
Also move exportDebugifyStats from tools/opt to Debugify.cpp
2020-10-28 19:11:41 -07:00
Florian Hahn 53f4c4b2cc [InstCombine] Do not introduce bitcasts for swifterror arguments.
The following constraints hold for swifterror values:

    A swifterror value (either the parameter or the alloca) can only
    be loaded and stored from, or used as a swifterror argument.

This patch updates instcombine to not try to convert a bitcast of a
function into a bitcast of a swifterror argument.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D90258
2020-10-28 21:52:12 +00:00
Benjamin Kramer 207cf71fa9 Revert "[OpenMP] Add Passing in Original Declaration Names To Mapper API"
This reverts commit d981c7b758 and
a87d7b3d44. Test fails under msan.
2020-10-28 13:58:14 +01:00
Max Kazantsev 160a453138 Return "[IndVars] Remove monotonic checks with unknown exit count"
This reverts commit e038b60d91.
This reverts commit a0d84d8031.

This revert was a mistake. The reason for the failures was
"Use uint64_t for branch weights instead of uint32_t"

Differential Revision: https://reviews.llvm.org/D87832
2020-10-28 18:51:40 +07:00
Florian Hahn b82f80057d [DSE] Use walker to skip noalias stores between current & clobber def.
Instead of getting the defining access we should be able to use
getClobberingMemoryAccess to skip non-aliasing MemoryDefs. No additional
checks should be needed, because we only remove the starting def if it
matches the defining access of the load. All we need to worry about is
that there are no (may)alias stores between the starting def and the
load and getClobberingMemoryAccess should guarantee that.

Partly fixes PR47887.

This improves the number of redundant stores removed in some cases
(numbers below for MultiSource, SPEC2000, SPEC2006 on X86 with -flto
-O3).

Same hash: 226 (filtered out)
Remaining: 11
Metric: dse.NumRedundantStores

Program                                        base   patch1 diff
 test-suite...:: External/Povray/povray.test     1.00   5.00 400.0%
 test-suite...chmarks/MallocBench/gs/gs.test     1.00   3.00 200.0%
 test-suite...0/253.perlbmk/253.perlbmk.test    21.00  37.00 76.2%
 test-suite...0.perlbench/400.perlbench.test    24.00  37.00 54.2%
 test-suite.../Applications/SPASS/SPASS.test     3.00   4.00 33.3%
 test-suite...006/453.povray/453.povray.test    15.00  18.00 20.0%
 test-suite...T2006/445.gobmk/445.gobmk.test    27.00  29.00  7.4%
 test-suite.../CINT2006/403.gcc/403.gcc.test   136.00 137.00  0.7%
 test-suite.../CINT2000/176.gcc/176.gcc.test     6.00   6.00  0.0%
 test-suite.../Benchmarks/Bullet/bullet.test    NaN     3.00  nan%
 test-suite.../Benchmarks/Ptrdist/bc/bc.test    NaN     1.00  nan%

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89647
2020-10-28 11:01:25 +00:00
Luqman Aden 4c0a016927 Rename EHPersonality::MSVC_Win64SEH to EHPersonality::MSVC_TableSEH. NFC.
The types of SEH aren't x86(-32) vs x64 but rather stack-based exception chaining
vs table-based exception handling. x86-32 is the only arch for which Windows
uses the former. 32-bit ARM would use what is called Win64SEH today, which
is a bit confusing so instead let's just rename it to be a bit more clear.

Reviewed By: compnerd, rnk

Differential Revision: https://reviews.llvm.org/D90117
2020-10-27 23:22:13 -07:00
Kazu Hirata b2f05fae80 [JumpThreading] Remove extraneous calls to setEdgeProbability
This patch removes extraneous calls to setEdgeProbability introduced
in c91487769d.

The follow-up patch, a7b662d0f4, has
since fixed BranchProbabilityInfo::eraseBlock, so we don't need to
worry about getting stale values from getEdgeProbability.

Also, since getEdgeProbability(BB, BB->getSingleSuccessor()) returns
edge probability 1/1 by default for BB with exactly one successor
edge, we don't need to explicitly call setEdgeProbability.

This patch introduces almost no functional change, but we do end up
reducing debug messages from setEdgeProbability.

Differential Revision: https://reviews.llvm.org/D90284
2020-10-27 21:12:54 -07:00
Johannes Doerfert d13daa4018 [Attributor] Finalize the CGUpdater after each SCC
This matches the new PM model.
2020-10-27 22:07:56 -05:00
Johannes Doerfert 50d34958df [Attributor][NFC] Introduce a debug counter for `AA::manifest`
This will simplify debugging and tracking down problems.
2020-10-27 22:07:56 -05:00
Johannes Doerfert 1d57b7f503 [Attributor][NFC] Print the right value in debug output 2020-10-27 22:07:55 -05:00
Johannes Doerfert 1c2531c9e1 [Attributor][FIX] Delete all unreachable static functions
Before we used to only mark unreachable static functions as dead if all
uses were known dead. Now we optimistically assume uses to be dead until
proven otherwise.
2020-10-27 22:07:55 -05:00
Johannes Doerfert bfe05b1aff [Attributor][FIX] Do not attach range metadata to the wrong Instruction
If we are looking at a call site argument it might be a load or call
which is in a different context than the call site argument. We cannot
simply use the call site argument range for the call or load.

Bug reported and reduced by Whitney Tsang <whitneyt@ca.ibm.com>.
2020-10-27 22:07:55 -05:00
Johannes Doerfert 724fcce109 [Attributor][NFC] Clang-format 2020-10-27 22:07:55 -05:00
Johannes Doerfert d504f7b91a [Attributor][NFC] Hoist call out of a lambda
The call is not free. It is unclear whether hoisting it is needed, but it
does not make things worse either.
2020-10-27 22:07:54 -05:00
Johannes Doerfert 30e5a1f0be [Attributor][FIX] Properly check uses in the call not uses of the call
In the AANoAlias logic we determine if a pointer may have been captured
before a call. We need to look at the other uses in the call, not at uses
of the call itself.

The new code is not perfect as it does not allow trivial cases where the
call has multiple arguments but it is at least not unsound and a TODO
was added.
2020-10-27 22:07:54 -05:00
Johannes Doerfert cb813ab66a [Attributor][NFC] Improve time trace output 2020-10-27 22:07:54 -05:00
Kazu Hirata c91487769d [JumpThreading] Set edge probabilities when creating basic blocks
This patch teaches the jump threading pass to set edge probabilities
whenever the pass creates new basic blocks.

Without this patch, the compiler sometimes produces non-deterministic
results.  The non-determinism comes from the jump threading pass using
stale edge probabilities in BranchProbabilityInfo.  Specifically, when
the jump threading pass creates a new basic block, we don't initialize
its outgoing edge probability.

Edge probabilities are maintained in:

  DenseMap<Edge, BranchProbability> Probs;

in class BranchProbabilityInfo, where Edge is an ordered pair of
BasicBlock * and a successor index declared as:

  using Edge = std::pair<const BasicBlock *, unsigned>;

Probs maps edges to their corresponding probabilities.

Now, we rarely remove entries from this map, so if we happen to
allocate a new basic block at the same address as a previously deleted
basic block with an edge probability assigned, the newly created basic
block appears to have an edge probability, albeit a stale one.

This patch fixes the problem by explicitly setting edge probabilities
whenever the jump threading pass creates new basic blocks.

Differential Revision: https://reviews.llvm.org/D90106
2020-10-27 16:07:27 -07:00
Joseph Huber a87d7b3d44 [OpenMP] Add Passing in Original Declaration Names To Mapper API
Summary:
This patch adds support for passing in the original declaration name in the
source file to the libomptarget runtime. This will allow the runtime to provide
more intelligent debugging messages. This patch takes the original expression
parsed from the OpenMP map / update clause and provides a textual
representation if it was explicitly mapped, otherwise it takes the name of the
variable declaration as a fallback. The information is passed to the runtime in
a global array of strings that matches the existing ident_t source location
strings using ";name;filename;column;row;;". See
clang/test/OpenMP/target_map_names.cpp for an example of the generated output
for a given map clause.

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D89802
2020-10-27 16:09:19 -04:00
Nicolai Hähnle e025d09b21 Revert multiple patches based on "Introduce CfgTraits abstraction"
These logically belong together since it's a base commit plus
followup fixes to less common build configurations.

The patches are:

Revert "CfgInterface: rename interface() to getInterface()"

This reverts commit a74fc48158.

Revert "Wrap CfgTraitsFor in namespace llvm to please GCC 5"

This reverts commit f2a06875b6.

Revert "Try to make GCC5 happy about the CfgTraits thing"

This reverts commit 03a5f7ce12.

Revert "Introduce CfgTraits abstraction"

This reverts commit c0cdd22c72.
2020-10-27 20:33:30 +01:00
Nicolai Hähnle ce6900c6cb Revert "DomTree: Extract (mostly) read-only logic into type-erased base classes"
This reverts commit 848a68a032.
2020-10-27 20:33:29 +01:00
Vedant Kumar 5a3ef55a52 [Utils] Skip RemoveRedundantDbgInstrs in MergeBlockIntoPredecessor (PR47746)
This patch changes MergeBlockIntoPredecessor to skip the call to
RemoveRedundantDbgInstrs, in effect partially reverting D71480 due to
some compile-time issues spotted in LoopUnroll and SimplifyCFG.

The call to RemoveRedundantDbgInstrs appears to have changed the
worst-case behavior of the merging utility. Loosely speaking, it seems
to have gone from O(#phis) to O(#insts).

It might not be possible to mitigate this by scanning a block to
determine whether there are any debug intrinsics to remove, since such a
scan costs O(#insts).

So: skip the call to RemoveRedundantDbgInstrs. There's surprisingly
little fallout from this, and most of it can be addressed by doing
RemoveRedundantDbgInstrs later. The exception is (the block-local
version of) SimplifyCFG, where it might just be too expensive to call
RemoveRedundantDbgInstrs.

Differential Revision: https://reviews.llvm.org/D88928
2020-10-27 10:12:59 -07:00
Raphael Isemann e038b60d91 Revert "[IndVars] Remove monotonic checks with unknown exit count"
This reverts commit c6ca26c0bf.
This breaks stage2 builds due to hitting this assert:
```
   Assertion failed: (WeightSum <= UINT32_MAX && "Expected weights to scale down to 32 bits"), function calcMetadataWeights
```
when compiling AArch64RegisterBankInfo.cpp in LLVM.
2020-10-27 15:31:37 +01:00
Raphael Isemann a0d84d8031 Revert "[NFC] Factor away lambda's redundant parameter"
This reverts commit fdc845b361.
It seems to be a follow-up to c6372b3fb495 which will be reverted.
2020-10-27 15:30:52 +01:00
Simon Pilgrim bce770ffa6 Revert rG0905bd5c2fa42bd4c "[InstCombine] collectBitParts - add trunc support."
This reverts commit 0905bd5c2f.

Causing failures in multistage buildbots that I need to investigate
2020-10-27 13:43:54 +00:00
Nico Weber 2a4e704c92 Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit e5766f25c6.
Makes clang assert when building Chromium, see https://crbug.com/1142813
for a repro.
2020-10-27 09:26:21 -04:00
Simon Pilgrim 0905bd5c2f [InstCombine] collectBitParts - add trunc support.
This should allow us to remove the rather limited matchOrConcat fold and just use recognizeBSwapOrBitReverseIdiom.
2020-10-27 13:14:54 +00:00
Roman Lebedev 0ac56e8eaa
[InstCombine] Fold `(X >>? C1) << C2` patterns to shift+bitmask (PR37872)
This essentially finalizes a revert of rL155136,
because nowadays the situation has improved: SCEV can model
all these patterns well, and we canonicalize rotate-like patterns
into funnel shift intrinsics in InstCombine.
So this should not cause any pessimization.

I've verified the canonicalize-{a,l}shr-shl-to-masking.ll transforms
with alive, which confirms that we can freely preserve exact-ness,
and no-wrap flags.

Proofs:
* base: https://rise4fun.com/Alive/gPQ
* exact-ness preservation: https://rise4fun.com/Alive/izi
* nuw preservation: https://rise4fun.com/Alive/DmD
* nsw preservation: https://rise4fun.com/Alive/SLN6N
* nuw nsw preservation: https://rise4fun.com/Alive/Qp7

Refs. https://reviews.llvm.org/D46760
2020-10-27 14:42:53 +03:00
Florian Hahn f067bc3c0a [LoopRotation] Allow loop header duplication if vectorization is forced.
-Oz normally does not allow loop header duplication, so this loop wouldn't be
vectorized. However, the vectorization pragma should override this and allow
loop rotation.

rdar://problem/49281061

Original patch by Adam Nemet.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D59832
2020-10-27 09:28:01 +00:00
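A hedged IR sketch (reduced, names hypothetical): at -Oz this while-style loop would normally be left unrotated, but the vectorize-enable loop metadata now lets the rotation (header duplication) happen.

```llvm
define void @sketch(i32 %n) {
entry:
  br label %header

header:                     ; exit check in the header: unrotated form
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %latch ]
  %cmp = icmp ult i32 %iv, %n
  br i1 %cmp, label %latch, label %exit

latch:
  %iv.next = add i32 %iv, 1
  br label %header, !llvm.loop !0

exit:
  ret void
}

!0 = distinct !{!0, !1}
!1 = !{!"llvm.loop.vectorize.enable", i1 true}
```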
Max Kazantsev fdc845b361 [NFC] Factor away lambda's redundant parameter 2020-10-27 12:56:52 +07:00
Serguei Katkov b69919b537 [GVN LoadPRE] Add an option to disable splitting backedge
GVN load PRE can split the backedge, breaking the loop structure in which the latch
contains the conditional branch on, for example, an induction variable.

Different optimizations expect this form of the loop, so it is better to preserve it for some time.
This CL adds an option to control the ability to split the backedge.

The default value is true, so technically this is NFC and the current behavior is not changed.

Reviewers: fedor.sergeev, mkazantsev, nikic, reames, fhahn
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D89854
2020-10-27 11:59:52 +07:00
Max Kazantsev c6ca26c0bf [IndVars] Remove monotonic checks with unknown exit count
Even if the exact exit count is unknown, we can still prove that this
exit will not be taken. If we can prove that the predicate is monotonic,
fulfilled on first & last iteration, and no overflow happened in between,
then the check can be removed.

Differential Revision: https://reviews.llvm.org/D87832
Reviewed By: apilipenko
2020-10-27 11:35:16 +07:00
Arthur Eubanks 42f76e193b Reland [AlwaysInliner] Pass callee AAResults to InlineFunction()
Test copied from noalias-calls.ll with small changes.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89609
2020-10-26 20:40:46 -07:00
Arthur Eubanks e5766f25c6 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-26 20:24:04 -07:00
Arthur Eubanks 4af5ba1726 Revert "[AlwaysInliner] Pass callee AAResults to InlineFunction()"
This reverts commit 504fbec7a6.

Test failure.
2020-10-26 20:23:38 -07:00
Arthur Eubanks 504fbec7a6 [AlwaysInliner] Pass callee AAResults to InlineFunction()
Test copied from noalias-calls.ll with small changes.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89609
2020-10-26 20:10:09 -07:00
Arthur Eubanks 3dd1c72458 Port -objc-arc-expand to NPM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D90182
2020-10-26 20:05:10 -07:00
Arthur Eubanks 90c0b0d3d6 Port -objc-arc-apelim to NPM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D90181
2020-10-26 20:01:46 -07:00
Chen Zheng 00e573cadb [LSR] fix typo in comments and rename for a newly added hook. 2020-10-26 22:29:22 -04:00
TaWeiTu 0efbfa38ae [NPM] Port -slsr to NPM
`-separate-const-offset-from-gep` has not yet been ported, so some tests are not updated.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D90149
2020-10-27 09:21:40 +08:00
Sriraman Tallam ad1b9daa4b Prepend "__uniq" to symbol names hash with -funique-internal-linkage-names.
Prepend the module name hash with a fixed string ".__uniq." which helps tools
that consume sampled profiles and attribute it to functions to understand
that this symbol belongs to a unique internal linkage type symbol.

Symbols with suffixes can result from various optimizations in the compiler:
function multiversioning, function splitting, parameter constant propagation,
and unique internal linkage names.

External tools like sampled profile aggregators combine profiles from multiple
runs of a binary. They use various heuristics with symbols that have suffixes
to try to attribute the profile to the right function instance. For instance,
multi-versioned symbols like foo.avx and foo.sse4.2, even though different,
should be attributed to the same source function if a single function is
versioned using the target_clones attribute (supported in GCC but yet to land
in LLVM). Similarly, functions that are split (the split part having a .cold suffix)
could have profiles for both the original and split symbols but would be
aggregated and attributed to the original function that was split.

Unique internal linkage functions however have different source instances and
the aggregator must not put them together but attribute it to the appropriate
function instance. To be sure that we are dealing with a symbol of a unique
internal linkage function, we would like to prepend the hash with a known
string ".__uniq." which these tools can check to understand the suffix type.

Differential Revision: https://reviews.llvm.org/D89617
2020-10-26 14:24:28 -07:00
Sanjay Patel 5a6e66ec72 [InstCombine] add folds for icmp+ctpop
https://alive2.llvm.org/ce/z/XjFPQJ

  define void @src(i64 %value) {
    %t0 = call i64 @llvm.ctpop.i64(i64 %value)
    %gt = icmp ugt i64 %t0, 63
    %lt = icmp ult i64 %t0, 64
    call void @use(i1 %gt, i1 %lt)
    ret void
  }

  define void @tgt(i64 %value) {
    %eq = icmp eq i64 %value, -1
    %ne = icmp ne i64 %value, -1
    call void @use(i1 %eq, i1 %ne)
    ret void
  }

  declare i64 @llvm.ctpop.i64(i64) #1
  declare void @use(i1, i1)
2020-10-26 16:48:56 -04:00
Sanjay Patel 437d7551c5 [InstCombine] reduce code duplication in icmp intrinsic folds; NFC 2020-10-26 16:48:56 -04:00
Stanislav Mekhanoshin 00928a1956 Fix SROA with a PHI merging values from the same block
This fixes bug 47945. It is legal for a PHI to have multiple incoming
values from the same block, but those values must be identical. In this
case it is illegal to merge different values.

Differential Revision: https://reviews.llvm.org/D89978
2020-10-26 12:58:27 -07:00
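A minimal hedged sketch of the rule behind the fix: a conditional branch may list the same successor twice, so a PHI in that block carries two incoming entries for one predecessor, and those entries must hold the same value.

```llvm
define i32 @sketch(i1 %c, i32 %a) {
entry:
  br i1 %c, label %merge, label %merge
merge:
  ; Both incoming entries are for %entry and must be identical.
  %phi = phi i32 [ %a, %entry ], [ %a, %entry ]
  ret i32 %phi
}
```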
Joe Ellis 0f83505593 [SVE][InstCombine] Fix TypeSize warning in canReplaceGEPIdxWithZero
The warning would fire when calling canReplaceGEPIdxWithZero on a GEP
whose source element type is a scalable vector. The size of scalable
vector types is not known, so this optimization cannot be performed.

This patch fixes the issue by:

- bailing out early in this routine if the GEP instruction's source
  element type is a scalable vector.

- making use of getFixedSize -- this removes the dependency on the
  deprecated interface.

Reviewed By: fpetrogalli

Differential Revision: https://reviews.llvm.org/D89968
2020-10-26 17:40:26 +00:00
Joe Ellis 467e5cf40f [SVE][AArch64] Fix TypeSize warning in loop vectorization legality
The warning would fire when calling isDereferenceableAndAlignedInLoop
with a scalable load. Calling isDereferenceableAndAlignedInLoop with a
scalable load would result in the use of the now deprecated implicit
cast of TypeSize to uint64_t through the overloaded operator.

This patch fixes this issue by:

- no longer considering vector loads as candidates in
  canVectorizeWithIfConvert. This doesn't make sense in the context of
  identifying scalar loads to vectorize.

- making use of getFixedSize inside isDereferenceableAndAlignedInLoop --
  this removes the dependency on the deprecated interface, and will
  trigger an assertion error if the function is ever called with a
  scalable type.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D89798
2020-10-26 17:40:04 +00:00
Simon Pilgrim 532f3bec3e [InstCombine] collectBitParts - add bitreverse intrinsic support. 2020-10-26 14:36:36 +00:00
Simon Pilgrim 6b2eb31e1e [InstCombine] Add support for zext(and(neg(amt),width-1)) rotate shift amount patterns
Alive2: https://alive2.llvm.org/ce/z/bCvvHd
2020-10-26 11:22:41 +00:00
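A hedged sketch of the newly matched shift-amount shape (a standard masked-rotate idiom; names hypothetical): the right-shift amount is zext(and(neg(amt), width-1)), and the whole pattern should now become @llvm.fshl.

```llvm
define i32 @rotl_sketch(i32 %x, i8 %amt) {
  %masked = and i8 %amt, 31
  %neg = sub i8 0, %amt
  %negmasked = and i8 %neg, 31        ; and(neg(amt), width-1)
  %shamt = zext i8 %masked to i32
  %negshamt = zext i8 %negmasked to i32
  %shl = shl i32 %x, %shamt
  %lshr = lshr i32 %x, %negshamt
  %r = or i32 %shl, %lshr             ; rotl(%x, %amt & 31)
  ret i32 %r
}
```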
Max Kazantsev bfabd7878b Fix broken build after previous commit 2020-10-26 14:55:46 +07:00
Max Kazantsev cdccc82f48 [NFC] Remove unused funciton param 2020-10-26 14:53:22 +07:00
Max Kazantsev 4b5e848bef [NFC] Factor out common code into lambda for further improvement 2020-10-26 14:50:45 +07:00
Max Kazantsev c019099053 [IndVars] Use contextual knowledge when proving trivial conds
No exact example where it would help, but it's a generally a more
powerful way to prove predicates.
2020-10-26 13:48:32 +07:00
Simon Pilgrim 3052e474ec [InstCombine] matchBSwapOrBitReversem - recognise or(fshl(),fshl()) bswap patterns.
I'm not certain InstCombinerImpl::matchBSwapOrBitReverse needs to filter the or(op0(),op1()) ops - there are just too many cases that recognizeBSwapOrBitReverseIdiom/collectBitParts handle now (and quickly).
2020-10-25 10:17:45 +00:00
TaWeiTu 65a36bbc3d [NPM] Port -loop-versioning-licm to NPM
Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89371
2020-10-24 21:51:18 +08:00
TaWeiTu 060a4fccf1 [LoopVersioning] Form dedicated exits for versioned loop to preserve simplify form
The exit blocks of the versioned and non-versioned loops are not dedicated and thus the two loops are not in simplify form.
Insert dummy exit blocks after loop versioning with `formDedicatedExits()` to preserve the simplify form for subsequent passes.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89569
2020-10-24 21:40:46 +08:00
Simon Pilgrim 310f62b4ff [InstCombine] narrowFunnelShift - fold trunc/zext or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) (PR35155)
As discussed on PR35155, this extends narrowFunnelShift (recently renamed from narrowRotate) to support basic funnel shift patterns.

Unlike matchFunnelShift we don't include the computeKnownBits limitation, as extracting the pattern from the zext/trunc layers should be an indicator of reasonable funnel shift codegen; in D89139 we demonstrated how to efficiently promote funnel shifts to wider types.

Differential Revision: https://reviews.llvm.org/D89542
2020-10-24 12:42:43 +01:00
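A hedged sketch of the basic pattern (reduced from the PR; names hypothetical): once the zext/trunc layers are peeled away, this computes fshl(%a, %b, %amt & 7) and can now be folded to the @llvm.fshl.i8 intrinsic even though the two shifted values differ.

```llvm
define i8 @fshl_sketch(i8 %a, i8 %b, i8 %amt) {
  %m = and i8 %amt, 7
  %wa = zext i8 %a to i32
  %wb = zext i8 %b to i32
  %wm = zext i8 %m to i32
  %inv = sub i32 8, %wm
  %shl = shl i32 %wa, %wm        ; high half of the funnel
  %lshr = lshr i32 %wb, %inv     ; low half of the funnel
  %or = or i32 %shl, %lshr
  %r = trunc i32 %or to i8
  ret i8 %r
}
```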
Hongtao Yu a16cbdd676 [AutoFDO] Remove a broken assert in merging inlinee samples
Duplicated callsites share the same callee profile if the original callsite was inlined. The sharing also causes the profile of the callee's callee to be shared. This breaks the assert introduced earlier by D84997 in a tricky way.

To illustrate, I'm using an abstract example. Say we have three functions `A`, `B` and `C`. A calls B twice and B calls C once. Some optimization performed prior to the sample profile loader duplicates the first callsite to `B`, and the program may look like

```
A()
{
  B();  // with nested profile B1 and C1
  B();  // duplicated, with nested profile B1 and C1
  B();  // with nested profile B2 and C2
}
```

For some reason, the sample profile loader inliner then decides to only inline the first callsite in `A` and transforms `A` into

```
A()
{
  C();  // with nested profile C1
  B();  // duplicated, with nested profile B1 and C1
  B();  // with nested profile B2 and C2.
}
```

Here is what happens next:

1. Failing to inline the callsite `C()` results in `C1`'s samples returned to `C`'s base (outlined) profile. In the meantime, `C1`'s head samples are updated to `C1`'s entry sample. This also affects the profile of the middle callsite, which shares `C1` with the first callsite.
2. Failing to inline the middle callsite results in `B1` returned to `B`'s base profile, which in turn will cause `C1` merged into `B`'s base profile. Note that the nested `C` profile in `B`'s base has a non-zero head sample count now. The value actually equals `C1`'s entry count.
3. Failing to inline the last callsite results in `B2` returned to `B`'s base profile. Note that the nested `C` profile in `B`'s base now has an entry count equal to the sum of that of `C1` and `C2`, with the head count equal to that of `C1`. This will trigger the assert later on.
4. Compiling `B` using `B`'s base profile. Failing to inline `C` there triggers the returning of the nested `C` profile. Since the nested `C` profile has a non-zero head count, the returning doesn't go through. Instead, the assert goes off.

It's good that `C1` is only returned once, based on using a non-zero head count to ensure an inline profile is only returned once. However, `C2` is never returned. While it seems hard to solve this perfectly within the current framework, I'm just removing the broken assert. This should be reasonably fixed by the upcoming CSSPGO work, where counts returning is based on context sensitivity and a distribution factor for callsite probes.

The simple example is extracted from one of our internal services. In reality, why the original callsite `B()` and duplicate one having different inline behavior is a magic. It has to do with imperfect counts in profile and extra complicated inlining that makes the hotness for them different.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D90056
2020-10-23 17:42:21 -07:00
Arthur Eubanks baffd052b0 [StructurizeCFG][NewPM] Port -structurizecfg to NPM
This doesn't support -structurizecfg-skip-uniform-regions since that
would require porting LegacyDivergenceAnalysis.

The NPM doesn't support adding a non-analysis pass as a dependency of
another, so I had to add -lowerswitch to some tests or pin them to the
legacy PM.

This is the only RegionPass in tree, so I simply copied the logic for
finding all Regions from the legacy PM's RGManager into
StructurizeCFG::run().

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D89026
2020-10-23 15:54:03 -07:00
Arthur Eubanks ba22c403b2 [Inliner][NPM] Properly pass callee AAResults
Fixes noalias-calls.ll under NPM.

Differential Revision: https://reviews.llvm.org/D89592
2020-10-23 15:37:18 -07:00
Artur Pilipenko 6ec2c5e402 GC-parseable element atomic memcpy/memmove
This change introduces a GC parseable lowering for element atomic
memcpy/memmove intrinsics. This way runtime can provide an
implementation which can take a safepoint during copy operation.

See "GC-parseable element atomic memcpy/memmove" thread on llvm-dev
for the background and details:
https://groups.google.com/g/llvm-dev/c/NnENHzmX-b8/m/3PyN8Y2pCAAJ

Differential Revision: https://reviews.llvm.org/D88861
2020-10-23 14:06:09 -07:00
Nick Desaulniers b7926ce6d7 [IR] add fn attr for no_stack_protector; prevent inlining on mismatch
It's currently ambiguous in IR whether the source language explicitly
did not want a stack protector (in C, via the function attribute
no_stack_protector) or simply doesn't care, for any given function.

It's common for code that manipulates the stack via inline assembly or
that has to set up its own stack canary (such as the Linux kernel) to
want to avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an __attribute__((__no_stack_protector__)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.

While developers can avoid this by putting all no_stack_protector
functions in one translation unit together and compiling those with
-fno-stack-protector, that is generally not as ergonomic as a function
attribute, and it still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/
https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u

Typically, when inlining a callee into a caller, the caller will be
upgraded in its level of stack protection (see adjustCallerSSPLevel()).
By adding an explicit attribute in the IR when the function attribute is
used in the source language, we can now identify such cases and prevent
inlining.  Block inlining when the callee and caller differ in the case that one
contains `nossp` when the other has `ssp`, `sspstrong`, or `sspreq`.

Fixes pr/47479.

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D87956
2020-10-23 11:55:39 -07:00
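A hedged IR sketch (function names hypothetical): the attributes differ in the way the patch describes, so the inliner now refuses to inline @callee into @caller instead of silently weakening the caller's no-stack-protector guarantee.

```llvm
define void @callee() sspstrong {
  ret void
}

define void @caller() nossp {
  ; Inlining is now blocked: the callee is sspstrong, the caller is nossp.
  call void @callee()
  ret void
}
```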
Chen Zheng 1e0b6c1df0 [LSR] ignore profitable chain when reg num is not major cost.
Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D89665
2020-10-23 09:35:48 -04:00
Simon Pilgrim 1cab3bf004 [InstCombine] matchBSwapOrBitReverse - expose bswap/bitreverse matching flags.
matchBSwapOrBitReverse was hardcoded to just match bswaps - we're going to need to expose the ability to match bitreverse as well, so make this part of the function call.
2020-10-23 12:35:28 +01:00
Simon Pilgrim 19a13bf538 [InstCombine] Rename InstCombinerImpl::matchBSwap to matchBSwapOrBitReverse. NFCI.
This matches bswap and bitreverse intrinsics, so we should make that clear in the function name.
2020-10-23 12:35:27 +01:00
OCHyams fea067bdfd [mem2reg] Remove dbg.values describing contents of dead allocas
This patch copies @vsk's fix to instcombine from D85555 over to mem2reg. The
motivation and rationale are exactly the same: When mem2reg removes an alloca,
it erases the dbg.{addr,declare} instructions which refer to the alloca. It
would be better to instead remove all debug intrinsics which describe the
contents of the dead alloca, namely all dbg.value(<dead alloca>, ...,
DW_OP_deref)'s.

As far as I can tell, prior to D80264 these `dbg.value+deref`s would have been
silently dropped instead of being made `undef`, so we're just returning to
previous behaviour with these patches.

Testing:
`llvm-lit llvm/test` and `ninja check-clang` gave no unexpected failures. Added
3 tests, each of which covers a dbg.value deletion path in mem2reg:
  mem2reg-promote-alloca-1.ll
  mem2reg-promote-alloca-2.ll
  mem2reg-promote-alloca-3.ll
The first is based on the dexter test inlining.c from D89543. This patch also
improves the debugging experience for loop.c from D89543, which suffers
similarly after arg promotion instead of inlining.
2020-10-23 04:46:56 +00:00
Caroline Concatto 2415636475 [SVE]Clarify TypeSize comparisons in llvm/lib/Transforms
Use the isKnownXY comparators when one of the operands may be a
scalable vector, and getFixedSize() in all other cases.

This patch also does bug fixes for getPrimitiveSizeInBits by using
getFixedSize() near the places with the TypeSize comparison.

Differential Revision: https://reviews.llvm.org/D89703
2020-10-23 09:15:17 +01:00
Max Kazantsev 6e574abf61 [SCEV][NFC] Cache symbolic max exit count
We want to have a caching version of symbolic BE exit count
rather than recompute it every time we need it.

Differential Revision: https://reviews.llvm.org/D89954
Reviewed By: nikic, efriedma
2020-10-23 12:29:37 +07:00
Arthur Eubanks 0291e2c933 [Inliner] Run always-inliner in inliner-wrapper
An alwaysinline function may not get inlined in inliner-wrapper due to
the inlining order.

Previously for the following, the inliner would first inline @a() into @b(),

```
define void @a() {
entry:
  call void @b()
  ret void
}

define void @b() alwaysinline {
entry:
  br label %for.cond

for.cond:
  call void @a()
  br label %for.cond
}
```

making @b() recursive and unable to be inlined into @a(), ending at

```
define void @a() {
entry:
  call void @b()
  ret void
}

define void @b() alwaysinline {
entry:
  br label %for.cond

for.cond:
  call void @b()
  br label %for.cond
}
```

Running always-inliner first makes sure that we respect alwaysinline in more cases.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46945.

Reviewed By: davidxl, rnk

Differential Revision: https://reviews.llvm.org/D86988
2020-10-22 19:16:25 -07:00
Vedant Kumar 099bffe7f7 Revert "[CodeExtractor] Don't create bitcasts when inserting lifetime markers (NFCI)"
This reverts commit 26ee8aff2b.

It's necessary to insert bitcast the pointer operand of a lifetime
marker if it has an opaque pointer type.

rdar://70560161
2020-10-22 12:25:50 -07:00
Arthur Eubanks 92d9a3868a Port -instnamer to NPM
Some clang tests use this.

Reviewed By: akhuang

Differential Revision: https://reviews.llvm.org/D89931
2020-10-22 12:08:36 -07:00
Layton Kifer d49911c282 [InstCombine][NFC] Use ConstantExpr::getBinOpIdentity
Delete duplicate implementation getSelectFoldableConstant and
replace with ConstantExpr::getBinOpIdentity.

Differential Revision: https://reviews.llvm.org/D89839
2020-10-22 20:44:57 +02:00
Nikita Popov 3e37543111 [MemCpyOpt] Move GEP during call slot optimization
When performing a call slot optimization to a GEP destination, it
will currently usually fail, because the GEP is directly before the
memcpy and as such does not dominate the call. We should move it
above the call if that satisfies the domination requirement.

I think that a constant-index GEP is the only useful thing to move
here, as otherwise isDereferenceablePointer couldn't look through
it anyway. As such I'm not trying to generalize this further.

Differential Revision: https://reviews.llvm.org/D89623
2020-10-22 20:40:56 +02:00
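A hedged sketch of the shape this handles (names hypothetical; assumes the usual call slot preconditions otherwise hold): %dst.gep is defined between the call and the memcpy, so it previously failed the dominance check; being a constant-index GEP, it can simply be moved above @fill.

```llvm
declare void @fill(i8* nocapture)
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i1 immarg)

define void @sketch(i8* noalias dereferenceable(32) %dst) {
  %tmp = alloca [16 x i8]
  %src = bitcast [16 x i8]* %tmp to i8*
  call void @fill(i8* %src)
  %dst.gep = getelementptr inbounds i8, i8* %dst, i64 16  ; constant index: hoistable
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst.gep, i8* %src, i64 16, i1 false)
  ret void
}
```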
Ettore Tiotto e6521ce064 [NFC][PartialInliner]: Clean up code
Make member functions const where possible, use LLVM_DEBUG to print debug traces
rather than a custom option, pass by reference to avoid null checking, ...

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D89895
2020-10-22 14:40:15 -04:00
Vedant Kumar 3419252a79 [InstCombine] Remove dbg.values describing contents of dead allocas
When InstCombine removes an alloca, it erases the dbg.{addr,declare}
instructions which refer to the alloca. It would be better to instead
remove all debug intrinsics which describe the contents of the dead
alloca, namely all dbg.value(<dead alloca>, ..., DW_OP_deref)'s.

This effectively undoes work performed in an InstCombine run earlier in
the pipeline by LowerDbgDeclare, which inserts DW_OP_deref dbg.values
before CallInst users of an alloca. The motivating example looks like:

```
  define void @foo(i32 %0) {
    %a = alloca i32              ; This alloca is erased.
    store i32 %0, i32* %a
    dbg.value(i32 %0, "arg0")    ; This dbg.value survives.
    dbg.value(i32* %a, "arg0", DW_OP_deref)
    call void @trivially_inlinable_no_op(i32* %a)
    ret void
  }
```

If the DW_OP_deref dbg.value is not erased, it becomes dbg.value(undef)
after inlining, making "arg0" unavailable. But we already have dbg.value
descriptions of the alloca's value (from LowerDbgDeclare), so the
DW_OP_deref dbg.value cannot serve its purpose of describing an
initialization of the alloca by some callee. It invalidates other useful
dbg.values, causing large gaps in location coverage, so we should delete
it (even though doing so may cause stale dbg.values to appear, if
there's a dead store to `%a` in @trivially_inlinable_no_op).

OTOH, it wouldn't be correct to delete all dbg.value descriptions of an
alloca. Note that it's possible to describe a variable that takes on
different pointer values, e.g.:

```
  void use(int *);
  void t(int a, int b) {
    int *local = &a;     // dbg.value(i32* %a.addr, "local")
    local = &b;          // dbg.value(i32* undef, "local")
    use(&a);             //           (note: %b.addr is optimized out)
    local = &a;          // dbg.value(i32* %a.addr, "local")
  }
```

In this example, the alloca for "b" is erased, but we need to describe
the value of "local" as <unavailable> before the call to "use". This
prevents "local" from appearing to be equal to "&a" at the callsite.

rdar://66592859

Differential Revision: https://reviews.llvm.org/D85555
2020-10-22 10:00:13 -07:00
Serguei Katkov 75d0e0cd5f [IRCE] consolidate profitability check
Use BFI if it is available and BPI otherwise.
This is a promised follow-up after D89541.

Reviewers: ebrevnov, mkazantsev
Reviewed By: ebrevnov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D89773
2020-10-22 11:26:45 +07:00
Zequan Wu 2f29341114 Revert "Revert "SimplifyCFG: Clean up optforfuzzing implementation""
This reverts commit 716f7636e1.
2020-10-21 17:08:56 -07:00
Zequan Wu 716f7636e1 Revert "SimplifyCFG: Clean up optforfuzzing implementation"
See discussion: https://reviews.llvm.org/D89590
This reverts commit cdd006eec9.
2020-10-21 16:56:32 -07:00
Arthur Eubanks 8d9466a385 [BlockExtract][NewPM] Port -extract-blocks to NPM
Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D89015
2020-10-21 12:51:11 -07:00
Arthur Eubanks aa6c305344 [LowerMatrixIntrinsics][NewPM] Fix PreservedAnalyses result
PreservedCFGCheckerInstrumentation was saying that LowerMatrixIntrinsics
didn't properly preserve CFG even though it claimed to. The legacy pass
says it doesn't. Match the legacy pass's preserved analyses.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D89175
2020-10-21 12:42:16 -07:00
Artur Pilipenko e8cce5ad89 [RS4GC] NFC. Preparatory refactoring to make GC parseable memcpy
For GC parseable element atomic memcpy/memmove we'll need to
shuffle statepoint arguments. Make it possible by storing the
arguments as Value *, not Use *.
2020-10-21 12:38:20 -07:00
Simon Pilgrim 7b4a828452 [InstCombine] foldOrOfICmps - use m_Specific instead of explicit comparisons. NFCI. 2020-10-21 11:53:45 +01:00
Florian Hahn 88241ffb56 [Passes] Move ADCE before DSE & LICM.
The adjustment seems to have very little impact on optimizations.
The only binary change with -O3 MultiSource/SPEC2000/SPEC2006 on X86 is
in consumer-typeset and the size there actually decreases by -0.1%, with
not significant changes in the stats.

On its own, it is mildly positive in terms of compile-time, most likely
due to LICM & DSE having to process slightly less instructions. It
should also be unlikely that DSE/LICM make much new code dead.

http://llvm-compile-time-tracker.com/compare.php?from=df63eedef64d715ce1f31843f7de9c11fe1e597f&to=e3bdfcf94a9eeae6e006d010464f0c1b3550577d&stat=instructions

With DSE & MemorySSA, it gives some nice compile-time improvements, due
to the fact that DSE can re-use the PDT from ADCE, if it does not make
any changes:

http://llvm-compile-time-tracker.com/compare.php?from=15fdd6cd7c24c745df1bb419e72ff66fd138aa7e&to=481f494515fc89cb7caea8d862e40f2c910dc994&stat=instructions

Reviewed By: xbolva00

Differential Revision: https://reviews.llvm.org/D87322
2020-10-21 10:30:56 +01:00
Martin Storsjö 4de215ff18 Revert "[InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support"
Also revert "[InstCombine] foldOrOfICmps - use m_Specific instead of
explicit comparisons. NFCI." to make the primarily intended revert
work.

This reverts commits ce13549761 and
e372a5f86f.

This commit caused failed asserts e.g. like this:

$ cat repro.cpp
bool a(char b) {
  return b >= '0' && b <= '9' || (b | 32) >= 'a' && (b | 32) <= 'z';
}
$ clang++ -target x86_64-linux-gnu -c -O2 repro.cpp
clang++: ../include/llvm/ADT/APInt.h:1151: bool llvm::APInt::operator==(const
llvm::APInt&) const: Assertion `BitWidth == RHS.BitWidth && "Comparison
requires equal bit widths"' failed.
2020-10-21 09:47:18 +03:00
Geoffrey Martin-Noble c17ae2916c Remove unnecessary header include which violates layering
This was introduced in https://reviews.llvm.org/D89774, but I don't
think it should be necessary.

Reviewed By: TaWeiTu, aeubanks

Differential Revision: https://reviews.llvm.org/D89843
2020-10-20 20:14:03 -07:00
Nicolai Hähnle 848a68a032 DomTree: Extract (mostly) read-only logic into type-erased base classes
Avoid having to instantiate and compile a subset of the dominator tree logic
separately for each node type. More importantly, this allows generic
algorithms to be built on top of dominator trees without writing them as
templates -- such algorithms can now use opaque CfgBlockRef and
CfgInterface instead.

A type-erased implementation of dominator trees could be written in
terms of CfgInterface as well, but doing so would change the current
trade-off: it would slightly reduce code size at the cost of a slight
runtime overhead.

This patch does not change the trade-off, as it only does type-erasure
where basic blocks can be treated in a fully opaque way, i.e. it only
moves methods that don't require iteration over CFG successors and
predecessors.

v5:
- rename generic_{begin,end,children} back without the generic_ prefix
  and refer explicitly to base class methods in NewGVN, which wants to
  mutate the order of dominator tree node children directly

v6:
- style change: iDom -> idom; it's arguable whether this is really
  invalid, since it is actually standard camelCase, but clang-tidy
  complains about it so... *shrug*
- rename {to,from}Generic -> {wrap,unwrap}Ref

Change-Id: Ib860dc04cf8bb093d8ed00be7def40d662213672

Differential Revision: https://reviews.llvm.org/D83089
2020-10-20 19:53:07 +02:00
Ta-Wei Tu 529ecd19df [NPM] port -unify-loop-exits to NPM
Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89774
2020-10-20 10:46:57 -07:00
Ta-Wei Tu 59286b36df [NPM] Port -mergereturn to NPM
Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89781
2020-10-20 10:33:58 -07:00
Florian Hahn 2e58010208 [DSE] Do not scan users of memory terminators for further reads.
isMemTerminator checks if the current def is a memory terminator that
terminates the memory pointed to by DefLoc. We do not have to add any of
their users to the worklist, because the follow-on users cannot read the
memory in question.

This leads to more stores eliminated in the presence of lifetime calls.
Previously we added the users of those intrinsics to the worklist,
limiting elimination.

In terms of removed stores, this gives a nice boost on some benchmarks
(MultiSource/SPEC2000/SPEC2006 on X86 with -flto -O3):

Same hash: 205 (filtered out)
Remaining: 32
Metric: dse.NumFastStores

Program                                          base   patch   diff
 test-suite...000/197.parser/197.parser.test     4.00    8.00  100.0%
 test-suite...rolangs-C++/family/family.test     4.00    7.00  75.0%
 test-suite...marks/7zip/7zip-benchmark.test   1722.00 2189.00 27.1%
 test-suite...CFP2000/177.mesa/177.mesa.test    30.00   38.00  26.7%
 test-suite :: External/Nurbs/nurbs.test        44.00   49.00  11.4%
 test-suite...lications/sqlite3/sqlite3.test   115.00  128.00  11.3%
 test-suite...006/447.dealII/447.dealII.test   2715.00 3013.00 11.0%
 test-suite...ProxyApps-C++/CLAMR/CLAMR.test   237.00  261.00  10.1%
 test-suite...tions/lambda-0.1.3/lambda.test    40.00   44.00  10.0%
 test-suite...3.xalancbmk/483.xalancbmk.test   1366.00 1475.00  8.0%
 test-suite...abench/jpeg/jpeg-6a/cjpeg.test    13.00   14.00   7.7%
 test-suite...oxyApps-C++/miniFE/miniFE.test    43.00   46.00   7.0%
 test-suite...lications/ClamAV/clamscan.test   230.00  246.00   7.0%
 test-suite...006/450.soplex/450.soplex.test   284.00  299.00   5.3%
 test-suite...nsumer-jpeg/consumer-jpeg.test    21.00   22.00   4.8%
2020-10-20 16:55:22 +01:00
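A hedged minimal example: the lifetime.end terminates the memory of %a, so the store above it is dead, and the users of the intrinsic itself no longer need to be scanned for reads.

```llvm
declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture)
declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture)

define void @sketch() {
  %a = alloca i32
  %p = bitcast i32* %a to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %p)
  store i32 1, i32* %a       ; dead: the memory is terminated just below
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %p)
  ret void
}
```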
Simon Pilgrim ec228fbfc0 [InstCombine] SimplifyDemandedUseBits - replace dyn_cast<ConstantInt> with m_ConstantInt. NFCI. 2020-10-20 16:45:16 +01:00
Simon Pilgrim ce13549761 [InstCombine] foldOrOfICmps - use m_Specific instead of explicit comparisons. NFCI. 2020-10-20 16:26:41 +01:00
Florian Hahn 6439fde6d4 [DSE] Bail out from getLocForWriteEx if call is not argmemonly/inacc_mem.
This change should currently not have any impact, but guard against
further inconsistencies between MemoryLocation and function attributes.
2020-10-20 14:37:53 +01:00
Simon Pilgrim e372a5f86f [InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support
Reapplied rGa704d8238c86 with a check for integer/integervector types to prevent matching with pointer types
2020-10-20 14:14:26 +01:00
Nicolai Hähnle c0cdd22c72 Introduce CfgTraits abstraction
The CfgTraits abstraction simplifies writing algorithms that are
generic over the type of CFG, and enables writing such algorithms
as regular non-template code that operates on opaque references
to CFG blocks and values.

Implementations of CfgTraits provide operations on the concrete
CFG types, e.g. `IrCfgTraits::BlockRef` is `BasicBlock *`.

CfgInterface is an abstract base class which provides operations
on opaque types CfgBlockRef and CfgValueRef. Those opaque types
encapsulate a `void *`, but the meaning depends on the concrete
CFG type. For example, MachineCfgTraits -- for use with MachineIR
in SSA form -- encodes a Register inside CfgValueRef. Converting
between concrete references and opaque/generic ones is done by
CfgTraits::{fromGeneric,toGeneric}. Convenience methods
CfgTraits::{un}wrap{Iterator,Range} are available as well.

Writing algorithms in terms of CfgInterface adds some overhead
(virtual method calls, plus in some cases it removes the
opportunity to inline iterators), but can be much more convenient
since generic algorithms can be written as non-templates.

This patch adds implementations of CfgTraits for all CFGs on
which dominator trees are calculated, so that the dominator
tree can be ported to this machinery. Only IrCfgTraits (LLVM IR)
and MachineCfgTraits (Machine IR in SSA form) are complete, the
other implementations are limited to the absolute minimum
required to make the upcoming dominator tree changes work.

v5:
- fix MachineCfgTraits::blockdef_iterator and allow it to iterate over
  the instructions in a bundle
- use MachineBasicBlock::printName

v6:
- implement predecessors/successors for all CfgTraits implementations
- fix error in unwrapRange
- rename toGeneric/fromGeneric into wrapRef/unwrapRef to have naming
  that is consistent with {wrap,unwrap}{Iterator,Range}
- use getVRegDef instead of getUniqueVRegDef

v7:
- std::forward fix in wrapping_iterator
- fix typos

v8:
- cleanup operators on CfgOpaqueType
- address other review comments

Change-Id: Ia75f4f268fded33fca11218a7d578c9aec1f3f4d

Differential Revision: https://reviews.llvm.org/D83088
2020-10-20 13:50:52 +02:00
Simon Pilgrim e346ea9905 [InstCombine] SimplifyDemandedUseBits - pass APInt by const reference. NFCI. 2020-10-20 12:13:08 +01:00
Atmn Patel 595c615606 [IR] Adds mustprogress as a LLVM IR attribute
This adds the LLVM IR attribute `mustprogress` as defined in LangRef through D86233. This attribute will be applied to functions with in languages like C++ where forward progress is guaranteed. Functions without this attribute are not required to make progress.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D85393
2020-10-20 03:09:57 -04:00
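A hedged minimal sketch: under mustprogress, a side-effect-free infinite loop is not required to be preserved and may be simplified away; without the attribute the loop must stay.

```llvm
define void @sketch() mustprogress {
entry:
  br label %loop

loop:           ; no side effects and no exit: UB under mustprogress
  br label %loop
}
```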
Serguei Katkov 38799975ce [IRCE] Do not transform if loop has small number of iterations
IRCE has some overhead for runtime checks, and if the number of iterations is
small the overhead can kill the benefit of the optimization.

This CL uses the BlockFrequencyInfo of the pre-header and header to estimate the
number of loop iterations. If it is less than irce-min-estimated-iters, we do not transform the loop.

It would probably be better to build a more complex cost model, but for simplicity this seems to be enough.

BFI is used only with the new pass manager, and the code tries to use it efficiently.

Reviewers: ebrevnov, dantrushin, asbirlea, mkazantsev
Reviewed By: mkazantsev
Subscribers: llvm-commits, fhahn
Differential Revision: https://reviews.llvm.org/D89541
2020-10-20 10:33:59 +07:00
Jordan Rupprecht 8a377f1e3c [NFC] Inline assertion-only variable 2020-10-19 15:11:37 -07:00
Roman Lebedev e0567582b8
[NFCI][SCEV] Always refer to enum SCEVTypes as enum, not integer
The main tricky thing here is forward-declaring the enum:
we have to specify its underlying data type.

In particular, this avoids the danger of thinking we are switching over
SCEVTypes while actually switching over an integer, and thus not being
notified when some case is not handled.

I have updated most such switches to be exhaustive and not have
a default case where that is pretty obviously the intent,
however not all of them.
2020-10-20 00:10:22 +03:00
Roman Lebedev 3355284b2d
[NFC][SCEVExpander] isHighCostExpansionHelper(): rewrite as a switch
If we switch over an enum, the compiler can easily issue a diagnostic
if some case is not handled. However, with an if cascade that isn't so.
Experimental evidence suggests the new behavior is superior.
2020-10-20 00:10:22 +03:00
Simon Pilgrim adb52e5f9e [InstCombine] foldOrOfICmps - only fold (icmp_eq B, 0) | (icmp_ult/gt A, B) for integer types
Fixes a number of stage2 buildbots that were failing when I generalized the m_ConstantInt() logic - that didn't match for pointer types, but m_Zero() does.
2020-10-19 17:05:38 +01:00
Simon Pilgrim 482e6f0041 Revert rGa704d8238c86bac: "[InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support"
This reverts commit a704d8238c.

Causing stage2 build failures on some bots.
2020-10-19 16:03:36 +01:00
Simon Pilgrim de885f1b2a [InstCombine] Add (icmp ne A, 0) | (icmp ne B, 0) --> (icmp ne (A|B), 0) vector support
Scalar cases were already being handled by foldLogOpOfMaskedICmps (so this was dead code), but refactoring to support non-uniform vectors will take some time, so tweak this fold in the meantime.
2020-10-19 15:41:21 +01:00
Simon Pilgrim ecd25086d1 [InstCombine] Add (icmp eq B, 0) | (icmp ult/gt A, B) -> (icmp ule A, B-1) vector support 2020-10-19 15:23:48 +01:00
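A hedged vector instance of the fold: if %b is zero, %b-1 wraps to all-ones and the `ule` is trivially true, matching the is-zero side; otherwise a < b is exactly a <= b-1.

```llvm
define <2 x i1> @sketch(<2 x i32> %a, <2 x i32> %b) {
  %iszero = icmp eq <2 x i32> %b, zeroinitializer
  %ult = icmp ult <2 x i32> %a, %b
  ; expected to fold to: icmp ule %a, (%b - 1)
  %r = or <2 x i1> %iszero, %ult
  ret <2 x i1> %r
}
```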
Simon Pilgrim a704d8238c [InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support 2020-10-19 14:55:18 +01:00
Simon Pilgrim 1d90e53044 [InstCombine] foldOrOfICmps - pull out repeated getOperand() calls. NFCI. 2020-10-19 14:28:08 +01:00
Hans Wennborg 0628bea513 Revert "[PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting"
This broke Chromium's PGO build, it seems because hot-cold-splitting got turned
on unintentionally. See comment on the code review for repro etc.

> This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
> the splitting pass to be toggled on/off. The current method of passing
> `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
> correctly (say, with `-O0` or `-Oz`).
>
> To implement the -fsplit-cold-code option, an attribute is applied to
> functions to indicate that they may be considered for splitting. This
> removes some complexity from the old/new PM pipeline builders, and
> behaves as expected when LTO is enabled.
>
> Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
> Differential Revision: https://reviews.llvm.org/D57265
> Reviewed By: Aditya Kumar, Vedant Kumar
> Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar

This reverts commit 273c299d5d.
2020-10-19 12:31:14 +02:00
Simon Pilgrim 0b7b446a40 [InstCombine] Support vectors-with-undef in and(logicalshift(1,X),1) --> zext(X == 0) fold 2020-10-19 11:10:32 +01:00
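A hedged sketch with an undef lane now accepted in the splat constants: (1 << %x) & 1 is 1 exactly when %x is 0, so this should become zext(icmp eq %x, 0).

```llvm
define <2 x i8> @sketch(<2 x i8> %x) {
  %shl = shl <2 x i8> <i8 1, i8 undef>, %x
  %and = and <2 x i8> %shl, <i8 1, i8 undef>
  ; expected to fold to: zext (icmp eq %x, 0) to <2 x i8>
  ret <2 x i8> %and
}
```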
Roman Lebedev d083d55c2c
[NFC][SCEV] Rename SCEVCastExpr into SCEVIntegralCastExpr
All existing SCEV cast types operate on integers.
D89456 will add SCEVPtrToIntExpr cast expression type.
I believe this is best for consistency.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D89455
2020-10-19 10:59:53 +03:00
Florian Hahn f5cf7f544b [DSE] Do not consider 'noop' intrinsics as read-clobbers.
isNoopIntrinsic returns true for some intrinsics that are modeled in
MemorySSA but do not actually read or write any memory and do not block
DSE. Such intrinsics should not be considered as read-clobbers.
2020-10-18 15:51:05 +01:00
Dávid Bolvanský 65e94cc946 [InferAttrs] Add argmemonly attribute to string libcalls
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89602
2020-10-18 01:33:26 +02:00
Dávid Bolvanský 2a75e956e5 Revert "[InferAttrs] Add argmemonly attribute to string libcalls"
This reverts commit b77dd32a6f. Sanitizer tests are broken.
2020-10-17 23:29:02 +02:00
Dávid Bolvanský b77dd32a6f [InferAttrs] Add argmemonly attribute to string libcalls
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89602
2020-10-17 22:42:36 +02:00
Sanjay Patel 53e92b4c0e [InstCombine] (~A & B) ^ A -> A | B
Differential Revision: https://reviews.llvm.org/D86395
2020-10-17 12:20:18 -04:00
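A hedged minimal instance of the fold: when a bit of A is set, the and side is masked off and the xor passes A's bit through; when it is clear, the xor passes B's bit through, so the result is A | B.

```llvm
define i32 @sketch(i32 %a, i32 %b) {
  %nota = xor i32 %a, -1
  %and = and i32 %nota, %b
  %r = xor i32 %and, %a    ; expected to fold to: or i32 %a, %b
  ret i32 %r
}
```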
Nikita Popov 50cc9a0e61 [MemCpyOpt] Extract common function for unwinding check
These two cases should be using the same logic. Not NFC, as this
resolves the TODO regarding use of the underlying object.
2020-10-17 15:30:39 +02:00
Pedro Tammela 60b19424bb [NFC] fix some typos in LoopUnrollPass
This patch fixes a couple of typos in the LoopUnrollPass.cpp comments

Differential Revision: https://reviews.llvm.org/D89603
2020-10-17 14:20:55 +01:00
Juneyoung Lee 62a0ec1612 Add support for !noundef metadata on loads
This patch adds the metadata !noundef and allows load instructions to optionally carry it.
A load with !noundef always returns a well-defined value (it has no undef bits and isn't poison).
If the loaded value isn't well defined, the behavior is undefined.

This metadata can be used to encode the assumption from C/C++ that certain reads of variables should have well-defined values.
It is helpful for optimizing freeze instructions away, because freeze can be removed when its operand has a well-defined value, and showing that a load from an arbitrary location is well-defined is usually hard otherwise.

The same information can be encoded with llvm.assume with operand bundle; using metadata is chosen because I wasn't sure whether code motion can be freely done when llvm.assume is inserted from clang instead.
The existing codebase already is stripping unknown metadata when doing code motion, so using metadata is UB-safe as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89050
2020-10-17 13:50:10 +09:00
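A hedged sketch of the intended use: the !noundef annotation promises the loaded value is neither undef nor poison, which lets the freeze be folded away.

```llvm
define i32 @sketch(i32* dereferenceable(4) %p) {
  %v = load i32, i32* %p, !noundef !0
  %f = freeze i32 %v       ; removable: %v is already well defined
  ret i32 %f
}

!0 = !{}
```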
Artem Belevich c36c0fabd1 [VectorCombine] Avoid crossing address space boundaries.
We can not bitcast pointers across different address spaces, and VectorCombine
should be careful when it attempts to find the original source of the loaded
data.

Differential Revision: https://reviews.llvm.org/D89577
2020-10-16 13:19:31 -07:00
Benjamin Kramer b740899c50 [Indvars][NFCI] Simplify assertion.
This should be semantically identical. Also avoids unused variable
warnings in Release builds.
2020-10-16 19:58:55 +02:00
Matt Arsenault 0a7cd99a70 Reapply "OpaquePtr: Add type to sret attribute"
This reverts commit eb9f7c28e5.

Previously this was incorrectly handling linking of the contained
type, so this merges the fixes from D88973.
2020-10-16 11:05:02 -04:00
Simon Pilgrim 83ae625f0c [InstCombine] visitAnd - pull out repeated I.getType() calls. NFCI. 2020-10-16 15:43:11 +01:00
Simon Pilgrim 253f24cf4c [InstCombine] Remove custom and(trunc(and(x,c1)),c2) fold
This is more correctly handled by canEvaluateTruncated (one use checks etc.) and covers all the tests cases that were added for this fold.
2020-10-16 15:43:10 +01:00
Michael Liao 98f254960f [globalopt] Teach to look through `addrspacecast`.
- so that global variables in numbered address spaces could be properly
  analyzed.

Differential Revision: https://reviews.llvm.org/D89140
2020-10-16 08:43:09 -04:00
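A hedged sketch (address spaces illustrative): the store to @g is reached only through an addrspacecast, which globalopt can now look through when analyzing the global.

```llvm
@g = internal addrspace(1) global i32 0

define void @sketch() {
  %p = addrspacecast i32 addrspace(1)* @g to i32*
  store i32 1, i32* %p
  ret void
}
```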
Max Kazantsev 0857029011 [Indvars][NFC] Merge two functions together
Unlike any other transform in IndVars, the logic of widenWithVariantUse
is split into a check part and a transform part. We want to pass some
extra flags from the analysis to the transform part and standardize
the code at once, so merge them together.
2020-10-16 19:21:57 +07:00
Simon Pilgrim 981fdf01d5 [InstCombine] foldSelectRotate - canonicalize to OR(SHL,LSHR). NFCI.
Match the canonicalization code that was added to matchFunnelShift at rG02295e6d1a15
2020-10-16 13:18:53 +01:00
Max Kazantsev bb39372e5e [Indvars][NFCI] Remove meaningless restrictive code in IndVars
Variable ExtendOperExpr only exists to check whether it is a SCEV ext.
We create it as SCEV ext right here, so semantically this check is
trivially true. In theory, it may fail if SCEV is smart enough and can
simplify the expression. However, no matter whether it is an ext or not,
we never use this fact for further reasoning. So this code is currently
useless and in theory may become harmful with SCEV's development.

We do not expect any behavior changes from removing it. If it causes
negative changes, the patch should be reverted.
2020-10-16 18:04:31 +07:00
Max Kazantsev 0ee0c7dcc3 [Indvars][NFC] Remove duplicating checks
Some facts have already been checked in widenWithVariantUse and then
checked again in widenWithVariantUseCodegen. The latter is redundant,
we can replace it with asserts.
2020-10-16 17:35:14 +07:00
Simon Pilgrim 1cf347e48b [InstCombine] narrowRotate - minor refactoring for funnel shift support. NFC.
Prep work for PR35155 - renamed narrowRotate to narrowFunnelShift, rewrote some comments and adjusted code to collect separate shift values, although we bail if they don't match (still, only rotations are actually folded).

I'm trying to match matchFunnelShift as much as possible in case we finally get to merge these one day.
2020-10-16 11:27:28 +01:00
Simon Pilgrim 55991b44b7 [InstCombine] foldAndOrOfICmpsOfAndWithPow2 - add vector support
Support vector cases for folding:

 (iszero(A & K1) | iszero(A & K2)) -> (A & (K1 | K2)) != (K1 | K2)
 (!iszero(A & K1) & !iszero(A & K2)) -> (A & (K1 | K2)) == (K1 | K2)
2020-10-16 10:41:40 +01:00
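A hedged vector instance of the first fold with K1 = 1 and K2 = 8: the or of the two is-zero tests is false only when both bits are set, i.e. (A & 9) != 9.

```llvm
define <2 x i1> @sketch(<2 x i32> %a) {
  %a1 = and <2 x i32> %a, <i32 1, i32 1>
  %a2 = and <2 x i32> %a, <i32 8, i32 8>
  %z1 = icmp eq <2 x i32> %a1, zeroinitializer
  %z2 = icmp eq <2 x i32> %a2, zeroinitializer
  ; expected to fold to: icmp ne (and %a, 9), 9
  %r = or <2 x i1> %z1, %z2
  ret <2 x i1> %r
}
```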
Florian Hahn 51ff04567b Recommit "[DSE] Switch to MemorySSA-backed DSE by default."
After investigation by @asbirlea, the issue that caused the
revert appears to be an issue in the original source, rather
than a problem with the compiler.

This patch enables MemorySSA DSE again.

This reverts commit 915310bf14.
2020-10-16 09:02:53 +01:00
Vedant Kumar 273c299d5d [PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting
This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
the splitting pass to be toggled on/off. The current method of passing
`-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
correctly (say, with `-O0` or `-Oz`).

To implement the -fsplit-cold-code option, an attribute is applied to
functions to indicate that they may be considered for splitting. This
removes some complexity from the old/new PM pipeline builders, and
behaves as expected when LTO is enabled.

Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
Differential Revision: https://reviews.llvm.org/D57265
Reviewed By: Aditya Kumar, Vedant Kumar
Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar
2020-10-15 23:13:33 +00:00
Florian Hahn 89c0124273 [LoopVersion] Unify SCEVChecks and alias check handling (NFC).
This is an initial cleanup of the way LoopVersioning interacts with LAA.

Currently LoopVersioning has 2 ways of initializing things:

1. Passing LAI and passing UseLAIChecks = true
2. Passing UseLAIChecks = false, followed by calling setSCEVChecks and
   setAliasChecks.

Both ways of initializing lead to the same result and the duplication
seems more complicated than necessary.

This patch removes the UseLAIChecks flag from the constructor and the
setSCEVChecks & setAliasChecks helpers and move initialization
exclusively to the constructor.

This simplifies things, by providing a single way to initialize
LoopVersioning and reducing duplication.

Reviewed By: Meinersbur, lebedev.ri

Differential Revision: https://reviews.llvm.org/D84406
2020-10-15 22:02:17 +01:00
David Green 13ec3dd66f [LV] Add a getRecurrenceBinOp and make use of it. NFC 2020-10-15 18:21:41 +01:00
Hiroshi Yamauchi 1ebee7adf8 [PGO] Remove the old memop value profiling buckets.
Following up D81682 and D83903, remove the code for the old value profiling
buckets, which have been replaced with the new, extended buckets and disabled by
default.

Also syncing InstrProfData.inc between compiler-rt and llvm.

Differential Revision: https://reviews.llvm.org/D88838
2020-10-15 10:09:49 -07:00
Simon Pilgrim 23f1616626 [InstCombine] Use m_SpecificInt instead of m_APInt + comparison. NFCI. 2020-10-15 16:06:27 +01:00
Simon Pilgrim b3330ae42c [InstCombine] SimplifyDemandedUseBits - xor - refactor cast<ConstantInt> usage to PatternMatch. NFCI.
First step towards replacing these to add full vector support.
2020-10-15 16:06:23 +01:00
Simon Pilgrim 2b45639ea0 [InstCombine] InstCombineAndOrXor - refactor cast<ConstantInt> usages to PatternMatch. NFCI.
First step towards replacing these to add full vector support.
2020-10-15 16:06:17 +01:00
Simon Pilgrim 09be7623e4 [InstCombine] visitXor - refactor ((X^C1)>>C2)^C3 -> (X>>C2)^((C1>>C2)^C3) fold. NFCI.
This is still ConstantInt-only (scalar) but is refactored to use PatternMatch to make adding vector support in the future relatively trivial.
2020-10-15 14:38:15 +01:00
Simon Pilgrim fadd152317 [AggressiveInstCombine] foldAnyOrAllBitsSet - add uniform vector support
Replace m_ConstantInt with m_APInt to support uniform vectors (with no undef elements)

Adding non-uniform support would involve some refactoring of the MaskOps struct, but this might still be worth it.
2020-10-15 11:02:35 +01:00
Simon Pilgrim 60ba9233d1 Revert rG25a97c3a43d7 - "[InstCombine] visitCallInst - retain undefs in vector funnel shift amounts"
This reverts commit 25a97c3a43.

We have other constant folds that fold undef funnel shift amounts to 0 - so we need to be consistent.

If we end up with regressions where we lose a splat shift amount pattern we'll have to investigate other canonicalizations, but matchFunnelShift currently protects us from that.
2020-10-14 18:14:37 +01:00
Matt Arsenault 6a9484f4bf InstCombine: Fix losing load properties in copy-constant-to-alloca
Preserve the alignment and metadata. Atomic loads are skipped for
this, but pass along the properties for consistency.
2020-10-14 12:55:25 -04:00
Matt Arsenault 6da31fa4a6 InstCombine: Fix infinite loop in copy-constant-to-alloca transform
This was broken by 16295d521e, when
instructions started being handled and not just constant
expressions. This was re-inserting an equivalent bitcast to the
original memcpy operand, which made a non-functional IR change on
every iteration.

This also fixes a secondary problem where it was inserting
addrspacecasts which may not have been legal (i.e. it changed the
source address space). Start visiting all pointer users and fail out
if we can't process them. Also start handling the relevant memory
intrinsic users. These cases can be dealt with by running
InferAddressSpaces separately.
2020-10-14 12:55:25 -04:00
Florian Hahn 93f6c6b79c Recommit "[VPlan] Use VPValue def for VPMemoryInstructionRecipe."
This reverts the revert commit 710aceb645
and includes a fix for a memsan failure.

Original message:

    This patch turns VPMemoryInstructionRecipe into a VPValue and uses it
    during VPlan construction and code generation instead of the plain IR
    reference where possible.
2020-10-14 17:41:23 +01:00
Simon Pilgrim 89657b3a3b [InstCombine] narrowRotate - canonicalize to OR(SHL,LSHR). NFCI.
Match the canonicalization code that was added to matchFunnelShift at rG02295e6d1a15
2020-10-14 16:45:00 +01:00
Simon Pilgrim 89a2a47870 [InstCombine] Add m_SpecificIntAllowUndef pattern matcher
m_SpecificInt doesn't accept undef elements in a vector splat value - tweak specific_intval to optionally allow undefs and add the m_SpecificIntAllowUndef variants.

Allows us to remove the m_APIntAllowUndef + comparison hack inside matchFunnelShift
2020-10-14 16:15:53 +01:00
Simon Pilgrim 25a97c3a43 [InstCombine] visitCallInst - retain undefs in vector funnel shift amounts
By always performing a modulo on the shift amount constants, this was causing undef amounts to be replaced with zero, meaning we were losing funnel shift by splat (with undef) patterns.

Tweaked the shift amount bounds check to support (passthrough) undefs, and use Constant::mergeUndefsWith to preserve the undefs after folding.
2020-10-14 14:38:21 +01:00
Roman Lebedev 7ee6c40247
Revert "Reland "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"" and it's follow-ups
While we haven't encountered an earth-shattering problem with this yet,
by now it is pretty evident that trying to model the ptr->int cast
implicitly leads to having to update every single place that assumed
no such cast could be needed. That is of course the wrong approach.

Let's back this out, and re-attempt with another approach,
possibly one originally suggested by Eli Friedman in
https://bugs.llvm.org/show_bug.cgi?id=46786#c20
which should hopefully spare us this pain and more.

This reverts commits 1fb6104293,
7324616660,
aaafe350bb,
e92a8e0c74.

I've kept&improved the tests though.
2020-10-14 16:09:18 +03:00
Juneyoung Lee 9b3c2a72e4 [ValueTracking] Use assume's noundef operand bundle
This patch updates `isGuaranteedNotToBeUndefOrPoison` to use `llvm.assume`'s `noundef` operand bundle.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89219
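
For example, a query can now make use of a bundle like this (a sketch; %x is
an arbitrary value):

```
call void @llvm.assume(i1 true) [ "noundef"(i32 %x) ]
; isGuaranteedNotToBeUndefOrPoison(%x) may now return true at points
; dominated by the assume
```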
2020-10-14 20:16:33 +09:00
Evgeniy Brevnov d0c95808e5 [LV] Unroll factor is expected to be > 0
LV fails with an assertion checking that UF > 0. We already set UF to 1 if it is 0, except in the case when IC > MaxInterleaveCount. The fix is to set UF to 1 for that case as well.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D87679
2020-10-14 16:48:17 +07:00
Simon Pilgrim 1e4d882f9a [InstCombine] matchFunnelShift - add support for non-uniform vectors containing undefs.
Replace m_SpecificInt with m_APIntAllowUndef to match splats containing undefs, then use ConstantExpr::mergeUndefsWith to merge the undefs together in the result.

The undef funnel shift amounts are getting replaced with zero later on - I'll address this in a later patch, otherwise we lose potential shift by splat value patterns.
2020-10-14 10:42:27 +01:00
sstefan1 ce16be253c [Attributor][NFC] Make `createShallowWrapper()` available outside of Attributor
D85703 will need to create shallow wrappers in order to track the spmd icv. We need to make it available.

Differential Revision: https://reviews.llvm.org/D89342
2020-10-14 10:08:59 +02:00
Arthur Eubanks 518ec05a10 [LoopExtract][NewPM] Port -loop-extract to NPM
-loop-extract-single is just -loop-extract on one loop.

-loop-extract depended on -break-crit-edges and -loop-simplify in the
legacy PM, but the NPM doesn't allow specifying pass dependencies like
that, so manually add those passes to the RUN lines where necessary.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89016
2020-10-13 22:55:42 -07:00
Nikita Popov 3b31f05372 [LICM] Don't require AST in LoopPromoter (NFC)
While promotion currently always has an AST available, it is only
relevant for invalidation purposes in LoopPromoter, so we do not
need to have it as a hard dependency.
2020-10-13 22:08:49 +02:00
Nikita Popov cd6f40f432 [MemCpyOpt] Add test scaffolding for MSSA based MemCpyOpt
This adds an -enable-memcpyopt-memoryssa option that currently does
nothing apart from requiring MSSA as a dependency. The tests are
split to run both with the option disabled and enabled. I went with
this rather than the separate directory DSE uses, as I found it
convenient to have a direct side-by-side comparison of differences.

Differential Revision: https://reviews.llvm.org/D89206
2020-10-13 21:45:05 +02:00
Nikita Popov e79ca751fc [MemCpyOpt] Fix MemorySSA preservation
moveUp() moves instructions, so we should move the corresponding
memory accesses as well. We should also move the store instruction
itself: Even though we'll end up removing it later, this gives us
a correct MemoryDef to replace.

The implementation is somewhat more complicated than it should be,
because we also handle the case where P does not have a memory
access due to a degenerate AA pipeline. Hopefully, the need for this
will go away in the future, when the rest of the pass is based on
MSSA.

Differential Revision: https://reviews.llvm.org/D88778
2020-10-13 21:39:09 +02:00
Nikita Popov baa3b87015 [MemCpyOpt] Don't shorten memset if memcpy operands may be the same
If the memcpy operands are the same (which is allowed since D86815)
then the memcpy is effectively a no-op and the partially overlapping
memset is not dead.

Differential Revision: https://reviews.llvm.org/D89192
2020-10-13 21:19:19 +02:00
Nikita Popov 39c39e8a7f [MemCpyOpt] Don't shorten memset if destination observable through unwinding
MemCpyOpt can shorten a memset if it is later partially overwritten
by a memcpy. It checks that the destination is not read in between,
but we also need to make sure that the destination cannot be observed
via unwinding.

Differential Revision: https://reviews.llvm.org/D89190
2020-10-13 21:12:19 +02:00
Xun Li 0ccf9263cc [ASAN] Make sure we are only processing lifetime markers with offset 0 to alloca
This patch addresses https://bugs.llvm.org/show_bug.cgi?id=47787 (and hence https://bugs.llvm.org/show_bug.cgi?id=47767 as well).
In the later instrumentation code, we always use the beginning of the alloca as the base for instrumentation, ignoring any offset into the alloca.
Because of that, we should only instrument a lifetime marker if it's actually pointing to the beginning of the alloca.

Differential Revision: https://reviews.llvm.org/D89191
2020-10-13 10:21:45 -07:00
Nikita Popov 6713332fdd [LoopVersioningLICM] Fix noalias metadata emission
The previous code added the scope on each iteration, so that the
same scope was represented many times in the same !noalias metadata.
That's legal, and semantically equivalent to only storing the scope
once, but it's also wasteful and may pessimize further optimization
if AATags get intersected naively, as done by the AliasSetTracker.
2020-10-13 18:58:05 +02:00
Simon Pilgrim 9c3138bd6d [InstCombine] visitTrunc - pass through undefs for trunc(shift(trunc/ext(x),c)) patterns
Based on the recent patches D88475 and D88429 where we are losing undef values due to extension/comparisons.

I've added a Constant::mergeUndefsWith method that merges the undef scalar/elements from another Constant into a specific Constant.

Differential Revision: https://reviews.llvm.org/D88687
2020-10-13 14:35:18 +01:00
Vitaly Buka 710aceb645 Revert "[VPlan] Use VPValue def for VPMemoryInstructionRecipe."
It introduced a memory leak.

This reverts commit 525b085a65.
2020-10-13 03:14:08 -07:00
Simon Pilgrim 5df61724a1 [InstCombine] Support uniform vector splats in ((((X >> C) & CC) + Y) << C) folds.
Add support for uniform vector splats (no undefs).
2020-10-13 09:28:39 +01:00
Roman Lebedev 1fb6104293
Reland "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"
This relands commit 1c021c64ca which was
reverted in commit 17cec6a11a because
an assertion was being triggered, since `BuildConstantFromSCEV()`
wasn't updated to handle the case where the constant we want to truncate
is actually a pointer. I was unsuccessful in coming up with a test case
where we'd end up there with a constant zext/sext of a pointer,
so I didn't handle those cases until there is a test case.

Original commit message:

While we indeed can't treat them as no-ops, I believe we can/should
do better than just modelling them as `unknown`. The `inttoptr` story
is complicated, but for `ptrtoint`, it seems straightforward
to model it just as a zext-or-trunc of unknown.

This may be important now that we track towards
making inttoptr/ptrtoint casts not no-op,
and towards preventing folding them into loads/etc
(see D88979/D88789/D88788)

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88806
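
Roughly, assuming a datalayout with 64-bit pointers, the new modeling is:

```
%a = ptrtoint i8* %p to i32   ; modeled as a trunc of the unknown %p
%b = ptrtoint i8* %p to i64   ; modeled as the unknown %p itself
%c = ptrtoint i8* %p to i128  ; modeled as a zext of the unknown %p
```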
2020-10-12 23:02:55 +03:00
Simon Pilgrim 4ff7136268 [InstCombine] FoldShiftByConstant - create Scalar/Vector constant with ConstantInt::get(). NFCI.
There's no need to create constant vector splats manually - missed this one in rG24dd0cd1edd5
2020-10-12 18:39:45 +01:00
Simon Pilgrim 24dd0cd1ed [InstCombine] FoldShiftByConstant - create Scalar/Vector constant with ConstantInt::get(). NFCI.
There's no need to create constant vector splats manually.
2020-10-12 18:17:20 +01:00
Simon Pilgrim 2de368f6a7 [InstCombine] FoldShiftByConstant - merge equivalent types. NFCI.
Consistently use the original shift instruction's Type/BitWidth instead of the operands, casted values etc.
2020-10-12 18:17:19 +01:00
Florian Hahn 525b085a65 [VPlan] Use VPValue def for VPMemoryInstructionRecipe.
This patch turns VPMemoryInstructionRecipe into a VPValue and uses it
during VPlan construction and code generation instead of the plain IR
reference where possible.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84680
2020-10-12 18:02:33 +01:00
Hans Wennborg 17cec6a11a Revert 1c021c64c "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"
> While we indeed can't treat them as no-ops, I believe we can/should
> do better than just modelling them as `unknown`. The `inttoptr` story
> is complicated, but for `ptrtoint`, it seems straightforward
> to model it just as a zext-or-trunc of unknown.
>
> This may be important now that we track towards
> making inttoptr/ptrtoint casts not no-op,
> and towards preventing folding them into loads/etc
> (see D88979/D88789/D88788)
>
> Reviewed By: mkazantsev
>
> Differential Revision: https://reviews.llvm.org/D88806

It caused the following assert during Chromium builds:

  llvm/lib/IR/Constants.cpp:1868:
  static llvm::Constant *llvm::ConstantExpr::getTrunc(llvm::Constant *, llvm::Type *, bool):
  Assertion `C->getType()->isIntOrIntVectorTy() && "Trunc operand must be integer"' failed.

See code review for a link to a reproducer.

This reverts commit 1c021c64ca.
2020-10-12 18:39:35 +02:00
Florian Hahn ea058d289c [VPlan] Use operands for printing of VPWidenMemoryInstructionRecipe.
Now that operands of the recipe are managed through VPUser, we can
simplify the printing by just using the operands.
2020-10-12 16:51:54 +01:00
Florian Hahn ad5541045a [LoopDeletion] Remove over-eager SCEV verification.
60b852092c introduced SCEV verification to
deleteDeadLoop, but it appears this check is currently a bit over-eager
and some users of deleteDeadLoop appear to only patch up SE after
calling it (e.g. PR47753).

Remove the extra check for now. We can consider adding it back once we
have tracked down the source of the inconsistency for PR47753.
2020-10-12 16:18:30 +01:00
Simon Pilgrim bbf3925879 [InstCombine] matchFunnelShift - fold or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) iff x < bw (REAPPLIED)
If value tracking can confirm that a shift value is less than the type bitwidth then we can more confidently fold general or(shl(a,x),lshr(b,sub(bw,x))) patterns to a funnel/rotate intrinsic pattern without causing bad codegen regressions in the backend (see D89139).

Reapplied after the shift canonicalization in rG02295e6d1a15 which removed the need to flip the shift values.

Differential Revision: https://reviews.llvm.org/D88783
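
A minimal sketch of the pattern, assuming value tracking can prove the
shift amount is below the bitwidth (here via the mask):

```
%x = and i32 %shamt, 15
%sub = sub i32 32, %x
%shl = shl i32 %a, %x
%shr = lshr i32 %b, %sub
%r = or i32 %shl, %shr
; expected: %r = call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 %x)
```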
2020-10-12 16:06:41 +01:00
Simon Pilgrim fa56623370 [InstCombine] matchFunnelShift - remove shift value commutation. NFCI.
After rG02295e6d1a15 we no longer need to invert the shift values for fshr - this is just hidden at the moment as funnel shifts only ever match for constant values so never use the fshr "Sub on SHL" path.
2020-10-12 15:55:18 +01:00
Simon Pilgrim 02295e6d1a [InstCombine] matchFunnelShift - canonicalize to OR(SHL,LSHR). NFCI.
Simplify the shift amount matching code by canonicalizing the shift ops first.
2020-10-12 15:10:59 +01:00
Simon Pilgrim 45d785e22b Revert rGb97093e520036f8 - "[InstCombine] matchFunnelShift - fold or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) iff x < bw"
This reverts commit b97093e520.

Funnel shift argument commutation isn't working correctly
2020-10-12 11:38:52 +01:00
Roman Lebedev 1c021c64ca
[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown
While we indeed can't treat them as no-ops, I believe we can/should
do better than just modelling them as `unknown`. The `inttoptr` story
is complicated, but for `ptrtoint`, it seems straightforward
to model it just as a zext-or-trunc of unknown.

This may be important now that we track towards
making inttoptr/ptrtoint casts not no-op,
and towards preventing folding them into loads/etc
(see D88979/D88789/D88788)

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88806
2020-10-12 11:04:03 +03:00
David Sherwood c5ba0d33cc [SVE] Make ElementCount and TypeSize use a new PolySize class
I have introduced a new template PolySize class, where the template
parameter determines the type of quantity, i.e. for an element
count this is just an unsigned value. The ElementCount class is
now just a simple derivation of PolySize<unsigned>, whereas TypeSize
is more complicated because it still needs to contain the uint64_t
cast operator, since there are still many places in the code that
rely upon this implicit cast. As such the class also still needs
some of its own operators.

I've tried to minimise the amount of code in the base PolySize
class, which led to a couple of changes:

1. In some places we were relying on '==' operator comparisons
between ElementCounts and the scalar value 1. I didn't put this
operator in the new PolySize class, and thought it was actually
clearer to use the isScalar() function instead.
2. I removed the isByteSized function and replaced it with calls
to isKnownMultipleOf(8).

I've also renamed NextPowerOf2 to be coefficientNextPowerOf2 so
that it's more consistent with coefficientDivideBy.

Differential Revision: https://reviews.llvm.org/D88409
2020-10-12 08:23:38 +01:00
Roman Lebedev 544a6aa267
[InstCombine] combineLoadToOperationType(): don't fold int<->ptr cast into load
And another step towards transforms not introducing inttoptr and/or
ptrtoint casts that weren't there already.

As we've been establishing (see D88788/D88789), if there is an int<->ptr cast,
it basically must stay as-is, we can't do much with it.

I've looked, and the biggest source of new such casts being introduced,
as far as I can tell, is this transform, which, ironically,
tries to reduce the count of casts..

On vanilla llvm test-suite + RawSpeed, @ `-O3`, this results in
-33.58% less `IntToPtr`s (19014 -> 12629)
and +76.20% more `PtrToInt`s (18589 -> 32753),
which is an increase of +20.69% in total.

However, just on RawSpeed, where I know there are basically
no `IntToPtr`s in the original source code,
this results in -99.27% less `IntToPtr`s (2724 -> 20)
and +82.92% more `PtrToInt`s (4513 -> 8255).
which is again an increase of 14.34% in total.

To me this does seem like a step in the right direction,
we end up with strictly less `IntToPtr`, but strictly more `PtrToInt`,
which seems like a reasonable trade-off.

See https://reviews.llvm.org/D88860 / https://reviews.llvm.org/D88995
for some more discussion on the subject.

(Eventually, `CastInst::isNoopCast()`/`CastInst::isEliminableCastPair`
should be taught about this, yes)

Reviewed By: nlopes, nikic

Differential Revision: https://reviews.llvm.org/D88979
2020-10-11 20:24:28 +03:00
David Green be6e8e50f4 [LV] Tail folded inloop reductions.
This expands upon the inloop reductions added in e9761688e41cb9e976,
allowing them to be inserted into tail folded loops. Reductions are
generated in the form:

  x = select(mask, vecop, zero)
  v = vecreduce.add(x)
  c = add chain, v

Where zero here is chosen as the identity value for add reductions. The
backend is then expected to fold the select and the vecreduce into a
single predicated instruction.

Most of the code is fairly straightforward, except for the creation of
block masks, which need to be created in dominance order. The
order they are added is altered to be after any phis, keeping the
requirements for the underlying IR.

Differential Revision: https://reviews.llvm.org/D84451
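
A minimal sketch of that form for a <4 x i32> add reduction (the names are
illustrative):

```
%x = select <4 x i1> %mask, <4 x i32> %vecop, <4 x i32> zeroinitializer
%v = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)
%c = add i32 %chain, %v
```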
2020-10-11 16:58:34 +01:00
Sanjay Patel 3f3356bdd9 [InstCombine] allow vector splats for add+xor --> shifts 2020-10-11 09:04:24 -04:00
Sanjay Patel f81200ae99 [InstCombine] add one-use check to add+xor transform
As shown in the affected test, we could increase instruction
count without this limitation. There's another test with an extra
use that shows we still convert directly to a real "sext" if
possible.
2020-10-11 09:04:24 -04:00
Simon Pilgrim b97093e520 [InstCombine] matchFunnelShift - fold or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) iff x < bw
If value tracking can confirm that a shift value is less than the type bitwidth then we can more confidently fold general or(shl(a,x),lshr(b,sub(bw,x))) patterns to a funnel/rotate intrinsic pattern without causing bad codegen regressions in the backend (see D89139).

Differential Revision: https://reviews.llvm.org/D88783
2020-10-11 10:37:20 +01:00
Simon Pilgrim b752daa26b [InstCombine] Replace getLogBase2 internal helper with ConstantExpr::getExactLogBase2. NFCI.
This exposes the helper for other power-of-2 instcombine folds that I'm intending to add vector support to.

The helper only operated on power-of-2 constants so getExactLogBase2 is a more accurate name.
2020-10-11 10:31:17 +01:00
Xun Li 667dfe39ca [Coroutines] Refactor/Rewrite Spill and Alloca processing
This patch is a refactoring of how we process spills and allocas during CoroSplit.
In the previous implementation, everything that needs to go to the heap is put into Spills, including all the values defined by allocas.
And the way to identify a Spill is to check whether there exists a use-def relationship that crosses suspension points.

This approach is fundamentally confusing, and unfortunately, incorrect.
First of all, allocas are always processed differently than spills, hence it's quite confusing to put them together. It's much cleaner to separate them and process them separately.
Doing so simplifies lots of code and makes the logic clearer and easier to reason about.

Secondly, a use-def relationship is insufficient to decide whether a value defined by an AllocaInst needs to go to the heap.
There are many cases where a value defined by AllocaInst can implicitly be used across suspension points without a direct use-def relationship.
For example, you can store the address of an alloca into the heap, and load that address after suspension. Or you can escape the address into an object through a function call.
Or you can have a PHINode that takes two allocas, and this PHINode is used across a suspension point (when this happens, the existing implementation will spill the PHINode, a.k.a. a stack address, to the heap!).
All these issues suggest that we need to separate spill and alloca in order to properly implement this.
This patch does not yet fix these bugs, however it sets up the code in a better shape so that we can start fixing them in the next patch.

The core idea of this patch is to add a new struct called FrameDataInfo, which contains all Spills, all Allocas, and a map from each definition to its layout index in the frame (FieldIndexMap).
Spills and Allocas are identified, stored and processed independently. When they are initially added to the frame, we record their field index through FieldIndexMap. When the frame layout is finalized, we update each index to its final layout index.

In doing so, I also cleaned up a few things and also discovered a few other bugs.

Cleanups:
1. Found out that PromiseFieldId is not used, delete it.
2. Previously, SpillInfo was a vector, which was strange because every def can have multiple users. This patch cleans it up by turning it into a map from def to users.
3. Previously, a frame Field struct contained a list of the Spills that the field corresponds to. This isn't necessary since we only need the layout index for each given definition. This patch removes that list. Instead, we connect each field and definition using the FieldIndexMap.
4. All the loops that process Spills are simplified now because we use a map instead of a vector.

Bugs:
It seems that we are only keeping llvm.dbg.declare intrinsics in the .resume part of the function. The ramp function will no longer have them. This means we are dropping some debug information in the ramp function.

The next step is to start fixing the bugs where the implementation fails to identify some allocas that should live on the frame.

Differential Revision: https://reviews.llvm.org/D88872
2020-10-10 22:21:34 -07:00
Simon Pilgrim 702ccb40e2 [InstCombine] getLogBase2(undef) -> 0.
Move the undef element handling into the getLogBase2 helper instead of pre-empting with replaceUndefsWith.
2020-10-10 20:29:03 +01:00
Simon Pilgrim 3aab3cbd4a [InstCombine] getLogBase2 - no need to specify Type. NFCI.
In all the getLogBase2 uses, the specified Type is always the same as the constant being folded.
2020-10-10 20:09:55 +01:00
Nikita Popov 5e855f1e80 [MemCpyOpt] Don't hoist store that's not guaranteed to execute
MemCpyOpt can hoist stores while merging load+store pairs into a memcpy.
This hoisting can currently result in stores being executed that
weren't guaranteed to execute in the original program.

Differential Revision: https://reviews.llvm.org/D89154
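
A minimal sketch of the hazard (@may_throw is a hypothetical external
call):

```
%v = load i32, i32* %src
call void @may_throw()   ; if this unwinds, the store never executes
store i32 %v, i32* %dst  ; hoisting this store above the call is unsound
```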
2020-10-10 10:26:28 +02:00
Changpeng Fang f192a27ed3 Sink: Handle instruction sink when a user is dead
Summary:
  The current instruction sink pass uses findNearestCommonDominator of all users to find the block to sink the instruction to.
However, a user may be in a dead block, which will result in unexpected behavior.

This patch handles such cases by skipping dead blocks. This patch fixes:
https://bugs.llvm.org/show_bug.cgi?id=47415

Reviewers:
  MaskRay, arsenm

Differential Revision:
  https://reviews.llvm.org/D89166
2020-10-09 16:20:26 -07:00
Eli Friedman 278299b0f0 [SCCP] Reduce the number of times ResolvedUndefsIn is called for large modules.
If a module has many values that need to be resolved by
ResolvedUndefsIn, compilation takes quadratic time overall. Solve should
do a small amount of work, since not much is added to the worklists each
time markOverdefined is called. But ResolvedUndefsIn is linear over the
length of the function/module, so resolving one undef at a time is
quadratic in general.

To solve this, make ResolvedUndefsIn resolve every undef value at once,
instead of resolving them one at a time. This loses a little
optimization power, but can be a lot faster.

We still need a loop around ResolvedUndefsIn because markOverdefined
could change the set of blocks that are live. That should be uncommon,
hopefully. We could optimize it by tracking which blocks transition from
dead to live, instead of iterating over the whole module to find them.
But I'll leave that for later. (The whole function will become a lot
simpler once we start pruning branches on undef.)

The regression test changes seem minor. The specific cases in question
could probably be optimized with a bit more work, but they seem like
edge cases that don't really matter.

Fixes an "infinite" compile issue my team found on an internal workoad.

Differential Revision: https://reviews.llvm.org/D89080
2020-10-09 15:24:16 -07:00
Giorgis Georgakoudis 3a6bfcf2f9 [OpenMPOpt] Merge parallel regions
There are cases that generated OpenMP code consists of multiple,
consecutive OpenMP parallel regions, either due to high-level
programming models, such as RAJA, Kokkos, lowering to OpenMP code, or
simply because the programmer parallelized code this way.  This
optimization merges consecutive parallel OpenMP regions to: (1) reduce
the runtime overhead of re-activating a team of threads; (2) enlarge the
scope for other OpenMP optimizations, e.g., runtime call deduplication
and synchronization elimination.

This implementation defensively merges parallel regions, only when they
are within the same BB and any in-between instructions are safe to
execute in parallel.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D83635
2020-10-09 09:59:04 -07:00
Arthur Eubanks 0689dab844 [FixIrreducible][NewPM] Port -fix-irreducible to NPM
In the NPM, a pass cannot depend on another non-analysis pass. So pin
the test that tests that -lowerswitch is run automatically to legacy PM.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D89051
2020-10-09 09:22:09 -07:00
Arthur Eubanks 9c21c6c966 [LoopInterchange][NewPM] Port -loop-interchange to NPM
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D89058
2020-10-09 09:21:31 -07:00
Simon Pilgrim 8a836daaa9 [InstCombine] Support lshr(trunc(lshr(x,c1)), c2) -> trunc(lshr(lshr(x,c1),c2)) uniform vector tests
FoldShiftByConstant is hardcoded for scalar/uniform outer shift amounts at the moment, so that needs to be fixed first to support non-uniform cases
2020-10-09 16:54:46 +01:00
Simon Pilgrim 1c040a3e56 [InstCombine] commonShiftTransforms - add support for pow2 nonuniform constant vectors in srem fold
Note: we already fold srem to undef if any denominator vector element is undef.
2020-10-09 15:59:33 +01:00
Sanjay Patel 080e6bc205 [InstCombine] allow vector splats for add+and with high-mask
There might be a better way to specify the pre-conditions,
but this is hopefully clearer than the way it was written:
https://rise4fun.com/Alive/Jhk3

  Pre: C2 < 0 && isShiftedMask(C2) && (C1 == C1 & C2)
  %a = and %x, C2
  %r = add %a, C1
  =>
  %a2 = add %x, C1
  %r = and %a2, C2
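
For instance, one concrete splat instance satisfying the pre-conditions
(C2 = -4 is a negative shifted mask, and C1 = 4 == (C1 & C2)):

  %a = and <2 x i8> %x, <i8 -4, i8 -4>
  %r = add <2 x i8> %a, <i8 4, i8 4>
  =>
  %a2 = add <2 x i8> %x, <i8 4, i8 4>
  %r = and <2 x i8> %a2, <i8 -4, i8 -4>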
2020-10-09 10:39:11 -04:00
Simon Pilgrim 9e796d5e71 [InstCombine] foldShiftOfShiftedLogic - add support for nonuniform constant vectors 2020-10-09 14:25:12 +01:00
Simon Pilgrim 556316cf72 [InstCombine] foldShiftOfShiftedLogic - replace cast<BinaryOperator> with m_BinOp matcher. NFCI.
Allows us to drop the !isa<ConstantExpr> check.
2020-10-09 14:10:12 +01:00
Max Kazantsev 225df71951 [NFC] Add option to disable IV widening if needed
IV widening is sometimes a strictly harmful transform (some examples
of this are shown in tests 11, 12 in widen-loop-comp.ll). One of the
reasons for this is that sometimes SCEV fails to prove some facts after
some of the guards have been widened.

Though each single such case looks like a bug that can be addressed,
it seems that disabling IV widening may be profitable in some cases.
We want to have an option to do so. By default, existing behavior is
preserved and IV widening is on.
2020-10-09 18:32:03 +07:00
Simon Pilgrim d9f064dc0b [InstCombine] visitTrunc - trunc(shl(X, C)) --> shl(trunc(X),trunc(C)) vector support
Annoyingly vectors aren't supported by shouldChangeType(), but we have precedents for always performing this on vector types (e.g. narrowBinOp).

Differential Revision: https://reviews.llvm.org/D89067
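
A minimal vector sketch of the fold:

```
%s = shl <2 x i64> %x, <i64 8, i64 8>
%t = trunc <2 x i64> %s to <2 x i32>
=>
%x32 = trunc <2 x i64> %x to <2 x i32>
%t = shl <2 x i32> %x32, <i32 8, i32 8>
```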
2020-10-08 22:07:51 +01:00
Sanjay Patel f688ae7a0e [InstCombine] allow vector splats for add+xor with low-mask
This can be allowed with undef elements too, but that can be another step:
https://alive2.llvm.org/ce/z/hnC4Z-
2020-10-08 15:53:38 -04:00
Simon Pilgrim 6aa10ae5bf [Transforms] visitCmpBlock - don't dereference a dyn_cast<>. NFCI.
Use cast<> as we immediately dereference the pointer afterwards - cast<> will assert if we fail.

Prevents a clang static analyzer warning that we could dereference a null pointer.
2020-10-08 20:18:32 +01:00
Sanjay Patel 5ac89add1e [InstCombine] remove unnecessary one-use check from add-xor transform
Pre-conditions seem to be optimal, but we don't need a use check
because we are only replacing an add with a sub.

https://rise4fun.com/Alive/hzN

  Pre: (~C1 | C2 == -1) && isPowerOf2(C2+1)
  %m = and i8 %x, C1
  %f = xor i8 %m, C2
  %r = add i8 %f, C3
  =>
  %r = sub i8 C2 + C3, %m
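
One concrete instance satisfying the pre-conditions (C1 = 3 and C2 = 3, so
~C1 | C2 == -1 and C2 + 1 == 4 is a power of 2; with C3 = 5, C2 + C3 == 8):

  %m = and i8 %x, 3
  %f = xor i8 %m, 3
  %r = add i8 %f, 5
  =>
  %r = sub i8 8, %m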
2020-10-08 15:08:51 -04:00
Simon Pilgrim 0716805c02 [SLP] optimizeGatherSequence - assert every Instruction in the worklist is non-null.
Fixes clang static analyzer warning.
2020-10-08 20:02:18 +01:00
Simon Pilgrim 8f0658ae67 [Transforms] CodeExtractor::verifyAssumptionCache - don't dereference a dyn_cast<>. NFCI.
Use cast<> as we immediately dereference the pointer afterwards - cast<> will assert if we fail.

Prevents a clang static analyzer warning that we could dereference a null pointer.
2020-10-08 19:04:30 +01:00
Sanjay Patel b57451b011 [InstCombine] allow vector splats for add+xor with signmask 2020-10-08 10:46:34 -04:00
Simon Pilgrim 5415fef3ab [InstCombine] matchFunnelShift - support non-uniform constant vector shift amounts (PR46895)
Complete basic PR46895 fixes by refactoring D87452/D88402 to allow us to match non-uniform constant values.

We still don't handle non-uniform vectors that contain undef elements, but that can wait until we have a decent generic mechanism for this.

Differential Revision: https://reviews.llvm.org/D88420
2020-10-08 12:56:27 +01:00
Markus Lavin 06758c6a61 [DebugInfo] Improve dbg preservation in LSR.
Use SCEV to salvage additional @llvm.dbg.value that have turned into
referencing undef after transformation (and traditional
salvageDebugInfo). Before transformation compute SCEV for each
@llvm.dbg.value in the loop body and store it (along side its current
DIExpression). After transformation update those @llvm.dbg.value now
referencing undef by comparing its stored SCEV to the SCEV of the
current loop-header PHI-nodes. Allow matching with an offset by inserting
compensation code in the DIExpression.

Includes a fix for the nullptr deref that caused the original commit
to be reverted in 9d63029770.

Fixes : PR38815

Differential Revision: https://reviews.llvm.org/D87494
2020-10-08 13:16:43 +02:00
Simon Pilgrim e1d4ca0009 [InstCombine] matchRotate - add support for matching general funnel shifts with constant shift amounts (PR46896)
First step towards extending the existing rotation support to full funnel shift handling now that the backend legalization support has improved.

This enables us to match the shift by constant cases, which are pretty trivial to expand again if necessary.

D88420 will add non-uniform support for funnel shifts as well once its been finalized.

Differential Revision: https://reviews.llvm.org/D88834
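
A minimal sketch of a constant-amount case this now matches (note the two
distinct operands, i.e. a general funnel shift rather than a rotate):

```
%shl = shl i32 %a, 5
%shr = lshr i32 %b, 27
%r = or i32 %shl, %shr
; expected: %r = call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 5)
```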
2020-10-08 11:05:14 +01:00
Simon Pilgrim aa47962cc9 [InstCombine] canNarrowShiftAmt - replace custom Constant matching with m_SpecificInt_ICMP
The existing code ignores undef values, which matches m_SpecificInt_ICMP. Although m_SpecificInt_ICMP returns false for an all-undef constant, I've added test coverage at rGfe0197e194a64f9 to show that undef folding should already have dealt with that case.
2020-10-08 10:53:32 +01:00
David Green 498f89d188 [LV] Collect dead induction truncates
We currently collect the ICmp and Add from an induction variable,
marking them as dead so that vplan values are not created for them. This
extends that to include any single-use trunc from the ICmp, which allows
the Add to more readily be removed too.

This can help with costing vplan nodes, as the ICmp and Add are more
reliably removed and are not double-counted.

Differential Revision: https://reviews.llvm.org/D88873
2020-10-08 08:28:58 +01:00
Reid Kleckner 940d7aaea9 Port StripGCRelocates pass to NPM
Fixes one test under NPM

Differential Revision: https://reviews.llvm.org/D88766
2020-10-07 14:41:29 -07:00
Reid Kleckner da48fe1732 [NPM] Port strip nonlinetable debuginfo pass to the new pass manager
Fixes a few tests in llvm/test/Transforms/Utils.

Differential Revision: https://reviews.llvm.org/D88762
2020-10-07 14:35:36 -07:00
Amara Emerson 322d0afd87 [llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.
This change renames the intrinsics to not have "experimental" in the name.

The autoupgrader will handle legacy intrinsics.

Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html

Differential Revision: https://reviews.llvm.org/D88787
2020-10-07 10:36:44 -07:00
Roman Lebedev fed0f890e5
InstCombine: Negator: don't rely on complexity sorting already being performed (PR47752)
In some cases, we can negate an instruction if only one of its operands
negates. Previously, we assumed that constants would have been
canonicalized to the RHS already, but that isn't guaranteed to happen,
because of InstCombine worklist visitation order,
as the added test (previously-hanging) shows.

So if we only need to negate a single operand,
we should make sure that we try the constant operand first.
Do that by redoing the complexity sorting ourselves
when we actually care about it.

Fixes https://bugs.llvm.org/show_bug.cgi?id=47752
2020-10-07 15:09:50 +03:00
Max Kazantsev fba42aea43 [NFC] Use getZero instead of getConstant(0) 2020-10-07 13:53:36 +07:00
Roman Lebedev 7fa503ef4a
[SROA] rewritePartition()/findCommonType(): if uses have conflicting type, try getTypePartition() before falling back to largest integral use type (PR47592)
And another step towards transforms not introducing inttoptr and/or
ptrtoint casts that weren't there already.

In this case, when load/store uses have conflicting types,
instead of falling back to the iN, we can try to use the allocated sub-type.
As discussed, this isn't the best idea overall (we shouldn't rely on the
allocated type), but it works fine as a temporary measure.

I've measured, and @ `-O3` as of vanilla llvm test-suite + RawSpeed,
this results in +0.05% more bitcasts, -5.51% less inttoptr
and -1.05% less ptrtoint (at the end of middle-end opt pipeline)

See https://bugs.llvm.org/show_bug.cgi?id=47592

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D88788
2020-10-07 09:20:19 +03:00
Johannes Doerfert 7993d61177 [Attributor] Use smarter way to determine alignment of GEPs
Use the same logic that exists in other places to deal with base-case GEPs.

Add the original Attributor talk example.
2020-10-06 19:31:08 -05:00
Johannes Doerfert c4cfe7a435 [Attributor] Ignore read accesses to constant memory
The old function attribute deduction pass ignores reads of constant
memory and we need to copy this behavior to replace the pass completely.
First step are constant globals. TBAA can also describe constant
accesses and there are other possibilities. We might want to consider
asking the alias analyses that are available but for now this is simpler
and cheaper.
2020-10-06 19:31:07 -05:00
Johannes Doerfert 3f540c05df [Attributor] Give up early on AANoReturn::initialize
If the function is not assumed `noreturn` we should not wait for an
update to mark the call site as "may-return".

This has two kinds of consequences:
  - We have fewer iterations in many tests.
  - We have fewer deductions based on "known information" (since we ask
    earlier, point 1, and therefore assumed information is not "known"
    yet).
The latter is an artifact that we might want to tackle properly at some
point but which is not easily fixable right now.
2020-10-06 19:31:07 -05:00
Nikita Popov 616f545048 [MemCpyOpt] Use dereferenceable pointer helper
The call slot optimization has some home-grown code for checking
whether the destination is dereferenceable. Replace this with the
generic isDereferenceableAndAlignedPointer() helper.

I'm not checking alignment here, because that is currently handled
separately and may be an enforced alignment for allocas. The clean
way of integrating that part would probably be to accept a callback
in isDereferenceableAndAlignedPointer() for the actual isAligned check,
which would then have a chance to use an enforced alignment instead.

This allows the destination to be a GEP (among other things), though
the two open TODOs may prevent it from working in practice.

Differential Revision: https://reviews.llvm.org/D88805
2020-10-06 18:41:19 +02:00
Nikita Popov 6b441ca523 [MemCpyOpt] Check for throwing calls during call slot optimization
When performing call slot optimization for a non-local destination,
we need to check whether there may be throwing calls between the
call and the copy. Otherwise, the early write to the destination
may be observable by the caller.

This was already done for call slot optimization of load/store,
but not for memcpys. For the sake of clarity, I'm moving this check
into the common optimization function, even if that does need an
additional instruction scan for the load/store case.

As efriedma pointed out, this check is not sufficient due to
potential accesses from another thread. This case is left as a TODO.

Differential Revision: https://reviews.llvm.org/D88799
2020-10-06 18:24:40 +02:00
Nikita Popov 80cde02e85 [MemCpyOpt] Add separate statistic for call slot optimization (NFC) 2020-10-06 18:14:10 +02:00
Dávid Bolvanský 86429c4eaf [SimplifyLibCalls] Optimize mempcpy_chk to mempcpy 2020-10-06 17:08:46 +02:00
Johannes Doerfert 4a7a988442 [Attributor][FIX] Move assertion to make it not trivially fail
The idea of this assertion was to check the simplified value before we
assign it, not after, which caused this to trivially fail all the time.
2020-10-06 09:32:18 -05:00
Johannes Doerfert 04f6951397 [Attributor][FIX] Dead return values are not `noundef`
When we assume a return value is dead we might still visit return
instructions via `Attributor::checkForAllReturnedValuesAndReturnInsts(..)`.
When we do so the "returned value" is potentially simplified to `undef`
as it is the assumed "returned value". This is a problem if there was a
preexisting `noundef` attribute that will only be removed as we manifest
the `undef` return value. We should not use this combination to derive
`unreachable` though. Two test cases fixed.
2020-10-06 09:32:18 -05:00
Johannes Doerfert 957094e31b [Attributor][NFC] Ignore benign uses in AAMemoryBehaviorFloating
In AAMemoryBehaviorFloating we used to track benign uses in a SetVector.
With this change we look through benign uses eagerly to reduce the
number of elements (=Uses) we look at during an update.

The test does not actually fail prior to this commit, but I had already
written it, so I kept it.
2020-10-06 09:32:18 -05:00
Simon Pilgrim 17b9a91ec2 [InstCombine] canRewriteGEPAsOffset - don't dereference a dyn_cast<>. NFCI.
We know V is an IntToPtrInst or PtrToIntInst, so we know it's a CastInst - use cast<> directly.

Prevents a clang static analyzer warning that we could dereference a null pointer.
2020-10-06 14:48:34 +01:00
Simon Pilgrim 75d33a3a97 [InstCombine] FoldShiftByConstant - consistently use ConstantExpr in logicalshift(trunc(shift(x,c1)),c2) fold. NFCI.
This still only gets used for scalar types but now always uses ConstantExpr in preparation for vector support - it was using APInt methods in some places.
2020-10-06 14:48:34 +01:00
Simon Pilgrim 21100f885d [InstCombine] FoldShiftByConstant - use PatternMatch for logicalshift(trunc(shift(x,c1)),c2) fold. NFCI. 2020-10-06 13:13:08 +01:00
Simon Pilgrim 0b402e985e [InstCombine] FoldShiftByConstant - remove unnecessary cast<>. NFC.
Op1 is already a Constant*
2020-10-06 13:13:08 +01:00
Serguei Katkov b988898013 [GVN LoadPRE] Extend the scope of optimization by using context to prove safety of speculation
Use the context to prove that the load can be safely executed at the point where it is being hoisted.

Postpone the decision about the safety of speculative load execution until the moment we know
where the load will be hoisted, and check safety in that context.

Reviewers: nikic, fhahn, mkazantsev, lebedev.ri, efriedma, reames
Reviewed By: reames, mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D88725
2020-10-06 09:25:16 +07:00
Vedant Kumar 9afb1c566e Revert "Outline non returning functions unless a longjmp"
This reverts commit 20797989ea.

This patch (https://reviews.llvm.org/D69257) cannot complete a stage2
build due to the change:

```
CI->getCalledFunction()->getName().contains("longjmp")
```

There are several concrete issues here:

  - The callee may not be a function, so `getCalledFunction` can assert.
  - The called value may not have a name, so `getName` can assert.
  - There's no distinction made between "my_longjmp_test_helper" and the
    actual longjmp libcall.

At a higher level, there's a serious layering problem here. The
splitting pass makes policy decisions in a general way (e.g. based on
attributes or profile data). Special-casing certain names breaks the
layering. It subverts the work of library maintainers (who may now need
to opt-out of unexpected optimization behavior for any affected
functions) and can lead to inconsistent optimization behavior (as not
all llvm passes special-case ".*longjmp.*" in the same way).

The patch may need significant revision to address these issues.

But the immediate issue is that this crashes while compiling llvm's unit
tests in a stage2 build (due to the `getName` problem).
2020-10-05 14:10:25 -07:00
Roman Lebedev e00f189d39
[InstCombine] Revert rL226781 "Teach InstCombine to canonicalize loads which are only ever stored to always use a legal integer type if one is available." (PR47592)
(it was introduced in https://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html)

This canonicalization seems dubious.

Most importantly, while it does not create `inttoptr` casts by itself,
it may cause them to appear later, see e.g. D88788.

I think it's pretty obvious that this is an undesirable outcome;
by now we've established that seemingly no-op `inttoptr`/`ptrtoint` casts
are not no-ops, and we are no longer eager to look past them.
Which e.g. means that given
```
%a = load i32, i32* %p
%b = inttoptr i32 %a to i8*
%c = inttoptr i32 %a to i8*
```
we likely won't be able to tell that `%b` and `%c` are the same thing.

As we can see in D88789 / D88788 / D88806 / D75505,
we can't really teach SCEV about this (not without https://bugs.llvm.org/show_bug.cgi?id=47592, at least).
And we can't recover the situation post-inlining in instcombine.

So it really does look like this fold is actively breaking
otherwise-good IR, in a way that is not recoverable.
And that means this fold isn't helpful in exposing the passes
that are otherwise unaware of the patterns it produces.

Thusly, i propose to simply not perform such a canonicalization.
The original motivational RFC does not state what larger problem
that canonicalization was trying to solve, so I'm not sure
how this plays out in the larger picture.

On vanilla llvm test-suite + RawSpeed, this results in an
increase of asm instructions and final object size by ~+0.05%,
and a decrease of the final count of bitcasts by 4.79% (-28990),
of ptrtoint casts by 15.41% (-3423),
and of inttoptr casts by 25.59% (-6919, *sic*).
Overall, there are 0.04% fewer IR blocks and 0.39% fewer instructions.

See https://bugs.llvm.org/show_bug.cgi?id=47592

Differential Revision: https://reviews.llvm.org/D88789
2020-10-06 00:00:30 +03:00
Dávid Bolvanský a4bae56ab8 Revert "[SLC] Optimize mempcpy_chk to mempcpy"
This reverts commit 3f1fd59de3.
2020-10-05 22:27:14 +02:00
Dávid Bolvanský 3f1fd59de3 [SLC] Optimize mempcpy_chk to mempcpy
As reported in PR46735:

void* f(void *d, const void *s, size_t l)
{
    return __builtin___mempcpy_chk(d, s, l, __builtin_object_size(d, 0));
}

This can be optimized to `return mempcpy(d, s, l);`.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D86019
2020-10-05 22:18:36 +02:00
Roman Lebedev 59127de243
[NFC][GCOV] Fix build: there's `llvm::stable_partition()` wrapper 2020-10-05 22:52:32 +03:00
Fangrui Song e338f8fe69 [gcov] Fix non-determinism (DenseMap iteration order) of checksum computation
... by using MapVector. The issue was caused by 63182c2ac0.

Also use stable_partition instead of partition to get stable results
across different STL implementations.
2020-10-05 12:39:36 -07:00
Nikita Popov 3641d375f6 [InstCombine] Handle GEP inbounds in select op replacement (PR47730)
When retrying the "simplify with operand replaced" select
optimization without poison flags, also handle inbounds on GEPs.

Of course, this particular example would also be safe to transform
while keeping inbounds, but the underlying machinery does not
know this (yet).
2020-10-05 21:13:02 +02:00
Simon Pilgrim 8fb4645321 [InstCombine] FoldShiftByConstant - use m_Specific. NFCI.
Use m_Specific instead of m_Value followed by an equality check - we already do this for the similar folds above; it looks like an oversight in rG2b459fe7e1e, where the original pattern match code looked a little different.
2020-10-05 18:56:10 +01:00
Simon Pilgrim 4ce61144cb [InstCombine] canEvaluateShifted - remove dead (and never used code). NFC.
This was already #if'd out when it was added back in 2010 at rG18d7fc8fc6767 and has never been touched since.
2020-10-05 18:05:13 +01:00
Nikita Popov 9d63029770 Revert "[DebugInfo] Improve dbg preservation in LSR."
This reverts commit a3caf7f610.

The ReleaseLTO-g test-suite configuration has been failing
to build since this commit, because clang segfaults while
building 7zip.
2020-10-05 19:02:30 +02:00
Florian Hahn 348d85a6c7 [VPlan] Clean up uses/operands on VPBB deletion.
Update the code responsible for deleting VPBBs and recipes to properly
update users and release operands.

This is another preparation for D84680 & following patches towards
enabling modeling def-use chains in VPlan.
2020-10-05 14:43:52 +01:00
Markus Lavin a3caf7f610 [DebugInfo] Improve dbg preservation in LSR.
Use SCEV to salvage additional @llvm.dbg.value that have turned into
referencing undef after transformation (and traditional
salvageDebugInfo). Before transformation compute SCEV for each
@llvm.dbg.value in the loop body and store it (along side its current
DIExpression). After transformation update those @llvm.dbg.value now
referencing undef by comparing its stored SCEV to the SCEV of the
current loop-header PHI-nodes. Allow matching with an offset by inserting
compensation code in the DIExpression.

Fixes : PR38815

Differential Revision: https://reviews.llvm.org/D87494
2020-10-05 09:55:16 +02:00
Florian Hahn 357bbaab66 [VPlan] Add VPRecipeBase::toVPUser helper (NFC).
This adds a helper to convert a VPRecipeBase pointer to a VPUser, for
recipes that inherit from VPUser. Once VPRecipeBase directly inherits
from VPUser this helper can be removed.
2020-10-04 19:43:27 +01:00
Florian Hahn f5fe7abe8a [VPlan] Account for removed users in replaceAllUsesWith.
Make sure we do not iterate using an invalid iterator.

Another small fix/step towards traversing the def-use chains in VPlan.
2020-10-04 18:18:58 +01:00
Anatoly Parshintsev a566f0525a [RISCV][ASAN] instrumentation pass now uses proper shadow offset
[10/11] patch series to port ASAN for riscv64

Depends On D87580

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D87581
2020-10-04 16:30:38 +03:00
Roman Lebedev 03bd5198b6
[OldPM] Pass manager: run SROA after (simple) loop unrolling
I have stumbled into this pretty accidentally, when rewriting
some spaghetti-like code into something more structured,
which involved using some `std::array<>`s. And to my surprise,
the `alloca`s remained, causing about a `+160%` perf regression.

https://llvm-compile-time-tracker.com/compare.php?from=bb6f4d32aac3eecb51909f4facc625219307ee68&to=d563e66f40f9d4d145cb2050e41cb961e2b37785&stat=instructions
suggests that this has geomean compile-time cost of `+0.08%`.

Note that D68593 / cecc0d27ad
already did this change for NewPM, but left OldPM in a pessimized state.

This fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40011 | PR40011 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=42794 | PR42794 ]] and probably some other reports.

Reviewed By: nikic, xbolva00

Differential Revision: https://reviews.llvm.org/D87972
2020-10-04 11:53:50 +03:00
Florian Hahn 82dcd383c4 [VPlan] Properly update users when updating operands.
When updating operands of a VPUser, we also have to adjust the list of
users for the new and old VPValues. This is required once we start
transitioning recipes to become VPValues.
2020-10-03 20:54:58 +01:00
Michał Górny 66e493f81e [asan] Stop instrumenting user-defined ELF sections
Do not instrument user-defined ELF sections (whose names resemble valid
C identifiers).  They may have special use semantics and modifying them
may break programs.  This is e.g. the case with NetBSD __link_set API
that expects these sections to store consecutive array elements.

Differential Revision: https://reviews.llvm.org/D76665
2020-10-03 19:54:38 +02:00
Simon Pilgrim aacfe2be53 [InstCombine] recognizeBSwapOrBitReverseIdiom - add vector support
Add basic vector handling to recognizeBSwapOrBitReverseIdiom/collectBitParts - this works at the element level, all vector element operations must match (splat constants etc.) and there is no cross-element support (insert/extract/shuffle etc.).
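
For illustration, a splat-constant vector variant of the classic i32 bswap
idiom that the element-level matching can now handle (assuming no undef
elements):

```
%t0 = lshr <2 x i32> %x, <i32 24, i32 24>
%t1 = lshr <2 x i32> %x, <i32 8, i32 8>
%t2 = and <2 x i32> %t1, <i32 65280, i32 65280>
%t3 = shl <2 x i32> %x, <i32 8, i32 8>
%t4 = and <2 x i32> %t3, <i32 16711680, i32 16711680>
%t5 = shl <2 x i32> %x, <i32 24, i32 24>
%o0 = or <2 x i32> %t0, %t2
%o1 = or <2 x i32> %o0, %t4
%r = or <2 x i32> %o1, %t5
; expected: %r = call <2 x i32> @llvm.bswap.v2i32(<2 x i32> %x)
```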
2020-10-03 16:26:46 +01:00
Simon Pilgrim 347fd9955a [InstCombine] recognizeBSwapOrBitReverseIdiom - use generic CreateIntegerCast
Try to appease buildbot breakages due to D88578
2020-10-03 15:29:22 +01:00
Simon Pilgrim 3aa93f690b [InstCombine] recognizeBSwapOrBitReverseIdiom - support for 'partial' bswap patterns (PR47191) (Reapplied)
If we're bswap'ing some bytes and zero'ing the remainder, we can perform this as a bswap+mask, which helps us match 'partial' bswaps as a first step towards folding into a more complex bswap pattern.

Reapplied with early-out if recognizeBSwapOrBitReverseIdiom collects a source wider than the result type.

Differential Revision: https://reviews.llvm.org/D88578
2020-10-03 14:52:42 +01:00
Nikita Popov fbf818724f [MemCpyOpt] Make moveUp() a member method (NFC)
So we don't have to pass through more parameters in the future.
2020-10-03 11:28:49 +02:00
Arthur Eubanks 321986fe68 [MetaRenamer][NewPM] Port metarenamer to NPM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D88690
2020-10-02 15:42:25 -07:00
Vitaly Buka aff896dea1 [NFC][MSAN] Extract llvm.abs handling into a function
Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D88519
2020-10-02 15:01:25 -07:00
Nikita Popov 94704ed008 [MemCpyOpt] Add helper to erase instructions (NFC)
Next to erasing the instruction, we also always want to remove
it from MSSA and MD. Use a common function to do so.

This is a refactoring split out from D26739.
2020-10-02 21:52:10 +02:00
Nikita Popov 87b63c1726 [MemCpyOpt] Avoid double invalidation (NFCI)
The removal of the cpy instruction is left to the caller of
performCallSlotOptzn(), including the invalidation of MD. Both
call-sites already do this.

Also handle incrementation of NumMemCpyInstr consistently at the
call-site. One of the call-sites was already doing this, which
ended up incrementing the statistic twice.

This fix was part of D26739.
2020-10-02 21:50:46 +02:00
Arthur Eubanks 7468afe9ca [DAE] MarkLive in MarkValue(MaybeLive) if any use is live
While looping through all args or all return values, we may mark a use
of a later iteration as live. Previously when we got to that later value
it would ignore that and continue adding to Uses instead of marking it
live. For example, when looping through arg#0 and arg#1,
MarkValue(arg#0, Live) may cause some use of arg#1 to be live, but
MarkValue(arg#1, MaybeLive) will not notice that and continue adding
into Uses.

Now MarkValue(RA, MaybeLive) will MarkLive(RA) if any use is live.

Fixes PR47444.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D88529
2020-10-02 10:55:08 -07:00
Arthur Eubanks eb55735073 Reland [AlwaysInliner] Update BFI when inlining
Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88324
2020-10-02 10:46:57 -07:00
Arthur Eubanks 9b8c0b8b46 Revert "[AlwaysInliner] Update BFI when inlining"
This reverts commit b1bf24667f.
2020-10-02 10:34:51 -07:00
Arthur Eubanks b1bf24667f [AlwaysInliner] Update BFI when inlining
Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88324
2020-10-02 10:26:34 -07:00
Simon Pilgrim 0364721e3e Revert rG3d14a1e982ad27 - "[InstCombine] recognizeBSwapOrBitReverseIdiom - support for 'partial' bswap patterns (PR47191)"
This reverts commit 3d14a1e982.

This is breaking on some 2-stage clang buildbots
2020-10-02 18:17:14 +01:00
Florian Hahn 0867a9e85a [VPlan] Use isa<> instead of directly checking VPRecipeID (NFC).
getVPRecipeID is intended to be only used in `classof` helpers. Instead
of checking it directly, use isa<> with the correct recipe type.
2020-10-02 17:47:35 +01:00
Simon Pilgrim 3d14a1e982 [InstCombine] recognizeBSwapOrBitReverseIdiom - support for 'partial' bswap patterns (PR47191)
If we're bswap'ing some bytes and zero'ing the remainder, we can perform this as a bswap+mask, which helps us match 'partial' bswaps as a first step towards folding into a more complex bswap pattern.

Differential Revision: https://reviews.llvm.org/D88578
2020-10-02 17:25:12 +01:00
Simon Pilgrim 0347f3ea72 TruncInstCombine.cpp - fix header include ordering to fix llvm-include-order clang-tidy warning. NFCI. 2020-10-02 17:25:12 +01:00
Simon Pilgrim 5e8e89d814 TruncInstCombine.cpp - use auto * to fix llvm-qualified-auto clang-tidy warning. NFCI. 2020-10-02 17:25:11 +01:00
Philip Reames f29645e7af [gvn] Handle a corner case w/vectors of non-integral pointers
If we try to coerce a vector of non-integral pointers to a narrower type (either a narrower vector or a single pointer), we use inttoptr and violate the semantics of non-integral pointers.  In theory, we can handle many of these cases; we just need to use a different code idiom to convert without going through inttoptr and back.

This shows up as wrong code bugs, and in some cases, crashes due to failed asserts.  Modeled after a change which has lived downstream for a couple years, though completely rewritten to be more idiomatic.
2020-10-01 19:20:21 -07:00
Joseph Huber 82453e759c [OpenMP] Add Missing Runtime Call for Globalization Remarks
Summary:
Add a missing runtime call to perform data globalization checks.

Reviewers: jdoerfert

Subscribers: guansong, hiraditya, llvm-commits, sstefan1, yaxunl

Tags: #LLVM #OpenMP

Differential Revision: https://reviews.llvm.org/D88621
2020-10-01 21:19:53 -04:00
Philip Reames bb0344644a [memcpyopt] Conservatively handle non-integral pointers
If we allow the non-integral pointers to become memset and memcpy, we lose the ability to reason about pointer propagation.  This patch is modeled on changes we've carried downstream for a long time; figured it was worth being equally conservative for other users.  There is room to refine the semantics and handling here if anyone is motivated.
2020-10-01 16:46:56 -07:00
Philip Reames de3cb9548d Fix a bug in memset formation with vectors of non-integral pointers
We were converting the non-integral store into an integer store, which is not legal.
2020-10-01 16:11:11 -07:00
Nikita Popov 9d1c8c0ba9 [InstCombine] Fix select operand simplification with undef (PR47696)
When replacing X == Y ? f(X) : Z with X == Y ? f(Y) : Z, make sure
that Y cannot be undef. If it may be undef, we might end up picking
a different value for undef in the comparison and the select
operand.
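A hand-written illustration of the hazard (not one of the committed tests):

```
%c = icmp eq i32 %x, %y
%f = add i32 %x, 1
%s = select i1 %c, i32 %f, i32 %z
; rewriting %f to use %y is only safe when %y is known not to be undef;
; otherwise the icmp and the select arm could pick different values for
; the same undef
```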
2020-10-01 21:15:48 +02:00
zoecarver 6c25816d7b [DSE] Look through memory PHI arguments when removing noop stores in MSSA.
Summary:
Adds support for "following" memory through MSSA PHI arguments. This will help catch more noop stores that exist between blocks.

Originally part of D79391.

Reviewers: fhahn, jfb, asbirlea

Differential Revision: https://reviews.llvm.org/D82588
2020-10-01 10:42:02 -07:00
Simon Pilgrim 29ac9fae54 [InstCombine] collectBitParts - convert to use PatterMatch matchers and avoid IntegerType casts.
Make sure we're using getScalarSizeInBits instead of cast<IntegerType> to get Type bit widths.

This is preliminary cleanup before we can start adding vector support to the bswap/bitreverse (element level) matching.
2020-10-01 16:44:14 +01:00
Simon Pilgrim 567049f892 [InstCombine] Use m_FAbs matcher helper. NFCI. 2020-10-01 14:42:34 +01:00
Sjoerd Meijer d53b4bee0c [LoopFlatten] Add a loop-flattening pass
This is a simple pass that flattens nested loops.  The intention is to optimise
loop nests like this, which together access an array linearly:

  for (int i = 0; i < N; ++i)
    for (int j = 0; j < M; ++j)
      f(A[i*M+j]);

into one loop:

  for (int i = 0; i < (N*M); ++i)
    f(A[i]);

It can also flatten loops where the induction variables are not used in the
loop. This can help with codesize and runtime, especially on simple CPUs
without advanced branch prediction.

This is only worth flattening if the induction variables are only used in an
expression like i*M+j. If they had any other uses, we would have to insert a
div/mod to reconstruct the original values, so this wouldn't be profitable.

This partially fixes PR40581, as this pass triggers on one of the two cases. I
will follow up on this to teach LoopFlatten a few more (small) tricks. Please
note that LoopFlatten is not yet enabled by default.

Patch by Oliver Stannard, with minor tweaks from Dave Green and myself.

Differential Revision: https://reviews.llvm.org/D42365
2020-10-01 13:54:45 +01:00
Simon Pilgrim bc730b5e43 [InstCombine] collectBitParts - use APInt directly to check for out of range bit shifts. NFCI. 2020-10-01 12:50:36 +01:00
Arthur Eubanks 460dda071e [WholeProgramDevirt][NewPM] Add NPM testing path to match legacy pass
The legacy pass's default constructor sets UseCommandLine = true and
goes down a separate testing route. Match that in the NPM pass.

This fixes all tests in llvm/test/Transforms/WholeProgramDevirt under NPM.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D88588
2020-09-30 17:27:37 -07:00
Simon Pilgrim 323d08e50a [InstCombine] Fix bswap(trunc(bswap(x))) -> trunc(lshr(x, c)) vector support
Use getScalarSizeInBits not getPrimitiveSizeInBits to determine the shift value at the element level.
2020-09-30 16:01:08 +01:00
Simon Pilgrim c722b32596 [InstCombine] recognizeBSwapOrBitReverseIdiom - merge the regular/trunc+zext paths. NFCI.
There doesn't seem to be any good reason for having a separate path for when we bswap/bitreverse at a smaller size than the destination size - so merge these to make the instruction generation a lot clearer.
2020-09-30 14:54:04 +01:00
Simon Pilgrim d5545a8993 [InstCombine] recognizeBSwapOrBitReverseIdiom - remove unnecessary cast. NFCI. 2020-09-30 14:44:15 +01:00
Florian Hahn d856365470 [VPlan] Change recipes to inherit from VPUser instead of a member var.
Now that VPUser is not inheriting from VPValue, we can take the next
step and turn the recipes that already manage their operands via VPUser
into VPUsers directly. This is another small step towards traversing
def-use chains in VPlan.

This is NFC with respect to the generated code, but makes the interface
more powerful.
2020-09-30 14:39:00 +01:00
Simon Pilgrim 621c6c8962 [InstCombine] recognizeBSwapOrBitReverseIdiom - cleanup bswap/bitreverse detection loop. NFCI.
Early out if both pattern matches have failed (or we don't want them). Fix case of bit index iterator (and avoid Wshadow issue).
2020-09-30 14:19:18 +01:00
Simon Pilgrim 413b4998bd [InstCombine] recognizeBSwapOrBitReverseIdiom - use ArrayRef::back() helper. NFCI.
Post-commit feedback on D88316
2020-09-30 13:39:18 +01:00
Simon Pilgrim 05290eead3 InstCombine] collectBitParts - cleanup variable names. NFCI.
Fix a number of WShadow warnings (I was used as the instruction and index......) and fix cases to match style.

Also, replaced the Bit APInt mask check in AND instructions with a direct APInt[] bit check.
2020-09-30 13:25:32 +01:00
Simon Pilgrim af47d40b9c [InstCombine] recognizeBSwapOrBitReverseIdiom - recognise zext(bswap(trunc(x))) patterns (PR39793)
PR39793 demonstrated an issue where we fail to recognize 'partial' bswap patterns of the lower bytes of an integer source.

In fact, most of this is already in place: collectBitParts suitably tags zero bits, so we just need to handle this case correctly by finding the zero'd upper bits and reducing the bswap pattern to just the active demanded bits.

Differential Revision: https://reviews.llvm.org/D88316
2020-09-30 12:07:19 +01:00
Simon Pilgrim ec3f24d453 [InstCombine] recognizeBSwapOrBitReverseIdiom - assert for correct bit provenance indices. NFCI.
As suggested by @spatel on D88316
2020-09-30 11:16:33 +01:00
Jeremy Morse 05659606a2 Revert "[gardening] Replace some uses of setDebugLoc(DebugLoc()) with dropLocation(), NFC"
Some of the buildbots have croaked with this patch; for example, failures
that begin in this build:

  http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/29933

This reverts commit 674f57870f.
2020-09-30 09:52:12 +01:00
Vedant Kumar 674f57870f [gardening] Replace some uses of setDebugLoc(DebugLoc()) with dropLocation(), NFC 2020-09-29 17:39:07 -07:00
Vedant Kumar 26ee8aff2b [CodeExtractor] Don't create bitcasts when inserting lifetime markers (NFCI)
Lifetime marker intrinsics support any pointer type, so CodeExtractor
does not need to bitcast to `i8*` in order to use these markers.
2020-09-29 16:34:36 -07:00
Sanjay Patel 0527c8749b [InstCombine] ease alignment restriction for converting masked load to normal load
I think we initially made this fold conservative to be safer, but we do not
need the alignment attribute/metadata limitation because the masked load
intrinsic itself specifies the alignment. A normal vector load is better for
IR transforms and should be no worse in codegen than the masked alternative.
If it is worse for some target, the backend can reverse this transform.
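As a rough sketch (hand-written here, assuming a constant all-ones mask, which is what the fold requires):

```
%v = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(
         <4 x float>* %p, i32 16,
         <4 x i1> <i1 true, i1 true, i1 true, i1 true>,
         <4 x float> undef)
; ==> a plain load carrying the intrinsic's own alignment
%v = load <4 x float>, <4 x float>* %p, align 16
```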

Differential Revision: https://reviews.llvm.org/D88505
2020-09-29 15:26:22 -04:00
Simon Pilgrim 0cf48a7065 [InstCombine] visitTrunc - trunc (*shr (trunc A), C) --> trunc(*shr A, C)
Attempt to fold trunc (*shr (trunc A), C) --> trunc(*shr A, C) iff the shift amount is small enough that all zero/sign bits created by the shift are removed by the last trunc.
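A hand-written instance of the pattern, with a shift amount small enough for the fold to be lossless:

```
%t = trunc i32 %a to i16
%s = lshr i16 %t, 4
%r = trunc i16 %s to i8      ; result is bits 4..11 of %a
; ==>
%s2 = lshr i32 %a, 4
%r2 = trunc i32 %s2 to i8    ; same bits, one shift on the wide type
```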

Helps fix the regressions encountered in D88316.

I've tweaked a couple of shift values as suggested by @lebedev.ri to ensure we have coverage of shift values close (above/below) to the max limit.

Differential Revision: https://reviews.llvm.org/D88429
2020-09-29 18:27:42 +01:00
Juneyoung Lee 67aac915ba [BuildLibCalls] Add noundef to the returned pointers of allocators and argument of free
This patch adds noundef to the returned pointers of allocators (malloc, calloc, ...)
and the pointer argument of free.
The returned pointer of allocators cannot be poison or (partially) undef.
Since the pointer that is given to free should precisely have zero offset,
it cannot be poison or (partially) undef too.
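Roughly, the annotated declarations would look like this (a sketch; the exact attribute sets emitted by BuildLibCalls also include others such as noalias):

```
declare noundef i8* @malloc(i64)
declare noundef i8* @calloc(i64, i64)
declare void @free(i8* noundef)
```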

For the size arguments of allocators, noundef wasn't attached simply because
I wasn't sure whether attaching it is okay or not.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D87984
2020-09-30 02:13:48 +09:00
Simon Pilgrim b610d73b3f [InstCombine] visitTrunc - remove dead trunc(lshr (zext A), C) combine. NFCI.
I added additional test coverage at rG7a55989dc4305 - but all are handled independently of this combine and http://lab.llvm.org:8080/coverage/coverage-reports/ indicates the code is never used.

Differential revision: https://reviews.llvm.org/D88492
2020-09-29 17:15:16 +01:00
Simon Pilgrim 89a8a0c910 [InstCombine] Inherit exact flags on extended shifts in trunc (lshr (sext A), C) --> (ashr A, C)
This was missed in D88475
2020-09-29 15:32:09 +01:00
Simon Pilgrim 14ff38e235 [InstCombine] visitTrunc - trunc (lshr (sext A), C) --> (ashr A, C) non-uniform support
This came from @lebedev.ri's suggestion to use m_SpecificInt_ICMP for D88429 - since I was going to change the m_APInt to m_Constant for that patch, I thought I would do it for the only other user of the APInt first.

I've added a ConstantExpr::getUMin helper - it's trivial to add UMAX/SMIN/SMAX, but I thought I'd wait until we have use cases.
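A hand-written example of the underlying fold with non-uniform vector shift amounts (both below the source bit width):

```
%e = sext <2 x i8> %a to <2 x i32>
%s = lshr <2 x i32> %e, <i32 3, i32 5>
%r = trunc <2 x i32> %s to <2 x i8>
; ==> the truncated bits are the original bits plus sign fill
%r = ashr <2 x i8> %a, <i8 3, i8 5>
```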

Differential Revision: https://reviews.llvm.org/D88475
2020-09-29 15:01:16 +01:00
Florian Hahn 7bae2bc5a8 [LoopUtils] Only verify SE in builds with assertions.
Follow up to 60b852092c.
2020-09-29 13:39:23 +01:00
Daniel Kiss c5a4900e1a [AArch64] Add BTI to CFI jumptables.
With branch protection the jump to the jump table entries requires a landing pad.

Reviewed By: eugenis, tamas.petz

Differential Revision: https://reviews.llvm.org/D81251
2020-09-29 13:50:23 +02:00
David Stenberg e6f332ef1e [IndVarSimplify] Fix Modified status for removal of overflow intrinsics
When removing an overflow intrinsic the Changed status in SimplifyIndvar
was not set, leading to the IndVarSimplify pass returning an incorrect
status.

This was caught using the check introduced by D80916.

As pointed out in the code review, a similar bug may exist for
eliminateTrunc().

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D85971
2020-09-29 13:20:59 +02:00
Vitaly Buka 4aa6abe4ef [msan] Fix llvm.abs.v intrinsic
The last argument of the intrinsic is a boolean
flag to control INT_MIN handling and does
not affect msan metadata.
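For reference, the intrinsic's shape (the trailing i1 is only the INT_MIN-is-poison flag):

```
declare i32 @llvm.abs.i32(i32, i1 immarg)
; shadow propagation should depend on the i32 operand alone, not on the
; i1 flag
%a = call i32 @llvm.abs.i32(i32 %x, i1 false)
```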
2020-09-29 03:52:27 -07:00
sstefan1 cb9cfa0d2f [OpenMPOpt][Fix] Only initialize ICV initial values once.
Reviewers: jdoerfert, ggeorgakoudis

Differential Revision: https://reviews.llvm.org/D88441
2020-09-29 12:22:58 +02:00
Florian Hahn 60b852092c [LoopDeletion] Forget loop before setting values to undef
After D71539, we need to forget the loop before setting the incoming
values of phi nodes in exit blocks, because we are looking through those
phi nodes now and the SCEV expression could depend on the loop phi. If
we update the phi nodes before forgetting the loop, we miss those users
during invalidation.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D88167
2020-09-29 10:38:44 +01:00
Florian Hahn b76df593eb Revert "Recommit "[SCCP] Do not replace deref'able ptr with un-deref'able one.""
Looks like there is still another remaining issue:

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap-msan/builds/22273/steps/build%20libcxx%2Fmsan/logs/stdio

This reverts commit 86a20d9e34.
2020-09-29 09:18:19 +01:00
Florian Hahn 86a20d9e34 Recommit "[SCCP] Do not replace deref'able ptr with un-deref'able one."
This version includes a small fix allowing function pointers to be
unconditionally replaced for now.

This reverts commit 4c5e4aa89b.
2020-09-29 09:10:27 +01:00
Max Kazantsev e862e78b63 [NFC] Use assert instead of checking the guaranteed condition
From the preconditions it is known that either A dominates B or
B dominates A. If A does not dominate B, we do not really need
to check it; an assert should be enough. This should save some
compile time.
2020-09-29 11:38:45 +07:00
Max Kazantsev d266fd960e [IndVars] Remove exiting conditions that are trivially true/false
When removing exiting loop conditions, we only consider checks for
which we know the exact exit count. We could also eliminate checks for
which the condition is always true/false.

Differential Revision: https://reviews.llvm.org/D87344
Reviewed By: lebedev.ri, reames
2020-09-29 11:35:32 +07:00
Philip Reames e46d74b589 [CVP] Allow two transforms in one invocation
For a call site which had both constant deopt operands and nonnull arguments, we were missing the opportunity to recognize the latter by bailing early.

This is somewhat of a speculative fix.  Months ago, I'd had a private report of performance and compile time regressions from the deopt operand folding.  I never received a test case.  However, the only possibility I see is that after that change CVP missed the nonnull fold, and we ended up with a pass ordering/missed simplification issue.  So, since it's a real issue, fix it and hope.
2020-09-28 15:11:42 -07:00
Dominic Chen 06e68f05da
[AddressSanitizer] Copy type metadata to prevent miscompilation
When ASan and e.g. Dead Virtual Function Elimination are enabled, the
latter will rely on type metadata to determine if certain virtual calls can be
removed. However, ASan currently does not copy type metadata, which can cause
virtual function calls to be incorrectly removed.

Differential Revision: https://reviews.llvm.org/D88368
2020-09-28 13:56:05 -04:00
Simon Pilgrim 63ee42a06b [InstCombine] matchRotate - force splat of uniform constant rotation amounts (PR46895)
Fixes minor bug in D88402 where we were using the original shift constant (with undefs) instead of one with the splat values (re)splatted to all elements.
2020-09-28 15:12:41 +01:00
Simon Pilgrim dabb14cadd [InstCombine] matchRotate - allow undef in uniform constant rotation amounts (PR46895)
An extension to D87452, we can safely permit undefs in the uniform/splat detection

https://alive2.llvm.org/ce/z/nT-ptN
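A hand-written sketch of a splat-with-undef rotate that can now be matched (the defined lanes must still sum to the bit width):

```
%hi = shl  <2 x i32> %x, <i32 5, i32 undef>
%lo = lshr <2 x i32> %x, <i32 27, i32 undef>
%r  = or   <2 x i32> %hi, %lo
; ==> treated as a uniform rotate-left by 5
%r = call <2 x i32> @llvm.fshl.v2i32(<2 x i32> %x, <2 x i32> %x,
                                     <2 x i32> <i32 5, i32 5>)
```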

Differential Revision: https://reviews.llvm.org/D88402
2020-09-28 13:36:13 +01:00
Benjamin Kramer 7e5a356d2b [Coroutines] Remove unused includes. NFC. 2020-09-28 10:27:23 +02:00
Chuanqi Xu b3a722e66b [Coroutines] Reuse storage for local variables with non-overlapping lifetimes
Bug 45566 shows that the process of building the coroutine frame does not
consider that the lifetimes of different local variables may not overlap,
which means the compiler could generate a smaller frame.

This patch calculates the lifetime range of each alloca with the
StackLifetime class, then builds non-overlapping sets of allocas whose
lifetime ranges do not overlap. We use the largest type in a
non-overlapping set as the field type in the frame. In the insertSpills
process, if we find that the type of the field is not the same as the
alloca's, we cast the pointer to the field type to a pointer to the
alloca type. Since the lifetime ranges of the allocas in one
non-overlapping set do not overlap with each other, it is safe to reuse
the storage space in the frame.

Test plan: check-llvm, check-clang, cppcoro, folly

Reviewers: junparser, lxfind, modocache

Differential Revision: https://reviews.llvm.org/D87596
2020-09-28 15:48:00 +08:00
Dávid Bolvanský 155ac33394 [BuildLibCalls] Add noalias for strcat and stpcpy
strcat:
destination and source shall not overlap. (http://www.cplusplus.com/reference/cstring/strcat/)

stpcpy:
The strings may not overlap, and the destination string dest must be large enough to receive the copy. (https://man7.org/linux/man-pages/man3/stpcpy.3.html)

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D88335
2020-09-27 21:37:09 +02:00
Nikita Popov fe79061be2 [LVI][CVP] Use block value when simplifying icmps
Add a flag to getPredicateAt() that allows making use of the block
value. This allows us to take into account range information from
the current block, rather than only information that is threaded
over edges, making the icmp simplification in CVP a lot more
powerful.

I'm not changing getPredicateAt() to use the block value
unconditionally to avoid any impact on the JumpThreading pass,
which is somewhat picky about LVI query order.

Most test changes here are just icmps that now get dropped (while
previously only a result used in a return was replaced). The three
tests in icmp.ll show some representative improvements. Some of
the folds this enables have been covered by IPSCCP in the meantime,
but LVI can reason about some cases which are hard to support in
IPSCCP, such as in test_br_cmp_with_offset.

The compile-time time cost of doing this is fairly minimal, with
a ~0.05% CTMark regression for ReleaseThinLTO:
https://llvm-compile-time-tracker.com/compare.php?from=709d03f8af4da4204849a70f01798e7cebba2e32&to=6236fd503761f43c99f4537121e057a01056f185&stat=instructions

This is because the block values will typically already be queried
and cached by other CVP optimizations anyway.

Differential Revision: https://reviews.llvm.org/D69686
2020-09-27 20:25:16 +02:00
Fangrui Song 50bd71e1d7 [NewPM] Port ConstraintElimination to the new pass manager
If -enable-constraint-elimination is specified, add it to the -O2/-O3 pipeline.
(-O1 uses a separate function now.)

Reviewed By: fhahn, aeubanks

Differential Revision: https://reviews.llvm.org/D88365
2020-09-27 11:12:26 -07:00
Benjamin Kramer 7b782062b4 [InstCombine] Simplify code. NFCI. 2020-09-27 19:11:07 +02:00
Nikita Popov 9b959b59df [LVI] Require context instruction in external API (NFCI)
Require CxtI in getConstant() and getConstantRange() APIs.
Accordingly drop the BB parameter, as it is implied by
CxtI->getParent().

This makes sure we don't forget to pass the context instruction,
and makes the API contract clearer (also clean up the comments to
that effect -- the value holds at the context instruction, not
the end of the block).
2020-09-27 18:07:24 +02:00
Nikita Popov c8abf1c12d [CVP] Pass context instruction when narrowing div/rem
This fold was the only place not passing the context instruction.
The tests worked around that fact by introducing a basic block split,
which is now no longer necessary.
2020-09-27 17:51:30 +02:00
Fangrui Song 82420b4e49 [DivRemPairs] Use DenseMapBase::find instead of operator[]. NFC 2020-09-27 01:13:14 -07:00
Fangrui Song 400bdbc422 [ConstraintElimination] Internalize function/class and delete an implied condition. NFC
Delete an implied condition (E.NumIn <= CB.NumIn)
2020-09-26 15:04:39 -07:00
Florian Hahn 915310bf14 Revert "[DSE] Switch to MemorySSA-backed DSE by default."
There appears to be a mis-compile with MemorySSA-backed DSE in
combination with llvm.lifetime.end. It currently appears like
DSE is doing the right thing and the llvm.lifetime.end markers
are incorrect. The reverted patch uncovers the mis-compile.

This patch temporarily switches back to the legacy DSE
implementation, while we investigate.

This reverts commit 9d172c8e9c.
2020-09-26 18:35:27 +01:00
Florian Hahn 8f0466edc0 [DSE] Unify & fix mem terminator location checks.
When looking for memory defs killed by memory terminators the code
currently incorrectly ignores the size argument of llvm.lifetime.end.

This patch updates the code to use isMemTerminator and updates
isMemTerminator to use isOverwrite() to make sure locations that are
outside the range marked as dead by llvm.lifetime.end are not
considered. Note that isOverwrite is only used for llvm.lifetime.end,
because free-like functions make the whole underlying object dead.
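A hand-written illustration of the check (assuming %p and %q point into the same 16-byte object, with %q at offset 12):

```
store i8 1, i8* %q                                 ; offset 12
call void @llvm.lifetime.end.p0i8(i64 8, i8* %p)   ; only bytes 0..7 die
; the store at offset 12 lies outside the dead range and must not be
; treated as killed by the lifetime.end
```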
2020-09-26 13:47:50 +01:00
Tyker 8d5b289a46 [LoopDelete][Assume] Allow deleting loops with assumes
This prevents very poor optimization caused by a single assume like https://godbolt.org/z/EK3oMh

baseline flags: -O3
patched flags: -O3 -mllvm --enable-knowledge-retention

Before the patch
```
Metric: compile_time
Program                                                      baseline patched diff
             test-suite :: CTMark/tramp3d-v4/tramp3d-v4.test  20.72    29.74  43.5%
                     test-suite :: CTMark/Bullet/bullet.test  24.39    24.91   2.2%
               test-suite :: CTMark/7zip/7zip-benchmark.test  37.39    38.06   1.8%
                      test-suite :: CTMark/kimwitu++/kc.test  11.76    11.94   1.5%
                   test-suite :: CTMark/sqlite3/sqlite3.test  12.94    12.91  -0.3%
                       test-suite :: CTMark/SPASS/SPASS.test  11.72    11.70  -0.2%
                     test-suite :: CTMark/lencod/lencod.test  16.12    16.10  -0.1%
                   test-suite :: CTMark/ClamAV/clamscan.test  13.31    13.30  -0.1%
              test-suite :: CTMark/mafft/pairlocalalign.test   9.12     9.12  -0.1%
 test-suite :: CTMark/consumer-typeset/consumer-typeset.test   9.34     9.34  -0.1%
                                          Geomean difference                   4.2%

Metric: compiler_Kinst_count
Program                                                      baseline     patched      diff
             test-suite :: CTMark/tramp3d-v4/tramp3d-v4.test 107576069.87 172886418.90 60.7%
                     test-suite :: CTMark/Bullet/bullet.test 123291865.66 125457117.96  1.8%
                      test-suite :: CTMark/kimwitu++/kc.test  56347884.64  57298544.14  1.7%
               test-suite :: CTMark/7zip/7zip-benchmark.test 180637699.58 183341656.57  1.5%
                   test-suite :: CTMark/sqlite3/sqlite3.test  66723788.85  66664692.80 -0.1%
                   test-suite :: CTMark/ClamAV/clamscan.test  69581500.56  69597863.92  0.0%
                     test-suite :: CTMark/lencod/lencod.test  94236501.48  94216545.32 -0.0%
                       test-suite :: CTMark/SPASS/SPASS.test  58516756.95  58505089.07 -0.0%
 test-suite :: CTMark/consumer-typeset/consumer-typeset.test  48832815.53  48841989.39  0.0%
              test-suite :: CTMark/mafft/pairlocalalign.test  49682720.53  49686324.34  0.0%
                                          Geomean difference                            5.4%
```

After the patch
```
Metric: compile_time
Program                                                      baseline patched diff
             test-suite :: CTMark/tramp3d-v4/tramp3d-v4.test  20.70    22.40   8.2%
               test-suite :: CTMark/7zip/7zip-benchmark.test  37.13    38.05   2.5%
                     test-suite :: CTMark/Bullet/bullet.test  24.25    24.83   2.4%
                      test-suite :: CTMark/kimwitu++/kc.test  11.69    11.94   2.2%
                   test-suite :: CTMark/ClamAV/clamscan.test  13.19    13.36   1.3%
                     test-suite :: CTMark/lencod/lencod.test  16.02    16.19   1.1%
 test-suite :: CTMark/consumer-typeset/consumer-typeset.test   9.29     9.36   0.7%
                       test-suite :: CTMark/SPASS/SPASS.test  11.64    11.73   0.7%
              test-suite :: CTMark/mafft/pairlocalalign.test   9.10     9.15   0.5%
                   test-suite :: CTMark/sqlite3/sqlite3.test  12.95    12.96   0.0%
                                          Geomean difference                   1.9%

Metric: compiler_Kinst_count
Program                                                      baseline     patched      diff
             test-suite :: CTMark/tramp3d-v4/tramp3d-v4.test 107590933.61 114044834.72  6.0%
                      test-suite :: CTMark/kimwitu++/kc.test  56344526.77  57235806.29  1.6%
                     test-suite :: CTMark/Bullet/bullet.test 123291285.10 125128334.97  1.5%
               test-suite :: CTMark/7zip/7zip-benchmark.test 180641540.10 183155706.39  1.4%
                   test-suite :: CTMark/sqlite3/sqlite3.test  66725619.22  66668713.92 -0.1%
                       test-suite :: CTMark/SPASS/SPASS.test  58509029.85  58478704.75 -0.1%
 test-suite :: CTMark/consumer-typeset/consumer-typeset.test  48843711.23  48826894.68 -0.0%
                     test-suite :: CTMark/lencod/lencod.test  94233305.79  94207544.23 -0.0%
                   test-suite :: CTMark/ClamAV/clamscan.test  69587887.66  69603549.90  0.0%
              test-suite :: CTMark/mafft/pairlocalalign.test  49686968.65  49689291.04  0.0%
                                          Geomean difference                            1.0%
```

Reviewed By: jdoerfert, efriedma

Differential Revision: https://reviews.llvm.org/D86816
2020-09-26 12:32:44 +02:00
Arthur Eubanks 83e3ea2cfc [LowerTypeTests][NewPM] Add constructor that uses command line flags
This matches the legacy PM pass by having one constructor use command
line flags, and the other use parameters to the pass.

This fixes all tests under Transforms/LowerTypeTests using NPM.

Reviewed By: ychen, pcc

Differential Revision: https://reviews.llvm.org/D87845
2020-09-25 17:39:59 -07:00
Layton Kifer 48961ba0de [TRE][NFC] Refactor Basic Block Processing
Simplify and improve readability.

Differential Revision: https://reviews.llvm.org/D82269
2020-09-25 16:01:05 -07:00
Simon Pilgrim 9ff9c1d8ee [InstCombine] matchRotate - support (uniform) constant rotation amounts (PR46895)
This patch adds handling of rotation patterns with constant shift amounts - the next bit will be how we want to support non-uniform constant vectors.

Differential Revision: https://reviews.llvm.org/D87452
2020-09-25 22:03:10 +01:00
Simon Pilgrim 2a0ca17f66 [InstCombine] collectBitParts - add fshl/fshr handling
Pulled from D87452, this is a fixed version of the collectBitParts fshl/fshr handling which, as @nikic noticed, wasn't checking for different providers or for correct bit ordering (which was hidden by only testing shift amounts of bitwidth/2).

Differential Revision: https://reviews.llvm.org/D88292
2020-09-25 20:34:59 +01:00
Arthur Eubanks d3f6972abb [LoopReroll][NewPM] Port -loop-reroll to NPM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87957
2020-09-25 12:09:06 -07:00
Daniel Paoliello d2166076b8 [Coroutine] Split PHI Nodes in `cleanuppad` blocks in a way that obeys EH pad rules
Issue Details:
In order to support coroutine splitting, any multi-value PHI node in a coroutine is split into multiple blocks with single-value PHI nodes, which then allows a subsequent transform to generate `reload` instructions as required (i.e., to reload the value if the coroutine has been resumed). This causes issues with EH pads (`catchswitch` and `catchpad`) as all pads within a `catchswitch` must have the same unwind destination, but the coroutine splitting logic may modify them to each have a unique unwind destination if there is a PHI node in the unwind `cleanuppad` that is set from values in the `catchswitch` and `cleanuppad` blocks.

Fix Details:
During splitting, if such a PHI node is detected, then create a "dispatcher" `cleanuppad` as well as the blocks with single-value PHI Nodes: thus the "dispatcher" is the unwind destination and it will detect which predecessor called it and then branch to the appropriate single-value PHI node block, which will then branch back to the original `cleanuppad` block.

Reviewed By: GorNishanov, lxfind

Differential Revision: https://reviews.llvm.org/D88059
2020-09-25 11:30:38 -07:00
Joseph Huber a22814194e [OpenMP] OpenMPOpt Support for Globalization Remarks
Summary:
This patch add support for printing analysis messages relating to data
globalization on the GPU. This occurs when data is shared between the
threads in a GPU context and must be pushed to global or shared memory.

Reviewers: jdoerfert

Subscribers: guansong, hiraditya, llvm-commits, ormris, sstefan1, yaxunl

Tags: #OpenMP #LLVM

Differential Revision: https://reviews.llvm.org/D88243
2020-09-24 18:23:12 -04:00
Vedant Kumar dfc5a9eb57 [Instruction] Add dropLocation and updateLocationAfterHoist helpers
Introduce a helper which can be used to update the debug location of an
Instruction after the instruction is hoisted. This can be used to safely
drop a source location as recommended by the docs.

For more context, see the discussion in https://reviews.llvm.org/D60913.

Differential Revision: https://reviews.llvm.org/D85670
2020-09-24 15:00:04 -07:00
Sanjay Patel 0a349d5827 [SLP] clean up - use 'const' and ArrayRef constructor; NFC
Follow-on tidying suggested in the post-commit review of 6a23668.
2020-09-24 15:31:07 -04:00
Craig Topper 03f22b08e2 [SLP] Remove LHS and RHS from OperationData.
These were only really used for 2 things. One was to check if the operand matches the phi if it exists. The other was for the createOp method to build the reduction.

For the first case we still have the operation; we just need to know how to index its operands. So I've modified getLHS/getRHS to just use the opcode/kind to know how to find the right operands on an instruction that is now passed in.

For the other case we had to create an OperationData object to set the LHS/RHS values and copy the opcode/kind from another object. We would then just call createOp on that temporary object. Instead I've made LHS/RHS arguments to createOp and removed all these temporary objects.

Differential Revision: https://reviews.llvm.org/D88193
2020-09-24 10:57:11 -07:00
Simon Pilgrim 81a408808f [Scalar] ConstantHoistingPass - iterate with const references. NFCI.
Fix some clang-tidy warnings.
2020-09-24 18:40:50 +01:00
Matt Arsenault d65a7003c4 OpaquePtr: Add helpers for sret to mirror byval
Sret should really have a type parameter like byval does.
2020-09-24 09:57:28 -04:00
Zequan Wu f5435399e8 [CGProfile] don't emit cgprofile entry if called function is dllimport
Differential Revision: https://reviews.llvm.org/D88127
2020-09-23 16:56:54 -07:00
Arthur Eubanks 6b1ce83a12 [NewPM][CGSCC] Handle newly added functions in updateCGAndAnalysisManagerForPass
This seems to fit the CGSCC updates model better than calling
addNewFunctionInto{Ref,}SCC() on newly created/outlined functions.
Now addNewFunctionInto{Ref,}SCC() are no longer necessary.

However, this doesn't work on newly outlined functions that aren't
referenced by the original function. e.g. if a() was outlined into b()
and c(), but c() is only referenced by b() and not by a(), this will
trigger an assert.

This also fixes an issue I was seeing with newly created functions not
having passes run on them.

Ran check-llvm with expensive checks.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87798
2020-09-23 15:22:18 -07:00
Craig Topper 7a3c643c35 [SLP] Make HorizontalReduction::getOperationData take an Instruction* instead of a Value*. NFCI
All of the callers already have an Instruction *. Many of them
from a dyn_cast.

Also update the OperationData constructor to use an Instruction&
to remove a dyn_cast and make it clear that the pointer is non-null.

Differential Revision: https://reviews.llvm.org/D88132
2020-09-23 10:51:03 -07:00
Krzysztof Parzyszek 76e8c1899e Break long line accidentally left in the previous commit 2020-09-23 12:24:45 -05:00
Krzysztof Parzyszek e976fb1e54 [EarlyCSE] Fix crash with expensive checks after D87691
D87691 reordered some checks, which turned out to be unsafe. More
specifically, when examining a store instruction, the check against
getOrCreateResult should be done before attempting to call
isSameMemGeneration. Otherwise a crash in the MSSA walker can occur.

This patch restores the order of these calls to what it was originally.
2020-09-23 12:21:34 -05:00
Simon Pilgrim 91589cf679 Add missing namespace closure comments. NFCI.
Fixes some clang-tidy llvm-namespace-comment warnings.
2020-09-23 16:19:25 +01:00
Simon Pilgrim 474dc33d07 Add missing namespace closure comment. NFCI.
Fixes clang-tidy llvm-namespace-comment warning.
2020-09-23 16:19:25 +01:00
Florian Hahn 31923f6b36 [VPlan] Disconnect VPValue and VPUser.
This refactors VPUser to not inherit from VPValue to facilitate
introducing operations that introduce multiple VPValues (e.g.
VPInterleaveRecipe).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D84679
2020-09-23 14:44:31 +01:00
David Sherwood 59c4d5aad0 [SVE] Fix InstCombinerImpl::PromoteCastOfAllocation for scalable vectors
In this patch I've fixed some warnings that arose from the implicit
cast of TypeSize -> uint64_t. I tried writing a variety of different
cases to show how this optimisation might work for scalable vectors
and found:

1. The optimisation does not work for cases where the cast type
is scalable and the allocated type is not. This is because we need to
know how many times the cast type fits into the allocated type.
2. If we pass all the various checks for the case when the allocated
type is scalable and the cast type is not, then when creating the
new alloca we have to take vscale into account. This leads to
sub-optimal IR that is worse than the original IR.
3. For the remaining case when both the alloca and cast types are
scalable it is hard to find examples where the optimisation would
kick in, except for simple bitcasts, because we typically fail the
ABI alignment checks.

For now I've changed the code to bail out if only one of the alloca
and cast types is scalable. This means we continue to support the
existing cases where both types are fixed, and also the specific case
when both types are scalable with the same size and alignment, for
example a simple bitcast of an alloca to another type.

I've added tests that show we don't attempt to promote the alloca,
except for simple bitcasts:

  Transforms/InstCombine/AArch64/sve-cast-of-alloc.ll

Differential revision: https://reviews.llvm.org/D87378
2020-09-23 08:43:05 +01:00
Martin Storsjö b90132399a [CVP] Remove a redundant trailing semicolon, fixing GCC warnings. NFC. 2020-09-23 09:03:01 +03:00
Martin Storsjö 2c4c659666 [InstCombine] Add parentheses in assert to silence GCC warning. NFC. 2020-09-23 09:03:01 +03:00
Hubert Tong 32c9991dab [InstCombine] Fix errno bug in pow expansion to sqrt
A conversion from `pow` to `sqrt` shall not call an `errno`-setting
`sqrt` with -infinity: the `sqrt` will set `EDOM` where the `pow`
call need not.

This patch avoids the erroneous (pun not intended) transformation by
applying the restrictions discussed in the thread for
https://lists.llvm.org/pipermail/llvm-dev/2020-September/145051.html.

The existing tests are updated (depending on emphasis in the checks for
library calls, avoidance of overlap, and overall coverage):
  - to add `ninf`, retaining the intended library call,
  - to use the intrinsic, retaining the use of `select`, or
  - to expect the replacement to not occur.

The following is tested:
  - The pow intrinsic folds to a `select` instruction to
    handle -infinity.
  - The pow library call folds, with `ninf`, to `sqrt` without the
    `select` instruction associated with handling -infinity.
  - The pow library call does not fold to `sqrt` without `ninf`.
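For context, a rough, hand-written sketch (details simplified) of the select-based expansion the intrinsic case keeps:

```
%s   = call double @llvm.sqrt.f64(double %x)
%abs = call double @llvm.fabs.f64(double %s)    ; pow(-0.0, 0.5) is +0.0
%inf = fcmp oeq double %x, 0xFFF0000000000000   ; x == -infinity?
%r   = select i1 %inf, double 0x7FF0000000000000, double %abs
; pow(-inf, 0.5) is +inf, while a raw sqrt(-inf) is NaN (and an
; errno-setting sqrt libcall would set EDOM)
```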

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D87877
2020-09-22 18:58:05 -04:00
Alexey Bataev d6ac649ccd [SLP]Fix coding style, NFC. 2020-09-22 17:44:29 -04:00
Stefanos Baziotis 89c1e35f3c [LoopInfo] empty() -> isInnermost(), add isOutermost()
Differential Revision: https://reviews.llvm.org/D82895
2020-09-22 23:28:51 +03:00
Roman Lebedev b289dc5306
[CVP] Narrow SDiv/SRem to the smallest power-of-2 that's sufficient to contain its operands
This is practically identical to what we already do for UDiv/URem:
  https://rise4fun.com/Alive/04K

Name: narrow udiv
Pre: C0 u<= 255 && C1 u<= 255
%r = udiv i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = udiv i8 %t0, %t1
%r = zext i8 %t2 to i16

Name: narrow exact udiv
Pre: C0 u<= 255 && C1 u<= 255
%r = udiv exact i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = udiv exact i8 %t0, %t1
%r = zext i8 %t2 to i16

Name: narrow urem
Pre: C0 u<= 255 && C1 u<= 255
%r = urem i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = urem i8 %t0, %t1
%r = zext i8 %t2 to i16

... only here we need to look for 'min signed bits', not 'active bits',
and there's an UB to be aware of:
  https://rise4fun.com/Alive/KG86
  https://rise4fun.com/Alive/LwR

Name: narrow sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128
%r = sdiv i16 C0, C1
  =>
%t0 = trunc i16 C0 to i9
%t1 = trunc i16 C1 to i9
%t2 = sdiv i9 %t0, %t1
%r = sext i9 %t2 to i16

Name: narrow exact sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128
%r = sdiv exact i16 C0, C1
  =>
%t0 = trunc i16 C0 to i9
%t1 = trunc i16 C1 to i9
%t2 = sdiv exact i9 %t0, %t1
%r = sext i9 %t2 to i16

Name: narrow srem
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128
%r = srem i16 C0, C1
  =>
%t0 = trunc i16 C0 to i9
%t1 = trunc i16 C1 to i9
%t2 = srem i9 %t0, %t1
%r = sext i9 %t2 to i16


Name: narrow sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1)
%r = sdiv i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = sdiv i8 %t0, %t1
%r = sext i8 %t2 to i16

Name: narrow exact sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1)
%r = sdiv exact i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = sdiv exact i8 %t0, %t1
%r = sext i8 %t2 to i16

Name: narrow srem
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1)
%r = srem i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = srem i8 %t0, %t1
%r = sext i8 %t2 to i16


The ConstantRangeTest.losslessSignedTruncationSignext test sanity-checks
the logic, that we can losslessly truncate ConstantRange to
`getMinSignedBits()` and signext it back, and it will be identical
to the original CR.

On vanilla llvm test-suite + RawSpeed, this fires 1262 times,
while the same fold for UDiv/URem only fires 384 times. Sic!

Additionally, this causes +606.18% (+1079) extra cases of
aggressive-instcombine.NumDAGsReduced, and +473.14% (+1145)
of aggressive-instcombine.NumInstrsReduced folds.
2020-09-22 21:37:30 +03:00
Roman Lebedev 4977eadee5
[NFC][CVP] Give a better name STATISTIC() counting udiv i16 -> udiv i8 xforms 2020-09-22 21:37:30 +03:00
Roman Lebedev ba5afe5588
[NFC][CVP] processUDivOrURem(): refactor to use ConstantRange::getActiveBits()
As an exhaustive test shows, this logic is fully identical to the old
implementation, with the exception of the case where both of the operands
had empty ranges:

```
TEST_F(ConstantRangeTest, CVP_UDiv) {
  unsigned Bits = 4;
  EnumerateConstantRanges(Bits, [&](const ConstantRange &CR0) {
    if (CR0.isEmptySet())
      return;
    EnumerateConstantRanges(Bits, [&](const ConstantRange &CR1) {
      if (CR1.isEmptySet()) // the inner lambda should check CR1, not CR0
        return;

      unsigned MaxActiveBits = 0;
      for (const ConstantRange &CR : {CR0, CR1})
        MaxActiveBits = std::max(MaxActiveBits, CR.getActiveBits());

      ConstantRange OperandRange(Bits, /*isFullSet=*/false);
      for (const ConstantRange &CR : {CR0, CR1})
        OperandRange = OperandRange.unionWith(CR);
      unsigned NewWidth = OperandRange.getUnsignedMax().getActiveBits();

      EXPECT_EQ(MaxActiveBits, NewWidth) << CR0 << " " << CR1;
    });
  });
}
```
2020-09-22 21:37:29 +03:00
Roman Lebedev 4eeeb356fc
[CVP] Enhance SRem -> URem fold to work not just on non-negative operands
This is a continuation of 8d487668d0,
the logic is pretty much identical for SRem:

Name: pos pos
Pre: C0 >= 0 && C1 >= 0
%r = srem i8 C0, C1
  =>
%r = urem i8 C0, C1

Name: pos neg
Pre: C0 >= 0 && C1 <= 0
%r = srem i8 C0, C1
  =>
%r = urem i8 C0, -C1

Name: neg pos
Pre: C0 <= 0 && C1 >= 0
%r = srem i8 C0, C1
  =>
%t0 = urem i8 -C0, C1
%r = sub i8 0, %t0

Name: neg neg
Pre: C0 <= 0 && C1 <= 0
%r = srem i8 C0, C1
  =>
%t0 = urem i8 -C0, -C1
%r = sub i8 0, %t0

https://rise4fun.com/Alive/Vd6

Now, this new logic does not result in any new catches
as of vanilla llvm test-suite + RawSpeed.
but it should be virtually compile-time free,
and it may be important to be consistent in their handling,
because if we had a pair of sdiv-srem, and only converted one of them,
-divrempairs will no longer see them as a pair,
and thus not "merge" them.
2020-09-22 21:37:28 +03:00
Hubert Tong 6801950192 [InstCombine] For pow(x, +/-0.5), stop falling into pow(x, 1.5), etc. case
The current code for handling pow(x, y) where y is an integer plus 0.5
is not explicitly guarded against attempting to transform the case where
abs(y) is exactly 0.5.

The latter case is meant to be handled by `replacePowWithSqrt`. Indeed,
if the pow(x, integer+0.5) case proceeds past a certain point, it will
hit an assertion by attempting to form pow(x, 0) using `getPow`.

This patch adds an explicit check to prevent attempting the
pow(x, integer+0.5) transformation on pow(x, +/-0.5) as suggested during
the review of D87877. This has the effect of retaining the shrinking of
`pow` to `powf` when the `sqrt` libcall cannot be formed.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D88066
2020-09-22 14:23:32 -04:00
Hamilton Tobon Mosquera bd31abc1d0 [OpenMPOpt] Refactored "issue" and "wait" declarations for data map runtime call.
Refactored __tgt_target_data_begin_mapper_<issue|wait> to receive the handle as an input/output argument.
This was done given the compiler warning about returning the handle as a copy.

Differential Revision: https://reviews.llvm.org/D88029
2020-09-22 10:50:17 -05:00
Florian Hahn c671e34bf2 [VPlan] Add dump() helper to VPValue & VPRecipeBase.
This provides a convenient way to print VPValues and recipes in a
debugger. In particular it saves the user from instantiating
VPSlotTracker to print recipes or values.
2020-09-22 15:55:16 +01:00
Sanjay Patel 0c3bfbe4bc [SLP] reduce code duplication for checking parent block; NFC 2020-09-22 09:21:20 -04:00
Sanjay Patel bbd49a0266 [SLP] move misplaced code comments; NFC 2020-09-22 09:21:20 -04:00
Sanjay Patel 062276c691 [SLP] clean up code in gather(); NFC
1. Use range for-loop to avoid repeatedly accessing end index.
2. Better variable names.
2020-09-22 09:21:20 -04:00
Simon Pilgrim d682a36ef9 [SLP] Merge null and dyn_cast<> checks into dyn_cast_or_null<>. NFCI. 2020-09-22 14:01:47 +01:00
Meera Nakrani a3d0dce260 [ARM][TTI] Prevents constants in a min(max) or max(min) pattern from being hoisted when in a loop
Changes TTI function getIntImmCostInst to take an additional Instruction parameter,
which enables us to check whether it is part of a min(max())/max(min()) pattern that will match SSAT.
We can then mark the constant used as free to prevent it being hoisted so SSAT can still be generated.
Required minor changes in some non-ARM backends to allow for the optional parameter to be included.

Differential Revision: https://reviews.llvm.org/D87457
2020-09-22 11:54:10 +00:00
Arthur Eubanks 3bf703fb6d [AlwaysInliner] Emit optimization remarks
To match the normal inliner in preparation for https://reviews.llvm.org/D86988.

Also change a FIXME to an assert.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88067
2020-09-21 22:09:28 -07:00
Serguei Katkov 5502cfa091 [LoopUnswitch] Trivial simplification: remove trivial dead condition after unswitch
Non-trivial loop unswitching can keep the dead condition instruction.
This CL adds trivial dead code elimination for the unused condition.

Reviewers: asbirlea, aqjune, fhahn, DaniilSuchkov, reames
Reviewed By: asbirlea
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D88014
2020-09-22 09:04:59 +07:00
Krzysztof Parzyszek ae3f54c1e9 [EarlyCSE] Handle masked loads and stores
Extend the handling of memory intrinsics to also include non-
target-specific intrinsics, in particular masked loads and stores.

Invent "isHandledNonTargetIntrinsic" to distinguish between intrin-
sics that should be handled natively from intrinsics that can be
passed to TTI.

Add code that handles masked loads and stores and update the
testcase to reflect the results.
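A minimal sketch of the new CSE opportunity (hand-written; it assumes the pointer, mask, and passthrough all match and nothing clobbers memory in between):

```
%v1 = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(
          <4 x i32>* %p, i32 4, <4 x i1> %m, <4 x i32> undef)
; ... no intervening writes to %p ...
%v2 = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(
          <4 x i32>* %p, i32 4, <4 x i1> %m, <4 x i32> undef)
; %v2 can now be CSE'd to %v1
```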

Differential Revision: https://reviews.llvm.org/D87340
2020-09-21 18:47:10 -05:00
Arthur Eubanks 1747f77764 [SimplifyCFG] Override options in default constructor
SimplifyCFG's options should always be overridden by command line flags,
but they mistakenly weren't in the default constructor.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87718
2020-09-21 16:33:01 -07:00
Krzysztof Parzyszek 2c768c7d6c [EarlyCSE] Small refactoring changes, NFC
1. Store intrinsic ID in ParseMemoryInst instead of a boolean flag
   "IsTargetMemInst". This will make it easier to add support for
   target-independent intrinsics.
2. Extract the complex multiline conditions from EarlyCSE::processNode
   into a new function "getMatchingValue".

Differential Revision: https://reviews.llvm.org/D87691
2020-09-21 16:11:06 -05:00
Sanjay Patel 7451bf0b0b [SLP] use std::distance/find to reduce code; NFC
We were already using this code pattern right after
the loop, so this makes it consistent.
2020-09-21 16:22:55 -04:00
Sanjay Patel 6bad3caeb0 [InstCombine] use unary shuffle creator to reduce code duplication; NFC 2020-09-21 15:34:24 -04:00
Sanjay Patel be93505986 [LoopVectorize] use unary shuffle creator to reduce code duplication; NFC 2020-09-21 15:34:24 -04:00
Arthur Eubanks f4f7df037e [DIE] Remove DeadInstEliminationPass
This pass is like DeadCodeEliminationPass, but only does one pass
through a function instead of iterating on users of eliminated
instructions.

DeadCodeEliminationPass should be used in all cases.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87933
2020-09-21 12:12:25 -07:00
Arthur Eubanks 746a2c3775 [ObjCARC] Initialize return value
Mistakenly removed initialization of `Changed` in https://reviews.llvm.org/D87806.
2020-09-21 11:03:44 -07:00
Sanjay Patel a44238cb44 [SLP] use unary shuffle creator to reduce code duplication; NFC 2020-09-21 13:54:06 -04:00
Sanjay Patel 1e6b240d7d [IRBuilder][VectorCombine] make and use a convenience function for unary shuffle; NFC
This reduces code duplication for common construct.
Follow-ups can use this in SLP, LoopVectorizer, and other passes.
2020-09-21 13:47:01 -04:00
Simon Pilgrim 005f826a05 [SLP] Use for-range loops across ValueLists. NFCI.
Also rename some existing loops that used a 'j' iterator to consistently use 'V'.
2020-09-21 18:24:23 +01:00
Sanjay Patel 46075e0b78 [SLP] simplify interface for gather(); NFC
The implementation of gather() should be reduced too,
but this change by itself makes things a little clearer:
we don't try to gather to a different type or
number-of-values than whatever is passed in as the value
list itself.
2020-09-21 12:57:28 -04:00
Arthur Eubanks 024979b7b6 [ObjCARC][NewPM] Port objc-arc-contract to NPM
Similar to https://reviews.llvm.org/D86178.

This is a module pass instead of a function pass since
ARCRuntimeEntryPoints can lazily add function declarations.

Reviewed By: ahatanak

Differential Revision: https://reviews.llvm.org/D87806
2020-09-21 09:40:14 -07:00
Simon Pilgrim 3ddecfd220 SLPVectorizer.cpp - fix include ordering. NFCI. 2020-09-21 17:17:11 +01:00
Alexey Bataev 3ff07fcd54 [SLP] Allow reordering of vectorization trees with reused instructions.
If some leaves have the same instructions to be vectorized, we may
incorrectly evaluate the best order for the root node (it is built for the
vector of instructions without repeated instructions and, thus, has fewer
elements than the root node). In this case we simply cannot try to reorder
the tree, and we may calculate the wrong number of nodes that require the
same reordering.
For example, if the root node is \<a+b, a+c, a+d, f+e\>, then the leaves
are \<a, a, a, f\> and \<b, c, d, e\>. When we try to vectorize the first
leaf, it will be shrunk to \<a, b\>. If instructions in this leaf should
be reordered, the best order will be \<1, 0\>. We need to extend this
order for the root node. For the root node this order should look like
\<3, 0, 1, 2\>. This patch allows extension of the orders of the nodes
with the reused instructions.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D45263
2020-09-21 10:51:03 -04:00
Florian Hahn 57ae9bb932 [LSR] Preserve MSSA when using SplitCriticalEdge.
LSR claims to preserve MemorySSA, but we also have to make sure it is
preserved when splitting critical edges. This can be done by passing
MSSAU to SplitCriticalEdge.

Fixes PR47557.
2020-09-21 09:51:26 +01:00
Sanjay Patel 7903ae4720 [InstCombine] factorize left shifts of add/sub
We do similar factorization folds in SimplifyUsingDistributiveLaws,
but that drops no-wrap properties. Propagating those optimally may
help solve:
https://llvm.org/PR47430

The propagation is all-or-nothing for these patterns: when all
3 incoming ops have nsw or nuw, the 2 new ops should have the
same no-wrap property:
https://alive2.llvm.org/ce/z/Dv8wsU
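A hand-written instance of the all-or-nothing propagation (per the Alive2 link above):

```
%xs = shl nsw i8 %x, 3
%ys = shl nsw i8 %y, 3
%r  = add nsw i8 %xs, %ys
; ==> all three input ops are nsw, so both new ops can be nsw
%a  = add nsw i8 %x, %y
%r2 = shl nsw i8 %a, 3
```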

This also solves:
https://llvm.org/PR47584
2020-09-20 12:55:24 -04:00
Sanjay Patel cf75e83275 [InstCombine] replace zombie unreachable values with 'undef' before erasing
The test (currently crashing) is reduced from the example provided
in the post-commit discussion in D87149.

Differential Revision: https://reviews.llvm.org/D87965
2020-09-20 12:25:08 -04:00
Fangrui Song 871d03a675 [FunctionAttrs] Inline setDoesNotRecurse() and delete it. NFC
It always returns true, which may lead to confusion. Inline it because it is
trivial and only called twice.
2020-09-19 22:24:52 -07:00
Fangrui Song 0526713aa8 [FunctionAttrs] Remove redundant check. NFC 2020-09-19 20:46:18 -07:00
Fangrui Song 6913812abc Fix some clang-tidy bugprone-argument-comment issues 2020-09-19 20:41:25 -07:00
Nikita Popov f4e5541809 [Local] Clean up enforceKnownAlignment() (NFC)
I want to export this function, and the current API was a bit
weird: It took an additional Alignment argument that didn't really
have anything to do with what the function does. Drop it, and
perform a max at the callsite.

Also rename it to tryEnforceAlignment().
2020-09-19 22:29:40 +02:00
Florian Hahn 1d8f2e5292 [SCEVExpander] Support expanding nonintegral pointers with constant base.
Currently SCEVExpander creates an inttoptr for non-integral pointers if
the base is, for example, a null constant. This results in invalid IR.

This patch changes InsertNoopCastOfTo to emit a GEP & bitcast to convert
to a non-integral pointer. First, a GEP of i8* null is generated and the
integral value is used as index. The GEP is then bitcasted to the target
type.
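Roughly, for a non-integral address space (a hand-written sketch; address space 4 here is just an example):

```
; instead of the invalid: %p = inttoptr i64 %offset to float addrspace(4)*
%g = getelementptr i8, i8 addrspace(4)* null, i64 %offset
%p = bitcast i8 addrspace(4)* %g to float addrspace(4)*
```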

This was exposed by D71539.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87827
2020-09-19 17:19:53 +01:00
Xun Li 11453740bc [ASAN] Properly deal with musttail calls in ASAN
When address sanitizing a function, stack-unpoisoning code is inserted before each ret instruction. However, if the ret instruction is preceded by a musttail call, such a transformation breaks the musttail call contract and generates invalid IR.
This patch fixes the issue by moving the insertion point prior to the musttail call if there is one.
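A hand-written sketch of the constraint:

```
; the musttail contract: nothing may sit between the call (plus an
; optional bitcast) and the ret, so the unpoisoning code has to be
; inserted above the call rather than before the ret
;   <stack-unpoisoning instructions go here>
%r = musttail call i32 @callee(i32 %a)
ret i32 %r
```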

Differential Revision: https://reviews.llvm.org/D87777
2020-09-18 23:10:34 -07:00
Fangrui Song 76eec6c95b [SCEV] Fix an unused variable in -DLLVM_ENABLE_ASSERTIONS=off build 2020-09-18 16:19:05 -07:00
Eric Christopher ecfd8161bf Temporarily Revert "[SLP] Allow reordering of vectorization trees with reused instructions."
as it's infinite looping on occasion.

This reverts commit 455ca0ebb6.
2020-09-18 12:50:04 -07:00
Huihui Zhang 9ad6049736 [InstCombine][SVE] Skip scalable type for InstCombiner::getFlippedStrictnessPredicateAndConstant.
We cannot iterate over a scalable vector; the number of elements is unknown at compile time.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87918
2020-09-18 11:26:36 -07:00
Alexey Bataev 455ca0ebb6 [SLP] Allow reordering of vectorization trees with reused instructions.
If some leaves have the same instructions to be vectorized, we may
incorrectly evaluate the best order for the root node (it is built for the
vector of instructions without repeated instructions and, thus, has fewer
elements than the root node). In this case we simply cannot try to reorder
the tree, and we may calculate the wrong number of nodes that require the
same reordering.
For example, if the root node is \<a+b, a+c, a+d, f+e\>, then the leaves
are \<a, a, a, f\> and \<b, c, d, e\>. When we try to vectorize the first
leaf, it will be shrunk to \<a, b\>. If instructions in this leaf should
be reordered, the best order will be \<1, 0\>. We need to extend this
order for the root node. For the root node this order should look like
\<3, 0, 1, 2\>. This patch allows extension of the orders of the nodes
with the reused instructions.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D45263
2020-09-18 09:34:59 -04:00
Florian Hahn 9d172c8e9c Recommit "[DSE] Switch to MemorySSA-backed DSE by default."
This switches to using DSE + MemorySSA by default again, after
fixing the issues reported after the first commit.

Notable fixes fc82006331, a0017c2bc2.

This reverts commit 3a59628f3c.
2020-09-18 11:05:00 +01:00
Nikita Popov 13e19d2e7c Revert "[InstCombine] Canonicalize SPF_ABS to abs intrinsic"
This reverts commit 05d4c4ebc2.

mstorsjo reports a miscompile after this change in
https://reviews.llvm.org/D87188#2281093. Reverting until I can
investigate this.
2020-09-18 09:38:26 +02:00
Nikita Popov 05d4c4ebc2 [InstCombine] Canonicalize SPF_ABS to abs intrinsic
Enable canonicalization of SPF_ABS and SPF_NABS to the abs intrinsic.

To be conservative, the one-use check on the comparison is retained,
this may be relaxed if all goes well.

It's pretty likely that this will uncover places that are missing
handling for the abs() intrinsic. Please report any observed
performance regressions.

Differential Revision: https://reviews.llvm.org/D87188
2020-09-17 22:28:34 +02:00
Whitney Tsang 1cee33e9db [LoopUnrollAndJam] Allow unroll and jam loops forced by user.
Summary: Allow unroll and jam loops forced by user.
LoopUnrollAndJamPass is still disabled by default in the NPM pipeline,
and can be controlled by -enable-npm-unroll-and-jam.

Reviewed By: Meinersbur, dmgreen

Differential Revision: https://reviews.llvm.org/D87786
2020-09-17 19:40:14 +00:00
Nikita Popov 91ce8e121b [GVN] Use that assume(!X) implies X==false (PR47496)
We already use that assume(X) implies X==true, do the same for
assume(!X) implying X==false. This fixes PR47496.
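A hand-written sketch of the new case:

```
%not = xor i1 %x, true
call void @llvm.assume(i1 %not)
; below the assume, %x is known to be false
%r = select i1 %x, i32 %a, i32 %b   ; ==> %r = %b
```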
2020-09-17 21:34:44 +02:00
Sanjay Patel 48a23bccf3 [VectorCombine] limit load+insert transform to one-use
As discussed in:
https://llvm.org/PR47558
...there are several potential fixes/follow-ups visible
in the test case, but this is the quickest and safest
fix of the perf regression.
2020-09-17 14:29:15 -04:00
Sanjay Patel ddd9575d15 [VectorCombine] rearrange bailouts for load insert for efficiency; NFC 2020-09-17 13:50:37 -04:00
Xun Li 5b533d6cde [Coroutine] Fix a bug where Coroutine incorrectly spills phi and invoke defs before CoroBegin
When a spill definition is before CoroBegin, we cannot spill it to the frame immediately after the definition. We have to spill it after the frame is ready.
The current implementation handles it properly for all other kinds of instructions except for PHINode and InvokeInst, which could also be defined before CoroBegin.
This patch fixes it by moving the CoroBegin dominance check earlier, so that it covers all cases.
Added a test.

Differential Revision: https://reviews.llvm.org/D87810
2020-09-17 08:13:07 -07:00
Sanjay Patel 03783f19dc [SLP] sort candidates to increase chance of optimal compare reduction
This is one (small) part of improving PR41312:
https://llvm.org/PR41312

As shown there and in the smaller tests here, if we have some member of the
reduction values that does not match the others, we want to push it to the
end (bring the matching members forward and together).

In the regression tests, we have 5 candidates for the 4 slots of the reduction.
If the one "wrong" compare is grouped with the others, it prevents forming the
ideal v4i1 compare reduction.

Differential Revision: https://reviews.llvm.org/D87772
2020-09-17 08:49:27 -04:00
Roman Lebedev aadf55d1ce
[NFC] EliminateDuplicatePHINodes(): small-size optimization: if there are <= 32 PHI's, O(n^2) algo is faster (geomean -0.08%)
This is functionally equivalent to the old implementation.

As per https://llvm-compile-time-tracker.com/compare.php?from=5f4e9bf6416e45eba483a4e5e263749989fdb3b3&to=4739e6e4eb54d3736e6457249c0919b30f6c855a&stat=instructions
this is a clear geomean compile-time regression-free win with overall geomean of `-0.08%`

32 PHI's appears to be the sweet spot; both 16 and 64 performed worse:
https://llvm-compile-time-tracker.com/compare.php?from=5f4e9bf6416e45eba483a4e5e263749989fdb3b3&to=c4efe1fbbfdf0305ac26cd19eacb0c7774cdf60e&stat=instructions
https://llvm-compile-time-tracker.com/compare.php?from=5f4e9bf6416e45eba483a4e5e263749989fdb3b3&to=e4989d1c67010d3339d1a40ff5286a31f10cfe82&stat=instructions

If we have more PHI's than that, we fall back to the original DenseSet-based implementation,
so the not-so-fast cases will still be handled.

However compile-time isn't the main motivation here.
I can name at least 3 limitations of this CSE:
1. Assumes that all PHI nodes have incoming basic blocks in the same order (can be fixed while keeping the DenseMap)
2. Does not special-handle `undef` incoming values (I don't see how we can do this with hashing)
3. Does not special-handle backedge incoming values (maybe can be fixed by hashing backedge as some magical value)
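
For context, the kind of duplicate being CSE'd (an illustrative sketch):

```
; %phi2 has the same incoming values and blocks as %phi1,
; so all its uses can be replaced with %phi1
%phi1 = phi i32 [ %a, %bb1 ], [ %b, %bb2 ]
%phi2 = phi i32 [ %a, %bb1 ], [ %b, %bb2 ]
```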

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87408
2020-09-17 11:29:03 +03:00
Michael Liao 4e4c89b22c [EarlyCSE] Simplify max/min pattern matching. NFC. 2020-09-16 18:34:46 -04:00
Nikita Popov 222bf3ffbc Reapply [InstCombine] Simplify select operand based on equality condition
Reapply after fixing SimplifyWithOpReplaced() to never return
the original value, which would lead to an infinite loop in this
transform.

-----

For selects of the type X == Y ? A : B, check if we can simplify A
by using the X == Y equality and replace the operand if that's
possible. We already try to do this in InstSimplify, but will only
fold if the result of the simplification is the same as B, in which
case the select can be dropped entirely. Here the select will be
retained, just one operand simplified.

As we are performing an actual replacement here, we don't have
problems with refinement / poison values.
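
An illustrative instance (hypothetical IR, not from the patch):

```
; X == Y ? A : B, where A can be simplified using the equality
%cmp = icmp eq i32 %x, 0
%add = add i32 %x, 1
%sel = select i1 %cmp, i32 %add, i32 %b
; in the true arm %x is known to be 0, so the select becomes
%sel2 = select i1 %cmp, i32 1, i32 %b
```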

Differential Revision: https://reviews.llvm.org/D87480
2020-09-16 20:53:58 +02:00
Arthur Eubanks c27b64bbe1 [Coro][NewPM] Handle llvm.coro.prepare.retcon in NPM coro-split pass
Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D87731
2020-09-16 09:09:10 -07:00
Dangeti Tharun kumar 01e2b394ee [Partial Inliner] Compute intrinsic cost through TTI
https://bugs.llvm.org/show_bug.cgi?id=45932

The assert(OutlinedFunctionCost >= Cloner.OutlinedRegionCost && "Outlined function cost should be no less than the outlined region") in computeBBInlineCost was getting triggered.

Intrinsics like "assume" are considered regular function calls while computing costs.
This patch enables computeBBInlineCost to query TTI for the intrinsic call cost.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D87132
2020-09-16 15:12:31 +01:00
Sanjay Patel 24238f09ed [SLP] fix formatting; NFC
Also move variable declarations closer to usage and add code comments.
2020-09-16 08:50:27 -04:00
Sanjay Patel 6a23668e78 [SLP] remove uses of 'auto' that obscure functionality; NFC 2020-09-16 08:26:21 -04:00
Sanjay Patel 0cee1bf5d1 [SLP] remove redundant size check; NFC
We bail out on small array size anyway.
2020-09-16 08:11:19 -04:00
Sanjay Patel bbad998bab [SLP] move loop index variable declaration to its use; NFC 2020-09-16 07:59:31 -04:00
Sanjay Patel 158989184e [SLP] change poorly named variable; NFC
'V' shadows a function argument.
2020-09-16 07:59:31 -04:00
Arthur Eubanks ba12e77ec1 [NewPM] Port strip* passes to NPM
strip-nondebug and strip-debug-declare have no existing associated tests

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87639
2020-09-15 18:25:12 -07:00
Arthur Eubanks f7aa1563eb [LowerSwitch][NewPM] Port lowerswitch to NPM
Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87726
2020-09-15 18:18:31 -07:00
Wenlei He 2c391a5a14 [LICM] Make Loop ICM profile aware again
D65060 was reverted because it introduced non-determinism by using BFI counts from already freed blocks. The parent of this revision fixes that by using a VH callback on blocks to prevent this from happening and makes sure BFI data is passed correctly in LoopStandardAnalysisResults.

This re-introduces the previous optimization of using BFI data to prevent LICM from hoisting/sinking if the instruction will end up moving to a colder block.

Internally at Facebook this change results in a ~7% win in a CPU-related metric in one of our big services, by preventing the hoisting of cold code into a hot pre-header, as the added test case demonstrates.

Testing:
ninja check

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87551
2020-09-15 17:21:58 -07:00
Wenlei He 2ea4c2c598 [BFI] Make BFI information available through loop passes inside LoopStandardAnalysisResults
~~D65060 uncovered that trying to use BFI in loop passes can lead to non-deterministic behavior when blocks are re-used while retaining old BFI data.~~

~~To make sure BFI is preserved through loop passes a Value Handle (VH) callback is registered on blocks themselves. When a block is freed it now also wipes out the accompanying BFI entry such that stale BFI data can no longer persist resolving the determinism issue. ~~

~~An optimistic approach would be to incrementally update BFI information throughout the loop passes rather than only invalidating them on removed blocks. The issues with that are:~~
~~1. It is not clear how BFI information should be incrementally updated: If a block is duplicated does its BFI information come with? How about if it's split/modified/moved around? ~~
~~2. Assuming we can address these problems the implementation here will be a massive undertaking. ~~

~~There's a known need of BFI in LICM analysis which requires correct but not incrementally updated BFI data. A follow-up change can register BFI in all loop passes so this preserved but potentially lossy data is available to any loop pass that wants it.~~

See: D75341 for an identical implementation of preserving BFI via VH callbacks. The previous statements do still apply but this change no longer has to be in this diff because it's already upstream 😄.

This diff also moves BFI to be a part of LoopStandardAnalysisResults since the previous method using getCachedResults now (correctly!) statically asserts (D72893) that this data isn't static through the loop passes.

Testing
Ninja check

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D86156
2020-09-15 16:16:24 -07:00
Xun Li 7b4cc0961b [TSAN] Handle musttail call properly in EscapeEnumerator (and TSAN)
Call instructions with the musttail tag must be optimized as tail calls, otherwise they could lead to incorrect program behavior.
When TSAN instruments functions, it broke that contract by adding a call to the tsan exit function in between the musttail call and the return instruction, and by inserting exception handling code.
This happened through EscapeEnumerator, which adds exception handling code and returns ret instructions as the place to insert instrumentation calls.
This is especially problematic for coroutines, because coroutines rely on tail calls to do symmetric transfers properly.
To fix this, this patch moves the insertion point of instrumentation calls to just before the musttail call for ret instructions that follow musttail calls, and no longer adds exception handling code around musttail calls.
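
A sketch of the required placement (illustrative IR):

```
; nothing may sit between a musttail call and its ret, so the
; instrumentation exit call goes before the musttail call instead
call void @__tsan_func_exit()
%r = musttail call i32 @callee(i32 %x)
ret i32 %r
```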

Differential Revision: https://reviews.llvm.org/D87620
2020-09-15 15:20:05 -07:00
Huihui Zhang 3b7f5166bd [SLPVectorizer][SVE] Skip scalable-vector instructions before vectorizeSimpleInstructions.
For scalable types, the aggregated size is unknown at compile time.
Skip instructions with scalable types to ensure the list of instructions
for vectorizeSimpleInstructions does not contain any scalable-vector instructions.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D87550
2020-09-15 13:10:15 -07:00
Matt Arsenault 7d6ca2ec57 InferAddressSpaces: Fix assert with unreachable code
Invalid IR in unreachable code is technically valid IR. In this case,
the address space of the value was never inferred, and we tried to
rewrite it with an invalid address space value which would assert.
2020-09-15 15:48:43 -04:00
Florian Hahn 3d42d54955 [ConstraintElimination] Add constraint elimination pass.
This patch is a first draft of a new pass that adds a more flexible way
to eliminate compares based on more complex constraints collected from
dominating conditions.

In particular, it aims at simplifying conditions of the forms below
using a forward propagation approach, rather than instcombine-style
ad-hoc backwards walking of def-use chains.

    if (x < y)
      if (y < z)
        if (x < z) <- simplify

or

    if (x + 2 < y)
        if (x + 1 < y) <- simplify assuming no wraps

The general approach is to collect conditions and blocks, sort them by
dominance and then iterate over the sorted list. Each condition is turned
into a linear inequality and added to a system containing the linear
inequalities that hold on entry to the block. For each block, we check each
compare against the system and see if it is implied by the constraints
in the system.

We also keep a stack of processed conditions and remove conditions from
the stack and the constraint system once they go out-of-scope (= do not
dominate the current block any longer).
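
As an illustrative IR sketch of the first form above (all names hypothetical):

```
define i1 @f(i32 %x, i32 %y, i32 %z) {
entry:
  %c1 = icmp ult i32 %x, %y
  br i1 %c1, label %bb1, label %exit
bb1:
  %c2 = icmp ult i32 %y, %z
  br i1 %c2, label %bb2, label %exit
bb2:
  ; x < y and y < z hold here, so x < z is implied and folds to true
  %c3 = icmp ult i32 %x, %z
  ret i1 %c3
exit:
  ret i1 false
}
```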

Currently there are still at least the following areas for improvement:

* Currently large unsigned constants cannot be added to the system
  (coefficients must be represented as integers)
* The way constraints are managed currently is not very optimized.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D84547
2020-09-15 19:31:11 +01:00
Florian Hahn 3a59628f3c Revert "[DSE] Switch to MemorySSA-backed DSE by default."
This reverts commit fb109c42d9.

Temporarily revert due to a mis-compile pointed out at D87163.
2020-09-15 18:07:56 +01:00
Fangrui Song 4452cc4086 [VectorCombine] Don't vectorize scalar load under asan/hwasan/memtag/tsan
Similar to the tsan suppression in
`Utils/VNCoercion.cpp:getLoadLoadClobberFullWidthSize` (rL175034; load widening used by GVN),
the D81766 optimization should be suppressed under tsan due to potential
spurious data race reports:

  struct A {
    int i;
    const short s; // the load cannot be vectorized because
    int modify;    // it overlaps with bytes being concurrently modified
    long pad1, pad2;
  };
  // __tsan_read16 does not know that some bytes are undef and accessing is safe

Similarly, under asan, users can mark memory regions with
`__asan_poison_memory_region`. A widened load can lead to a spurious
use-after-poison error. hwasan/memtag should be similarly suppressed.

`mustSuppressSpeculation` suppresses asan/hwasan/tsan but not memtag, so
we need to exclude memtag in `vectorizeLoadInsert`.

Note, memtag suppression can be relaxed if the load is aligned to
its granule (usually 16), but that is out of scope of this patch.

Reviewed By: spatel, vitalybuka

Differential Revision: https://reviews.llvm.org/D87538
2020-09-15 09:47:21 -07:00
Simon Pilgrim 2b42d53e5e SLPVectorizer.h - remove unnecessary AliasAnalysis.h include. NFCI.
Forward declare AAResults instead of the (old) AliasAnalysis type.

Remove includes from SLPVectorizer.cpp that are already included in SLPVectorizer.h.
2020-09-15 16:24:05 +01:00
Simon Pilgrim 65c6ae3b6a [Utils] isLegalToPromote - Fix missing null check before writing to FailureReason.
The FailureReason input parameter may be null; we check this in all other cases in the method, but this one was missed somehow.

Fixes clang-tidy warning.
2020-09-15 14:49:04 +01:00
Sanjay Patel aa57c1c967 [InstCombine] fix bug in pow expansion
There at least one other bug related to pow -> sqrt transforms:
http://lists.llvm.org/pipermail/llvm-dev/2020-September/145051.html
...but we probably can't solve that without fixing this first.
2020-09-15 09:29:48 -04:00
Simon Pilgrim 796c805269 ProvenanceAnalysis.h - remove unnecessary AliasAnalysis.h include. NFCI.
Forward declare AAResults instead of the (old) AliasAnalysis type.
2020-09-15 13:34:35 +01:00
Bjorn Pettersson aa8be5aeea [Scalarizer] Avoid changing name of non-instructions
The "takeName" logic in ScalarizerVisitor::gather did not consider
that the value vector could refer to non-instructions, such as
global variables. This patch makes sure that we avoid changing the
name of a value if it isn't an instruction.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D87685
2020-09-15 14:15:50 +02:00
Benjamin Kramer b768546fe0 Revert "[InstCombine] Simplify select operand based on equality condition"
This reverts commit cfff88c03c. Sends
instcombine into an infinite loop.

```
define i1 @foo(i32 %arg, i32 %arg1) {
bb:
  %tmp = udiv i32 %arg, %arg1
  %tmp2 = mul nsw i32 %tmp, %arg1
  %tmp3 = icmp eq i32 %tmp2, %arg
  %tmp4 = select i1 %tmp3, i32 %tmp, i32 undef
  %tmp5 = icmp sgt i32 %tmp4, 255
  ret i1 %tmp5
}
```
2020-09-15 12:22:47 +02:00
Simon Pilgrim 5f13d6c1ee [Transforms][Coroutines] Add missing header path to CMakeLists.txt
Helps Visual Studio check include dependencies.
2020-09-15 10:37:25 +01:00
David Sherwood 69cccb3189 [SVE] Fix isLoadInvariantInLoop for scalable vectors
I've amended the isLoadInvariantInLoop function to bail out for
scalable vectors for now since the invariant.start intrinsic is only
ever generated by the clang frontend for thread locals or struct
and class constructors, neither of which supports sizeless types.
In addition, the intrinsic itself does not currently support the
concept of a scaled size, which makes it impossible to compare
the sizes of different scalable objects, e.g. <vscale x 32 x i8>
and <vscale x 16 x i8>.

Added new tests here:

  Transforms/LICM/AArch64/sve-load-hoist.ll
  Transforms/LICM/hoisting.ll

Differential Revision: https://reviews.llvm.org/D87227
2020-09-15 08:30:19 +01:00
Arthur Eubanks 10b12d4035 Reland [docs][NewPM] Add docs for writing NPM passes
So as not to conflict with the legacy PM example passes under
llvm/lib/Transforms/Hello, this is under HelloNew. This makes the
CMakeLists.txt and general directory structure less confusing for people
following the example.

Much of the doc structure was taken from WritingAnLLVMPass.rst.

This adds a HelloWorld pass which simply prints out each function name.

More will follow after this, e.g. passes over different units of IR, analyses.
https://llvm.org/docs/WritingAnLLVMPass.html contains a lot more.

Relanded with missing "Support" dependency in LLVMBuild.txt.

Reviewed By: ychen, asbirlea

Differential Revision: https://reviews.llvm.org/D86979
2020-09-14 16:06:19 -07:00
Arthur Eubanks 39ec36415d Revert "[docs][NewPM] Add docs for writing NPM passes"
This reverts commit c2590de30d.

Breaks shared libs build
2020-09-14 15:55:17 -07:00
Arthur Eubanks f3d8344854 [PruneEH][NFC] Use CallGraphUpdater in PruneEH
In preparation for porting the pass to NPM.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87632
2020-09-14 14:43:19 -07:00
Arthur Eubanks c2590de30d [docs][NewPM] Add docs for writing NPM passes
So as not to conflict with the legacy PM example passes under
llvm/lib/Transforms/Hello, this is under HelloNew. This makes the
CMakeLists.txt and general directory structure less confusing for people
following the example.

Much of the doc structure was taken from WritingAnLLVMPass.rst.

This adds a HelloWorld pass which simply prints out each function name.

More will follow after this, e.g. passes over different units of IR, analyses.
https://llvm.org/docs/WritingAnLLVMPass.html contains a lot more.

Reviewed By: ychen, asbirlea

Differential Revision: https://reviews.llvm.org/D86979
2020-09-14 13:26:03 -07:00
Teresa Johnson 226d80ebe2 [MemProf] Rename HeapProfiler to MemProfiler for consistency
This is consistent with the clang option added in
7ed8124d46, and the comments on the
runtime patch in D87120.

Differential Revision: https://reviews.llvm.org/D87622
2020-09-14 13:14:57 -07:00
Nikita Popov cfff88c03c [InstCombine] Simplify select operand based on equality condition
For selects of the type X == Y ? A : B, check if we can simplify A
by using the X == Y equality and replace the operand if that's
possible. We already try to do this in InstSimplify, but will only
fold if the result of the simplification is the same as B, in which
case the select can be dropped entirely. Here the select will be
retained, just one operand simplified.

As we are performing an actual replacement here, we don't have
problems with refinement / poison values.

Differential Revision: https://reviews.llvm.org/D87480
2020-09-14 20:07:06 +02:00
Simon Pilgrim 4ff4708d39 collectBitParts - use const references. NFCI.
Fixes clang-tidy warnings first noticed on D87452.
2020-09-14 18:23:00 +01:00
Florian Hahn f715d81c9d [DSE] Only eliminate candidates that always store the same loc.
AliasAnalysis/MemoryLocation does not account for loops. Two
MemoryLocations can be must-overwrite, even if the first one writes
multiple locations in a loop.
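
For example (illustrative IR), a store like the one below writes a different
location on every iteration, so a single later store must not be treated as
overwriting it:

```
loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %p = getelementptr inbounds i32, i32* %a, i64 %i
  store i32 0, i32* %p         ; writes %a[%i], varies per iteration
  %i.next = add nuw i64 %i, 1
  %ec = icmp eq i64 %i.next, %n
  br i1 %ec, label %exit, label %loop
```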

This patch prevents removing such stores, by only considering candidates
that are known to be loop invariant, or executed in the same BB.

Currently the invariant check is quite conservative and only considers
Alloca and Alloca-like instructions and arguments as invariant base pointers.
It also considers GEPs with all constant indices and invariant bases as
invariant.

This can be improved in the future, but the current implementation has
only minor impact on the total number of stores eliminated (25903 vs
26047 for the baseline). There are some 2-10% swings for some individual
benchmarks. In roughly half of the cases, the number of stores removed
increases actually, because we skip candidates that are unlikely to be
valid candidates early.
2020-09-14 12:06:58 +01:00
David Sherwood 816663adb5 [SVE] In LoopIdiomRecognize::isLegalStore bail out for scalable vectors
The function LoopIdiomRecognize::isLegalStore looks for stores in loops
that could be transformed into memset or memcpy. However, the algorithm
currently requires that we know how big the store is at runtime, i.e.
that the store size will not overflow an unsigned integer. For scalable
vectors we cannot guarantee this so I have changed the code to bail out
for now. In addition, even if we add a way to query the maximum value of
vscale in future we will still need to update the algorithm to cope with
non-constant strides. The additional cost associated with calculating
the memset and memcpy arguments will need to be taken into account as
well.

This patch also fixes up an implicit TypeSize -> uint64_t cast,
thereby removing a warning. I've added tests here showing a fixed
width vector loop being transformed into memcpy, and a scalable
vector loop remaining unchanged:

  Transforms/LoopIdiom/memcpy-vectors.ll

Differential Revision: https://reviews.llvm.org/D87439
2020-09-14 11:28:31 +01:00
David Stenberg bfcb824ba5 [JumpThreading] Fix an incorrect Modified status
This fixes PR47297.

When ProcessBlock() was able to constant fold the terminator's
condition, but not do any more transformations, the function would
return false, which would lead to the JumpThreading pass returning an
incorrect modified status. This patch makes so that ProcessBlock()
returns true in such cases. This will trigger an unnecessary invocation
of ProcessBlock() in such cases, but this should be rare to occur.

This was caught using the check introduced by D80916.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87392
2020-09-14 10:36:13 +02:00
Jay Foad 9a4476072e [UnifyLoopExits] Fix non-deterministic iteration order
This was causing random minor codegen differences in shaders compiled
with the AMDGPU backend.

Differential Revision: https://reviews.llvm.org/D87548
2020-09-14 09:09:58 +01:00
David Blaikie 6e06f1cd08 GCOVProfiling: Avoid use-after-move
Turns out this was a use-after-move of a function_ref, which is trivially
copyable and movable, so the move did nothing and the use after move was
safe.

But since this function_ref is being copied into a std::function, change
the function_ref to be a std::function to avoid extra layers of type
erasure indirection. Then it is a real use after move; fix that
by referring to the moved-to member variable rather than the moved-from
parameter.
2020-09-13 12:54:36 -07:00
Fangrui Song 5f4e9bf641 [gcov] Fix memory leak due to BranchProbabilityInfoWrapperPass
This is weird.
2020-09-13 00:44:32 -07:00
Fangrui Song 63182c2ac0 [gcov] Add spanning tree optimization
gcov is an "Edge Profiling with Edge Counters" application according to
Optimally Profiling and Tracing Programs (1994).

The minimum number of counters necessary is |E|-(|V|-1). The unmeasured edges
form a spanning tree. Both GCC --coverage and clang -fprofile-generate leverage
this optimization. This patch implements the optimization for clang --coverage.
The produced .gcda files are much smaller now.
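
As a worked example of that formula: a simple diamond CFG (entry branching
to a then and an else block, both joining at exit) has |V| = 4 and |E| = 4,
so only 4 - (4 - 1) = 1 edge needs a counter; the counts on the other three
edges, which form the spanning tree, are recovered from flow conservation.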
2020-09-13 00:07:31 -07:00
Fangrui Song f086e85eea [gcov] Assign names to some types and loaded values used in @__llvm_internal*
This makes the generated IR much more readable.
2020-09-12 22:42:37 -07:00
Fangrui Song d6fadc49e3 [gcov] Process .gcda immediately after the accompanying .gcno instead of doing all .gcda after all .gcno
i.e. change the workflow from

* .gcno for function A
* .gcno for function B
* .gcno for function C
* .gcda for function A
* .gcda for function B
* .gcda for function C

to

* .gcno for function A
* .gcda for function A
* .gcno for function B
* .gcda for function B
* .gcno for function C
* .gcda for function C

Currently there is duplicate logic in .gcno & .gcda processing: how functions
are filtered, which edges are instrumented, etc. This refactor enables simplification.

Since we always process .gcno, in -fprofile-arcs -fno-test-coverage mode,
__llvm_internal_gcov_emit_function_args.0 will have non-zero checksums.
2020-09-12 13:53:03 -07:00
Fangrui Song 7d3825ed95 Revert "[gcov] emitProfileArcs: iterate over GCOVFunction's instead of Function's to avoid duplicated filtering"
This reverts commit 412c9c0bf2.
2020-09-12 12:34:43 -07:00
Fangrui Song 412c9c0bf2 [gcov] emitProfileArcs: iterate over GCOVFunction's instead of Function's to avoid duplicated filtering 2020-09-12 12:21:32 -07:00
Fangrui Song c55c14837e [gcov] Clean up by getting llvm.dbg.cu earlier 2020-09-12 12:21:32 -07:00
Florian Hahn e082dee2b5 [DSE] Bail out on MemoryPhis when deleting stores at end of function.
When deleting stores at the end of a function, we have to do PHI
translation, otherwise we might miss reads in different iterations of a
loop. See multiblock-loop-carried-dependence.ll for details.

This fixes a mis-compile and surprisingly also increases the number of
eliminated stores from 26047 to 26572 for MultiSource/SPEC2000/SPEC2006
on X86 with -O3 -flto. This is most likely because we save budget by not
exploring through MemoryPhis, which are less likely to result in valid
candidates for elimination.

The issue was reported post-commit for fb109c42d9.
2020-09-12 19:05:59 +01:00
David Green 74760bb00f [LV][ARM] Add preferInloopReduction target hook.
This allows the backend to tell the vectorizer to produce inloop
reductions through a TTI hook.

For the moment on ARM under MVE this means allowing integer add
reductions of the correct size. In the future this can include integer
min/max too, under -Os.

Differential Revision: https://reviews.llvm.org/D75512
2020-09-12 17:47:04 +01:00
Tyker 78de7297ab Reland [AssumeBundles] Use operand bundles to encode alignment assumptions
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Complementary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignment
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as an "assumption use".

As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
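
As a sketch (illustrative IR; the pointer name and alignment are made up),
the simplification looks roughly like this:

```
; before: the alignment assumption spelled out as pointer arithmetic
%ptrint = ptrtoint i8* %p to i64
%maskedptr = and i64 %ptrint, 31
%maskcond = icmp eq i64 %maskedptr, 0
call void @llvm.assume(i1 %maskcond)

; after: a single "align" operand bundle on the assume
call void @llvm.assume(i1 true) [ "align"(i8* %p, i64 32) ]
```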
2020-09-12 15:36:06 +02:00
Nikita Popov 36e2e2e12e [InstCombine] Fix incorrect SimplifyWithOpReplaced transform (PR47322)
This is a followup to D86834, which partially fixed this issue in
InstSimplify. However, InstCombine repeats the same transform while
dropping poison flags -- which does not cover cases where poison is
introduced in some other way.

The fix here is a bit more comprehensive, because things are quite
entangled, and it's hard to only partially address it without
regressing optimization. There are really two changes here:

 * Export the SimplifyWithOpReplaced API from InstSimplify, with an
   added AllowRefinement flag. For replacements inside the TrueVal
   we don't actually care whether refinement occurs or not, the
   replacement is always legal. This part of the transform is now
   done in InstSimplify only. (It should be noted that the current
   AllowRefinement check is not sufficient -- that's an issue we
   need to address separately.)
 * Change the InstCombine fold to work by temporarily dropping
   poison generating flags, running the fold and then restoring the
   flags if it didn't work out. This will ensure that the InstCombine
   fold is correct as long as the InstSimplify fold is correct.

Differential Revision: https://reviews.llvm.org/D87445
2020-09-12 14:45:06 +02:00
Sanjay Patel 40f12ef621 [SLP] further limit bailout for load combine candidate (PR47450)
The test example based on PR47450 shows that we can
match non-byte-sized shifts, but those won't ever be
bswap opportunities. This isn't a full fix (we'd still
match if the shifts were by 8-bits for example), but
this should be enough until there's evidence that we
need to do more (this is a borderline case for
vectorization in the first place).
2020-09-11 11:56:11 -04:00
Krzysztof Parzyszek f92908cc74 [DSE] Make sure that DSE+MSSA can handle masked stores
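
An illustrative case (hypothetical IR) DSE+MSSA can now reason about: the
first masked store below is dead because the second overwrites the same
location with the same mask.

```
call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %v0, <4 x i32>* %p, i32 4, <4 x i1> %m)
call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %v1, <4 x i32>* %p, i32 4, <4 x i1> %m)
```
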
Differential Revision: https://reviews.llvm.org/D87414
2020-09-11 10:00:21 -05:00
Sanjay Patel 6aa3fc4a5b Revert "[InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps (PR47430)"
This reverts commit 324a53205a.

On closer examination of at least one of the test diffs,
this does not appear to be correct in all cases. Even the
existing 'nsw' creation may be wrong based on this example:
https://alive2.llvm.org/ce/z/uL4Hw9
https://alive2.llvm.org/ce/z/fJMKQS
2020-09-11 10:54:48 -04:00
Sanjay Patel 324a53205a [InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps (PR47430)
There's no signed wrap if both geps have 'inbounds':
https://alive2.llvm.org/ce/z/nZkQTg
https://alive2.llvm.org/ce/z/7qFauh
2020-09-11 10:39:09 -04:00
Simon Pilgrim 48b510c4bc [NFC] Fix compiler warnings due to integer comparison of different signedness
Fix by directly using INT_MAX and INT32_MAX.

Patch by: @nullptr.cpp (Yang Fan)

Differential Revision: https://reviews.llvm.org/D87347
2020-09-11 15:32:03 +01:00
David Sherwood 1e1770a07e [SVE][CodeGen] Fix InlineFunction for scalable vectors
When inlining functions containing allocas of scalable vectors we
cannot specify the size in the lifetime markers, since we don't
know this at compile time.

Added new test here:

  test/Transforms/Inline/AArch64/sve-alloca-merge.ll

Differential Revision: https://reviews.llvm.org/D87139
2020-09-11 08:34:51 +01:00
Michael Liao f787fe15d8 [EarlyCSE] Remove unnecessary operand swap.
- As min/max are commutative operators, there is no need to swap
  operands; doing so breaks the convention used when calculating the hash value.
2020-09-11 02:14:04 -04:00
Michael Liao 41e68f7ee7 [EarlyCSE] Fix and recommit the revised c9826829d7
In addition to calculating the hash consistently by swapping SELECT's
operands, we also need to invert the select pattern flavor to match the
original logic.

[EarlyCSE] Equivalent SELECTs should hash equally

DenseMap<SimpleValue> assumes that, if its isEqual method returns true
for two elements, then its getHashValue method must return the same value
for them. This invariant is broken when one SELECT node is a min/max
operation, and the other can be transformed into an equivalent min/max by
inverting its predicate and swapping its operands. This patch fixes an
assertion failure that would occur intermittently while compiling the
following IR:

    define i32 @t(i32 %i) {
      %cmp = icmp sle i32 0, %i
      %twin1 = select i1 %cmp, i32 %i, i32 0
      %cmpinv = icmp sgt i32 0, %i
      %twin2 = select i1 %cmpinv,  i32 0, i32 %i
      %sink = add i32 %twin1, %twin2
      ret i32 %sink
    }

Differential Revision: https://reviews.llvm.org/D86843
2020-09-10 23:30:56 -04:00
Michael Liao 39dc75f66c Revert "[EarlyCSE] Equivalent SELECTs should hash equally"
This reverts commit c9826829d7 as it
breaks regression tests.
2020-09-10 22:37:35 -04:00
Florian Hahn fb109c42d9 [DSE] Switch to MemorySSA-backed DSE by default.
The tests have been updated and I plan to move them from the MSSA
directory up.

Some end-to-end tests needed small adjustments. One difference from the
legacy DSE is that it also deletes trivially dead instructions
that are unrelated to memory operations. Because MemorySSA-backed DSE
just walks the MemorySSA, we only visit/check memory instructions. But
removing unrelated dead instructions is not really DSE's job and other
passes will clean up.

One noteworthy change is in llvm/test/Transforms/Coroutines/ArgAddr.ll,
but I think this comes down to legacy DSE not correctly handling
instructions that may throw in that case. To cover this with MemorySSA-backed
DSE, we need an update to llvm.coro.begin to treat its return value as
belonging to the same underlying object as the passed pointer.

There are some minor cases MemorySSA-backed DSE currently misses, e.g. related
to atomic operations, but I think those can be implemented after the switch.

This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html

For the MultiSource/SPEC2000/SPEC2006 the number of eliminated stores
goes from ~17500 (legacy DSE) to ~26300 (MemorySSA-backed). More numbers
and details in the thread on llvm-dev.

Impact on CTMark:
```
                                     Legacy Pass Manager
                        exec instrs    size-text
O3                       + 0.60%        - 0.27%
ReleaseThinLTO           + 1.00%        - 0.42%
ReleaseLTO-g             + 0.77%        - 0.33%
RelThinLTO (link only)   + 0.87%        - 0.42%
RelLTO-g (link only)     + 0.78%        - 0.33%
```
http://llvm-compile-time-tracker.com/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions
```
                                     New Pass Manager
                       exec instrs    size-text
O3                       + 0.95%       - 0.25%
ReleaseThinLTO           + 1.34%       - 0.41%
ReleaseLTO-g             + 1.71%       - 0.35%
RelThinLTO (link only)   + 0.96%       - 0.41%
RelLTO-g (link only)     + 2.21%       - 0.35%
```
http://195.201.131.214:8000/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions

Reviewed By: asbirlea, xbolva00, nikic

Differential Revision: https://reviews.llvm.org/D87163
2020-09-10 22:24:32 +01:00
Bryan Chan c9826829d7 [EarlyCSE] Equivalent SELECTs should hash equally
DenseMap<SimpleValue> assumes that, if its isEqual method returns true
for two elements, then its getHashValue method must return the same value
for them. This invariant is broken when one SELECT node is a min/max
operation, and the other can be transformed into an equivalent min/max by
inverting its predicate and swapping its operands. This patch fixes an
assertion failure that would occur intermittently while compiling the
following IR:

    define i32 @t(i32 %i) {
      %cmp = icmp sle i32 0, %i
      %twin1 = select i1 %cmp, i32 %i, i32 0
      %cmpinv = icmp sgt i32 0, %i
      %twin2 = select i1 %cmpinv,  i32 0, i32 %i
      %sink = add i32 %twin1, %twin2
      ret i32 %sink
    }

Differential Revision: https://reviews.llvm.org/D86843
2020-09-10 16:59:24 -04:00
Christopher Tetreault 7ddfd9b3eb [SVE] Bail from VectorUtils heuristics for scalable vectors
Bail from maskIsAllZeroOrUndef and maskIsAllOneOrUndef prior to iterating over the number of
elements for scalable vectors.

Assert that the mask type is not scalable in possiblyDemandedEltsInMask.

Assert that the types are correct in all three functions.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87424
2020-09-10 12:29:37 -07:00