llvm-project

Commit Graph

Author	SHA1	Message	Date
Roman Lebedev	0895b836d7	[SimplifyCFG] FoldBranchToCommonDest(): don't deal with unconditional branches The case where BB ends with an unconditional branch, and has a single predecessor w/ conditional branch to BB and a single successor of BB is exactly the pattern SpeculativelyExecuteBB() transform deals with. (and in this case they both allow speculating only a single instruction) Well, or FoldTwoEntryPHINode(), if the final block has only those two predecessors. Here, in FoldBranchToCommonDest(), only a weird subset of that transform is supported, and it's glued on the side in a weird way. In particular, it took me a bit to understand that the Cond isn't actually a branch condition in that case, but just the value we allow to speculate (otherwise it reads as a miscompile to me). Additionally, this only supports for the speculated instruction to be an ICmp. So let's just unclutter FoldBranchToCommonDest(), and leave this transform up to SpeculativelyExecuteBB(). As far as i can tell, this shouldn't really impact optimization potential, but if it does, improving SpeculativelyExecuteBB() will be more beneficial anyways. Notably, this only affects a single test, but EarlyCSE should have run beforehand in the pipeline, and then FoldTwoEntryPHINode() would have caught it. This reverts commit rL158392 / commit `d33f4efbfd`.	2021-01-22 17:22:49 +03:00
Anton Rapetov	a4914dc1f2	[SLP] do not traverse constant uses Walking the use list of a Constant (particularly, ConstantData) is not scalable, since a given constant may be used by many instructinos in many functions in many modules. Differential Revision: https://reviews.llvm.org/D94713	2021-01-22 08:14:09 -05:00
David Sherwood	2e080eb00a	[SVE] Add support for scalable vectorization of loops with selects and cmps I have removed an unnecessary assert in LoopVectorizationCostModel::getInstructionCost that prevented a cost being calculated for select instructions when using scalable vectors. In addition, I have changed AArch64TTIImpl::getCmpSelInstrCost to only do special cost calculations for fixed width vectors and fall back to the base version for scalable vectors. I have added a simple cost model test for cmps and selects: test/Analysis/CostModel/sve-cmpsel.ll and some simple tests that show we vectorize loops with cmp and select: test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll Differential Revision: https://reviews.llvm.org/D95039	2021-01-22 09:48:13 +00:00
Xun Li	bd3ca6666d	[Inlining] Delete redundant optnone/alwaysinline check The same check is done in InlineCost: `8b0bd54d0e/llvm/lib/Analysis/InlineCost.cpp (L2537-L2552)` Also, doing a check on the callee here is confusing, because anything that deals with callee should be done in the inner loop where we proecss all calls from the same caller. Differential Revision: https://reviews.llvm.org/D95186	2021-01-21 18:38:10 -08:00
David Green	39db5753f9	[LV][ARM] Inloop reduction cost modelling This adds cost modelling for the inloop vectorization added in `745bf6cf44`. Up until now they have been modelled as the original underlying instruction, usually an add. This happens to works OK for MVE with instructions that are reducing into the same type as they are working on. But MVE's instructions can perform the equivalent of an extended MLA as a single instruction: %sa = sext <16 x i8> A to <16 x i32> %sb = sext <16 x i8> B to <16 x i32> %m = mul <16 x i32> %sa, %sb %r = vecreduce.add(%m) -> R = VMLADAV A, B There are other instructions for performing add reductions of v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64 (VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV). The i64 are particularly interesting as there are no native i64 add/mul instructions, leading to the i64 add and mul naturally getting very high costs. Also worth mentioning, under NEON there is the concept of a sdot/udot instruction which performs a partial reduction from a v16i8 to a v4i32. They extend and mul/sum the first four elements from the inputs into the first element of the output, repeating for each of the four output lanes. They could possibly be represented in the same way as above in llvm, so long as a vecreduce.add could perform a partial reduction. The vectorizer would then produce a combination of in and outer loop reductions to efficiently use the sdot and udot instructions. Although this patch does not do that yet, it does suggest that separating the input reduction type from the produced result type is a useful concept to model. It also shows that a MLA reduction as a single instruction is fairly common. This patch attempt to improve the costmodelling of in-loop reductions by: - Adding some pattern matching in the loop vectorizer cost model to match extended reduction patterns that are optionally extended and/or MLA patterns. This marks the cost of the reduction instruction correctly and the sext/zext/mul leading up to it as free, which is otherwise difficult to tell and may get a very high cost. (In the long run this can hopefully be replaced by vplan producing a single node and costing it correctly, but that is not yet something that vplan can do). - getExtendedAddReductionCost is added to query the cost of these extended reduction patterns. - Expanded the ARM costs to account for these expanded sizes, which is a fairly simple change in itself. - Some minor alterations to allow inloop reduction larger than the highest vector width and i64 MVE reductions. - An extra InLoopReductionImmediateChains map was added to the vectorizer for it to efficiently detect which instructions are reductions in the cost model. - The tests have some updates to show what I believe is optimal vectorization and where we are now. Put together this can greatly improve performance for reduction loop under MVE. Differential Revision: https://reviews.llvm.org/D93476	2021-01-21 21:03:41 +00:00
Sanjay Patel	2f03528f5e	[SLP] rename reduction variable to avoid shadowing; NFC The code structure can likely be improved now that 'OperationData' is gone.	2021-01-21 16:02:38 -05:00
Anton Rapetov	bfec9148a0	Scalar: Don't visit constants in findInnerReductionPhi in LoopInterchange In LoopInterchange, `findInnerReductionPhi()` looks for reduction variables, which cannot be constants. Update it to return early in that case. This also addresses a blocker for removing use-lists from ConstantData, whose users could be spread across arbitrary modules in the same LLVMContext. Differential Revision: https://reviews.llvm.org/D94712	2021-01-21 12:33:06 -08:00
Sanjay Patel	d77753381f	[SLP] simplify reduction matching This is NFC-intended and removes the "OperationData" class which had become nothing more than a recurrence (reduction) type. I adjusted the matching logic to distinguish instructions from non-instructions - that's all that the "IsLeafValue" member was keeping track of.	2021-01-21 14:58:57 -05:00
Nikita Popov	65fd034b95	[FunctionAttrs] Infer willreturn for functions without loops If a function doesn't contain loops and does not call non-willreturn functions, then it is willreturn. Loops are detected by checking for backedges in the function. We don't attempt to handle finite loops at this point. Differential Revision: https://reviews.llvm.org/D94633	2021-01-21 20:29:33 +01:00
Sanjay Patel	070af1b788	[InstCombine] avoid crashing on attribute propagation In https://llvm.org/PR48810 , we are crashing while trying to propagate attributes from mempcpy (returns void*) to memcpy (returns nothing - void). We can avoid the crash by removing known incompatible attributes for the void return type. I'm not sure if this goes far enough (should we just drop all attributes since this isn't the same function?). We also need to audit other transforms in LibCallSimplifier to make sure there are no other cases that have the same problem. Differential Revision: https://reviews.llvm.org/D95088	2021-01-21 08:13:26 -05:00
Florian Hahn	bee486851c	[LoopUnswitch] Implement first version of partial unswitching. This patch applies the idea from D93734 to LoopUnswitch. It adds support for unswitching on conditions that are only invariant along certain paths through a loop. In particular, it targets conditions in the loop header that depend on values loaded from memory. If either path from the true or false successor through the loop does not modify memory, perform partial loop unswitching. That is, duplicate the instructions feeding the condition in the pre-header. Then unswitch on the duplicated condition. The condition is now known in the unswitched version for the 'invariant' path through the original loop. On caveat of this approach is that one of the loops created can be partially unswitched again. To avoid this behavior, `llvm.loop.unswitch.partial.disable` metadata is added to the unswitched loops, to avoid subsequent partial unswitching. If that's the approach to go, I can move the code handling the metadata kind into separate functions. This increases the cases we unswitch quite a bit in SPEC2006/SPEC2000 & MultiSource. It also allows us to eliminate a dead loop in SPEC2017's omnetpp ``` Tests: 236 Same hash: 170 (filtered out) Remaining: 66 Metric: loop-unswitch.NumBranches Program base patch diff test-suite...000/255.vortex/255.vortex.test 2.00 23.00 1050.0% test-suite...T2006/401.bzip2/401.bzip2.test 7.00 55.00 685.7% test-suite :: External/Nurbs/nurbs.test 5.00 26.00 420.0% test-suite...s-C/unix-smail/unix-smail.test 1.00 3.00 200.0% test-suite.../Prolangs-C++/ocean/ocean.test 1.00 3.00 200.0% test-suite...tions/lambda-0.1.3/lambda.test 1.00 3.00 200.0% test-suite...yApps-C++/PENNANT/PENNANT.test 2.00 5.00 150.0% test-suite...marks/Ptrdist/yacr2/yacr2.test 1.00 2.00 100.0% test-suite...lications/viterbi/viterbi.test 1.00 2.00 100.0% test-suite...plications/d/make_dparser.test 12.00 24.00 100.0% test-suite...CFP2006/433.milc/433.milc.test 14.00 27.00 92.9% test-suite.../Applications/lemon/lemon.test 7.00 12.00 71.4% test-suite...ce/Applications/Burg/burg.test 6.00 10.00 66.7% test-suite...T2006/473.astar/473.astar.test 16.00 26.00 62.5% test-suite...marks/7zip/7zip-benchmark.test 78.00 121.00 55.1% ``` Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93764	2021-01-21 09:46:41 +00:00
Kazu Hirata	e53472de68	[Transforms] Use llvm::append_range (NFC)	2021-01-20 21:35:54 -08:00
Kazu Hirata	8f5da41c4d	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-01-20 21:35:52 -08:00
Dávid Bolvanský	bb3f169b59	[BuildLibcalls, Attrs] Support more variants of C++'s new, add attributes for C++'s delete Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95095	2021-01-21 00:12:37 +01:00
Mircea Trofin	ccec2cf1d9	Reland "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor" This reverts commit `d97f776be5`. The original problem was due to build failures in shared lib builds. D95079 moved ImportedFunctionsInliningStatistics under Analysis, unblocking this.	2021-01-20 13:33:43 -08:00
Mircea Trofin	95ce32c787	[NFC] Move ImportedFunctionsInliningStatistics to Analysis This is related to D94982. We want to call these APIs from the Analysis component, so we can't leave them under Transforms. Differential Revision: https://reviews.llvm.org/D95079	2021-01-20 13:18:03 -08:00
Nikita Popov	1c6d1e57c1	[PredicateInfo] Handle logical and/or Teach PredicateInfo to handle logical and/or the same way as bitwise and/or. This allows handling logical and/or inside IPSCCP and NewGVN.	2021-01-20 21:03:07 +01:00
Nikita Popov	ca4ed1e7ae	[PredicateInfo] Generalize processing of conditions Branch/assume conditions in PredicateInfo are currently handled in a rather ad-hoc manner, with some arbitrary limitations. For example, an `and` of two `icmp`s will be handled, but an `and` of an `icmp` and some other condition will not. That also includes the case where more than two conditions and and'ed together. This patch makes the handling more general by looking through and/ors up to a limit and considering all kinds of conditions (though operands will only be taken for cmps of course). Differential Revision: https://reviews.llvm.org/D94447	2021-01-20 20:40:41 +01:00
Mircea Trofin	d97f776be5	Revert "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor" This reverts commit `e8aec763a5`.	2021-01-20 11:19:34 -08:00
Mircea Trofin	e8aec763a5	[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor When using 2 InlinePass instances in the same CGSCC - one for other mandatory inlinings, the other for the heuristic-driven ones - the order in which the ImportedFunctionStats would be output-ed would depend on the destruction order of the inline passes, which is not deterministic. This patch moves the ImportedFunctionStats responsibility to the InlineAdvisor to address this problem. Differential Revision: https://reviews.llvm.org/D94982	2021-01-20 11:07:36 -08:00
Dávid Bolvanský	16d6e85271	[BuildLibcalls] Mark some libcalls with inaccessiblememonly and inaccessiblemem_or_argmemonly Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94850	2021-01-20 19:45:23 +01:00
Sanjay Patel	c09be0d2a0	[SLP] reduce reduction code for checking vectorizable ops; NFC This is another step towards removing `OperationData` and fixing FMF matching/propagation bugs when forming reductions.	2021-01-20 11:14:48 -05:00
Sanjay Patel	1c54112a57	[SLP] refactor more reduction functions; NFC We were able to remove almost all of the state from OperationData, so these don't make sense as members of that class - just pass the RecurKind in as a param. More streamlining is possible, but I'm trying to avoid logic/typo bugs while fixing this. Eventually, we should not need the `OperationData` class.	2021-01-20 11:14:48 -05:00
Sanjay Patel	8590d24543	[SLP] move reduction createOp functions; NFC We were able to remove almost all of the state from OperationData, so these don't make sense as members of that class - just pass the RecurKind in as a param.	2021-01-20 11:14:48 -05:00
Joseph Tremoulet	40cd262c43	Loop peeling: check that latch is conditional branch Loop peeling assumes that the loop's latch is a conditional branch. Add a check to canPeel that explicitly checks for this, and testcases that otherwise fail an assertion when trying to peel a loop whose back-edge is a switch case or the non-unwind edge of an invoke. Reviewed By: skatkov, fhahn Differential Revision: https://reviews.llvm.org/D94995	2021-01-20 11:01:16 -05:00
Chuanqi Xu	c1bc7981ba	[Coroutine] Remain alignment information when merging frame variables Summary: This is to address bug48712. The solution in this patch is that when we want to merge two variable a into the storage frame of variable b only if the alignment of a is multiple of b. There may be other strategies. But now I think they are hard to handle and benefit little. Or we can implement them in the future. Test-plan: check-llvm Reviewers: jmorse, lxfind, junparser Differential Revision: https://reviews.llvm.org/D94891	2021-01-20 18:59:00 +08:00
David Sherwood	255a507716	[NFC][InstructionCost] Use InstructionCost in lib/Transforms/IPO/IROutliner.cpp In places where we call a TTI.getXXCost() function I have changed the code to use InstructionCost instead of unsigned. This is in preparation for later on when we will change the TTI interfaces to return InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D94427	2021-01-20 08:33:59 +00:00
Kazu Hirata	b023cdeacc	[llvm] Use llvm::all_of (NFC)	2021-01-19 20:19:17 -08:00
Kazu Hirata	8857202489	[llvm] Use llvm::find (NFC)	2021-01-19 20:19:14 -08:00
Juneyoung Lee	4479c0c2c0	Allow nonnull/align attribute to accept poison Currently LLVM is relying on ValueTracking's `isKnownNonZero` to attach `nonnull`, which can return true when the value is poison. To make the semantics of `nonnull` consistent with the behavior of `isKnownNonZero`, this makes the semantics of `nonnull` to accept poison, and return poison if the input pointer isn't null. This makes many transformations like below legal: ``` %p = gep inbounds %x, 1 ; % p is non-null pointer or poison call void @f(%p) ; instcombine converts this to call void @f(nonnull %p) ``` Instead, this semantics makes propagation of `nonnull` to caller illegal. The reason is that, passing poison to `nonnull` does not immediately raise UB anymore, so such program is still well defined, if the callee does not use the argument. Having `noundef` attribute there re-allows this. ``` define void @f(i8* %p) { ; functionattr cannot mark %p nonnull here anymore call void @g(i8* nonnull %p) ; .. because @g never raises UB if it never uses %p. ret void } ``` Another attribute that needs to be updated is `align`. This patch updates the semantics of align to accept poison as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D90529	2021-01-20 11:31:23 +09:00
Wei Mi	21b1ad0340	[SampleFDO] Add the support to split the function profiles with context into separate sections. For ThinLTO, all the function profiles without context has been annotated to outline functions if possible in prelink phase. In postlink phase, profile annotation in postlink phase is only meaningful for function profile with context. If the profile is large, it is better to split the profile into two parts, one with context and one without, so the profile reading in postlink phase only has to read the part with context. To have the profile splitting, we extend the ExtBinary format to support different section arrangement. It will be flexible to add other section layout in the future without the need to create new class inheriting from ExtBinary class. Differential Revision: https://reviews.llvm.org/D94435	2021-01-19 15:16:19 -08:00
Alexey Bataev	e463bd53c0	Revert "[SLP]Merge reorder and reuse shuffles." This reverts commit `438682de6a` to fix the bug with the reducing size of the resulting vector for the entry node with multiple users.	2021-01-19 11:48:04 -08:00
Mariya Podchishchaeva	7113de301a	[ScalarizeMaskedMemIntrin] Add missing dependency The pass has dependency on 'TargetTransformInfoWrapperPass', but the corresponding call to INITIALIZE_PASS_DEPENDENCY was missing. Differential Revision: https://reviews.llvm.org/D94916	2021-01-19 22:33:47 +03:00
Nikita Popov	21443381c0	Reapply [InstCombine] Replace one-use select operand based on condition Relative to the original change, this adds a check that the instruction on which we're replacing operands is safe to speculatively execute, because that's what we're effectively doing. We're executing the instruction with the replaced operand, which is fine if it's pure, but not fine if can cause side-effects or UB (aka is not speculatable). Additionally, we cannot (generally) replace operands in phi nodes, as these may refer to a different loop iteration. This is also covered by the speculation check. ----- InstCombine already performs a fold where X == Y ? f(X) : Z is transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However, if f(X) only has one use, then we can always directly replace the use inside the instruction. To actually be profitable, limit it to the case where Y is a non-expr constant. This could be further extended to replace uses further up a one-use instruction chain, but for now this only looks one level up. Among other things, this also subsumes D94860. Differential Revision: https://reviews.llvm.org/D94862	2021-01-19 20:26:38 +01:00
Jeroen Dobbelaere	121cac01e8	[noalias.decl] Look through llvm.experimental.noalias.scope.decl Just like llvm.assume, there are a lot of cases where we can just ignore llvm.experimental.noalias.scope.decl. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93042	2021-01-19 20:09:42 +01:00
Hans Wennborg	58bdfcfac0	Revert `5238e7b302` "[InstCombine] Replace one-use select operand based on condition" This caused a miscompile in Chromium, see comments on the codereview for discussion and pointer to a reproducer. > InstCombine already performs a fold where X == Y ? f(X) : Z is > transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However, > if f(X) only has one use, then we can always directly replace the > use inside the instruction. To actually be profitable, limit it to > the case where Y is a non-expr constant. > > This could be further extended to replace uses further up a one-use > instruction chain, but for now this only looks one level up. > > Among other things, this also subsumes D94860. > > Differential Revision: https://reviews.llvm.org/D94862 This also reverts the follow-up a003f26539cf4db744655e76c41f4c4a8913f116: > [llvm] Prevent infinite loop in InstCombine of select statements > > This fixes an issue where the RHS and LHS the comparison operation > creating the predicate were swapped back and forth forever. > > Differential Revision: https://reviews.llvm.org/D94934	2021-01-19 11:50:56 +01:00
Florian Hahn	83daa49758	[LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands. D84108 exposed a bad interaction between inlining and loop-rotation during regular LTO, which is causing notable regressions in at least CINT2006/473.astar. The problem boils down to: we now rotate a loop just before the vectorizer which requires duplicating a function call in the preheader when compiling the individual files ('prepare for LTO'). But this then prevents further inlining of the function during LTO. This patch tries to resolve this issue by making LoopRotate more conservative with respect to rotating loops that have inline-able calls during the 'prepare for LTO' stage. I think this change intuitively improves the current situation in general. Loop-rotate tries hard to avoid creating headers that are 'too big'. At the moment, it assumes all inlining already happened and the cost of duplicating a call is equal to just doing the call. But with LTO, inlining also happens during full LTO and it is possible that a previously duplicated call is actually a huge function which gets inlined during LTO. From the perspective of LV, not much should change overall. Most loops calling user-provided functions won't get vectorized to start with (unless we can infer that the function does not touch memory, has no other side effects). If we do not inline the 'inline-able' call during the LTO stage, we merely delayed loop-rotation & vectorization. If we inline during LTO, chances should be very high that the inlined code is itself vectorizable or the user call was not vectorizable to start with. There could of course be scenarios where we inline a sufficiently large function with code not profitable to vectorize, which would have be vectorized earlier (by scalarzing the call). But even in that case, there probably is no big performance impact, because it should be mostly down to the cost-model to reject vectorization in that case. And then the version with scalarized calls should also not be beneficial. In a way, LV should have strictly more information after inlining and make more accurate decisions (barring cost-model issues). There is of course plenty of room for things to go wrong unexpectedly, so we need to keep a close look at actual performance and address any follow-up issues. I took a look at the impact on statistics for MultiSource/SPEC2000/SPEC2006. There are a few benchmarks with fewer loops rotated, but no change to the number of loops vectorized. Reviewed By: sanwou01 Differential Revision: https://reviews.llvm.org/D94232	2021-01-19 10:15:29 +00:00
Tres Popp	a003f26539	[llvm] Prevent infinite loop in InstCombine of select statements This fixes an issue where the RHS and LHS the comparison operation creating the predicate were swapped back and forth forever. Differential Revision: https://reviews.llvm.org/D94934	2021-01-19 10:31:48 +01:00
David Sherwood	c3ce262794	[NFC] Make remaining cost functions in LoopVectorize.cpp use InstructionCost A previous patch has already changed getInstructionCost to return an InstructionCost type. This patch changes the other various getXXXCost functions to return an InstructionCost too. This is a non-functional change - I've added a few asserts that the costs are valid in places where we're selecting between vector call and intrinsic costs. However, since we don't yet return invalid costs from any of the TTI implementations these asserts should not fire. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D94065	2021-01-19 09:08:40 +00:00
Juneyoung Lee	2d89ebd5d1	Address unused variable warning	2021-01-19 09:30:16 +09:00
Juneyoung Lee	0441df94ad	[InstCombine,InstSimplify] Optimize select followed by and/or/xor This patch adds `A & (A && B)` -> `A && B` (similarly for or + logical or) Also, this patch adds `~(select C, (icmp pred X, Y), const)` -> `select C, (icmp pred' X, Y), ~const`. Alive2 proof: merge_and: https://alive2.llvm.org/ce/z/teMR97 merge_or: https://alive2.llvm.org/ce/z/b4yZUp xor_and: https://alive2.llvm.org/ce/z/_-TXHi xor_or: https://alive2.llvm.org/ce/z/2uYx_a Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94861	2021-01-19 09:14:17 +09:00
Juneyoung Lee	395c737d9f	[SimplifyCFG] Update SimplifyBranchOnICmpChain to recognize select form of and/or This patch teaches SimplifyCFG::SimplifyBranchOnICmpChain to understand select form of (x == C1 \|\| x == C2 \|\| ...) / (x != C1 && x != C2 && ...) and optimize them into switch if possible. D93065 has more context about the transition, including links to the list of optimizations being updated. Differential Revision: https://reviews.llvm.org/D93943	2021-01-19 08:53:40 +09:00
Sanjay Patel	5b77ac32b1	[SLP] match maxnum/minnum intrinsics as FP reduction ops After much refactoring over the last 2 weeks to the reduction matching code, I think this change is finally ready. We effectively broke fmax/fmin vector reduction optimization when we started canonicalizing to intrinsics in instcombine, so this should restore that functionality for SLP. There are still FMF problems here as noted in the code comments, but we should be avoiding miscompiles on those for fmax/fmin by restricting to full 'fast' ops (negative tests are included). Fixing FMF propagation is a planned follow-up. Differential Revision: https://reviews.llvm.org/D94913	2021-01-18 17:37:16 -05:00
Kazu Hirata	23b0ab2acb	[llvm] Use the default value of drop_begin (NFC)	2021-01-18 10:16:36 -08:00
Kazu Hirata	dc300beba7	[STLExtras] Add a default value to drop_begin This patch adds the default value of 1 to drop_begin. In the llvm codebase, 70% of calls to drop_begin have 1 as the second argument. The interface similar to with std::next should improve readability. This patch converts a couple of calls to drop_begin as examples. Differential Revision: https://reviews.llvm.org/D94858	2021-01-18 10:16:34 -08:00
Xun Li	1d04dc52dd	[Coroutine] Do not CoroElide if there are musttail calls This is to address https://bugs.llvm.org/show_bug.cgi?id=48626. When there are musttail calls that use parameters aliasing the newly created coroutine frame, the existing implementation will fatal. We simply cannot perform CoroElide in such cases. In theory a precise analysis can be done to check whether the parameters of the musttail call actually alias the frame, but it's very hard to do it before the transformation happens. Also in most cases the existence of musttail call is generated due to symmetric transfers, and in those cases alias analysis won't be able to tell that they don't alias anyway. Differential Revision: https://reviews.llvm.org/D94834	2021-01-18 09:06:21 -08:00
Sanjay Patel	3dbbadb8ef	[SLP] rename reduction query for min/max ops; NFC This will avoid confusion once we start matching min/max intrinsics. All of these hacks to accomodate cmp+sel idioms should disappear once we canonicalize to min/max intrinsics.	2021-01-18 09:32:57 -05:00
Sanjay Patel	d1c4e859ce	[SLP] reduce opcode API dependency in reduction cost calc; NFC The icmp opcode is now hard-coded in the cost model call. This will make it easier to eventually remove all opcode queries for min/max patterns as we transition to intrinsics.	2021-01-18 09:32:57 -05:00
Florian Hahn	e6d758de82	[InferAttrs] Mark some library functions as willreturn. This patch marks some library functions as willreturn. On the first pass, I excluded most functions that interact with streams/the filesystem. Along with willreturn, it also adds nounwind to a set of math functions. There probably are a few additional attributes we can add for those, but that should be done separately. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94684	2021-01-18 13:40:21 +00:00
Caroline Concatto	36710c38c1	[NFC]Migrate VectorCombine.cpp to use InstructionCost This patch changes these functions: vectorizeLoadInsert isExtractExtractCheap foldExtractedCmps scalarizeBinopOrCmp getShuffleExtract foldBitcastShuf to use the class InstructionCost when calling TTI.get<something>Cost(). This patch is part of a series of patches to use InstructionCost instead of unsigned/int for the cost model functions. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html See this patch for the introduction of the type: https://reviews.llvm.org/D91174 ps.:This patch adds the test \|\| !NewCost.isValid(), because we want to return false when: !NewCost.isValid && !OldCost.isValid()->the cost to transform it expensive and !NewCost.isValid() && OldCost.isValid() Therefore for simplication we only add test for !NewCost.isValid() Differential Revision: https://reviews.llvm.org/D94069	2021-01-18 13:37:21 +00:00
Dávid Bolvanský	ed396212da	[InstCombine] Transform abs pattern using multiplication to abs intrinsic (PR45691) ``` unsigned r(int v) { return (1 \| -(v < 0)) * v; } `r` is equivalent to `abs(v)`. ``` ``` define <4 x i8> @src(<4 x i8> %0) { %1: %2 = ashr <4 x i8> %0, { 31, undef, 31, 31 } %3 = or <4 x i8> %2, { 1, 1, 1, undef } %4 = mul nsw <4 x i8> %3, %0 ret <4 x i8> %4 } => define <4 x i8> @tgt(<4 x i8> %0) { %1: %2 = icmp slt <4 x i8> %0, { 0, 0, 0, 0 } %3 = sub nsw <4 x i8> { 0, 0, 0, 0 }, %0 %4 = select <4 x i1> %2, <4 x i8> %3, <4 x i8> %0 ret <4 x i8> %4 } Transformation seems to be correct! ``` Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94874	2021-01-17 17:06:14 +01:00
Nikita Popov	5238e7b302	[InstCombine] Replace one-use select operand based on condition InstCombine already performs a fold where X == Y ? f(X) : Z is transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However, if f(X) only has one use, then we can always directly replace the use inside the instruction. To actually be profitable, limit it to the case where Y is a non-expr constant. This could be further extended to replace uses further up a one-use instruction chain, but for now this only looks one level up. Among other things, this also subsumes D94860. Differential Revision: https://reviews.llvm.org/D94862	2021-01-16 23:25:02 +01:00
Roman Lebedev	32fc32317a	[SimplifyCFG] markAliveBlocks(): catchswitch: preserve PostDomTree When removing catchpad's from catchswitch, if that removes a successor, we need to record that in DomTreeUpdater. This fixes PostDomTree preservation failure in an existing test. This appears to be the single issue that i see in my current test coverage.	2021-01-17 01:21:05 +03:00
Sanjay Patel	49b96cd9ef	[SLP] remove opcode field from reduction data class This is NFC-intended and another step towards supporting intrinsics as reduction candidates. The remaining bits of the OperationData class do not make much sense as-is, so I will try to improve that, but I'm trying to take minimal steps because it's still not clear how this was intended to work.	2021-01-16 13:55:52 -05:00
Sanjay Patel	fcfcc3cc6b	[SLP] fix typos; NFC	2021-01-16 13:55:52 -05:00
Sanjay Patel	48dbac5b6b	[SLP] remove unnecessary use of 'OperationData' This is another NFC-intended patch to allow matching intrinsics (example: maxnum) as candidates for reductions. It's possible that the loop/if logic can be reduced now, but it's still difficult to understand how this all works.	2021-01-16 13:55:52 -05:00
Kazu Hirata	2082b10d10	[llvm] Use *::empty (NFC)	2021-01-16 09:40:55 -08:00
Kazu Hirata	19aacdb715	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-01-16 09:40:53 -08:00
Mircea Trofin	e8049dc3c8	[NewPM][Inliner] Move the 'always inliner' case in the same CGSCC pass as 'regular' inliner Expanding from D94808 - we ensure the same InlineAdvisor is used by both InlinerPass instances. The notion of mandatory inlining is moved into the core InlineAdvisor: advisors anyway have to handle that case, so this change also factors out that a bit better. Differential Revision: https://reviews.llvm.org/D94825	2021-01-15 17:59:38 -08:00
Dávid Bolvanský	a1500105ee	[SimplifyCFG] Optimize CFG when null is passed to a function with nonnull argument Example: ``` __attribute__((nonnull,noinline)) char * pinc(char p) { return ++p; } char foo(bool b, char a) { return pinc(b ? 0 : a); } ``` optimize to ``` char foo(bool b, char *a) { return pinc(a); } ``` Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94180	2021-01-15 23:53:43 +01:00
Sanjay Patel	ceb3cdccd0	[SLP] remove dead code in reduction matching; NFC To get into this block we had: !A \|\| B \|\| C and we checked C in the first 'if' clause leaving !A \|\| B. But the 2nd 'if' is checking: A && !B --> !(!A \|\| B)	2021-01-15 17:03:26 -05:00
Nick Desaulniers	ed0fd567eb	BreakCriticalEdges: do not split the critical edge from a CallBr indirect successor Otherwise we'll fail the assertion in SplitBlockPredecessors() related to splitting the edges from CallBr's. Fixes: https://github.com/ClangBuiltLinux/linux/issues/1161 Fixes: https://github.com/ClangBuiltLinux/linux/issues/1252 Reviewed By: void, MaskRay, jyknight Differential Revision: https://reviews.llvm.org/D88438	2021-01-15 13:51:47 -08:00
Roman Lebedev	a14c36fe27	[SimplifyCFG] switchToSelect(): don't forget to insert DomTree edge iff needed DestBB might or might not already be a successor of SelectBB, and it wasn't we need to ensure that we record the fact in DomTree. The testcase used to crash in lazy domtree updater mode + non-per-function domtree validity checks disabled.	2021-01-15 23:35:57 +03:00
Roman Lebedev	c6654a4cda	[SimplifyCFG][BasicBlockUtils] Port SplitBlockPredecessors()/SplitLandingPadPredecessors() to DomTreeUpdater This is not nice, but it's the best transient solution possible, and is better than just duplicating the whole function. The problem is, this function is widely used, and it is not at all obvious that all the users could be painlessly switched to operate on DomTreeUpdater, and somehow i don't feel like porting all those users first. This function is one of last three that not operate on DomTreeUpdater.	2021-01-15 23:35:56 +03:00
Roman Lebedev	286cf6cb02	[SimplifyCFG] Port SplitBlockAndInsertIfThen() to DomTreeUpdater This is not nice, but it's the best transient solution possible, and is better than just duplicating the whole function. The problem is, this function is widely used, and it is not at all obvious that all the users could be painlessly switched to operate on DomTreeUpdater, and somehow i don't feel like porting all those users first. This function is one of last three that not operate on DomTreeUpdater.	2021-01-15 23:35:56 +03:00
Roman Lebedev	c845c724c2	[Utils][SimplifyCFG] Port SplitBlock() to DomTreeUpdater This is not nice, but it's the best transient solution possible, and is better than just duplicating the whole function. The problem is, this function is widely used, and it is not at all obvious that all the users could be painlessly switched to operate on DomTreeUpdater, and somehow i don't feel like porting all those users first. This function is one of last three that not operate on DomTreeUpdater.	2021-01-15 23:35:56 +03:00
Roman Lebedev	b81f75fa79	[Utils] splitBlockBefore() always operates on DomTreeUpdater, so take it, not DomTree Even though not all it's users operate on DomTreeUpdater, it itself internally operates on DomTreeUpdater, so it must mean everything is fine with that, so just do that globally.	2021-01-15 23:35:56 +03:00
Sanjay Patel	1f21de535d	[SLP] remove unused reduction functions; NFC These were made obsolete by simplifying the code in recent patches.	2021-01-15 14:59:33 -05:00
Jamie Schmeiser	17d0fb7f57	Set option default for enabling memory ssa for new pass manager loop sink pass to true. Summary: Set the default for the option enabling memory ssa use in the loop sink pass to true for the new pass manager. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: asbirlea (Alina Sbirlea) Differential Revision: https://reviews.llvm.org/D92486	2021-01-15 09:56:44 -05:00
Kazu Hirata	7dc3575ef2	[llvm] Remove redundant return and continue statements (NFC) Identified with readability-redundant-control-flow.	2021-01-14 20:30:34 -08:00
Kazu Hirata	2efcbe24a7	[llvm] Use llvm::drop_begin (NFC)	2021-01-14 20:30:33 -08:00
Kazu Hirata	9bcc0d1040	[CodeGen, Transforms] Use llvm::sort (NFC)	2021-01-14 20:30:31 -08:00
Sanjay Patel	b21905dfe3	[SLP] remove unnecessary state in matching reductions This is NFC-intended. I'm still trying to figure out how the loop where this is used works. It does not seem like we require this data at all, but it's hard to confirm given the complicated predicates.	2021-01-14 18:32:37 -05:00
Bjorn Pettersson	d58512b2e3	[SLP] Don't vectorize stores of non-packed types (like i1, i2) In the spirit of commit `fc783e91e0` (llvm-svn: 248943) we shouldn't vectorize stores of non-packed types (i.e. types that has padding between consecutive variables in a scalar layout, but being packed in a vector layout). The problem was detected as a miscompile in a downstream test case. Reviewed By: anton-afanasyev Differential Revision: https://reviews.llvm.org/D94446	2021-01-14 11:30:33 +01:00
Daniel Paoliello	ff5e896425	Fix unused variable in CoroFrame.cpp when building Release with GCC 10 When building with GCC 10, the following warning is reported: ``` /llvm-project/llvm/lib/Transforms/Coroutines/CoroFrame.cpp:1527:28: warning: unused variable ‘CS’ [-Wunused-variable] 1527 \| if (CatchSwitchInst *CS = ``` This change adds a cast to `void` to avoid the warning. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D94456	2021-01-13 22:53:25 -08:00
Kazu Hirata	125ea20d55	[llvm] Use llvm::stable_sort (NFC)	2021-01-13 19:14:43 -08:00
Kazu Hirata	5c1c39e8d8	[llvm] Use *Set::contains (NFC)	2021-01-13 19:14:41 -08:00
Wei Mi	86341247c4	[NFC] Rename ThinLTOPhase to ThinOrFullLTOPhase and move it from PassBuilder.h to Pass.h. In some compiler passes like SampleProfileLoaderPass, we want to know which LTO/ThinLTO phase the pass is in. Currently the phase is represented in enum class PassBuilder::ThinLTOPhase, so it is only available in PassBuilder and it also cannot represent phase in full LTO. The patch extends it to include full LTO phases and move it from PassBuilder.h to Pass.h, then it is much easier for PassBuilder to communiate with each pass about current LTO phase. Differential Revision: https://reviews.llvm.org/D94613	2021-01-13 15:55:40 -08:00
Arthur Eubanks	39e6d24237	[NewPM] Only non-trivially loop unswitch at -O3 and for non-optsize functions This matches the legacy pipeline/pass. Reviewed By: asbirlea, SjoerdMeijer Differential Revision: https://reviews.llvm.org/D94559	2021-01-13 14:54:49 -08:00
Kazu Hirata	fb98a1be43	Fix the warnings on unused variables (NFC)	2021-01-13 13:32:40 -08:00
Sanjay Patel	123674a816	[SLP] simplify type check for reductions This is NFC-intended. The 'valid' call allows int/FP/pointers for other parts of SLP. The difference here is that we can't reduce pointers.	2021-01-13 13:30:46 -05:00
Andrew Litteken	05b1a15f70	[IROutliner] Adapting to hoisted bitcasts in CodeExtractor In commit `700d2417d8` the CodeExtractor was updated so that bitcasts that have lifetime markers that beginning outside of the region are deduplicated outside the region and are not used as an output. This caused a discrepancy in the IROutliner, where in these cases there were arguments added to the aggregate function that were not needed causing assertion errors. The IROutliner queries the CodeExtractor twice to determine the inputs and outputs, before and after `findAllocas` is called with the same ValueSet for the outputs causing the duplication. This has been fixed with a dummy ValueSet for the first call. However, the additional bitcasts prevent us from using the same similarity relationships that were previously defined by the IR Similarity Analysis Pass. In these cases, we check whether the initial version of the region being analyzed for outlining is still the same as it was previously. If it is not, i.e. because of the additional bitcast instructions from the CodeExtractor, we discard the region. Reviewers: yroux Differential Revision: https://reviews.llvm.org/D94303	2021-01-13 11:10:37 -06:00
Nikita Popov	17863614da	[InstCombine] Fold select -> and/or using impliesPoison We can fold a ? b : false to a & b if is_poison(b) implies that is_poison(a), at which point we're able to reuse all the usual fold on ands. In particular, this covers the very common case of icmp X, C && icmp X, C'. The same applies to ors. This currently only has an effect if the -instcombine-unsafe-select-transform=0 option is set. Differential Revision: https://reviews.llvm.org/D94550	2021-01-13 17:45:40 +01:00
David Sherwood	4cd48535ec	[NFC][InstructionCost] Use InstructionCost in Transforms/Scalar/RewriteStatepointsForGC.cpp In places where we calculate costs using TTI.getXXXCost() interfaces I have changed the code to use InstructionCost instead of unsigned. The change is non functional since InstructionCost behaves in the same way as an integer for valid costs. Currently the getXXXCost() functions used in this file do not return invalid costs. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential revision: https://reviews.llvm.org/D94484	2021-01-13 09:42:58 +00:00
Kazu Hirata	8a20e2b3d3	[llvm] Use Optional::getValueOr (NFC)	2021-01-12 21:43:50 -08:00
Kazu Hirata	12fc9ca3a4	[llvm] Remove redundant string initialization (NFC) Identified with readability-redundant-string-init.	2021-01-12 21:43:46 -08:00
Yuanfang Chen	5c7dcd7aea	[Coroutine] Update promise object's final layout index promise is a header field but it is not guaranteed that it would be the third field of the frame due to `performOptimizedStructLayout`. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D94137	2021-01-12 17:44:02 -08:00
Luo, Yuanke	055644cc45	[X86][AMX] Prohibit pointer cast on load. The load/store instruction will be transformed to amx intrinsics in the pass of AMX type lowering. Prohibiting the pointer cast make that pass happy. Differential Revision: https://reviews.llvm.org/D94372	2021-01-13 09:39:19 +08:00
Hongtao Yu	175288a1af	Add sample-profile-suffix-elision-policy attribute with -funique-internal-linkage-names. Adding sample-profile-suffix-elision-policy attribute to functions whose linkage names are uniquefied so that their unique name suffix won't be trimmed when applying AutoFDO profiles. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D94455	2021-01-12 15:15:53 -08:00
modimo	2a49b7c64a	[Inliner] Change inline remark format and update ReplayInlineAdvisor to use it This change modifies the source location formatting from: LineNumber.Discriminator to: LineNumber:ColumnNumber.Discriminator The motivation here is to enhance location information for inline replay that currently exists for the SampleProfile inliner. This will be leveraged further in inline replay for the CGSCC inliner in the related diff. The ReplayInlineAdvisor is also modified to read the new format and now takes into account the callee for greater accuracy. Testing: ninja check-llvm Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D94333	2021-01-12 13:43:48 -08:00
Nikita Popov	23390e7a13	[InstCombine] Handle logical and/or in assume optimization assume(a && b) can be converted to assume(a); assume(b) even if the condition is logical. Same for assume(!(a \|\| b)).	2021-01-12 22:36:40 +01:00
Sanjay Patel	9e7895a868	[SLP] reduce code duplication while processing reductions; NFC	2021-01-12 16:03:57 -05:00
Sanjay Patel	92fb5c49e8	[SLP] rename variable to improve readability; NFC The OperationData in the 2nd block (visiting the operands) is completely independent of the 1st block.	2021-01-12 16:03:57 -05:00
Sanjay Patel	554be30a42	[SLP] reduce code duplication in processing reductions; NFC	2021-01-12 16:03:57 -05:00
Sanjay Patel	46507a96fc	[SLP] reduce code duplication while matching reductions; NFC	2021-01-12 16:03:57 -05:00
Philip Reames	caafdf07bb	[LV] Weaken spuriously strong assert in LoopVersioning LoopVectorize uses some utilities on LoopVersioning, but doesn't actually use it for, you know, versioning. As a result, the precondition LoopVersioning expects is too strong for this user. At the moment, LoopVectorize supports any loop with a unique exit block, so check the same precondition here. Really, the whole class structure here is a mess. We should separate the actual versioning from the metadata updates, but that's a bigger problem.	2021-01-12 12:57:13 -08:00
Philip Reames	9f61fbd75a	[LV] Relax assumption that LCSSA implies single entry This relates to the ongoing effort to support vectorization of multiple exit loops (see D93317). The previous code assumed that LCSSA phis were always single entry before the vectorizer ran. This was correct, but only because the vectorizer allowed only a single exiting edge. There's nothing in the definition of LCSSA which requires single entry phis. A common case where this comes up is with a loop with multiple exiting blocks which all reach a common exit block. (e.g. see the test updates) Differential Revision: https://reviews.llvm.org/D93725	2021-01-12 12:34:52 -08:00
Florian Hahn	6cd44b204c	[FunctionAttrs] Derive willreturn for fns with readonly` & `mustprogress`. Similar to D94125, derive `willreturn` for functions that are `readonly` and `mustprogress` in FunctionAttrs. To quote the reasoning from D94125: Since D86233 we have `mustprogress` which, in combination with `readonly`, implies `willreturn`. The idea is that every side-effect has to be modeled as a "write". Consequently, `readonly` means there is no side-effect, and `mustprogress` guarantees that we cannot "loop" forever without side-effect. Reviewed By: jdoerfert, nikic Differential Revision: https://reviews.llvm.org/D94502	2021-01-12 20:02:34 +00:00
Dávid Bolvanský	0529946b5b	[instCombine] Add (A ^ B) \| ~(A \| B) -> ~(A & B) define i32 @src(i32 %x, i32 %y) { %0: %xor = xor i32 %y, %x %or = or i32 %y, %x %neg = xor i32 %or, 4294967295 %or1 = or i32 %xor, %neg ret i32 %or1 } => define i32 @tgt(i32 %x, i32 %y) { %0: %and = and i32 %x, %y %neg = xor i32 %and, 4294967295 ret i32 %neg } Transformation seems to be correct! https://alive2.llvm.org/ce/z/Cvca4a	2021-01-12 19:29:17 +01:00
Quentin Colombet	905623b64d	[NFC][LICM] Minor improvements to debug output Added a utility function in Value class to print block name and use block labels for unnamed blocks. Changed LICM to call this function in its debug output. Patch by Xiaoqing Wu <xiaoqing_wu@apple.com> Differential Revision: https://reviews.llvm.org/D93577	2021-01-11 18:02:49 -08:00

1 2 3 4 5 ...

26382 Commits