llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	c09be0d2a0	[SLP] reduce reduction code for checking vectorizable ops; NFC This is another step towards removing `OperationData` and fixing FMF matching/propagation bugs when forming reductions.	2021-01-20 11:14:48 -05:00
Sanjay Patel	1c54112a57	[SLP] refactor more reduction functions; NFC We were able to remove almost all of the state from OperationData, so these don't make sense as members of that class - just pass the RecurKind in as a param. More streamlining is possible, but I'm trying to avoid logic/typo bugs while fixing this. Eventually, we should not need the `OperationData` class.	2021-01-20 11:14:48 -05:00
Sanjay Patel	8590d24543	[SLP] move reduction createOp functions; NFC We were able to remove almost all of the state from OperationData, so these don't make sense as members of that class - just pass the RecurKind in as a param.	2021-01-20 11:14:48 -05:00
Joseph Tremoulet	40cd262c43	Loop peeling: check that latch is conditional branch Loop peeling assumes that the loop's latch is a conditional branch. Add a check to canPeel that explicitly checks for this, and testcases that otherwise fail an assertion when trying to peel a loop whose back-edge is a switch case or the non-unwind edge of an invoke. Reviewed By: skatkov, fhahn Differential Revision: https://reviews.llvm.org/D94995	2021-01-20 11:01:16 -05:00
Chuanqi Xu	c1bc7981ba	[Coroutine] Retain alignment information when merging frame variables Summary: This is to address bug48712. The solution in this patch is to merge variable a into the storage frame of variable b only if the alignment of a is a multiple of the alignment of b. There may be other strategies, but for now I think they are hard to handle and offer little benefit; we can implement them in the future. Test-plan: check-llvm Reviewers: jmorse, lxfind, junparser Differential Revision: https://reviews.llvm.org/D94891	2021-01-20 18:59:00 +08:00
David Sherwood	255a507716	[NFC][InstructionCost] Use InstructionCost in lib/Transforms/IPO/IROutliner.cpp In places where we call a TTI.getXXCost() function I have changed the code to use InstructionCost instead of unsigned. This is in preparation for later on when we will change the TTI interfaces to return InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D94427	2021-01-20 08:33:59 +00:00
Kazu Hirata	b023cdeacc	[llvm] Use llvm::all_of (NFC)	2021-01-19 20:19:17 -08:00
Kazu Hirata	8857202489	[llvm] Use llvm::find (NFC)	2021-01-19 20:19:14 -08:00
Juneyoung Lee	4479c0c2c0	Allow nonnull/align attribute to accept poison Currently LLVM is relying on ValueTracking's `isKnownNonZero` to attach `nonnull`, which can return true when the value is poison. To make the semantics of `nonnull` consistent with the behavior of `isKnownNonZero`, this changes `nonnull` to accept poison, and to return poison if the input pointer is null. This makes many transformations like below legal: ``` %p = gep inbounds %x, 1 ; %p is a non-null pointer or poison call void @f(%p) ; instcombine converts this to call void @f(nonnull %p) ``` In exchange, these semantics make propagation of `nonnull` to the caller illegal. The reason is that passing poison to `nonnull` no longer immediately raises UB, so such a program is still well defined if the callee does not use the argument. Having the `noundef` attribute there re-allows this. ``` define void @f(i8* %p) { ; functionattr cannot mark %p nonnull here anymore call void @g(i8* nonnull %p) ; .. because @g never raises UB if it never uses %p. ret void } ``` Another attribute that needs to be updated is `align`. This patch updates the semantics of align to accept poison as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D90529	2021-01-20 11:31:23 +09:00
Wei Mi	21b1ad0340	[SampleFDO] Add the support to split the function profiles with context into separate sections. For ThinLTO, all the function profiles without context have been annotated to outline functions if possible in the prelink phase. In the postlink phase, profile annotation is only meaningful for function profiles with context. If the profile is large, it is better to split the profile into two parts, one with context and one without, so the profile reading in the postlink phase only has to read the part with context. To support profile splitting, we extend the ExtBinary format to support different section arrangements. This makes it easy to add other section layouts in the future without creating a new class inheriting from the ExtBinary class. Differential Revision: https://reviews.llvm.org/D94435	2021-01-19 15:16:19 -08:00
Alexey Bataev	e463bd53c0	Revert "[SLP]Merge reorder and reuse shuffles." This reverts commit `438682de6a` to fix the bug with the reducing size of the resulting vector for the entry node with multiple users.	2021-01-19 11:48:04 -08:00
Mariya Podchishchaeva	7113de301a	[ScalarizeMaskedMemIntrin] Add missing dependency The pass has dependency on 'TargetTransformInfoWrapperPass', but the corresponding call to INITIALIZE_PASS_DEPENDENCY was missing. Differential Revision: https://reviews.llvm.org/D94916	2021-01-19 22:33:47 +03:00
Nikita Popov	21443381c0	Reapply [InstCombine] Replace one-use select operand based on condition Relative to the original change, this adds a check that the instruction on which we're replacing operands is safe to speculatively execute, because that's what we're effectively doing. We're executing the instruction with the replaced operand, which is fine if it's pure, but not fine if it can cause side-effects or UB (aka is not speculatable). Additionally, we cannot (generally) replace operands in phi nodes, as these may refer to a different loop iteration. This is also covered by the speculation check. ----- InstCombine already performs a fold where X == Y ? f(X) : Z is transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However, if f(X) only has one use, then we can always directly replace the use inside the instruction. To actually be profitable, limit it to the case where Y is a non-expr constant. This could be further extended to replace uses further up a one-use instruction chain, but for now this only looks one level up. Among other things, this also subsumes D94860. Differential Revision: https://reviews.llvm.org/D94862	2021-01-19 20:26:38 +01:00
Jeroen Dobbelaere	121cac01e8	[noalias.decl] Look through llvm.experimental.noalias.scope.decl Just like llvm.assume, there are a lot of cases where we can just ignore llvm.experimental.noalias.scope.decl. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93042	2021-01-19 20:09:42 +01:00
Hans Wennborg	58bdfcfac0	Revert `5238e7b302` "[InstCombine] Replace one-use select operand based on condition" This caused a miscompile in Chromium, see comments on the codereview for discussion and pointer to a reproducer. > InstCombine already performs a fold where X == Y ? f(X) : Z is > transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However, > if f(X) only has one use, then we can always directly replace the > use inside the instruction. To actually be profitable, limit it to > the case where Y is a non-expr constant. > > This could be further extended to replace uses further up a one-use > instruction chain, but for now this only looks one level up. > > Among other things, this also subsumes D94860. > > Differential Revision: https://reviews.llvm.org/D94862 This also reverts the follow-up a003f26539cf4db744655e76c41f4c4a8913f116: > [llvm] Prevent infinite loop in InstCombine of select statements > > This fixes an issue where the RHS and LHS of the comparison operation > creating the predicate were swapped back and forth forever. > > Differential Revision: https://reviews.llvm.org/D94934	2021-01-19 11:50:56 +01:00
Florian Hahn	83daa49758	[LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands. D84108 exposed a bad interaction between inlining and loop-rotation during regular LTO, which is causing notable regressions in at least CINT2006/473.astar. The problem boils down to: we now rotate a loop just before the vectorizer which requires duplicating a function call in the preheader when compiling the individual files ('prepare for LTO'). But this then prevents further inlining of the function during LTO. This patch tries to resolve this issue by making LoopRotate more conservative with respect to rotating loops that have inline-able calls during the 'prepare for LTO' stage. I think this change intuitively improves the current situation in general. Loop-rotate tries hard to avoid creating headers that are 'too big'. At the moment, it assumes all inlining already happened and the cost of duplicating a call is equal to just doing the call. But with LTO, inlining also happens during full LTO and it is possible that a previously duplicated call is actually a huge function which gets inlined during LTO. From the perspective of LV, not much should change overall. Most loops calling user-provided functions won't get vectorized to start with (unless we can infer that the function does not touch memory, has no other side effects). If we do not inline the 'inline-able' call during the LTO stage, we merely delayed loop-rotation & vectorization. If we inline during LTO, chances should be very high that the inlined code is itself vectorizable or the user call was not vectorizable to start with. There could of course be scenarios where we inline a sufficiently large function with code not profitable to vectorize, which would have been vectorized earlier (by scalarizing the call). But even in that case, there probably is no big performance impact, because it should be mostly down to the cost-model to reject vectorization in that case. And then the version with scalarized calls should also not be beneficial. In a way, LV should have strictly more information after inlining and make more accurate decisions (barring cost-model issues). There is of course plenty of room for things to go wrong unexpectedly, so we need to keep a close look at actual performance and address any follow-up issues. I took a look at the impact on statistics for MultiSource/SPEC2000/SPEC2006. There are a few benchmarks with fewer loops rotated, but no change to the number of loops vectorized. Reviewed By: sanwou01 Differential Revision: https://reviews.llvm.org/D94232	2021-01-19 10:15:29 +00:00
Tres Popp	a003f26539	[llvm] Prevent infinite loop in InstCombine of select statements This fixes an issue where the RHS and LHS of the comparison operation creating the predicate were swapped back and forth forever. Differential Revision: https://reviews.llvm.org/D94934	2021-01-19 10:31:48 +01:00
David Sherwood	c3ce262794	[NFC] Make remaining cost functions in LoopVectorize.cpp use InstructionCost A previous patch has already changed getInstructionCost to return an InstructionCost type. This patch changes the other various getXXXCost functions to return an InstructionCost too. This is a non-functional change - I've added a few asserts that the costs are valid in places where we're selecting between vector call and intrinsic costs. However, since we don't yet return invalid costs from any of the TTI implementations these asserts should not fire. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D94065	2021-01-19 09:08:40 +00:00
Juneyoung Lee	2d89ebd5d1	Address unused variable warning	2021-01-19 09:30:16 +09:00
Juneyoung Lee	0441df94ad	[InstCombine,InstSimplify] Optimize select followed by and/or/xor This patch adds `A & (A && B)` -> `A && B` (similarly for or + logical or) Also, this patch adds `~(select C, (icmp pred X, Y), const)` -> `select C, (icmp pred' X, Y), ~const`. Alive2 proof: merge_and: https://alive2.llvm.org/ce/z/teMR97 merge_or: https://alive2.llvm.org/ce/z/b4yZUp xor_and: https://alive2.llvm.org/ce/z/_-TXHi xor_or: https://alive2.llvm.org/ce/z/2uYx_a Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94861	2021-01-19 09:14:17 +09:00
Juneyoung Lee	395c737d9f	[SimplifyCFG] Update SimplifyBranchOnICmpChain to recognize select form of and/or This patch teaches SimplifyCFG::SimplifyBranchOnICmpChain to understand select form of (x == C1 || x == C2 || ...) / (x != C1 && x != C2 && ...) and optimize them into switch if possible. D93065 has more context about the transition, including links to the list of optimizations being updated. Differential Revision: https://reviews.llvm.org/D93943	2021-01-19 08:53:40 +09:00
Sanjay Patel	5b77ac32b1	[SLP] match maxnum/minnum intrinsics as FP reduction ops After much refactoring over the last 2 weeks to the reduction matching code, I think this change is finally ready. We effectively broke fmax/fmin vector reduction optimization when we started canonicalizing to intrinsics in instcombine, so this should restore that functionality for SLP. There are still FMF problems here as noted in the code comments, but we should be avoiding miscompiles on those for fmax/fmin by restricting to full 'fast' ops (negative tests are included). Fixing FMF propagation is a planned follow-up. Differential Revision: https://reviews.llvm.org/D94913	2021-01-18 17:37:16 -05:00
Kazu Hirata	23b0ab2acb	[llvm] Use the default value of drop_begin (NFC)	2021-01-18 10:16:36 -08:00
Kazu Hirata	dc300beba7	[STLExtras] Add a default value to drop_begin This patch adds the default value of 1 to drop_begin. In the llvm codebase, 70% of calls to drop_begin have 1 as the second argument. An interface similar to std::next should improve readability. This patch converts a couple of calls to drop_begin as examples. Differential Revision: https://reviews.llvm.org/D94858	2021-01-18 10:16:34 -08:00
Xun Li	1d04dc52dd	[Coroutine] Do not CoroElide if there are musttail calls This is to address https://bugs.llvm.org/show_bug.cgi?id=48626. When there are musttail calls that use parameters aliasing the newly created coroutine frame, the existing implementation will fatal. We simply cannot perform CoroElide in such cases. In theory a precise analysis can be done to check whether the parameters of the musttail call actually alias the frame, but it's very hard to do it before the transformation happens. Also in most cases the existence of musttail call is generated due to symmetric transfers, and in those cases alias analysis won't be able to tell that they don't alias anyway. Differential Revision: https://reviews.llvm.org/D94834	2021-01-18 09:06:21 -08:00
Sanjay Patel	3dbbadb8ef	[SLP] rename reduction query for min/max ops; NFC This will avoid confusion once we start matching min/max intrinsics. All of these hacks to accommodate cmp+sel idioms should disappear once we canonicalize to min/max intrinsics.	2021-01-18 09:32:57 -05:00
Sanjay Patel	d1c4e859ce	[SLP] reduce opcode API dependency in reduction cost calc; NFC The icmp opcode is now hard-coded in the cost model call. This will make it easier to eventually remove all opcode queries for min/max patterns as we transition to intrinsics.	2021-01-18 09:32:57 -05:00
Florian Hahn	e6d758de82	[InferAttrs] Mark some library functions as willreturn. This patch marks some library functions as willreturn. On the first pass, I excluded most functions that interact with streams/the filesystem. Along with willreturn, it also adds nounwind to a set of math functions. There probably are a few additional attributes we can add for those, but that should be done separately. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94684	2021-01-18 13:40:21 +00:00
Caroline Concatto	36710c38c1	[NFC]Migrate VectorCombine.cpp to use InstructionCost This patch changes these functions: vectorizeLoadInsert, isExtractExtractCheap, foldExtractedCmps, scalarizeBinopOrCmp, getShuffleExtract, foldBitcastShuf to use the class InstructionCost when calling TTI.get<something>Cost(). This patch is part of a series of patches to use InstructionCost instead of unsigned/int for the cost model functions. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html See this patch for the introduction of the type: https://reviews.llvm.org/D91174 P.S.: This patch adds the test || !NewCost.isValid(), because we want to return false both when !NewCost.isValid() && !OldCost.isValid() (the transform is too expensive) and when !NewCost.isValid() && OldCost.isValid(). For simplicity, we therefore only add a test for !NewCost.isValid(). Differential Revision: https://reviews.llvm.org/D94069	2021-01-18 13:37:21 +00:00
Dávid Bolvanský	ed396212da	[InstCombine] Transform abs pattern using multiplication to abs intrinsic (PR45691) ``` unsigned r(int v) { return (1 | -(v < 0)) * v; } `r` is equivalent to `abs(v)`. ``` ``` define <4 x i8> @src(<4 x i8> %0) { %1: %2 = ashr <4 x i8> %0, { 31, undef, 31, 31 } %3 = or <4 x i8> %2, { 1, 1, 1, undef } %4 = mul nsw <4 x i8> %3, %0 ret <4 x i8> %4 } => define <4 x i8> @tgt(<4 x i8> %0) { %1: %2 = icmp slt <4 x i8> %0, { 0, 0, 0, 0 } %3 = sub nsw <4 x i8> { 0, 0, 0, 0 }, %0 %4 = select <4 x i1> %2, <4 x i8> %3, <4 x i8> %0 ret <4 x i8> %4 } Transformation seems to be correct! ``` Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94874	2021-01-17 17:06:14 +01:00
Nikita Popov	5238e7b302	[InstCombine] Replace one-use select operand based on condition InstCombine already performs a fold where X == Y ? f(X) : Z is transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However, if f(X) only has one use, then we can always directly replace the use inside the instruction. To actually be profitable, limit it to the case where Y is a non-expr constant. This could be further extended to replace uses further up a one-use instruction chain, but for now this only looks one level up. Among other things, this also subsumes D94860. Differential Revision: https://reviews.llvm.org/D94862	2021-01-16 23:25:02 +01:00
Roman Lebedev	32fc32317a	[SimplifyCFG] markAliveBlocks(): catchswitch: preserve PostDomTree When removing catchpad's from catchswitch, if that removes a successor, we need to record that in DomTreeUpdater. This fixes PostDomTree preservation failure in an existing test. This appears to be the single issue that i see in my current test coverage.	2021-01-17 01:21:05 +03:00
Sanjay Patel	49b96cd9ef	[SLP] remove opcode field from reduction data class This is NFC-intended and another step towards supporting intrinsics as reduction candidates. The remaining bits of the OperationData class do not make much sense as-is, so I will try to improve that, but I'm trying to take minimal steps because it's still not clear how this was intended to work.	2021-01-16 13:55:52 -05:00
Sanjay Patel	fcfcc3cc6b	[SLP] fix typos; NFC	2021-01-16 13:55:52 -05:00
Sanjay Patel	48dbac5b6b	[SLP] remove unnecessary use of 'OperationData' This is another NFC-intended patch to allow matching intrinsics (example: maxnum) as candidates for reductions. It's possible that the loop/if logic can be reduced now, but it's still difficult to understand how this all works.	2021-01-16 13:55:52 -05:00
Kazu Hirata	2082b10d10	[llvm] Use *::empty (NFC)	2021-01-16 09:40:55 -08:00
Kazu Hirata	19aacdb715	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-01-16 09:40:53 -08:00
Mircea Trofin	e8049dc3c8	[NewPM][Inliner] Move the 'always inliner' case in the same CGSCC pass as 'regular' inliner Expanding from D94808 - we ensure the same InlineAdvisor is used by both InlinerPass instances. The notion of mandatory inlining is moved into the core InlineAdvisor: advisors anyway have to handle that case, so this change also factors out that a bit better. Differential Revision: https://reviews.llvm.org/D94825	2021-01-15 17:59:38 -08:00
Dávid Bolvanský	a1500105ee	[SimplifyCFG] Optimize CFG when null is passed to a function with nonnull argument Example: ``` __attribute__((nonnull,noinline)) char * pinc(char *p) { return ++p; } char * foo(bool b, char *a) { return pinc(b ? 0 : a); } ``` optimize to ``` char * foo(bool b, char *a) { return pinc(a); } ``` Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94180	2021-01-15 23:53:43 +01:00
Sanjay Patel	ceb3cdccd0	[SLP] remove dead code in reduction matching; NFC To get into this block we had: !A || B || C and we checked C in the first 'if' clause leaving !A || B. But the 2nd 'if' is checking: A && !B --> !(!A || B)	2021-01-15 17:03:26 -05:00
Nick Desaulniers	ed0fd567eb	BreakCriticalEdges: do not split the critical edge from a CallBr indirect successor Otherwise we'll fail the assertion in SplitBlockPredecessors() related to splitting the edges from CallBr's. Fixes: https://github.com/ClangBuiltLinux/linux/issues/1161 Fixes: https://github.com/ClangBuiltLinux/linux/issues/1252 Reviewed By: void, MaskRay, jyknight Differential Revision: https://reviews.llvm.org/D88438	2021-01-15 13:51:47 -08:00
Roman Lebedev	a14c36fe27	[SimplifyCFG] switchToSelect(): don't forget to insert DomTree edge iff needed DestBB might or might not already be a successor of SelectBB, and if it wasn't, we need to ensure that we record the fact in DomTree. The testcase used to crash in lazy domtree updater mode + non-per-function domtree validity checks disabled.	2021-01-15 23:35:57 +03:00
Roman Lebedev	c6654a4cda	[SimplifyCFG][BasicBlockUtils] Port SplitBlockPredecessors()/SplitLandingPadPredecessors() to DomTreeUpdater This is not nice, but it's the best transient solution possible, and is better than just duplicating the whole function. The problem is, this function is widely used, and it is not at all obvious that all the users could be painlessly switched to operate on DomTreeUpdater, and somehow i don't feel like porting all those users first. This function is one of the last three that do not operate on DomTreeUpdater.	2021-01-15 23:35:56 +03:00
Roman Lebedev	286cf6cb02	[SimplifyCFG] Port SplitBlockAndInsertIfThen() to DomTreeUpdater This is not nice, but it's the best transient solution possible, and is better than just duplicating the whole function. The problem is, this function is widely used, and it is not at all obvious that all the users could be painlessly switched to operate on DomTreeUpdater, and somehow i don't feel like porting all those users first. This function is one of the last three that do not operate on DomTreeUpdater.	2021-01-15 23:35:56 +03:00
Roman Lebedev	c845c724c2	[Utils][SimplifyCFG] Port SplitBlock() to DomTreeUpdater This is not nice, but it's the best transient solution possible, and is better than just duplicating the whole function. The problem is, this function is widely used, and it is not at all obvious that all the users could be painlessly switched to operate on DomTreeUpdater, and somehow i don't feel like porting all those users first. This function is one of the last three that do not operate on DomTreeUpdater.	2021-01-15 23:35:56 +03:00
Roman Lebedev	b81f75fa79	[Utils] splitBlockBefore() always operates on DomTreeUpdater, so take it, not DomTree Even though not all its users operate on DomTreeUpdater, it itself internally operates on DomTreeUpdater, so it must mean everything is fine with that, so just do that globally.	2021-01-15 23:35:56 +03:00
Sanjay Patel	1f21de535d	[SLP] remove unused reduction functions; NFC These were made obsolete by simplifying the code in recent patches.	2021-01-15 14:59:33 -05:00
Jamie Schmeiser	17d0fb7f57	Set option default for enabling memory ssa for new pass manager loop sink pass to true. Summary: Set the default for the option enabling memory ssa use in the loop sink pass to true for the new pass manager. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: asbirlea (Alina Sbirlea) Differential Revision: https://reviews.llvm.org/D92486	2021-01-15 09:56:44 -05:00
Kazu Hirata	7dc3575ef2	[llvm] Remove redundant return and continue statements (NFC) Identified with readability-redundant-control-flow.	2021-01-14 20:30:34 -08:00
Kazu Hirata	2efcbe24a7	[llvm] Use llvm::drop_begin (NFC)	2021-01-14 20:30:33 -08:00
Kazu Hirata	9bcc0d1040	[CodeGen, Transforms] Use llvm::sort (NFC)	2021-01-14 20:30:31 -08:00
Sanjay Patel	b21905dfe3	[SLP] remove unnecessary state in matching reductions This is NFC-intended. I'm still trying to figure out how the loop where this is used works. It does not seem like we require this data at all, but it's hard to confirm given the complicated predicates.	2021-01-14 18:32:37 -05:00
Bjorn Pettersson	d58512b2e3	[SLP] Don't vectorize stores of non-packed types (like i1, i2) In the spirit of commit `fc783e91e0` (llvm-svn: 248943) we shouldn't vectorize stores of non-packed types (i.e. types that have padding between consecutive variables in a scalar layout, but are packed in a vector layout). The problem was detected as a miscompile in a downstream test case. Reviewed By: anton-afanasyev Differential Revision: https://reviews.llvm.org/D94446	2021-01-14 11:30:33 +01:00
Daniel Paoliello	ff5e896425	Fix unused variable in CoroFrame.cpp when building Release with GCC 10 When building with GCC 10, the following warning is reported: ``` /llvm-project/llvm/lib/Transforms/Coroutines/CoroFrame.cpp:1527:28: warning: unused variable ‘CS’ [-Wunused-variable] 1527 | if (CatchSwitchInst *CS = ``` This change adds a cast to `void` to avoid the warning. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D94456	2021-01-13 22:53:25 -08:00
Kazu Hirata	125ea20d55	[llvm] Use llvm::stable_sort (NFC)	2021-01-13 19:14:43 -08:00
Kazu Hirata	5c1c39e8d8	[llvm] Use *Set::contains (NFC)	2021-01-13 19:14:41 -08:00
Wei Mi	86341247c4	[NFC] Rename ThinLTOPhase to ThinOrFullLTOPhase and move it from PassBuilder.h to Pass.h. In some compiler passes like SampleProfileLoaderPass, we want to know which LTO/ThinLTO phase the pass is in. Currently the phase is represented in enum class PassBuilder::ThinLTOPhase, so it is only available in PassBuilder and it also cannot represent phase in full LTO. The patch extends it to include full LTO phases and moves it from PassBuilder.h to Pass.h, so that it is much easier for PassBuilder to communicate with each pass about the current LTO phase. Differential Revision: https://reviews.llvm.org/D94613	2021-01-13 15:55:40 -08:00
Arthur Eubanks	39e6d24237	[NewPM] Only non-trivially loop unswitch at -O3 and for non-optsize functions This matches the legacy pipeline/pass. Reviewed By: asbirlea, SjoerdMeijer Differential Revision: https://reviews.llvm.org/D94559	2021-01-13 14:54:49 -08:00
Kazu Hirata	fb98a1be43	Fix the warnings on unused variables (NFC)	2021-01-13 13:32:40 -08:00
Sanjay Patel	123674a816	[SLP] simplify type check for reductions This is NFC-intended. The 'valid' call allows int/FP/pointers for other parts of SLP. The difference here is that we can't reduce pointers.	2021-01-13 13:30:46 -05:00
Andrew Litteken	05b1a15f70	[IROutliner] Adapting to hoisted bitcasts in CodeExtractor In commit `700d2417d8` the CodeExtractor was updated so that bitcasts that have lifetime markers that begin outside of the region are deduplicated outside the region and are not used as an output. This caused a discrepancy in the IROutliner, where in these cases there were arguments added to the aggregate function that were not needed, causing assertion errors. The IROutliner queries the CodeExtractor twice to determine the inputs and outputs, before and after `findAllocas` is called with the same ValueSet for the outputs, causing the duplication. This has been fixed with a dummy ValueSet for the first call. However, the additional bitcasts prevent us from using the same similarity relationships that were previously defined by the IR Similarity Analysis Pass. In these cases, we check whether the initial version of the region being analyzed for outlining is still the same as it was previously. If it is not, i.e. because of the additional bitcast instructions from the CodeExtractor, we discard the region. Reviewers: yroux Differential Revision: https://reviews.llvm.org/D94303	2021-01-13 11:10:37 -06:00
Nikita Popov	17863614da	[InstCombine] Fold select -> and/or using impliesPoison We can fold a ? b : false to a & b if is_poison(b) implies that is_poison(a), at which point we're able to reuse all the usual fold on ands. In particular, this covers the very common case of icmp X, C && icmp X, C'. The same applies to ors. This currently only has an effect if the -instcombine-unsafe-select-transform=0 option is set. Differential Revision: https://reviews.llvm.org/D94550	2021-01-13 17:45:40 +01:00
David Sherwood	4cd48535ec	[NFC][InstructionCost] Use InstructionCost in Transforms/Scalar/RewriteStatepointsForGC.cpp In places where we calculate costs using TTI.getXXXCost() interfaces I have changed the code to use InstructionCost instead of unsigned. The change is non functional since InstructionCost behaves in the same way as an integer for valid costs. Currently the getXXXCost() functions used in this file do not return invalid costs. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential revision: https://reviews.llvm.org/D94484	2021-01-13 09:42:58 +00:00
Kazu Hirata	8a20e2b3d3	[llvm] Use Optional::getValueOr (NFC)	2021-01-12 21:43:50 -08:00
Kazu Hirata	12fc9ca3a4	[llvm] Remove redundant string initialization (NFC) Identified with readability-redundant-string-init.	2021-01-12 21:43:46 -08:00
Yuanfang Chen	5c7dcd7aea	[Coroutine] Update promise object's final layout index promise is a header field but it is not guaranteed that it would be the third field of the frame due to `performOptimizedStructLayout`. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D94137	2021-01-12 17:44:02 -08:00
Luo, Yuanke	055644cc45	[X86][AMX] Prohibit pointer cast on load. The load/store instruction will be transformed to amx intrinsics in the pass of AMX type lowering. Prohibiting the pointer cast makes that pass happy. Differential Revision: https://reviews.llvm.org/D94372	2021-01-13 09:39:19 +08:00
Hongtao Yu	175288a1af	Add sample-profile-suffix-elision-policy attribute with -funique-internal-linkage-names. Adding sample-profile-suffix-elision-policy attribute to functions whose linkage names are uniquified so that their unique name suffix won't be trimmed when applying AutoFDO profiles. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D94455	2021-01-12 15:15:53 -08:00
modimo	2a49b7c64a	[Inliner] Change inline remark format and update ReplayInlineAdvisor to use it This change modifies the source location formatting from: LineNumber.Discriminator to: LineNumber:ColumnNumber.Discriminator The motivation here is to enhance location information for inline replay that currently exists for the SampleProfile inliner. This will be leveraged further in inline replay for the CGSCC inliner in the related diff. The ReplayInlineAdvisor is also modified to read the new format and now takes into account the callee for greater accuracy. Testing: ninja check-llvm Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D94333	2021-01-12 13:43:48 -08:00
Nikita Popov	23390e7a13	[InstCombine] Handle logical and/or in assume optimization assume(a && b) can be converted to assume(a); assume(b) even if the condition is logical. Same for assume(!(a || b)).	2021-01-12 22:36:40 +01:00
Sanjay Patel	9e7895a868	[SLP] reduce code duplication while processing reductions; NFC	2021-01-12 16:03:57 -05:00
Sanjay Patel	92fb5c49e8	[SLP] rename variable to improve readability; NFC The OperationData in the 2nd block (visiting the operands) is completely independent of the 1st block.	2021-01-12 16:03:57 -05:00
Sanjay Patel	554be30a42	[SLP] reduce code duplication in processing reductions; NFC	2021-01-12 16:03:57 -05:00
Sanjay Patel	46507a96fc	[SLP] reduce code duplication while matching reductions; NFC	2021-01-12 16:03:57 -05:00
Philip Reames	caafdf07bb	[LV] Weaken spuriously strong assert in LoopVersioning LoopVectorize uses some utilities on LoopVersioning, but doesn't actually use it for, you know, versioning. As a result, the precondition LoopVersioning expects is too strong for this user. At the moment, LoopVectorize supports any loop with a unique exit block, so check the same precondition here. Really, the whole class structure here is a mess. We should separate the actual versioning from the metadata updates, but that's a bigger problem.	2021-01-12 12:57:13 -08:00
Philip Reames	9f61fbd75a	[LV] Relax assumption that LCSSA implies single entry This relates to the ongoing effort to support vectorization of multiple exit loops (see D93317). The previous code assumed that LCSSA phis were always single entry before the vectorizer ran. This was correct, but only because the vectorizer allowed only a single exiting edge. There's nothing in the definition of LCSSA which requires single entry phis. A common case where this comes up is with a loop with multiple exiting blocks which all reach a common exit block. (e.g. see the test updates) Differential Revision: https://reviews.llvm.org/D93725	2021-01-12 12:34:52 -08:00
Florian Hahn	6cd44b204c	[FunctionAttrs] Derive willreturn for fns with `readonly` & `mustprogress`. Similar to D94125, derive `willreturn` for functions that are `readonly` and `mustprogress` in FunctionAttrs. To quote the reasoning from D94125: Since D86233 we have `mustprogress` which, in combination with `readonly`, implies `willreturn`. The idea is that every side-effect has to be modeled as a "write". Consequently, `readonly` means there is no side-effect, and `mustprogress` guarantees that we cannot "loop" forever without side-effect. Reviewed By: jdoerfert, nikic Differential Revision: https://reviews.llvm.org/D94502	2021-01-12 20:02:34 +00:00
Dávid Bolvanský	0529946b5b	[instCombine] Add (A ^ B) | ~(A | B) -> ~(A & B) define i32 @src(i32 %x, i32 %y) { %0: %xor = xor i32 %y, %x %or = or i32 %y, %x %neg = xor i32 %or, 4294967295 %or1 = or i32 %xor, %neg ret i32 %or1 } => define i32 @tgt(i32 %x, i32 %y) { %0: %and = and i32 %x, %y %neg = xor i32 %and, 4294967295 ret i32 %neg } Transformation seems to be correct! https://alive2.llvm.org/ce/z/Cvca4a	2021-01-12 19:29:17 +01:00
Quentin Colombet	905623b64d	[NFC][LICM] Minor improvements to debug output Added a utility function in Value class to print block name and use block labels for unnamed blocks. Changed LICM to call this function in its debug output. Patch by Xiaoqing Wu <xiaoqing_wu@apple.com> Differential Revision: https://reviews.llvm.org/D93577	2021-01-11 18:02:49 -08:00
Roman Lebedev	ec8a6c11db	[SimplifyCFGPass] iterativelySimplifyCFG(): support lazy DomTreeUpdater This boils down to how we deal with the early-increment iterator over the function's basic blocks: not only do we need to early-increment, but after that we also need to skip all the blocks that are scheduled for removal, as per DomTreeUpdater.	2021-01-12 02:09:47 +03:00
Roman Lebedev	81afeacd37	[SimplifyCFGPass] mergeEmptyReturnBlocks(): skip blocks scheduled for removal as per DomTreeUpdater Thus supporting lazy DomTreeUpdater mode, where the domtree updates (and thus block removals) aren't applied immediately, but are delayed until last possible moment.	2021-01-12 02:09:47 +03:00
Roman Lebedev	90a92f8b4d	[NFCI][Utils/Local] removeUnreachableBlocks(): cleanup support for lazy DomTreeUpdater When DomTreeUpdater is in lazy update mode, the blocks that were scheduled to be removed won't be removed until the updates are flushed, e.g. by asking DomTreeUpdater for an up-to-date DomTree. From the function's current code, it is pretty evident that the support for the lazy mode is an afterthought, see e.g. how we roll back the NumRemoved statistic. So instead of considering all the unreachable blocks as the blocks-to-be-removed, simply additionally skip all the blocks that are already scheduled to be removed.	2021-01-12 02:09:47 +03:00
Roman Lebedev	f9ba347706	[SimplifyCFG] FoldValueComparisonIntoPredecessors(): don't insert a DomTree edge if it already exists When we are adding edges to the terminator and potentially turning it into a switch (if it wasn't already), it is possible that the case we're adding will share its destination with one of the preexisting cases, in which case there is no domtree edge to add. Indeed, this change does not have a test coverage change. This failure has been exposed in an existing test coverage by a follow-up patch that switches to lazy domtreeupdater mode, and removes domtree verification from SimplifyCFGOpt::simplifyOnce()/SimplifyCFGOpt::run(), IOW it does not appear feasible to add dedicated test coverage here.	2021-01-12 02:09:47 +03:00
Roman Lebedev	c0de0a1b72	[SimplifyCFG] SimplifyBranchOnICmpChain(): don't insert a DomTree edge that already exists BB was already always branching to EdgeBB, there is no edge to add. Indeed, this change does not have a test coverage change. This failure has been exposed in an existing test coverage by a follow-up patch that switches to lazy domtreeupdater mode, and removes domtree verification from SimplifyCFGOpt::simplifyOnce()/SimplifyCFGOpt::run(), IOW it does not appear feasible to add dedicated test coverage here.	2021-01-12 02:09:46 +03:00
Roman Lebedev	c22bc5f1f8	[SimplifyCFG] SwitchToLookupTable(): don't insert a DomTree edge that already exists SI is the terminator of BB, so the edge we are adding obviously already existed. Indeed, this change does not have a test coverage change. This failure has been exposed in an existing test coverage by a follow-up patch that switches to lazy domtreeupdater mode, and removes domtree verification from SimplifyCFGOpt::simplifyOnce()/SimplifyCFGOpt::run(), IOW it does not appear feasible to add dedicated test coverage here.	2021-01-12 02:09:46 +03:00
Hongtao Yu	32bcfcda4e	Rename debug linkage name with -funique-internal-linkage-names Functions that are renamed under -funique-internal-linkage-names have their debug linkage name updated as well. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D93747	2021-01-11 13:56:07 -08:00
Sanjay Patel	288f3fc5df	[InstCombine] reduce icmp(ashr X, C1), C2 to sign-bit test This is a more basic pattern that we should handle before trying to solve: https://llvm.org/PR48640 There might be a better way to think about this because the pre-condition that I came up with (number of sign bits in the compare constant) misses a potential transform for each of ugt and ult as commented on in the test file. Tried to model this in Alive: https://rise4fun.com/Alive/juX1 ...but I couldn't get the ComputeNumSignBits() pre-condition to work as expected, so replaced with leading 0/1 preconditions instead. Name: ugt Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1 %a = ashr %x, C1 %r = icmp ugt i8 %a, C2 => %r = icmp slt i8 %x, 0 Name: ult Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1 %a = ashr %x, C1 %r = icmp ult i4 %a, C2 => %r = icmp sgt i4 %x, -1 Also approximated in Alive2: https://alive2.llvm.org/ce/z/u5hCcz https://alive2.llvm.org/ce/z/__szVL Differential Revision: https://reviews.llvm.org/D94014	2021-01-11 15:53:39 -05:00
Sriraman Tallam	d8c6d24359	-funique-internal-linkage-names appends a hex md5hash suffix to the symbol name which is not demangler friendly, convert it to decimal. Please see D93747 for more context which tries to make linkage names of internal linkage functions to be the uniqueified names. This causes a problem with gdb because breaking using the demangled function name will not work if the new uniqueified name cannot be demangled. The problem is the generated suffix which is a mix of integers and letters which do not demangle. The demangler accepts either all numbers or all letters. This patch simply converts the hash to decimal. There is no loss of uniqueness by doing this as the precision is maintained. The symbol names get longer by a few characters though. Differential Revision: https://reviews.llvm.org/D94154	2021-01-11 11:10:29 -08:00
Giorgis Georgakoudis	9751705512	[OpenMPOpt][WIP] Expand parallel region merging The existing implementation of parallel region merging applies only to consecutive parallel regions that have speculatable sequential instructions in-between. This patch lifts this limitation to expand merging with any sequential instructions in-between, except calls to unmergable OpenMP runtime functions. In-between sequential instructions in the merged region are sequentialized in a "master" region and any output values are broadcasted to the following parallel regions and the sequential region continuation of the merged region. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D90909	2021-01-11 08:06:23 -08:00
Florian Hahn	eb0371e403	[VPlan] Unify value/recipe printing after VPDef transition. This patch unifies the way recipes and VPValues are printed after the transition to VPDef. VPSlotTracker has been updated to iterate over all recipes and all their defined values to number those. There is no need to number values in Value2VPValue. It also updates a few places that only used slot numbers for VPInstruction. All recipes now can produce numbered VPValues.	2021-01-11 14:42:46 +00:00
Florian Hahn	a94497a342	[VPlan] Move initial quote emission from ::print to ::dumpBasicBlock. This means there will be no stray " when printing individual recipes using print()/dump() in a debugger, for example.	2021-01-11 12:22:15 +00:00
Bjorn Pettersson	675be65106	Require chained analyses in BasicAA and AAResults to be transitive This patch fixes a bug that could result in miscompiles (at least in an OOT target). The problem could be seen by adding checks that the DominatorTree used in BasicAliasAnalysis and ValueTracking was valid (e.g. by adding DT->verify() call before every DT dereference and then running all tests in test/CodeGen). Problem was that the LegacyPassManager calculated "last user" incorrectly for passes such as the DominatorTree when not telling the pass manager that there was a transitive dependency between the different analyses. And then it could happen that an incorrect dominator tree was used when doing alias analysis (which was a pretty serious bug as the alias analysis result could be invalid). Fixes: https://bugs.llvm.org/show_bug.cgi?id=48709 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94138	2021-01-11 11:50:07 +01:00
David Sherwood	40abeb11f4	[NFC][InstructionCost] Change LoopVectorizationCostModel::getInstructionCost to return InstructionCost This patch is part of a series of patches that migrate integer instruction costs to use InstructionCost. In the function selectVectorizationFactor I have simply asserted that the cost is valid and extracted the value as is. In future we expect to encounter invalid costs, but we should filter out those vectorization factors that lead to such invalid costs. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D92178	2021-01-11 09:22:37 +00:00
David Sherwood	b7ccaca537	[NFC] Remove min/max functions from InstructionCost Removed the InstructionCost::min/max functions because it's fine to use std::min/max instead. Differential Revision: https://reviews.llvm.org/D94301	2021-01-11 09:00:12 +00:00
Serguei Katkov	7f69860243	[LoopUnroll] Fix a crash Loop peeling as a last step triggers loop simplification and this can change the loop structure. As a result all cached values like the latch branch become invalid. The patch restructures the code to take into account the possible changes caused by peeling. Reviewers: dmgreen, Meinersbur, etiotto, fhahn, efriedma, bmahjour Reviewed By: Meinersbur, fhahn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D93686	2021-01-11 10:19:26 +07:00
Philip Reames	4739dd67e7	[LoopDeletion] Break backedge of outermost loops when known not taken This is a resubmit of `dd6bb367` (which was reverted due to stage2 build failures in `7c63aac`), with the additional restriction added to the transform to only consider outer most loops. As shown in the added test case, ensuring LCSSA is up to date when deleting an inner loop is tricky as we may actually need to remove blocks from any outer loops, thus changing the exit block set. For the moment, just avoid transforming this case. I plan to return to this case in a follow up patch and see if we can do better. Original commit message follows... The basic idea is that if SCEV can prove the backedge isn't taken, we can go ahead and get rid of the backedge (and thus the loop) while leaving the rest of the control in place. This nicely handles cases with dispatch between multiple exits and internal side effects. Differential Revision: https://reviews.llvm.org/D93906	2021-01-10 16:02:33 -08:00
Roman Lebedev	8e8d214c4a	[NFCI][SimplifyCFG] Prefer to add Insert edges before Delete edges into DomTreeUpdater, if reasonable This has a measurable impact on the number of DomTree recalculations. While this doesn't handle all the cases, it deals with the most obvious ones.	2021-01-11 00:30:44 +03:00
Sanjay Patel	3f09c77d33	[SLP] fix typo in assert This snuck into `0aa75fb12f` , but I didn't catch it locally.	2021-01-10 13:15:04 -05:00
Sanjay Patel	0aa75fb12f	[SLP] put verifyFunction call behind EXPENSIVE_CHECKS A severe compile-time slowdown from this call is noted in: https://llvm.org/PR48689 My naive fix was to put it under LLVM_DEBUG ( `267ff79` ), but that's not limiting in the way we want. This is a quick fix (or we could just remove the call completely and rely on some later pass to discover potentially wrong IR?). A bigger/better fix would be to improve/limit verifyFunction() as noted in: https://llvm.org/PR47712 Differential Revision: https://reviews.llvm.org/D94328	2021-01-10 12:32:21 -05:00
Florian Hahn	c701f85c45	[STLExtras] Use return type from operator* of the wrapped iter. Currently make_early_inc_range cannot be used with iterators with operator* implementations that do not return a reference. Most notably in the LLVM codebase, this means the User iterator ranges cannot be used with make_early_inc_range, which slightly simplifies iterating over ranges while elements are removed. Instead of directly using BaseT::reference as return type of operator, this patch uses decltype to get the actual return type of the operator implementation in WrappedIteratorT. This patch also updates a few places to use make use of make_early_inc_range. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D93992	2021-01-10 14:41:13 +00:00
Florian Hahn	d98fc62ae6	[SimplifyCFG] Keep !dbg metadata of moved instructions, if they match. Currently SimplifyCFG drops the debug locations of 'bonus' instructions. Such instructions are moved before the first branch. The reason for the current behavior is that this could lead to surprising debug stepping, if the block that's folded is dead. In case the first branch and the instructions to be folded have the same debug location, this shouldn't be an issue and we can keep the debug location. Reviewed By: vsk Differential Revision: https://reviews.llvm.org/D93662	2021-01-09 19:15:16 +00:00
Kazu Hirata	6a6e382161	[llvm] Drop unnecessary make_range (NFC)	2021-01-09 09:25:00 -08:00
Kazu Hirata	4d92ab1669	[Transforms] Use llvm::find_if (NFC)	2021-01-09 09:24:58 -08:00
Kazu Hirata	9a7c03b800	[SCEV] Remove unused getOrInsertCanonicalInductionVariable (NFC) The last use was removed on Mar 22, 2012 in commit `f47d0af551`.	2021-01-09 09:24:56 -08:00
Florian Hahn	65f578fc0e	[VPlan] Keep start value of VPWidenPHIRecipe as VPValue. Similar to D92129, update VPWidenPHIRecipe to manage the start value as VPValue. This allows adjusting the start value as a VPlan transform, which will be used in a follow-up patch to support reductions during epilogue vectorization. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D93975	2021-01-09 16:34:15 +00:00
Kazu Hirata	f62b93b9a2	[SCEV] Remove unused getExactExistingExpansion (NFC) The last use was removed on Sep 4, 2018 in commit `2cbba56337`.	2021-01-08 18:39:57 -08:00
Kazu Hirata	b7c5e0b02c	[Target, Transforms] Use *Set::contains (NFC)	2021-01-08 18:39:54 -08:00
Arthur Eubanks	756dd70766	[NewPM] Run ObjC ARC passes Match the legacy PM in running various ObjC ARC passes. This requires making some module passes into function passes. These were initially ported as module passes since they add function declarations (e.g. https://reviews.llvm.org/D86178), but that's still up for debate and other passes do so. Reviewed By: ahatanak Differential Revision: https://reviews.llvm.org/D93743	2021-01-08 15:47:11 -08:00
Florian Hahn	c493e9216b	[VPlan] Move reduction start value creation to widenPHIRecipe. This was suggested to prepare for D93975. By moving the start value creation to widenPHInstruction, we set the stage to manage the start value directly in VPWidenPHIRecipe, which will be used subsequently to set the 'resume' value for reductions during epilogue vectorization. It also moves RdxDesc to the recipe, so we do not have to rely on Legal to look it up later. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D94175	2021-01-08 17:49:43 +00:00
Alexander Belyaev	bcbdeafa9c	Revert "[SLP]Need shrink the load vector after reordering." This reverts commit `4284afdf94`. This changes computed values in fused_batchnorm_test_cpu. Not equal to tolerance rtol=1e-06, atol=0.001 Mismatched value: a is different from b. not close where = (array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]), array([0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5])) not close lhs = [-0.6636615 -0.9804948 -1.148275 -0.68193716 -0.8572368 -0.65046215 -0.6993756 -1.2244141 -1.0938729 -0.50369143 -0.51830524 -0.738452 -0.7214286 -0.48115745 -0.9380924 -0.9341769 -0.5916775 -1.2896856 -0.7264182 -0.9746917 -0.783249 -0.7659018 -0.86214024 -0.47784212] not close rhs = [ 0.44102234 0.12418899 -0.04359123 0.42274666 0.24744703 0.45422167 0.40530816 -0.11973029 0.01081094 0.6009924 0.5863786 0.3662318 0.38325527 0.62352633 0.1665914 0.1705069 0.5130063 -0.18500176 0.37826565 0.12999213 0.3214348 0.338782 0.24254355 0.62684166] not close dif = [1.1046839 1.1046838 1.1046838 1.1046839 1.1046839 1.1046839 1.1046838 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046838 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046838 1.1046838] not close tol = [0.00100044 0.00100012 0.00100004 0.00100042 0.00100025 0.00100045 0.00100041 0.00100012 0.00100001 0.0010006 0.00100059 0.00100037 0.00100038 0.00100062 0.00100017 0.00100017 0.00100051 0.00100019 0.00100038 0.00100013 0.00100032 0.00100034 0.00100024 0.00100063]	2021-01-08 14:42:26 +01:00
Sanjay Patel	267ff7901c	[SLP] limit verifyFunction to debug build (PR48689) As noted in PR48689, the verifier may have some kind of exponential behavior that should be addressed separately. For now, only run it in debug mode to prevent problems for release+asserts. That limit is what we had before D80401, and I'm not sure if there was a reason to change it in that patch.	2021-01-08 08:10:17 -05:00
Cullen Rhodes	1e7efd397a	[LV] Legalize scalable VF hints In the following loop: void foo(int *a, int *b, int N) { for (int i=0; i<N; ++i) a[i + 4] = a[i] + b[i]; } The loop dependence constrains the VF to a maximum of (4, fixed), which would mean using <4 x i32> as the vector type in vectorization. Extending this to scalable vectorization, a VF of (4, scalable) implies a vector type of <vscale x 4 x i32>. To determine if this is legal vscale must be taken into account. For this example, unless max(vscale)=1, it's unsafe to vectorize. For SVE, the number of bits in an SVE register is architecturally defined to be a multiple of 128 bits with a maximum of 2048 bits, thus the maximum vscale is 16. In the loop above it is therefore unfeasible to vectorize with SVE. However, in this loop: void foo(int *a, int *b, int N) { #pragma clang loop vectorize_width(X, scalable) for (int i=0; i<N; ++i) a[i + 32] = a[i] + b[i]; } As long as max(vscale) multiplied by the number of lanes 'X' doesn't exceed the dependence distance, it is safe to vectorize. For SVE a VF of (2, scalable) is within this constraint, since a vector of <16 x 2 x 32> will have no dependencies between lanes. For any number of lanes larger than this it would be unsafe to vectorize. This patch extends 'computeFeasibleMaxVF' to legalize scalable VFs specified as loop hints, implementing the following behaviour: * If the backend does not support scalable vectors, ignore the hint. * If scalable vectorization is unfeasible given the loop dependence, like in the first example above for SVE, then use a fixed VF. * Accept scalable VFs if it's safe to do so. * Otherwise, clamp scalable VFs that exceed the maximum safe VF. Reviewed By: sdesmalen, fhahn, david-arm Differential Revision: https://reviews.llvm.org/D91718	2021-01-08 10:49:44 +00:00
David Green	72fb5ba079	[LV] Don't sink into replication regions The new test case here contains a first order recurrences and an instruction that is replicated. The first order recurrence forces an instruction to be sunk _into_, as opposed to after the replication region. That causes several things to go wrong including registering vector instructions multiple times and failing to create dominance relations correctly. Instead we should be sinking to after the replication region, which is what this patch makes sure happens. Differential Revision: https://reviews.llvm.org/D93629	2021-01-08 09:50:10 +00:00
Kazu Hirata	33bf1cad75	[llvm] Use *Set::contains (NFC)	2021-01-07 20:29:34 -08:00
Ruiling Song	8dddcc762d	[Cloning] Copy metadata of global declarations We have modules with metadata on declarations, and out-of-tree passes use that metadata, and we need to clone those modules. We really expect such metadata is kept during the clone operation. Reviewed by: arsenm, aprantl Differential Revision: https://reviews.llvm.org/D93451	2021-01-08 08:21:18 +08:00
Roman Lebedev	f2f81c554b	[SimplifyCFG] markAliveBlocks(): switch to non-permissive DomTree updates No actual changes needed, invoke can't have the same block as an unwind destination and a normal destination.	2021-01-08 02:15:27 +03:00
Roman Lebedev	d59f97bb3a	[SimplifyCFG] removeUnwindEdge(): switch to non-permissive DomTree updates No actual changes needed, Catchswitch cannot unwind to one of its catchpads.	2021-01-08 02:15:27 +03:00
Roman Lebedev	f0eba8ce2d	[SimplifyCFG] changeToCall(): switch to non-permissive DomTree updates No actual changes needed, normal and unwind destinations of an invoke can never be identical.	2021-01-08 02:15:27 +03:00
Roman Lebedev	be0a31d13b	[SimplifyCFG] DeleteDeadBlocks(): switch to non-permissive DomTree updates No actual changes needed, DetatchDeadBlocks() was already doing the right thing.	2021-01-08 02:15:27 +03:00
Roman Lebedev	66189212bb	[SimplifyCFG] MergeBlockIntoPredecessor(): switch to non-permissive DomTree updates ... which requires not deleting edges that were just deleted already, by not processing the same successor more than once.	2021-01-08 02:15:26 +03:00
Roman Lebedev	05adc73db0	[SimplifyCFG] changeToUnreachable(): switch to non-permissive DomTree updates ... which requires not deleting edges that were just deleted already, by not processing the same predecessor more than once.	2021-01-08 02:15:26 +03:00
Roman Lebedev	7600d7c7be	[SimplifyCFG] removeUnreachableBlocks(): switch to non-permissive DomTree updates ... which requires not deleting edges that were just deleted already, by not processing the same predecessor more than once.	2021-01-08 02:15:26 +03:00
Roman Lebedev	1f9b591ee6	[SimplifyCFG] TryToSimplifyUncondBranchFromEmptyBlock(): switch to non-permissive DomTree updates ... which requires not deleting edges that were just deleted already, by not processing the same predecessor more than once.	2021-01-08 02:15:25 +03:00
Roman Lebedev	b3822728fa	[SimplifyCFG] ConstantFoldTerminator(): switch to non-permissive DomTree updates in `indirectbr` handling ... which requires not deleting edges that were just deleted already.	2021-01-08 02:15:25 +03:00
Roman Lebedev	36593a30a4	[SimplifyCFG] ConstantFoldTerminator(): switch to non-permissive DomTree updates in `SwitchInst` handling ... which requires not deleting edges that will still be present.	2021-01-08 02:15:24 +03:00
Roman Lebedev	16ab8e5f6d	[SimplifyCFG] ConstantFoldTerminator(): handle matching destinations of condbr earlier We need to handle this case before dealing with the case of a constant branch condition, because if the destinations match, the latter fold would try to remove the DomTree edge that would still be present. This allows making that particular DomTree update non-permissive.	2021-01-08 02:15:24 +03:00
Arthur Eubanks	1a2eaebc09	[CoroSplit][NewPM] Don't call LazyCallGraph functions to split when no clones Apparently there can be no clones, as happens in coro-retcon-unreachable.ll. The alternative is to allow no split functions in addSplitRefRecursiveFunctions(), but it seems better to have the caller make sure it's not accidentally splitting no functions out. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D94258	2021-01-07 14:06:35 -08:00
dfukalov	6a87e9b08b	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
Sanjay Patel	4c7148d75c	[SLP] remove opcode identifier for reduction; NFC Another step towards allowing intrinsics in reduction matching.	2021-01-07 14:07:27 -05:00
Hiroshi Yamauchi	cf5415c727	[PGO][PGSO] Let unroll hints take precedence over PGSO. Differential Revision: https://reviews.llvm.org/D94199	2021-01-07 10:10:31 -08:00
Roman Lebedev	6be1fd6b20	[SimplifyCFG] FoldValueComparisonIntoPredecessors(): drop reachable erroneous assert I have added it in `d15d81c` because it seemed correct, was holding for all the tests so far, and was validating the fix added in the same commit, but as David Major is pointing out (with a reproducer), the assertion isn't really correct after all. So remove it. Note that `d15d81c` is still fine.	2021-01-07 18:05:04 +03:00
Sidharth Baveja	048f184ee4	[SplitEdge] Add new parameter to SplitEdge to name the newly created basic block Summary: Currently SplitEdge does not support passing in a parameter that allows you to name the newly created BasicBlock. This patch updates the function such that the name of the block can be passed in, if users of this utility decide to do so. Reviewed By: Whitney, bmahjour, asbirlea, jamieschmeiser Differential Revision: https://reviews.llvm.org/D94176	2021-01-07 14:49:23 +00:00
Alexey Bataev	4284afdf94	[SLP]Need shrink the load vector after reordering. After merging the shuffles, we cannot rely on the previous shuffle anymore and need to shrink the final shuffle, if it is required. Reported in D92668 Differential Revision: https://reviews.llvm.org/D93967	2021-01-07 04:50:48 -08:00
Oliver Stannard	76f6b125ce	Revert "[llvm] Use BasicBlock::phis() (NFC)" Reverting because this causes crashes on the 2-stage buildbots, for example http://lab.llvm.org:8011/#/builders/7/builds/1140. This reverts commit `9b228f107d`.	2021-01-07 09:43:33 +00:00
Kazu Hirata	cfeecdf7b6	[llvm] Use llvm::all_of (NFC)	2021-01-06 18:27:36 -08:00
Kazu Hirata	9b228f107d	[llvm] Use BasicBlock::phis() (NFC)	2021-01-06 18:27:35 -08:00
Alina Sbirlea	63aeaf754a	[DominatorTree] Add support for mixed pre/post CFG views. Add support for mixed pre/post CFG views. Update usages of the MemorySSAUpdater to use the new DT API by requesting the DT updates to be done by the MSSAUpdater. Differential Revision: https://reviews.llvm.org/D93371	2021-01-06 14:53:09 -08:00
Sanjay Patel	4c022b5a41	[SLP] use reduction kind's opcode to create new instructions; NFC Similar to `5a1d31a28` - This should be no-functional-change because the reduction kind opcodes are 1-for-1 mappings to the instructions we are matching as reductions. But we want to remove the need for the `OperationData` opcode field because that does not work when we start matching intrinsics (eg, maxnum) as reduction candidates.	2021-01-06 14:37:44 -05:00
Sanjay Patel	5d24089a70	[SLP] reduce code for propagating flags on reductions; NFC If we add/change to match intrinsics, this might get more wordy, but there's no need to list each kind currently.	2021-01-06 14:37:44 -05:00
Arthur Eubanks	7fea561eb1	[CGSCC][Coroutine][NewPM] Properly support function splitting/outlining Previously when trying to support CoroSplit's function splitting, we added in a hack that simply added the new function's node into the original function's SCC (https://reviews.llvm.org/D87798). This is incorrect since it might be in its own SCC. Now, more similar to the previous design, we have callers explicitly notify the LazyCallGraph that a function has been split out from another one. In order to properly support CoroSplit, there are two ways functions can be split out. One is the normal expected "outlining" of one function into a new one. The new function may only contain references to other functions that the original did. The original function must reference the new function. The new function may reference the original function, which can result in the new function being in the same SCC as the original function. The weird case is when the original function indirectly references the new function, but the new function directly calls the original function, resulting in the new SCC being a parent of the original function's SCC. This form of function splitting works with CoroSplit's Switch ABI. The second way of splitting is more specific to CoroSplit. CoroSplit's Retcon and Async ABIs split the original function into multiple functions that all reference each other and are referenced by the original function. In order to keep the LazyCallGraph in a valid state, all new functions must be processed together, else some nodes won't be populated. To keep things simple, this only supports the case where all new edges are ref edges, and every new function references every other new function. There can be a reference back from any new function to the original function, putting all functions in the same RefSCC. This also adds asserts that all nodes in a (Ref)SCC can reach all other nodes to prevent future incorrect hacks. The original hacks in https://reviews.llvm.org/D87798 are no longer necessary since all new functions should have been registered before calling updateCGAndAnalysisManagerForPass. This fixes all coroutine tests when opt's -enable-new-pm is true by default. This also fixes PR48190, which was likely due to the previous hack breaking SCC invariants. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D93828	2021-01-06 11:19:15 -08:00
Francesco Petrogalli	dfd3384fee	[InstCombine] Update valueCoversEntireFragment to use TypeSize * Update valueCoversEntireFragment to use TypeSize. * Add a regression test. * Assertions have been added to protect untested codepaths. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91806	2021-01-06 17:14:59 +00:00
Florian Hahn	494db3816b	[LoopDeletion] Also consider loops with subloops for deletion. Currently, LoopDeletion does skip loops that have sub-loops, but this means we currently fail to remove some no-op loops. One example is inner loops with live-out values. Those cannot be removed by themselves. But the containing loop may itself be a no-op and the whole loop-nest can be deleted. The legality checks do not seem to rely on analyzing inner-loops only for correctness. With LoopDeletion being a LoopPass, the change means that we now unfortunately need to do some extra work in parent loops, by checking some conditions we already checked. But there appears to be no noticeable compile time impact: http://llvm-compile-time-tracker.com/compare.php?from=02d11f3cda2ab5b8bf4fc02639fd1f4b8c45963e&to=843201e9cf3b6871e18c52aede5897a22994c36c&stat=instructions This patch leads to ~10 more loops being deleted on MultiSource, SPEC2000, SPEC2006 with -O3 & LTO. This patch is also required (together with a few others) to eliminate a no-op loop in omnetpp as discussed on llvm-dev 'LoopDeletion / removal of empty loops.' (http://lists.llvm.org/pipermail/llvm-dev/2020-December/147462.html) This change becomes relevant after removing potentially infinite loops is made possible in 'must-progress' loops (D86844). Note that I added a function call with side-effects to an outer loop in `llvm/test/Transforms/LoopDeletion/update-scev.ll` to preserve the original spirit of the test. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D93716	2021-01-06 14:49:00 +00:00
Florian Hahn	816dba48af	[VPlan] Keep start value in VPWidenIntOrFpInductionRecipe (NFC). This patch updates VPWidenIntOrFpInductionRecipe to hold the start value for the induction variable. This makes the start value explicit and allows for adjusting the start value for a VPlan. The flexibility will be used in further patches. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D92129	2021-01-06 11:47:33 +00:00
Florian Hahn	0ce5f402e0	[VPlan] Add getLiveInIRValue accessor to VPValue. This patch adds a new getLiveInIRValue accessor to VPValue, which returns the underlying value, if the VPValue is defined outside of VPlan. This is required to handle scalars in VPTransformState, which requires dealing with scalars defined outside of VPlan. We can simply check VPValue::Def to determine if the value is defined inside a VPlan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D92281	2021-01-06 11:20:42 +00:00
Florian Hahn	f73c09caa2	[VPlan] Use public VPValue constructor in VPPredInstPHIRecipe (NFC). VPPredInstPHIRecipe does not need access to VPValue via friendship. It can just use the public constructor. Discussed as part of D92281.	2021-01-06 10:47:09 +00:00
Juneyoung Lee	29f8628d1f	[Constant] Add containsPoisonElement This patch - Adds containsPoisonElement that checks existence of poison in constant vector elements, - Renames containsUndefElement to containsUndefOrPoisonElement to clarify its behavior & updates its uses properly With this patch, isGuaranteedNotToBeUndefOrPoison's tests w.r.t constant vectors are added because its analysis is improved. Thanks! Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94053	2021-01-06 12:10:33 +09:00
Juneyoung Lee	4a8e6ed2f7	[SLP,LV] Use poison constant vector for shufflevector/initial insertelement This patch makes SLP and LV emit operations with initial vectors set to a poison constant instead of undef. This is part of the effort to use poison vectors instead of undef to represent "don't care" vectors. The goal is to make nice shufflevector optimizations valid that are currently incorrect due to the tricky interaction between undef and poison (see https://bugs.llvm.org/show_bug.cgi?id=44185 ). Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94061	2021-01-06 11:22:50 +09:00
Roman Lebedev	a14945c1db	[SimplifyCFG] SimplifyEqualityComparisonWithOnlyPredecessor(): really don't delete DomTree edges multiple times	2021-01-06 01:52:39 +03:00
Roman Lebedev	2b437fcd47	[SimplifyCFG] SwitchToLookupTable(): switch to non-permissive DomTree updates ... which requires not deleting a DomTree edge that we just deleted.	2021-01-06 01:52:38 +03:00
Roman Lebedev	fa5447aa3f	[NFC][SimplifyCFG] SwitchToLookupTable(): pull out SI->getParent() into a variable	2021-01-06 01:52:38 +03:00
Roman Lebedev	d15d81ce15	[SimplifyCFG] FoldValueComparisonIntoPredecessors(): deal with each predecessor only once If the predecessor is a switch, and BB is not the default destination, multiple cases could have the same destination, and it doesn't make sense to re-process the predecessor, because we won't make any changes; once is enough. I'm not sure this can be really tested, other than via the assertion being added here, which fires without the fix.	2021-01-06 01:52:37 +03:00
Roman Lebedev	fc96cb2dad	[SimplifyCFG] FoldValueComparisonIntoPredecessors(): switch to non-permissive DomTree updates ... which requires not adding a DomTree edge that we just added.	2021-01-06 01:52:37 +03:00
Roman Lebedev	29ca7d5a1a	[SimplifyCFG] simplifyUnreachable(): fix handling of degenerate same-destination conditional branch One would hope that it would have been already canonicalized into an unconditional branch, but that isn't really guaranteed to happen with SimplifyCFG's visitation order.	2021-01-06 01:52:36 +03:00
Roman Lebedev	3460719f58	[NFC][SimplifyCFG] Add a test with same-destination conditional branch Reported by Mikael Holmén as post-commit feedback on https://reviews.llvm.org/rG2d07414ee5f74a09fb89723b4a9bb0818bdc2e18#968162	2021-01-06 01:52:36 +03:00
Roman Lebedev	f98535686e	[SimplifyCFG] simplifyUnreachable(): switch to non-permissive DomTree updates ... which requires not removing a DomTree edge if the switch's default still points at that destination, because it can't be removed; ... and not processing the same predecessor more than once.	2021-01-06 01:52:36 +03:00
Sanjay Patel	6a03f8ab62	[SLP] reduce code for finding reduction costs; NFC We can get both (vector/scalar) costs in a single switch instead of sequentially.	2021-01-05 17:35:54 -05:00
Arthur Eubanks	8cf1cc578d	[FuncAttrs] Infer noreturn A function is noreturn if all blocks terminating with a ReturnInst contain a call to a noreturn function. Skip looking at naked functions since there may be asm that returns. This can be further refined in the future by checking unreachable blocks and taking into account recursion. It looks like the attributor pass does this, but that is not yet enabled by default. This seems to help with code size under the new PM since PruneEH does not run under the new PM, missing opportunities to mark some functions noreturn, which in turn doesn't allow simplifycfg to clean up dead code. https://bugs.llvm.org/show_bug.cgi?id=46858. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D93946	2021-01-05 13:25:42 -08:00
Sanjay Patel	5a1d31a284	[SLP] use reduction kind's opcode for cost model queries; NFC This should be no-functional-change because the reduction kind opcodes are 1-for-1 mappings to the instructions we are matching as reductions. But we want to remove the need for the `OperationData` opcode field because that does not work when we start matching intrinsics (eg, maxnum) as reduction candidates.	2021-01-05 15:12:40 -05:00
Sanjay Patel	d4a999b453	[SLP] reduce code duplication; NFC	2021-01-05 15:12:40 -05:00
Atmn Patel	f88a797521	[LoopDeletion] Allows deletion of possibly infinite side-effect free loops From C11 and C++11 onwards, a forward-progress requirement has been introduced for both languages. In the case of C, loops with non-constant conditionals that do not have any observable side-effects (as defined by 6.8.5p6) can be assumed by the implementation to terminate, and in the case of C++, this assumption extends to all functions. The clang frontend will emit the `mustprogress` function attribute for C++ functions (D86233, D85393, D86841) and emit the loop metadata `llvm.loop.mustprogress` for every loop in C11 or later that has a non-constant conditional. This patch modifies LoopDeletion so that only loops with the `llvm.loop.mustprogress` metadata or loops contained in functions that are required to make progress (`mustprogress` or `willreturn`) are checked for observable side-effects. If these loops do not have an observable side-effect, then we delete them. Loops without observable side-effects that do not satisfy the above conditions will not be deleted. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86844	2021-01-05 09:56:16 -05:00
Sanjay Patel	3b8b2c7da2	[SLP] delete unused pairwise reduction option SLP tries to model 2 forms of vector reductions: pairwise and splitting. From the cost model code comments, those are defined using an example as: /// Pairwise: /// (v0, v1, v2, v3) /// ((v0+v1), (v2+v3), undef, undef) /// Split: /// (v0, v1, v2, v3) /// ((v0+v2), (v1+v3), undef, undef) I don't know the full history of this functionality, but it was partly added back in D29402. There are apparently no users at this point (no regression tests change). X86 might have managed to work-around the need for this through cost model and codegen improvements. Removing this code makes it easier to continue the work that was started in D87416 / D88193. The alternative -- if there is some target that is silently using this option -- is to move this logic into LoopUtils. We have related/duplicate functionality there via llvm::createTargetReduction(). Differential Revision: https://reviews.llvm.org/D93860	2021-01-05 13:23:07 -05:00
Florian Hahn	8a47e6252a	[VPlan] Re-add interleave group members to plan. Creating in-loop reductions relies on IR references to map IR values to VPValues after interleave group creation. Make sure we re-add the updated member to the plan, so the look-ups still work as expected This fixes a crash reported after D90562.	2021-01-05 15:06:47 +00:00
Simon Pilgrim	313d982df6	[IR] Add ConstantInt::getBool helpers to wrap getTrue/getFalse.	2021-01-05 11:01:10 +00:00
Florian Hahn	38c6933dcc	[LV] Simplify lambda in all_of to directly return hasVF() result. (NFC) The if in the lambda is not necessary. We can directly return the result of hasVF.	2021-01-05 10:34:06 +00:00
Simon Pilgrim	a000366d05	[SimplifyIndVar] createWideIV - make WideIVInfo arg a const ref. NFCI. The WideIVInfo arg is only ever used as a const. Fixes cppcheck warning.	2021-01-05 10:31:45 +00:00
Simon Pilgrim	7a97eeb197	[Coroutines] checkAsyncFuncPointer - use cast<> instead of dyn_cast<> for dereferenced pointer. NFCI. We're immediately dereferencing the casted pointer, so use cast<> which will assert instead of dyn_cast<> which can return null. Fixes static analyzer warning.	2021-01-05 10:31:45 +00:00
Jeremy Morse	914066fe38	[DebugInfo] Avoid LSR crash on large integer inputs Loop strength reduction tries to recover debug variable values by looking for simple offsets from PHI values. In really extreme conditions there may be an offset used that won't fit in an int64_t, hitting an APInt assertion. This patch adds a regression test and adjusts the equivalent value collecting code to filter out any values where the offset can't be represented by an int64_t. This means that for very large integers with very large offsets, the variable location will become undef, which is the same behaviour as before `2a6782bb9f` / D87494. Differential Revision: https://reviews.llvm.org/D94016	2021-01-05 10:25:37 +00:00
Simon Pilgrim	84d5768d97	MemProfiler::insertDynamicShadowAtFunctionEntry - use cast<> instead of dyn_cast<> for dereferenced pointer. NFCI. We're immediately dereferencing the casted pointer, so use cast<> which will assert instead of dyn_cast<> which can return null. Fixes static analyzer warning.	2021-01-05 09:34:01 +00:00
Arthur Eubanks	e30fbbe9a5	[JumpThreading][NewPM] Skip when target has divergent CF Matches the legacy pass. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D94028	2021-01-04 16:08:08 -08:00
Roman Lebedev	32c47ebef1	[SimplifyCFG] SimplifyCondBranchToTwoReturns(): switch to non-permissive DomTree updates ... which requires not deleting an edge that just got deleted, because we could be dealing with a block that didn't go through ConstantFoldTerminator() yet, and thus has a degenerate cond br with matching true/false destinations.	2021-01-05 01:26:37 +03:00
Roman Lebedev	110b3d7855	[SimplifyCFG] SimplifyEqualityComparisonWithOnlyPredecessor(): switch to non-permissive DomTree updates ... which requires not deleting an edge that just got deleted.	2021-01-05 01:26:37 +03:00
Roman Lebedev	a8604e3d5b	[SimplifyCFG] simplifyIndirectBr(): switch to non-permissive DomTree updates ... which requires not deleting an edge that just got deleted.	2021-01-05 01:26:36 +03:00
Roman Lebedev	ed9de61cc3	[SimplifyCFGPass] mergeEmptyReturnBlocks(): switch to non-permissive DomTree updates ... which requires not inserting an edge that already exists.	2021-01-05 01:26:36 +03:00
Roman Lebedev	3fb57222c4	[NFCI] SimplifyCFG: switch to non-permissive DomTree updates, where possible Notably, this doesn't switch every case; the remaining cases don't actually pass sanity checks in non-permissive mode, and therefore require further analysis. Note that SimplifyCFG still defaults to not preserving DomTree, so this is effectively an NFC change.	2021-01-05 01:26:36 +03:00
Sanjay Patel	36263a7ccc	[LoopUtils] remove redundant opcode parameter; NFC While here, rename the inaccurate getRecurrenceBinOp() because that was also used to get CmpInst opcodes. The recurrence/reduction kind should always refer to the expected opcode for a reduction. SLP appears to be the only direct caller of createSimpleTargetReduction(), and that calling code ideally should not be carrying around both an opcode and a reduction kind. This should allow us to generalize reduction matching to use intrinsics instead of only binops.	2021-01-04 17:05:28 -05:00
Sanjay Patel	9766957524	[LoopUtils] reduce code for creating reduction; NFC We can return from each case instead of creating a temporary variable just to have a common return.	2021-01-04 16:05:03 -05:00
Sanjay Patel	58b6c5d932	[LoopUtils] reorder logic for creating reduction; NFC If we are using a shuffle reduction, we don't need to go through the switch on opcode - return early.	2021-01-04 16:05:02 -05:00
Whitney Tsang	de6d43f16c	Revert "[LoopNest] Allow empty basic blocks without loops" This reverts commit `9a17bff4f7`.	2021-01-04 20:42:21 +00:00
Whitney Tsang	9a17bff4f7	[LoopNest] Allow empty basic blocks without loops Allow loop nests with empty basic blocks without loops in different levels as perfect. Reviewers: Meinersbur Differential Revision: https://reviews.llvm.org/D93665	2021-01-04 19:59:50 +00:00
Philip Reames	7c63aac7bd	Revert "[LoopDeletion] Break backedge of loops when known not taken" This reverts commit `dd6bb367d1`. Multi-stage builders are showing an assertion failure w/LCSSA not being preserved on entry to IndVars. Reason isn't clear, reverting while investigating.	2021-01-04 09:50:47 -08:00
Philip Reames	dd6bb367d1	[LoopDeletion] Break backedge of loops when known not taken The basic idea is that if SCEV can prove the backedge isn't taken, we can go ahead and get rid of the backedge (and thus the loop) while leaving the rest of the control in place. This nicely handles cases with dispatch between multiple exits and internal side effects. Differential Revision: https://reviews.llvm.org/D93906	2021-01-04 09:19:29 -08:00
Florian Hahn	c367258b5c	[SimplifyCFG] Enabled hoisting late in LTO pipeline. `bb7d3af113` disabled hoisting in SimplifyCFG by default, but enabled it late in the pipeline. But it appears as if the LTO pipelines got missed. This patch adjusts the LTO pipelines to also enable hoisting in the later stages. Unfortunately there's no easy way to add a test for the change I think. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D93684	2021-01-04 16:26:58 +00:00
Florian Hahn	e0905553b4	[ArgPromotion] Delay dead GEP removal until doPromotion. Currently ArgPromotion removes dead GEPs as part of the legality check in isSafeToPromoteArgument. If no promotion happens, this means the pass claims no modifications happened, even though GEPs were removed. This patch fixes the issue by delaying removal of dead GEPs until doPromotion: isSafeToPromoteArgument can simply skip dead GEPs and the code in doPromotion dealing with GEPs is updated to account for dead GEPs. Once we have committed to promotion, it should be safe to remove dead GEPs. Alternatively isSafeToPromoteArgument could return an additional boolean to indicate whether it made changes, but this is quite cumbersome and there should be no real benefit of weeding out some dead GEPs here if we do not perform promotion. I added a test for the case where dead GEPs need to be removed when promotion happens in `578c5a0c6e`. Fixes PR47477. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93991	2021-01-04 09:51:20 +00:00
Andrew Litteken	5c951623bc	[IROutliner] Refactoring errors in the cost model from past patches. A variable was reused where it should not have been, due to confusion while committing patches.	2021-01-04 00:11:18 -06:00
Andrew Litteken	05e6ac4eb8	[IROutliner] Removing a duplicate addition, causing overestimates in IROutliner. There was an extra addition left over from a previous commit for the cost model, this removes it.	2021-01-03 23:36:28 -06:00
Roman Lebedev	98cd1c33e3	[NFC][SimplifyCFG] Hoist 'original' DomTree verification from simplifyOnce() into run() This is NFC since SimplifyCFG still currently defaults to not preserving DomTree. SimplifyCFGOpt::simplifyOnce() is only called from SimplifyCFGOpt::run(), and cannot be called externally, since SimplifyCFGOpt is defined in a .cpp file. This avoids some needless verifications, and is thus a bit faster without sacrificing precision.	2021-01-04 01:02:02 +03:00
Roman Lebedev	a7684940f0	[SimplifyCFG] SimplifyTerminatorOnSelect(): fix/tune DomTree updates We only need to remove non-TrueBB/non-FalseBB successors, and we only need to do that once. We don't need to insert any new edges, because no new successors will be added.	2021-01-04 01:02:02 +03:00
Roman Lebedev	70935b9595	[NFC][SimplifyCFG] SimplifyTerminatorOnSelect(): pull out OldTerm->getParent() into a variable	2021-01-04 01:02:02 +03:00
Kazu Hirata	ba82c0b315	[llvm] Call *(Set\|Map)::erase directly (NFC) We can erase an item in a set or map without checking its membership first.	2021-01-03 09:57:47 -08:00
Juneyoung Lee	1fc992bd86	[Scalarizer] Use poison as insertelement's placeholder This patch makes Scalarizer use poison as insertelement's placeholder. It contains two changes in Scalarizer.cpp, and neither change alters the semantics of the optimized program, because the placeholder value (poison) is already completely hidden by the following insertelement instructions. The first change at visitBitCastInst() creates a poison vector of MidTy and consecutively inserts FanIn times, which is the number of elements of MidTy. The second change at ScalarizerVisitor::finish() creates poison with Op->getType(), and it is filled with Count insertelements. The test diffs show that the poison value is never exposed after insertelements. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93989	2021-01-04 00:35:28 +09:00
Roman Lebedev	5fa241a657	[SimplifyCFG] FoldValueComparisonIntoPredecessors(): fine-tune/fix DomTree preservation, take 2	2021-01-03 01:45:48 +03:00
Roman Lebedev	6a3a8d17eb	[SimplifyCFG] FoldValueComparisonIntoPredecessors(): fine-tune/fix DomTree preservation	2021-01-03 01:45:48 +03:00
Roman Lebedev	7c8b8063b6	[SimplifyCFG][AMDGPU] AMDGPUUnifyDivergentExitNodes: SimplifyCFG isn't ready to preserve PostDomTree There are a number of transforms in SimplifyCFG that take DomTree out of DomTreeUpdater, and do updates manually. Until they are fixed, user passes are unable to claim that PDT is preserved. Note that the default for SimplifyCFG is still not to preserve DomTree, so this is still effectively NFC.	2021-01-03 01:45:46 +03:00
Kazu Hirata	530c5af6a4	[Transforms] Construct SmallVector with iterator ranges (NFC)	2021-01-02 09:24:17 -08:00
Florian Hahn	c50f9b2351	[LV] Clean up trailing whitespace (NFC). Clean up some stray whitespace that sneaked in recently.	2021-01-02 16:43:13 +00:00
Roman Lebedev	b9da488ad7	[SimplifyCFG] Don't actually take DomTreeUpdater unless we intend to maintain DomTree validity This guards against unintentional mistakes like the one I just fixed in the previous commit.	2021-01-02 14:40:55 +03:00
Roman Lebedev	b4429f3cdd	[SimplifyCFG] Teach removeUndefIntroducingPredecessor to preserve DomTree	2021-01-02 01:01:20 +03:00
Roman Lebedev	657c1e09da	[SimplifyCFG] Teach eliminateDeadSwitchCases() to preserve DomTree, part 2	2021-01-02 01:01:18 +03:00
Roman Lebedev	f1ce696056	[SimplifyCFG] Teach tryWidenCondBranchToCondBranch() to preserve DomTree	2021-01-02 01:01:17 +03:00
Roman Lebedev	e08fea3b24	[SimplifyCFGPass] Ensure that DominatorTreeWrapperPass is init'd before SimplifyCFG It's probably better than hoping that it will happen to be already initialized.	2021-01-02 01:01:17 +03:00
Kazu Hirata	f43daf1b62	[SSAUpdater] Remove unused code InstrIsPHI (NFC) The last use of this function was removed on Jan 4, 2018 in commit `90ecac01e9`.	2021-01-01 12:44:52 -08:00
Sanjay Patel	c74e8539ff	[Analysis] flatten enums for recurrence types This is almost all mechanical search-and-replace and no-functional-change-intended (NFC). Having a single enum makes it easier to match/reason about the reduction cases. The goal is to remove `Opcode` from reduction matching code in the vectorizers because that makes it harder to adapt the code to handle intrinsics. The code in RecurrenceDescriptor::AddReductionVar() is the only place that required closer inspection. It uses a RecurrenceDescriptor and a second InstDesc to sometimes overwrite part of the struct. It seems like we should be able to simplify that logic, but it's not clear exactly which cmp+sel patterns we are trying to handle/avoid.	2021-01-01 12:20:16 -05:00
Florian Hahn	d9f306aa52	[LV] Fix crash when generating remarks with multi-exit loops. If DoExtraAnalysis is true (e.g. because remarks are enabled), we continue with the analysis rather than exiting. Update code to conditionally check if the ExitBB has phis or not a single predecessor. Otherwise a nullptr is dereferenced with DoExtraAnalysis.	2021-01-01 13:54:41 +00:00
Roman Lebedev	831636b0e6	[SimplifyCFG] SUCCESS! Teach createUnreachableSwitchDefault() to preserve DomTree This pretty much concludes the patch series for updating SimplifyCFG to preserve DomTree. All 318 dedicated `-simplifycfg` tests now pass with `-simplifycfg-require-and-preserve-domtree=1`. There are a few leftovers that apparently don't have good test coverage. I do not yet know what gaps in test coverage the wider-scale testing will reveal, but the default flip might be close.	2021-01-01 03:25:25 +03:00
Roman Lebedev	e1440d43bc	[SimplifyCFG] Teach tryToSimplifyUncondBranchWithICmpInIt() to preserve DomTree	2021-01-01 03:25:25 +03:00
Roman Lebedev	8866583953	[SimplifyCFG] Teach FoldValueComparisonIntoPredecessors() to preserve DomTree, part 2	2021-01-01 03:25:24 +03:00
Roman Lebedev	a815b6b2b2	[SimplifyCFG] Teach eliminateDeadSwitchCases() to preserve DomTree, part 1	2021-01-01 03:25:24 +03:00
Roman Lebedev	0d2f219d4d	[SimplifyCFG] Teach SimplifyEqualityComparisonWithOnlyPredecessor() to preserve DomTree, part 3	2021-01-01 03:25:23 +03:00
Roman Lebedev	9f17dab1f4	[SimplifyCFG] Teach simplifyIndirectBr() to preserve DomTree	2021-01-01 03:25:23 +03:00
Roman Lebedev	b7c463d7b8	[SimplifyCFG] Teach FoldBranchToCommonDest() to preserve DomTree, part 2	2021-01-01 03:25:23 +03:00
Roman Lebedev	c1b825d4b8	[SimplifyCFG] Teach FoldValueComparisonIntoPredecessors() to preserve DomTree, part 1	2021-01-01 03:25:22 +03:00
Andrew Litteken	1a9eb19af9	[IROutliner] Adding consistent function attribute merging When combining extracted functions, they may have different function attributes. We want to make sure that we do not make any assumptions, or lose any information. This attempts to make sure that we consolidate function attributes to their most general case. Tests: llvm/test/Transforms/IROutliner/outlining-compatible-and-attribute-transfer.ll llvm/test/Transforms/IROutliner/outlining-compatible-or-attribute-transfer.ll Reviewers: jdoefert, paquette Differential Revision: https://reviews.llvm.org/D87301	2020-12-31 12:30:23 -06:00
Fangrui Song	a90b42b0fe	[ThinLTO] Default -enable-import-metadata to false The default value is dependent on `-DLLVM_ENABLE_ASSERTIONS={off,on}` (D22167), which is error-prone. The few tests checking `!thinlto_src_module` can specify -enable-import-metadata explicitly. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D93959	2020-12-31 10:04:21 -08:00
Dávid Bolvanský	ae69fa9b9f	[InstCombine] Transform (A + B) - (A & B) to A \| B (PR48604) define i32 @src(i32 %x, i32 %y) { %0: %a = add i32 %x, %y %o = and i32 %x, %y %r = sub i32 %a, %o ret i32 %r } => define i32 @tgt(i32 %x, i32 %y) { %0: %b = or i32 %x, %y ret i32 %b } Transformation seems to be correct! https://alive2.llvm.org/ce/z/2fhW6r	2020-12-31 15:04:32 +01:00
Dávid Bolvanský	742ea77ca4	[InstCombine] Transform (A + B) - (A \| B) to A & B (PR48604) define i32 @src(i32 %x, i32 %y) { %0: %a = add i32 %x, %y %o = or i32 %x, %y %r = sub i32 %a, %o ret i32 %r } => define i32 @tgt(i32 %x, i32 %y) { %0: %b = and i32 %x, %y ret i32 %b } Transformation seems to be correct! https://alive2.llvm.org/ce/z/aQRh2j	2020-12-31 14:03:20 +01:00
Bogdan Graur	8bee4d4e8f	Revert "[LoopDeletion] Allows deletion of possibly infinite side-effect free loops" Test clang/test/Misc/loop-opt-setup.c fails when executed in Release. This reverts commit `6f1503d598`. Reviewed By: SureYeaah Differential Revision: https://reviews.llvm.org/D93956	2020-12-31 11:47:49 +00:00
Atmn Patel	6f1503d598	[LoopDeletion] Allows deletion of possibly infinite side-effect free loops From C11 and C++11 onwards, a forward-progress requirement has been introduced for both languages. In the case of C, loops with non-constant conditionals that do not have any observable side-effects (as defined by 6.8.5p6) can be assumed by the implementation to terminate, and in the case of C++, this assumption extends to all functions. The clang frontend will emit the `mustprogress` function attribute for C++ functions (D86233, D85393, D86841) and emit the loop metadata `llvm.loop.mustprogress` for every loop in C11 or later that has a non-constant conditional. This patch modifies LoopDeletion so that only loops with the `llvm.loop.mustprogress` metadata or loops contained in functions that are required to make progress (`mustprogress` or `willreturn`) are checked for observable side-effects. If these loops do not have an observable side-effect, then we delete them. Loops without observable side-effects that do not satisfy the above conditions will not be deleted. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86844	2020-12-30 21:43:01 -05:00
Kazu Hirata	95ea86587c	[PGO] Use isa instead of dyn_cast (NFC)	2020-12-30 17:45:38 -08:00
Roman Lebedev	51879a5256	[LoopIdiom] 'left-shift until bittest': don't forget to check that PHI node is in loop header Fixes an issue reported by Peter Collingbourne in https://reviews.llvm.org/D91726#2475301	2020-12-30 23:58:41 +03:00
Roman Lebedev	7f221c9196	[SimplifyCFG] Teach SwitchToLookupTable() to preserve DomTree	2020-12-30 23:58:41 +03:00
Roman Lebedev	a17025aa61	[SimplifyCFG] Teach switchToSelect() to preserve DomTree	2020-12-30 23:58:40 +03:00
Roman Lebedev	c45f765c0d	[SimplifyCFG] Teach SimplifyBranchOnICmpChain() to preserve DomTree	2020-12-30 23:58:40 +03:00
Sanjay Patel	8ca60db40b	[LoopUtils] reduce FMF and min/max complexity when forming reductions I don't know if there's some way this changes what the vectorizers may produce for reductions, but I have added test coverage with `3567908` and `5ced712` to show that both passes already have bugs in this area. Hopefully this does not make things worse before we can really fix it.	2020-12-30 15:22:26 -05:00
Yuanfang Chen	277ebe46c6	Fix `LLVM_ENABLE_MODULES=On` build for commit `480936e741`.	2020-12-30 10:54:04 -08:00
Andrew Litteken	fe431103b6	[IROutliner] Adding option to enable outlining from linkonceodr functions There are functions that the linker is able to automatically deduplicate; we do not outline from these functions by default. This option allows outlining from those functions. Tests: llvm/test/Transforms/IROutliner/outlining-odr.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87309	2020-12-30 12:08:04 -06:00
Sanjay Patel	e90ea76380	[IR] remove 'NoNan' param when creating FP reductions This is no-functional-change-intended (AFAIK, we can't isolate this difference in a regression test). That's because the callers should be setting the IRBuilder's FMF field when creating the reduction and/or setting those flags after creating. It doesn't make sense to override this one flag alone. This is part of a multi-step process to clean up the FMF setting/propagation. See PR35538 for an example.	2020-12-30 09:51:23 -05:00
Juneyoung Lee	420d046d6b	clang-format, address warnings	2020-12-30 23:05:07 +09:00
Juneyoung Lee	9b29610228	Use unary CreateShuffleVector if possible As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`. Let's update them. Actually, it would have been more natural if the patches were made in this order: (1) let them use unary CreateShuffleVector first (2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793) The order is swapped, but in terms of correctness it is still fine. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93923	2020-12-30 22:36:08 +09:00
Juneyoung Lee	bfedd5d2b6	[ConstraintElimination] Add support for select form of and/or This patch adds support for select form of and/or. Currently there is an ongoing effort for moving towards using `select a, b, false` instead of `and i1 a, b` and `select a, true, b` instead of `or i1 a, b` as well. D93065 has links to relevant changes. Alive2 proof: (undef input was disabled due to timeout :( ) - and: https://alive2.llvm.org/ce/z/AgvFbQ - or: https://alive2.llvm.org/ce/z/KjLJyb Differential Revision: https://reviews.llvm.org/D93935	2020-12-30 21:27:36 +09:00
Andrew Litteken	30feb93036	[IROutliner] Adding support for swift errors in the IROutliner Since some values can be swift errors, we need to make sure that we correctly propagate the parameter attributes. Tests found at: llvm/test/Transforms/IROutliner/outlining-swift-error.ll Reviewers: jroelofs, paquette Recommit of: `71867ed5e6` Differential Revision: https://reviews.llvm.org/D87742	2020-12-30 01:17:27 -06:00
Andrew Litteken	eeb99c2ac2	Revert "[IROutliner] Adding support for swift errors" This reverts commit `71867ed5e6`. Reverting for lack of commit messages.	2020-12-30 01:17:27 -06:00
Andrew Litteken	71867ed5e6	[IROutliner] Adding support for swift errors	2020-12-30 01:14:55 -06:00
Luo, Yuanke	981a0bd858	[X86] Add x86_amx type for intel AMX. The x86_amx type is used for AMX intrinsics. <256 x i32> is bitcast to x86_amx when it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it is used by load/store instructions. So AMX intrinsics only operate on type x86_amx. It can help to separate AMX intrinsics from LLVM IR instructions (+-*/). Thanks to Craig for the idea. This patch depends on https://reviews.llvm.org/D87981. Differential Revision: https://reviews.llvm.org/D91927	2020-12-30 13:52:13 +08:00
Kazu Hirata	16d20e2554	[Transforms/Utils] Construct SmallVector with iterator ranges (NFC)	2020-12-29 19:23:23 -08:00
Andrew Litteken	df4a931c63	[IROutliner] Adding OptRemarks to the IROutliner Pass This prints OptRemarks at each location where a decision is made to not outline, or to outline a specific section for the IROutliner pass. Test: llvm/test/Transforms/IROutliner/opt-remarks.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87300	2020-12-29 15:52:08 -06:00
Roman Lebedev	39a56f7f17	[SimplifyCFG] Teach SimplifyTerminatorOnSelect() to preserve DomTree	2020-12-30 00:48:12 +03:00
Roman Lebedev	ec0b671a61	[SimplifyCFG] Teach SimplifyCondBranchToCondBranch() to preserve DomTree	2020-12-30 00:48:12 +03:00
Roman Lebedev	307156246f	[SimplifyCFG] Teach mergeConditionalStoreToAddress() to preserve DomTree	2020-12-30 00:48:11 +03:00
Roman Lebedev	d4c0abb4a3	[SimplifyCFG] Teach FoldCondBranchOnPHI() to preserve DomTree	2020-12-30 00:48:11 +03:00
Roman Lebedev	b8121b2e62	[SimplifyCFG] Teach SinkCommonCodeFromPredecessors() to preserve DomTree	2020-12-30 00:48:11 +03:00
Roman Lebedev	18c407bf4c	[SimplifyCFG] Teach HoistThenElseCodeToIf() to preserve DomTree	2020-12-30 00:48:10 +03:00
Roman Lebedev	fe9bdd9621	[SimplifyCFG] Teach SimplifyEqualityComparisonWithOnlyPredecessor() to preserve DomTree, part 2	2020-12-30 00:48:10 +03:00
Roman Lebedev	6027e05dbf	[SimplifyCFG] Teach SimplifyEqualityComparisonWithOnlyPredecessor() to preserve DomTree, part 1	2020-12-30 00:48:10 +03:00
Sanjay Patel	8d18bc8e6d	[Utils] reduce code in createTargetReduction(); NFC The switch duplicated the translation in getRecurrenceBinOp(). This code is still weird because it translates to the TTI ReductionFlags for min/max, but then createSimpleTargetReduction() converts that back to RecurrenceDescriptor::MinMaxRecurrenceKind.	2020-12-29 15:56:19 -05:00
Sanjay Patel	21a3a0225d	[SLP] replace local reduction enum with RecurrenceKind; NFCI I'm not sure if the SLP enum was created before the IVDescriptor RecurrenceDescriptor / RecurrenceKind existed, but the code in SLP is now redundant with that class, so it just makes things more complicated to have both. We eventually call LoopUtils createSimpleTargetReduction() to create reduction ops, so we might as well standardize on those enum names. There's still a question of whether we need to use TTI::ReductionFlags vs. MinMaxRecurrenceKind, but that can be another clean-up step. Another option would just be to flatten the enums in RecurrenceDescriptor into a single enum. There isn't much benefit (smaller switches?) to having a min/max subset.	2020-12-29 14:52:11 -05:00
Andrew Litteken	6df161a2fb	[IROutliner] Adding a cost model, and debug option to turn the model off. This adds a cost model that takes into account the total number of machine instructions to be removed from each region, the number of instructions added by adding a new function with a set of instructions, and the instructions added by handling arguments. Tests not adding flags: llvm/test/Transforms/IROutliner/outlining-cost-model.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87299	2020-12-29 12:43:41 -06:00
Roman Lebedev	374ef57f13	[InstCombine] 'hoist xor-by-constant from xor-by-value': completely give up on constant exprs As Mikael Holmén notes in the post-commit review for the first fix, https://reviews.llvm.org/rGd4ccef38d0bb#967466, not hoisting constantexprs is not enough, because if the xor originally was a constantexpr (i.e. X is a constantexpr), `SimplifyAssociativeOrCommutative()` in `visitXor()` will immediately undo this transform, thus again causing an infinite combine loop. This transform has resulted in a surprising number of constantexpr failures.	2020-12-29 16:28:18 +03:00
Arthur Eubanks	c2ef06d3dd	[NewPM] Port infer-address-spaces And add it to the AMDGPU opt pipeline. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D93880	2020-12-28 19:58:12 -08:00
Kazu Hirata	5d2529f28f	[Scalar] Construct SmallVector with iterator ranges (NFC)	2020-12-28 19:55:18 -08:00
Andrew Litteken	1e23802507	[IROutliner] Merging identical output blocks for extracted functions. Many of the sets of output stores will be the same. When a block is created, we check if there is an output block with the same set of store instructions. If there is, we map the output block of the region back to the block, so that the extra argument controlling the switch statement can be set to the appropriate block value. Tests: - llvm/test/Transforms/IROutliner/outlining-same-output-blocks.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87298	2020-12-28 21:01:48 -06:00
Andrew Litteken	e6ae623314	[IROutliner] Adding support for consolidating functions with different output arguments. Certain regions can have values introduced inside the region that are used outside of the region. These may not be the same for each similar region, so we must create one overarching set of arguments for the consolidated function. We do this by iterating over the outputs for each extracted function, and creating as many different arguments as are needed to encapsulate the different output sets. For each output set, we create a different block with the necessary stores from the value to the output register. There is then one switch statement, controlled by an argument to the function, to differentiate which block to use. Changed Tests for consistency: llvm/test/Transforms/IROutliner/extraction.ll llvm/test/Transforms/IROutliner/illegal-assumes.ll llvm/test/Transforms/IROutliner/illegal-memcpy.ll llvm/test/Transforms/IROutliner/illegal-memmove.ll llvm/test/Transforms/IROutliner/illegal-vaarg.ll Tests to test new functionality: llvm/test/Transforms/IROutliner/outlining-different-output-blocks.ll llvm/test/Transforms/IROutliner/outlining-remapped-outputs.ll llvm/test/Transforms/IROutliner/outlining-same-output-blocks.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87296	2020-12-28 16:17:07 -06:00
Nikita Popov	4a16c507cb	[InstCombine] Disable unsafe select transform behind a flag This disables the poison-unsafe select -> and/or transform behind a flag (we continue to perform the fold by default). This is intended to simplify evaluation and testing while we teach various passes to directly recognize the select pattern. This only disables the main select -> and/or transform. A number of related ones are instead changed to canonicalize to the a ? b : false and a ? true : b forms which represent and/or respectively. This requires a bit of care to avoid infinite loops, as we do not want !a ? b : false to be converted into a ? false : b. The basic idea here is the same as D93065, but keeps the change behind a flag for now. Differential Revision: https://reviews.llvm.org/D93840	2020-12-28 22:43:52 +01:00
Roman Lebedev	ef93f7a11c	[SimplifyCFG] FoldBranchToCommonDest: gracefully handle unreachable code () We might be dealing with unreachable code, so the bonus instruction we clone might be self-referencing. There is a sanity check that all uses of bonus instructions that are not in the original block with said bonus instructions are PHI nodes, and that is obviously not the case for self-referencing instructions. So if we find such a use, just rewrite it. Thanks to Mikael Holmén for the reproducer! Fixes https://bugs.llvm.org/show_bug.cgi?id=48450#c8	2020-12-28 23:31:19 +03:00
Philip Reames	4b33b23877	Reapply "[LV] Vectorize (some) early and multiple exit loops" w/fix for builder This reverts commit `4ffcd4fe9a` thus restoring `e4df6a40da`. The only change from the original patch is to add "llvm::" before the call to empty(iterator_range). This is a speculative fix for the ambiguity reported on some builders.	2020-12-28 10:13:28 -08:00
Arthur Eubanks	4ffcd4fe9a	Revert "[LV] Vectorize (some) early and multiple exit loops" This reverts commit `e4df6a40da`. Breaks Windows bots, e.g. http://45.33.8.238/win/30472/step_4.txt and http://lab.llvm.org:8011/#/builders/83/builds/2078/steps/5/logs/stdio	2020-12-28 10:05:41 -08:00
Philip Reames	e4df6a40da	[LV] Vectorize (some) early and multiple exit loops This patch is a major step towards supporting multiple exit loops in the vectorizer. This patch on its own extends the loop forms allowed in two ways: 1) single exit loops which are not bottom tested; 2) multiple exit loops w/ a single exit block reached from all exits and no phis in the exit block (because of LCSSA this implies no values defined in the loop used later). The restrictions on multiple exit loop structures will be removed in follow up patches; disallowing cases for now makes the code changes smaller and more obvious. As before, we can only handle loops with entirely analyzable exits. Removing that restriction is much harder, and is not part of currently planned efforts. The basic idea here is that we can force the last iteration to run in the scalar epilogue loop (if we have one). From the definition of SCEV's backedge taken count, we know that no earlier iteration can exit the vector body. As such, we can leave the decision on which exit to be taken to the scalar code and generate a bottom tested vector loop which runs all but the last iteration. The existing code already had the notion of requiring one iteration in the scalar epilogue; this patch is mainly about generalizing that support slightly, making sure we don't try to use this mechanism when tail folding, and updating the code to reflect the difference between a single exit block and a unique exit block (very mechanical). Differential Revision: https://reviews.llvm.org/D93317	2020-12-28 09:40:42 -08:00
Roman Lebedev	d4ccef38d0	[InstCombine] 'hoist xor-by-constant from xor-by-value': ignore constantexprs As it is being reported (in post-commit review) in https://reviews.llvm.org/D93857 this fold (as i expected, but failed to come up with test coverage despite trying) has issues with constant expressions. Since we only care about true constants, which constantexprs are not, don't perform such hoisting for constant expressions.	2020-12-28 20:15:20 +03:00
Yevgeny Rouban	d76c1d2247	[RS4GC] Lazily set changed flag when folding single entry phis The function FoldSingleEntryPHINodes() is changed to return whether it has changed the IR. This return value is used by RS4GC to set the MadeChange flag accordingly. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D93810	2020-12-28 10:54:21 +07:00
Juneyoung Lee	9d70dbdc2b	[InstCombine] use poison as placeholder for undemanded elems Currently undef is used as a don’t-care vector when constructing a vector using a series of insertelement. However, this is problematic because undef isn’t undefined enough. Especially, a sequence of insertelement can be optimized to shufflevector, but using undef as its placeholder makes shufflevector a poison-blocking instruction because undef cannot be optimized to poison. This makes a few straightforward optimizations incorrect, such as: ``` ; https://bugs.llvm.org/show_bug.cgi?id=44185 define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) { %xv = insertelement <4 x float> %q, float %x, i32 2 %r = shufflevector <4 x float> %y, <4 x float> %xv, <4 x i32> { 0, 6, 2, undef } ret <4 x float> %r ; %r[3] is undef } => define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) { %r = insertelement <4 x float> %y, float %x, i32 1 ret <4 x float> %r ; %r[3] = %y[3], incorrect if %y[3] = poison } Transformation doesn't verify! ERROR: Target is more poisonous than source ``` I’d like to suggest 1. Using poison as insertelement’s placeholder value (IRBuilder::CreateVectorSplat should be patched too) 2. Updating shufflevector’s semantics to return poison element if mask is undef Note that poison is currently lowered into UNDEF in SelDag, so codegen part is okay. m_Undef() matches PoisonValue as well, so existing optimizations will still fire. The only concern is hidden miscompilations that will go incorrect when poison constant is given. A conservative way is copying all tests having `insertelement undef` & replacing it with `insertelement poison` & run Alive2 on it, but it will create many tests and people won’t like it. :( Instead, I’ll simply locally maintain the tests and run Alive2. If there is any bug found, I’ll report it. 
Relevant links: https://bugs.llvm.org/show_bug.cgi?id=43958 , http://lists.llvm.org/pipermail/llvm-dev/2019-November/137242.html Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93586	2020-12-28 08:58:15 +09:00
Florian Hahn	4ad41902e8	[GVN] Correctly set modified status when doing PRE on indices. This patch updates GVN to correctly return the modified status, if PRE is performed on indices. It fixes a crash when building the test-suite with EXPENSIVE_CHECKS and LTO.	2020-12-27 21:58:31 +00:00
Juneyoung Lee	d3f1f7b6bc	[EarlyCSE] Use m_LogicalAnd/Or matchers to handle branch conditions EarlyCSE's handleBranchCondition says: ``` // If the condition is AND operation, we can propagate its operands into the // true branch. If it is OR operation, we can propagate them into the false // branch. ``` This holds for the corresponding select patterns as well. This is a part of an ongoing work for disabling buggy select->and/or transformations. See llvm.org/pr48353 and D93065 for more context Proof: and: https://alive2.llvm.org/ce/z/MQWodU or: https://alive2.llvm.org/ce/z/9GLbB_ Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93842	2020-12-28 05:36:26 +09:00
Juneyoung Lee	f1d648b973	[GVN] Use m_LogicalAnd/Or to propagate equality from branch conditions This patch makes GVN recognize `select c1, c2, false` as well as `select c1, true, c2` branch condition and propagate equality from these. See llvm.org/pr48353, D93065 Differential Revision: https://reviews.llvm.org/D93841	2020-12-28 05:28:38 +09:00
Florian Hahn	0ea3749b3c	[LV] Set up branch from middle block earlier. Previously the branch from the middle block to the scalar preheader & exit was being set-up at the end of skeleton creation in completeLoopSkeleton. Inserting SCEV or runtime checks may result in LCSSA phis being created, if they are required. Adjusting branches afterwards may break those PHIs. To avoid this, we can instead create the branch from the middle block to the exit after we created the middle block, so we have the final CFG before potentially adjusting/creating PHIs. This fixes a crash for the included test case. For the non-crashing case, this is almost a NFC with respect to the generated code. The only change is the order of the predecessors of the involved branch targets. Note an assertion was moved from LoopVersioning() to LoopVersioning::versionLoop. Adjusting the branches means loop-simplify form may be broken before constructing LoopVersioning. But LV only uses LoopVersioning to annotate the loop instructions with !noalias metadata, which does not require loop-simplify form. This is a fix for an existing issue uncovered by D93317.	2020-12-27 18:21:12 +00:00
Kazu Hirata	8299fb8f25	[Transforms] Use llvm::append_range (NFC)	2020-12-27 09:57:29 -08:00
Kazu Hirata	789d250613	[CodeGen, Transforms] Use *Map::lookup (NFC)	2020-12-27 09:57:27 -08:00
Sanjay Patel	badf0f20f3	[SLP] rename reduction variables for readability; NFC I am hoping to extend the reduction matching code, and it is hard to distinguish "ReductionData" from "ReducedValueData". So extend the tree/root metaphor to include leaves. Another problem is that the name "OperationData" does not provide insight into its purpose. I'm not sure if we can alter that underlying data structure to make the code clearer.	2020-12-26 11:20:25 -05:00
Sanjay Patel	c4ca108966	[SLP] use switch to improve readability; NFC This will get more complicated when we handle intrinsics like maxnum.	2020-12-26 10:59:45 -05:00
Kazu Hirata	46bea9b297	[Local] Remove unused function RemovePredecessorAndSimplify (NFC) The last use of the function was removed on Sep 29, 2010 in commit `99c985c37d`.	2020-12-25 09:35:20 -08:00
Roman Lebedev	25aebe2ccf	[LoopIdiom] 'left-shift-until-bittest': keep no-wrap flags on shift, fix edge-case miscompilation for %x.next While `%x.curr` is always safe to compute, because `LoopBackedgeTakenCount` will always be smaller than `bitwidth(X)`, i.e. we never get poison, rewriting `%x.next` is more complicated, because `X << LoopTripCount` will be poison iff `LoopTripCount == bitwidth(X)` (which will happen iff `BitPos` is `bitwidth(x) - 1` and `X` is `1`). So unless we know that isn't the case (as alive2 notes, we know it's safe to do iff shift had no-wrap flags, or bitpos does not indicate signbit, or we know that %x is never `1`), we'll need to emit an alternative, safe IR, by either just shifting the `%x.curr`, or conditionally selecting between the computed `%x.next` and `0`. The former IR looks better, so let's do that. While there, ensure that we don't drop no-wrap flags from said shift.	2020-12-24 21:20:52 +03:00
Roman Lebedev	d9ebaeeb46	[InstCombine] Hoist xor-by-constant from xor-by-value This is one of the deficiencies that can be observed in https://godbolt.org/z/YPczsG after D91038 patch set. This exposed two missing folds, one was fixed by the previous commit, another one is `(A ^ B) | ~(A ^ B) --> -1` / `(A ^ B) & ~(A ^ B) --> 0`. `-early-cse` will catch it: https://godbolt.org/z/4n1T1v, but it isn't meaningful to fix it in InstCombine, because we'd need to essentially do our own CSE, and we can't even rely on `Instruction::isIdenticalTo()`, because there are no guarantees that the order of operands matches. So let's just accept it as a loss.	2020-12-24 21:20:50 +03:00
Roman Lebedev	5b78303433	[InstCombine] Fold `a & ~(a ^ b)` to `a & b` ``` ---------------------------------------- define i32 @and_xor_not_common_op(i32 %a, i32 %b) { %0: %b2 = xor i32 %b, 4294967295 %t2 = xor i32 %a, %b2 %t4 = and i32 %t2, %a ret i32 %t4 } => define i32 @and_xor_not_common_op(i32 %a, i32 %b) { %0: %t4 = and i32 %a, %b ret i32 %t4 } Transformation seems to be correct! ```	2020-12-24 21:20:49 +03:00
Roman Lebedev	b3021a72a6	[IR][InstCombine] Add m_ImmConstant(), that matches on non-ConstantExpr constants, and use it A pattern to ignore ConstantExpr's is quite common, since they frequently lead into infinite combine loops, so let's make writing it easier.	2020-12-24 21:20:47 +03:00
Roman Lebedev	ff3749fc79	[NFC] SimplifyCFGOpt::simplifyUnreachable(): pacify unused variable warning Thanks to Luke Benes for pointing it out.	2020-12-24 21:20:46 +03:00
Kazu Hirata	df812115e3	[CodeGen, Transforms] Use llvm::any_of (NFC)	2020-12-24 09:08:36 -08:00
Simon Pilgrim	89abe1cf83	[InstCombine] foldICmpUsingKnownBits - use KnownBits signed/unsigned getMin/MaxValue helpers. NFCI. Replace the local compute*SignedMinMaxValuesFromKnownBits methods with the equivalent KnownBits helpers to determine the min/max value ranges.	2020-12-24 14:22:26 +00:00
Nikita Popov	ef2f843347	Revert "[InstCombine] Check inbounds in load/store of gep null transform (PR48577)" This reverts commit `899faa50f2`. Upon further consideration, this does not fix the right issue. Doing this fold for non-inbounds GEPs is legal, because the resulting pointer is still based-on null, which has no associated address range, and as such any access to it is UB. https://bugs.llvm.org/show_bug.cgi?id=48577#c3	2020-12-24 12:36:56 +01:00
Nikita Popov	90177912a4	Revert "[InstCombine] Fold gep inbounds of null to null" This reverts commit `eb79fd3c92`. This causes stage2 crashes, possibly due to StringMap being miscompiled. Reverting for now.	2020-12-24 10:20:31 +01:00
Roman Lebedev	f8079355c6	[InstCombine] canonicalizeAbsNabs(): don't propagate NSW flag for NABS pattern As Nuno notes in post-commit review in https://reviews.llvm.org/D87188#2467915, it is not correct to keep NSW for the negated abs pattern, so don't do that.	2020-12-24 00:06:09 +03:00
Nikita Popov	759b8c11c3	[InstCombine] Handle different pointer types when folding gep of null The source pointer type is not necessarily the same as the result pointer type, so we can't simply return the original null pointer, it might be a different one.	2020-12-23 21:58:26 +01:00
Nikita Popov	eb79fd3c92	[InstCombine] Fold gep inbounds of null to null Effectively, this is what we were previously already doing when the GEP was used in conjunction with a load or store, but this fold can also be applied more generally: > The only in bounds address for a null pointer in the default > address-space is the null pointer itself.	2020-12-23 21:41:53 +01:00
Nikita Popov	899faa50f2	[InstCombine] Check inbounds in load/store of gep null transform (PR48577) If the GEP isn't inbounds, then accessing a GEP of null location is generally not UB. While this is a minimal fix, the GEP of null handling should probably be its own fold.	2020-12-23 21:03:22 +01:00
Craig Topper	897990e614	[IROutliner] Use isa instead of dyn_cast where the casted value isn't used. NFC Fixes unused variable warnings.	2020-12-23 11:40:15 -08:00
Roman Lebedev	2b61e7c68c	[LoopIdiom] 'left-shift until bittest' idiom: support rewriting loop as countable, allow extra cruft The current state of the transform is still not enough to support my motivational pattern, because it has one more "induction variable". I have delayed posting this patch, because originally even just rewriting the loop as countable wasn't enough to nicely transform my motivational pattern, because i expected that extra IV to be rewritten afterwards, but it wasn't happening until i fixed that in D91800. So, this patch allows the 'left-shift until bittest' loop idiom as long as the inserted ops are cheap, and lifts any and all extra use checks on the instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D92754	2020-12-23 22:28:10 +03:00
Roman Lebedev	a0ddc61c5b	[LoopIdiom] 'left-shift until bittest' idiom: support canonical sign bit mask If the bitmask is for sign bit, instcombine would have canonicalized the pattern into a proper sign bit check. Supporting that is still simple, but requires a bit of a roundtrip - we first have to use `decomposeBitTestICmp()`, and the rest again just works. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D91726	2020-12-23 22:28:09 +03:00
Roman Lebedev	cb2e5980ba	[LoopIdiom] 'left-shift until bittest' idiom: support constant bit mask The handling of the case where the mask is a constant is trivial: if said constant is a power of two, the bit in question is log2(mask), and the rest just works. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D91725	2020-12-23 22:28:09 +03:00
Roman Lebedev	e124844709	[LoopIdiom] Introduce 'left-shift until bittest' idiom The motivation here is the following inner loop in the fp16/fp24 -> fp32 expander, that runs as part of the floating-point DNG decompression in the RawSpeed library: `cd380bb9a2/src/librawspeed/decompressors/DeflateDecompressor.cpp (L112-L115)` ``` while (!(fp32_fraction & (1 << 23))) { fp32_exponent -= 1; fp32_fraction <<= 1; } ``` (https://godbolt.org/z/r13YMh) As one might notice, that loop is currently uncountable, and that whole code stays scalar. Yet, it is rather trivial to make that loop countable: https://godbolt.org/z/do8WMz and we can prove that via alive2: https://alive2.llvm.org/ce/z/7vQnji (ha nice, isn't it?) ... and that allows the whole fp16->fp32 code to vectorize: https://godbolt.org/z/7hYr13 Now, while i'd love to get there, i feel like i should take it in steps. For now, this introduces support for the most basic case, where the bit position is known as a variable, and the loop will go away (has no live-outs other than the recurrence, no extra instructions in the loop). I have added sufficient (i believe) test coverage, and alive2 is happy with those transforms. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D91038	2020-12-23 22:28:09 +03:00
Andrew Litteken	b1191c8438	[IROutliner] Adding support for elevating constants that are not the same in each region to arguments When there are constants that have the same structural location, but not the same value, between different regions, we cannot simply outline the region. Instead, we find the constants that are not the same in each location, and promote them to arguments to be passed into the respective functions. At each call site, we pass the constant in as an argument regardless of type. Added/Edited Tests: llvm/test/Transforms/IROutliner/outlining-constants-vs-registers.ll llvm/test/Transforms/IROutliner/outlining-different-constants.ll llvm/test/Transforms/IROutliner/outlining-different-globals.ll Reviewers: paquette, jroelofs Differential Revision: https://reviews.llvm.org/D87294	2020-12-23 13:03:05 -06:00
Evgeniy Brevnov	9fb074e7bb	[BPI] Improve static heuristics for "cold" paths. The current approach doesn't work well in cases when multiple paths are predicted to be "cold". By "cold" paths I mean those containing an "unreachable" instruction, a call marked with the 'cold' attribute, or the 'unwind' handler of an 'invoke' instruction. The issue is that heuristics are applied one by one until the first match and essentially ignore the relative hotness/coldness of other paths. The new approach unifies processing of "cold" paths by assigning a predefined absolute weight to each block estimated to be "cold". Then we propagate these weights up/down the IR similarly to the existing approach, and finally set up edge probabilities based on estimated block weights. One important difference is how we propagate weight up. The existing approach propagates the same weight to all blocks that are post-dominated by a block with some "known" weight. This is useless at least because it always gives a 50/50 distribution, which is assumed by default anyway. Worse, it causes the algorithm to skip further heuristics and can miss setting a more accurate probability. The new algorithm propagates the weight up only to the blocks that dominate and are post-dominated by a block with some "known" weight. In other words, those blocks that are either always executed or not executed together. In addition, the new approach processes loops in a uniform way as well. Essentially, loop exit edges are estimated as "cold" paths relative to back edges and should be considered uniformly with other coldness/hotness markers. Reviewed By: yrouban Differential Revision: https://reviews.llvm.org/D79485	2020-12-23 22:47:36 +07:00
Kazu Hirata	3c707d73f2	[NewGVN] Remove for_each_found (NFC) The last use of the function was removed on Sep 30, 2017 in commit `9b926e90d3`.	2020-12-22 20:13:27 -08:00
Sanjay Patel	0d15d4b6f4	[SLP] use operand index abstraction for number of operands I think this is NFC currently, but the bug would be exposed when we allow binary intrinsics (maxnum, etc) as candidates for reductions. The code in matchAssociativeReduction() is using OperationData::getNumberOfOperands() when comparing whether the "EdgeToVisit" iterator is in-bounds, so this code must use the same (potentially offset) operand value to set the "EdgeToVisit".	2020-12-22 16:05:39 -05:00
Arnold Schwaighofer	333108e8be	Add a llvm.coro.end.async intrinsic The llvm.coro.end.async intrinsic allows specifying a function that is to be called as the last action before returning. This function will be inlined after coroutine splitting. This function can contain a 'musttail' call to allow for guaranteed tail calling as the last action. Differential Revision: https://reviews.llvm.org/D93568	2020-12-22 10:52:28 -08:00
Florian Hahn	ef4dbb2b7a	[LV] Use ScalarEvolution::getURemExpr to reduce duplication. ScalarEvolution should be able to handle both constant and variable trip counts using getURemExpr, so we do not have to handle them separately. This is a small simplification of `a56280094e`. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D93677	2020-12-22 14:48:42 +00:00
Florian Hahn	c0c0ae16c3	[VPlan] Make VPInstruction a VPDef This patch updates VPInstruction to manage the value it defines using VPDef. The VPValue is used during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90565	2020-12-22 09:53:47 +00:00
Gil Rapaport	a56280094e	[LV] Avoid needless fold tail When the trip-count is provably divisible by the maximal/chosen VF, folding the loop's tail during vectorization is redundant. This commit extends the existing test for constant trip-counts to any trip-count known to be divisible by maximal/selected VF by SCEV. Differential Revision: https://reviews.llvm.org/D93615	2020-12-22 10:25:20 +02:00
Ta-Wei Tu	d7a6f3a105	[LoopNest] Extend `LPMUpdater` and adaptor to handle loop-nest passes This is a follow-up patch of D87045. The patch implements "loop-nest mode" for `LPMUpdater` and `FunctionToLoopPassAdaptor` in which only top-level loops are operated. `createFunctionToLoopPassAdaptor` decides whether the returned adaptor is in loop-nest mode or not based on the given pass. If the pass is a loop-nest pass or the pass is a `LoopPassManager` which contains only loop-nest passes, the loop-nest version of adaptor is returned; otherwise, the normal (loop) version of adaptor is returned. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D87531	2020-12-22 08:47:38 +08:00
Congzhe Cao	c60a58f8d4	[InstCombine] Add check of i1 types in select-to-zext/sext transformation When doing select-to-zext/sext transformations, we should not handle TrueVal and FalseVal of i1 type; otherwise it would result in zext/sext i1 to i1. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93272	2020-12-21 18:46:24 -05:00
Michael Forster	d56982b6f5	Remove unused variables. Differential Revision: https://reviews.llvm.org/D93635	2020-12-21 16:24:43 +01:00
Simon Pilgrim	88c5b50060	[AggressiveInstCombine] Generalize foldGuardedRotateToFunnelShift to generic funnel shifts (REAPPLIED) The fold currently only handles rotation patterns, but with the maturation of backend funnel shift handling we can now realistically handle all funnel shift patterns. This should allow us to begin resolving PR46896 et al. Ensure we block poison in a funnel shift value - similar to rG0fe91ad463fea9d08cbcd640a62aa9ca2d8d05e0 Reapplied with fix for PR48068 - we weren't checking that the shift values could be hoisted from their basicblocks. Differential Revision: https://reviews.llvm.org/D90625	2020-12-21 15:22:27 +00:00
Florian Hahn	f250892373	[VPlan] Make VPRecipeBase inherit from VPDef. This patch makes VPRecipeBase a direct subclass of VPDef, moving the SubclassID to VPDef. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90564	2020-12-21 13:34:00 +00:00
Florian Hahn	cd608dc8d3	[VPlan] Use VPDef for VPInterleaveRecipe. This patch updates VPInterleaveRecipe to manage the values it defines using VPDef. The VPValue is used during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90562	2020-12-21 10:56:53 +00:00
David Sherwood	3bf7d47a97	[NFC][InstructionCost] Remove isValid() asserts in SLPVectorizer.cpp An earlier patch introduced asserts that the InstructionCost is valid because at that time the ReuseShuffleCost variable was an unsigned. However, now that the variable is an InstructionCost instance the asserts can be removed. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html See this patch for the introduction of the type: https://reviews.llvm.org/D91174	2020-12-21 09:12:28 +00:00
Kazu Hirata	5d24935f22	[PGO] Remove dead member variable InstrumentFuncEntry (NFC) This patch removes InstrumentFuncEntry as it is dead. The constructor of FuncPGOInstrumentation passes InstrumentFuncEntry to MST, but it doesn't make a local copy as a member variable.	2020-12-20 09:57:05 -08:00
Andrew Litteken	7c6f28a438	[IROutliner] Deduplicating functions that only require inputs. Extracted regions can have both inputs and outputs. In addition, the CodeExtractor removes inputs that are only used in llvm.assumes, and sunken allocas (values are used entirely in the extracted region as denoted by lifetime intrinsics). We also cannot combine sections that have different constants in the same structural location, and these constants will have to be elevated to arguments. This patch deduplicates extracted functions that only have inputs and none of the special cases. We test that we correctly deduplicate in: test/Transforms/IROutliner/outlining-same-globals.ll test/Transforms/IROutliner/outlining-same-constants.ll test/Transforms/IROutliner/outlining-different-structure.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D86978	2020-12-19 17:34:34 -06:00
Andrew Litteken	b8a2b6af37	Revert "[IROutliner] Deduplicating functions that only require inputs." Missing reviewers and differential revision in commit message. This reverts commit `5cdc4f57e5`.	2020-12-19 17:33:49 -06:00
Andrew Litteken	5cdc4f57e5	[IROutliner] Deduplicating functions that only require inputs. Extracted regions can have both inputs and outputs. In addition, the CodeExtractor removes inputs that are only used in llvm.assumes, and sunken allocas (values are used entirely in the extracted region as denoted by lifetime intrinsics). We also cannot combine sections that have different constants in the same structural location, and these constants will have to be elevated to arguments. This patch deduplicates extracted functions that only have inputs and none of the special cases. We test that we correctly deduplicate in: test/Transforms/IROutliner/outlining-same-globals.ll test/Transforms/IROutliner/outlining-same-constants.ll test/Transforms/IROutliner/outlining-different-structure.ll	2020-12-19 17:26:29 -06:00
Roman Lebedev	c043f5055e	[SimplifyCFG] Teach FoldBranchToCommonDest() to preserve DomTree, part 1 ... for conditional branch case	2020-12-20 00:18:36 +03:00
Roman Lebedev	262ff9c23e	[SimplifyCFG] Teach TryToMergeLandingPad() to preserve DomTree	2020-12-20 00:18:36 +03:00
Roman Lebedev	6a1617d67c	[SimplifyCFG] Teach SimplifyCondBranchToTwoReturns() to preserve DomTree, part 2 ... for the custom case returning void.	2020-12-20 00:18:36 +03:00
Roman Lebedev	b94520c9ee	[SimplifyCFG] Teach SimplifyCondBranchToTwoReturns() to preserve DomTree, part 1 ... for the general case of returning a value.	2020-12-20 00:18:35 +03:00
Roman Lebedev	4d87a6ad13	[NFCI][SimplifyCFG] SimplifyCondBranchToTwoReturns(): pull out BI->getParent() into a variable	2020-12-20 00:18:35 +03:00
Roman Lebedev	83659c7076	[SimplifyCFG] simplifySingleResume(): FoldReturnIntoUncondBranch() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater into it. Apparently, there were no dedicated tests just for that functionality, so i'm adding one here.	2020-12-20 00:18:34 +03:00
Roman Lebedev	b7d00e29b7	[SimplifyCFG] Teach simplifySingleResume() to preserve DomTree	2020-12-20 00:18:34 +03:00
Roman Lebedev	c209b88dd4	[SimplifyCFG] Teach simplifyCommonResume() to preserve DomTree	2020-12-20 00:18:34 +03:00
Roman Lebedev	76e74d9395	[SimplifyCFG] Teach removeEmptyCleanup() to preserve DomTree	2020-12-20 00:18:33 +03:00
Roman Lebedev	4be8707e64	[SimplifyCFG] Teach FoldTwoEntryPHINode() to preserve DomTree Still boring, simply drop all edges to successors of DomBlock, and add an edge to BB instead.	2020-12-20 00:18:33 +03:00
Roman Lebedev	b43b77ff9b	[NFCI][SimplifyCFG] simplifyOnce(): also perform DomTree validation And that exposes that a number of tests don't actually manage to maintain DomTree validity, which is in line with my observations. Once again, SimplifyCFG pass currently does not require/preserve DomTree by default, so this is effectively NFC.	2020-12-20 00:18:32 +03:00
Andrew Litteken	c52bcf3a9b	[IRSim][IROutliner] Limit to extracting regions that only require inputs. Extracted regions can have both inputs and outputs. In addition, the CodeExtractor removes inputs that are only used in llvm.assumes, and sunken allocas (values are used entirely in the extracted region as denoted by lifetime intrinsics). We also cannot combine sections that have different constants in the same structural location, and these constants will have to be elevated to arguments. This patch limits the extracted regions to those that only require inputs, and do not have any other special cases. We test that we do not outline the wrong constants in: test/Transforms/IROutliner/outliner-different-constants.ll test/Transforms/IROutliner/outliner-different-globals.ll test/Transforms/IROutliner/outliner-constant-vs-registers.ll We test that we correctly outline in: test/Transforms/IROutliner/outlining-same-globals.ll test/Transforms/IROutliner/outlining-same-constants.ll test/Transforms/IROutliner/outlining-different-structure.ll Reviewers: paquette, plofti Differential Revision: https://reviews.llvm.org/D86977	2020-12-19 13:33:54 -06:00
Kazu Hirata	56edfcada9	[Target, Transforms] Use contains (NFC)	2020-12-19 10:43:19 -08:00
Aditya Kumar	1ab4db0f84	[HotColdSplit] Reflect full cost of parameters in split penalty Make the penalty for splitting a region more accurately reflect the cost of materializing all of the inputs/outputs to/from the region. This almost entirely eliminates code growth within functions which undergo splitting in key internal frameworks, and reduces the size of those frameworks by between 2.6% and 3%. rdar://49167240 Patch by: Vedant Kumar(@vsk) Reviewers: hiraditya,rjf,t.p.northover Reviewed By: hiraditya,rjf Differential Revision: https://reviews.llvm.org/D59715	2020-12-18 17:06:17 -08:00
Akira Hatanaka	ffd982f7db	[ObjC][ARC] Fix a bug where the inline-asm retain/claim RV marker wasn't inserted when the original call had a 'returned' argument The code is testing whether the instruction BBI points to is the call that is paired up with the retainRV/claimRV call, but it doesn't work when the call has a 'returned' argument since GetArgRCIdentityRoot looks through 'returned' arguments. rdar://72485383	2020-12-18 16:59:06 -08:00
Sanjay Patel	37d0dda739	[SLP] fix typo; NFC	2020-12-18 16:55:52 -05:00
Nikita Popov	1f1145006b	[DSE] Use correct memory location for read clobber check MSSA DSE starts at a killing store, finds an earlier store and then checks that the earlier store is not read along any paths (without being killed first). However, it uses the memory location of the killing store for that, not the earlier store that we're attempting to eliminate. This has a number of problems: * Mismatches between what BasicAA considers aliasing and what DSE considers an overwrite (even though both are correct in isolation) can result in miscompiles. This is PR48279, which D92045 tries to fix in a different way. The problem is that we're using a location from a store that is potentially not executed and thus may be UB, in which case analysis results can be arbitrary. * Metadata on the killing store may be used to determine aliasing, but there is no guarantee that the metadata is valid, as the specific killing store may not be executed. Using the metadata on the earlier store is valid (it is the store we're removing, so on any execution where its removal may be observed, it must be executed). * The location is imprecise. For full overwrites the killing store will always have a location that is larger than or equal to the earlier access location, so it's beneficial to use the earlier access location. This is not the case for partial overwrites, in which case either location might be smaller. There is some room for improvement here. Using the earlier access location means that we can no longer cache which accesses are read for a given killing store, as we may be querying different locations. However, it turns out that simply dropping the cache has no notable impact on compile-time. Differential Revision: https://reviews.llvm.org/D93523	2020-12-18 20:26:53 +01:00
Kazu Hirata	5ac37725df	[GVNHoist] Remove successorDominate (NFC) The function was introduced on Aug 25, 2016 in commit `5f0d0e60d1`. Its last use was removed on Sep 13, 2017 in commit `dfa8741c96`.	2020-12-18 10:29:52 -08:00
Roman Lebedev	897c985e1e	[InstCombine] Canonicalize SPF to abs intrinsic This patch enables canonicalization of SPF_ABS and SPF_NABS to the abs intrinsic. This is a recommit, the original try was `05d4c4ebc2`, but it was reverted due to an apparent miscompile, which since then has just been fixed by the previous commit. Differential Revision: https://reviews.llvm.org/D87188	2020-12-18 21:18:14 +03:00
Whitney Tsang	2a814cd9e1	Ensure SplitEdge to return the new block between the two given blocks This PR implements the function splitBasicBlockBefore to address an issue that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore. The issue occurs in SplitEdge when the Succ has a single predecessor and the edge between the BB and Succ is not critical. This produces the result ‘BB->Succ->New’. The new function splitBasicBlockBefore was added to splitBlockBefore to handle the issue and now produces the correct result ‘BB->New->Succ’. Below is an example of splitting the block bb1 at its first instruction. /// Original IR bb0: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlock bb0: br bb1 bb1: br bb1.split bb1.split: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore bb0: br bb1.split bb1.split: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: Differential Revision: https://reviews.llvm.org/D92200	2020-12-18 17:37:17 +00:00
Arnamoy Bhattacharyya	06d5b1c9ad	[SROA] Remove Dead Instructions while creating speculative instructions The SROA pass tries to be lazy for removing dead instructions that are collected during iterative run of the pass in the DeadInsts list. However it does not remove instructions from the dead list while running eraseFromParent() on those instructions. This causes (rare) null pointer dereferences. For example, in the speculatePHINodeLoads() function, in the following code snippet: ``` while (!PN.use_empty()) { LoadInst *LI = cast<LoadInst>(PN.user_back()); LI->replaceAllUsesWith(NewPN); LI->eraseFromParent(); } ``` If the Load instruction LI belongs to the DeadInsts list, it should be removed when eraseFromParent() is called. However, the bug does not show up in most cases, because immediately in the same function, a new LoadInst is created in the following line: ``` LoadInst *Load = PredBuilder.CreateAlignedLoad( LoadTy, InVal, Alignment, (PN.getName() + ".sroa.speculate.load." + Pred->getName())); ``` This new LoadInst object takes the same memory address of the just deleted LI using eraseFromParent(), therefore the bug does not materialize. In very rare cases, the addresses differ and therefore, a dangling pointer is created, causing a crash. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D92431	2020-12-18 11:47:02 -05:00
Sanjay Patel	47aaa99c0e	[VectorCombine] allow peeking through GEPs when creating a vector load This is an enhancement motivated by https://llvm.org/PR16739 (see D92858 for another). We can look through a GEP to find a base pointer that may be safe to use for a vector load. If so, then we shuffle (shift) the necessary vector element over to index 0. Alive2 proof based on 1 of the regression tests: https://alive2.llvm.org/ce/z/yPJLkh The vector translation is independent of endian (verify by changing to leading 'E' in the datalayout string). Differential Revision: https://reviews.llvm.org/D93229	2020-12-18 09:25:03 -05:00
Yevgeny Rouban	f0e3d1d6ca	[IndVars] Fix adding trunc instructions to unwind blocks Truncate instruction must not be inserted before landing pads. The insertion point is fixed.	2020-12-18 12:52:23 +07:00
Kazu Hirata	b621116716	[Transforms] Use llvm::erase_if (NFC)	2020-12-17 19:53:10 -08:00
Rong Xu	31c0b8700b	Fix clang-ppc64le-rhel buildbot build error Fix buildbot build error due to commit 3733463d: [IR][PGO] Add hot func attribute and use hot/cold attribute in func section	2020-12-17 19:14:43 -08:00
Rong Xu	3733463dbb	[IR][PGO] Add hot func attribute and use hot/cold attribute in func section Clang FE currently has hot/cold function attribute. But we only have cold function attribute in LLVM IR. This patch adds support of hot function attribute to LLVM IR. This attribute will be used in setting function section prefix/suffix. Currently .hot and .unlikely suffixes are only added in PGO (Sample PGO) compilation (through isFunctionHotInCallGraph and isFunctionColdInCallGraph). This patch changes the behavior. The new behavior is: (1) If the user annotates a function as hot or isFunctionHotInCallGraph is true, this function will be marked as hot. Otherwise, (2) If the user annotates a function as cold or isFunctionColdInCallGraph is true, this function will be marked as cold. The changes are: (1) user annotated function attribute will be used in setting function section prefix/suffix. (2) hot attribute overwrites profile count based hotness. (3) profile count based hotness overwrites user annotated cold attribute. The intention for these changes is to provide the user a way to mark certain functions as hot in cases where training input is hard to cover all the hot functions. Differential Revision: https://reviews.llvm.org/D92493	2020-12-17 18:41:12 -08:00
Andrew Litteken	cea807602a	[IRSim][IROutliner] Adding InstVisitor to disallow certain operations. This adds a custom InstVisitor to return false on instructions that should not be allowed to be outlined. These match the illegal instructions in the IRInstructionMapper with exception of the addition of the llvm.assume intrinsic. Tests all the tests marked: illegal-*-.ll with a test for each kind of instruction that has been marked as illegal. Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D86976	2020-12-17 19:33:57 -06:00
Roman Lebedev	2d07414ee5	[SimplifyCFG] Teach simplifyUnreachable() to preserve DomTree Pretty boring, removeUnwindEdge() already knows how to update DomTree, so if we are to call it, we must first flush our own pending updates; otherwise, we just stop predecessors from branching to us, and for certain predecessors, stop their predecessors from branching to them also.	2020-12-18 00:37:22 +03:00
Roman Lebedev	2ee724863e	[SimplifyCFG] ConstantFoldTerminator() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater into it. Fixes DomTree preservation for a number of tests, all of which are marked as such so that they do not regress.	2020-12-18 00:37:22 +03:00
Roman Lebedev	164e0847a5	[SimplifyCFG] DeleteDeadBlock() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater into it. Fixes DomTree preservation for a large number of tests, all of which are marked as such so that they do not regress.	2020-12-18 00:37:21 +03:00
Bangtian Liu	511cfe9441	Revert "Ensure SplitEdge to return the new block between the two given blocks" This reverts commit `d20e0c3444`.	2020-12-17 21:00:37 +00:00
Johannes Doerfert	994bb6eb7d	[OpenMP][NFC] Provide a new remark and documentation If a GPU function is externally reachable we give up trying to find the (unique) kernel it is called from. This can hinder optimizations. Emit a remark and explain mitigation strategies. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D93439	2020-12-17 14:38:26 -06:00
Andrew Litteken	dae34463e3	[IRSim][IROutliner] Adding the extraction basics for the IROutliner. Extracting the similar regions is the first step in the IROutliner. Using the IRSimilarityIdentifier, we collect the SimilarityGroups and sort them by how many instructions will be removed. Each IRSimilarityCandidate is used to define an OutlinableRegion. Each region is ordered by their occurrence in the Module and the regions that are not compatible with previously outlined regions are discarded. Each region is then extracted with the CodeExtractor into its own function. We test that we correctly extract in: test/Transforms/IROutliner/extraction.ll test/Transforms/IROutliner/address-taken.ll test/Transforms/IROutliner/outlining-same-globals.ll test/Transforms/IROutliner/outlining-same-constants.ll test/Transforms/IROutliner/outlining-different-structure.ll Recommit of `bf899e8913` fixing memory leaks. Reviewers: paquette, jroelofs, yroux Differential Revision: https://reviews.llvm.org/D86975	2020-12-17 11:27:26 -06:00
Nabeel Omer	df2b9a3e02	[DebugInfo] Avoid re-ordering assignments in LCSSA The LCSSA pass makes use of a function insertDebugValuesForPHIs() to propagate dbg.value() intrinsics to newly inserted PHI instructions. Faulty behaviour occurs when the parent PHI of a newly inserted PHI is not the most recent assignment to a source variable. insertDebugValuesForPHIs ends up propagating a value that isn't the most recent assignment. This change removes the call to insertDebugValuesForPHIs() from LCSSA, preventing incorrect dbg.value intrinsics from being propagated. Propagating variable locations between blocks will occur later, during LiveDebugValues. Differential Revision: https://reviews.llvm.org/D92576	2020-12-17 16:17:32 +00:00
Bangtian Liu	d20e0c3444	Ensure SplitEdge to return the new block between the two given blocks This PR implements the function splitBasicBlockBefore to address an issue that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore. The issue occurs in SplitEdge when the Succ has a single predecessor and the edge between the BB and Succ is not critical. This produces the result ‘BB->Succ->New’. The new function splitBasicBlockBefore was added to splitBlockBefore to handle the issue and now produces the correct result ‘BB->New->Succ’. Below is an example of splitting the block bb1 at its first instruction. /// Original IR bb0: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlock bb0: br bb1 bb1: br bb1.split bb1.split: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore bb0: br bb1.split bb1.split: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: Differential Revision: https://reviews.llvm.org/D92200	2020-12-17 16:00:15 +00:00
Florian Hahn	01089c876b	[InstCombine] Preserve !annotation on newly created instructions. If the source instruction has !annotation metadata, all instructions created during combining should also have it. Tell the builder to add it. The !annotation system was discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks' (http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html) This patch is based on an earlier patch by Francis Visoiu Mistrih. Reviewed By: thegameg, lebedev.ri Differential Revision: https://reviews.llvm.org/D91444	2020-12-17 15:20:23 +00:00
Florian Hahn	75c04bfc61	[SimplifyCFG] Preserve !annotation in FoldBranchToCommonDest. When folding a branch to a common destination, preserve !annotation on the created instruction, if the terminator of the BB that is going to be removed has !annotation. This should ensure that !annotation is attached to the instructions that 'replace' the original terminator. Reviewed By: jdoerfert, lebedev.ri Differential Revision: https://reviews.llvm.org/D93410	2020-12-17 14:06:58 +00:00
Jun Ma	0138399903	[InstCombine] Remove scalable vector restriction in InstCombineCasts Differential Revision: https://reviews.llvm.org/D93389	2020-12-17 22:02:33 +08:00
Florian Hahn	29077ae860	[IRBuilder] Generalize debug loc handling for arbitrary metadata. This patch extends IRBuilder to allow adding/preserving arbitrary metadata on created instructions. Instead of using references to specific metadata nodes (like DebugLoc), IRBuilder now keeps a vector of (metadata kind, MDNode *) pairs, which are added to each created instruction. The patch itself is NFC and only moves the existing debug location handling over to the new system. In a follow-up patch it will be used to preserve !annotation metadata besides !dbg. The current approach requires iterating over MetadataToCopy to avoid adding duplicates, but given that the number of metadata kinds to copy/preserve is going to be very small initially (0, 1 (for !dbg) or 2 (!dbg and !annotation)) that should not matter. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D93400	2020-12-17 13:27:43 +00:00
Cullen Rhodes	1fd3a04775	[LV] Disable epilogue vectorization for scalable VFs Epilogue vectorization doesn't support scalable vectorization factors yet, disable it for now. Reviewed By: sdesmalen, bmahjour Differential Revision: https://reviews.llvm.org/D93063	2020-12-17 12:14:03 +00:00
dfukalov	9ed8e0caab	[NFC] Reduce include files dependency and AA header cleanup (part 2). Continuing work started in https://reviews.llvm.org/D92489: Removed a bunch of includes from "AliasAnalysis.h" and "LoopPassManager.h". Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92852	2020-12-17 14:04:48 +03:00
Barry Revzin	92310454bf	Make LLVM build in C++20 mode Part of the <=> changes in C++20 make certain patterns of writing equality operators ambiguous with themselves (sorry!). This patch goes through and adjusts all the comparison operators such that they should work in both C++17 and C++20 modes. It also makes two other small C++20-specific changes (adding a constructor to a type that ceases to be an aggregate, and adding casts from u8 literals which no longer have type const char*). There were four categories of errors that this review fixes. Here are canonical examples of them, ordered from most to least common: // 1) Missing const namespace missing_const { struct A { #ifndef FIXED bool operator==(A const&); #else bool operator==(A const&) const; #endif }; bool a = A{} == A{}; // error } // 2) Type mismatch on CRTP namespace crtp_mismatch { template <typename Derived> struct Base { #ifndef FIXED bool operator==(Derived const&) const; #else // in one case changed to taking Base const& friend bool operator==(Derived const&, Derived const&); #endif }; struct D : Base<D> { }; bool b = D{} == D{}; // error } // 3) iterator/const_iterator with only mixed comparison namespace iter_const_iter { template <bool Const> struct iterator { using const_iterator = iterator<true>; iterator(); template <bool B, std::enable_if_t<(Const && !B), int> = 0> iterator(iterator<B> const&); #ifndef FIXED bool operator==(const_iterator const&) const; #else friend bool operator==(iterator const&, iterator const&); #endif }; bool c = iterator<false>{} == iterator<false>{} // error || iterator<false>{} == iterator<true>{} || iterator<true>{} == iterator<false>{} || iterator<true>{} == iterator<true>{}; } // 4) Same-type comparison but only have mixed-type operator namespace ambiguous_choice { enum Color { Red }; struct C { C(); C(Color); operator Color() const; bool operator==(Color) const; friend bool operator==(C, C); }; bool c = C{} == C{}; // error bool d = C{} == Red; } Differential revision: https://reviews.llvm.org/D78938	2020-12-17 10:44:10 +00:00
Florian Hahn	eba09a2db9	[InstCombine] Preserve !annotation for newly created instructions. When replacing an instruction with !annotation with a newly created replacement, add the !annotation metadata to the replacement. This mostly covers cases where the new instructions are created using the ::Create helpers. Instructions created by IRBuilder will be handled by D91444. Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D93399	2020-12-17 09:06:51 +00:00
Kazu Hirata	4ad5b634f6	[GCN] Remove unused function handleNewInstruction (NFC) The function was added without a user on Dec 22, 2016 in commit `7e274e02ae`. It seems to be unused since then.	2020-12-16 21:57:48 -08:00
Hongtao Yu	ac068e014b	[CSSPGO] Consume pseudo-probe-based AutoFDO profile This change enables pseudo-probe-based sample counts to be consumed by the sample profile loader under the regular `-fprofile-sample-use` switch with minimal adjustments to the existing sample file formats. After the counts are imported, a probe helper, aka, a `PseudoProbeManager` object, is automatically launched to verify the CFG checksum of every function in the current compilation against the corresponding checksum from the profile. Mismatched checksums will cause a function profile to be skipped. A `SampleProfileProber` pass is scheduled before any of the `SampleProfileLoader` instances so that the CFG checksums as well as probe mappings are available during the profile loading time. The `PseudoProbeManager` object is set up right after the profile reading is done. In the future a CFG-based fuzzy matching could be done in `PseudoProbeManager`. Samples will be applied only to pseudo probe instructions as well as probed callsites once the checksum verification goes through. Those instructions are processed in the same way that regular instructions would be processed in the line-number-based scenario. In other words, a function is processed in a regular way as if it was reduced to just containing pseudo probes (block probes and callsites). Adjustment to profile format A CFG checksum field is being added to the existing AutoFDO profile formats. So far only the text format and the extended binary format are supported. For the text format, a new line like ``` !CFGChecksum: 12345 ``` is added to the end of the body sample lines. For the extended binary profile format, we introduce a metadata section to store the checksum map from function names to their CFG checksums. Differential Revision: https://reviews.llvm.org/D92347	2020-12-16 15:57:18 -08:00
alex-t	35ec3ff76d	Disable Jump Threading for the targets with divergent control flow Details: Jump Threading does not make sense for the targets with divergent CF since they do not use branch prediction for speculative execution. Also in the high level IR there is not enough information to conclude that the branch is divergent or uniform. This may cause errors in further CF lowering. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93302	2020-12-17 02:40:54 +03:00
Roman Lebedev	d22a47e9ff	[SimplifyCFG] Teach mergeEmptyReturnBlocks() to preserve DomTree A first real transformation that didn't already know how to do that, but it's pretty tame - either change successor of all the predecessors of a block and carefully delay deletion of the block until after the DomTree updates are applied, or add a successor to the block. There wasn't a great test coverage for this, so i added extra, to be sure.	2020-12-17 01:03:50 +03:00
Roman Lebedev	5cce4aff18	[SimplifyCFG] TryToSimplifyUncondBranchFromEmptyBlock() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater into it. Fixes DomTree preservation for a large number of tests, all of which are marked as such so that they do not regress.	2020-12-17 01:03:49 +03:00
Roman Lebedev	49dac4aca0	[SimplifyCFG] MergeBlockIntoPredecessor() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater into it. Fixes DomTree preservation for a large number of tests, all of which are marked as such so that they do not regress.	2020-12-17 01:03:49 +03:00
Roman Lebedev	4fc169f664	[SimplifyCFG] removeUnreachableBlocks() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater into it. Apparently, there were no dedicated tests just for that functionality, so i'm adding one here.	2020-12-17 01:03:49 +03:00
Rong Xu	0abd744597	[PGO] Use the sum of profile counts to fix the function entry count Raw profile count values for each BB are not kept after profile annotation. We record function entry count and branch weights and use them to compute the count when needed. This mechanism works well in a perfect world, but often breaks in real programs (because of number precision, inconsistent profile, or bugs in BFI). This patch uses the sum of profile count values to fix function entry count to make the BFI count close to real profile counts. Differential Revision: https://reviews.llvm.org/D61540	2020-12-16 13:37:43 -08:00
Nikita Popov	e728024808	[DSE] Pass MemoryLocation by const ref (NFC)	2020-12-16 21:47:46 +01:00
Sanjay Patel	38ebc1a13d	[VectorCombine] optimize alignment for load transform Here's another minimal step suggested by D93229 / D93397 . (I'm trying to be extra careful in these changes because load transforms are easy to get wrong.) We can optimistically choose the greater alignment of a load and its pointer operand. As the test diffs show, this can improve what would have been unaligned vector loads into aligned loads. When we enhance with gep offsets, we will need to adjust the alignment calculation to include that offset. Differential Revision: https://reviews.llvm.org/D93406	2020-12-16 15:25:45 -05:00
Sanjay Patel	aaaf0ec72b	[VectorCombine] loosen alignment constraint for load transform As discussed in D93229, we only need a minimal alignment constraint when querying whether a hypothetical vector load is safe. We still pass/use the potentially stronger alignment attribute when checking costs and creating the new load. There's already a test that changes with the minimum code change, so splitting this off as a preliminary commit independent of any gep/offset enhancements. Differential Revision: https://reviews.llvm.org/D93397	2020-12-16 12:25:18 -05:00
Whitney Tsang	fa3693ad0b	[LoopNest] Handle loop-nest passes in LoopPassManager Per http://llvm.org/OpenProjects.html#llvm_loopnest, the goal of this patch (and other following patches) is to create facilities that allow implementing loop nest passes that run on top-level loop nests for the New Pass Manager. This patch extends the functionality of LoopPassManager to handle loop-nest passes by specializing the definition of LoopPassManager that accepts both kinds of passes in addPass. Only loop passes are executed if L is not a top-level one, and both kinds of passes are executed if L is top-level. Currently, loop nest passes should have the following run method: PreservedAnalyses run(LoopNest &, LoopAnalysisManager &, LoopStandardAnalysisResults &, LPMUpdater &); Reviewed By: Whitney, ychen Differential Revision: https://reviews.llvm.org/D87045	2020-12-16 17:07:14 +00:00
Caroline Concatto	be9184bc55	[SLPVectorizer]Migrate getEntryCost to return InstructionCost This patch also changes: the return type of getGatherCost and the signature of the debug function dumpTreeCosts to use InstructionCost. This patch is part of a series of patches to use InstructionCost instead of unsigned/int for the cost model functions. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html See this patch for the introduction of the type: https://reviews.llvm.org/D91174 Depends on D93049 Differential Revision: https://reviews.llvm.org/D93127	2020-12-16 14:18:40 +00:00
Caroline Concatto	07217e0a1b	[CostModel]Migrate getTreeCost() to use InstructionCost This patch changes the type of cost variables (for instance: Cost, ExtractCost, SpillCost) to use InstructionCost. This patch also changes the type of cost variables to InstructionCost in other functions that use the result of getTreeCost() This patch is part of a series of patches to use InstructionCost instead of unsigned/int for the cost model functions. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Depends on D91174 Differential Revision: https://reviews.llvm.org/D93049	2020-12-16 13:08:37 +00:00
Bangtian Liu	c10757200d	Revert "Ensure SplitEdge to return the new block between the two given blocks" This reverts commit `cf638d793c`.	2020-12-16 11:52:30 +00:00
Philip Reames	1f6e15566f	[LV] Weaken an unnecessarily strong assert [NFC] Account for the fact that (in the future) the latch might be a switch not a branch. The existing code is correct, minus the assert.	2020-12-15 19:07:53 -08:00
Philip Reames	af7ef895d4	[LV] Extend dead instruction detection to multiple exiting blocks Given we haven't yet enabled multiple exiting blocks, this is currently non functional, but it's an obvious extension which cleans up a later patch. I don't think this is worth review (as it's pretty obvious), if anyone disagrees, feel free to revert or comment and I will.	2020-12-15 18:46:32 -08:00
Bangtian Liu	cf638d793c	Ensure SplitEdge to return the new block between the two given blocks This PR implements the function splitBasicBlockBefore to address an issue that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore. The issue occurs in SplitEdge when the Succ has a single predecessor and the edge between the BB and Succ is not critical. This produces the result ‘BB->Succ->New’. The new function splitBasicBlockBefore was added to splitBlockBefore to handle the issue and now produces the correct result ‘BB->New->Succ’. Below is an example of splitting the block bb1 at its first instruction. /// Original IR bb0: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlock bb0: br bb1 bb1: br bb1.split bb1.split: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore bb0: br bb1.split bb1.split: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: Differential Revision: https://reviews.llvm.org/D92200	2020-12-15 23:32:29 +00:00
Johannes Doerfert	dcaec81211	[OpenMP] Use assumptions during ICV tracking The OpenMP 5.1 assumptions `no_openmp` and `no_openmp_routines` allow us to ignore calls that would otherwise prevent ICV tracking. Once we track more ICVs we might need to distinguish the ones that could be impacted even with `no_openmp_routines`. Reviewed By: sstefan1 Differential Revision: https://reviews.llvm.org/D92050	2020-12-15 16:51:34 -06:00
Johannes Doerfert	d08d490a4c	[OpenMPOpt][NFC] Clang format	2020-12-15 16:51:34 -06:00
Roman Lebedev	e113317958	[NFCI][SimplifyCFG] Add basic scaffolding for gradually making the pass DomTree-aware Two observations: 1. Unavailability of DomTree makes it impossible to make `FoldBranchToCommonDest()` transform in certain cases, where the successor is dominated by predecessor, because we then don't have PHI's, and can't recreate them, well, without handrolling 'is dominated by' check, which doesn't really look like a great solution to me. 2. Avoiding invalidating DomTree in SimplifyCFG will decrease the number of `Dominator Tree Construction` by 5 (from 28 now, i.e. -18%) in `-O3` old-pm pipeline (as per `llvm/test/Other/opt-O3-pipeline.ll`) This might or might not be beneficial for compile time. So the plan is to make SimplifyCFG preserve DomTree, and then eventually make DomTree fully required and preserved by the pass. Now, SimplifyCFG is ~7KLOC. I don't think it will be nice to do all this uplifting in a single mega-commit, nor would it be possible to review it in any meaningful way. But, I believe, it should be possible to do this in smaller steps, introducing the new behavior as an off-by-default, opt-in option, and gradually fixing transforms one-by-one and adding the flag to appropriate test coverage. Then, eventually, the default should be flipped, and eventually^2 the flag removed. And that is what is happening here - when the new off-by-default option is specified, DomTree is required and is claimed to be preserved, and SimplifyCFG-internal assertions verify that the DomTree is still OK.	2020-12-16 00:38:00 +03:00
Philip Reames	a81db8b315	[LV] Restructure handling of -prefer-predicate-over-epilogue option [NFC] This should be purely non-functional. When touching this code for another reason, I found the handling of the PredicateOrDontVectorize piece here very confusing. Let's make it an explicit state (instead of an implicit combination of two variables), and use early return for options/hint processing.	2020-12-15 12:38:13 -08:00
Simon Pilgrim	a3bd67f222	SeparateConstOffsetFromGEP::lowerToSingleIndexGEPs - don't use dyn_cast_or_null. NFCI. ResultPtr is guaranteed to be non-null - and using dyn_cast_or_null causes unnecessary static analyzer warnings. We can't say the same for FirstResult AFAICT, so keep dyn_cast_or_null for that.	2020-12-15 17:27:25 +00:00
Florian Hahn	7ea3932ab1	[AnnotationRemarks] Also generate annotation remarks when using -O0. The AnnotationRemarks pass is already run at the end of the module pipeline. This patch also adds it before bailing out for -O0, so remarks are also generated with -O0.	2020-12-15 14:46:52 +00:00
Florian Hahn	7186a3965a	[VPlan] Use VPDef for VPWidenSelectRecipe. This patch updates VPWidenSelectRecipe to manage the value it defines using VPDef. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90560	2020-12-15 14:15:01 +00:00
Jun Ma	52a3267ffa	[InstCombine] Remove scalable vector restriction in foldVectorBinop Differential Revision: https://reviews.llvm.org/D93289	2020-12-15 21:14:59 +08:00
Jun Ma	ffe84d90e9	[InstCombine][NFC] Change cast of FixedVectorType to dyn_cast.	2020-12-15 20:36:57 +08:00
Jun Ma	e12f584578	[InstCombine] Remove scalable vector restriction in InstCombineCompares Differential Revision: https://reviews.llvm.org/D93269	2020-12-15 20:36:57 +08:00
Jun Ma	2ac58e21a1	[InstCombine] Remove scalable vector restriction when fold SelectInst Differential Revision: https://reviews.llvm.org/D93083	2020-12-15 20:36:57 +08:00
Florian Hahn	318f5798d8	[VPlan] Use VPDef for VPWidenGEPRecipe. This patch updates VPWidenGEPRecipe to manage the value it defines using VPDef. The VPValue is used during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90561	2020-12-15 09:30:14 +00:00
Florian Hahn	ad1161f9b5	[VPlan] Use VPDef for VPWidenCall. This patch updates VPWidenCallRecipe to manage the value it defines using VPDef. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90559	2020-12-15 09:20:07 +00:00
Nico Weber	a852ee199c	Reland "[MachineDebugify] Insert synthetic DBG_VALUE instructions" This reverts commit `841f9c937f`. The change landed many months ago; something else broke those tests.	2020-12-14 22:34:23 -05:00
Nico Weber	841f9c937f	Revert "[MachineDebugify] Insert synthetic DBG_VALUE instructions" This reverts commit `2a5675f11d`. The tests it adds fail: https://reviews.llvm.org/D78135#2453736	2020-12-14 22:14:48 -05:00
Reid Kleckner	d2ed9d6b7e	Revert "ADT: Migrate users of AlignedCharArrayUnion to std::aligned_union_t, NFC" We determined that the MSVC implementation of std::aligned* isn't suited to our needs. It doesn't support 16 byte alignment or higher, and it doesn't really guarantee 8 byte alignment. See https://github.com/microsoft/STL/issues/1533 Also reverts "ADT: Change AlignedCharArrayUnion to an alias of std::aligned_union_t, NFC" Also reverts "ADT: Remove AlignedCharArrayUnion, NFC" to bring back AlignedCharArrayUnion. This reverts commit `4d8bf870a8`. This reverts commit `d10f9863a5`. This reverts commit `4b5dc150b9`.	2020-12-14 17:04:06 -08:00
Rong Xu	54e03d03a7	[PGO] Verify BFI counts after loading profile data This patch adds the functionality to compare BFI counts with real profile counts right after reading the profile. It will print remarks under -Rpass-analysis=pgo, or the internal option -pass-remarks-analysis=pgo. Differential Revision: https://reviews.llvm.org/D91813	2020-12-14 15:56:10 -08:00
Gulfem Savrun Yeniceri	7c0e3a77bc	[clang][IR] Add support for leaf attribute This patch adds support for leaf attribute as an optimization hint in Clang/LLVM. Differential Revision: https://reviews.llvm.org/D90275	2020-12-14 14:48:17 -08:00
Sanjay Patel	d399f870b5	[VectorCombine] make load transform poison-safe As noted in D93229, the transform from scalar load to vector load potentially leaks poison from the extra vector elements that are being loaded. We could use freeze here (and x86 codegen at least appears to be the same either way), but we already have a shuffle in this logic to optionally change the vector size, so let's allow that instruction to serve both purposes. Differential Revision: https://reviews.llvm.org/D93238	2020-12-14 17:42:01 -05:00
Craig Topper	25067f179f	[LoopIdiomRecognize] Teach detectShiftUntilZeroIdiom to recognize loops where the counter is decrementing. This adds support for loops like unsigned clz(unsigned x) { unsigned w = sizeof (x) * CHAR_BIT; while (x) { w--; x >>= 1; } return w; } and unsigned clz(unsigned x) { unsigned w = sizeof (x) * CHAR_BIT - 1; while (x >>= 1) { w--; } return w; } To support these we look for add x, -1 as well as add x, 1 that we already matched. If the value was -1 we need to subtract from the initial counter value instead of adding to it. Fixes PR48404. Differential Revision: https://reviews.llvm.org/D92745	2020-12-14 14:25:05 -08:00
Philip Reames	f5fe8493e5	[LAA] Relax restrictions on early exits in loop structure This is a preparation patch for supporting multiple exits in the loop vectorizer, by itself it should be mostly NFC. This patch moves the loop structure checks from LAA to their respective consumers (where duplicates don't already exist). Moving the checks does end up changing some of the optimization warnings and debug output slightly, but nothing that appears to be a regression. Why do this? Well, after auditing the code, I can't actually find anything in LAA itself which relies on having all instructions within a loop execute an equal number of times. This patch simply makes this explicit so that if one consumer - say LV in the near future (hopefully) - wants to handle a broader class of loops, it can do so. Differential Revision: https://reviews.llvm.org/D92066	2020-12-14 12:44:01 -08:00
Roman Lebedev	59560e8589	[SimplifyCFG] FoldBranchToCommonDest(): temporarily put back restrictions on liveout uses of bonus instructions (PR48450) Even though `d38205144f` was mostly a correct fix for the external non-PHI users, it's not a generally correct fix, because the 'placeholder' values in those trivial PHI's we create shouldn't be always 'undef', but the PHI itself for the backedges, else we end up with wrong value, as the `@pr48450_2` test shows. But we can't just do that, because we can't check that the PHI can be its own incoming value when coming from certain predecessor, because we don't have a dominator tree. So until we can address this correctness problem properly, ensure that we don't perform the transformation if there are such problematic external uses. Making dominator tree available there is going to be involved, since `-simplifycfg` pass currently does not preserve/update domtree...	2020-12-14 20:14:31 +03:00
Roman Lebedev	e8360a8e1e	[NFC][SimplifyCFG] FoldBranchToCommonDest(): pull out 'common successor' into a variable Makes it easier to use it elsewhere	2020-12-14 20:14:31 +03:00
Stanislav Mekhanoshin	87d7757bbe	[SLP] Control maximum vectorization factor from TTI D82227 has added a proper check to limit PHI vectorization to the maximum vector register size. That unfortunately resulted in at least a couple of regressions on SystemZ and x86. This change reverts PHI handling from D82227 and replaces it with a more general check in SLPVectorizerPass::tryToVectorizeList(). Moved to tryToVectorizeList(), the check allows vectorization to restart if the initial chunk fails. However, this function is more general and handles not only PHI but everything which SLP handles. If the vectorization factor were limited to the maximum vector register size, it would limit much more vectorization than before, leading to further regressions. Therefore a new TTI callback getMaximumVF() is added with the default 0 to preserve current behavior and limit nothing. Then targets can decide what is better for them. The callback gets ElementSize just like a similar getMinimumVF() function and the main opcode of the chain. The latter is to avoid regressions at least on the AMDGPU. We can have loads and stores up to 128 bit wide, and <2 x 16> bit vector math on some subtargets, where the rest shall not be vectorized. I.e. we need to differentiate based on the element size and operation itself. Differential Revision: https://reviews.llvm.org/D92059	2020-12-14 08:49:40 -08:00
Markus Lavin	2a6782bb9f	Reland [DebugInfo] Improve dbg preservation in LSR. Use SCEV to salvage additional @llvm.dbg.value that have turned into referencing undef after transformation (and traditional salvageDebugInfo). Before rewrite (but after introduction of new induction variables) use SCEV to compute an equivalent set of values for each @llvm.dbg.value in the loop body (among the loop header PHI-nodes). After rewrite (and dead PHI elimination) update those @llvm.dbg.value now referencing undef by picking a remaining value from its equivalence set. Allow match with offset by inserting compensation code in the DIExpression. Fixes : PR38815 Differential Revision: https://reviews.llvm.org/D87494	2020-12-14 16:15:18 +01:00
Florian Hahn	e42e5263bd	[VPlan] Make VPWidenMemoryInstructionRecipe a VPDef. This patch updates VPWidenMemoryInstructionRecipe to use VPDef to manage the value it produces instead of inheriting from VPValue. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90563	2020-12-14 14:13:59 +00:00
Anton Afanasyev	fac7c7ec3c	[SLP] Fix vector element size for the store chains Vector element size could be different for different store chains. This patch prevents wrong computation of maximum number of elements for that case. Differential Revision: https://reviews.llvm.org/D93192	2020-12-14 15:51:43 +03:00
Kazu Hirata	5891ad4e22	[Transforms] Use llvm::erase_value (NFC)	2020-12-13 09:48:47 -08:00
Florian Hahn	533f85767c	[VPlan] Use interleaveComma in printOperands() (NFC).	2020-12-13 16:29:16 +00:00
Roman Lebedev	d38205144f	[SimplifyCFG] FoldBranchToCommonDest(): bonus instrns must only be used by PHI nodes in successors (PR48450) In particular, if the successor block, which is about to get a new predecessor block, currently only has a single predecessor, then the bonus instructions will be directly used within said successor, which is fine, since the block with bonus instructions dominates that successor. But once there's a new predecessor, the IR is no longer valid, and we don't fix it, because we only update PHI nodes. Which means, the live-out bonus instructions must be exclusively used by the PHI nodes in successor blocks. So we have to form trivial PHI nodes, which will then be successfully updated to receive cloned bonus instns. This all works fine, except for the fact that we don't have access to the dominator tree, and we don't ignore unreachable code, so we sometimes do end up having to deal with some weird IR. Fixes https://bugs.llvm.org/show_bug.cgi?id=48450	2020-12-13 00:06:57 +03:00
Nikita Popov	afbb6d97b5	[CVP] Simplify and generalize switch handling CVP currently handles switches by checking an equality predicate on all edges from predecessor blocks. Of course, this can only work if the value being switched over is defined in a different block. Replace this implementation with a call to getPredicateAt(), which also does the predecessor edge predicate check (if not defined in the same block), but can also do quite a bit more: It can reason about phi-nodes by checking edge predicates for incoming values, it can reason about assumes, and it can reason about block values. As such, this makes the implementation both simpler and more powerful. The compile-time impact on CTMark is in the noise.	2020-12-12 21:12:27 +01:00
Kazu Hirata	215c1b1935	[Transforms] Use is_contained (NFC)	2020-12-12 09:37:49 -08:00
David Green	ab97c9bdb7	[LV] Fix scalar cost for tail predicated loops When it comes to the scalar cost of any predicated block, the loop vectorizer by default regards this predication as a sign that it is looking at an if-conversion and divides the scalar cost of the block by 2, assuming it would only be executed half the time. This however makes no sense if the predication has been introduced to tail predicate the loop. Original patch by Anna Welker Differential Revision: https://reviews.llvm.org/D86452	2020-12-12 14:21:40 +00:00
Fangrui Song	b5ad32ef5c	Migrate deprecated DebugLoc::get to DILocation::get This migrates all LLVM (except Kaleidoscope and CodeGen/StackProtector.cpp) DebugLoc::get to DILocation::get. The CodeGen/StackProtector.cpp usage may have a nullptr Scope and can trigger an assertion failure, so I don't migrate it. Reviewed By: #debug-info, dblaikie Differential Revision: https://reviews.llvm.org/D93087	2020-12-11 12:45:22 -08:00
Marco Elver	c28b18af19	[KernelAddressSanitizer] Fix globals exclusion for indirect aliases GlobalAlias::getAliasee() may not always point directly to a GlobalVariable. In such cases, try to find the canonical GlobalVariable that the alias refers to. Link: https://github.com/ClangBuiltLinux/linux/issues/1208 Reviewed By: dvyukov, nickdesaulniers Differential Revision: https://reviews.llvm.org/D92846	2020-12-11 12:20:40 +01:00
David Sherwood	9b76160e53	[Support] Introduce a new InstructionCost class This is the first in a series of patches that attempts to migrate existing cost instructions to return a new InstructionCost class in place of a simple integer. This new class is intended to be as light-weight and simple as possible, with a full range of arithmetic and comparison operators that largely mirror the same sets of operations on basic types, such as integers. The main advantage to using an InstructionCost is that it can encode a particular cost state in addition to a value. The initial implementation only has two states - Normal and Invalid - but these could be expanded over time if necessary. An invalid state can be used to represent an unknown cost or an instruction that is prohibitively expensive. This patch adds the new class and changes the getInstructionCost interface to return the new class. Other cost functions, such as getUserCost, etc., will be migrated in future patches as I believe this to be less disruptive. One benefit of this new class is that it provides a way to unify many of the magic costs in the codebase where the cost is set to a deliberately high number to prevent optimisations taking place, e.g. vectorization. It also provides a route to represent the extremely high, and unknown, cost of scalarization of scalable vectors, which is not currently supported. Differential Revision: https://reviews.llvm.org/D91174	2020-12-11 08:12:54 +00:00
Hongtao Yu	705a4c149d	[CSSPGO] Pseudo probe encoding and emission. This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. ELF object emission The binary data to emit are organized as two ELF sections, i.e., the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each of which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission. Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool. 
The format of `.pseudo_probe_desc` section looks like: ``` .section .pseudo_probe_desc,"",@progbits .quad 6309742469962978389 // Func GUID .quad 4294967295 // Func Hash .byte 9 // Length of func name .ascii "_Z5funcAi" // Func name .quad 7102633082150537521 .quad 138828622701 .byte 12 .ascii "_Z8funcLeafi" .quad 446061515086924981 .quad 4294967295 .byte 9 .ascii "_Z5funcBi" .quad -2016976694713209516 .quad 72617220756 .byte 7 .ascii "_Z3fibi" ``` For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format : ``` FUNCTION BODY (one for each outlined function present in the text section) GUID (uint64) GUID of the function NPROBES (ULEB128) Number of probes originating from this function. NUM_INLINED_FUNCTIONS (ULEB128) Number of callees inlined into this function, aka number of first-level inlinees PROBE RECORDS A list of NPROBES entries. Each entry contains: INDEX (ULEB128) TYPE (uint4) 0 - block probe, 1 - indirect call, 2 - direct call ATTRIBUTE (uint3) reserved ADDRESS_TYPE (uint1) 0 - code address, 1 - address delta CODE_ADDRESS (uint64 or ULEB128) code address or address delta, depending on ADDRESS_TYPE INLINED FUNCTION RECORDS A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined callees. Each record contains: INLINE SITE GUID of the inlinee (uint64) ID of the callsite probe (ULEB128) FUNCTION BODY A FUNCTION BODY entry describing the inlined function. ``` To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. 
Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index. Assembling Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis. A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file. 
An example assembly looks like: ``` foo2: # @foo2 # %bb.0: # %bb0 pushq %rax testl %edi, %edi .pseudoprobe 837061429793323041 1 0 0 je .LBB1_1 # %bb.2: # %bb2 .pseudoprobe 837061429793323041 6 2 0 callq foo .pseudoprobe 837061429793323041 3 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq .LBB1_1: # %bb1 .pseudoprobe 837061429793323041 5 1 0 callq %rsi .pseudoprobe 837061429793323041 2 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq # -- End function .section .pseudo_probe_desc,"",@progbits .quad 6699318081062747564 .quad 72617220756 .byte 3 .ascii "foo" .quad 837061429793323041 .quad 281547593931412 .byte 4 .ascii "foo2" ``` With inlining turned on, the assembly may look different around %bb2 with an inlined probe: ``` # %bb.2: # %bb2 .pseudoprobe 837061429793323041 3 0 .pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6 .pseudoprobe 837061429793323041 4 0 popq %rax retq ``` Disassembling We have a disassembling tool (llvm-profgen) that can display disassembly alongside pseudo probes. So far it only supports ELF executable files. An example disassembly looks like: ``` 00000000002011a0 <foo2>: 2011a0: 50 push rax 2011a1: 85 ff test edi,edi [Probe]: FUNC: foo2 Index: 1 Type: Block 2011a3: 74 02 je 2011a7 <foo2+0x7> [Probe]: FUNC: foo2 Index: 3 Type: Block [Probe]: FUNC: foo2 Index: 4 Type: Block [Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6 2011a5: 58 pop rax 2011a6: c3 ret [Probe]: FUNC: foo2 Index: 2 Type: Block 2011a7: bf 01 00 00 00 mov edi,0x1 [Probe]: FUNC: foo2 Index: 5 Type: IndirectCall 2011ac: ff d6 call rsi [Probe]: FUNC: foo2 Index: 4 Type: Block 2011ae: 58 pop rax 2011af: c3 ret ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D91878	2020-12-10 17:29:28 -08:00
Mitch Phillips	7ead5f5aa3	Revert "[CSSPGO] Pseudo probe encoding and emission." This reverts commit `b035513c06`. Reason: Broke the ASan buildbots: http://lab.llvm.org:8011/#/builders/5/builds/2269	2020-12-10 15:53:39 -08:00
Zequan Wu	b5216b2950	[PGO] Enable preinline and cleanup when optimize for size Differential Revision: https://reviews.llvm.org/D91673	2020-12-10 12:29:17 -08:00
Sanjay Patel	4f051fe374	[InstCombine] avoid crash sinking to unreachable block The test is reduced from the example in D82005. Similar to `94f6d365e`, the test here would assert in the DomTree when we tried to convert a select to a phi with an unreachable block operand. We may want to add some kind of guard code in DomTree itself to avoid this sort of problem.	2020-12-10 13:10:26 -05:00
Sanjay Patel	12b684ae02	[VectorCombine] improve readability; NFC If we are going to allow adjusting the pointer for GEPs, rearranging the code a bit will make it easier to follow.	2020-12-10 13:10:26 -05:00
Hongtao Yu	b035513c06	[CSSPGO] Pseudo probe encoding and emission. This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. ELF object emission The binary data to emit are organized as two ELF sections, i.e., the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each of which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission. Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool. 
The format of `.pseudo_probe_desc` section looks like: ``` .section .pseudo_probe_desc,"",@progbits .quad 6309742469962978389 // Func GUID .quad 4294967295 // Func Hash .byte 9 // Length of func name .ascii "_Z5funcAi" // Func name .quad 7102633082150537521 .quad 138828622701 .byte 12 .ascii "_Z8funcLeafi" .quad 446061515086924981 .quad 4294967295 .byte 9 .ascii "_Z5funcBi" .quad -2016976694713209516 .quad 72617220756 .byte 7 .ascii "_Z3fibi" ``` For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format : ``` FUNCTION BODY (one for each outlined function present in the text section) GUID (uint64) GUID of the function NPROBES (ULEB128) Number of probes originating from this function. NUM_INLINED_FUNCTIONS (ULEB128) Number of callees inlined into this function, aka number of first-level inlinees PROBE RECORDS A list of NPROBES entries. Each entry contains: INDEX (ULEB128) TYPE (uint4) 0 - block probe, 1 - indirect call, 2 - direct call ATTRIBUTE (uint3) reserved ADDRESS_TYPE (uint1) 0 - code address, 1 - address delta CODE_ADDRESS (uint64 or ULEB128) code address or address delta, depending on ADDRESS_TYPE INLINED FUNCTION RECORDS A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined callees. Each record contains: INLINE SITE GUID of the inlinee (uint64) ID of the callsite probe (ULEB128) FUNCTION BODY A FUNCTION BODY entry describing the inlined function. ``` To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. 
Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index. Assembling Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis. A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file. 
An example assembly looks like: ``` foo2: # @foo2 # %bb.0: # %bb0 pushq %rax testl %edi, %edi .pseudoprobe 837061429793323041 1 0 0 je .LBB1_1 # %bb.2: # %bb2 .pseudoprobe 837061429793323041 6 2 0 callq foo .pseudoprobe 837061429793323041 3 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq .LBB1_1: # %bb1 .pseudoprobe 837061429793323041 5 1 0 callq %rsi .pseudoprobe 837061429793323041 2 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq # -- End function .section .pseudo_probe_desc,"",@progbits .quad 6699318081062747564 .quad 72617220756 .byte 3 .ascii "foo" .quad 837061429793323041 .quad 281547593931412 .byte 4 .ascii "foo2" ``` With inlining turned on, the assembly may look different around %bb2 with an inlined probe: ``` # %bb.2: # %bb2 .pseudoprobe 837061429793323041 3 0 .pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6 .pseudoprobe 837061429793323041 4 0 popq %rax retq ``` Disassembling We have a disassembling tool (llvm-profgen) that can display disassembly alongside pseudo probes. So far it only supports ELF executable files. An example disassembly looks like: ``` 00000000002011a0 <foo2>: 2011a0: 50 push rax 2011a1: 85 ff test edi,edi [Probe]: FUNC: foo2 Index: 1 Type: Block 2011a3: 74 02 je 2011a7 <foo2+0x7> [Probe]: FUNC: foo2 Index: 3 Type: Block [Probe]: FUNC: foo2 Index: 4 Type: Block [Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6 2011a5: 58 pop rax 2011a6: c3 ret [Probe]: FUNC: foo2 Index: 2 Type: Block 2011a7: bf 01 00 00 00 mov edi,0x1 [Probe]: FUNC: foo2 Index: 5 Type: IndirectCall 2011ac: ff d6 call rsi [Probe]: FUNC: foo2 Index: 4 Type: Block 2011ae: 58 pop rax 2011af: c3 ret ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D91878	2020-12-10 09:50:08 -08:00
Jun Ma	137674f882	[TruncInstCombine] Remove scalable vector restriction Differential Revision: https://reviews.llvm.org/D92819	2020-12-10 18:00:19 +08:00
Jianzhou Zhao	ea981165a4	[dfsan] Track field/index-level shadow values in variables ************* * The problem ************* See motivation examples in compiler-rt/test/dfsan/pair.cpp. The current DFSan always uses a 16bit shadow value for a variable with any type by combining all shadow values of all bytes of the variable. So it cannot distinguish two fields of a struct: each field's shadow value equals the combined shadow value of all fields. This introduces an overtaint issue. Consider a parsing function std::pair<char*, int> get_token(char* p); where p points to a buffer to parse, the returned pair includes the next token and the pointer to the position in the buffer after the token. If the token is tainted, then both the returned pointer and int are tainted. If the parser keeps on using get_token for the rest of the parsing, all the following outputs are tainted because of the tainted pointer. The CL is the first change to address the issue. ************************** * The proposed improvement ************************ Eventually all fields and indices have their own shadow values in variables and memory. For example, variables with type {i1, i3}, [2 x i1], {[2 x i4], i8}, [2 x {i1, i1}] have shadow values with type {i16, i16}, [2 x i16], {[2 x i16], i16}, [2 x {i16, i16}] correspondingly; variables with primary type still have shadow values i16. ************************* * A potential implementation plan ************************* The idea is to adopt the change incrementally. 1) This CL Support field-level accuracy at variables/args/ret in TLS mode, load/store/alloca still use combined shadow values. After the alloca promotion and SSA construction phases (>=-O1), we assume alloca and memory operations are reduced. So if struct variables do not relate to memory, their tracking is accurate at field level. 2) Support field-level accuracy at alloca 3) Support field-level accuracy at load/store These two should make O0 and real memory access work. 
4) Support vector if necessary. 5) Support Args mode if necessary. 6) Support passing more accurate shadow values via custom functions if necessary. ************* * About this CL. *************** The CL did the following 1) extended TLS arg/ret to work with aggregate types. This is similar to what MSan does. 2) implemented how to map between an original type/value/zero-const to its shadow type/value/zero-const. 3) extended (insert|extract)value to use field/index-level propagation. 4) for other instructions, propagation rules are combining inputs by or. The CL converts between aggregate and primary shadow values at the cases. 5) Custom function interfaces also need such a conversion because all existing custom functions use i16. It is unclear whether custom functions need more accurate shadow propagation yet. 6) Added test cases for aggregate type related cases. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92261	2020-12-09 19:38:35 +00:00
Sanjay Patel	b2ef264096	[VectorCombine] allow peeking through an extractelt when creating a vector load This is an enhancement to load vectorization that is motivated by a pattern in https://llvm.org/PR16739. Unfortunately, it's still not enough to make a difference there. We will have to handle multi-use cases in some better way to avoid creating multiple overlapping loads. Differential Revision: https://reviews.llvm.org/D92858	2020-12-09 10:36:14 -05:00
Roman Lebedev	e6f2a79d7a	[InstCombine] canonicalizeSaturatedAdd(): last fold is only valid for strict comparison (PR48390) We could create uadd.sat under incorrect circumstances if a select with -1 as the false value was canonicalized by swapping the T/F values. Unlike the other transforms in the same function, it is not invariant to equality. Some alive proofs: https://alive2.llvm.org/ce/z/emmKKL Based on original patch by David Green! Fixes https://bugs.llvm.org/show_bug.cgi?id=48390 Differential Revision: https://reviews.llvm.org/D92717	2020-12-09 18:19:09 +03:00
Anton Afanasyev	e5bf2e8989	[SLP] Use the width of value truncated just before storing For stores chain vectorization we choose the size of vector elements to ensure we fit to minimum and maximum vector register size for the number of elements given. This patch corrects vector element size choosing the width of value truncated just before storing instead of the width of value stored. Fixes PR46983 Differential Revision: https://reviews.llvm.org/D92824	2020-12-09 16:38:45 +03:00
Sander de Smalen	d568cff696	[LoopVectorizer][SVE] Vectorize a simple loop with a scalable VF. * Steps are scaled by `vscale`, a runtime value. * Changes to circumvent the cost-model for now (temporary) so that the cost-model can be implemented separately. This can vectorize the following loop [1]: void loop(int N, double *a, double *b) { #pragma clang loop vectorize_width(4, scalable) for (int i = 0; i < N; i++) { a[i] = b[i] + 1.0; } } [1] This source-level example is based on the pragma proposed separately in D89031. This patch only implements the LLVM part. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D91077	2020-12-09 11:25:21 +00:00
Sander de Smalen	adc37145de	[LoopVectorizer] NFC: Remove unnecessary asserts that VF cannot be scalable. This patch removes a number of asserts that VF is not scalable, even though the code where this assert lives does nothing that prevents VF being scalable. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D91060	2020-12-09 11:25:21 +00:00
Joe Ellis	80c33de2d3	[SelectionDAG] Add llvm.vector.{extract,insert} intrinsics This commit adds two new intrinsics. - llvm.experimental.vector.insert: used to insert a vector into another vector starting at a given index. - llvm.experimental.vector.extract: used to extract a subvector from a larger vector starting from a given index. The codegen work for these intrinsics has already been completed; this commit is simply exposing the existing ISD nodes to LLVM IR. Reviewed By: cameron.mcinally Differential Revision: https://reviews.llvm.org/D91362	2020-12-09 11:08:41 +00:00
Philip Reames	5171b7b40e	[indvars] Common a bit of code [NFC]	2020-12-08 15:25:48 -08:00
Anna Thomas	29356e3279	[ScalarizeMaskedMemIntrin] Add new PM support This patch adds new PM support for the pass and the pass can now be used during middle-end transforms. The old pass is renamed to ScalarizeMaskedMemIntrinLegacyPass. Reviewed-By: skatkov, aeubanks Differential Revision: https://reviews.llvm.org/D92743	2020-12-08 17:15:22 -05:00
Benjamin Kramer	5f18e2f31e	Move createScalarizeMaskedMemIntrinPass to Scalar.h	2020-12-08 19:08:09 +01:00
Benjamin Kramer	10987e30be	Remove unused include. NFC. This is also a layering violation.	2020-12-08 19:03:56 +01:00
Anna Thomas	09f2f9605f	[ScalarizeMaskedMemIntrinsic] Move from CodeGen into Transforms ScalarizeMaskedMemIntrinsic is currently a codeGen level pass. The pass is actually operating on IR level and does not use any code gen specific passes. It is useful to move it into transforms directory so that it can be more widely used as a mid-level transform as well (apart from usage in codegen pipeline). In particular, we have a usecase downstream where we would like to use this pass in our mid-level pipeline which operates on IR level. The next change will be to add support for new PM. Reviewers: craig.topper, apilipenko, skatkov Reviewed-By: skatkov Differential Revision: https://reviews.llvm.org/D92407	2020-12-08 12:25:58 -05:00
Xun Li	31e60b9133	[coroutine] should disable inline before calling coro split This is a rework of D85812, which didn't land. When a callee coroutine function is inlined into a caller coroutine function before the coro-split pass, LLVM emits "coroutine should have exactly one defining @llvm.coro.begin". It seems that the coro-early pass cannot handle this quite well, so we believe that an unsplit coroutine function should not be inlined. This patch fixes the issue by not inlining a function if it has the attribute "coroutine.presplit" (which means the function has not been split). Test plan: check-llvm, check-clang In D85812, there were suggestions on moving the macros to Attributes.td to avoid a circular header dependency issue. I believe it's not worth doing just to be able to use one constant string in one place. Today, there are already 3 possible attribute values for "coroutine.presplit": `c6543cc6b8/llvm/lib/Transforms/Coroutines/CoroInternal.h (L40-L42)` If we move them into Attributes.td, we would be adding 3 new attributes to EnumAttr, just to support this, which I think is overkill. Instead, I think the best way to do this is to add an API in Function class that checks whether this function is a coroutine, by checking the attribute by name directly. Differential Revision: https://reviews.llvm.org/D92706	2020-12-08 08:53:08 -08:00
Teresa Johnson	77b509710c	[ICP] Don't promote when target not defined in module This guards against cases where the symbol was dead code eliminated in the binary by ThinLTO, and we have a sample profile collected for one binary but used to optimize another. Most of the benefit from ICP comes from inlining the target, which we can't do with only a declaration anyway. If this is in the pre-ThinLTO link step (e.g. for instrumentation based PGO), we will attempt the promotion again in the ThinLTO backend after importing anyway, and we don't need the early promotion to facilitate that. Differential Revision: https://reviews.llvm.org/D92804	2020-12-08 07:45:36 -08:00
Sjoerd Meijer	1e260f955d	[LICM][docs] Document that LICM is also a canonicalization transform. NFC. This documents that LICM is a canonicalization transform, which we discussed recently in: http://lists.llvm.org/pipermail/llvm-dev/2020-December/147184.html but which was also discussed earlier, e.g. in: http://lists.llvm.org/pipermail/llvm-dev/2019-September/135058.html	2020-12-08 11:56:35 +00:00
Evgeniy Brevnov	2d1b024d06	[DSE][NFC] Need to be careful mixing signed and unsigned types Currently in some places we use a signed type to represent the size of an access and put explicit casts from unsigned to signed. For example: int64_t EarlierSize = int64_t(Loc.Size.getValue()); Even though it doesn't lose bits (immediately), it may overflow and we end up with a negative size. Potentially that causes later code to work incorrectly. A simple example is a check that the size is not negative. I think it would be safer and clearer if we use an unsigned type for the size and handle it appropriately. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D92648	2020-12-08 16:53:37 +07:00
Valentin Churavy	700cf7dcc9	[VNCoercion] Disallow coercion between different ni addrspaces I'm not sure if it would be legal by the IR reference to introduce an addrspacecast here, since the IR reference is a bit vague on the exact semantics, but at least for our usage of it (and I suspect for many others' usage) it is not. For us, addrspacecasts between non-integral address spaces carry frontend information that the optimizer cannot deduce afterwards in a generic way (though we have frontend specific passes in our pipeline that do propagate these). In any case, I'm sure nobody is using it this way at the moment, since it would have introduced inttoptrs, which are definitely illegal. Fixes PR38375 Co-authored-by: Keno Fischer <keno@alumni.harvard.edu> Reviewed By: reames Differential Revision: https://reviews.llvm.org/D50010	2020-12-07 20:19:48 -05:00
Sanjay Patel	5fe1a49f96	[SLP] fix typo in debug string; NFC	2020-12-07 15:09:21 -05:00
Bardia Mahjour	4db9b78c81	[LV] Epilogue Vectorization with Optimal Control Flow - Default Enablement This patch enables epilogue vectorization by default per reviewer requests. Differential Revision: https://reviews.llvm.org/D89566	2020-12-07 14:29:36 -05:00
Florian Hahn	32825e8636	[ConstraintElimination] Tweak placement in pipeline. This patch adds the ConstraintElimination pass to the LTO pipeline and also runs it after SCCP in the function simplification pipeline. This increases the number of cases we can eliminate. Pending further tuning.	2020-12-07 19:08:40 +00:00
Simon Pilgrim	50dd1dba6e	[IPO] Fix operator precedence warning. NFCI. Check the entire assertion condition before && with the message.	2020-12-07 18:23:54 +00:00
Alexey Bataev	438682de6a	[SLP]Merge reorder and reuse shuffles. It is possible to merge reuse and reorder shuffles and reduce the total cost of the vectorization tree/number of final instructions. Differential Revision: https://reviews.llvm.org/D92668	2020-12-07 07:50:00 -08:00
Jun Ma	216689ace7	[Coroutines] Add DW_OP_deref for transformed dbg.value intrinsic. Differential Revision: https://reviews.llvm.org/D92462	2020-12-07 10:24:44 +08:00
Craig Topper	305fcc9122	[LoopIdiomRecognize] Merge a conditional operator with an earlier if and remove an extra temporary variable. NFC The CountPrev variable was only used to forward a value from the if statement to the conditional operator under the same condition. While there move some variable declarations to their first assignment.	2020-12-06 15:23:18 -08:00
Fangrui Song	2832f3528c	[Transforms] Delete unused declarations from NewGVN/CoroSplit/ValueMapper	2020-12-06 13:04:01 -08:00
Wenlei He	6b989a1710	[CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining This change adds the context-sensitive sample PGO infrastructure described in CSSPGO RFC (https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s). It introduced an abstraction between input profile and profile loader that queries input profile for functions. Specifically, there's now the notion of base profile and context profile, and they are managed by the new SampleContextTracker for adjusting and merging profiles based on inline decisions. It works with top-down profile-guided inliner in profile loader (https://reviews.llvm.org/D70655) for better inlining with specialization and better post-inline profile fidelity. In the future, we can also expose this infrastructure to CGSCC inliner in order for it to take advantage of context-sensitive profile. This change is the consumption part of context-sensitive profile (The generation part is in this stack: https://reviews.llvm.org/D89707). We've seen good results internally in conjunction with Pseudo-probe (https://reviews.llvm.org/D86193). Patches for integration with Pseudo-probe coming up soon. Currently the new infrastructure kicks in when input profile contains the new context-sensitive profile; otherwise it's no-op and does not affect existing AutoFDO. Interface There are two sets of interfaces for query and tracking respectively exposed from SampleContextTracker. For query, now instead of simply getting a profile from input for a function, we can explicitly query base profile or context profile for given call path of a function. For tracking, there are separate APIs for marking context profile as inlined, or promoting and merging not inlined context profile. - Query base profile (`getBaseSamplesFor`) Base profile is the merged synthetic profile for function's CFG profile from any outstanding (not inlined) context. We can query base profile by function. 
- Query context profile (`getContextSamplesFor`) Context profile is a function's CFG profile for a given calling context. We can query context profile by context string. - Track inlined context profile (`markContextSamplesInlined`) When a function is inlined for given calling context, we need to mark the context profile for that context as inlined. This is to make sure we don't include inlined context profile when synthesizing base profile for that inlined function. - Track not-inlined context profile (`promoteMergeContextSamplesTree`) When a function is not inlined for given calling context, we need to promote the context profile tree so the not inlined context becomes top-level context. This preserves the sub-context under that function so later inline decision for that not inlined function will still have context profile for its call tree. Note that profile will be merged if needed when promoting a context profile tree if any of the node already exists at its promoted destination. Implementation Implementation-wise, `SampleContext` is created as abstraction for context. Currently it's a string for call path, and we can later optimize it to something more efficient, e.g. context id. Each `SampleContext` also has a `ContextState` indicating whether it's raw context profile from input, whether it's inlined or merged, whether it's synthetic profile created by compiler. Each `FunctionSamples` now has a `SampleContext` that tells whether it's base profile or context profile, and for context profile what is the context and state. On top of the above context representation, a custom trie tree is implemented to track and manage context profiles. Specifically, `SampleContextTracker` is implemented that encapsulates a trie tree with `ContextTrieNode` as node. Each node of the trie tree represents a frame in calling context, thus the path from root to a node represents a valid calling context. 
We also track `FunctionSamples` for each node, so this trie tree can serve efficient query for context profile. Accordingly, context profile tree promotion now becomes moving a subtree to be under the root of entire tree, and merging nodes for the subtree if this move encounters existing nodes. Integration `SampleContextTracker` is now also integrated with AutoFDO, `SampleProfileReader` and `SampleProfileLoader`. When we detect that the input profile contains context-sensitive profile, `SampleContextTracker` will be used to track profiles, and all profile queries will go to `SampleContextTracker` instead of `SampleProfileReader` automatically. Tracking APIs are called automatically for each inline decision from `SampleProfileLoader`. Differential Revision: https://reviews.llvm.org/D90125	2020-12-06 11:49:18 -08:00
Kazu Hirata	ddb002d7c7	[InstCombine] Remove replacePointer (NFC) The declaration was introduced on Feb 10, 2017 in commit `ba01ed00fe` without a corresponding definition.	2020-12-06 10:24:08 -08:00
Sanjay Patel	94f6d365e4	[InstCombine] avoid crash on phi with unreachable incoming block (PR48369)	2020-12-06 09:31:47 -05:00
Fangrui Song	204d0d51b3	[MemProf] Make __memprof_shadow_memory_dynamic_address dso_local in static relocation model The x86-64 backend currently has a bug which uses a wrong register when for the GOTPCREL reference. The program will crash without the dso_local specifier.	2020-12-05 21:36:31 -08:00
Florian Hahn	4ceecc820b	[ConstraintElimination] Handle constraints with all zero var coeffs. Constraints where all variable coefficients are 0 do not add any useful information. When checking, we can check if they are always true/false.	2020-12-05 12:06:53 +00:00
Kazu Hirata	8006043b13	[IRCE] Remove unused IsSigned and its accessor (NFC) IsSigned and its accessor, isSigned, were introduced on Oct 25, 2017 in commit `9ac7021a25`. The last use was removed on Nov 20, 2017 in commit `268467869b`.	2020-12-04 21:26:12 -08:00
Jianzhou Zhao	a28db8b27a	[dfsan] Add empty APIs for field-level shadow This is a child diff of D92261. This diff adds APIs that return shadow type/value/zero from origin objects. For the time being these APIs simply return primitive shadow type/value/zero. The following diff will implement the conversion. As D92261 explains, some cases still use primitive shadow during the incremental changes. The cases include 1) alloca/load/store 2) custom function IO 3) vectors In these cases this diff does not use the new APIs, but uses primitive shadow objects explicitly. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92629	2020-12-04 21:42:07 +00:00
Duncan P. N. Exon Smith	d10f9863a5	ADT: Migrate users of AlignedCharArrayUnion to std::aligned_union_t, NFC Prepare to delete `AlignedCharArrayUnion` by migrating its users over to `std::aligned_union_t`. I will delete `AlignedCharArrayUnion` and its tests in a follow-up commit so that it's easier to revert in isolation in case some downstream wants to keep using it. Differential Revision: https://reviews.llvm.org/D92516	2020-12-04 12:34:49 -08:00
Duncan P. N. Exon Smith	5b267fb796	ADT: Stop peeking inside AlignedCharArrayUnion, NFC Update all the users of `AlignedCharArrayUnion` to stop peeking inside (to look at `buffer`) so that a follow-up patch can replace it with an alias to `std::aligned_union_t`. This was reviewed as part of https://reviews.llvm.org/D92512, but I'm splitting this bit out to commit first to reduce churn in case the change to `AlignedCharArrayUnion` needs to be reverted for some unexpected reason.	2020-12-04 11:07:42 -08:00
Hiroshi Yamauchi	f9c3954a6e	Fix for Bug 48055. Differential Revision: https://reviews.llvm.org/D92599	2020-12-04 11:05:01 -08:00
Arthur Eubanks	7f6f9f4cf9	[NewPM] Make pass adaptors less templatey Currently PassBuilder.cpp is by far the file that takes longest to compile. This is due to tons of templates being instantiated per pass. Follow PassManager by using wrappers around passes to avoid making the adaptors templated on the pass type. This allows us to move various adaptors' run methods into .cpp files. This reduces the compile time of PassBuilder.cpp on my machine from 66 to 39 seconds. It also reduces the size of opt from 685M to 676M. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D92616	2020-12-04 08:30:50 -08:00
Evgeniy Brevnov	061cebb46f	[NFC][NARY-REASSOCIATE] Restructure code to avoid isPotentiallyReassociatable Currently we have to duplicate the same checks in isPotentiallyReassociatable and tryReassociate. With simple patterns like add/mul this may not be a big deal. But the situation gets much worse when I try to add support for min/max. Min/Max may be represented by several instructions and can take different forms. In order to reduce complexity for upcoming min/max support we need to restructure the code a bit to avoid the mentioned code duplication. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D88286	2020-12-04 16:19:43 +07:00
Evgeniy Brevnov	f61c29b3a7	[NARY-REASSOCIATE] Simplify traversal logic by post deleting dead instructions Currently we delete optimized instructions as we go. That has several negative consequences. First, it complicates the traversal logic itself. Second, if a newly generated instruction has been deleted, the traversal is repeated from scratch. But the real motivation for the change is the upcoming support for min/max reassociation. Here we employ the SCEV expander to generate code. As a result, newly generated instructions may not be inserted right before the original instruction (because SCEV may do hoisting) and there is no way to know the 'next' instruction. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D88285	2020-12-04 16:17:50 +07:00
Kazu Hirata	e2fc11cf9f	[JumpThreading] Call eraseBlock when folding a conditional branch This patch teaches the jump threading pass to call BPI->eraseBlock when it folds a conditional branch. Without this patch, BranchProbabilityInfo could end up with stale edge probabilities for the basic block containing the conditional branch -- one edge probability with less than 1.0 and the other for a removed edge. Differential Revision: https://reviews.llvm.org/D92608	2020-12-03 23:50:17 -08:00
Max Kazantsev	12b6c5e682	Return "[IndVars] ICmpInst should not prevent IV widening" This reverts commit `4bd35cdc3a`. The patch was reverted during the investigation. The investigation showed that the patch did not cause any trouble, but just exposed the existing problem that is addressed by the previous patch "[IndVars] Quick fix LHS/RHS bug". Returning without changes.	2020-12-04 12:34:43 +07:00
Max Kazantsev	3df0daceb2	[IndVars] Quick fix LHS/RHS bug The code relies on fact that LHS is the NarrowDef but never really checks it. Adding the conservative restrictive check, will follow-up with handling of case where RHS is a NarrowDef.	2020-12-04 12:34:42 +07:00
Jianzhou Zhao	80e326a8c4	[dfsan] Support passing non-i16 shadow values in TLS mode This is a child diff of D92261. It extended TLS arg/ret to work with aggregate types. For a function t foo(t1 a1, t2 a2, ... tn an) Its arguments shadow are saved in TLS args like a1_s, a2_s, ..., an_s TLS ret simply includes r_s. By calculating the type size of each shadow value, we can get their offset. This is similar to what MSan does. See __msan_retval_tls and __msan_param_tls from llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp. Note that this change does not add test cases for overflowed TLS arg/ret because this is hard to test w/o supporting aggregate shadow types. We will be adding them after supporting that. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92440	2020-12-04 02:45:07 +00:00
Philip Reames	0c866a3d6a	[LoopVec] Support non-instructions as argument to uniform mem ops The initial step of the uniform-after-vectorization (lane-0 demanded only) analysis was very awkwardly written. It would revisit use list of each pointer operand of a widened load/store. As a result, it was in the worst case O(N^2) where N was the number of instructions in a loop, and had restricted operand Value types to reduce the size of use lists. This patch replaces the original algorithm with one which is at most O(2N) in the number of instructions in the loop. (The key observation is that each use of a potentially interesting pointer is visited at most twice, once on first scan, once in the use list of its operand. Only instructions within the loop have their uses scanned.) In the process, we remove a restriction which required the operand of the uniform mem op to itself be an instruction. This allows detection of uniform mem ops involving global addresses. Differential Revision: https://reviews.llvm.org/D92056	2020-12-03 14:51:44 -08:00
dfukalov	2ce38b3f03	[NFC] Reduce include files dependency. 1. Removed #include "...AliasAnalysis.h" in other headers and modules. 2. Cleaned up includes in AliasAnalysis.h. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92489	2020-12-03 18:25:05 +03:00
Max Kazantsev	4bd35cdc3a	Revert "[IndVars] ICmpInst should not prevent IV widening" This reverts commit `0c9c6ddf17`. We are seeing some failures with this patch locally. Not clear if it's causing them or just triggering a problem in another place. Reverting while investigating.	2020-12-03 18:01:41 +07:00
modimo	c1ba991e8d	[NFC] Fix typo	2020-12-02 22:23:57 -08:00
Jianzhou Zhao	bd726d2796	[dfsan] Rename ShadowTy/ZeroShadow with prefix Primitive This is a child diff of D92261. After supporting field/index-level shadow, the existing shadow with type i16 works for only primitive types. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92459	2020-12-03 05:31:01 +00:00
Florian Hahn	2304528bb5	[ConstraintElimination] Make sure arguments of std::pow match. This should fix a build failure on some systems, e.g. solaris11-sparcv9 http://lab.llvm.org:8014/#/builders/22	2020-12-02 22:23:26 +00:00
Hongtao Yu	24d4291ca7	[CSSPGO] Pseudo probes for function calls. An indirect call site needs to be probed for its potential call targets. With CSSPGO a direct call also needs a probe so that a calling context can be represented by a stack of callsite probes. Unlike pseudo probes for basic blocks that are in the form of standalone intrinsic call instructions, pseudo probes for callsites have to be attached to the call instruction, thus a separate instruction would not work. One possible way of attaching a probe to a call instruction is to use a special metadata that carries information about the probe. The special metadata will have to make its way through the optimization pipeline down to object emission. This requires additional efforts to maintain the metadata in various places. Given that the `!dbg` metadata is a first-class metadata and has all essential support in place, leveraging the `!dbg` metadata as a channel to encode pseudo probe information is probably the easiest solution. With the requirement of not inflating `!dbg` metadata that is allocated for almost every instruction, we found that the 32-bit DWARF discriminator field which mainly serves AutoFDO can be reused for pseudo probes. DWARF discriminators distinguish identical source locations between instructions and with pseudo probes such support is not required. In this change we are using the discriminator field to encode the ID and type of a callsite probe and the encoded value will be unpacked and consumed right before object emission. When a callsite is inlined, the callsite discriminator field will go with the inlined instructions. The `!dbg` metadata of an inlined instruction is in the form of a scope stack. The top of the stack is the instruction's original `!dbg` metadata and the bottom of the stack is for the original callsite of the top-level inliner. 
Except for the top of the stack, all other elements of the stack actually refer to the nested inlined callsites whose discriminator field (which actually represents a callsite probe) can be used together to represent the inline context of an inlined PseudoProbeInst or CallInst. To avoid collision with the baseline AutoFDO in various places that handle DWARF discriminators where a check against the `-pseudo-probe-for-profiling` switch is not available, a special encoding scheme is used to tell apart a pseudo probe discriminator from a regular discriminator. For the regular discriminator, if all lowest 3 bits are non-zero, it means the discriminator is basically empty and all higher 29 bits can be reserved for pseudo probe use. Callsite pseudo probes are inserted in `SampleProfileProbePass` and a target-independent MIR pass `PseudoProbeInserter` is added to unpack the probe ID/type from `!dbg`. Note that with this work the switch -debug-info-for-profiling will not work with -pseudo-probe-for-profiling anymore. They cannot be used at the same time. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D91756	2020-12-02 13:45:20 -08:00
Jianzhou Zhao	dad5d95883	[dfsan] Rename CachedCombinedShadow to be CachedShadow At D92261, this type will be used to cache both combined shadow and converted shadow values. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92458	2020-12-02 21:39:16 +00:00
jasonliu	a65d8c5d72	[XCOFF][AIX] Generate LSDA data and compact unwind section on AIX Summary: AIX uses the existing EH infrastructure in clang and llvm. The major differences would be 1. AIX does not have CFI instructions. 2. AIX uses a new personality routine, named __xlcxx_personality_v1. It doesn't use the GCC personality routine, because the interoperability is not there yet on AIX. 3. AIX does not use eh_frame sections. Instead, it would use an eh_info section (compact unwind section) to store the information about personality routine and LSDA data address. Reviewed By: daltenty, hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D91455	2020-12-02 18:42:44 +00:00
Bardia Mahjour	a7e2c26939	[LV] Epilogue Vectorization with Optimal Control Flow (Recommit) This is yet another attempt at providing support for epilogue vectorization following discussions raised in RFC http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none and reviews D30247 and D88819. Similar to D88819, this patch achieves epilogue vectorization by executing a single vplan twice: once on the main loop and a second time on the epilogue loop (using a different VF). However it's able to handle more loops, and generates more optimal control flow for cases where the trip count is too small to execute any code in vector form. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D89566	2020-12-02 10:09:56 -05:00
Sanjay Patel	56fd29e93b	[SLP] use 'match' for binop/select; NFC This might be a small improvement in readability, but the real motivation is to make it easier to adapt the code to deal with intrinsics like 'maxnum' and/or integer min/max. There is potentially help in doing that with D92086, but we might also just add specialized wrappers here to deal with the expected patterns.	2020-12-02 09:04:08 -05:00
Alex Zinenko	240dd92432	[OpenMPIRBuilder] forward arguments as pointers to outlined function OpenMPIRBuilder::createParallel outlines the body region of the parallel construct into a new function that accepts any value previously defined outside the region as a function argument. This function is called back by OpenMP runtime function __kmpc_fork_call, which expects trailing arguments to be pointers. If the region uses a value that is not of a pointer type, e.g. a struct, the produced code would be invalid. In such cases, make createParallel emit IR that stores the value on the stack and passes the pointer to the outlined function instead. The outlined function then loads the value back and uses it as normal. Reviewed By: jdoerfert, llitchev Differential Revision: https://reviews.llvm.org/D92189	2020-12-02 14:59:41 +01:00
David Sherwood	71bd59f0cb	[SVE] Add support for scalable vectors with vectorize.scalable.enable loop attribute In this patch I have added support for a new loop hint called vectorize.scalable.enable that says whether we should enable scalable vectorization or not. If a user wants to instruct the compiler to vectorize a loop with scalable vectors they can now do this as follows: br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !2 ... !2 = !{!2, !3, !4} !3 = !{!"llvm.loop.vectorize.width", i32 8} !4 = !{!"llvm.loop.vectorize.scalable.enable", i1 true} Setting the hint to false simply reverts the behaviour back to the default, using fixed width vectors. Differential Revision: https://reviews.llvm.org/D88962	2020-12-02 13:23:43 +00:00
Chen Zheng	3cb7d62452	[LSR][NFC] don't collect chains when isNumRegsMajorCostOfLSR is false. Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D92159	2020-12-01 22:29:33 -05:00
Jianzhou Zhao	405ea2b93d	[msan] Replace 8 by kShadowTLSAlignment Reviewed-by: eugenis Differential Revision: https://reviews.llvm.org/D92275	2020-12-02 01:09:49 +00:00
Fangrui Song	a5309438fe	static const char *const foo => const char foo[] By default, a non-template variable of non-volatile const-qualified type having namespace-scope has internal linkage, so no need for `static`.	2020-12-01 10:33:18 -08:00
Bardia Mahjour	c94af03f7f	Revert "[LV] Epilogue Vectorization with Optimal Control Flow" This reverts commit `9c5504adce`. Reverting to investigate build failure in http://lab.llvm.org:8011/#/builders/98/builds/1461/steps/9	2020-12-01 12:50:36 -05:00
Bardia Mahjour	9c5504adce	[LV] Epilogue Vectorization with Optimal Control Flow This is yet another attempt at providing support for epilogue vectorization following discussions raised in RFC http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none and reviews D30247 and D88819. Similar to D88819, this patch achieves epilogue vectorization by executing a single vplan twice: once on the main loop and a second time on the epilogue loop (using a different VF). However it's able to handle more loops, and generates more optimal control flow for cases where the trip count is too small to execute any code in vector form. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D89566	2020-12-01 12:04:29 -05:00
Nikita Popov	624af932a8	[MemCpyOpt] Port to MemorySSA This is a straightforward port of MemCpyOpt to MemorySSA following the approach of D26739. MemDep queries are replaced with MSSA queries without changing the overall structure of the pass. Some care has to be taken to account for differences between these APIs (MemDep also returns reads, MSSA doesn't). Differential Revision: https://reviews.llvm.org/D89207	2020-12-01 17:57:41 +01:00
Clement Courbet	735e6c888e	[MergeICmps] Fix missing split. We were not correctly splitting blocks for chains of length 1. Before that change, additional instructions for blocks in chains of length 1 were not split off from the block before removing (this was done correctly for chains of longer size). If this first block contained an instruction referenced elsewhere, deleting the block would result in invalidation of the produced value. This caused a miscompile which motivated D92297 (before D17993, nonnull and dereferenceable attributes were not added so MergeICmps was not triggered.) The new test gep-references-bb.ll demonstrates the issue. The regression was introduced in rG0efadbbcdeb82f5c14f38fbc2826107063ca48b2. This supersedes D92364. Test case by MaskRay (Fangrui Song). Differential Revision: https://reviews.llvm.org/D92375	2020-12-01 16:50:55 +01:00
Sanjay Patel	9f60b8b3d2	[InstCombine] canonicalize sign-bit-shift of difference to ext(icmp) icmp is the preferred spelling in IR because icmp analysis is expected to be better than any other analysis. This should lead to more follow-on folding potential. It's difficult to say exactly what we should do in codegen to compensate. For example on AArch64, which of these is preferred: sub w8, w0, w1 lsr w0, w8, #31 vs: cmp w0, w1 cset w0, lt If there are perf regressions, then we should deal with those in codegen on a case-by-case basis. A possible motivating example for better optimization is shown in: https://llvm.org/PR43198 but that will require other transforms before anything changes there. Alive proof: https://rise4fun.com/Alive/o4E Name: sign-bit splat Pre: C1 == (width(%x) - 1) %s = sub nsw %x, %y %r = ashr %s, C1 => %c = icmp slt %x, %y %r = sext %c Name: sign-bit LSB Pre: C1 == (width(%x) - 1) %s = sub nsw %x, %y %r = lshr %s, C1 => %c = icmp slt %x, %y %r = zext %c	2020-12-01 09:58:11 -05:00
Florian Hahn	7a4f1d59b8	[ConstraintElimination] Decompose GEP %ptr, ZEXT(SHL()). Add support to decompose a GEP with a ZEXT(SHL()) operand.	2020-12-01 14:23:21 +00:00
Bhramar Vatsa	fd679107d6	[InstCombine] Optimize away the unnecessary multi-use sign-extend C.f. https://bugs.llvm.org/show_bug.cgi?id=47765 Added a case for handling the sign-extend (Shl+AShr) for multiple uses, to optimize it away for an individual use, when the demanded bits aren't affected by sign-extend. https://rise4fun.com/Alive/lgf Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D91343	2020-12-01 16:54:00 +03:00
Roman Lebedev	94ead0190f	[InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold, 2 If the shift amount was undef for some lane, the shift amount in opposite shift is irrelevant for that lane, and the new shift amount for that lane can be undef.	2020-12-01 16:54:00 +03:00
Roman Lebedev	52533b52b8	Revert "[InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold" It seems i have missed checklines, temporarily reverting, will reland momentarily. This reverts commit `aa1aa13509`.	2020-12-01 15:47:04 +03:00
Roman Lebedev	aa1aa13509	[InstCombine] Improve vector undef handling for sext(ashr(shl(trunc()))) fold If the shift amount was undef for some lane, the shift amount in opposite shift is irrelevant for that lane, and the new shift amount for that lane can be undef.	2020-12-01 15:13:08 +03:00
Roman Lebedev	8e29e20e0d	[InstCombine] Evaluate new shift amount for sext(ashr(shl(trunc()))) fold in wide type (PR48343) It is not correct to compute that new shift amount in its narrow type and only then extend it into the wide type: ---------------------------------------- Optimization: PR48343 good Precondition: (width(%X) == width(%r)) %o0 = trunc %X %o1 = shl %o0, %Y %o2 = ashr %o1, %Y %r = sext %o2 => %n0 = sext %Y %n1 = sub width(%o0), %n0 %n2 = sub width(%X), %n1 %n3 = shl %X, %n2 %r = ashr %n3, %n2 Done: 2016 Optimization is correct! ---------------------------------------- Optimization: PR48343 bad Precondition: (width(%X) == width(%r)) %o0 = trunc %X %o1 = shl %o0, %Y %o2 = ashr %o1, %Y %r = sext %o2 => %n0 = sub width(%o0), %Y %n1 = sub width(%X), %n0 %n2 = sext %n1 %n3 = shl %X, %n2 %r = ashr %n3, %n2 Done: 1 ERROR: Domain of definedness of Target is smaller than Source's for i9 %r Example: %X i9 = 0x000 (0) %Y i4 = 0x3 (3) %o0 i4 = 0x0 (0) %o1 i4 = 0x0 (0) %o2 i4 = 0x0 (0) %n0 i4 = 0x1 (1) %n1 i4 = 0x8 (8, -8) %n2 i9 = 0x1F8 (504, -8) %n3 i9 = 0x000 (0) Source value: 0x000 (0) Target value: undef I.e. we should be computing it in the wide type from the beginning. Fixes https://bugs.llvm.org/show_bug.cgi?id=48343	2020-12-01 15:13:07 +03:00
Roman Lebedev	15f8060f6f	[SimplifyCFG] FoldBranchToCommonDest: don't require that cmp of br is last instruction There is no correctness need for that, and since we allow live-out uses, this could theoretically happen, because currently nothing will move the cond to right before the branch in those tests. But regardless, lifting that restriction even makes the transform easier to understand. This makes the transform happen in 81 more cases (+0.55%)	2020-12-01 15:13:06 +03:00
Cullen Rhodes	cba4accda0	[LV] Clamp VF hint when unsafe In the following loop the dependence distance is 2 and can only be vectorized if the vector length is no larger than this. void foo(int *a, int *b, int N) { #pragma clang loop vectorize(enable) vectorize_width(4) for (int i=0; i<N; ++i) { a[i + 2] = a[i] + b[i]; } } However, when specifying a VF of 4 via a loop hint this loop is vectorized. According to [1][2], loop hints are ignored if the optimization is not safe to apply. This patch introduces a check to bail out of vectorization if the user specified VF is greater than the maximum feasible VF, unless explicitly forced with '-force-vector-width=X'. [1] https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave [2] https://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations Reviewed By: sdesmalen, fhahn, Meinersbur Differential Revision: https://reviews.llvm.org/D90687	2020-12-01 11:30:34 +00:00
Caroline Concatto	4b0ef2b075	[NFC][CostModel]Extend class IntrinsicCostAttributes to use ElementCount Type This patch replaces the attribute `unsigned VF` in the class IntrinsicCostAttributes by `ElementCount VF`. This is a non-functional change to help upcoming patches to compute the cost model for scalable vector inside this class. Differential Revision: https://reviews.llvm.org/D91532	2020-12-01 11:12:51 +00:00
Florian Hahn	efa9728a50	[ConstraintElimination] Decompose GEP %ptr, SHL(). Add support to decompose a GEP with an SHL operand.	2020-12-01 10:58:36 +00:00
Sjoerd Meijer	f44ba25135	ExtractValue instruction costs Instruction ExtractValue wasn't handled in LoopVectorizationCostModel::getInstructionCost(). As a result, it was modeled as a mul which is not really accurate. Since it is free (most of the time), this now gets a cost of 0 using getInstructionCost. This is a follow-up of D92208, that required changing this regression test. In a follow up I will look at InsertValue which also isn't handled yet. Differential Revision: https://reviews.llvm.org/D92317	2020-12-01 10:42:23 +00:00
Greg Parker	bcc802fa36	[DSE] Remove a redundant call to getLocForWriteEx() Differential Revision: https://reviews.llvm.org/D92263	2020-11-30 21:12:24 -08:00
Mircea Trofin	5fe10263ab	[llvm][inliner] Reuse the inliner pass to implement 'always inliner' Enable performing mandatory inlinings upfront, by reusing the same logic as the full inliner, instead of the AlwaysInliner. This has the following benefits: - reduce code duplication - one inliner codebase - open the opportunity to help the full inliner by performing additional function passes after the mandatory inlinings, but before the full inliner. Performing the mandatory inlinings first simplifies the problem the full inliner needs to solve: less call sites, more contextualization, and, depending on the additional function optimization passes run between the 2 inliners, higher accuracy of cost models / decision policies. Note that this patch does not yet enable much in terms of post-always inline function optimization. Differential Revision: https://reviews.llvm.org/D91567	2020-11-30 12:03:39 -08:00
Hongtao Yu	64fa8cce22	[CSSPGO] Pseudo probe instrumentation pass This change introduces a pseudo probe instrumentation pass for block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story. Given the following LLVM IR: ``` define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 { bb0: %cmp = icmp eq i32 %x, 0 br i1 %cmp, label %bb1, label %bb2 bb1: br label %bb3 bb2: br label %bb3 bb3: ret void } ``` The instrumented IR will look like below. Note that each llvm.pseudoprobe intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID. ``` define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 { bb0: %cmp = icmp eq i32 %x, 0 call void @llvm.pseudoprobe(i64 837061429793323041, i64 1) br i1 %cmp, label %bb1, label %bb2 bb1: call void @llvm.pseudoprobe(i64 837061429793323041, i64 2) br label %bb3 bb2: call void @llvm.pseudoprobe(i64 837061429793323041, i64 3) br label %bb3 bb3: call void @llvm.pseudoprobe(i64 837061429793323041, i64 4) ret void } ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D86499	2020-11-30 10:16:54 -08:00
Florian Hahn	fe83adb05a	[VPlan] Use VPUser to manage VPPredInstPHIRecipe operand (NFC). VPPredInstPHIRecipe is one of the recipes that was missed during the initial conversion. This patch adjusts the recipe to also manage its operand using VPUser.	2020-11-30 13:09:58 +00:00
Roman Lebedev	b0e9b7c59f	[NFC][SimplifyCFG] Add STATISTIC() to the FoldValueComparisonIntoPredecessors() fold	2020-11-30 12:27:16 +03:00
Max Kazantsev	0c9c6ddf17	[IndVars] ICmpInst should not prevent IV widening If we decided to widen IV with zext, then unsigned comparisons should not prevent widening (same for sext/sign comparisons). The result of comparison in wider type does not change in this case. Differential Revision: https://reviews.llvm.org/D92207 Reviewed By: nikic	2020-11-30 10:51:31 +07:00
Fangrui Song	5408fdcd78	[VPlan] Fix -Wunused-variable after `a813090072`	2020-11-29 10:38:01 -08:00
Florian Hahn	4bc9b909d7	[VPlan] Use VPValue and VPUser ops to print VPReplicateRecipe.	2020-11-29 18:28:27 +00:00
Florian Hahn	a813090072	[VPlan] Manage stored values of interleave groups using VPUser (NFC) Interleave groups also depend on the values they store. Manage the stored values as VPUser operands. This is currently a NFC, but is required to allow VPlan transforms and to manage generated vector values exclusively in VPTransformState.	2020-11-29 17:24:36 +00:00
Andrew Litteken	a8a43b6338	Revert "[IRSim][IROutliner] Adding the extraction basics for the IROutliner." Reverting commit due to address sanitizer errors. > Extracting the similar regions is the first step in the IROutliner. > > Using the IRSimilarityIdentifier, we collect the SimilarityGroups and > sort them by how many instructions will be removed. Each > IRSimilarityCandidate is used to define an OutlinableRegion. Each > region is ordered by their occurrence in the Module and the regions that > are not compatible with previously outlined regions are discarded. > > Each region is then extracted with the CodeExtractor into its own > function. > > We test that we correctly extract in: > test/Transforms/IROutliner/extraction.ll > test/Transforms/IROutliner/address-taken.ll > test/Transforms/IROutliner/outlining-same-globals.ll > test/Transforms/IROutliner/outlining-same-constants.ll > test/Transforms/IROutliner/outlining-different-structure.ll > > Reviewers: paquette, jroelofs, yroux > > Differential Revision: https://reviews.llvm.org/D86975 This reverts commit `bf899e8913`.	2020-11-27 19:55:57 -06:00
Andrew Litteken	bf899e8913	[IRSim][IROutliner] Adding the extraction basics for the IROutliner. Extracting the similar regions is the first step in the IROutliner. Using the IRSimilarityIdentifier, we collect the SimilarityGroups and sort them by how many instructions will be removed. Each IRSimilarityCandidate is used to define an OutlinableRegion. Each region is ordered by their occurrence in the Module and the regions that are not compatible with previously outlined regions are discarded. Each region is then extracted with the CodeExtractor into its own function. We test that we correctly extract in: test/Transforms/IROutliner/extraction.ll test/Transforms/IROutliner/address-taken.ll test/Transforms/IROutliner/outlining-same-globals.ll test/Transforms/IROutliner/outlining-same-constants.ll test/Transforms/IROutliner/outlining-different-structure.ll Reviewers: paquette, jroelofs, yroux Differential Revision: https://reviews.llvm.org/D86975	2020-11-27 19:08:29 -06:00
Florian Hahn	ae008798a4	[VPlan] Use VPTransformState::set in widenGEP. This patch updates widenGEP to manage the resulting vector values using the VPValue of VPWidenGEP recipe.	2020-11-27 17:01:55 +00:00
Francesco Petrogalli	8e0148dff7	[AllocaInst] Update `getAllocationSizeInBits` to return `TypeSize`. Reviewed By: peterwaller-arm, sdesmalen Differential Revision: https://reviews.llvm.org/D92020	2020-11-27 16:39:10 +00:00
Sjoerd Meijer	10ad64aa3b	[SLP] Dump Tree costs. NFC. This adds LLVM_DEBUG messages to dump the (intermediate) tree cost calculations, which is useful to trace and see how the final cost is calculated.	2020-11-27 11:37:33 +00:00
Roman Lebedev	b33fbbaa34	Reland [SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions This was originally committed in `2245fb8aaa` but was immediately reverted in `f3abd54958` because of a PHI handling issue. Original commit message: 1. It doesn't make sense to enforce that the bonus instruction is only used once in its basic block. What matters is whether those user instructions fit within our budget, sure, but that is another question. 2. It doesn't make sense to enforce that said bonus instructions are only used within their basic block. Perhaps the branch condition isn't using the value computed by said bonus instruction, and said bonus instruction is simply being calculated to be used in successors? So iff we can clone bonus instructions, to lift these restrictions, we just need to carefully update their external uses to use the new cloned instructions. Notably, this transform (even without this change) appears to be poison-unsafe as per alive2, but is otherwise (including the patch) legal. We don't introduce any new PHI nodes, but only "move" the instructions around, i'm not really seeing much potential for extra cost modelling for the transform, especially since now we allow at most one such bonus instruction by default. This causes the fold to fire +11.4% more (13216 -> 14725) as of vanilla llvm test-suite + RawSpeed. The motivational pattern is IEEE-754-2008 Binary16->Binary32 extension code: `ca57d77fb2/src/librawspeed/common/FloatingPoint.h (L115-L120)` ^ that should be a switch, but it is not now: https://godbolt.org/z/bvja5v That being said, even though it seemed like this would fix it: https://godbolt.org/z/xGq3TM apparently that fold is happening somewhere else after all, so something else also has a similar 'artificial' restriction.	2020-11-27 12:47:15 +03:00
Wang, Pengfei	8dcf8d1da5	[msan] Fix bugs when instrumenting x86.avx512_cvt intrinsics. Scalar intrinsics x86.avx512_cvt have an extra rounding mode operand. We can directly ignore it to reuse the SSE/AVX math. This fixes the bug https://bugs.llvm.org/show_bug.cgi?id=48298. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D92206	2020-11-27 16:33:14 +08:00
Markus Lavin	808fcfe594	Revert "[DebugInfo] Improve dbg preservation in LSR." This reverts commit `06758c6a61`. Bug: https://bugs.llvm.org/show_bug.cgi?id=48166 Additional discussion in: https://reviews.llvm.org/D91711	2020-11-27 08:52:32 +01:00
Max Kazantsev	faf183874c	[IndVars] LCSSA Phi users should not prevent widening When widening an IndVar that has LCSSA Phi users outside the loop, we can safely widen it as usual and then truncate the result outside the loop without hurting the performance. Differential Revision: https://reviews.llvm.org/D91593 Reviewed By: skatkov	2020-11-27 11:19:54 +07:00
Roman Lebedev	f3abd54958	Revert "[SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions" Many bots are unhappy, at the very least missed a few codegen tests, and possibly this has a logic hole inducing a miscompile (will be really awesome to have ready reproducer..) Need to investigate. This reverts commit `2245fb8aaa`.	2020-11-26 23:13:43 +03:00
Roman Lebedev	2245fb8aaa	[SimplifyCFG] FoldBranchToCommonDest: lift use-restriction on bonus instructions 1. It doesn't make sense to enforce that the bonus instruction is only used once in its basic block. What matters is whether those user instructions fit within our budget, sure, but that is another question. 2. It doesn't make sense to enforce that said bonus instructions are only used within their basic block. Perhaps the branch condition isn't using the value computed by said bonus instruction, and said bonus instruction is simply being calculated to be used in successors? So iff we can clone bonus instructions, to lift these restrictions, we just need to carefully update their external uses to use the new cloned instructions. Notably, this transform (even without this change) appears to be poison-unsafe as per alive2, but is otherwise (including the patch) legal. We don't introduce any new PHI nodes, but only "move" the instructions around, i'm not really seeing much potential for extra cost modelling for the transform, especially since now we allow at most one such bonus instruction by default. This causes the fold to fire +11.4% more (13216 -> 14725) as of vanilla llvm test-suite + RawSpeed. The motivational pattern is IEEE-754-2008 Binary16->Binary32 extension code: `ca57d77fb2/src/librawspeed/common/FloatingPoint.h (L115-L120)` ^ that should be a switch, but it is not now: https://godbolt.org/z/bvja5v That being said, even though it seemed like this would fix it: https://godbolt.org/z/xGq3TM apparently that fold is happening somewhere else after all, so something else also has a similar 'artificial' restriction.	2020-11-26 22:51:22 +03:00
Roman Lebedev	65db7d38e0	[NFC][SimplifyCFG] Add statistic to `FoldBranchToCommonDest()` fold	2020-11-26 22:51:21 +03:00
Nikita Popov	4df8efce80	[AA] Split up LocationSize::unknown() Currently, we have some confusion in the codebase regarding the meaning of LocationSize::unknown(): Some parts (including most of BasicAA) assume that LocationSize::unknown() only allows accesses after the base pointer. Some parts (various callers of AA) assume that LocationSize::unknown() allows accesses both before and after the base pointer (but within the underlying object). This patch splits up LocationSize::unknown() into LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer() to make this completely unambiguous. I tried my best to determine which one is appropriate for all the existing uses. The test changes in cs-cs.ll in particular illustrate a previously clearly incorrect AA result: We were effectively assuming that argmemonly functions were only allowed to access their arguments after the passed pointer, but not before it. I'm pretty sure that this was not intentional, and it's certainly not specified by LangRef that way. Differential Revision: https://reviews.llvm.org/D91649	2020-11-26 18:39:55 +01:00
Florian Hahn	bd0b1311db	[VPlan] Turn VPReplicateRecipe into a VPValue. Update VPReplicateRecipe to inherit from VPValue. This still does not update scalarizeInstruction to set the result for the VPValue of VPReplicateRecipe, because this first requires tracking scalar values in VPTransformState. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D91500	2020-11-26 13:50:24 +00:00
David Stenberg	384996f9e1	[IndVarSimplify] Fix Modified status when handling dead PHI nodes When bailing out in rewriteLoopExitValues() you could be left with PHI nodes in the DeadInsts vector. Those would not be handled by the use of RecursivelyDeleteTriviallyDeadInstructions() in IndVarSimplify. This resulted in the IndVarSimplify pass returning an incorrect modified status. This was caught by the expensive check introduced in D86589. This patch changes IndVarSimplify so that it deletes those PHI nodes, using RecursivelyDeleteDeadPHINode(). This fixes PR47486. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D91153	2020-11-26 14:28:21 +01:00
Zhengyang Liu	345fcccb33	Fix use-of-uninitialized-value in rG75f50e15bf8f Differential Revision: https://reviews.llvm.org/D71126	2020-11-26 01:39:22 -07:00
Max Kazantsev	664e1da485	[LoopLoadElim] Make sure all loops are in simplify form. PR48150 LoopLoadElim may end up expanding an AddRec from a loop which is not the current loop. This loop may not be in simplify form. We figure it out after the no-return point, so cannot bail in this case. AddRec requires simplify form to expand. The only way to ensure this does not crash is to simplify all loops beforehand. The issue only exists in new PM. Old PM requests LoopSimplify required pass and it simplifies all loops before the opt begins. Differential Revision: https://reviews.llvm.org/D91525 Reviewed By: asbirlea, aeubanks	2020-11-26 10:51:11 +07:00
Roman Lebedev	a8d74517dc	[PassManager] Run Induction Variable Simplification pass after Recognize loop idioms pass, not before Currently, `-indvars` runs first, and then immediately after `-loop-idiom` does. I'm not really sure if `-loop-idiom` requires `-indvars` to run beforehand, but i'm very sure that `-indvars` requires `-loop-idiom` to run afterwards, as it can be seen in the phase-ordering test. LoopIdiom runs on two types of loops: countable ones, and uncountable ones. For uncountable ones, IndVars obviously didn't make any change to them, since they are uncountable, so for them the order should be irrelevant. For countable ones, well, they should have been countable before IndVars for IndVars to make any change to them, and since SCEV is used on them, it shouldn't matter if IndVars have already canonicalized them. So i don't really see why we'd want the current ordering. Should this cause issues, it will give us a reproducer test case that shows flaws in this logic, and we then could adjust accordingly. While this is quite likely beneficial in-the-wild already, it's a required part for the full motivational pattern behind `left-shift-until-bittest` loop idiom (D91038). Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D91800	2020-11-25 19:20:07 +03:00
Cullen Rhodes	1ba4b82f67	[LAA] NFC: Rename [get]MaxSafeRegisterWidth -> [get]MaxSafeVectorWidthInBits MaxSafeRegisterWidth is a misnomer since it actually returns the maximum safe vector width. Register suggests it relates directly to a physical register where it could be a vector spanning one or more physical registers. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91727	2020-11-25 13:06:26 +00:00
Florian Hahn	ad5b83ddcf	[VPlan] Add VPReductionSC to VPUser::classof, unify VPValue IDs. This is a follow-up to `00a6601136` to make isa<VPReductionRecipe> work and unifies the VPValue ID names, by making sure they all consistently start with VPV*.	2020-11-25 11:08:25 +00:00
David Green	e0c479cd0e	[VPlan] Switch VPWidenRecipe to be a VPValue Similar to other patches, this makes VPWidenRecipe a VPValue. Because of the way it interacts with the reduction code it also slightly alters the way that VPValues are registered, removing the up front NeedDef and using getOrAddVPValue to create them on-demand if needed instead. Differential Revision: https://reviews.llvm.org/D88447	2020-11-25 08:25:06 +00:00
David Green	00a6601136	[VPlan] Turn VPReductionRecipe into a VPValue This converts the VPReductionRecipe into a VPValue, like other VPRecipes, in preparation for traversing def-use chains. It also makes it a VPUser, now storing the used VPValues as operands. It doesn't yet change how the VPReductionRecipes are created. It will need to call replaceAllUsesWith from the original recipe they replace, but that is not done yet as VPWidenRecipe needs to be created first. Differential Revision: https://reviews.llvm.org/D88382	2020-11-25 08:25:05 +00:00
Kazu Hirata	1c82d32089	[CHR] Use pred_size (NFC)	2020-11-24 22:52:30 -08:00
Max Kazantsev	28d7ba1543	[IndVars] Use more precise context when eliminating narrowing When deciding to widen narrow use, we may need to prove some facts about it. For proof, the context is used. Currently we take the instruction being widened as the context. However, we may be more precise here if we take as context the point that dominates all users of instruction being widened. Differential Revision: https://reviews.llvm.org/D90456 Reviewed By: skatkov	2020-11-25 11:47:39 +07:00
Philip Reames	10ddb927c1	[SCEV] Use isa<> pattern for testing for CouldNotCompute [NFC] Some older code - and code copied from older code - still directly tested against the singleton result of SE::getCouldNotCompute. Using the isa<SCEVCouldNotCompute> form is both shorter and more readable.	2020-11-24 18:47:49 -08:00
Sanjay Patel	678b9c5dde	[InstCombine] try difference-of-shifts factorization before negator We need to preserve wrapping flags to allow better folds. The cases with geps may be non-intuitive, but that appears to agree with Alive2: https://alive2.llvm.org/ce/z/JQcqw7 We create 'nsw' ops independent from the original wrapping on the sub.	2020-11-24 13:56:30 -05:00
Philip Reames	075468621c	[LoopVec] Add a minor clarifying comment	2020-11-24 10:45:06 -08:00
Teresa Johnson	6e4c1cf293	[ThinLTO/WPD] Enable -wholeprogramdevirt-skip in ThinLTO backends Previously this option could be used to skip devirtualizations of the given functions in regular LTO and in the ThinLTO indexing step. This change allows them to be skipped in the backend as well, which is useful when debugging WPD in a distributed ThinLTO backend. Differential Revision: https://reviews.llvm.org/D91812	2020-11-24 09:35:07 -08:00
Ayal Zaks	32d9a386bf	[LV] Keep Primary Induction alive when folding tail by masking Fix PR47390. The primary induction should be considered alive when folding tail by masking, because it will be used by said masking; even when it may otherwise appear useless: feeding only its own 'bump', which is correctly considered dead, and as the 'bump' of another induction variable, which may wrongfully want to consider its bump = the primary induction, dead. Differential Revision: https://reviews.llvm.org/D92017	2020-11-24 15:12:54 +02:00
Arthur Eubanks	932e4f8815	[FunctionAttrs][NPM] Fix handling of convergent The legacy pass didn't properly detect indirect calls. We can still remove the convergent attribute when there are indirect calls. The LangRef says: > When it appears on a call/invoke, the convergent attribute indicates that we should treat the call as though we’re calling a convergent function. This is particularly useful on indirect calls; without this we may treat such calls as though the target is non-convergent. So don't skip handling of convergent when there are unknown calls. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D89826	2020-11-23 21:09:41 -08:00
Philip Reames	1a9c72f8a8	[LoopVec] Reuse a lambda [NFC] Minor code refactor to improve readability.	2020-11-23 21:07:34 -08:00
Philip Reames	b06a2ad94f	[LoopVectorizer] Lower uniform loads as a single load (instead of relying on CSE) A uniform load is one which loads from a uniform address across all lanes. As currently implemented, we cost model such loads as if we did a single scalar load + a broadcast, but the actual lowering replicates the load once per lane. This change tweaks the lowering to use the REPLICATE strategy by marking such loads (and the computation leading to their memory operand) as uniform after vectorization. This is a useful change in itself, but its real purpose is to pave the way for a following change which will generalize our uniformity logic. In review discussion, there was an issue raised with coupling cost modeling with the lowering strategy for uniform inputs. The discussion on that item remains unsettled and is pending larger architectural discussion. We decided to move forward with this patch as is, and revise as warranted once the bigger picture design questions are settled. Differential Revision: https://reviews.llvm.org/D91398	2020-11-23 15:32:17 -08:00
Sanjay Patel	ab29f091eb	[InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps This is a retry of `324a53205`. I cautiously reverted that at `6aa3fc4` because the rules about gep math were not clear. Since then, we have added this line to LangRef for gep inbounds: "The successive addition of offsets (without adding the base address) does not wrap the pointer index type in a signed sense (nsw)." See D90708 and post-commit comments on the revert patch for more details.	2020-11-23 16:50:09 -05:00
Arthur Eubanks	3c811ce4f3	[NPM] Share pass building options with legacy PM We should share options when possible. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D91741	2020-11-23 13:04:05 -08:00
Sjoerd Meijer	33b2c88fa8	[LoopFlatten] Widen IV, support ZExt. I disabled the widening in `fa5cb4b` because it ran into an assert, which was related to replacing values with different types. I forgot that an extend could also be a zero-extend, which I have added now. This means that the approach now is to create and insert a trunc value of the outerloop for each user, and use that to replace IV values. Differential Revision: https://reviews.llvm.org/D91690	2020-11-23 08:57:19 +00:00
Kazu Hirata	df73b8c174	[ValueMapper] Remove unused declaration remapFunction (NFC) The function declaration with two parameters was introduced on Apr 16 2016 in commit `f0d73f95c1` without a corresponding definition.	2020-11-22 21:52:03 -08:00
Kazu Hirata	186d129320	[hwasan] Remove unused declaration shadowBase (NFC) The function was introduced on Jan 23, 2019 in commit `73078ecd38`. Its definition was removed on Oct 27, 2020 in commit `0930763b4b`, leaving the declaration unused.	2020-11-22 20:08:51 -08:00
Kazu Hirata	def7cfb7ff	[InstCombine] Use is_contained (NFC)	2020-11-21 15:47:11 -08:00
Alexey Bataev	0b420d674a	[SLP][NFC]Fix assert condition in newTreeEntry, NFC.	2020-11-20 13:25:21 -08:00
Hongtao Yu	f3c445697d	[CSSPGO] IR intrinsic for pseudo-probe block instrumentation This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story. A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persistent. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. Since these operations are very persistent, or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple of issues: 1. IR optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are in the IR until the very end of compilation, they can still prevent certain IR optimizations and result in lower code quality. 2. The counter atomics may not be fully cleaned up from the code stream eventually. 3. Extra work is needed for re-targeting. We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that come with an atomic operation but does not block desired optimizations as much as possible. More specifically, the semantics associated with the new intrinsic enforce that a pseudo probe is virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. 
The core flag given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but it does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged in order to achieve good correlation quality. Let's now look at an example. Given the following LLVM IR: ``` define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 { bb0: %cmp = icmp eq i32 %x, 0 br i1 %cmp, label %bb1, label %bb2 bb1: br label %bb3 bb2: br label %bb3 bb3: ret void } ``` The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID. ``` define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 { bb0: %cmp = icmp eq i32 %x, 0 call void @llvm.pseudoprobe(i64 837061429793323041, i64 1) br i1 %cmp, label %bb1, label %bb2 bb1: call void @llvm.pseudoprobe(i64 837061429793323041, i64 2) br label %bb3 bb2: call void @llvm.pseudoprobe(i64 837061429793323041, i64 3) br label %bb3 bb3: call void @llvm.pseudoprobe(i64 837061429793323041, i64 4) ret void } ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D86490	2020-11-20 10:39:24 -08:00
Jamie Schmeiser	7f6360cdc6	Reland: Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug. Summary: Expand existing loopsink testing to also test loopsinking using new pass manager. Enable memoryssa for loopsink with new pass manager. This combination exposed a bug that was previously fixed for loopsink without memoryssa. When sinking an instruction into a loop, the source block may not be part of the loop but still needs to be checked for pointer invalidation. This is the fix for bugzilla #39695 (PR 54659) expanded to also work with memoryssa. Respond to review comments. Enable Memory SSA in legacy Loop Sink pass under EnableMSSALoopDependency option control. Update tests accordingly. Respond to review comments. Add options controlling whether memoryssa is used for loop sink, defaulting to off. Expand testing based on these options. Respond to review comments. Properly indicated preserved analyses. This relanding addresses a compile-time performance problem by moving test for profile data earlier to avoid unnecessary computations. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: asbirlea (Alina Sbirlea) Differential Revision: https://reviews.llvm.org/D90249	2020-11-20 10:26:33 -05:00
Arthur Eubanks	b77436047a	[PGO] Make -disable-preinline work with NPM Fixes cspgo_profile_summary.ll under NPM. Reviewed By: xur Differential Revision: https://reviews.llvm.org/D91826	2020-11-19 22:58:55 -08:00
Arthur Eubanks	513d165b80	Port -lower-matrix-intrinsics-minimal to NPM This reuses the existing lower-matrix-intrinsics pass rather than going the legacy pass route of creating a new pass. Use this new variant in the NPM -O0 pipeline. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D91811	2020-11-19 17:42:48 -08:00
Florian Hahn	7fa14a7c69	[ConstraintElimination] Decompose GEP with arbitrary offsets. This patch decomposes `GEP %x, %offset` as 0 + 1 * %x + 1 * %offset.	2020-11-19 22:49:21 +00:00
Geoffrey Martin-Noble	b156514f8d	Remove unused private fields Unused since https://reviews.llvm.org/D91762 and triggering -Wunused-private-field ``` llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:365:13: error: private field 'GetArgTLS' is not used [-Werror,-Wunused-private-field] Constant GetArgTLS; ^ llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:366:13: error: private field 'GetRetvalTLS' is not used [-Werror,-Wunused-private-field] Constant GetRetvalTLS; ``` Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D91820	2020-11-19 13:54:54 -08:00
Roman Lebedev	a91e96702a	[InstCombine] Fold `and(shl(zext(x), width(SIGNMASK) - width(%x)), SIGNMASK)` to `and(sext(%x), SIGNMASK)` One less instruction and reducing use count of zext. As alive2 confirms, we're fine with all the weird combinations of undef elts in constants, but unless the shift amount was undef for a lane, we must sanitize undef mask to zero, since sign bits are no longer zeros. https://rise4fun.com/Alive/d7r ``` ---------------------------------------- Optimization: zz Precondition: ((C1 == (width(%r) - width(%x))) && isSignBit(C2)) %o0 = zext %x %o1 = shl %o0, C1 %r = and %o1, C2 => %n0 = sext %x %r = and %n0, C2 Done: 2016 Optimization is correct! ```	2020-11-20 00:31:27 +03:00
Jianzhou Zhao	6c1c308c0e	Remove deadcode from DFSanFunction::getTLS() clean more deadcode after D84704 Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D91762	2020-11-19 21:10:37 +00:00
Nikita Popov	393b9e9db3	[MemLoc] Require LocationSize argument (NFC) When constructing a MemoryLocation by hand, require that a LocationSize is explicitly specified. D91649 will split up LocationSize::unknown() into two different states, and callers should make an explicit choice regarding the kind of MemoryLocation they want to have.	2020-11-19 21:45:52 +01:00
Sander de Smalen	41c9f4c1ce	[LoopVectorize] NFC: Fix unused variable warning for MaxSafeDepDist rGf571fe6df585127d8b045f8e8f5b4e59da9bbb73 led to a warning of an unused variable for MaxSafeDepDist (written but not used). It seems this variable and assignment can be safely removed.	2020-11-19 17:41:35 +00:00
Joseph Huber	da8bec47ab	[OpenMP] Add Location Fields to Libomptarget Runtime for Debugging Summary: Add support for passing source locations to libomptarget runtime functions using the ident_t struct present in the rest of the libomp API. This will allow the runtime system to give much more insightful error messages and debugging values. Reviewers: jdoerfert grokos Differential Revision: https://reviews.llvm.org/D87946	2020-11-19 12:01:53 -05:00
Simon Moll	a1de391dae	[LV][NFC-ish] Allow vector widths over 256 elements The assertion that vector widths are <= 256 elements was hard-wired in the LV code. E.g., VE allows for vectors up to 512 elements. Test against the TTI vector register bit width instead - this is an NFC for non-asserting builds. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D91518	2020-11-19 10:58:29 +01:00
Max Kazantsev	515105f46b	[NFC] Remove comment (committed ahead of time by mistake)	2020-11-19 16:28:34 +07:00
Max Kazantsev	7c601d09a7	[NFC] Move code earlier as preparation for further changes	2020-11-19 16:27:23 +07:00
Andrew Wei	ea7ab5a42c	[IndVarSimplify] Notify top most loop to drop cached exit counts Some nested loops may share the same ExitingBB, so after we finish FoldExit, we need to notify OuterLoop and SCEV to drop any stored trip count. Patched by: guopeilin Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D91325	2020-11-19 15:37:54 +08:00
Kazu Hirata	43c0e4f665	[Transforms] Use llvm::is_contained (NFC)	2020-11-18 20:42:22 -08:00
Jamie Schmeiser	cff479b145	Revert "Revert "Revert "Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug.""" This reverts commit `e29292969b`. This apparently causes a regression in compile time (ie, it slows down).	2020-11-18 16:07:16 -05:00
Roman Lebedev	7bf89c2174	[NFC][Reassociate] Delay checking isLoadCombineCandidate() until after ShouldConvertOrWithNoCommonBitsToAdd() but before haveNoCommonBitsSet() This appears to improve -O3 compile-time performance somewhat: https://llvm-compile-time-tracker.com/compare.php?from=87369c626114ae17f4c637635c119e6de0856a9a&to=c04b8271e1609b0dfb20609b40844b0c4324517e&stat=instructions It doesn't look like delaying it until after haveNoCommonBitsSet() is better: https://llvm-compile-time-tracker.com/compare.php?from=c04b8271e1609b0dfb20609b40844b0c4324517e&to=b2943d450eaf41b5f76d2dc7350f0a279f64cd99&stat=instructions	2020-11-18 23:57:12 +03:00
Jamie Schmeiser	e29292969b	Revert "Revert "Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug."" This reverts commit `562addba65`. Reverted change too quickly, the failing test cases passed on the next build. So reverting revert (to include the changes).	2020-11-18 15:33:02 -05:00
Florian Hahn	2fead1ac61	[ConstraintElimination] Decompose add nuw/sub nuw. Make use of the more flexible constraint handling added in `a8a79c9069` to decompose add nuw/sub nuw.	2020-11-18 20:29:30 +00:00
Joseph Huber	97e55cfef5	[OpenMP] Add Passing in Original Declaration Names To Mapper API Summary: This patch adds support for passing in the original declaration name in the source file to the libomptarget runtime. This will allow the runtime to provide more intelligent debugging messages. This patch takes the original expression parsed from the OpenMP map / update clause and provides a textual representation if it was explicitly mapped, otherwise it takes the name of the variable declaration as a fallback. The information is passed to the runtime in a global array of strings that matches the existing ident_t source location strings using ";name;filename;column;row;;" Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D89802	2020-11-18 15:28:39 -05:00
Nikita Popov	f4a3969bff	[Inline] Fix incorrectly dropped noalias metadata This is the same fix as `23aeadb89d`, just for CloneScopedAliasMetadata rather than PropagateCallSiteMetadata. In this case the previous outcome was incorrectly dropped metadata, as it was not part of the computed metadata map. The real change in the test is that the first load now retains metadata, the rest of the changes are due to changes in metadata numbering.	2020-11-18 21:22:50 +01:00
Jamie Schmeiser	562addba65	Revert "Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug." This reverts commit `d4ba28bddc`.	2020-11-18 15:17:53 -05:00
Nikita Popov	23aeadb89d	[Inline] Fix incorrect noalias metadata application (PR48209) The VMap also contains a mapping from Argument => Instruction, where the instruction is part of the original function, not the inlined one. The code was assuming that all the instructions in the VMap were inlined. This was a pre-existing problem for the loop access metadata, but was extended to the more common noalias metadata by `27f647d117`, thus causing miscompiles. There is a similar assumption inside CloneAliasScopeMetadata(), so that one likely needs to be fixed as well.	2020-11-18 20:52:58 +01:00
Jamie Schmeiser	d4ba28bddc	Expand existing loopsink testing to also test loopsinking using new pass manager and fix LICM bug. Summary: Expand existing loopsink testing to also test loopsinking using new pass manager. Enable memoryssa for loopsink with new pass manager. This combination exposed a bug that was previously fixed for loopsink without memoryssa. When sinking an instruction into a loop, the source block may not be part of the loop but still needs to be checked for pointer invalidation. This is the fix for bugzilla #39695 (PR 54659) expanded to also work with memoryssa. Respond to review comments. Enable Memory SSA in legacy Loop Sink pass under EnableMSSALoopDependency option control. Update tests accordingly. Respond to review comments. Add options controlling whether memoryssa is used for loop sink, defaulting to off. Expand testing based on these options. Respond to review comments. Properly indicated preserved analyses. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: asbirlea (Alina Sbirlea) Differential Revision: https://reviews.llvm.org/D90249	2020-11-18 14:08:42 -05:00
Piotr Sobczak	b3b9be4ae7	SpeculativeExecution: Allow speculating more instruction types Support more instructions in SpeculativeExecution pass: - ExtractValue - InsertValue - Trunc - Freeze Differential Revision: https://reviews.llvm.org/D91688	2020-11-18 17:00:19 +01:00
Roman Lebedev	34ff90ad5d	[Reassociate] Don't convert add-like-or's into add's if they appear to be part of load-combining idiom As Wei Mi is reporting in post-commit review https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20201116/853479.html teaching -reassociate about add-like-or's (`70472f3`) results in breaking apart load widening patterns, and reassociating them. For now, simply exclude any such `or` that appears to be a root of load widening idiom from the or->add transformation. Note that the heuristic is greedy, it doesn't ensure that loads can actually be widened into a single load.	2020-11-18 17:55:02 +03:00
Florian Hahn	a8a79c9069	[ConstraintElimination] Refactor constraint extraction (NFC). This patch generalizes the extraction of a constraint for a given condition. It allows decompose to return a vector of c * X pairs, which allows de-composing multiple instructions in the future. It also adds more clarifying comments.	2020-11-18 13:59:18 +00:00
Benjamin Kramer	4dbe12e866	[SLP] Use the minimum alignment of the load bundle when forming a masked.gather Instead of the first load. That works when vectorizing contiguous loads, but not for gathers. Fixes a miscompile introduced in `fcad8d3635`.	2020-11-18 12:53:39 +01:00
Max Kazantsev	f33118c61c	[IndVars] Support different types of ExitCount when optimizing exit conds In some cases we can handle IV and iter count of different types. It's a typical situation after IVs have been widened. This patch adds support for such cases, when legal. Differential Revision: https://reviews.llvm.org/D88528 Reviewed By: skatkov	2020-11-18 18:20:05 +07:00
Piotr Sobczak	c173f1b8eb	SpeculativeExecution: Allow speculating more instruction types Support more instructions in SpeculativeExecution pass: - ExtractElement - InsertElement - ShuffleVector Differential Revision: https://reviews.llvm.org/D91633	2020-11-18 09:46:43 +01:00
Arthur Eubanks	9e3b4f4941	[JumpThreading] Make -print-lvi-after-jump-threading work with NPM	2020-11-17 23:15:20 -08:00
Arthur Eubanks	ee7d315cd9	[DCE] Always get TargetLibraryInfo I don't see any reason not to unconditionally retrieve TLI, it's fairly cheap. Fixes calls-errno.ll under NPM. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D91476	2020-11-17 20:41:05 -08:00
Nick Desaulniers	f4c6080ab8	Revert "[IR] add fn attr for no_stack_protector; prevent inlining on mismatch" This reverts commit `b7926ce6d7`. Going with a simpler approach.	2020-11-17 17:27:14 -08:00
Sanjay Patel	08834979e3	[SLP] avoid unreachable code crash/infloop Example based on the post-commit comments for D88735.	2020-11-17 15:10:23 -05:00
Sanjay Patel	4a66a1d17a	[InstCombine] allow vectors for masked-add -> xor fold https://rise4fun.com/Alive/I4Ge Name: add with pow2 mask Pre: isPowerOf2(C2) && (C1 & C2) != 0 && (C1 & (C2-1)) == 0 %a = add i8 %x, C1 %r = and i8 %a, C2 => %n = and i8 %x, C2 %r = xor i8 %n, C2	2020-11-17 13:36:08 -05:00
Simon Pilgrim	f7ebdec987	[InstCombine] visitAnd - remove unnecessary Value X, Y shadow variables. NFCI. Fixes a number of Wshadow warnings.	2020-11-17 17:59:21 +00:00
Simon Pilgrim	abf29d9862	[InstCombine] visitAnd - use m_SpecificInt instead of m_APInt + comparison. NFCI. m_SpecificInt has the same 'no undef element' behaviour as m_APInt so no change there, and anyway we have test coverage for undef elements in the fold. Noticed while fixing a Wshadow warning about shadow Value X, Y variables.	2020-11-17 17:37:10 +00:00
Sanjay Patel	f791ad7e1e	[InstCombine] remove scalar constraint for mask-of-add fold https://rise4fun.com/Alive/V6fP Name: add with low mask Pre: (C1 & (-1 u>> countLeadingZeros(C2))) == 0 %a = add i8 %x, C1 %r = and i8 %a, C2 => %r = and i8 %x, C2	2020-11-17 12:13:45 -05:00
Sanjay Patel	433696911a	[InstCombine] relax constraints on mask-of-add There are 2 changes: 1. Remove the unnecessary one-use check. 2. Remove the unnecessary power-of-2 check. https://rise4fun.com/Alive/V6fP Name: add with low mask Pre: (C1 & (-1 u>> countLeadingZeros(C2))) == 0 %a = add i8 %x, C1 %r = and i8 %a, C2 => %r = and i8 %x, C2	2020-11-17 12:13:44 -05:00
Florian Hahn	52f3714dae	[VPlan] Add VPDef class. This patch introduces a new VPDef class, which can be used to manage VPValues defined by recipes/VPInstructions. The idea here is to mirror VPUser for values defined by a recipe. A VPDef can produce either zero (e.g. a store recipe), one (most recipes) or multiple (VPInterleaveRecipe) result VPValues. To traverse the def-use chain from a VPDef to its users, one has to traverse the users of all values defined by a VPDef. VPValues now contain a pointer to their corresponding VPDef, if one exists. To traverse the def-use chain upwards from a VPValue, we first need to check if the VPValue is defined by a VPDef. If it does not have a VPDef, this means we have a VPValue that is not directly defined inside the plan and we are done. If we have a VPDef, it is defined inside the region by a recipe, which is a VPUser, and the upwards def-use chain traversal continues by traversing all its operands. Note that we need to add an additional field to VPValue to link them to their defs. The space increase is going to be offset by being able to remove the SubclassID field in future patches. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D90558	2020-11-17 16:18:11 +00:00
Matt Arsenault	c5ce6036c1	Linker: Fix linking of byref types This wasn't properly remapping the type like with the other attributes, so this would end up hitting a verifier error after linking different modules using byref.	2020-11-17 11:02:04 -05:00
Anton Afanasyev	0a1d315f9f	[SLPVectorizer] Fix assert	2020-11-17 18:46:31 +03:00
Anton Afanasyev	fcad8d3635	[SLP] Make SLPVectorizer to use `llvm.masked.gather` intrinsic For the scattered operands of load instructions it makes sense to use gathering load intrinsic, which can lower to native instruction for X86/AVX512 and ARM/SVE. This also enables building vectorization tree with entries containing scattered operands. The next step is to add scattered store. Fixes PR47629 and PR47623 Differential Revision: https://reviews.llvm.org/D90445	2020-11-17 18:11:45 +03:00
Florian Hahn	13042da5cb	[ConstraintElimination] Add support for And. When processing conditional branches, if the condition is an AND of 2 compares and the true successor only has the current block as predecessor, queue both conditions for the true successor.	2020-11-17 14:12:15 +00:00
Sander de Smalen	f571fe6df5	Reland [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost. This relands https://reviews.llvm.org/D91059 and reverts commit `30fded75b4`. GetRegUsage now returns 0 when Ty is not a valid vector element type.	2020-11-17 13:45:10 +00:00
Yevgeny Rouban	a57fe210ff	[JumpThreading] Fix branch probabilities in DuplicateCondBranchOnPHIIntoPred() When instructions are cloned from block BB to PredBB in the method DuplicateCondBranchOnPHIIntoPred() number of successors of PredBB changes from 1 to number of successors of BB. So we have to copy branch probabilities from BB to PredBB. Reviewed By: Kazu Hirata Differential Revision: https://reviews.llvm.org/D90841	2020-11-17 14:40:50 +07:00
Max Kazantsev	63dd1734b2	[NFC] Collect ext users into vector instead of finding them twice	2020-11-17 14:01:43 +07:00
Kazu Hirata	1da60f1d44	[Transforms] Use pred_empty (NFC)	2020-11-16 22:09:14 -08:00
Kazu Hirata	5935952c31	[SanitizerCoverage] Use [&] for lambdas (NFC)	2020-11-16 21:45:21 -08:00
Arthur Eubanks	7de6dcd246	[Debugify] Skip debugifying on special/immutable passes With a function pass manager, it would insert debuginfo metadata before getting to function passes while processing the pass manager, causing debugify to skip while running the function passes. Skip special passes + verifier + printing passes. Compared to the legacy implementation of -debugify-each, this additionally skips verifier passes. Probably no need to update the legacy version since it will be obsolete soon. This fixes 2 instcombine tests using -debugify-each under NPM. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D91558	2020-11-16 20:39:46 -08:00
Sjoerd Meijer	fa5cb4b936	[LoopFlatten] Disable IV widening Disable widening of the IV in LoopFlatten while I investigate an assertion failure. Please note that the pass is also not yet enabled by default.	2020-11-16 22:30:52 +00:00
Michael Liao	f375885ab8	[InferAddrSpace] Teach to handle assumed address space. - In certain cases, a generic pointer could be assumed as a pointer to the global memory space or other spaces. With a dedicated target hook to query that address space from a given value, infer-address-space pass could infer and propagate that to all its users. Differential Revision: https://reviews.llvm.org/D91121	2020-11-16 17:06:33 -05:00
Florian Hahn	5a4ca8b550	[ConstraintElimination] Add support for Or. When processing conditional branches, if the condition is an OR of 2 compares and the false successor only has the current block as predecessor, queue both negated conditions for the false successor	2020-11-16 21:48:38 +00:00
Philip Reames	2240d3d054	[LoopVec] Introduce an api for detecting uniform memory ops Split off D91398 at request of reviewer.	2020-11-16 13:30:48 -08:00
Arnold Schwaighofer	d861cc0e43	[coro] Async coroutines: Make sure we can handle control flow in suspend point dispatch function Create a valid basic block with a terminator before we call InlineFunction. Differential Revision: https://reviews.llvm.org/D91547	2020-11-16 11:59:02 -08:00
Sanjay Patel	4e68bc0999	Revert "[InstCombine] add multi-use demanded bits fold for add with low-bit mask" This reverts commit `e56103d250`. There is a stage2 msan failure blamed on this commit: http://lab.llvm.org:8011/#/builders/74/builds/888/steps/9/logs/stdio	2020-11-16 14:48:09 -05:00
Arthur Eubanks	aeb0fdff35	[SimplifyCFG] Respect optforfuzzing in NPM pass Regression caused by refactoring in `cdd006eec9`. See discussion in https://reviews.llvm.org/D89917. Reviewed By: arsenm, morehouse Differential Revision: https://reviews.llvm.org/D91473	2020-11-16 09:56:37 -08:00
Xun Li	985c524001	[Coroutine] Allocas used by StoreInst does not always escape In the existing logic, for a given alloca, as long as its pointer value is stored into another location, it's considered as escaped. This is a bit too conservative. Specifically, in non-optimized build mode, it's common to have patterns of code that first store an alloca somewhere and then load it right away. These uses should be handled without conservatively marking them escaped. This patch tracks how the memory location where an alloca pointer is stored into is being used. As long as we only try to load from that location and nothing else, we can still consider the original alloca not escaping and keep it on the stack instead of putting it on the frame. Differential Revision: https://reviews.llvm.org/D91305	2020-11-16 09:14:44 -08:00
Florian Hahn	8dbe44cb29	Add pass to add !annotate metadata from @llvm.global.annotations. This patch adds a new pass to add !annotation metadata for entries in @llvm.global.annotations, which is generated using __attribute__((annotate("_name"))) on functions in Clang. This has been discussed on llvm-dev as part of RFC: Combining Annotation Metadata and Remarks http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D91195	2020-11-16 14:57:11 +00:00
Benjamin Kramer	2e7455f00a	[LoopFlatten] Fold variable into assert. NFC.	2020-11-16 11:51:39 +01:00
Sjoerd Meijer	9aa773381b	[LoopFlatten] Widen the IV Widen the IV to the widest available and legal integer type, which makes this transformation always safe so that we can skip overflow checks. Motivation is to let this pass trigger on 64-bit targets too, and this is the last patch in a series to achieve this: D90402 moves pass LoopFlatten to just before IndVarSimplify so that IVs are not already widened, D90421 factors out widening from IndVarSimplify into Utils/SimplifyIndVar so that we can also use it in LoopFlatten. Differential Revision: https://reviews.llvm.org/D90640	2020-11-16 10:20:13 +00:00
Max Kazantsev	b4624f65cf	Recommit "[NFC] Move code between functions as a preparation step for further improvement" The bug should be fixed now.	2020-11-16 14:30:34 +07:00
Kazu Hirata	147ccc848a	[JumpThreading] Call eraseBlock when folding a conditional branch This patch teaches the jump threading pass to call BPI->eraseBlock when it folds a conditional branch. Without this patch, BranchProbabilityInfo could end up with stale edge probabilities for the basic block containing the conditional branch -- one edge probability with less than 1.0 and the other for a removed edge. This patch is one of the steps before we can safely re-apply D91017. Differential Revision: https://reviews.llvm.org/D91511	2020-11-15 22:29:30 -08:00
Kazu Hirata	0888eaf3fd	[Loop Fusion] Use pred_empty and succ_empty (NFC)	2020-11-15 20:32:57 -08:00
Kazu Hirata	0c03d1328c	[ADCE] Use succ_empty (NFC)	2020-11-15 19:52:59 -08:00
Kazu Hirata	43a6a1e928	[TRE] Use successors(BB) (NFC)	2020-11-15 19:12:49 -08:00
Kazu Hirata	918e3439e2	[SanitizerCoverage] Use llvm::all_of (NFC)	2020-11-15 19:01:20 -08:00
Serguei Katkov	400f6edce7	[IRCE] Use the same min runtime iteration threshold for BPI and BFI checks In the last change to IRCE the BPI is ignored if BFI is present, however BFI and BPI have different thresholds. Specifically, the BPI approach checks only latch exit probability, so if the loop has only one exit block (latch), the behavior with BFI and BPI is expected to be the same. The BPI approach by default uses threshold 10, so it considers that a loop with an estimated number of iterations less than 10 should not be considered for IRCE optimization. The BFI approach uses the default value 3, which is inconsistent. The CL modifies the code to use the same threshold for both approaches. The test is updated because it has two side-exits (besides the latch) and each of them has a probability 1/16, so BFI estimates the number of runtime iterations to be about 7 (1/16 + 1/16 + some for latch) and the test fails. Reviewers: mkazantsev, ebrevnov Reviewed By: mkazantsev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D91230	2020-11-16 09:21:50 +07:00
Sanjay Patel	6ddc237766	[InstCombine] reduce code for flip of masked bit; NFC There are 1-2 potential follow-up NFC commits to reduce this further on the way to generalizing this for vectors. The operand replacing path should be dead code because demanded bits handles that more generally (D91415).	2020-11-15 15:43:34 -05:00
Sanjay Patel	e56103d250	[InstCombine] add multi-use demanded bits fold for add with low-bit mask I noticed an add example like the one from D91343, so here's a similar patch. The logic is based on existing code for the single-use demanded bits fold. But I only matched a constant instead of using compute known bits on the operands because that was the motivating pattern that I noticed. I think this will allow removing a special-case (but incomplete) dedicated fold within visitAnd(), but I need to untangle the existing code to be sure. https://rise4fun.com/Alive/V6fP Name: add with low mask Pre: (C1 & (-1 u>> countLeadingZeros(C2))) == 0 %a = add i8 %x, C1 %r = and i8 %a, C2 => %r = and i8 %x, C2 Differential Revision: https://reviews.llvm.org/D91415	2020-11-15 15:09:49 -05:00
Florian Hahn	0c119ba8a8	[VPlan] Use VPValue def for VPWidenGEPRecipe. This patch turns VPWidenGEPRecipe into a VPValue and uses it during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D84683	2020-11-15 15:12:47 +00:00
Arthur Eubanks	6e04da0a5a	[DCE] Port -redundant-dbg-inst-elim to NPM This is used to test RemoveRedundantDbgInstrs(), which is used by other passes. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D91477	2020-11-14 16:55:20 -08:00
Florian Hahn	a70b511e78	Recommit "[VPlan] Use VPValue def for VPWidenSelectRecipe." This reverts the revert commit `c8d73d939f`. It includes a fix for cases where we missed inserting VPValues for some selects, which should fix PR48142.	2020-11-14 20:00:25 +00:00
Arnold Schwaighofer	8fb73cecfd	[Coroutines] Make sure that async coroutine context size is a multiple of the alignment requirement This simplifies the code the allocator has to execute Differential Revision: https://reviews.llvm.org/D91471	2020-11-14 04:56:56 -08:00
Roman Lebedev	6861d938e5	Revert "clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM" See discussion in https://bugs.llvm.org/show_bug.cgi?id=45073 / https://reviews.llvm.org/D66324#2334485 the implementation is known-broken for certain inputs, the bug report was up for a significant amount of time, and there has been no activity to address it. Therefore, just completely rip out all of misexpect handling. I suspect fixing it requires redesigning the internals of MD_misexpect. Should anyone commit to fixing the implementation problem, starting from a clean slate may be better anyway. This reverts commit `7bdad08429`, and some of its follow-ups that don't stand on their own.	2020-11-14 13:12:38 +03:00
Akira Hatanaka	2ed3a76745	[ObjC][ARC] Add and use a function which finds and returns the single dependency. NFC Use findSingleDependency in place of FindDependencies and stop passing a set of Instructions around. Modify FindDependencies to return a boolean flag which indicates whether the dependencies it has found are all valid.	2020-11-13 14:02:58 -08:00
Akira Hatanaka	00d0974e62	Move variable declarations to functions in which they are used. NFC	2020-11-13 14:02:58 -08:00
Guozhi Wei	a20220d25b	[AlwaysInliner] Call mergeAttributesForInlining after inlining Like inlineCallIfPossible and InlinerPass, after inlining mergeAttributesForInlining should be called to merge the callee's attributes into the caller. But it is not called in AlwaysInliner, which leaves the caller's attributes inconsistent with the inlined code. The attached test case demonstrates that attribute "min-legal-vector-width"="512" is not merged into the caller without this patch, and it causes a failure in SelectionDAG when lowering the inlined AVX512 intrinsic. Differential Revision: https://reviews.llvm.org/D91446	2020-11-13 12:01:35 -08:00
Jianzhou Zhao	06c9b4aaa9	Extend the dfsan store/load callback with write/read address This helped debugging. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D91236	2020-11-13 19:46:32 +00:00
Nikita Popov	02dda1c659	[Local] Clean up EmitGEPOffset Handle the emission of the add in a single place, instead of three different ones. Don't emit an unnecessary add with zero to start with. It will get dropped by InstCombine, but we may as well not create it in the first place. This also means that InstCombine does not need to specially handle this extra add. This is conceptually NFC, but can affect worklist order etc.	2020-11-13 18:30:56 +01:00
David Zarzycki	5a327f3337	Revert "[NFC] Move code between functions as a preparation step for further improvement" This reverts commit `08016ac32b`. A bunch of tests are failing my local two stage builder.	2020-11-13 10:52:49 -05:00
serge-sans-paille	95537f4508	llvmbuildectomy - compatibility with ocaml bindings Use exact component name in add_ocaml_library. Make expand_topologically compatible with new architecture. Fix quoting in is_llvm_target_library. Fix LLVMipo component name. Write release note.	2020-11-13 14:35:52 +01:00
Florian Hahn	8bb6347939	Add !annotation metadata and remarks pass. This patch adds a new !annotation metadata kind which can be used to attach annotation strings to instructions. It also adds a new pass that emits summary remarks per function with the counts for each annotation kind. The intended use case for this new metadata is annotating 'interesting' instructions, and the remarks should provide additional insight into transformations applied to a program. To motivate this, consider these specific questions we would like to get answered: * How many stores added for automatic variable initialization remain after optimizations? Where are they? * How many runtime checks inserted by a frontend could be eliminated? Where are the ones that did not get eliminated? Discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks' (http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html) Reviewed By: thegameg, jdoerfert Differential Revision: https://reviews.llvm.org/D91188	2020-11-13 13:24:10 +00:00
Max Kazantsev	08016ac32b	[NFC] Move code between functions as a preparation step for further improvement	2020-11-13 18:12:45 +07:00
Max Kazantsev	185cface2e	[NFC] Refactor lambda into static function	2020-11-13 17:42:23 +07:00
Max Kazantsev	68490aec4e	[NFC] Move lambdas into static functions	2020-11-13 17:07:25 +07:00
serge-sans-paille	9218ff50f9	llvmbuildectomy - replace llvm-build by plain cmake No longer rely on an external tool to build the llvm component layout. Instead, leverage the existing `add_llvm_componentlibrary` cmake function and introduce `add_llvm_component_group` to accurately describe component behavior. These functions store extra properties in the created targets. These properties are processed once all components are defined to resolve library dependencies and produce the header expected by llvm-config. Differential Revision: https://reviews.llvm.org/D90848	2020-11-13 10:35:24 +01:00
Max Kazantsev	9224d322a2	[IndVars] Fix branches exiting by true with invariant conditions Forgot to invert the condition for them.	2020-11-13 15:52:00 +07:00
Max Kazantsev	0a1d394bf3	[NFC] Refactor loop-invariant getters to return Optional	2020-11-13 15:03:10 +07:00
Arthur Eubanks	b9406121a0	[NFC] Removed unused variable Obsolete as of https://reviews.llvm.org/D91046.	2020-11-12 22:24:57 -08:00
Akira Hatanaka	09266e4af0	[ObjC][ARC] Clear the lists of basic blocks and instructions before continuing the loop This fixes a bug introduced in `c6f1713c46`.	2020-11-12 22:20:02 -08:00
Max Kazantsev	77efb73c67	[IndVars] Replace checks with invariants if we cannot remove them If we cannot prove that the check is trivially true, but can prove that it either fails on the 1st iteration or never fails, we can replace it with first iteration check. Differential Revision: https://reviews.llvm.org/D88527 Reviewed By: skatkov	2020-11-13 12:23:12 +07:00
Sanjay Patel	0abde4bc92	[InstCombine] fold sub of low-bit masked value from offset of same value There might be some demanded/known bits way to generalize this, but I'm not seeing it right now. This came up as a regression when I was looking at a different demanded bits improvement. https://rise4fun.com/Alive/5fl Name: general Pre: ((-1 << countTrailingZeros(C1)) & C2) == 0 %a1 = add i8 %x, C1 %a2 = and i8 %x, C2 %r = sub i8 %a1, %a2 => %r = and i8 %a1, ~C2 Name: test 1 %a1 = add i8 %x, 192 %a2 = and i8 %x, 10 %r = sub i8 %a1, %a2 => %r = and i8 %a1, -11 Name: test 2 %a1 = add i8 %x, -108 %a2 = and i8 %x, 3 %r = sub i8 %a1, %a2 => %r = and i8 %a1, -4	2020-11-12 20:10:28 -05:00
Jianzhou Zhao	2d96859ea6	[msan] Break the getShadow loop after matching an argument Reviewed-by: eugenis Differential Revision: https://reviews.llvm.org/D91320	2020-11-12 19:48:59 +00:00
Alexander Kornienko	76b6cb515b	Fix unused variable warning in release builds	2020-11-12 18:14:06 +01:00
Jamie Schmeiser	f79b483385	[NFC intended] Refactor SinkAndHoistLICMFlags to allow others to construct without exposing internals Summary: Refactor SinkAndHoistLICMFlags from a struct to a class with accessors and constructors to allow other classes to construct flags with meaningful defaults while not exposing LICM internal details. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: asbirlea (Alina Sbirlea) Differential Revision: https://reviews.llvm.org/D90482	2020-11-12 15:06:59 +00:00
Xun Li	94a45a8098	Revert "[Coroutine] Allocas used by StoreInst does not always escape" This reverts commit `8bc7b9278e`, which landed by accident.	2020-11-11 21:09:39 -08:00
Max Kazantsev	d6dd938589	[IndVars] IV user should not prevent use widening Sometimes an instruction we are trying to widen is used by the IV (which means the instruction is the IV increment). Currently this may prevent its widening. We should ignore such a user because it will be dead once the transform is done anyway. Differential Revision: https://reviews.llvm.org/D90920 Reviewed By: fhahn	2020-11-12 12:02:01 +07:00
Xun Li	8bc7b9278e	[Coroutine] Allocas used by StoreInst does not always escape In the existing logic, for a given alloca, as long as its pointer value is stored into another location, it's considered as escaped. This is a bit too conservative. Specifically, in non-optimized build mode, it is common to have patterns of code that first store an alloca somewhere and then load it right away. These uses should be handled without conservatively marking them escaped. This patch tracks how the memory location where an alloca pointer is stored into is being used. As long as we only try to load from that location and nothing else, we can still consider the original alloca not escaping and keep it on the stack instead of putting it on the frame. Differential Revision: https://reviews.llvm.org/D91305	2020-11-11 20:53:51 -08:00
Max Kazantsev	2e01ceafaa	[IndVars] Recognize 'sub nuw' expressed as 'add' for widening InstCombine canonicalizes 'sub nuw' instructions to 'add' without the `nuw` flag. The typical case where we see it is decrementing induction variables. For them, IndVars fails to prove that it's legal to widen them, and inserts unprofitable `zext`'s. This patch adds recognition of such pattern using SCEV. Differential Revision: https://reviews.llvm.org/D89550 Reviewed By: fhahn, skatkov	2020-11-12 10:51:29 +07:00
Arnold Schwaighofer	431337662e	[coro] Async coroutines: Allow more than 3 arguments in the dispatch function We need to be able to call function pointers. Inline the dispatch function. Also inline the context projection function. Transfer debug locations from the suspend point to the inlined functions. Use the function argument index instead of the function argument in coro.id.async. This solves any spurious use issues. Coerce the arguments of the tail call function at a suspend point. The LLVM optimizer seems to drop casts leading to a vararg intrinsic. rdar://70097093 Differential Revision: https://reviews.llvm.org/D91098	2020-11-11 15:25:28 -08:00
Arthur Eubanks	d9cbceb041	[CGSCC][Inliner] Handle new non-trivial edges in updateCGAndAnalysisManagerForPass Previously the inliner did a bit of a hack by adding ref edges for all new edges introduced by performing an inline before calling updateCGAndAnalysisManagerForPass(). This was because updateCGAndAnalysisManagerForPass() didn't handle new non-trivial call edges. This adds handling of non-trivial call edges to updateCGAndAnalysisManagerForPass(). The inliner called updateCGAndAnalysisManagerForFunctionPass() since it was handling adding newly introduced edges (so updateCGAndAnalysisManagerForPass() would only have to handle promotion), but now it needs to call updateCGAndAnalysisManagerForCGSCCPass() since updateCGAndAnalysisManagerForPass() is now handling the new call edges and function passes cannot add new edges. We follow the previous path of adding trivial ref edges then letting promotion handle changing the ref edges to call edges and the CGSCC updates. So this still does not allow adding call edges that result in an addition of a non-trivial ref edge. This is in preparation for better detecting devirtualization. Previously since the inliner itself would add ref edges, updateCGAndAnalysisManagerForPass() would think that promotion and thus devirtualization had happened after any sort of inlining. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D91046	2020-11-11 13:43:49 -08:00
Jianzhou Zhao	0dd87825db	Add a flag to control whether to propagate labels from condition values to results Before the change, DFSan always does the propagation. W/o origin tracking, it is harder to understand such flows. After the change, the flag is off by default. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D91234	2020-11-11 20:41:42 +00:00
Akira Hatanaka	5e85d00ed6	Move variable declarations to functions in which they are used. NFC	2020-11-11 10:58:43 -08:00
Sander de Smalen	30fded75b4	Revert "[LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost." This reverts commits: * [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost. `b873aba394`. * [LoopVectorizer] Silence warning in GetRegUsage. `9ff701100a`.	2020-11-11 14:41:55 +00:00
Simon Pilgrim	1a62ca65c1	[KnownBits] Add KnownBits::commonBits helper. NFCI. We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........	2020-11-11 12:15:54 +00:00
Sander de Smalen	9ff701100a	[LoopVectorizer] Silence warning in GetRegUsage. This patch silences the warning: error: lambda capture 'DL' is not used [-Werror,-Wunused-lambda-capture] auto GetRegUsage = [&DL, &TTI=TTI](Type *Ty, ElementCount VF) { ~^~~ 1 error generated. Introduced in: https://reviews.llvm.org/rGb873aba3943c067a5efd5303cbdf5aeb0732cf88	2020-11-11 10:54:20 +00:00
Sander de Smalen	b873aba394	[LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost. This is more accurate than dividing the bitwidth based on the element count by the maximum register size, as it can just reuse whatever has been calculated for legalization of these types. This change is also necessary when calculating register usage for scalable vectors, where the legalization of these types cannot be done based on the widest register size, because that does not take the 'vscale' component into account. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D91059	2020-11-11 10:18:50 +00:00
Sander de Smalen	0141f5a49d	[LoopVectorizer] NFC: Return ElementCount from compute[Feasible]MaxVF Interfaces changed to return `ElementCount`: * LoopVectorizationCostModel::computeMaxVF * LoopVectorizationCostModel::computeFeasibleMaxVF This is NFC for fixed-width vectors. Reviewed By: dmgreen, ctetreau Differential Revision: https://reviews.llvm.org/D90880	2020-11-11 09:55:06 +00:00
Chen Zheng	4eb8359e74	[EarlyCSE] delete abs/nabs handling delete abs/nabs handling in earlycse pass to avoid bugs related to hashing values. After abs/nabs is canonicalized to intrinsics in D87188, we should get CSE ability for abs/nabs back. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D90734	2020-11-10 21:10:58 -05:00
Florian Hahn	c8d73d939f	Revert "[VPlan] Use VPValue def for VPWidenSelectRecipe." This reverts commit `a8e50f1c6e`. This reportedly breaks building the Linux kernel. https://bugs.llvm.org/show_bug.cgi?id=48142	2020-11-10 22:50:46 +00:00
Bruno Cardoso Lopes	dc14542a71	[Coroutines] Add missing llvm.dbg.declare's to cover for more allocas Tracking local variables across suspend points is still somewhat incomplete. Consider this coroutine snippet: ``` resumable foo() { int x[10] = {}; int a = 3; co_await std::experimental::suspend_always(); a++; x[0] = 1; a += 2; x[1] = 2; a += 3; x[2] = 3; } ``` Can't manage to print `a` or `x` if they turn out to be allocas during CoroSplit (which happens if you build this code with `-O0` prior to this commit): ``` * thread #1, queue = 'com.apple.main-thread', stop reason = step over frame #0: 0x0000000100003729 main-noprint`foo() at main-noprint.cpp:43:5 40 co_await std::experimental::suspend_always(); 41 a++; 42 x[0] = 1; -> 43 a += 2; 44 x[1] = 2; 45 a += 3; 46 x[2] = 3; (lldb) p x error: <user expression 21>:1:1: use of undeclared identifier 'x' x ^ ``` The generated IR contains a `llvm.dbg.declare` for `x` in it's initialization basic block. After CoroSplit, the `llvm.dbg.declare` might not dominate all of `x` uses and we lose debugging quality. Add `llvm.dbg.value`s to all relevant basic blocks such that if later transformations break the dominance the reliable debug info is already in place. For instance, this BB: ``` await.ready: ... %arrayidx = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 0, !dbg !760 ... %arrayidx19 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 1, !dbg !763 ... %arrayidx21 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 2, !dbg !766 ``` becomes: ``` await.ready: ... call void @llvm.dbg.value(metadata [10 x i32]* %x.reload.addr, metadata !751, metadata !DIExpression()), !dbg !753 ... %arrayidx = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 0, !dbg !760 ... %arrayidx19 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 1, !dbg !763 ... 
%arrayidx21 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 2, !dbg !766 ``` Differential Revision: https://reviews.llvm.org/D90772	2020-11-10 12:36:07 -08:00
Sjoerd Meijer	2ef47910d5	[LoopFlatten] Run it earlier, just before IndVarSimplify This is a prep step for widening induction variables in LoopFlatten if this is possible (D90640), to avoid having to perform certain overflow checks. Since IndVarSimplify may already widen induction variables, we want to run LoopFlatten just before IndVarSimplify. This is a minor reshuffle as both passes were already close after each other. Differential Revision: https://reviews.llvm.org/D90402	2020-11-10 20:22:41 +00:00
Sjoerd Meijer	706ead0e87	[LoopFlatten] Make it a FunctionPass This converts LoopFlatten from a LoopPass to a FunctionPass so that we don't run into problems of a loop pass deleting a (inner)loop. Differential Revision: https://reviews.llvm.org/D90940	2020-11-10 20:03:31 +00:00
Florian Hahn	a8e50f1c6e	[VPlan] Use VPValue def for VPWidenSelectRecipe. This patch turns VPWidenSelectRecipe into a VPValue and uses it during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D84682	2020-11-10 19:39:37 +00:00
Jonas Paulsson	89a1042b6a	Make inferLibFuncAttributes() add SExt attribute on second arg to ldexp. This was missing as discovered by the SystemZ multistage bot: http://lab.llvm.org:8011/#/builders/8, where wrong code resulted when this extension was not performed. Thanks for review by Ulrich Weigand and Roman Lebedev. Differential Revision: https://reviews.llvm.org/D90760	2020-11-10 18:32:15 +01:00
David Green	c7e275388e	[ARM] Don't aggressively unroll vector remainder loops We already do not unroll loops with vector instructions under MVE, but that does not include the remainder loops that the vectorizer produces. These remainder loops will be rarely executed and are not worth unrolling, as the trip count is likely to be low if they get executed at all. Luckily they get llvm.loop.isvectorized to make recognizing them simpler. We have wanted to do this for a while but hit issues with low overhead loops being reverted due to difficult register allocation. With recent changes that seems to be less of an issue now. Differential Revision: https://reviews.llvm.org/D90055	2020-11-10 17:01:31 +00:00
Sanne Wouda	dd03881bd5	Add loop distribution to the LTO pipeline The LoopDistribute pass is missing from the LTO pipeline, so -enable-loop-distribute has no effect during post-link. The pre-link loop distribution doesn't seem to survive the LTO pipeline either. With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43% uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is unaffected. Differential Revision: https://reviews.llvm.org/D89896	2020-11-10 12:04:32 +00:00
Sander de Smalen	f47573f9bf	[LoopVectorizer] NFC: Propagate ElementCount to more interfaces. Interfaces changed to take `ElementCount` as parameters: * LoopVectorizationPlanner::buildVPlans * LoopVectorizationPlanner::buildVPlansWithVPRecipes * LoopVectorizationCostModel::selectVectorizationFactor This patch is NFC for fixed-width vectors. Reviewed By: dmgreen, ctetreau Differential Revision: https://reviews.llvm.org/D90879	2020-11-10 11:11:02 +00:00
Max Kazantsev	25755a0159	[NFC] Add flag to disable IV widening in indvar instance This allows us to have control over IV widening in the pipeline.	2020-11-10 15:10:44 +07:00
Arthur Eubanks	1cbf8e89b5	[NewPM] Port -separate-const-offset-from-gep Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D91095	2020-11-09 17:42:36 -08:00
Michael Kruse	e5dba2d7e5	[OMPIRBuilder] Start 'Create' methods with lower case. NFC. For consistency with the IRBuilder, OpenMPIRBuilder has method names starting with 'Create'. However, the LLVM coding style has method names starting with lower case letters, as all other OpenMPIRBuilder methods already do. The clang-tidy configuration used by Phabricator also warns about the naming violation, adding noise to the reviews. This patch renames all `OpenMPIRBuilder::CreateXYZ` methods to `OpenMPIRBuilder::createXYZ`, and updates all in-tree callers. I tested check-llvm, check-clang, check-mlir and check-flang to ensure that I did not miss a caller. Reviewed By: mehdi_amini, fghanim Differential Revision: https://reviews.llvm.org/D91109	2020-11-09 19:35:11 -06:00
Xun Li	c2cb093d9b	[Coroutine] Move all used local allocas to the .resume function Prior to D89768, any alloca that's used after suspension points will be put on to the coroutine frame, and hence they will always be reloaded in the resume function. However D89768 introduced a more precise way to determine whether an alloca should live on the frame. Allocas that are only used within one suspension region (and hence do not need to live across suspension points) will not be put on the frame. They will remain local to the resume function. When creating the new entry for the .resume function, the existing logic only moved all the allocas from the old entry to the new entry. This covers every alloca from the old entry. However, allocas that are defined after coro.begin are put into a separate basic block during CoroSplit (the PostSpill basic block). We need to make sure these allocas are moved to the new entry as well if they are used. This patch walks through all allocas and checks whether they are still used but are not reachable from the new entry; if so, we move them to the new entry. Differential Revision: https://reviews.llvm.org/D90977	2020-11-09 17:24:49 -08:00
Sjoerd Meijer	e2dcea4489	[LoopFlatten] FlattenInfo bookkeeping. NFC. Introduce struct FlattenInfo to group some of the bookkeeping. Besides this being a bit of a clean-up, it is a prep step for next additions (D90640). I could take things a bit further, but thought this was a good first step also not to make this change too large. Differential Revision: https://reviews.llvm.org/D90408	2020-11-09 14:50:26 +00:00
Florian Hahn	f0d76275cb	[VPlan] Print result value for loads in VPWidenMemoryInst (NFC). For loads, print the result value.	2020-11-09 14:01:29 +00:00
Florian Hahn	537829f2a7	[VPlan] Add isStore helper to VPWidenMemoryInstructionRecipe (NFC). Move logic to check if the recipe is a store to a helper for easier reuse.	2020-11-09 14:01:29 +00:00
Florian Hahn	fec64de261	[VPlan] Use VPValue def for VPWidenCall. This patch turns VPWidenCall into a VPValue and uses it during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D84681	2020-11-09 13:29:41 +00:00
Florian Hahn	091c5c9a18	[VPlan] Add printOperands helper to VPUser (NFC). Factor out the code for printing operands of a VPUser so it can be re-used when printing other recipes.	2020-11-09 12:30:57 +00:00
LemonBoy	42732d33cc	[InstCombine] Fix constant-folding of overflowing arithmetic ops on vectors Feeding vector values to `InstCombiner::OptimizeOverflowCheck` produces a scalar boolean flag if it proves the overflow check can be eliminated. This causes `InstCombiner::CreateOverflowTuple` to crash as it correctly expects a vector of i1 values instead. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D89628	2020-11-09 14:41:07 +03:00
Tim Northover	f7fe7ea24d	[MergeFunctions] fix function attribute comparison in FunctionComparator The comparison of AttributeSets stopped after seeing a matching type attribute. Subsequent mismatching attributes were not detected causing a crash.	2020-11-09 09:19:11 +00:00
Simon Pilgrim	b11eaf5617	[DSE] Don't dereference a dyn_cast<> result - use cast<> instead. NFCI. We were relying on the dyn_cast<> succeeding - better use cast<> and have it assert that it's the correct type than dereference a null result.	2020-11-08 13:07:45 +00:00
Simon Pilgrim	0fe91ad463	[InstCombine] foldSelectFunnelShift - block poison in funnel shift value As raised by @nlopes on D90382 - if this is not a rotate then the select was blocking poison from the 'shift-by-zero' non-TVal, but a funnel shift won't - so freeze it.	2020-11-08 12:58:30 +00:00
Florian Hahn	e8dc17a2b7	[LoopInterchange] Skip non SCEV-able operands in cost function. This fixes a crash when trying to get a SCEV expression for operands that are not SCEV-able.	2020-11-08 11:41:19 +00:00
Pedro Tammela	5e8ecff0d8	[Reg2Mem] add support for the new pass manager This patch refactors the pass to accommodate the new pass manager boilerplate. Differential Revision: https://reviews.llvm.org/D91005	2020-11-08 11:14:05 +00:00
Kazu Hirata	75e46c6328	[Mem2Reg] Use llvm::count instead of std::count (NFC)	2020-11-07 20:18:47 -08:00
Kazu Hirata	c95fff5be7	[JumpThreading] Fix function names (NFC)	2020-11-07 19:35:03 -08:00
Atmn Patel	04a0896487	Revert "[LoopDeletion] Allows deletion of possibly infinite side-effect free loops" This reverts commit `0b17c6e447`. This patch causes a compile-time error in SCEV.	2020-11-07 00:32:12 -05:00
Atmn Patel	0b17c6e447	[LoopDeletion] Allows deletion of possibly infinite side-effect free loops From C11 and C++11 onwards, a forward-progress requirement has been introduced for both languages. In the case of C, loops with non-constant conditionals that do not have any observable side-effects (as defined by 6.8.5p6) can be assumed by the implementation to terminate, and in the case of C++, this assumption extends to all functions. The clang frontend will emit the `mustprogress` function attribute for C++ functions (D86233, D85393, D86841) and emit the loop metadata `llvm.loop.mustprogress` for every loop in C11 or later that has a non-constant conditional. This patch modifies LoopDeletion so that only loops with the `llvm.loop.mustprogress` metadata or loops contained in functions that are required to make progress (`mustprogress` or `willreturn`) are checked for observable side-effects. If these loops do not have an observable side-effect, then we delete them. Loops without observable side-effects that do not satisfy the above conditions will not be deleted. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86844	2020-11-06 22:06:58 -05:00
Atmn Patel	babc224c5d	[LoopDeletion] Remove dead loops with no exit blocks Currently, LoopDeletion refuses to remove dead loops with no exit blocks because it cannot statically determine the control flow after it removes the block. This leads to miscompiles if the loop is an infinite loop and should've been removed. Differential Revision: https://reviews.llvm.org/D90115	2020-11-06 17:08:34 -05:00
Quentin Colombet	a585228027	Prevent LICM and machineLICM from hoisting convergent operations Results of convergent operations are implicitly affected by the enclosing control flows and should not be hoisted out of arbitrary loops. Patch by Xiaoqing Wu <xiaoqing_wu@apple.com> Differential Revision: https://reviews.llvm.org/D90361	2020-11-06 10:26:39 -08:00
Arnold Schwaighofer	c6543cc6b8	llvm.coro.id.async lowering: Parameterize how-to restore the current continuation's context and restart the pipeline after splitting The `llvm.coro.suspend.async` intrinsic takes a function pointer as its argument that describes how-to restore the current continuation's context from the context argument of the continuation function. Before we assumed that the current context can be restored by loading from the context argument's first pointer field (`first_arg->caller_context`). This allows for defining suspension points that reuse the current context for example. Also: llvm.coro.id.async lowering: Add llvm.coro.prepare.async intrinsic Blocks inlining until after the async coroutine was split. Also, change the async function pointer's context size position struct async_function_pointer { uint32_t relative_function_pointer_to_async_impl; uint32_t context_size; } And make the position of the `async context` argument configurable. The position is specified by the `llvm.coro.id.async` intrinsic. rdar://70097093 Differential Revision: https://reviews.llvm.org/D90783	2020-11-06 06:22:46 -08:00
Florian Hahn	d8d1cc647d	[SLP] Also try to vectorize incoming values of PHIs . Currently we do not consider incoming values of PHIs as roots for SLP vectorization. This means we miss scenarios like the one in the test case and PR47670. It appears quite straight-forward to consider incoming values of PHIs as roots for vectorization, but I might be missing something that makes this problematic. In terms of vectorized instructions, this applies to quite a few benchmarks across MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto Same hash: 185 (filtered out) Remaining: 52 Metric: SLP.NumVectorInstructions Program base patch diff test-suite...ProxyApps-C++/HPCCG/HPCCG.test 9.00 27.00 200.0% test-suite...C/CFP2000/179.art/179.art.test 8.00 22.00 175.0% test-suite...T2006/458.sjeng/458.sjeng.test 14.00 30.00 114.3% test-suite...ce/Benchmarks/PAQ8p/paq8p.test 11.00 18.00 63.6% test-suite...s/FreeBench/neural/neural.test 12.00 18.00 50.0% test-suite...rimaran/enc-3des/enc-3des.test 65.00 95.00 46.2% test-suite...006/450.soplex/450.soplex.test 63.00 89.00 41.3% test-suite...ProxyApps-C++/CLAMR/CLAMR.test 177.00 250.00 41.2% test-suite...nchmarks/McCat/18-imp/imp.test 13.00 18.00 38.5% test-suite.../Applications/sgefa/sgefa.test 26.00 35.00 34.6% test-suite...pplications/oggenc/oggenc.test 100.00 133.00 33.0% test-suite...6/482.sphinx3/482.sphinx3.test 103.00 134.00 30.1% test-suite...oxyApps-C++/miniFE/miniFE.test 169.00 213.00 26.0% test-suite.../Benchmarks/Olden/tsp/tsp.test 59.00 73.00 23.7% test-suite...TimberWolfMC/timberwolfmc.test 503.00 622.00 23.7% test-suite...T2006/456.hmmer/456.hmmer.test 65.00 79.00 21.5% test-suite...libquantum/462.libquantum.test 58.00 68.00 17.2% test-suite...ternal/HMMER/hmmcalibrate.test 84.00 98.00 16.7% test-suite...ications/JM/ldecod/ldecod.test 351.00 401.00 14.2% test-suite...arks/VersaBench/dbms/dbms.test 52.00 57.00 9.6% test-suite...ce/Benchmarks/Olden/bh/bh.test 118.00 128.00 8.5% test-suite.../Benchmarks/Bullet/bullet.test 6355.00 
6880.00 8.3% test-suite...nsumer-lame/consumer-lame.test 480.00 519.00 8.1% test-suite...000/183.equake/183.equake.test 226.00 244.00 8.0% test-suite...chmarks/Olden/power/power.test 105.00 113.00 7.6% test-suite...6/471.omnetpp/471.omnetpp.test 92.00 99.00 7.6% test-suite...ications/JM/lencod/lencod.test 1173.00 1261.00 7.5% test-suite...0/253.perlbmk/253.perlbmk.test 55.00 59.00 7.3% test-suite...oxyApps-C/miniAMR/miniAMR.test 92.00 98.00 6.5% test-suite...chmarks/MallocBench/gs/gs.test 446.00 473.00 6.1% test-suite.../CINT2006/403.gcc/403.gcc.test 464.00 491.00 5.8% test-suite...6/464.h264ref/464.h264ref.test 998.00 1055.00 5.7% test-suite...006/453.povray/453.povray.test 5711.00 6007.00 5.2% test-suite...FreeBench/distray/distray.test 102.00 107.00 4.9% test-suite...:: External/Povray/povray.test 4184.00 4378.00 4.6% test-suite...DOE-ProxyApps-C/CoMD/CoMD.test 112.00 117.00 4.5% test-suite...T2006/445.gobmk/445.gobmk.test 104.00 108.00 3.8% test-suite...CI_Purple/SMG2000/smg2000.test 789.00 819.00 3.8% test-suite...yApps-C++/PENNANT/PENNANT.test 233.00 241.00 3.4% test-suite...marks/7zip/7zip-benchmark.test 417.00 428.00 2.6% test-suite...arks/mafft/pairlocalalign.test 627.00 643.00 2.6% test-suite.../Benchmarks/nbench/nbench.test 259.00 265.00 2.3% test-suite...006/447.dealII/447.dealII.test 4641.00 4732.00 2.0% test-suite...lications/ClamAV/clamscan.test 106.00 108.00 1.9% test-suite...CFP2000/177.mesa/177.mesa.test 1639.00 1664.00 1.5% test-suite...oxyApps-C/RSBench/rsbench.test 66.00 65.00 -1.5% test-suite.../CINT2000/252.eon/252.eon.test 3416.00 3444.00 0.8% test-suite...CFP2000/188.ammp/188.ammp.test 1846.00 1861.00 0.8% test-suite.../CINT2000/176.gcc/176.gcc.test 152.00 153.00 0.7% test-suite...CFP2006/444.namd/444.namd.test 3528.00 3544.00 0.5% test-suite...T2006/473.astar/473.astar.test 98.00 98.00 0.0% test-suite...frame_layout/frame_layout.test NaN 39.00 nan% On ARM64, there appears to be a slight regression on SPEC2006, which might be interesting to 
investigate: test-suite...T2006/473.astar/473.astar.test 0.9% Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D88735	2020-11-06 12:50:32 +00:00
Sander de Smalen	4a3bb9ea6c	[VPlan] NFC: Change VFRange to take ElementCount This patch changes the type of Start, End in VFRange to be an ElementCount instead of `unsigned`. This is done as preparation to make VPlans for scalable vectors, but is otherwise NFC. Reviewed By: dmgreen, fhahn, vkmr Differential Revision: https://reviews.llvm.org/D90715	2020-11-06 09:50:20 +00:00
Roman Lebedev	8d0fdd36a3	[IR] CmpInst: Add getFlippedSignednessPredicate() And refactor a few places to use it	2020-11-06 11:31:09 +03:00
Giorgis Georgakoudis	700d2417d8	[CodeExtractor] Replace uses of extracted bitcasts in out-of-region lifetime markers CodeExtractor handles bitcasts in the extracted region that have lifetime markers users in the outer region as outputs. That creates unnecessary alloca/reload instructions and extra lifetime markers. The patch identifies those cases, and replaces uses in out-of-region lifetime markers with new bitcasts in the outer region. Example ``` define void @foo() { entry: %0 = alloca i32 br label %extract extract: %1 = bitcast i32* %0 to i8* call void @llvm.lifetime.start.p0i8(i64 4, i8* %1) call void @use(i32* %0) br label %exit exit: call void @use(i32* %0) call void @llvm.lifetime.end.p0i8(i64 4, i8* %1) ret void } ``` Current extraction ``` define void @foo() { entry: %.loc = alloca i8, align 8 %0 = alloca i32, align 4 br label %codeRepl codeRepl: ; preds = %entry %lt.cast = bitcast i8* %.loc to i8* call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast) %lt.cast1 = bitcast i32* %0 to i8* call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast1) call void @foo.extract(i32* %0, i8** %.loc) %.reload = load i8, i8* %.loc, align 8 call void @llvm.lifetime.end.p0i8(i64 -1, i8* %lt.cast) br label %exit exit: ; preds = %codeRepl call void @use(i32* %0) call void @llvm.lifetime.end.p0i8(i64 4, i8* %.reload) ret void } define internal void @foo.extract(i32* %0, i8** %.out) { newFuncRoot: br label %extract exit.exitStub: ; preds = %extract ret void extract: ; preds = %newFuncRoot %1 = bitcast i32* %0 to i8* store i8* %1, i8** %.out, align 8 call void @use(i32* %0) br label %exit.exitStub } ``` Extraction with patch ``` define void @foo() { entry: %0 = alloca i32, align 4 br label %codeRepl codeRepl: ; preds = %entry %lt.cast1 = bitcast i32* %0 to i8* call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast1) call void @foo.extract(i32* %0) br label %exit exit: ; preds = %codeRepl call void @use(i32* %0) %lt.cast = bitcast i32* %0 to i8* call void 
@llvm.lifetime.end.p0i8(i64 4, i8* %lt.cast) ret void } define internal void @foo.extract(i32* %0) { newFuncRoot: br label %extract exit.exitStub: ; preds = %extract ret void extract: ; preds = %newFuncRoot %1 = bitcast i32* %0 to i8* call void @use(i32* %0) br label %exit.exitStub } ``` Reviewed By: vsk Differential Revision: https://reviews.llvm.org/D90689	2020-11-05 17:01:08 -08:00
Sjoerd Meijer	7eb70158e4	[IndVarSimplify][SimplifyIndVar] Move WidenIV to Utils/SimplifyIndVar. NFCI. This moves WidenIV from IndVarSimplify to Utils/SimplifyIndVar so that we have createWideIV available as a generic helper utility. I.e., this is not only useful in IndVarSimplify, but could be useful for loop transformations. For example, motivation for this refactoring is the loop flatten transformation: if induction variables in a loop nest can be widened, we can avoid having to perform certain overflow checks, enabling this transformation. Differential Revision: https://reviews.llvm.org/D90421	2020-11-05 16:52:47 +00:00
Florian Hahn	be0578f0b4	[GVN] Fix MemorySSA update when replacing assume(false) with stores. When replacing an assume(false) with a store, we have to be more careful with the order we insert the new access. This patch updates the code to look at the accesses in the block to find a suitable insertion point. Alternatively we could check the defining access of the assume, but IIRC there has been some discussion about making assume() readnone, so looking at the access list might be more future proof. Fixes PR48072. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D90784	2020-11-05 12:09:32 +00:00
Simon Pilgrim	4b2be681f4	[InstCombine] Remove orphan InstCombinerImpl method declarations. NFCI.	2020-11-05 10:13:16 +00:00
Arnold Schwaighofer	ea5989b43a	Start of an llvm.coro.async implementation This patch adds the `async` lowering of coroutines. This will be used by the Swift frontend to lower async functions. In contrast to the `retcon` lowering the frontend needs to be in control over control-flow at suspend points as execution might be suspended at these points. This is very much work in progress and the implementation will change as it evolves with the frontend. As such the documentation is lacking detail as some of it might change. rdar://70097093 Reapply with fix for memory sanitizer failure and sphinx failure. Differential Revision: https://reviews.llvm.org/D90612	2020-11-04 10:29:21 -08:00
Arnold Schwaighofer	42f1916640	Revert "Start of an llvm.coro.async implementation" This reverts commit `ea606cced0`. This patch causes memory sanitizer failures on sanitizer-x86_64-linux-fast.	2020-11-04 08:26:20 -08:00
Arnold Schwaighofer	ea606cced0	Start of an llvm.coro.async implementation This patch adds the `async` lowering of coroutines. This will be used by the Swift frontend to lower async functions. In contrast to the `retcon` lowering the frontend needs to be in control over control-flow at suspend points as execution might be suspended at these points. This is very much work in progress and the implementation will change as it evolves with the frontend. As such the documentation is lacking detail as some of it might change. rdar://70097093 Differential Revision: https://reviews.llvm.org/D90612	2020-11-04 07:32:29 -08:00
Roman Lebedev	93f3d7f7b3	[Reassociate] Guard `add`-like `or` conversion into an `add` with profitability check This is slightly better compile-time wise, since we avoid potentially-costly knownbits analysis that will ultimately not allow us to actually do anything with said `add`.	2020-11-04 16:10:34 +03:00
Martin Storsjö	36cf1e7d0e	Revert "[AggressiveInstCombine] Generalize foldGuardedRotateToFunnelShift to generic funnel shifts" This reverts commit `59b22e495c`. That commit broke building for ARM and AArch64, reproducible like this: $ cat apedec-reduced.c a; b(e) { int c; unsigned d = f(); c = d >> 32 - e; return c; } g() { int h = i(); if (a) h = h << a \| b(a); return h; } $ clang -target aarch64-linux-gnu -w -c -O3 apedec-reduced.c clang: ../lib/Transforms/InstCombine/InstructionCombining.cpp:3656: bool llvm::InstCombinerImpl::run(): Assertion `DT.dominates(BB, UserParent) && "Dominance relation broken?"' failed. Same thing for e.g. an armv7-linux-gnueabihf target.	2020-11-04 08:39:32 +02:00
Xun Li	7f34aca083	[musttail] Unify musttail call preceding return checking There is already an API in BasicBlock that checks and returns the musttail call if it precedes the return instruction. Use it instead of manually checking in each place. Differential Revision: https://reviews.llvm.org/D90693	2020-11-03 11:39:27 -08:00
Roman Lebedev	70472f34b2	[Reassociate] Convert `add`-like `or`'s into an `add`'s to allow reassociation InstCombine is quite aggressive in doing the opposite transform, folding `add` of operands with no common bits set into an `or`, and not many things support that new pattern. In this case, teaching Reassociate about it is easy, there's preexisting art for `sub`/`shl`: just convert such an `or` into an `add`: https://rise4fun.com/Alive/Xlyv	2020-11-03 22:30:51 +03:00
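The identity this conversion relies on can be illustrated with a minimal sketch: when two operands share no set bits, addition produces no carries, so `or` and `add` give the same result. The helper names below are hypothetical and are not LLVM APIs; per the follow-up commit above, LLVM establishes the no-common-bits condition through known-bits analysis rather than by inspecting concrete values as this sketch does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): true when no bit is set in both A and B.
constexpr bool haveNoCommonBits(uint32_t A, uint32_t B) { return (A & B) == 0; }

// Exhaustively checks all 8-bit operand pairs: whenever the operands share
// no set bits, no carries occur, so `or` and `add` give the same result.
void checkAddLikeOr() {
  for (uint32_t A = 0; A < 256; ++A)
    for (uint32_t B = 0; B < 256; ++B)
      if (haveNoCommonBits(A, B))
        assert((A | B) == (A + B));
}

// A concrete instance: 0xF0 and 0x0F are disjoint, so `or` behaves like `add`.
static_assert(haveNoCommonBits(0xF0, 0x0F), "operands are disjoint");
static_assert((0xF0u | 0x0Fu) == (0xF0u + 0x0Fu), "or behaves like add");
```

Once such an `or` is rewritten as an `add`, it can take part in reassociation with neighbouring adds, which is the motivation stated in the commit message.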
Sanne Wouda	2ec26d3a23	Revert "Add loop distribution to the LTO pipeline" This reverts commit `6e80318eec`.	2020-11-03 19:29:27 +00:00
Sanne Wouda	6e80318eec	Add loop distribution to the LTO pipeline The LoopDistribute pass is missing from the LTO pipeline, so -enable-loop-distribute has no effect during post-link. The pre-link loop distribution doesn't seem to survive the LTO pipeline either. With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43% uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is unaffected. Differential Revision: https://reviews.llvm.org/D89896	2020-11-03 18:54:24 +00:00
Jameson Nash	59a6ab28c4	[GVN] small improvements to comments	2020-11-03 13:21:48 -05:00
Roman Lebedev	c009d11bda	[InstCombine] Perform C-(X+C2) --> (C-C2)-X transform before using Negator In particular, it makes it fire for C=0, because negator doesn't want to perform that fold since in general it's not beneficial.	2020-11-03 16:06:52 +03:00
Roman Lebedev	e465f9c303	[InstCombine] Negator: - (C - %x) --> %x - C (PR47997) This relaxes one-use restriction on that `sub` fold, since apparently the addition of Negator broke preexisting `C-(C2-X) --> X+(C-C2)` (with C=0) fold.	2020-11-03 16:06:51 +03:00
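As a rough check of the arithmetic behind `C-(C2-X) --> X+(C-C2)`, the sketch below compares both forms using `uint32_t`, whose wrapping arithmetic is assumed here to stand in for LLVM's `i32` `add`/`sub` without `nsw`/`nuw`. The function names are hypothetical. It only demonstrates the algebraic identity, not the one-use restriction or Negator's own logic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names. uint32_t arithmetic wraps modulo 2^32, which is used
// here as a stand-in for i32 add/sub without nsw/nuw flags.
constexpr uint32_t subOfSub(uint32_t C, uint32_t C2, uint32_t X) {
  return C - (C2 - X); // original form: C - (C2 - X)
}
constexpr uint32_t addOfDiff(uint32_t C, uint32_t C2, uint32_t X) {
  return X + (C - C2); // folded form: X + (C - C2)
}

// The C = 0 case from the commit message: -(C2 - x) == x - C2.
static_assert(subOfSub(0, 7, 10) == 3u, "-(7 - 10) == 3");
static_assert(addOfDiff(0, 7, 10) == 3u, "10 - 7 == 3");

// Compares both forms over a few sample values, including ones that wrap.
void checkNegatorFold() {
  const uint32_t Samples[] = {0u, 1u, 7u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu};
  for (uint32_t C : Samples)
    for (uint32_t C2 : Samples)
      for (uint32_t X : Samples)
        assert(subOfSub(C, C2, X) == addOfDiff(C, C2, X));
}
```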
Florian Hahn	d68bed0fa9	[SCCP] Handle bitcast of vector constants. Vectors where all elements have the same known constant range are treated as a single constant range in the lattice. When bitcasting such vectors, there is a mis-match between the width of the lattice value (single constant range) and the original operands (vector). Go to overdefined in that case. Fixes PR47991.	2020-11-03 12:58:39 +00:00
Simon Pilgrim	59b22e495c	[AggressiveInstCombine] Generalize foldGuardedRotateToFunnelShift to generic funnel shifts The fold currently only handles rotation patterns, but with the maturation of backend funnel shift handling we can now realistically handle all funnel shift patterns. This should allow us to begin resolving PR46896 et al. Differential Revision: https://reviews.llvm.org/D90625	2020-11-03 10:49:49 +00:00
Florian Hahn	d9cbf39a37	[SLP] Pass VecPred argument to getCmpSelInstrCost. Check if all compares in VL have the same predicate and pass it to getCmpSelInstrCost, to improve cost-modeling on targets that only support compare/select combinations for certain uniform predicates. This leads to additional vectorization in some cases ``` Same hash: 217 (filtered out) Remaining: 19 Metric: SLP.NumVectorInstructions Program base slp2 diff test-suite...marks/SciMark2-C/scimark2.test 11.00 26.00 136.4% test-suite...T2006/445.gobmk/445.gobmk.test 79.00 135.00 70.9% test-suite...ediabench/gsm/toast/toast.test 54.00 71.00 31.5% test-suite...telecomm-gsm/telecomm-gsm.test 54.00 71.00 31.5% test-suite...CI_Purple/SMG2000/smg2000.test 426.00 542.00 27.2% test-suite...ch/g721/g721encode/encode.test 30.00 24.00 -20.0% test-suite...000/186.crafty/186.crafty.test 116.00 138.00 19.0% test-suite...ications/JM/ldecod/ldecod.test 697.00 765.00 9.8% test-suite...6/464.h264ref/464.h264ref.test 822.00 886.00 7.8% test-suite...chmarks/MallocBench/gs/gs.test 154.00 162.00 5.2% test-suite...nsumer-lame/consumer-lame.test 621.00 651.00 4.8% test-suite...lications/ClamAV/clamscan.test 223.00 231.00 3.6% test-suite...marks/7zip/7zip-benchmark.test 680.00 695.00 2.2% test-suite...CFP2000/177.mesa/177.mesa.test 2121.00 2129.00 0.4% test-suite...:: External/Povray/povray.test 2406.00 2412.00 0.2% test-suite...TimberWolfMC/timberwolfmc.test 634.00 634.00 0.0% test-suite...CFP2006/433.milc/433.milc.test 1036.00 1036.00 0.0% test-suite.../Benchmarks/nbench/nbench.test 321.00 321.00 0.0% test-suite...ctions-flt/Reductions-flt.test NaN 5.00 nan% ``` Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D90124	2020-11-03 10:16:43 +00:00
Max Kazantsev	46b2e85f0f	[NFC] Refactor code in IndVars, preparing for further improvement	2020-11-03 15:08:12 +07:00
Max Kazantsev	a44b7322a2	[NFC] Split lambda into 2 parts for further reuse	2020-11-03 14:13:55 +07:00
Max Kazantsev	f847094c24	[IndVars] Use knowledge about execution on last iteration when removing checks If we know that some check will not be executed on the last iteration, we can use this fact to eliminate its check. Differential Revision: https://reviews.llvm.org/D88210 Reviewed By: ebrevnov	2020-11-03 13:38:58 +07:00
Alina Sbirlea	f514b32a89	[LICM] Add assert of AST/MSSA exclusiveness. The API `canSinkOrHoistInst` may be called by LoopSink. Add assert to avoid having two analyses passed in.	2020-11-02 18:04:43 -08:00
Akira Hatanaka	b0f1d7d562	Remove unused parameter	2020-11-02 17:40:06 -08:00
Ettore Tiotto	4274cbba1c	[PartialInliner]: Handle code regions in a switch stmt cases This patch enhances computeOutliningColdRegionsInfo() to allow it to consider regions containing a single basic block and a single predecessor as candidate for partial inlining. Reviewed By: fhann Differential Revision: https://reviews.llvm.org/D89911	2020-11-02 14:32:45 -05:00
Simon Pilgrim	55f15f99cb	[AggressiveInstCombine] foldGuardedRotateToFunnelShift - generalize rotation to funnel shift matcher. Replace matchRotate with a more general matchFunnelShift - at the moment this is still just used for rotation patterns.	2020-11-02 17:09:17 +00:00
Fangrui Song	98b9338588	[Debugify] Port -debugify-each to NewPM Preemptively switch 2 tests to the new PM Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D90365	2020-11-02 08:16:43 -08:00
Florian Hahn	b3b993a7ad	Reland "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts the revert commit `408c4408fa`. This version of the patch includes a fix for a crash caused by treating ICmp/FCmp constant expressions as instructions. Original message: On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV.	2020-11-02 15:39:29 +00:00
Teresa Johnson	0949f96dc6	[MemProf] Pass down memory profile name with optional path from clang Similar to -fprofile-generate=, add -fmemory-profile= which takes a directory path. This is passed down to LLVM via a new module flag metadata. LLVM in turn provides this name to the runtime via the new __memprof_profile_filename variable. Additionally, always pass a default filename (in $cwd if a directory name is not specified via the = form of the option). This is also consistent with the behavior of the PGO instrumentation. Since the memory profiles will generally be fairly large, it doesn't make sense to dump them to stderr. Also, importantly, the memory profiles will eventually be dumped in a compact binary format, which is another reason why it does not make sense to send these to stderr by default. Change the existing memprof tests to specify log_path=stderr when that was being relied on. Depends on D89086. Differential Revision: https://reviews.llvm.org/D89087	2020-11-01 17:38:23 -08:00
Florian Hahn	ca38652b9a	[VPlan] Assert no users remaining when deleting a VPValue. When deleting a VPValue, all users must already be deleted. Add an assertion to make sure and catch violations.	2020-11-01 17:44:53 +00:00
Florian Hahn	aab71d4443	[DSE] Use same logic as legacy impl to check if free kills a location. This patch updates DSE + MemorySSA to use the same check as the legacy implementation to determine if a location is killed by a free call. This changes the existing behavior so that a free does not kill locations before the start of the freed pointer. This should fix PR48036.	2020-10-31 20:09:25 +00:00
Florian Hahn	799033d8c5	Reland "[SLP] Consider alternatives for cost of select instructions." This reverts the revert commit `a1b53db324`. This patch includes a fix for a reported issue, caused by matchSelectPattern returning UMIN for selects of pointers in some cases by looking to some connected casts. For now, ensure integer intrinsics are only returned for selects of ints or int vectors.	2020-10-31 16:52:36 +00:00
Simon Pilgrim	538fdb0189	[InstCombine] foldSelectRotate - generalize to foldSelectFunnelShift This is the last of the rotate->funnel shift InstCombine generalizations for PR46896 We still have foldGuardedRotateToFunnelShift to deal with in AggressiveInstCombine Differential Revision: https://reviews.llvm.org/D90382	2020-10-31 12:32:34 +00:00
Simon Pilgrim	4da6a48399	[CSE] Make some basic EarlyCSE::StackNode helper methods const. NFCI. Fixes a number of cppcheck remarks.	2020-10-31 12:16:48 +00:00
Nikita Popov	27f647d117	[Inliner] Consistently apply callsite noalias metadata Previously, !noalias and !alias.scope metadata on the call site was applied as part of CloneAliasScopeMetadata(), which short-circuits if the callee does not use any noalias metadata itself. However, these two things have no relation to each other. Consistently apply !noalias and !alias.scope metadata by integrating this into an existing function that handled !llvm.access.group and !llvm.mem.parallel_loop_access metadata. The handling for all of these metadata kinds is essentially the same.	2020-10-31 10:54:45 +01:00
Arthur Eubanks	5c31b8b94f	Revert "Use uint64_t for branch weights instead of uint32_t" This reverts commit `10f2a0d662`. More uint64_t overflows.	2020-10-31 00:25:32 -07:00
Florian Hahn	a1b53db324	Revert "[SLP] Consider alternatives for cost of select instructions." This reverts commit `1922570489`. This appears to cause a crash in the following example a, b, c; l() { int e = a, f = l, g, h, i, j; float d = c, k = b; for (;;) for (; g < f; g++) { k[h] = d[i]; k[h - 1] = d[j]; h += e << 1; i += e; } } clang -cc1 -triple i386-unknown-linux-gnu -emit-obj -target-cpu pentium-m -O1 -vectorize-loops -vectorize-slp reduced.c llvm::Type *llvm::Type::getWithNewBitWidth(unsigned int) const: Assertion `isIntOrIntVectorTy() && "Original type expected to be a vector of integers or a scalar integer."' failed.	2020-10-30 21:26:14 +00:00
Florian Hahn	408c4408fa	Revert "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts commit `73f01e3df5`. This appears to break http://lab.llvm.org:8011/#/builders/85/builds/383.	2020-10-30 21:26:14 +00:00
Peter Collingbourne	3d049bce98	hwasan: Support for outlined checks in the Linux kernel. Add support for match-all tags and GOT-free runtime calls, which are both required for the kernel to be able to support outlined checks. This requires extending the access info to let the backend know when to enable these features. To make the code easier to maintain introduce an enum with the bit field positions for the access info. Allow outlined checks to be enabled with -mllvm -hwasan-inline-all-checks=0. Kernels that contain runtime support for outlined checks may pass this flag. Kernels lacking runtime support will continue to link because they do not pass the flag. Old versions of LLVM will ignore the flag and continue to use inline checks. With a separate kernel patch [1] I measured the code size of defconfig + tag-based KASAN, as well as boot time (i.e. time to init launch) on a DragonBoard 845c with an Android arm64 GKI kernel. The results are below: code size boot time before 92824064 6.18s after 38822400 6.65s [1] https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76 Depends on D90425 Differential Revision: https://reviews.llvm.org/D90426	2020-10-30 14:25:40 -07:00
Peter Collingbourne	0930763b4b	hwasan: Move fixed shadow behind opaque no-op cast as well. This is a workaround for poor heuristics in the backend where we can end up materializing the constant multiple times. This is particularly bad when using outlined checks because we materialize it for every call (because the backend considers it trivial to materialize). As a result the field containing the shadow base value will always be set so simplify the code taking that into account. Differential Revision: https://reviews.llvm.org/D90425	2020-10-30 13:23:52 -07:00
Arthur Eubanks	10f2a0d662	Use uint64_t for branch weights instead of uint32_t CallInst::updateProfWeight() creates branch_weights with i64 instead of i32. To be more consistent everywhere and remove lots of casts from uint64_t to uint32_t, use i64 for branch_weights. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D88609	2020-10-30 10:03:46 -07:00
Pedro Tammela	86e0c1acdb	[NFC][Reg2Mem] modernize loops iterators This patch updates the Reg2Mem loops to use more modern iterators. Differential Revision: https://reviews.llvm.org/D90122	2020-10-30 16:50:07 +00:00
Pedro Tammela	70a495c7f0	[NFC][LoopSimplify] modernize for loops over LoopInfo This patch modifies two for loops to use the range based syntax. Since they are equivalent, this patch is tagged NFC. Differential Revision: https://reviews.llvm.org/D90069	2020-10-30 16:50:07 +00:00
Michael Liao	c82403d025	[gvn] PRE needs to skip convergent intrinsics/calls. - As convergent intrinsics/calls could only be moved to control-equivalent blocks, or more precisely the same divergent branch, PRE needs to skip them. Differential Revision: https://reviews.llvm.org/D90391	2020-10-30 11:24:40 -04:00
Evgeniy Brevnov	3d31adaec4	[DSE] Improve partial overlap detection Currently isOverwrite returns OW_MaybePartial even for accesses known not to overlap. This is not a big problem for the legacy implementation (since isPartialOverwrite follows isOverwrite and clarifies the result). By contrast, the SSA-based version does a lot of work to later find out that accesses don't overlap. Besides the negative impact on compile time, we quickly reach MemorySSAPartialStoreLimit and miss optimization opportunities. Note: In fact, I think it would be a cleaner implementation if isOverwrite returned a fully clarified result in the first place without the need to call isPartialOverwrite. This can be done as a follow up. What do you think? Reviewed By: fhahn, asbirlea Differential Revision: https://reviews.llvm.org/D90371	2020-10-30 22:23:20 +07:00
Simon Pilgrim	ed577892cf	Use cast<> instead of dyn_cast<> as we dereference the pointers immediately. NFCI. Fix clang static analyzer warnings - we're better off relying on cast<> asserting on failure rather than a null dereference crash.	2020-10-30 15:20:40 +00:00
Florian Hahn	aa1a198a64	[VPlan] Use isa<> instead getVPRecipeID in getFirstNonPhi (NFC). As per the comment in VPRecipeBase, clients should not rely on getVPRecipeID, as it may change in the future. It should only be used in classof implementations. Use isa instead in getFirstNonPhi.	2020-10-30 14:56:06 +00:00
Simon Pilgrim	b7c91a9b8e	[SCEV] SCEVExpander::InsertNoopCastOfTo - reduce scope of pointer type. NFCI. By reducing the scope of the dyn_cast<PointerType> we can make this a cast<PointerType> and avoid clang static analyzer null dereference warnings.	2020-10-30 14:55:09 +00:00
Florian Hahn	73f01e3df5	[TTI] Add VecPred argument to getCmpSelInstrCost. On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV. Reviewed By: dmgreen, RKSimon Differential Revision: https://reviews.llvm.org/D90070	2020-10-30 13:49:08 +00:00
Simon Pilgrim	9e154f1aca	[SROA] Pass Twine by const reference. NFCI. Fixes clang-tidy warnings.	2020-10-30 11:36:58 +00:00
Max Kazantsev	bd341bafbf	[NFC] Simplify code in IndVars	2020-10-30 17:49:32 +07:00
Florian Hahn	05e4f7bde9	[DSE] Remove noop stores after killing stores for a MemoryDef. Currently we fail to eliminate some noop stores if there is a kill-able store between the starting def and the load. This is because we eliminate noop stores first. In practice it seems like eliminating noop stores after the main elimination for a def covers slightly more cases. This patch improves the number of stores slightly in 2 cases for X86 -O3 -flto Same hash: 235 (filtered out) Remaining: 2 Metric: dse.NumRedundantStores Program base patch diff test-suite...ce/Benchmarks/PAQ8p/paq8p.test 2.00 3.00 50.0% test-suite...006/453.povray/453.povray.test 18.00 21.00 16.7% There might be other phase ordering issues, but it appears that they do not show up in the test-suite/SPEC2000/SPEC2006. We can always tune the ordering later. Partly fixes PR47887. Reviewed By: asbirlea, zoecarver Differential Revision: https://reviews.llvm.org/D89650	2020-10-30 09:40:15 +00:00
Roman Lebedev	81fc53a36a	[SCEV] Introduce SCEVPtrToIntExpr (PR46786) And use it to model LLVM IR's `ptrtoint` cast. This is essentially an alternative to D88806, but with no chance for all the problems it caused due to having the cast as implicit there. (see rG7ee6c402474a2f5fd21c403e7529f97f6362fdb3) As we've established by now, there are at least two reasons why we want this: * It will allow SCEV to actually model the `ptrtoint` casts and their operands, instead of treating them as `SCEVUnknown` * It should help with initial problem of PR46786 - this should eventually allow us to not lose pointer-ness of an expression in more cases As discussed in [[ https://bugs.llvm.org/show_bug.cgi?id=46786 \| PR46786 ]], in principle, we could just extend `SCEVUnknown` with a `is ptrtoint` cast, because `ScalarEvolution::getPtrToIntExpr()` should sink the cast as far down into the expression as possible, so in the end we should always end up with `SCEVPtrToIntExpr` of `SCEVUnknown`. But I think that it isn't the best solution, because it doesn't really matter from memory consumption side - there probably won't be that many `SCEVPtrToIntExpr`s for it to matter, and it allows for much better discoverability. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D89456	2020-10-30 11:13:35 +03:00
Vitaly Buka	36fa658db5	[NFC] Fix "ambiguous overload for ‘operator=’" From D89768	2020-10-30 00:43:32 -07:00
Vitaly Buka	1455259546	[NFC] Fix "ambiguous overload for ‘operator=’"	2020-10-30 00:36:50 -07:00
Xun Li	9f5a2beadc	[Coroutine] Properly determine whether an alloca should live on the frame The existing logic in determining whether an alloca should live on the frame only looks at explicit def-use relationships. However a value defined by an alloca may be implicitly needed across suspension points, either because an alias has across-suspension-point def-use relationship, or escaped by store/call/memory intrinsics. To properly handle all these cases, we have to properly visit the alloca pointer up-front. This patch extends the existing alloca use visitor to determine whether an alloca should live on the frame. Differential Revision: https://reviews.llvm.org/D89768	2020-10-29 23:56:05 -07:00
Stefanos Baziotis	a3345300b6	[LCSSA] Doc for special treatment of PHIs Differential Revision: https://reviews.llvm.org/D89739	2020-10-29 22:50:07 +02:00
Nikita Popov	20b386aae0	[LoopUtils] Fix neutral value for vector.reduce.fadd Use -0.0 instead of 0.0 as the start value. The previous use of 0.0 was fine for all existing uses of this function though, as it is always generated with fast flags right now, and thus nsz.	2020-10-29 21:45:13 +01:00
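The choice of -0.0 can be checked with a small sketch of IEEE-754 behaviour under the default rounding mode: -0.0 added to any value returns that value unchanged, whereas +0.0 turns a -0.0 input into +0.0. The function names are hypothetical and only model the first step of a reduction; the difference is invisible when the sign of zero is ignored, which is what `nsz` permits and why the earlier start value was harmless.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of the first step of an fadd reduction: Start + X.
double reduceStep(double Start, double X) { return Start + X; }

// True if A and B compare equal and have the same sign bit, so that
// 0.0 and -0.0 are told apart.
bool sameValueAndSign(double A, double B) {
  return A == B && std::signbit(A) == std::signbit(B);
}

void checkNeutralStart() {
  const double Inputs[] = {-0.0, 0.0, 1.5, -2.25};
  // -0.0 is neutral: adding it leaves every input unchanged, sign included.
  for (double X : Inputs)
    assert(sameValueAndSign(reduceStep(-0.0, X), X));
  // +0.0 is not neutral: 0.0 + -0.0 yields +0.0, losing the sign of the input.
  assert(!sameValueAndSign(reduceStep(0.0, -0.0), -0.0));
}
```

This sketch assumes the code is compiled without options such as `-ffast-math`, which would allow the compiler to disregard the sign of zero.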
Florian Hahn	1922570489	[SLP] Consider alternatives for cost of select instructions. Some architectures do not have general vector select instructions (e.g. AArch64). But some cmp/select patterns can be vectorized using other instructions/intrinsics. One example is using min/max instructions for certain patterns. This patch updates the cost calculations for selects in the SLP vectorizer to consider using min/max intrinsics. This patch does not change SLP vectorizer's codegen itself to actually generate those intrinsics, but relies on the backends to lower the vector cmps & selects. This keeps things simple on the SLP side and works well in practice for AArch64. This exposes additional SLP vectorization opportunities in some benchmarks on AArch64 (-O3 -flto). Metric: SLP.NumVectorInstructions Program base slp diff test-suite...ications/JM/ldecod/ldecod.test 502.00 697.00 38.8% test-suite...ications/JM/lencod/lencod.test 1023.00 1414.00 38.2% test-suite...-typeset/consumer-typeset.test 56.00 65.00 16.1% test-suite...6/464.h264ref/464.h264ref.test 804.00 822.00 2.2% test-suite...006/453.povray/453.povray.test 3335.00 3357.00 0.7% test-suite...CFP2000/177.mesa/177.mesa.test 2110.00 2121.00 0.5% test-suite...:: External/Povray/povray.test 2378.00 2382.00 0.2% Reviewed By: RKSimon, samparker Differential Revision: https://reviews.llvm.org/D89969	2020-10-29 20:39:50 +00:00
Dávid Bolvanský	7a2abf5aca	[InferAttrs] Add nocapture/writeonly to string/mem libcalls One step closer to fix PR47644. Differential Revision: https://reviews.llvm.org/D89645	2020-10-29 20:06:43 +01:00
Simon Pilgrim	dcb3dc101d	[InstCombine] visitShl - ensure inner shifts have inrange amounts Noticed when fixing OSS Fuzz #26716	2020-10-29 15:28:15 +00:00
Max Kazantsev	a5b2e795c3	[NFC][SCEV] Refactor monotonic predicate checks to return enums instead of bools This patch gets rid of output parameter which is not needed for most users and prepares this API for further refactoring.	2020-10-29 16:01:25 +07:00
Johannes Doerfert	d39f574dcc	[Attributor][FIX] Properly promote arguments pointers to arrays When we promoted pointer arguments we computed a wrong offset and used a wrong type for the array case. Bug reported and reduced by Whitney Tsang <whitneyt@ca.ibm.com>.	2020-10-29 00:45:32 -05:00
Fangrui Song	39856d5d0b	[Debugify] Move global namespace functions into llvm:: Also move exportDebugifyStats from tools/opt to Debugify.cpp	2020-10-28 19:11:41 -07:00
Florian Hahn	53f4c4b2cc	[InstCombine] Do not introduce bitcasts for swifterror arguments. The following constraints hold for swifterror values: A swifterror value (either the parameter or the alloca) can only be loaded and stored from, or used as a swifterror argument. This patch updates instcombine to not try to convert a bitcast of a function into a bitcast of a swifterror argument. Reviewed By: rjmccall Differential Revision: https://reviews.llvm.org/D90258	2020-10-28 21:52:12 +00:00
Benjamin Kramer	207cf71fa9	Revert "[OpenMP] Add Passing in Original Declaration Names To Mapper API" This reverts commit `d981c7b758` and `a87d7b3d44`. Test fails under msan.	2020-10-28 13:58:14 +01:00
Max Kazantsev	160a453138	Return "[IndVars] Remove monotonic checks with unknown exit count" This reverts commit `e038b60d91`. This reverts commit `a0d84d8031`. This revert was a mistake. The reason of the failures was "Use uint64_t for branch weights instead of uint32_t" Differential Revision: https://reviews.llvm.org/D87832	2020-10-28 18:51:40 +07:00
Florian Hahn	b82f80057d	[DSE] Use walker to skip noalias stores between current & clobber def. Instead of getting the defining access we should be able to use getClobberingMemoryAccess to skip non-aliasing MemoryDefs. No additional checks should be needed, because we only remove the starting def if it matches the defining access of the load. All we need to worry about is that there are no (may)alias stores between the starting def and the load and getClobberingMemoryAccess should guarantee that. Partly fixes PR47887. This improves the number of redundant stores removed in some cases (numbers below for MultiSource, SPEC2000, SPEC2006 on X86 with -flto -O3). Same hash: 226 (filtered out) Remaining: 11 Metric: dse.NumRedundantStores Program base patch1 diff test-suite...:: External/Povray/povray.test 1.00 5.00 400.0% test-suite...chmarks/MallocBench/gs/gs.test 1.00 3.00 200.0% test-suite...0/253.perlbmk/253.perlbmk.test 21.00 37.00 76.2% test-suite...0.perlbench/400.perlbench.test 24.00 37.00 54.2% test-suite.../Applications/SPASS/SPASS.test 3.00 4.00 33.3% test-suite...006/453.povray/453.povray.test 15.00 18.00 20.0% test-suite...T2006/445.gobmk/445.gobmk.test 27.00 29.00 7.4% test-suite.../CINT2006/403.gcc/403.gcc.test 136.00 137.00 0.7% test-suite.../CINT2000/176.gcc/176.gcc.test 6.00 6.00 0.0% test-suite.../Benchmarks/Bullet/bullet.test NaN 3.00 nan% test-suite.../Benchmarks/Ptrdist/bc/bc.test NaN 1.00 nan% Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D89647	2020-10-28 11:01:25 +00:00
Luqman Aden	4c0a016927	Rename EHPersonality::MSVC_Win64SEH to EHPersonality::MSVC_TableSEH. NFC. The types of SEH aren't x86(-32) vs x64 but rather stack-based exception chaining vs table-based exception handling. x86-32 is the only arch for which Windows uses the former. 32-bit ARM would use what is called Win64SEH today, which is a bit confusing so instead let's just rename it to be a bit more clear. Reviewed By: compnerd, rnk Differential Revision: https://reviews.llvm.org/D90117	2020-10-27 23:22:13 -07:00
Kazu Hirata	b2f05fae80	[JumpThreading] Remove extraneous calls to setEdgeProbability This patch removes extraneous calls to setEdgeProbability introduced in `c91487769d`. The follow-up patch, `a7b662d0f4`, has since fixed BranchProbabilityInfo::eraseBlock, so we don't need to worry about getting stale values from getEdgeProbability. Also, since getEdgeProbability(BB, BB->getSingleSuccessor()) returns edge probability 1/1 by default for BB with exactly one successor edge, we don't need to explicitly call setEdgeProbability. This patch introduces almost no functional change, but we do end up reducing debug messages from setEdgeProbability. Differential Revision: https://reviews.llvm.org/D90284	2020-10-27 21:12:54 -07:00
Johannes Doerfert	d13daa4018	[Attributor] Finalize the CGUpdater after each SCC This matches the new PM model.	2020-10-27 22:07:56 -05:00
Johannes Doerfert	50d34958df	[Attributor][NFC] Introduce a debug counter for `AA::manifest` This will simplify debugging and tracking down problems.	2020-10-27 22:07:56 -05:00
Johannes Doerfert	1d57b7f503	[Attributor][NFC] Print the right value in debug output	2020-10-27 22:07:55 -05:00
Johannes Doerfert	1c2531c9e1	[Attributor][FIX] Delete all unreachable static functions Before we used to only mark unreachable static functions as dead if all uses were known dead. Now we optimistically assume uses to be dead until proven otherwise.	2020-10-27 22:07:55 -05:00
Johannes Doerfert	bfe05b1aff	[Attributor][FIX] Do not attach range metadata to the wrong Instruction If we are looking at a call site argument it might be a load or call which is in a different context than the call site argument. We cannot simply use the call site argument range for the call or load. Bug reported and reduced by Whitney Tsang <whitneyt@ca.ibm.com>.	2020-10-27 22:07:55 -05:00
Johannes Doerfert	724fcce109	[Attributor][NFC] Clang-format	2020-10-27 22:07:55 -05:00
Johannes Doerfert	d504f7b91a	[Attributor][NFC] Hoist call out of a lambda The call is not free, unsure if this is needed but it does not make it worse either.	2020-10-27 22:07:54 -05:00
Johannes Doerfert	30e5a1f0be	[Attributor][FIX] Properly check uses in the call not uses of the call In the AANoAlias logic we determine if a pointer may have been captured before a call. We need to look at other uses in the call not uses of the call. The new code is not perfect as it does not allow trivial cases where the call has multiple arguments but it is at least not unsound and a TODO was added.	2020-10-27 22:07:54 -05:00
Johannes Doerfert	cb813ab66a	[Attributor][NFC] Improve time trace output	2020-10-27 22:07:54 -05:00
Kazu Hirata	c91487769d	[JumpThreading] Set edge probabilities when creating basic blocks This patch teaches the jump threading pass to set edge probabilities whenever the pass creates new basic blocks. Without this patch, the compiler sometimes produces non-deterministic results. The non-determinism comes from the jump threading pass using stale edge probabilities in BranchProbabilityInfo. Specifically, when the jump threading pass creates a new basic block, we don't initialize its outgoing edge probability. Edge probabilities are maintained in: DenseMap<Edge, BranchProbability> Probs; in class BranchProbabilityInfo, where Edge is an ordered pair of BasicBlock * and a successor index declared as: using Edge = std::pair<const BasicBlock *, unsigned>; Probs maps edges to their corresponding probabilities. Now, we rarely remove entries from this map, so if we happen to allocate a new basic block at the same address as a previously deleted basic block with an edge probability assigned, the newly created basic block appears to have an edge probability, albeit a stale one. This patch fixes the problem by explicitly setting edge probabilities whenever the jump threading pass creates new basic blocks. Differential Revision: https://reviews.llvm.org/D90106	2020-10-27 16:07:27 -07:00
Joseph Huber	a87d7b3d44	[OpenMP] Add Passing in Original Declaration Names To Mapper API Summary: This patch adds support for passing in the original declaration name in the source file to the libomptarget runtime. This will allow the runtime to provide more intelligent debugging messages. This patch takes the original expression parsed from the OpenMP map / update clause and provides a textual representation if it was explicitly mapped, otherwise it takes the name of the variable declaration as a fallback. The information is passed to the runtime in a global array of strings that matches the existing ident_t source location strings using ";name;filename;column;row;;". See clang/test/OpenMP/target_map_names.cpp for an example of the generated output for a given map clause. Reviewers: jdoervert Differential Revision: https://reviews.llvm.org/D89802	2020-10-27 16:09:19 -04:00
Nicolai Hähnle	e025d09b21	Revert multiple patches based on "Introduce CfgTraits abstraction" These logically belong together since it's a base commit plus followup fixes to less common build configurations. The patches are: Revert "CfgInterface: rename interface() to getInterface()" This reverts commit `a74fc48158`. Revert "Wrap CfgTraitsFor in namespace llvm to please GCC 5" This reverts commit `f2a06875b6`. Revert "Try to make GCC5 happy about the CfgTraits thing" This reverts commit `03a5f7ce12`. Revert "Introduce CfgTraits abstraction" This reverts commit `c0cdd22c72`.	2020-10-27 20:33:30 +01:00
Nicolai Hähnle	ce6900c6cb	Revert "DomTree: Extract (mostly) read-only logic into type-erased base classes" This reverts commit `848a68a032`.	2020-10-27 20:33:29 +01:00
Vedant Kumar	5a3ef55a52	[Utils] Skip RemoveRedundantDbgInstrs in MergeBlockIntoPredecessor (PR47746) This patch changes MergeBlockIntoPredecessor to skip the call to RemoveRedundantDbgInstrs, in effect partially reverting D71480 due to some compile-time issues spotted in LoopUnroll and SimplifyCFG. The call to RemoveRedundantDbgInstrs appears to have changed the worst-case behavior of the merging utility. Loosely speaking, it seems to have gone from O(#phis) to O(#insts). It might not be possible to mitigate this by scanning a block to determine whether there are any debug intrinsics to remove, since such a scan costs O(#insts). So: skip the call to RemoveRedundantDbgInstrs. There's surprisingly little fallout from this, and most of it can be addressed by doing RemoveRedundantDbgInstrs later. The exception is (the block-local version of) SimplifyCFG, where it might just be too expensive to call RemoveRedundantDbgInstrs. Differential Revision: https://reviews.llvm.org/D88928	2020-10-27 10:12:59 -07:00
Raphael Isemann	e038b60d91	Revert "[IndVars] Remove monotonic checks with unknown exit count" This reverts commit `c6ca26c0bf`. This breaks stage2 builds due to hitting this assert: ``` Assertion failed: (WeightSum <= UINT32_MAX && "Expected weights to scale down to 32 bits"), function calcMetadataWeights ``` when compiling AArch64RegisterBankInfo.cpp in LLVM.	2020-10-27 15:31:37 +01:00
Raphael Isemann	a0d84d8031	Revert "[NFC] Factor away lambda's redundant parameter" This reverts commit `fdc845b361`. It seems to be a follow-up to c6372b3fb495 which will be reverted.	2020-10-27 15:30:52 +01:00
Simon Pilgrim	bce770ffa6	Revert rG0905bd5c2fa42bd4c "[InstCombine] collectBitParts - add trunc support." This reverts commit `0905bd5c2f`. Causing failures in multistage buildbots that I need to investigate	2020-10-27 13:43:54 +00:00
Nico Weber	2a4e704c92	Revert "Use uint64_t for branch weights instead of uint32_t" This reverts commit `e5766f25c6`. Makes clang assert when building Chromium, see https://crbug.com/1142813 for a repro.	2020-10-27 09:26:21 -04:00
Simon Pilgrim	0905bd5c2f	[InstCombine] collectBitParts - add trunc support. This should allow us to remove the rather limited matchOrConcat fold and just use recognizeBSwapOrBitReverseIdiom.	2020-10-27 13:14:54 +00:00
Roman Lebedev	0ac56e8eaa	[InstCombine] Fold `(X >>? C1) << C2` patterns to shift+bitmask (PR37872) This essentially finalizes a revert of rL155136, because nowadays the situation has improved, SCEV can model all these patterns well, and we canonicalize rotate-like patterns into funnel shift intrinsics in InstCombine. So this should not cause any pessimization. I've verified the canonicalize-{a,l}shr-shl-to-masking.ll transforms with alive, which confirms that we can freely preserve exact-ness, and no-wrap flags. Proofs: * base: https://rise4fun.com/Alive/gPQ * exact-ness preservation: https://rise4fun.com/Alive/izi * nuw preservation: https://rise4fun.com/Alive/DmD * nsw preservation: https://rise4fun.com/Alive/SLN6N * nuw nsw preservation: https://rise4fun.com/Alive/Qp7 Refs. https://reviews.llvm.org/D46760	2020-10-27 14:42:53 +03:00
Florian Hahn	f067bc3c0a	[LoopRotation] Allow loop header duplication if vectorization is forced. -Oz normally does not allow loop header duplication so this loop wouldn't be vectorized. However the vectorization pragma should override this and allow for loop rotation. rdar://problem/49281061 Original patch by Adam Nemet. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D59832	2020-10-27 09:28:01 +00:00
Max Kazantsev	fdc845b361	[NFC] Factor away lambda's redundant parameter	2020-10-27 12:56:52 +07:00
Serguei Katkov	b69919b537	[GVN LoadPRE] Add an option to disable splitting backedge GVN Load PRE can split the backedge, breaking the loop structure in which the latch contains the conditional branch with, for example, an induction variable. Different optimizations expect this form of the loop, so it is better to preserve it for some time. This CL adds an option to control the ability to split the backedge. The default value is true, so technically it is NFC and current behavior is not changed. Reviewers: fedor.sergeev, mkazantsev, nikic, reames, fhahn Reviewed By: mkazasntsev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D89854	2020-10-27 11:59:52 +07:00
Max Kazantsev	c6ca26c0bf	[IndVars] Remove monotonic checks with unknown exit count Even if the exact exit count is unknown, we can still prove that this exit will not be taken. If we can prove that the predicate is monotonic, fulfilled on first & last iteration, and no overflow happened in between, then the check can be removed. Differential Revision: https://reviews.llvm.org/D87832 Reviewed By: apilipenko	2020-10-27 11:35:16 +07:00
Arthur Eubanks	42f76e193b	Reland [AlwaysInliner] Pass callee AAResults to InlineFunction() Test copied from noalias-calls.ll with small changes. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D89609	2020-10-26 20:40:46 -07:00
Arthur Eubanks	e5766f25c6	Use uint64_t for branch weights instead of uint32_t CallInst::updateProfWeight() creates branch_weights with i64 instead of i32. To be more consistent everywhere and remove lots of casts from uint64_t to uint32_t, use i64 for branch_weights. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D88609	2020-10-26 20:24:04 -07:00
Arthur Eubanks	4af5ba1726	Revert "[AlwaysInliner] Pass callee AAResults to InlineFunction()" This reverts commit `504fbec7a6`. Test failure.	2020-10-26 20:23:38 -07:00
Arthur Eubanks	504fbec7a6	[AlwaysInliner] Pass callee AAResults to InlineFunction() Test copied from noalias-calls.ll with small changes. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D89609	2020-10-26 20:10:09 -07:00
Arthur Eubanks	3dd1c72458	Port -objc-arc-expand to NPM Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D90182	2020-10-26 20:05:10 -07:00
Arthur Eubanks	90c0b0d3d6	Port -objc-arc-apelim to NPM Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D90181	2020-10-26 20:01:46 -07:00
Chen Zheng	00e573cadb	[LSR] fix typo in comments and rename for a newly added hook.	2020-10-26 22:29:22 -04:00
TaWeiTu	0efbfa38ae	[NPM] Port -slsr to NPM `-separate-const-offset-from-gep` has not yet been ported, so some tests are not updated. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D90149	2020-10-27 09:21:40 +08:00
Sriraman Tallam	ad1b9daa4b	Prepend "__uniq" to symbol names hash with -funique-internal-linkage-names. Prepend the module name hash with a fixed string ".__uniq." which helps tools that consume sampled profiles and attribute it to functions to understand that this symbol belongs to a unique internal linkage type symbol. Symbols with suffixes can result from various optimizations in the compiler. Function Multiversioning, function splitting, parameter constant propagation, unique internal linkage names. External tools like sampled profile aggregators combine profiles from multiple runs of a binary. They use various heuristics with symbols that have suffixes to try and attribute the profile to the right function instance. For instance multi-versioned symbols like foo.avx, foo.sse4.2, etc even though different should be attributed to the same source function if a single function is versioned, using attribute target_clones (supported in GCC but yet to land in LLVM). Similarly, functions that are split (split part having a .cold suffix) could have profiles for both the original and split symbols but would be aggregated and attributed to the original function that was split. Unique internal linkage functions however have different source instances and the aggregator must not put them together but attribute it to the appropriate function instance. To be sure that we are dealing with a symbol of a unique internal linkage function, we would like to prepend the hash with a known string ".__uniq." which these tools can check to understand the suffix type. Differential Revision: https://reviews.llvm.org/D89617	2020-10-26 14:24:28 -07:00
Sanjay Patel	5a6e66ec72	[InstCombine] add folds for icmp+ctpop https://alive2.llvm.org/ce/z/XjFPQJ define void @src(i64 %value) { %t0 = call i64 @llvm.ctpop.i64(i64 %value) %gt = icmp ugt i64 %t0, 63 %lt = icmp ult i64 %t0, 64 call void @use(i1 %gt, i1 %lt) ret void } define void @tgt(i64 %value) { %eq = icmp eq i64 %value, -1 %ne = icmp ne i64 %value, -1 call void @use(i1 %eq, i1 %ne) ret void } declare i64 @llvm.ctpop.i64(i64) #1 declare void @use(i1, i1)	2020-10-26 16:48:56 -04:00
Sanjay Patel	437d7551c5	[InstCombine] reduce code duplication in icmp intrinsic folds; NFC	2020-10-26 16:48:56 -04:00
Stanislav Mekhanoshin	00928a1956	Fix SROA with a PHI merging values from the same block This fixes the bug 47945. It is legal to have a PHI with values from the same block, but values must stay the same. In this case it is illegal to merge different values. Differential Revision: https://reviews.llvm.org/D89978	2020-10-26 12:58:27 -07:00
Joe Ellis	0f83505593	[SVE][InstCombine] Fix TypeSize warning in canReplaceGEPIdxWithZero The warning would fire when calling canReplaceGEPIdxWithZero on a GEP whose source element type is a scalable vector. The size of scalable vector types is not known, so this optimization cannot be performed. This patch fixes the issue by: - bailing out early in this routine if the GEP instruction's source element type is a scalable vector. - making use of getFixedSize -- this removes the dependency on the deprecated interface. Reviewed By: fpetrogalli Differential Revision: https://reviews.llvm.org/D89968	2020-10-26 17:40:26 +00:00
Joe Ellis	467e5cf40f	[SVE][AArch64] Fix TypeSize warning in loop vectorization legality The warning would fire when calling isDereferenceableAndAlignedInLoop with a scalable load. Calling isDereferenceableAndAlignedInLoop with a scalable load would result in the use of the now deprecated implicit cast of TypeSize to uint64_t through the overloaded operator. This patch fixes this issue by: - no longer considering vector loads as candidates in canVectorizeWithIfConvert. This doesn't make sense in the context of identifying scalar loads to vectorize. - making use of getFixedSize inside isDereferenceableAndAlignedInLoop -- this removes the dependency on the deprecated interface, and will trigger an assertion error if the function is ever called with a scalable type. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D89798	2020-10-26 17:40:04 +00:00
Simon Pilgrim	532f3bec3e	[InstCombine] collectBitParts - add bitreverse intrinsic support.	2020-10-26 14:36:36 +00:00
Simon Pilgrim	6b2eb31e1e	[InstCombine] Add support for zext(and(neg(amt),width-1)) rotate shift amount patterns Alive2: https://alive2.llvm.org/ce/z/bCvvHd	2020-10-26 11:22:41 +00:00
Max Kazantsev	bfabd7878b	Fix broken build after previous commit	2020-10-26 14:55:46 +07:00
Max Kazantsev	cdccc82f48	[NFC] Remove unused function param	2020-10-26 14:53:22 +07:00
Max Kazantsev	4b5e848bef	[NFC] Factor out common code into lambda for further improvement	2020-10-26 14:50:45 +07:00
Max Kazantsev	c019099053	[IndVars] Use contextual knowledge when proving trivial conds No exact example where it would help, but it's generally a more powerful way to prove predicates.	2020-10-26 13:48:32 +07:00
Simon Pilgrim	3052e474ec	[InstCombine] matchBSwapOrBitReverse - recognise or(fshl(),fshl()) bswap patterns. I'm not certain InstCombinerImpl::matchBSwapOrBitReverse needs to filter the or(op0(),op1()) ops - there are just too many cases that recognizeBSwapOrBitReverseIdiom/collectBitParts handle now (and quickly).	2020-10-25 10:17:45 +00:00
TaWeiTu	65a36bbc3d	[NPM] Port -loop-versioning-licm to NPM Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D89371	2020-10-24 21:51:18 +08:00
TaWeiTu	060a4fccf1	[LoopVersioning] Form dedicated exits for versioned loop to preserve simplify form The exit blocks of the versioned and non-versioned loops are not dedicated and thus the two loops are not in simplify form. Insert dummy exit blocks after loop versioning with `formDedicatedExits()` to preserve the simplify form for subsequent passes. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D89569	2020-10-24 21:40:46 +08:00
Simon Pilgrim	310f62b4ff	[InstCombine] narrowFunnelShift - fold trunc/zext or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) (PR35155) As discussed on PR35155, this extends narrowFunnelShift (recently renamed from narrowRotate) to support basic funnel shift patterns. Unlike matchFunnelShift we don't include the computeKnownBits limitation as extracting the pattern from the zext/trunc layers should be an indicator of reasonable funnel shift codegen, in D89139 we demonstrated how to efficiently promote funnel shifts to wider types. Differential Revision: https://reviews.llvm.org/D89542	2020-10-24 12:42:43 +01:00
Hongtao Yu	a16cbdd676	[AutoFDO] Remove a broken assert in merging inlinee samples Duplicated callsites share the same callee profile if the original callsite was inlined. The sharing also causes the profile of callee's callee to be shared. This breaks the assert introduced earlier by D84997 in a tricky way. To illustrate, I'm using an abstract example. Say we have three functions `A`, `B` and `C`. A calls B twice and B calls C once. Some optimization performed prior to the sample profile loader duplicates the first callsite to `B` and the program may look like ``` A() { B(); // with nested profile B1 and C1 B(); // duplicated, with nested profile B1 and C1 B(); // with nested profile B2 and C2 } ``` For some reason, the sample profile loader inliner then decides to only inline the first callsite in `A` and transforms `A` into ``` A() { C(); // with nested profile C1 B(); // duplicated, with nested profile B1 and C1 B(); // with nested profile B2 and C2. } ``` Here is what happens next: 1. Failing to inline the callsite `C()` results in `C1`'s samples returned to `C`'s base (outlined) profile. In the meantime, `C1`'s head samples are updated to `C1`'s entry sample. This also affects the profile of the middle callsite which shares `C1` with the first callsite. 2. Failing to inline the middle callsite results in `B1` returned to `B`'s base profile, which in turn will cause `C1` merged into `B`'s base profile. Note that the nested `C` profile in `B`'s base has a non-zero head sample count now. The value actually equals `C1`'s entry count. 3. Failing to inline the last callsite results in `B2` returned to `B`'s base profile. Note that the nested `C` profile in `B`'s base now has an entry count equal to the sum of that of `C1` and `C2`, with the head count equal to that of `C1`. This will trigger the assert later on. 4. Compiling `B` using `B`'s base profile. Failing to inline `C` there triggers the returning of the nested `C` profile. 
Since the nested `C` profile has a non-zero head count, the returning doesn't go through. Instead, the assert goes off. It's good that `C1` is only returned once, based on using a non-zero head count to ensure an inline profile is only returned once. However C2 is never returned. While it seems hard to solve this perfectly within the current framework, I'm just removing the broken assert. This should be reasonably fixed by the upcoming CSSPGO work where counts returning is based on context-sensitivity and a distribution factor for callsite probes. The simple example is extracted from one of our internal services. In reality, why the original callsite `B()` and the duplicate one have different inline behavior is a bit of a mystery. It has to do with imperfect counts in profile and extra complicated inlining that makes the hotness for them different. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D90056	2020-10-23 17:42:21 -07:00
Arthur Eubanks	baffd052b0	[StructurizeCFG][NewPM] Port -structurizecfg to NPM This doesn't support -structurizecfg-skip-uniform-regions since that would require porting LegacyDivergenceAnalysis. The NPM doesn't support adding a non-analysis pass as a dependency of another, so I had to add -lowerswitch to some tests or pin them to the legacy PM. This is the only RegionPass in tree, so I simply copied the logic for finding all Regions from the legacy PM's RGManager into StructurizeCFG::run(). Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D89026	2020-10-23 15:54:03 -07:00
Arthur Eubanks	ba22c403b2	[Inliner][NPM] Properly pass callee AAResults Fixes noalias-calls.ll under NPM. Differential Revision: https://reviews.llvm.org/D89592	2020-10-23 15:37:18 -07:00
Artur Pilipenko	6ec2c5e402	GC-parseable element atomic memcpy/memmove This change introduces a GC parseable lowering for element atomic memcpy/memmove intrinsics. This way the runtime can provide an implementation which can take a safepoint during the copy operation. See "GC-parseable element atomic memcpy/memmove" thread on llvm-dev for the background and details: https://groups.google.com/g/llvm-dev/c/NnENHzmX-b8/m/3PyN8Y2pCAAJ Differential Revision: https://reviews.llvm.org/D88861	2020-10-23 14:06:09 -07:00
Nick Desaulniers	b7926ce6d7	[IR] add fn attr for no_stack_protector; prevent inlining on mismatch It's currently ambiguous in IR whether the source language explicitly did not want a stack protector (in C, via function attribute no_stack_protector) or doesn't care for any given function. It's common that code that manipulates the stack via inline assembly or that has to set up its own stack canary (such as the Linux kernel) would like to avoid stack protectors in certain functions. In this case, we've been bitten by numerous bugs where a callee with a stack protector is inlined into an __attribute__((__no_stack_protector__)) caller, which generally breaks the caller's assumptions about not having a stack protector. LTO exacerbates the issue. While developers can avoid this by putting all no_stack_protector functions in one translation unit together and compiling those with -fno-stack-protector, it's generally not very ergonomic or as ergonomic as a function attribute, and still doesn't work for LTO. See also: https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/ https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u Typically, when inlining a callee into a caller, the caller will be upgraded in its level of stack protection (see adjustCallerSSPLevel()). By adding an explicit attribute in the IR when the function attribute is used in the source language, we can now identify such cases and prevent inlining. Block inlining when the callee and caller differ in the case that one contains `nossp` when the other has `ssp`, `sspstrong`, or `sspreq`. Fixes pr/47479. Reviewed By: void Differential Revision: https://reviews.llvm.org/D87956	2020-10-23 11:55:39 -07:00
Chen Zheng	1e0b6c1df0	[LSR] ignore profitable chain when reg num is not major cost. Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D89665	2020-10-23 09:35:48 -04:00
Simon Pilgrim	1cab3bf004	[InstCombine] matchBSwapOrBitReverse - expose bswap/bitreverse matching flags. matchBSwapOrBitReverse was hardcoded to just match bswaps - we're going to need to expose the ability to match bitreverse as well, so make this part of the function call.	2020-10-23 12:35:28 +01:00
Simon Pilgrim	19a13bf538	[InstCombine] Rename InstCombinerImpl::matchBSwap to matchBSwapOrBitReverse. NFCI. This matches bswap and bitreverse intrinsics, so we should make that clear in the function name.	2020-10-23 12:35:27 +01:00
OCHyams	fea067bdfd	[mem2reg] Remove dbg.values describing contents of dead allocas This patch copies @vsk's fix to instcombine from D85555 over to mem2reg. The motivation and rationale are exactly the same: When mem2reg removes an alloca, it erases the dbg.{addr,declare} instructions which refer to the alloca. It would be better to instead remove all debug intrinsics which describe the contents of the dead alloca, namely all dbg.value(<dead alloca>, ..., DW_OP_deref)'s. As far as I can tell, prior to D80264 these `dbg.value+deref`s would have been silently dropped instead of being made `undef`, so we're just returning to previous behaviour with these patches. Testing: `llvm-lit llvm/test` and `ninja check-clang` gave no unexpected failures. Added 3 tests, each of which covers a dbg.value deletion path in mem2reg: mem2reg-promote-alloca-1.ll mem2reg-promote-alloca-2.ll mem2reg-promote-alloca-3.ll The first is based on the dexter test inlining.c from D89543. This patch also improves the debugging experience for loop.c from D89543, which suffers similarly after arg promotion instead of inlining.	2020-10-23 04:46:56 +00:00
Caroline Concatto	2415636475	[SVE]Clarify TypeSize comparisons in llvm/lib/Transforms Use isKnownXY comparators when one of the operands can be with scalable vectors or getFixedSize() for all the other cases. This patch also does bug fixes for getPrimitiveSizeInBits by using getFixedSize() near the places with the TypeSize comparison. Differential Revision: https://reviews.llvm.org/D89703	2020-10-23 09:15:17 +01:00
Max Kazantsev	6e574abf61	[SCEV][NFC] Cache symbolic max exit count We want to have a caching version of symbolic BE exit count rather than recompute it every time we need it. Differential Revision: https://reviews.llvm.org/D89954 Reviewed By: nikic, efriedma	2020-10-23 12:29:37 +07:00
Arthur Eubanks	0291e2c933	[Inliner] Run always-inliner in inliner-wrapper An alwaysinline function may not get inlined in inliner-wrapper due to the inlining order. Previously for the following, the inliner would first inline @a() into @b(), ``` define void @a() { entry: call void @b() ret void } define void @b() alwaysinline { entry: br label %for.cond for.cond: call void @a() br label %for.cond } ``` making @b() recursive and unable to be inlined into @a(), ending at ``` define void @a() { entry: call void @b() ret void } define void @b() alwaysinline { entry: br label %for.cond for.cond: call void @b() br label %for.cond } ``` Running always-inliner first makes sure that we respect alwaysinline in more cases. Fixes https://bugs.llvm.org/show_bug.cgi?id=46945. Reviewed By: davidxl, rnk Differential Revision: https://reviews.llvm.org/D86988	2020-10-22 19:16:25 -07:00
Vedant Kumar	099bffe7f7	Revert "[CodeExtractor] Don't create bitcasts when inserting lifetime markers (NFCI)" This reverts commit `26ee8aff2b`. It's necessary to bitcast the pointer operand of a lifetime marker if it has an opaque pointer type. rdar://70560161	2020-10-22 12:25:50 -07:00
Arthur Eubanks	92d9a3868a	Port -instnamer to NPM Some clang tests use this. Reviewed By: akhuang Differential Revision: https://reviews.llvm.org/D89931	2020-10-22 12:08:36 -07:00
Layton Kifer	d49911c282	[InstCombine][NFC] Use ConstantExpr::getBinOpIdentity Delete duplicate implementation getSelectFoldableConstant and replace with ConstantExpr::getBinOpIdentity. Differential Revision: https://reviews.llvm.org/D89839	2020-10-22 20:44:57 +02:00
Nikita Popov	3e37543111	[MemCpyOpt] Move GEP during call slot optimization When performing a call slot optimization to a GEP destination, it will currently usually fail, because the GEP is directly before the memcpy and as such does not dominate the call. We should move it above the call if that satisfies the domination requirement. I think that a constant-index GEP is the only useful thing to move here, as otherwise isDereferenceablePointer couldn't look through it anyway. As such I'm not trying to generalize this further. Differential Revision: https://reviews.llvm.org/D89623	2020-10-22 20:40:56 +02:00
Ettore Tiotto	e6521ce064	[NFC][PartialInliner]: Clean up code Make member function const where possible, use LLVM_DEBUG to print debug traces rather than a custom option, pass by reference to avoid null checking, ... Reviewed By: fhann Differential Revision: https://reviews.llvm.org/D89895	2020-10-22 14:40:15 -04:00
Vedant Kumar	3419252a79	[InstCombine] Remove dbg.values describing contents of dead allocas When InstCombine removes an alloca, it erases the dbg.{addr,declare} instructions which refer to the alloca. It would be better to instead remove all debug intrinsics which describe the contents of the dead alloca, namely all dbg.value(<dead alloca>, ..., DW_OP_deref)'s. This effectively undoes work performed in an InstCombine run earlier in the pipeline by LowerDbgDeclare, which inserts DW_OP_deref dbg.values before CallInst users of an alloca. The motivating example looks like: ``` define void @foo(i32 %0) { %a = alloca i32 ; This alloca is erased. store i32 %0, i32* %a dbg.value(i32 %0, "arg0") ; This dbg.value survives. dbg.value(i32* %a, "arg0", DW_OP_deref) call void @trivially_inlinable_no_op(i32* %a) ret void } ``` If the DW_OP_deref dbg.value is not erased, it becomes dbg.value(undef) after inlining, making "arg0" unavailable. But we already have dbg.value descriptions of the alloca's value (from LowerDbgDeclare), so the DW_OP_deref dbg.value cannot serve its purpose of describing an initialization of the alloca by some callee. It invalidates other useful dbg.values, causing large gaps in location coverage, so we should delete it (even though doing so may cause stale dbg.values to appear, if there's a dead store to `%a` in @trivially_inlinable_no_op). OTOH, it wouldn't be correct to delete all dbg.value descriptions of an alloca. Note that it's possible to describe a variable that takes on different pointer values, e.g.: ``` void use(int *); void t(int a, int b) { int *local = &a; // dbg.value(i32* %a.addr, "local") local = &b; // dbg.value(i32* undef, "local") use(&a); // (note: %b.addr is optimized out) local = &a; // dbg.value(i32* %a.addr, "local") } ``` In this example, the alloca for "b" is erased, but we need to describe the value of "local" as <unavailable> before the call to "use". This prevents "local" from appearing to be equal to "&a" at the callsite. 
rdar://66592859 Differential Revision: https://reviews.llvm.org/D85555	2020-10-22 10:00:13 -07:00
Serguei Katkov	75d0e0cd5f	[IRCE] consolidate profitability check Use BFI if it is available and BPI otherwise. This is a promised follow-up after D89541. Reviewers: ebrevnov, mkazantsev Reviewed By: ebrevnov Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D89773	2020-10-22 11:26:45 +07:00
Zequan Wu	2f29341114	Revert "Revert "SimplifyCFG: Clean up optforfuzzing implementation"" This reverts commit `716f7636e1`.	2020-10-21 17:08:56 -07:00
Zequan Wu	716f7636e1	Revert "SimplifyCFG: Clean up optforfuzzing implementation" See discussion: https://reviews.llvm.org/D89590 This reverts commit `cdd006eec9`.	2020-10-21 16:56:32 -07:00
Arthur Eubanks	8d9466a385	[BlockExtract][NewPM] Port -extract-blocks to NPM Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D89015	2020-10-21 12:51:11 -07:00
Arthur Eubanks	aa6c305344	[LowerMatrixIntrinsics][NewPM] Fix PreservedAnalyses result PreservedCFGCheckerInstrumentation was saying that LowerMatrixIntrinsics didn't properly preserve CFG even though it claimed to. The legacy pass says it doesn't. Match the legacy pass's preserved analyses. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D89175	2020-10-21 12:42:16 -07:00
Artur Pilipenko	e8cce5ad89	[RS4GC] NFC. Preparatory refactoring to make GC parseable memcpy For GC parseable element atomic memcpy/memmove we'll need to shuffle statepoint arguments. Make it possible by storing the arguments as Value *, not Use *.	2020-10-21 12:38:20 -07:00
Simon Pilgrim	7b4a828452	[InstCombine] foldOrOfICmps - use m_Specific instead of explicit comparisons. NFCI.	2020-10-21 11:53:45 +01:00
Florian Hahn	88241ffb56	[Passes] Move ADCE before DSE & LICM. The adjustment seems to have very little impact on optimizations. The only binary change with -O3 MultiSource/SPEC2000/SPEC2006 on X86 is in consumer-typeset and the size there actually decreases by -0.1%, with no significant changes in the stats. On its own, it is mildly positive in terms of compile-time, most likely due to LICM & DSE having to process slightly fewer instructions. It should also be unlikely that DSE/LICM make much new code dead. http://llvm-compile-time-tracker.com/compare.php?from=df63eedef64d715ce1f31843f7de9c11fe1e597f&to=e3bdfcf94a9eeae6e006d010464f0c1b3550577d&stat=instructions With DSE & MemorySSA, it gives some nice compile-time improvements, due to the fact that DSE can re-use the PDT from ADCE, if it does not make any changes: http://llvm-compile-time-tracker.com/compare.php?from=15fdd6cd7c24c745df1bb419e72ff66fd138aa7e&to=481f494515fc89cb7caea8d862e40f2c910dc994&stat=instructions Reviewed By: xbolva00 Differential Revision: https://reviews.llvm.org/D87322	2020-10-21 10:30:56 +01:00
Martin Storsjö	4de215ff18	Revert "[InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support" Also revert "[InstCombine] foldOrOfICmps - use m_Specific instead of explicit comparisons. NFCI." to make the primarily intended revert work. This reverts commits `ce13549761` and `e372a5f86f`. This commit caused failed asserts e.g. like this: $ cat repro.cpp bool a(char b) { return b >= '0' && b <= '9' || (b | 32) >= 'a' && (b | 32) <= 'z'; $ clang++ -target x86_64-linux-gnu -c -O2 repro.cpp clang++: ../include/llvm/ADT/APInt.h:1151: bool llvm::APInt::operator==(const llvm::APInt&) const: Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed.	2020-10-21 09:47:18 +03:00
Geoffrey Martin-Noble	c17ae2916c	Remove unnecessary header include which violates layering This was introduced in https://reviews.llvm.org/D89774, but I don't think it should be necessary. Reviewed By: TaWeiTu, aeubanks Differential Revision: https://reviews.llvm.org/D89843	2020-10-20 20:14:03 -07:00
Nicolai Hähnle	848a68a032	DomTree: Extract (mostly) read-only logic into type-erased base classes Avoid having to instantiate and compile a subset of the dominator tree logic separately for each node type. More importantly, this allows generic algorithms to be built on top of dominator trees without writing them as templates -- such algorithms can now use opaque CfgBlockRef and CfgInterface instead. A type-erased implementation of dominator trees could be written in terms of CfgInterface as well, but doing so would change the current trade-off: it would slightly reduce code size at the cost of a slight runtime overhead. This patch does not change the trade-off, as it only does type-erasure where basic blocks can be treated in a fully opaque way, i.e. it only moves methods that don't require iteration over CFG successors and predecessors. v5: - rename generic_{begin,end,children} back without the generic_ prefix and refer explicitly to base class methods in NewGVN, which wants to mutate the order of dominator tree node children directly v6: - style change: iDom -> idom; it's arguable whether this is really invalid, since it is actually standard camelCase, but clang-tidy complains about it so... shrug - rename {to,from}Generic -> {wrap,unwrap}Ref Change-Id: Ib860dc04cf8bb093d8ed00be7def40d662213672 Differential Revision: https://reviews.llvm.org/D83089	2020-10-20 19:53:07 +02:00
Ta-Wei Tu	529ecd19df	[NPM] port -unify-loop-exits to NPM Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D89774	2020-10-20 10:46:57 -07:00
Ta-Wei Tu	59286b36df	[NPM] Port -mergereturn to NPM Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D89781	2020-10-20 10:33:58 -07:00
Florian Hahn	2e58010208	[DSE] Do not scan users of memory terminators for further reads. isMemTerminator checks if the current def is a memory terminator that terminates the memory pointed to by DefLoc. We do not have to add any of their users to the worklist, because the follow-on users cannot read the memory in question. This leads to more stores eliminated in the presence of lifetime calls. Previously we added the users of those intrinsics to the worklist, limiting elimination. In terms of removed stores, this gives a nice boost on some benchmarks (MultiSource/SPEC2000/SPEC2006 on X86 with -flto -O3): Same hash: 205 (filtered out) Remaining: 32 Metric: dse.NumFastStores Program base patch diff test-suite...000/197.parser/197.parser.test 4.00 8.00 100.0% test-suite...rolangs-C++/family/family.test 4.00 7.00 75.0% test-suite...marks/7zip/7zip-benchmark.test 1722.00 2189.00 27.1% test-suite...CFP2000/177.mesa/177.mesa.test 30.00 38.00 26.7% test-suite :: External/Nurbs/nurbs.test 44.00 49.00 11.4% test-suite...lications/sqlite3/sqlite3.test 115.00 128.00 11.3% test-suite...006/447.dealII/447.dealII.test 2715.00 3013.00 11.0% test-suite...ProxyApps-C++/CLAMR/CLAMR.test 237.00 261.00 10.1% test-suite...tions/lambda-0.1.3/lambda.test 40.00 44.00 10.0% test-suite...3.xalancbmk/483.xalancbmk.test 1366.00 1475.00 8.0% test-suite...abench/jpeg/jpeg-6a/cjpeg.test 13.00 14.00 7.7% test-suite...oxyApps-C++/miniFE/miniFE.test 43.00 46.00 7.0% test-suite...lications/ClamAV/clamscan.test 230.00 246.00 7.0% test-suite...006/450.soplex/450.soplex.test 284.00 299.00 5.3% test-suite...nsumer-jpeg/consumer-jpeg.test 21.00 22.00 4.8%	2020-10-20 16:55:22 +01:00
Simon Pilgrim	ec228fbfc0	[InstCombine] SimplifyDemandedUseBits - replace dyn_cast<ConstantInt> with m_ConstantInt. NFCI.	2020-10-20 16:45:16 +01:00
Simon Pilgrim	ce13549761	[InstCombine] foldOrOfICmps - use m_Specific instead of explicit comparisons. NFCI.	2020-10-20 16:26:41 +01:00
Florian Hahn	6439fde6d4	[DSE] Bail out from getLocForWriteEx if call is not argmemonly/inacc_mem. This change should currently not have any impact, but guard against further inconsistencies between MemoryLocation and function attributes.	2020-10-20 14:37:53 +01:00
Simon Pilgrim	e372a5f86f	[InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support Reapplied rGa704d8238c86 with a check for integer/integervector types to prevent matching with pointer types	2020-10-20 14:14:26 +01:00
Nicolai Hähnle	c0cdd22c72	Introduce CfgTraits abstraction The CfgTraits abstraction simplifies writing algorithms that are generic over the type of CFG, and enables writing such algorithms as regular non-template code that operates on opaque references to CFG blocks and values. Implementations of CfgTraits provide operations on the concrete CFG types, e.g. `IrCfgTraits::BlockRef` is `BasicBlock *`. CfgInterface is an abstract base class which provides operations on opaque types CfgBlockRef and CfgValueRef. Those opaque types encapsulate a `void *`, but the meaning depends on the concrete CFG type. For example, MachineCfgTraits -- for use with MachineIR in SSA form -- encodes a Register inside CfgValueRef. Converting between concrete references and opaque/generic ones is done by CfgTraits::{fromGeneric,toGeneric}. Convenience methods CfgTraits::{un}wrap{Iterator,Range} are available as well. Writing algorithms in terms of CfgInterface adds some overhead (virtual method calls, plus in some cases it removes the opportunity to inline iterators), but can be much more convenient since generic algorithms can be written as non-templates. This patch adds implementations of CfgTraits for all CFGs on which dominator trees are calculated, so that the dominator tree can be ported to this machinery. Only IrCfgTraits (LLVM IR) and MachineCfgTraits (Machine IR in SSA form) are complete, the other implementations are limited to the absolute minimum required to make the upcoming dominator tree changes work. 
v5: - fix MachineCfgTraits::blockdef_iterator and allow it to iterate over the instructions in a bundle - use MachineBasicBlock::printName v6: - implement predecessors/successors for all CfgTraits implementations - fix error in unwrapRange - rename toGeneric/fromGeneric into wrapRef/unwrapRef to have naming that is consistent with {wrap,unwrap}{Iterator,Range} - use getVRegDef instead of getUniqueVRegDef v7: - std::forward fix in wrapping_iterator - fix typos v8: - cleanup operators on CfgOpaqueType - address other review comments Change-Id: Ia75f4f268fded33fca11218a7d578c9aec1f3f4d Differential Revision: https://reviews.llvm.org/D83088	2020-10-20 13:50:52 +02:00
Simon Pilgrim	e346ea9905	[InstCombine] SimplifyDemandedUseBits - pass APInt by const reference. NFCI.	2020-10-20 12:13:08 +01:00
Atmn Patel	595c615606	[IR] Adds mustprogress as a LLVM IR attribute This adds the LLVM IR attribute `mustprogress` as defined in LangRef through D86233. This attribute will be applied to functions in languages like C++ where forward progress is guaranteed. Functions without this attribute are not required to make progress. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D85393	2020-10-20 03:09:57 -04:00
Serguei Katkov	38799975ce	[IRCE] Do not transform if loop has small number of iterations IRCE has some overhead for runtime checks, and when the number of iterations is small the overhead can kill the benefit from optimizations. This CL relies on the BlockFrequencyInfo of the pre-header and header to estimate the number of loop iterations. If it is less than irce-min-estimated-iters we do not transform the loop. A more complex cost model would probably be better, but for simplicity this seems to be enough. The usage of BFI is added only for the new pass manager and tries to use it efficiently. Reviewers: ebrevnov, dantrushin, asbirlea, mkazantsev Reviewed By: mkazantsev Subscribers: llvm-commits, fhahn Differential Revision: https://reviews.llvm.org/D89541	2020-10-20 10:33:59 +07:00
Jordan Rupprecht	8a377f1e3c	[NFC] Inline assertion-only variable	2020-10-19 15:11:37 -07:00
Roman Lebedev	e0567582b8	[NFCI][SCEV] Always refer to enum SCEVTypes as enum, not integer The main tricky thing here is forward-declaring the enum: we have to specify its underlying data type. In particular, this avoids the danger of switching over the SCEVTypes, but actually switching over an integer, and not being notified when some case is not handled. I have updated most of such switches to be exhaustive and not have a default case, where it's pretty obvious to be the intent, however not all of them.	2020-10-20 00:10:22 +03:00
Roman Lebedev	3355284b2d	[NFC][SCEVExpander] isHighCostExpansionHelper(): rewrite as a switch If we switch over an enum, compiler can easily issue a diagnostic if some case is not handled. However with an if cascade that isn't so. Experimental evidence suggests new behavior to be superior.	2020-10-20 00:10:22 +03:00
Simon Pilgrim	adb52e5f9e	[InstCombine] foldOrOfICmps - only fold (icmp_eq B, 0) | (icmp_ult/gt A, B) for integer types Fixes a number of stage2 buildbots that were failing when I generalized the m_ConstantInt() logic - that didn't match for pointer types but m_Zero() does......	2020-10-19 17:05:38 +01:00
Simon Pilgrim	482e6f0041	Revert rGa704d8238c86bac: "[InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support" This reverts commit `a704d8238c`. Causing stage2 build failures on some bots.	2020-10-19 16:03:36 +01:00
Simon Pilgrim	de885f1b2a	[InstCombine] Add (icmp ne A, 0) | (icmp ne B, 0) --> (icmp ne (A|B), 0) vector support Scalar cases were already being handled by foldLogOpOfMaskedICmps (so this was dead code), but refactoring to support non-uniform vectors will take some time, so tweak this fold in the meantime.	2020-10-19 15:41:21 +01:00
Simon Pilgrim	ecd25086d1	[InstCombine] Add (icmp eq B, 0) | (icmp ult/gt A, B) -> (icmp ule A, B-1) vector support	2020-10-19 15:23:48 +01:00
Simon Pilgrim	a704d8238c	[InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support	2020-10-19 14:55:18 +01:00
Simon Pilgrim	1d90e53044	[InstCombine] foldOrOfICmps - pull out repeated getOperand() calls. NFCI.	2020-10-19 14:28:08 +01:00
Hans Wennborg	0628bea513	Revert "[PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting" This broke Chromium's PGO build, it seems because hot-cold-splitting got turned on unintentionally. See comment on the code review for repro etc. > This patch adds -f[no-]split-cold-code CC1 options to clang. This allows > the splitting pass to be toggled on/off. The current method of passing > `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose > correctly (say, with `-O0` or `-Oz`). > > To implement the -fsplit-cold-code option, an attribute is applied to > functions to indicate that they may be considered for splitting. This > removes some complexity from the old/new PM pipeline builders, and > behaves as expected when LTO is enabled. > > Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org> > Differential Revision: https://reviews.llvm.org/D57265 > Reviewed By: Aditya Kumar, Vedant Kumar > Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar This reverts commit `273c299d5d`.	2020-10-19 12:31:14 +02:00
Simon Pilgrim	0b7b446a40	[InstCombine] Support vectors-with-undef in and(logicalshift(1,X),1) --> zext(X == 0) fold	2020-10-19 11:10:32 +01:00
Roman Lebedev	d083d55c2c	[NFC][SCEV] Rename SCEVCastExpr into SCEVIntegralCastExpr All existing SCEV cast types operate on integers. D89456 will add SCEVPtrToIntExpr cast expression type. I believe this is best for consistency. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D89455	2020-10-19 10:59:53 +03:00
Florian Hahn	f5cf7f544b	[DSE] Do not consider 'noop' intrinsics as read-clobbers. isNoopIntrinsic returns true for some intrinsics that are modeled in MemorySSA but do not actually read or write any memory and do not block DSE. Such intrinsics should not be considered as read-clobbers.	2020-10-18 15:51:05 +01:00
Dávid Bolvanský	65e94cc946	[InferAttrs] Add argmemonly attribute to string libcalls Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D89602	2020-10-18 01:33:26 +02:00
Dávid Bolvanský	2a75e956e5	Revert "[InferAttrs] Add argmemonly attribute to string libcalls" This reverts commit `b77dd32a6f`. Sanitizer tests are broken.	2020-10-17 23:29:02 +02:00
Dávid Bolvanský	b77dd32a6f	[InferAttrs] Add argmemonly attribute to string libcalls Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D89602	2020-10-17 22:42:36 +02:00

27111 Commits