llvm-project

Commit Graph

Author	SHA1	Message	Date
Dylan Fleming	7215dcfe36	[SVE] Fix ShuffleVector cast<FixedVectorType> in truncateToMinimalBitwidths Depends on D104239 Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D105341	2021-07-07 15:30:10 +01:00
Arnold Schwaighofer	033de11150	[coro async] Move code to proper switch While upstreaming patches this code somehow was applied to the wrong switch statement. Differential Revision: https://reviews.llvm.org/D105504	2021-07-07 06:19:08 -07:00
Max Kazantsev	19885c7adf	[NFC] Remove duplicate function calls Removed repeated call of L->getHeader(). Now using previously stored return value. Patch by Dmitry Makogon! Differential Revision: https://reviews.llvm.org/D105535 Reviewed By: mkazantsev	2021-07-07 17:02:36 +07:00
Dylan Fleming	7586b47fb6	[SVE] Fix cast<FixedVectorType> in truncateToMinimalBitwidths Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D104239	2021-07-07 09:58:05 +01:00
Johannes Doerfert	168a9234d7	[Attributor][FIX] Replace uses first, then values Previously we replaced values by registering all their uses. However, as we replace a value, old uses become stale. We now replace values explicitly and keep track of "new values" when doing so to avoid replacing only uses in stale/old values but not their replacements.	2021-07-06 22:43:51 -05:00
Johannes Doerfert	aa3768278d	[Attributor] Introduce a helper function to deal with undef + none We often need to deal with the value lattice that contains none and undef as special values. A simple helper makes this much nicer. Differential Revision: https://reviews.llvm.org/D103857	2021-07-06 22:41:21 -05:00
Johannes Doerfert	fc82409b5c	[Attributor] Simplify operands inside of simplification AAs first When we do simplification via AAPotentialValues or AAValueConstantRange we need to simplify the operands of an instruction we deconstruct first. This does not only improve the result, see for example range.ll, but is required as we allow outside AAs to provide simplification rules via callbacks. If we do ignore the simplification rules and base other simplifications on the IR instead we can create an inconsistent state.	2021-07-06 22:41:18 -05:00
Eli Friedman	7ac1c7bead	Recommit [ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers. As part of making ScalarEvolution's handling of pointers consistent, we want to forbid multiplying a pointer by -1 (or any other value). This means we can't blindly subtract pointers. There are a few ways we could deal with this: 1. We could completely forbid subtracting pointers in getMinusSCEV() 2. We could forbid subtracting pointers with different pointer bases (this patch). 3. We could try to ptrtoint pointer operands. The option in this patch is more friendly to non-integral pointers: code that works with normal pointers will also work with non-integral pointers. And it seems like there are very few places that actually benefit from the third option. As a minimal patch, the ScalarEvolution implementation of getMinusSCEV still ends up subtracting pointers if they have the same base. This should eliminate the shared pointer base, but eventually we'll need to rewrite it to avoid negating the pointer base. I plan to do this as a separate step to allow measuring the compile-time impact. This doesn't cause obvious functional changes in most cases; the one case that is significantly affected is ICmpZero handling in LSR (which is the source of almost all the test changes). The resulting changes seem okay to me, but suggestions welcome. As an alternative, I tried explicitly ptrtoint'ing the operands, but the result doesn't seem obviously better. I deleted the test lsr-undef-in-binop.ll because I couldn't figure out how to repair it to test what it was actually trying to test. Recommitting with fix to MemoryDepChecker::isDependent. Differential Revision: https://reviews.llvm.org/D104806	2021-07-06 12:16:05 -07:00
Eli Friedman	a6d081b2cb	Revert "[ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers." This reverts commit `74d6ce5d5f`. Seeing crashes on buildbots in MemoryDepChecker::isDependent.	2021-07-06 11:17:13 -07:00
Philip Reames	9ffa90d6c2	[LV] Disable epilogue vectorization for non-latch exits When skimming through old review discussion, I noticed a post commit comment on an earlier patch which had gone unaddressed. Better late (4 months), than never right? I'm not aware of an active problem with the combination of non-latch exits and epilogue vectorization, but the interaction was not considered and I'm not motivated to make epilogue vectorization work with early exits. If there were a bug in the interaction, it would be pretty hard to hit right now (as we canonicalize towards bottom tested loops), but an upcoming change to allow multiple exit loops will greatly increase the chance for error. Thus, let's play it safe for now.	2021-07-06 10:57:10 -07:00
Philip Reames	600624a103	[LoopVersion] Move an assert [nfc-ish]	2021-07-06 10:57:10 -07:00
Eli Friedman	74d6ce5d5f	[ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers. As part of making ScalarEvolution's handling of pointers consistent, we want to forbid multiplying a pointer by -1 (or any other value). This means we can't blindly subtract pointers. There are a few ways we could deal with this: 1. We could completely forbid subtracting pointers in getMinusSCEV() 2. We could forbid subtracting pointers with different pointer bases (this patch). 3. We could try to ptrtoint pointer operands. The option in this patch is more friendly to non-integral pointers: code that works with normal pointers will also work with non-integral pointers. And it seems like there are very few places that actually benefit from the third option. As a minimal patch, the ScalarEvolution implementation of getMinusSCEV still ends up subtracting pointers if they have the same base. This should eliminate the shared pointer base, but eventually we'll need to rewrite it to avoid negating the pointer base. I plan to do this as a separate step to allow measuring the compile-time impact. This doesn't cause obvious functional changes in most cases; the one case that is significantly affected is ICmpZero handling in LSR (which is the source of almost all the test changes). The resulting changes seem okay to me, but suggestions welcome. As an alternative, I tried explicitly ptrtoint'ing the operands, but the result doesn't seem obviously better. I deleted the test lsr-undef-in-binop.ll because I couldn't figure out how to repair it to test what it was actually trying to test. Differential Revision: https://reviews.llvm.org/D104806	2021-07-06 10:54:41 -07:00
Arnold Schwaighofer	846a530e7d	Fix coro lowering of single predecessor phis Code assumes that uses of single predecessor phis are not live across suspend points. Clean up any single predecessor phis preceding the code making this assumption. rdar://76020301 Differential Revision: https://reviews.llvm.org/D105488	2021-07-06 10:22:25 -07:00
Alexey Bataev	4e1a0684f1	[SLP]Fix non-determinism in PHI sorting. Compare type IDs and DFS numbering for basic block instead of addresses to fix non-determinism. Differential Revision: https://reviews.llvm.org/D105031	2021-07-06 08:45:45 -07:00
Arnold Schwaighofer	130ea3ceb4	Use swift mangling for resume functions The resume partial functions generated for swift suspend points will now use a Swift mangling suffix. Await resume partial functions will use the suffix 'TQ'[0-9]+'_' (e.g "...TQ0_") and suspend resume partial functions will use the suffix 'TY'[0-9]+'_' (e.g "...TY1_"). Reviewed By: nate_chandler Differential Revision: https://reviews.llvm.org/D104144	2021-07-06 08:27:46 -07:00
Florian Hahn	ef0d147cdc	Recommit "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups. This reverts commit `706bbfb35b`. The committed version moves the definition of VPReductionPHIRecipe out of an ifdef only intended for ::print helpers. This should resolve the build failures that caused the revert	2021-07-06 14:15:42 +01:00
Kerry McLaughlin	a7512401e5	[LV] Prevent vectorization with unsupported element types. This patch adds a TTI function, isElementTypeLegalForScalableVector, to query whether it is possible to vectorize a given element type. This is called by isLegalToVectorizeInstTypesForScalable to reject scalable vectorization if any of the instruction types in the loop are unsupported, e.g: int foo(__int128_t* ptr, int N) #pragma clang loop vectorize_width(4, scalable) for (int i=0; i<N; ++i) ptr[i] = ptr[i] + 42; This example currently crashes if we attempt to vectorize since i128 is not a supported type for scalable vectorization. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D102253	2021-07-06 13:06:21 +01:00
Florian Hahn	706bbfb35b	Revert "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups This reverts commit `3fed6d443f`, `bbcbf21ae6` and `6c3451cd76`. The changes causing build failures with certain configurations, e.g. https://lab.llvm.org/buildbot/#/builders/67/builds/3365/steps/6/logs/stdio lib/libLLVMVectorize.a(LoopVectorize.cpp.o): In function `llvm::VPRecipeBuilder::tryToCreateWidenRecipe(llvm::Instruction, llvm::ArrayRef<llvm::VPValue>, llvm::VFRange&, std::unique_ptr<llvm::VPlan, std::default_delete<llvm::VPlan> >&) [clone .localalias.8]': LoopVectorize.cpp:(.text._ZN4llvm15VPRecipeBuilder22tryToCreateWidenRecipeEPNS_11InstructionENS_8ArrayRefIPNS_7VPValueEEERNS_7VFRangeERSt10unique_ptrINS_5VPlanESt14default_deleteISA_EE+0x63b): undefined reference to `vtable for llvm::VPReductionPHIRecipe' collect2: error: ld returned 1 exit status	2021-07-06 12:10:03 +01:00
Florian Hahn	3fed6d443f	[VPlan] Mark overriden function in VPWidenPHIRecipe as virtual. VPReductionRecipe overrides those implementations. Mark them as virtual in the VPWidenPHIRecipe to unbreak build in certain configurations.	2021-07-06 12:00:41 +01:00
Florian Hahn	bbcbf21ae6	[VPlan] Add destructor to VPReductionRecipe to unbreak build. Attempt to unbreak https://lab.llvm.org/buildbot/#/builders/67/builds/3363/steps/6/logs/stdio	2021-07-06 11:41:20 +01:00
Florian Hahn	6c3451cd76	[VPlan] Add VPReductionPHIRecipe (NFC). This patch is a first step towards splitting up VPWidenPHIRecipe into separate recipes for the 3 distinct cases they model: 1. reduction phis, 2. first-order recurrence phis, 3. pointer induction phis. This allows untangling the code generation and allows us to reduce the reliance on LoopVectorizationCostModel during VPlan code generation. Discussed/suggested in D100102, D100113, D104197. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104989	2021-07-06 11:25:28 +01:00
Kerry McLaughlin	17b701c43c	[LV] Collect a list of all element types found in the loop (NFC) Splits `getSmallestAndWidestTypes` into two functions, one of which now collects a list of all element types found in the loop (`ElementTypesInLoop`). This ensures we do not have to iterate over all instructions in the loop again in other places, such as in D102253 which disables scalable vectorization of a loop if any of the instructions use invalid types. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D105437	2021-07-06 10:37:41 +01:00
Akira Hatanaka	28fe9afdba	[ObjC][ARC] Prevent moving objc_retain calls past objc_release calls that release the retained object This patch fixes what looks like a longstanding bug in ARC optimizer where it reverses the order of objc_retain calls and objc_release calls that retain and release the same object. The code in ARC optimizer that is responsible for code motion takes the following steps: 1. Traverse the CFG bottom-up and determine how far up objc_release calls can be moved. Determine the insertion points for the objc_release calls, but don't actually move them. 2. Traverse the CFG top-down and determine how far down objc_retain calls can be moved. Determine the insertion points for the objc_retain calls, but don't actually move them. 3. Try to move the objc_retain and objc_release calls if they can't be removed. The problem is that the insertion points for the objc_retain calls are determined in step 2 without taking into consideration the insertion points for objc_release calls determined in step 1, so the order of an objc_retain call and an objc_release call can be reversed, which is incorrect, even though each step is correct in isolation. To fix this bug, this patch teaches the top-down traversal step to take into consideration the insertion points for objc_release calls determined in the bottom-up traversal step. Code motion for an objc_retain call is disabled if there is a possibility that it can be moved past an objc_release call that releases the retained object. rdar://79292791 Differential Revision: https://reviews.llvm.org/D104953	2021-07-05 12:16:15 -07:00
Sanjay Patel	40b752d28d	[InstCombine] fold icmp slt/sgt of offset value with constant This follows up patches for the unsigned siblings: `0c400e8953` `c7b658aeb5` We are translating an offset signed compare to its unsigned equivalent when one end of the range is at the limit (zero or unsigned max). (X + C2) >s C --> X <u (SMAX - C) (if C == C2 - 1) (X + C2) <s C --> X >u (C ^ SMAX) (if C == C2) This probably does not show up much in IR derived from C/C++ source because that would likely have 'nsw', and we have folds for that already. As with the previous unsigned transforms, the folds could be generalized to handle non-constant patterns: https://alive2.llvm.org/ce/z/Y8Xrrm ; sgt define i1 @src(i8 %a, i8 %c) { %c2 = add i8 %c, 1 %t = add i8 %a, %c2 %ov = icmp sgt i8 %t, %c ret i1 %ov } define i1 @tgt(i8 %a, i8 %c) { %c_off = sub i8 127, %c ; SMAX %ov = icmp ult i8 %a, %c_off ret i1 %ov } https://alive2.llvm.org/ce/z/c8uhnk ; slt define i1 @src(i8 %a, i8 %c) { %t = add i8 %a, %c %ov = icmp slt i8 %t, %c ret i1 %ov } define i1 @tgt(i8 %a, i8 %c) { %c_offnot = xor i8 %c, 127 ; SMAX %ov = icmp ugt i8 %a, %c_offnot ret i1 %ov }	2021-07-05 10:08:31 -04:00
Caroline Concatto	b868a2d2c6	[SLPVectorizer] Fix crash in vectorizeChainsInBlock for scalable vector. The function vectorizeChainsInBlock does not support scalable vectors, because functions like canReuseExtract and isCommutative in the code path assert with scalable vectors. This patch avoids vectorizing blocks that have extract instructions with scalable vectors. Differential Revision: https://reviews.llvm.org/D104809	2021-07-05 12:43:41 +01:00
Stephen Tozer	14b62f7e2f	[DebugInfo] CGP+HWasan: Handle dbg.values with duplicate location ops This patch fixes an issue which occurred in CodeGenPrepare and HWAddressSanitizer, which both at some point create a map of Old->New instructions and update dbg.value uses of these. They did this by iterating over the dbg.value's location operands, and if an instance of the old instruction was found, replaceVariableLocationOp would be called on that dbg.value. This would cause an error if the same operand appeared multiple times as a location operand, as the first call to replaceVariableLocationOp would update all uses of the old instruction, invalidating the old iterator and eventually hitting an assertion. This has been fixed by no longer iterating over the dbg.value's location operands directly, but by first collecting them into a set and then iterating over that, ensuring that we never attempt to replace a duplicated operand multiple times. Differential Revision: https://reviews.llvm.org/D105129	2021-07-05 10:35:19 +01:00
Nikita Popov	a213f735d8	[IR] Deprecate GetElementPtrInst::CreateInBounds without element type This API is not compatible with opaque pointers, the method accepting an explicit pointer element type should be used instead. Thankfully there were few in-tree users. The BPF case still ends up using the pointer element type for now and needs something like D105407 to avoid doing so.	2021-07-04 16:49:30 +02:00
Paul Walker	287d39dd5a	[NFC] Fix a few whitespace issues and typos.	2021-07-04 11:49:58 +01:00
Nikita Popov	fabc17192e	[IRBuilder] Add type argument to CreateMaskedLoad/Gather Same as other CreateLoad-style APIs, these need an explicit type argument to support opaque pointers. Differential Revision: https://reviews.llvm.org/D105395	2021-07-04 12:17:59 +02:00
Roman Lebedev	fc150cecd7	[SimplifyCFG] simplifyUnreachable(): erase instructions iff they are guaranteed to transfer execution to unreachable This replaces the current ad-hoc implementation, by syncing the code from InstCombine's implementation in `InstCombinerImpl::visitUnreachableInst()`, with one exception that here in SimplifyCFG we are allowed to remove EH instructions. Effectively, this now allows SimplifyCFG to remove calls (iff they won't throw and will return), arithmetic/logic operations, etc. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D105374	2021-07-03 10:45:44 +03:00
Fangrui Song	252a1eecc0	[ThinLTO] Respect ClearDSOLocalOnDeclarations for unimported functions D74751 added `ClearDSOLocalOnDeclarations` and dropped dso_local for isDeclarationForLinker `GlobalValue`s. It missed a case for imported declarations (`doImportAsDefinition` is false while `isPerformingImport` is true). This can lead to a linker error for a default visibility symbol in `ld.lld -shared`. When `ClearDSOLocalOnDeclarations` is true, we check `isPerformingImport() && !doImportAsDefinition(&GV)` along with `GV.isDeclarationForLinker()`. The new condition checks an imported declaration. This patch fixes a `LLVMPolly.so` link error using a trunk clang -DLLVM_ENABLE_LTO=Thin. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D104986	2021-07-02 17:08:25 -07:00
Roman Lebedev	53fef0b293	[NFCI][SimplifyCFG] simplifyUnreachable(): Use poison constant to represent the result of unreachable instrs Mimics similar change for InstCombine: `ce192ced2b` / D104602 All these uses are in blocks that aren't reachable from function's entry, and said blocks are removed by SimplifyCFG itself, so we can't really test this change.	2021-07-02 22:11:52 +03:00
Heejin Ahn	51fecd17bb	[InstCombine] Don't combine PHI before catchswitch This tries to bail out if the PHI is in a `catchswitch` BB in InstCombine. A PHI cannot be combined into a non-PHI instruction if it is in a `catchswitch` BB, because `catchswitch` BB cannot have any non-PHI instruction other than `catchswitch` itself. The given test case started crashing after D98058. Reviewed By: lebedev.ri, rnk Differential Revision: https://reviews.llvm.org/D105309	2021-07-02 12:10:24 -07:00
Roman Lebedev	da81ec6158	[SimplifyCFG] Volatile memory operations do not trap Somewhat related to D105338. While it is up for discussion whether or not volatile store traps, so far there have been no complaints that volatile load/cmpxchg/atomicrmw also may trap. And even if simplifycfg currently conservatively believes that to be the case, instcombine does not: https://godbolt.org/z/5vhv4K5b8 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D105343	2021-07-02 21:47:44 +03:00
Alexey Bataev	7f7e4aed21	[SLP][NFC]Refactor findLaneForValue and make it static member, NFC, by V.Dmitriev. Reduces number of arguments	2021-07-02 10:30:13 -07:00
Jon Roelofs	37b6e03c18	[Intrinsics] Make MemCpyInlineInst a MemCpyInst This opens up more optimization opportunities in passes that already handle MemCpyInst's. Differential revision: https://reviews.llvm.org/D105247	2021-07-02 10:25:24 -07:00
Roman Lebedev	13e35ac124	[NFC][InstCombine] visitUnreachableInst(): enhance comments somewhat	2021-07-02 17:30:01 +03:00
Roman Lebedev	dadedc99e9	[InstCombine] visitUnreachableInst(): iteratively erase instructions leading to unreachable In the original review D87149 it was mentioned that this approach was tried, and it led to infinite combine loops, but I'm not seeing anything like that now, neither in `check-llvm` nor on some codebases I tried. This is a recommit of `d9d65527c2`, which I immediately reverted because I messed something up during a branch switch, and `597ccc92ce` accidentally ended up being pushed, which was very much not the intention. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D105339	2021-07-02 17:20:21 +03:00
Roman Lebedev	24d271bb18	Revert "https://godbolt.org/z/5vhv4K5b8 " This reverts commit `597ccc92ce`.	2021-07-02 17:17:55 +03:00
Roman Lebedev	93a1642763	Revert "[NFCI][InstCombine] visitUnreachableInst(): iteratively erase instructions leading to unreachable" This reverts commit `d9d65527c2`.	2021-07-02 17:17:47 +03:00
Roman Lebedev	d9d65527c2	[NFCI][InstCombine] visitUnreachableInst(): iteratively erase instructions leading to unreachable In the original review D87149 it was mentioned that this approach was tried, and it led to infinite combine loops, but I'm not seeing anything like that now, neither in `check-llvm` nor on some codebases I tried. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D105339	2021-07-02 17:17:03 +03:00
Roman Lebedev	597ccc92ce	https://godbolt.org/z/5vhv4K5b8	2021-07-02 17:16:19 +03:00
Nico Weber	a92964779c	Revert "[InstrProfiling] Use external weak reference for bias variable" This reverts commit `33a7b4d9d8`. Breaks check-profile on macOS, see comments on https://reviews.llvm.org/D105176	2021-07-02 09:05:12 -04:00
Florian Hahn	a3ca578eb9	[Matrix] Fix crash during fusion if the same load is re-used. This patch fixes a crash when the same load is used for both operands of a fuseable multiply.	2021-07-02 14:00:17 +01:00
Alexey Bataev	28ac873bcb	[SLP]Fix gathering of the scalars by not ignoring UndefValues. The compiler should not ignore UndefValue when gathering the scalars, otherwise the resulting code may be less defined than the original one. Also, grouped scalars to insert them at first to reduce the analysis in further passes. Differential Revision: https://reviews.llvm.org/D105275	2021-07-02 04:46:48 -07:00
Florian Hahn	7655061cc6	[Matrix] Hoist address computation before multiply to enable fusion. If the store address does not dominate the matrix multiply, try to hoist address computation instructions without side-effects and/or memory reads before the multiply, to allow fusion. Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D105193	2021-07-02 09:52:11 +01:00
Evgeniy Brevnov	9568811cb8	[NFC][DSE]Change 'do-while' to 'for' loop to simplify code structure With a 'for' loop there is a single place where 'Current' is adjusted. This helps to avoid copy-paste and makes the overall loop control flow a bit easier to understand. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D101044	2021-07-02 10:00:47 +07:00
Craig Topper	066524ea54	[ScalarizeMaskedMemIntrin][SelectionDAGBuilder] Use the element type to calculate alignment for gather/scatter when alignment operand is 0. Previously we used the vector type, but we're loading/storing individual elements so I think only element alignment should matter. Noticed while looking at the code for something else so I don't have a test case. Differential Revision: https://reviews.llvm.org/D105220	2021-07-01 19:08:47 -07:00
Petr Hosek	33a7b4d9d8	[InstrProfiling] Use external weak reference for bias variable We need the compiler generated variable to override the weak symbol of the same name inside the profile runtime, but using LinkOnceODRLinkage results in weak symbol being emitted which leads to an issue where the linker might choose either of the weak symbols potentially disabling the runtime counter relocation. This change replaces the use of weak definition inside the runtime with an external weak reference to address the issue. We also place the compiler generated symbol inside a COMDAT group so dead definition can be garbage collected by the linker. Differential Revision: https://reviews.llvm.org/D105176	2021-07-01 15:25:31 -07:00
Philip Reames	955f125899	[instcombine] Fold overflow check using overflow intrinsic to comparison This follows up to D104665 (which added umulo handling alongside the existing uaddo case), and generalizes for the remaining overflow intrinsics. I went to add analogous handling to LVI, and discovered that LVI already had a more general implementation. Instead, we can port what LVI does to instcombine. (For context, LVI uses makeExactNoWrapRegion to constrain the value 'x' in blocks reached after a branch on the condition `op.with.overflow(x, C).overflow`.) Differential Revision: https://reviews.llvm.org/D104932	2021-07-01 09:41:55 -07:00
Arnold Schwaighofer	4a361f5209	[coro async] Add support for specifying which parameter is swiftself in async resume functions Differential Revision: https://reviews.llvm.org/D104147	2021-07-01 07:33:15 -07:00
David Sherwood	51b4ab26ca	[NFC] Add new setDebugLocFromInst that uses the class Builder by default In lots of places we were calling setDebugLocFromInst and passing in the same Builder member variable found in InnerLoopVectorizer. I personally found this confusing so I've changed the interface to take an Optional<IRBuilder<> *> and we can now pass in None when we want to use the class member variable. Differential Revision: https://reviews.llvm.org/D105100	2021-07-01 14:23:34 +01:00
Roman Lebedev	333d3a3cdf	[NFC][PassBuilder] addVectorPasses(): clarify that 'IsLTO' is actually 'IsFullLTO' I.e. it will be `false` for thin lto.	2021-07-01 10:09:24 +03:00
Chuanqi Xu	51fbd18706	[Coroutine] Recommit Add statistics for the number of elided coroutine Now we lack a benchmark to measure the performance change for each commit. Since coro elide is the main optimization in coroutine module, I wonder it may be an estimation to count the number of elided coroutine in private code bases. e.g., for a certain commit, if we found that the number of elided goes down, we could find it before the commit check-in. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D105095	2021-07-01 11:01:28 +08:00
Sanjay Patel	0c400e8953	[InstCombine] fold icmp ult of offset value with constant This is one sibling of the fold added with `c7b658aeb5` . (X + C2) <u C --> X >s ~C2 (if C == C2 + SMIN) I'm still not sure how to describe it best, but we're translating 2 constants from an unsigned range comparison to signed because that eliminates the offset (add) op. This could be extended to handle the more general (non-constant) pattern too: https://alive2.llvm.org/ce/z/K-fMBf define i1 @src(i8 %a, i8 %c2) { %t = add i8 %a, %c2 %c = add i8 %c2, 128 ; SMIN %ov = icmp ult i8 %t, %c ret i1 %ov } define i1 @tgt(i8 %a, i8 %c2) { %not_c2 = xor i8 %c2, -1 %ov = icmp sgt i8 %a, %not_c2 ret i1 %ov }	2021-06-30 19:00:12 -04:00
Xun Li	822b92aae4	[Coroutines] Add the newly generated SCCs back to the CGSCC work queue after CoroSplit actually happened Relevant discussion can be found at: https://lists.llvm.org/pipermail/llvm-dev/2021-January/148197.html In the existing design, an SCC that contains a coroutine will go through the following passes: Inliner -> CoroSplitPass (fake) -> FunctionSimplificationPipeline -> Inliner -> CoroSplitPass (real) -> FunctionSimplificationPipeline The first CoroSplitPass doesn't do anything other than putting the SCC back to the queue so that the entire pipeline can repeat. As you can see, we run Inliner twice on the SCC consecutively without doing any real split, which is unnecessary and likely unintended. What we really wanted is this: Inliner -> FunctionSimplificationPipeline -> CoroSplitPass -> FunctionSimplificationPipeline (note that we don't really need to run Inliner again on the ramp function after split). Hence the way we do it here is to move CoroSplitPass to the end of the CGSCC pipeline, make it once for real, insert the newly generated SCCs (the clones) back to the pipeline so that they can be optimized, and also add a function simplification pipeline after CoroSplit to optimize the post-split ramp function. This approach also conforms to how the new pass manager works instead of relying on an adhoc post split cleanup, making it ready for full switch to new pass manager eventually. By looking at some of the changes to the tests, we can already observe that this change allows more optimizations to be applied to coroutines. Reviewed By: aeubanks, ChuanqiXu Differential Revision: https://reviews.llvm.org/D95807	2021-06-30 11:38:14 -07:00
Sanjay Patel	c7b658aeb5	[InstCombine] fold icmp of offset value with constant There must be a better way to describe this pattern in words? (X + C2) >u C --> X <s -C2 (if C == C2 + SMAX) This could be extended to handle the more general (non-constant) pattern too: https://alive2.llvm.org/ce/z/rdfNFP define i1 @src(i8 %a, i8 %c1) { %t = add i8 %a, %c1 %c2 = add i8 %c1, 127 ; SMAX %ov = icmp ugt i8 %t, %c2 ret i1 %ov } define i1 @tgt(i8 %a, i8 %c1) { %neg_c1 = sub i8 0, %c1 %ov = icmp slt i8 %a, %neg_c1 ret i1 %ov } The pattern was noticed as a by-product of D104932.	2021-06-30 13:37:31 -04:00
Philip Reames	c4fc2cb5b2	[instcombine] umin(x, 1) == zext(x != 0) We already implemented this for the select form, but the intrinsic form was missing. Note that this doesn't change poison behavior as 1 is non-poison, and the optimized form is still poison exactly when x is.	2021-06-30 10:20:01 -07:00
Joseph Huber	ecabc6684f	[OpenMP] Change analysis remarks to not emit on cold functions The remarks will trigger on some functions that are marked cold, such as the `__muldc3` intrinsic functions. Change the remarks to avoid these functions. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105196	2021-06-30 11:54:24 -04:00
Nico Weber	db86e5c914	Revert "[Coroutine] Add statistics for the number of elided coroutine" This reverts commit `1d9539cf49`. Test fails in LLVM_ENABLE_ASSERTIONS=OFF builds (such as regular release builds).	2021-06-30 10:22:45 -04:00
Joseph Huber	0edb87773b	[OpenMP] Add additional remarks for OpenMPOpt This patch adds additional remarks, suggesting the use of `noescape` for failed globalization and indicating when internalization failed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105150	2021-06-30 09:49:25 -04:00
David Sherwood	7b7b5b5a26	[NFC] Rename shadowed variable in InnerLoopVectorizer::createInductionVariable Avoid creating a IRBuilder stack variable with the same name as the class member.	2021-06-30 11:11:49 +01:00
Chuanqi Xu	801c2b9bba	[FuncSpec] Add an option to specializing literal constant The option is off by default for now, since we are not sure whether it would increase compile time significantly. Although we tested it on SPEC2017, we may need to test more before turning it on by default. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D104365	2021-06-30 11:26:44 +08:00
Chuanqi Xu	1d9539cf49	[Coroutine] Add statistics for the number of elided coroutine Now we lack a benchmark to measure the performance change for each commit. Since coro elide is the main optimization in coroutine module, I wonder it may be an estimation to count the number of elided coroutine in private code bases. e.g., for a certain commit, if we found that the number of elided goes down, we could find it before the commit check-in. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D105095	2021-06-30 11:20:53 +08:00
Jianzhou Zhao	ae6648cee0	[dfsan] Expose dfsan_get_track_origins to get origin tracking status This allows application code to check whether origin tracking is on before printing out traces. -dfsan-track-origins can be 0, 1, or 2. The current code only distinguishes 1 and 2 at compile time, but not at runtime. The runtime now distinguishes 1 and 2 too. Reviewed By: browneee Differential Revision: https://reviews.llvm.org/D105128	2021-06-29 20:32:39 +00:00
Nikita Popov	c4de78e91c	[SanitizerCoverage] Fix global type check with opaque pointers The code was previously relying on the fact that an incorrectly typed global would result in the insertion of a BitCast constant expression. With opaque pointers, this is no longer the case, so we should check the type explicitly.	2021-06-29 20:32:14 +02:00
Alexey Bataev	129ae515fb	[INSTCOMBINE] Transform reduction(shuffle V, poison, unique_mask) to reduction(V). After SLP + LTO we may have reduction(shuffle V, poison, mask). This can be simplified to just reduction(V) if the mask refers to a single vector only and merely permutes all elements of that vector, without reusing elements or replacing them with undefs and/or other values. Differential Revision: https://reviews.llvm.org/D105053	2021-06-29 10:02:38 -07:00
Philip Reames	e49d65f36d	[LV] Fix bug when unrolling (only) a loop with non-latch exit If we unroll a loop in the vectorizer (without vectorizing), and the cost model requires an epilogue to be generated for correctness, the code generation must actually do so. The included test case on an unmodified opt will access memory one past the expected bound. As a result, this patch is fixing a latent miscompile. Differential Revision: https://reviews.llvm.org/D103700	2021-06-29 08:04:26 -07:00
Johannes Doerfert	7af91a2b8f	[Attributor][NFCI] Make the state of AAValueSimplify explicit As we have done with other states we want the AAValueSimplify state to be explicit to use it more easily in our helpers.	2021-06-29 09:38:22 -05:00
Johannes Doerfert	dcbe58d94c	[Attributor][NFCI] Remove unneeded namespace	2021-06-29 09:38:20 -05:00
Johannes Doerfert	457bd5c8d5	[Attributor] Teach AAPotentialValues about constant select conditions There was a TODO but now we actually check if the select condition is assumed constant and only look at the relevant operand.	2021-06-29 09:38:18 -05:00
Johannes Doerfert	8dc9bb6d85	[Attributor][NFC] Clang format	2021-06-29 09:38:15 -05:00
Johannes Doerfert	a33e128012	[InstCombine] Gracefully handle an alloca outside the alloca-AS While we might eventually want to disallow allocas that do not have the alloca-AS set, it seems undesirable to crash on them. Add a cast when required so that we can support such allocas (at least here). Differential Revision: https://reviews.llvm.org/D104866	2021-06-29 09:38:13 -05:00
David Sherwood	9de63367d8	Revert "[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable" This reverts commit `9dde514162`.	2021-06-29 15:20:22 +01:00
David Sherwood	9dde514162	[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable Avoid creating an IRBuilder stack variable with the same name as the class member.	2021-06-29 14:34:30 +01:00
David Sherwood	8a3365fba2	Revert "[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable" This reverts commit `dcfc2c3fac`.	2021-06-29 14:04:42 +01:00
Florian Hahn	47215e1c62	[LV] Fix crash when target instruction for sinking is dead. This patch fixes a crash when the target instruction for sinking is dead. In that case, no recipe is created and trying to get the recipe for it results in a crash. To ensure all sink targets are alive, find & use the first previous alive instruction. Note that the case where the sink source is dead is already handled. Found by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35320 Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104603	2021-06-29 13:31:22 +01:00
David Sherwood	303b6d5e98	[LoopVectorize] Add support for scalable vectorization of invariant stores Previously in setCostBasedWideningDecision if we encountered an invariant store we just assumed that we could scalarize the store and called getUniformMemOpCost to get the associated cost. However, for scalable vectors this is not an option because it is not currently possible to scalarize the store. At the moment we crash in VPReplicateRecipe::execute when trying to scalarize the store. Therefore, I have changed setCostBasedWideningDecision so that if we are storing a scalable vector out to a uniform address and the target supports scatter instructions, then we should use those instead. Tests have been added here: Transforms/LoopVectorize/AArch64/sve-inv-store.ll Differential Revision: https://reviews.llvm.org/D104624	2021-06-29 11:56:09 +01:00
Roman Lebedev	6cf6f6f65f	[NFC][InstCombine] foldAggregateConstructionIntoAggregateReuse(): cast to Instruction eagerly In all of these, the value must be an instruction for us to succeed anyway, so change it to maybe hopefully make further changes more straight-forward.	2021-06-29 13:29:18 +03:00
David Sherwood	dcfc2c3fac	[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable Avoid creating an IRBuilder stack variable with the same name as the class member.	2021-06-29 09:14:35 +01:00
Sanjay Patel	9d0bf7699c	[InstCombine] don't try to fold a constant expression that can trap (PR50906) We could use a bigger hammer and bail out on any constant expression, but there's a regression test that appears to validly do the transform (although it may not have been intending to check that optimization).	2021-06-28 17:00:21 -04:00
Joseph Huber	57ad2e1067	[OpenMP] Prevent OpenMPOpt from internalizing uncalled functions Currently OpenMPOpt will only check if a function is a kernel before deciding not to internalize it. Any uncalled function that gets internalized will be trivially dead in the module so this is unnecessary. Depends on D102423 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104890	2021-06-28 16:47:53 -04:00
Nikita Popov	7ac0442fe5	[SanitizerCoverage] Support opaque pointers Pass element type rather than pointer type to some functions, so we know which type to use for the global variables.	2021-06-28 22:18:42 +02:00
Arnold Schwaighofer	3dee1e8a84	[coro] Fix rematerializable instruction sinking to coro.suspend blocks There is a constraint that coro.suspend instructions need to be in their own blocks. The coro split pass initially creates IR that obeys this constraint (which is later checked). Sinking rematerializable instructions into these blocks breaks that constraint. Instead rematerialize in the predecessor block to the suspend's single predecessor block. Differential Revision: https://reviews.llvm.org/D104051	2021-06-28 09:37:45 -07:00
Nico Weber	540b4a5fb3	Revert "[DebugInfo] Enable variadic debug value salvaging" This reverts commit `adace79652`. Still breaks things, see comment on https://reviews.llvm.org/D91722	2021-06-28 11:25:09 -04:00
Reshabh Sharma	ae983de6cc	[InferAddressSpaces] NFC: For noop IntToPtr/PtrToInt pair cast to operator instead of PtrToInt The compiler crashes at an assertion while casting operands to PtrToIntInst in some cases when ptrtoint is present as an explicit operand to inttoptr. An explicit operator used as an operand cannot be cast to an Instruction. This patch replaces the cast to PtrToIntInst with a cast to Operator, which is later checked for constant expressions. Differential Revision: https://reviews.llvm.org/D105002	2021-06-28 19:24:26 +05:30
Joseph Huber	13b2fba239	[OpenMP][NFC] Fix typo in OpenMPOpt	2021-06-28 09:49:14 -04:00
Joseph Huber	4024087731	[OpenMP][NFC] Fix missing argument	2021-06-28 09:15:01 -04:00
Joseph Huber	4a6bd8e3e7	[OpenMP] Increase attributor iterations on the GPU Increase the number of attributor iterations on a GPU target. I forgot to change this in D104416. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104920	2021-06-28 08:50:49 -04:00
Kerry McLaughlin	f99672568f	[LoopVectorize] Fix strict reductions where VF = 1 Currently we will allow loops with a fixed width VF of 1 to vectorize if the -enable-strict-reductions flag is set. However, the loop vectorizer will not use ordered reductions if `VF.isScalar()` and the resulting vectorized loop will be out of order. This patch removes `VF.isVector()` when checking if ordered reductions should be used. Also, instead of converting the FAdds to reductions if the VF = 1, operands of the FAdds are changed such that the order is preserved. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D104533	2021-06-28 11:27:10 +01:00
Florian Hahn	80aa7e147e	[VPlan] Merge predicated-triangle regions, after sinking. Sinking scalar operands into predicated-triangle regions may allow merging regions. This patch adds a VPlan-to-VPlan transform that tries to merge predicate-triangle regions after sinking. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100260	2021-06-28 11:10:38 +01:00
Max Kazantsev	d58514d41c	[LSR][NFC] Make sure that after the canonicalization the formula is canonical	2021-06-28 12:50:04 +07:00
Max Kazantsev	7c73c2ede8	[LoopDeletion] Benefit from branches by undef conditions when symbolically executing 1st iteration We can exploit branches by `undef` condition. Frankly, the LangRef says that such branches are UB, so we can assume that all outgoing edges of such blocks are dead. However, from a practical perspective, we know that this is not supported correctly in some other places. So we are being conservative about it. Branch by undef is treated in the following way: - If it is a loop-exiting branch, we always assume it exits the loop; - If not, we arbitrarily assume it takes `true` value. Differential Revision: https://reviews.llvm.org/D104689 Reviewed By: nikic	2021-06-28 11:39:46 +07:00
Nikita Popov	e81702912e	[DSE] Preserve address space Preserve address space when inserting i8* cast.	2021-06-27 20:26:00 +02:00
Nikita Popov	9aa951e80e	[MemCpyOpt] Preserve address space Preserve address space when generating the cast to i8*.	2021-06-27 20:21:19 +02:00
Nikita Popov	f00941e061	[DSE] Support opaque pointers For the start shortening optimization, always use an i8 type for the GEP, as it is a raw offset calculation. Handling of non-i8* memset/memcpy arguments requires insertion of casts. These cases were previously miscompiled, as the offset calculation was performed on the wrong type.	2021-06-27 17:41:40 +02:00
Nikita Popov	f025053977	[MemCpyOpt] Handle unusual memcpy element type Apparently, it is legal to use memcpy/memset with pointer types other than i8. Prior to `81fcdae68c` this case was silently miscompiled, as the i8 offset calculation was performed on some other type. Now it would crash due to a type mismatch. Fix this by inserting an explicit bitcast to i8.	2021-06-27 16:21:44 +02:00
Sanjay Patel	153da08a6c	[InstCombine] hoist min/max intrinsics above select with constant op This is an extension of the handling for unary intrinsics and follows the logic that we use for binary ops. We don't canonicalize to min/max intrinsics yet, but this might help unlock other folds seen in D98152.	2021-06-27 10:02:23 -04:00
Nikita Popov	81fcdae68c	[MemCpyOpt] Support opaque pointers	2021-06-27 15:52:38 +02:00
Nikita Popov	a9129f8964	[LoadStoreVectorizer] Support opaque pointers There are remaining redundant bitcasts.	2021-06-27 15:42:16 +02:00
Florian Hahn	f1a6430272	[VPlan] Track both incoming values for first-order recurrence phis. This patch updates VPWidenPHI recipes for first-order recurrences to also track the incoming value from the back-edge. Similar to D99294, which did the same for reductions. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104197	2021-06-27 14:29:35 +01:00
Florian Hahn	7f36981977	[LV] Adjust trip count based on IsOrdered in widenPHIInstruction (NFC). Suggested in D104197, avoids the early exit.	2021-06-26 13:13:25 +01:00
Andrew Browne	45f6d5522f	[DFSan] Change shadow and origin memory layouts to match MSan. Previously on x86_64: +--------------------+ 0x800000000000 (top of memory) \| application memory \| +--------------------+ 0x700000008000 (kAppAddr) \| \| \| unused \| \| \| +--------------------+ 0x300000000000 (kUnusedAddr) \| origin \| +--------------------+ 0x200000008000 (kOriginAddr) \| unused \| +--------------------+ 0x200000000000 \| shadow memory \| +--------------------+ 0x100000008000 (kShadowAddr) \| unused \| +--------------------+ 0x000000010000 \| reserved by kernel \| +--------------------+ 0x000000000000 MEM_TO_SHADOW(mem) = mem & ~0x600000000000 SHADOW_TO_ORIGIN(shadow) = kOriginAddr - kShadowAddr + shadow Now for x86_64: +--------------------+ 0x800000000000 (top of memory) \| application 3 \| +--------------------+ 0x700000000000 \| invalid \| +--------------------+ 0x610000000000 \| origin 1 \| +--------------------+ 0x600000000000 \| application 2 \| +--------------------+ 0x510000000000 \| shadow 1 \| +--------------------+ 0x500000000000 \| invalid \| +--------------------+ 0x400000000000 \| origin 3 \| +--------------------+ 0x300000000000 \| shadow 3 \| +--------------------+ 0x200000000000 \| origin 2 \| +--------------------+ 0x110000000000 \| invalid \| +--------------------+ 0x100000000000 \| shadow 2 \| +--------------------+ 0x010000000000 \| application 1 \| +--------------------+ 0x000000000000 MEM_TO_SHADOW(mem) = mem ^ 0x500000000000 SHADOW_TO_ORIGIN(shadow) = shadow + 0x100000000000 Reviewed By: stephan.yichao.zhao, gbalats Differential Revision: https://reviews.llvm.org/D104896	2021-06-25 17:00:38 -07:00
Nikita Popov	fdd4c199a1	Revert "[InstCombine] Make indexed compare fold opaque ptr compatible" This reverts commit `5cb20ef8a2`. Assertion failures with this patch were reported on https://reviews.llvm.org/rG5cb20ef8a235, revert for now.	2021-06-26 00:32:59 +02:00
Eli Friedman	8d5bf0709d	[NFC] Prefer ConstantRange::makeExactICmpRegion over makeAllowedICmpRegion The implementation is identical, but it makes the semantics a bit more obvious.	2021-06-25 14:43:13 -07:00
Juneyoung Lee	1605593440	[SimplifyLibCalls] Fix memchr opt to use CreateLogicalAnd This fixes a bug at LibCallSimplifier::optimizeMemChr which does the following transformation: ``` // memchr("\r\n", C, 2) != nullptr -> (1 << C & ((1 << '\r') \| (1 << '\n'))) // != 0 // after bounds check. ``` As written above, a bounds check on C (whether it is less than integer bitwidth) is done before doing `1 << C` otherwise 1 << C will overflow. If the bounds check is false, the result of (1 << C & ...) must not be used at all, otherwise the result of shift (which is poison) will contaminate the whole results. A correct way to encode this is `select i1 (bounds check), (1 << C & ...), false` because select does not allow the unused operand to contaminate the result. However, this optimization was introducing `and (bounds check), (1 << C & ...)` which cannot do that. The bug was found from compilation of this C++ code: https://reviews.llvm.org/rG2fd3037ac615#1007197 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D104901	2021-06-26 05:59:35 +09:00
Joseph Huber	5ccb7424fa	[OpenMP] Change OpenMPOpt to check openmp metadata The metadata added in D102361 introduces a module flag that we can check to determine if the module was compiled with `-fopenmp` enabled. We can now check for the presence of this instead of scanning the call graph for OpenMP runtime functions. Depends on D102361 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102423	2021-06-25 16:34:22 -04:00
Hongtao Yu	3638085ff0	[Coroutines] Define __coro_frame_ty in function scope Types should be defined in function scope instead of a local lexical scope. Field types should be defined inside their parent type scope. We were seeing a type defined in a local scope causing trouble for the DWARF emitter, where a context is required to be a function scope, a namespace or a global scope. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D104937	2021-06-25 12:33:20 -07:00
Florian Hahn	cc5ee857f9	[LV] Doxygenize VectorizationFactor member comments (NFC). Minor cleanup for follow-up patch.	2021-06-25 18:35:00 +01:00
Philip Reames	2cd23eb243	[instcombine] Fold overflow check using umulo to comparison If we have a umul.with.overflow where the multiply result is not used and one of the operands is a constant, we can perform the overflow check more cheaply with a comparison than by performing the multiply and extracting the overflow flag. (Noticed when looking at the conditions SCEV emits for overflow checks.) Differential Revision: https://reviews.llvm.org/D104665	2021-06-25 10:25:45 -07:00
Florian Hahn	91053e327c	[LV] Reflow comment for VectorizationCostTy (NFC).	2021-06-25 14:20:06 +01:00
Arthur Eubanks	1aa02b37e7	Revert "[BuildLibCalls/SimplifyLibCalls] Fix attributes on created CallInst instructions." This reverts commit `1eda5453f2`. Causes https://crbug.com/1223647: Incompatible argument and return types for 'returned' attribute tail call void @llvm.memset.p0i8.i64(i8* noalias noundef returned writeonly align 1 dereferenceable(255) %arraydecay, i8 0, i64 255, i1 false), !dbg !985	2021-06-24 19:24:34 -07:00
Nikita Popov	5cb20ef8a2	[InstCombine] Make indexed compare fold opaque ptr compatible Rather than relying on pointer type equality (which, for a change, is silently incorrect with opaque pointers) check that the GEP source element types match.	2021-06-24 22:33:01 +02:00
Arthur Eubanks	7110510eca	[WPD] Don't optimize calls more than once WPD currently assumes that there is a one to one correspondence between type test assume sequences and virtual calls. However, with -fstrict-vtable-pointers this may not be true. This ends up causing crashes when we try to optimize a virtual call more than once (applyUniformRetValOpt()/applyUniqueRetValOpt()/applyVirtualConstProp()/applySingleImplDevirt()). applySingleImplDevirt() actually didn't previously crash because it would replace the devirtualized call with the same direct call. Adding an assert that the call is indirect causes the corresponding test to crash without the rest of the patch. This makes Chrome successfully build with -fstrict-vtable-pointers + WPD. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D104798	2021-06-24 13:28:09 -07:00
Nikita Popov	8e0ff44bf8	[InstCombine] Make varargs cast transform compatible with opaque ptrs The whole transform can be dropped once we have fully transitioned to opaque pointers (as its purpose is to remove no-op pointer casts). For now, make sure that it handles opaque pointers correctly.	2021-06-24 21:57:05 +02:00
Jonas Paulsson	1eda5453f2	[BuildLibCalls/SimplifyLibCalls] Fix attributes on created CallInst instructions. - When emitting libcalls, do not only pass the calling convention from the function prototype but also the attributes. - Do not pass attributes from e.g. libc memcpy to llvm.memcpy. Review: Reid Kleckner, Eli Friedman, Arthur Eubanks Differential Revision: https://reviews.llvm.org/D103992	2021-06-24 14:47:24 -05:00
Roman Lebedev	d064182612	[SimplifyCFG] Tail-merging all blocks with `resume` terminator Similar to what we already do for `ret` terminators. As noted by @rnk, clang seems to already generate a single `ret`/`resume`, so this isn't likely to cause widespread changes. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D104849	2021-06-24 21:25:06 +03:00
Florian Hahn	833bdbe93c	[LV] Support sinking recipe in replicate region after another region. This patch handles sinking a replicate region after another replicate region. In that case, we can connect the sink region after the target region. This properly handles the case for which an assertion has been added in `337d765282`. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34842. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D103514	2021-06-24 13:58:42 +01:00
Stephen Tozer	adace79652	[DebugInfo] Enable variadic debug value salvaging This patch enables the salvaging of debug values that may be calculated from more than one SSA value, such as with binary operators that do not use a constant argument. The actual functionality for this behaviour is added in a previous commit (`c7270567`), but with the ability to actually emit the resulting debug values switched off. The reason for this is that the prior patch has been reverted several times due to issues discovered downstream, some time after the actual landing of the patch. The patch in question is rather large and touches several widely used header files, and all issues discovered are more related to the handling of variadic debug values as a whole rather than the details of the patch itself. Therefore, to minimize the build time impact and risk of conflicts involved in any potential future revert/reapply of that patch, this significantly smaller patch (that touches no header files) will instead be used as the capstone to enable variadic debug value salvaging. The review linked to this patch is mostly implemented by the previous commit, `c7270567`, but also contains the changes in this patch. Differential Revision: https://reviews.llvm.org/D91722	2021-06-24 13:16:29 +01:00
Roman Lebedev	9c4c2f2472	[SimplifyCFG] Tail-merging all blocks with `ret` terminator Based on top of D104598, which is an NFCI-ish refactoring. Here, the restriction that only empty blocks can be merged is lifted. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D104597	2021-06-24 13:15:39 +03:00
Stephen Tozer	c72705678c	Partial Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands" This is a partial reapply of the original commit and the followup commit that were previously reverted; this reapply also includes a small fix for a potential source of non-determinism, but also has a small change to turn off variadic debug value salvaging, to ensure that any future revert/reapply steps to disable and re-enable this feature do not risk causing conflicts. Differential Revision: https://reviews.llvm.org/D91722 This reverts commit `386b66b2fc`.	2021-06-24 09:46:38 +01:00
Zequan Wu	9393894331	Revert "ThinLTO: Fix inline assembly references to static functions with CFI" This causes a compiler crash: Assertion `materialized_use_empty() && "Uses remain when a value is destroyed!"' This reverts commit `e3d24b45b8`.	2021-06-23 19:24:56 -07:00
Evgenii Stepanov	78f7e6d8d7	[hwasan] Respect llvm.asan.globals. This enables the no_sanitize C++ attribute to exclude globals from hwasan testing, and automatically excludes other sanitizers' globals (such as ubsan location descriptors). Differential Revision: https://reviews.llvm.org/D104825	2021-06-23 18:37:00 -07:00
Nikita Popov	8321335fd8	[InstCombine] Use getFunctionType() Avoid fetching pointer element type...	2021-06-23 20:28:34 +02:00
Sami Tolvanen	e3d24b45b8	ThinLTO: Fix inline assembly references to static functions with CFI Create an internal alias with the original name for static functions that are renamed in promoteInternals to avoid breaking inline assembly references to them. This relands commit `4474958d3a` with a fix to a use-of-uninitialized-value error that tripped MemorySanitizer. Link: https://github.com/ClangBuiltLinux/linux/issues/1354 Reviewed By: nickdesaulniers, pcc Differential Revision: https://reviews.llvm.org/D104058	2021-06-23 10:56:13 -07:00
Kuter Dinel	5d44d56f7d	[Attributor] Derive AAFunctionReachability attribute. This attribute uses Attributor's internal 'optimistic' call graph information to answer queries about function call reachability. Functions can become reachable over time as new call edges are discovered. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104599	2021-06-23 20:43:10 +03:00
Nikita Popov	00d3f7cc3c	[LAA] Make getPointersDiff() API compatible with opaque pointers Make getPointersDiff() and sortPtrAccesses() compatible with opaque pointers by explicitly passing in the element type instead of determining it from the pointer element type. The SLPVectorizer result is slightly non-optimal in that unnecessary pointer bitcasts are added. Differential Revision: https://reviews.llvm.org/D104784	2021-06-23 18:44:34 +02:00
Datta Nagraj	ad0085d338	[InstCombine] Eliminate casts to optimize ctlz operation If a ctlz operation is performed on higher datatype and then downcasted, then this can be optimized by doing a ctlz operation on a lower datatype and adding the difference bitsize to the result of ctlz to provide the same output: https://alive2.llvm.org/ce/z/8uup9M The original problem is shown in https://llvm.org/PR50173 Differential Revision: https://reviews.llvm.org/D103788	2021-06-23 11:19:12 -04:00
Sanjay Patel	1e9b6b89a7	[InstCombine] convert FP min/max with negated op to fabs This is part of improving floating-point patterns seen in: https://llvm.org/PR39480 We don't require any FMF because the 2 potential corner cases (-0.0 and NaN) are correctly handled without FMF: 1. -0.0 is treated as strictly less than +0.0 with maximum/minimum, so fabs/fneg work as expected. 2. +/- 0.0 with maxnum/minnum is indeterminate, so transforming to fabs/fneg is more defined. 3. The sign of a NaN may be altered by this transform, but that is allowed in the default FP environment. If there are FMF, they are propagated from the min/max call to one or both new operands which seems to agree with Alive2: https://alive2.llvm.org/ce/z/bem_xC	2021-06-23 10:41:39 -04:00
Roman Lebedev	ff4b1d379f	[NFCI-ish][SimplifyCFGPass] Rework and generalize `ret` block tail-merging This changes the approach taken to tail-merge the blocks to always create a new block instead of trying to reuse some block, and generalizes it to support dealing with more than just `ret` in the future. This effectively lifts the CallBr restriction, although this isn't really intentional. That is the only non-NFC change here; I'm not sure if it's reasonable/feasible to temporarily retain it. Other restrictions of the transform remain. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D104598	2021-06-23 14:33:18 +03:00
Joe Ellis	3c4dbf6ea9	[Verifier] Fail on overrunning and invalid indices for {insert,extract} vector intrinsics With regards to overrunning, the langref (llvm/docs/LangRef.rst) specifies: (llvm.experimental.vector.insert) Elements ``idx`` through (``idx`` + num_elements(``subvec``) - 1) must be valid ``vec`` indices. If this condition cannot be determined statically but is false at runtime, then the result vector is undefined. (llvm.experimental.vector.extract) Elements ``idx`` through (``idx`` + num_elements(result_type) - 1) must be valid vector indices. If this condition cannot be determined statically but is false at runtime, then the result vector is undefined. For the non-mixed cases (e.g. inserting/extracting a scalable into/from another scalable, or inserting/extracting a fixed into/from another fixed), it is possible to statically check whether or not the above conditions are met. This was previously missing from the verifier, and if the conditions were found to be false, the result of the insertion/extraction would be replaced with an undef. With regards to invalid indices, the langref (llvm/docs/LangRef.rst) specifies: (llvm.experimental.vector.insert) ``idx`` represents the starting element number at which ``subvec`` will be inserted. ``idx`` must be a constant multiple of ``subvec``'s known minimum vector length. (llvm.experimental.vector.extract) The ``idx`` specifies the starting element number within ``vec`` from which a subvector is extracted. ``idx`` must be a constant multiple of the known-minimum vector length of the result type. Similarly, these conditions were not previously enforced in the verifier. In some circumstances, invalid indices were permitted silently, and in other circumstances, an undef was spawned where a verifier error would have been preferred. This commit adds verifier checks to enforce the constraints above. Differential Revision: https://reviews.llvm.org/D104468	2021-06-23 10:33:22 +00:00
Max Kazantsev	842b4c83cb	[LoopDeletion] Exploit undef Phi inputs when symbolically executing 1st iteration Follow-up on Roman's idea expressed in D103959. - If a Phi has undefined inputs from live blocks: - and no other inputs, assume it is undef itself; - and exactly one non-undef input, we can assume that all undefs are equal to this input. Differential Revision: https://reviews.llvm.org/D104618 Reviewed By: lebedev.ri, nikic	2021-06-23 11:53:48 +07:00
Max Kazantsev	b7d2c173eb	[LSR] Filter out zero factors. PR50765 Zero factor leads to division by zero and failure of corresponding assert as shown in PR50765. We should filter out such factors. Differential Revision: https://reviews.llvm.org/D104702 Reviewed By: huihuiz, reames	2021-06-23 10:43:06 +07:00
Jon Roelofs	493d6928fe	[Remarks] Make memsize remarks report as an analysis, not a missed opportunity. Differential revision: https://reviews.llvm.org/D104078	2021-06-22 18:22:47 -07:00
Liqiang Tao	a0d96fdd3a	[llvm][Inliner] Make PriorityInlineOrder lazily updated This patch makes PriorityInlineOrder lazily updated. The PriorityInlineOrder would lazily update the desirability of a call site if it's decreasing. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D104654	2021-06-23 08:59:53 +08:00
Joseph Huber	1cfdcae653	[Attributor] Fix AAExecutionDomain returning true on invalid states This patch fixes a problem with the AAExecutionDomain attributor not checking if it is in a valid state. This can cause it to incorrectly return that a block is executed in a single threaded context after the attributor failed for any reason. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D103186	2021-06-22 18:12:43 -04:00
Joseph Huber	44feacc736	[OpenMP] Change remaining globalization from an analysis remark to missed After landing the globalization optimizations, the presence of globalization on the device that was not put in shared or stack memory is a failed optimization with performance consequences, so it should be reported as a missed remark. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104735	2021-06-22 16:52:06 -04:00
Nikita Popov	7bb7fa12e7	[OpaquePtr] Support changing load type in InstCombine When the load type is changed to ptr, we need the load pointer type to also be ptr, because it's not allowed to create a pointer to an opaque pointer. This is achieved by adjusting the getPointerTo() API to return an opaque pointer for an opaque pointer base type. Differential Revision: https://reviews.llvm.org/D104718	2021-06-22 21:16:15 +02:00
Sami Tolvanen	33c9438f11	Revert "ThinLTO: Fix inline assembly references to static functions with CFI" This reverts commit `4474958d3a`. Breaks check-llvm on Mac.	2021-06-22 12:10:58 -07:00
Joseph Huber	ca1560da72	[OpenMP][NFC] Add new optimizations to OpenMPOpt comment header Summary: Adds mentions to the new globalization optimizations added to the OpenMPOpt comment header.	2021-06-22 14:40:31 -04:00
Joseph Huber	b54ccab509	[Attributor] Add an option to increase the max number of iterations Right now the Attributor defaults to 32 fixed point iterations unless it is set explicitly by a command line flag. This patch allows this to be configured when the attributor instance is created. The maximum is then increased in OpenMPOpt if the target is a kernel. This is because the globalization analysis can result in larger iteration counts due to many dependent instances running at once. Depends on D102444 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104416	2021-06-22 14:38:25 -04:00
Sanjay Patel	b1f6ef92ec	[InstCombine] reduce code duplication for FP min/max with casts fold; NFC	2021-06-22 14:15:04 -04:00
Joseph Huber	30e36c9b3c	[Attributor] Add interface to emit remarks in Attributor Summary: This patch adds support for the Attributor to emit remarks on behalf of some other pass. The attributor can now optionally take a callback function that returns an OptimizationRemarkEmitter object when given a Function pointer. If this is available then a remark will be emitted for the corresponding pass name. Depends on D102197 Reviewed By: sstefan1 thegameg Differential Revision: https://reviews.llvm.org/D102444	2021-06-22 14:12:46 -04:00
Joseph Huber	7d69da71dd	[OpenMP] Enable HeapToStack conversion in OpenMPOpt for new RTL globalization calls Summary: The changes to globalization introduced in D97680 introduce a large amount of overhead by default. The old globalization method would always ignore globalization code if executing in SPMD mode. This wasn't strictly correct as data sharing is still possible in SPMD mode. The new interface is correct but introduces globalization code even when unnecessary. This optimization will use the existing HeapToStack transformation in the attributor to allow for unneeded globalization to be replaced with thread-private stack memory. This is done using the newly introduced library instances for the RTL functions added in D102087. Depends on D97818 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102197	2021-06-22 13:23:05 -04:00
Sami Tolvanen	4474958d3a	ThinLTO: Fix inline assembly references to static functions with CFI Create an internal alias with the original name for static functions that are renamed in promoteInternals to avoid breaking inline assembly references to them. Link: https://github.com/ClangBuiltLinux/linux/issues/1354 Reviewed By: pcc Differential Revision: https://reviews.llvm.org/D104058	2021-06-22 10:01:55 -07:00
Joseph Huber	03d7e61c87	[OpenMP] Internalize functions in OpenMPOpt to improve IPO passes Summary: Currently the attributor needs to give up if a function has external linkage. This means that the optimization introduced in D97818 will only apply to static functions. This change uses the Attributor to internalize OpenMP device routines by making a copy of each function with private linkage and replacing the uses in the module with it. This allows for the optimization to be applied to any regular function. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102824	2021-06-22 12:38:10 -04:00
Joseph Huber	6fc51c9f7d	[OpenMP] Replace GPU globalization calls with shared memory in the middle-end Summary: The changes introduced in D97680 create a simpler interface to code that needs to be globalized. This interface is used to simplify the globalization calls in the middle end. We can check any globalization call that is only called by a single thread in the team and replace it with a static shared memory buffer. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97818	2021-06-22 11:55:44 -04:00
Nikita Popov	e790d3667e	[OpaquePtr] Handle addrspacecasts in InstCombine This adds support for addrspace casts involving opaque pointers to InstCombine, as well as the isEliminableCastPair() helper (otherwise the assertion failure would just move there). Add PointerType::hasSameElementTypeAs() to hide the element type details. Differential Revision: https://reviews.llvm.org/D104668	2021-06-22 17:45:30 +02:00
Jingu Kang	873ff5a728	[SimpleLoopUnswitch] Fix a bug in ComputeUnswitchedCost with partial unswitch There was a bug in the cost calculation for partially invariant unswitch. The costs of non-duplicated blocks are subtracted from the total LoopCost, so anything that is duplicated should not be counted. Differential Revision: https://reviews.llvm.org/D103816	2021-06-22 16:18:00 +01:00
Joseph Huber	68d133a3e8	[OpenMP] Simplify GPU memory globalization Summary: Memory globalization is required to maintain OpenMP standard semantics for data sharing between worker and master threads. The GPU cannot share data between its threads so must allocate global or shared memory to store the data in. Currently this is implemented fully in the frontend using the `__kmpc_data_sharing_push_stack` and `__kmpc_data_sharing_pop_stack` functions to emulate standard CPU stack sharing. The front-end scans the target region for variables that escape the region and must be shared between the threads. Each variable then has a field created for it in a global record type. This patch replaces this functionality with a single allocation command, effectively mimicking an alloca instruction for the variables that must be shared between the threads. This will be much slower than the current solution, but makes it much easier to optimize as we can analyze each variable independently and determine if it is not captured. In the future, we can replace these calls with an `alloca` and small allocations can be pushed to shared memory. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D97680	2021-06-22 10:52:46 -04:00
Max Kazantsev	4c4f1ae93e	Re-land "[LoopDeletion] Handle Phis with similar inputs from different blocks" Patch was reverted due to a bug that existed before it and was exposed by it. Returning after the underlying bug has been fixed. Differential Revision: https://reviews.llvm.org/D103959	2021-06-22 12:28:46 +07:00
Max Kazantsev	575253887b	[LoopDeletion] Require loop to have a predecessor when executing 1st iteration symbolically Two predecessors break the further logic, and the loop may come to the opt in non-canonicalized state.	2021-06-22 12:20:55 +07:00
Nikita Popov	39796e1ad0	Reapply [InstCombine] Don't try converting opaque pointer bitcast to GEP Reapplied without changes -- this was reverted together with an underlying patch. ----- Bitcasts having opaque pointer source or result type cannot be converted into a zero-index GEP, GEP source and result types always have the same opaque-ness.	2021-06-21 22:15:56 +02:00
Nikita Popov	e2c2124a4b	Reapply [InstCombine] Extract bitcast -> gep transform Relative to the original patch, an InstCombine test has been added to show a previously missed pattern, and the Coroutine test that resulted in the revert has been regenerated. ----- Move this into a separate function, to make sure that early returns do not accidentally skip other transforms. This previously happened for the isSized() check, which skipped folds like distributing a bitcast over a select.	2021-06-21 22:03:15 +02:00
Nikita Popov	6922ab73a5	Revert "[InstCombine] Extract bitcast -> gep transform" This reverts commit `d9f5d7b959`. This reverts commit `5780611d7e`. This causes a failure in Coroutine tests.	2021-06-21 21:34:17 +02:00
Nikita Popov	862313cf59	[LoopUnroll] Don't modify TripCount/TripMultiple in computeUnrollCount() (NFCI) As these are no longer passed to UnrollLoop(), there is no need to modify them in computeUnrollCount(). Make them non-reference parameters. Differential Revision: https://reviews.llvm.org/D104590	2021-06-21 21:34:17 +02:00
Alexey Bataev	908b753661	[SLP] Improve vectorization of PHI instructions. Perform better analysis when trying to vectorize PHIs. 1. Do not try to vectorize vector PHIs. 2. Do deeper analysis for more profitable nodes for the vectorization. Before, we just tried to vectorize the PHIs of the same type. The patch improves this and tries to vectorize PHIs with incoming values which come from the same basic block and have the same and/or alternative opcodes. This saves compile time and provides better vectorization results in general. Part of D57059. Differential Revision: https://reviews.llvm.org/D103638	2021-06-21 12:26:24 -07:00
Nikita Popov	5780611d7e	[InstCombine] Don't try converting opaque pointer bitcast to GEP Bitcasts having opaque pointer source or result type cannot be converted into a zero-index GEP, GEP source and result types always have the same opaque-ness.	2021-06-21 21:24:50 +02:00
Nikita Popov	d9f5d7b959	[InstCombine] Extract bitcast -> gep transform Move this into a separate function, to make sure that early returns do not accidentally skip other transforms. There is already one isSized() check that could run into this issue, thus this change is not strictly NFC.	2021-06-21 21:24:50 +02:00
Nikita Popov	a969bdc56f	[InstCombine] Remove unnecessary address space check (NFC) It's not possible to bitcast between different address spaces, and this is ensured by the IR verifier. As such, this bitcast to addrspacecast canonicalization can never be hit.	2021-06-21 20:11:39 +02:00
Nathan Chancellor	f52666985d	Revert "[LoopDeletion] Handle Phis with similar inputs from different blocks" This reverts commit `bb1dc876eb`. This patch causes an assertion failure when building an arm64 defconfig Linux kernel. See https://reviews.llvm.org/D103959 for a link to the original bug report and a reduced reproducer.	2021-06-21 10:18:55 -07:00
Sanjay Patel	198b79caae	[InstCombine] move bitmanipulation-of-select folds This is no outwardly-visible-difference-intended, but it is obviously better to have all transforms for an intrinsic housed together since we already have helper functions in place. It is also potentially more efficient to zap a simple pattern match before trying to do expensive computeKnownBits() calls.	2021-06-21 11:32:16 -04:00
Sanjay Patel	64b2676ca8	[InstCombine] fold ctlz/cttz-of-select with 1 or more constant arms Building on: `4c44b02d87` ...and adding handling for the extra operand in these intrinsics. This pattern is discussed in: https://llvm.org/PR50140	2021-06-21 11:04:12 -04:00
Nikita Popov	80e0424b2c	[Mem2Reg] Use poison for unreachable cases Use poison instead of undef for cases dealing with unreachable code. This still leaves the more interesting case of "load from uninitialized memory" as undef.	2021-06-21 10:54:13 +02:00
Juneyoung Lee	c038845f58	[InstCombine] Fold icmp (select c,const,arg), null if icmp arg, null can be simplified This patch folds icmp (select c,const,arg), null if icmp arg, null can be simplified. Resolves llvm.org/pr48975. Reviewed By: nikic, xbolva00 Differential Revision: https://reviews.llvm.org/D96663	2021-06-21 17:39:05 +09:00
Sjoerd Meijer	342bbb7832	[FuncSpec] Don't specialise functions with NoDuplicate instructions. getSpecializationCost was returning INT_MAX for a case when specialisation shouldn't happen, but this wasn't properly checked if specialisation was forced. Differential Revision: https://reviews.llvm.org/D104461	2021-06-21 09:02:11 +01:00
Max Kazantsev	bb1dc876eb	[LoopDeletion] Handle Phis with similar inputs from different blocks This patch lifts the requirement to have the only incoming live block for Phis. There can be multiple live blocks if the same value comes to phi from all of them. Differential Revision: https://reviews.llvm.org/D103959 Reviewed By: nikic, lebedev.ri	2021-06-21 11:37:06 +07:00
Juneyoung Lee	ce192ced2b	[InstCombine] Use poison constant to represent the result of unreachable instrs This patch updates InstCombine to use poison constant to represent the resulting value of (either semantically or syntactically) unreachable instrs, or a don't-care value of an unreachable store instruction. This allows more aggressive folding of unused results, as shown in llvm/test/Transforms/InstCombine/getelementptr.ll . Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D104602	2021-06-21 09:58:44 +09:00
Nikita Popov	1ae266f452	[LoopUnroll] Use smallest exact trip count from any exit This is a more general alternative/extension to D102635. Rather than handling the special case of "header exit with non-exiting latch", this unrolls against the smallest exact trip count from any exit. The latch exit is no longer treated as privileged when it comes to full unrolling. The motivating case is in full-unroll-one-unpredictable-exit.ll. Here the header exit is an IV-based exit, while the latch exit is a data comparison. This kind of loop does not get rotated, because the latch is already exiting, and loop rotation doesn't try to distinguish IV-based/analyzable latches. Differential Revision: https://reviews.llvm.org/D102982	2021-06-20 20:58:26 +02:00
David Green	a24b02193a	[DSE] Remove stores in the same loop iteration DSE will currently only remove stores in the same block unless they can be guaranteed to be loop invariant. This expands that to any stores that are in the same Loop, at the same loop level. This should still account for where AA/MSSA will not handle aliasing between loops, but allow the dead stores to be removed where they overlap in the same loop iteration. It requires adding loop info to DSE, but that looks fairly harmless. The test case this helps is from code like this, which can come up in certain matrix operations: for(i=..) dst[i] = 0; for(j=..) dst[i] += src[in+j]; After LICM, this becomes: for(i=..) dst[i] = 0; sum = 0; for(j=..) sum += src[in+j]; dst[i] = sum; The first store is dead, and with this patch is now removed. Differential Revision: https://reviews.llvm.org/D100464	2021-06-20 17:03:30 +01:00
Sanjay Patel	4c44b02d87	[InstCombine] fold ctpop-of-select with 1 or more constant arms The general pattern is mentioned in: https://llvm.org/PR50140 ...but we need to do a bit more to handle intrinsics with extra operands like ctlz/cttz.	2021-06-20 11:28:45 -04:00
Sanjay Patel	240acb0cff	[InstCombine] avoid infinite loops with select folds of constant expressions This pair of transforms was added recently with: `8591640379` And could lead to conflicting folds: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35399	2021-06-20 09:46:25 -04:00
Roman Lebedev	c5b7335dc8	[SimplifyCFG] FoldTwoEntryPHINode(): don't fold if either block has its address taken Same as with HoistThenElseCodeToIf() (`ad87761925`).	2021-06-20 12:37:14 +03:00
Roman Lebedev	ad87761925	[SimplifyCFG] HoistThenElseCodeToIf(): don't hoist if either block has its address taken This problem is exposed by D104598: after it tail-merges `ret` in `@test_inline_constraint_S_label`, the verifier would start complaining `invalid operand for inline asm constraint 'S'`. Essentially, taking the address of a block is mismodelled in IR. It should probably be an explicit instruction, the first one in the block, that isn't identical to any other instruction of the same type, so that it can't be hoisted.	2021-06-20 12:18:15 +03:00
Nikita Popov	1bd4085e0b	[LoopUnroll] Push runtime unrolling decision up into tryToUnrollLoop() Currently, UnrollLoop() is passed an AllowRuntime flag and decides itself whether runtime unrolling should be used or not. This patch pushes the decision into the caller and allows us to eliminate the ULO.TripCount and ULO.TripMultiple parameters. Differential Revision: https://reviews.llvm.org/D104487	2021-06-19 09:25:57 +02:00
Liqiang Tao	671a87104b	[llvm][Inliner] Add an optional PriorityInlineOrder This patch adds an optional PriorityInlineOrder, which uses a heap to order inlining. A call site whose size is smaller has a higher priority. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D104028	2021-06-19 10:17:32 +08:00
Guozhi Wei	575ba6f425	[InstCombine] Don't transform code if DoTransform is false In patch https://reviews.llvm.org/D72396, it doesn't check DoTransform before transforming the code, and generates wrong result for the attached test case. Differential Revision: https://reviews.llvm.org/D104567	2021-06-18 18:01:34 -07:00
Fangrui Song	3307240f05	[InstrProfiling][ELF] Make __profd_ private if the function does not use value profiling On ELF, the D1003372 optimization can apply to more cases. There are two prerequisites for making `__profd_` private: * `__profc_` keeps `__profd_` live under compiler/linker GC * `__profd_` is not referenced by code The first is satisfied because all counters/data are in a section group (either `comdat any` or `comdat noduplicates`). The second requires that the function does not use value profiling. Regarding the second point: `__profd_` may be referenced by other text sections due to inlining. There will be a linker error if a prevailing text section references the non-prevailing local symbol. With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`) clang is 4.2% smaller (1-169620032/177066968). `stat -c %s */.o | awk '{s+=$1}END{print s}'` is 2.5% smaller. Reviewed By: davidxl, rnk Differential Revision: https://reviews.llvm.org/D103717	2021-06-18 17:01:17 -07:00
Hongtao Yu	bd52495518	[CSSPGO] Undoing the concept of dangling pseudo probe As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the dangling probe related code in both the compiler and llvm-profgen. I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Cinder. Certain benchmarks such as 602.gcc have a 20% size win. No obvious difference seen on build time for SPEC2017 and Cinder. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D104477	2021-06-18 15:14:11 -07:00
Nikita Popov	3308205ae9	[LoopUnroll] Simplify optimization remarks Remove dependence on ULO.TripCount/ULO.TripMultiple from ORE and debug code. For debug code, print information about all exits. For optimization remarks, only include the unroll count and the type of unroll (complete, partial or runtime), but omit detailed information about exit folding, now that more than one exit may be folded. Differential Revision: https://reviews.llvm.org/D104482	2021-06-18 23:47:03 +02:00
Nick Desaulniers	bef2992861	[GCOVProfiling] don't profile Fn's w/ noprofile attribute Similar to D104475, the Linux kernel would like to avoid compiler generated code in certain functions. The no_profile function attribute can be used in C to generate the noprofile fn attr in IR. Respect that from GCOVProfiling. Link: https://lore.kernel.org/lkml/CAKwvOdmPTi93n2L0_yQkrzLdmpxzrOR7zggSzonyaw2PGshApw@mail.gmail.com/ Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D104257	2021-06-18 13:58:34 -07:00
Andrew Browne	14407332de	[DFSan] Cleanup code for platforms other than Linux x86_64. These other platforms are unsupported and untested. They could be re-added later based on MSan code. Reviewed By: gbalats, stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D104481	2021-06-18 11:21:46 -07:00
Liqiang Tao	93183a41b9	Revert D104028 "[llvm][Inliner] Add an optional PriorityInlineOrder"	2021-06-18 18:52:00 +08:00
Max Kazantsev	de92287cf8	[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration (try 3) This patch handles one particular case of one-iteration loops for which SCEV cannot straightforwardly prove BECount = 1. The idea of the optimization is to symbolically execute conditional branches on the 1st iteration, moving in topological order, and only visiting blocks that may be reached on the first iteration. If we find out that we never reach header via the latch, then the backedge can be broken. This implementation uses InstSimplify. SCEV version was rejected due to high compile time impact. Differential Revision: https://reviews.llvm.org/D102615 Reviewed By: nikic	2021-06-18 17:31:57 +07:00
Haojian Wu	3f5d53a525	[Attributor] Fix UB behavior on uninitialized bool variables. Found by ASAN.	2021-06-18 11:49:42 +02:00
Daniil Seredkin	6643e51d79	[InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X) InstCombine didn't perform (sext bool X) * (sext bool X) --> zext (and X, X) which can result in just (zext X). The patch adds regression tests to check this transformation and adds a check for equality of mul's operands for that case. Differential Revision: https://reviews.llvm.org/D104193	2021-06-18 16:28:06 +07:00
Liqiang Tao	a740b707d1	[llvm][Inliner] Add an optional PriorityInlineOrder This patch adds an optional PriorityInlineOrder, which uses a heap to order inlining. A call site whose size is smaller has a higher priority. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D104028	2021-06-18 16:55:38 +08:00
Max Kazantsev	fa5eb22ad4	[NFC] Assert non-zero factor before division This is to ensure that zero denominator leads to controlled assertion failure rather than UB.	2021-06-18 15:50:50 +07:00
Haojian Wu	7670938bba	[Attributor] Don't print the call-graph in a hard-coded file. This does not look like a practical pattern in our codebase (it could fail in some sandbox environments). Instead we print it via standard output, controlled by -attributor-print-call-graph, following a pattern similar to attributor-print-dep.	2021-06-18 09:38:07 +02:00
Daniil Seredkin	6de741de08	Revert "[InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X)" This reverts commit `31053338c9`.	2021-06-18 14:21:02 +07:00
Daniil Seredkin	31053338c9	[InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X) InstCombine didn't perform (sext bool X) * (sext bool X) --> zext (and X, X) which can result in just (zext X). The patch adds regression tests to check this transformation and adds a check for equality of mul's operands for that case. Differential Revision: https://reviews.llvm.org/D104193	2021-06-18 14:12:00 +07:00
Fangrui Song	5798be8458	Revert D103717 "[InstrProfiling] Make __profd_ unconditionally private for ELF" This reverts commit `76d0747e08`. If a group has `__llvm_prf_vals` due to static value profiler counters (`NS!=0`), we cannot make `__llvm_prf_data` private, because a prevailing text section may reference `__llvm_prf_data` and will cause a `relocation refers to a discarded section` linker error. Note: while a `__profc_` group is non-prevailing, it may be referenced by a prevailing text section due to inlining. ``` group section [ 66] `.group' [__profc__ZN5clang20EmitClangDeclContextERN4llvm12RecordKeeperERNS0_11raw_ostreamE] contains 4 sections: [Index] Name [ 67] __llvm_prf_cnts [ 68] __llvm_prf_vals [ 69] __llvm_prf_data [ 70] .rela__llvm_prf_data ```	2021-06-17 23:38:17 -07:00
Johannes Doerfert	30c9d68ad9	[Attributor][FIX] Arguments of unknown functions can be undef This should fix PR50683. The wrong assumption was that we could always know what the callee is when we replace a call site argument with undef. We wanted to know that to remove the `noundef` that might be attached to the argument. Since no callee means we did the propagation on the caller site, there is no need to remove an attribute. It is only needed if we replace all uses and therefore pass `undef` instead of the value that was passed in otherwise.	2021-06-18 01:07:53 -05:00
Johannes Doerfert	666dc6f126	[Attributor] Use a centralized value simplification interface To allow outside AAs that simplify values we need to ensure all value simplification goes through the Attributor, not AAValueSimplify (or any of the other AAs we have already like AAPotentialValues). This patch also introduces an interface for the outside AAs to register simplification callbacks for an IRPosition. To make this work as expected we have to pass IRPositions instead of Values in AAValueSimplify, which makes sense by itself.	2021-06-18 01:07:53 -05:00
Johannes Doerfert	d9194b6efb	[Attributor] Introduce a helper to deal with constant type mismatches If we simplify values we sometimes end up with type mismatches. If the value is a constant we can often cast it though to still allow propagation. The logic is now put into a helper and it replaces some ad hoc things we did before. This also introduces the AA namespace for abstract attribute related functions and types. Differential Revision: https://reviews.llvm.org/D103856	2021-06-18 01:07:52 -05:00
Johannes Doerfert	9959eee001	[Attributor] Make sure Heap2Stack works properly on a GPU target If the target stack is not accessible between different running "threads" we have to make sure not to create allocas for mallocs that might be used by multiple "threads". The "use check" is sufficient to prevent this but if we apply the "free check" we have to make sure the pointer is not communicated to others before the free is reached. Differential Revision: https://reviews.llvm.org/D98608	2021-06-18 01:07:52 -05:00
Johannes Doerfert	9a23e673ca	[OpenMP][NFC] Expose AAExecutionDomain and rename its getter The initial use for AAExecutionDomain was to determine if a single thread executes a block. While this is sometimes informative, most of the time, and for other reasons, we actually want to know if it is the "initial thread", that is, the thread that started execution on the current device. The deduction needs to be adjusted in a follow up as the methods we use right now are looking for the OpenMP thread id, which resets whenever a thread enters a parallel region. What we basically want is to look for `llvm.nvvm.read.ptx.sreg.ntid.x` and equivalent functions.	2021-06-18 01:07:52 -05:00
Johannes Doerfert	8d7bace3b5	[Attributor][NFC] AAReachability is currently stateless, don't invalidate it We invalidated AAReachabilityImpl directly which is not helpful and confusing as we still used it regardless. We now avoid invalidating it (not needed anyway) and add checks for the state. This has by itself no actual effect but prepares for later extensions.	2021-06-18 01:07:51 -05:00
George Balatsouras	c6b5a25eeb	[dfsan] Replace dfs$ prefix with .dfsan suffix The current naming scheme adds the `dfs$` prefix to all DFSan-instrumented functions. This breaks mangling and prevents stack trace printers and other tools from automatically demangling function names. This new naming scheme is mangling-compatible, with the `.dfsan` suffix being a vendor-specific suffix: https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling-structure With this fix, demangling utils would work out-of-the-box. Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D104494	2021-06-17 22:42:47 -07:00
Xun Li	3522167efd	[Coroutine] Properly deal with byval and noalias parameters This patch is to address https://bugs.llvm.org/show_bug.cgi?id=48857. Previous attempts can be found in D104007 and D101980. A lot of discussions can be found in those two patches. To summarize the bug: When Clang emits IR for coroutines, the first thing it does is to make a copy of every argument to the local stack, so that uses of the arguments in the function will all refer to the local copies instead of the arguments directly. However, in some cases we find that arguments are still directly used: When Clang emits IR for a function that has pass-by-value arguments, sometimes it emits an argument with byval attribute. A byval attribute is considered to be local to the function (just like alloca) and hence it can be easily determined that it does not alias other values. If in the IR there exists a memcpy from a byval argument to a local alloca, and then from that local alloca to another alloca, MemCpyOpt will optimize out the first memcpy because byval argument's content will not change. This causes issues because after a coroutine suspension, the byval argument may die outside of the function, and later uses will lead to memory use-after-free. This is only a problem for arguments with either byval attribute or noalias attribute, because only these two kinds are considered local. Arguments without these two attributes will be considered to alias coro_suspend and hence we won't have this problem. So we need to be able to deal with these two attributes in coroutines properly. For noalias arguments, since coro_suspend may potentially change the value of any argument outside of the function, we simply shouldn't mark any argument in a coroutine as noalias. This can be taken care of in CoroEarly pass. For byval arguments, if such an argument needs to live across suspensions, we will have to copy their value content to the frame, not just the pointer. Differential Revision: https://reviews.llvm.org/D104184	2021-06-17 19:06:10 -07:00
Roman Lebedev	84eeb82888	[NFC][SimpleLoopUnswitch] unswitchTrivialBranch(): add debug output explaining unswitching failure It's not prohibitively verbose, and allows easier understanding why certain unswitching ultimately wasn't performed.	2021-06-18 00:46:04 +03:00
Kuter Dinel	eaf1b6810c	[Attributor] Derive AACallEdges attribute This attribute computes the optimistic live call edges using the attributor liveness information. This attribute will be used for deriving an inter-procedural function reachability attribute. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104059	2021-06-18 03:29:22 +03:00
Andrew Browne	39295e92f7	Revert "[DFSan] Cleanup code for platforms other than Linux x86_64." This reverts commit `8441b993bd`. Buildbot failures.	2021-06-17 14:19:18 -07:00
Fangrui Song	76d0747e08	[InstrProfiling] Make __profd_ unconditionally private for ELF For ELF, since all counters/data are in a section group (either `comdat any` or `comdat noduplicates`), and the signature for `comdat any` is `__profc_`, the D1003372 optimization prerequisite (linker GC cannot discard data variables while the text section is retained) is always satisfied, so we can make __profd_ unconditionally private. Reviewed By: davidxl, rnk Differential Revision: https://reviews.llvm.org/D103717	2021-06-17 14:16:54 -07:00
Craig Topper	99e95856fb	[PartiallyInlineLibCalls] Disable sqrt expansion for strictfp. This pass emits a floating point compare and a conditional branch, but if strictfp is enabled we don't emit a constrained compare intrinsic. The backend also won't expand the readonly sqrt call this pass inserts to a sqrt instruction under strictfp. So we end up with 2 libcalls as seen here. https://godbolt.org/z/oax5zMEWd Fix these things by disabling the pass. Differential Revision: https://reviews.llvm.org/D104479	2021-06-17 14:15:12 -07:00
Andrew Browne	8441b993bd	[DFSan] Cleanup code for platforms other than Linux x86_64. These other platforms are unsupported and untested. They could be re-added later based on MSan code. Reviewed By: gbalats, stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D104481	2021-06-17 14:08:40 -07:00
Nikita Popov	f7c54c4603	[LoopUnroll] Fold all exits based on known trip count/multiple Fold all exits based on known trip count/multiple information from SCEV. Previously only the latch exit or the single exit were folded. This doesn't yet eliminate ULO.TripCount and ULO.TripMultiple entirely: They're still used to a) decide whether runtime unrolling should be performed and b) for ORE remarks. However, the core unrolling logic is independent of them now. Differential Revision: https://reviews.llvm.org/D104203	2021-06-17 20:58:34 +02:00
Roman Lebedev	37dfc467ac	[NFC] LoopVectorizationCostModel::getMaximizedVFForTarget(): clarify debug msg This really isn't talking about vectors in general, but only about either fixed or scalable vectors, and it's pretty confusing to see it state that there aren't any vectors :)	2021-06-17 21:07:34 +03:00
hyeongyukim	69b0ed9a0a	[InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210) As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210 ...the bug is triggered, as Eli says, when sext(idx) * ElementSize overflows. ``` // assume that GV is an array of 4-byte elements GEP = gep GV, 0, Idx // this is accessing Idx * 4 L = load GEP ICI = icmp eq L, value => ICI = icmp eq Idx, NewIdx ``` The foldCmpLoadFromIndexedGlobal function simplifies GEP+load operation to icmp. And there is a problem because Idx * ElementSize can overflow. Let's assume that the wanted value is at offset 0. Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00. We should return true for all these values, but currently, the new icmp only returns true for 0x00..00. This problem can be solved by masking off (trailing zeros of ElementSize) bits from Idx. ``` ... => Idx' = and Idx, 0x3F..FF ICI = icmp eq Idx', NewIdx ``` Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D99481	2021-06-17 19:46:17 +09:00
Sjoerd Meijer	dcd23d875a	[FuncSpec] Don't specialise functions with attribute NoDuplicate. Differential Revision: https://reviews.llvm.org/D104378	2021-06-17 10:32:29 +01:00
Florian Hahn	80a403348b	[VPlan] Support PHIs as LastInst when inserting scalars in ::get(). At the moment, we create insertelement instructions directly after LastInst when inserting scalar values in a vector in VPTransformState::get. This results in invalid IR when LastInst is a phi, followed by another phi. In that case, the new instructions should be inserted just after the last PHI node in the block. At the moment, I don't think the problematic case can be triggered, but it can happen once predicate regions are merged and multiple VPredInstPHI recipes are in the same block (D100260). Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104188	2021-06-17 09:36:44 +01:00
Bjorn Pettersson	4c7f820b2b	Update @llvm.powi to handle different int sizes for the exponent This can be seen as a follow up to commit `0ee439b705`, that changed the second argument of __powidf2, __powisf2 and __powitf2 in compiler-rt from si_int to int. That was to align with how those runtimes are defined in libgcc. One thing that seems to have been missing in that patch was to make sure that the rest of LLVM also handles that the argument now depends on the size of int (not using the si_int machine mode for 32-bit). When using __builtin_powi for a target with 16-bit int clang crashed. And when emitting libcalls to those rtlib functions (typically when lowering @llvm.powi), the backend would always prepare the exponent argument as an i32 which caused miscompiles when the rtlib was compiled with 16-bit int. The solution used here is to use an overloaded type for the second argument in @llvm.powi. This way clang can use the "correct" type when lowering __builtin_powi, and then later when emitting the libcall it is assumed that the type used in @llvm.powi matches the rtlib function. One thing that needed some extra attention was that when vectorizing calls several passes did not support that several arguments could be overloaded in the intrinsics. This patch allows overload of a scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with an entry for powi. Differential Revision: https://reviews.llvm.org/D99439	2021-06-17 09:38:28 +02:00
Sjoerd Meijer	49ab3b1735	[FuncSpec] Statistics Adds some bookkeeping for collecting the number of specialised functions and a test for that. Differential Revision: https://reviews.llvm.org/D104102	2021-06-16 09:11:51 +01:00
Evgeniy Brevnov	96cded5b79	[SLP] Incorrect handling of external scalar values Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D103954	2021-06-16 13:27:36 +07:00
Andrew Browne	e652d99169	[DFSan][NFC] Fix shadowing variable name.	2021-06-15 22:58:22 -07:00
Rong Xu	82a0bb1afc	[SampleFDO] Place the discriminator flag variable into the used list. We create flag variable "__llvm_fs_discriminator__" in the binary to indicate that FSAFDO hierarchical discriminators are used. This variable might be GC'ed by the linker since it is not explicitly referenced. I initially added the var to the used list in pass MIRFSDiscriminator but it did not work. It turned out the used global list is collected in lowering (before MIR pass) and then emitted at the end of the pass pipeline. Here I add the variable to the used list in the IR-level AddDiscriminators pass. The machine-level code is still kept in case IR's AddDiscriminators is not invoked. If this is the case, just use -Wl,--export-dynamic-symbol=__llvm_fs_discriminator__ to force the emit. Differential Revision: https://reviews.llvm.org/D103988	2021-06-15 21:51:04 -07:00
Chuanqi Xu	86906304d8	[FuncSpec] Use std::pow instead of operator^ The original implementation calculating UserBonus uses operator ^, which means XOR in C++ language. At the first glance of reviewing, I thought it should be power, my bad. It doesn't make sense to use XOR here. So I believe it should be a carelessness as I made. Test Plan: check-all Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D104282	2021-06-16 10:13:21 +08:00
Andrew Browne	af93157625	[DFSan] Handle landingpad inst explicitly as zero shadow. Before this change, DFSan was relying on fallback cases when getting origin address. Differential Revision: https://reviews.llvm.org/D104266	2021-06-15 18:28:20 -07:00
Vitaly Buka	6478ef61b1	[asan] Remove Asan, Ubsan support of RTEMS and Myriad Differential Revision: https://reviews.llvm.org/D104279	2021-06-15 12:59:05 -07:00
Roman Lebedev	e52364532a	[NewPM] Remove SpeculateAroundPHIs pass Addition of this pass has been botched. There is no particular reason why it had to be sold as an inseparable part of new-pm transition. It was added when old-pm was still the default, and very very few users were actually tracking new-pm, so its effects weren't measured. Which means, some of the turmoil of the new-pm transition are actually likely regressions due to this pass. Likewise, there has been a number of post-commit feedback (post new-pm switch), namely * https://reviews.llvm.org/D37467#2787157 (regresses HW-loops) * https://reviews.llvm.org/D37467#2787259 (should not be in middle-end, should run after LSR, not before) * https://reviews.llvm.org/D95789 (an attempt to fix bad loop backedge metadata) and in the half year past, the pass authors (google) still haven't found time to respond to any of that. Hereby it is proposed to backout the pass from the pipeline, until someone who cares about it can address the issues reported, and properly start the process of adding a new pass into the pipeline, with proper performance evaluation. Furthermore, neither google nor facebook reports any perf changes from this change, so i'm dropping the pass completely. It can always be re-reverted should/if anyone want to pick it up again. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D104099	2021-06-15 20:35:55 +03:00
Andrew Litteken	2c21278e74	[IROutliner] Adding DebugInfo handling for IR Outlined Functions This adds support for functions outlined by the IR Outliner to be recognized by the debugger. The expected behavior is that it will skip over the instructions included in that section. This is due to the fact that we can not say which of the original locations the instructions originated from. These functions will show up in the call stack, but you cannot step through them. Reviewers: paquette, vsk, djtodoro Differential Revision: https://reviews.llvm.org/D87302	2021-06-15 10:57:08 -05:00
Florian Hahn	f7fc8927c0	[LoopDeletion] Check for irreducible cycles when deleting loops. Loops with irreducible cycles may loop infinitely. Those cannot be removed, unless the loop/function is marked as mustprogress. Also discussed in D103382. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D104238	2021-06-15 12:56:12 +01:00
Neil Henning	1540da3b78	ABI breaking changes fixes. This commit mostly just replaces bad uses of `NDEBUG` with uses of `LLVM_ENABLE_ABI_BREAKING_CHANGES` - the safe way to include ABI breaking changes (normally extra struct elements in headers). Differential Revision: https://reviews.llvm.org/D104216	2021-06-15 11:08:13 +01:00
Vitaly Buka	b8919fb0ea	[NFC][sanitizer] clang-format some code	2021-06-14 18:05:22 -07:00
Huihui Zhang	1c096bf09f	[SVE][LSR] Teach LSR to enable simple scaled-index addressing mode generation for SVE. Currently, loop strength reduce is not handling loops with scalable stride very well. Take a loop vectorized with scalable vector type <vscale x 8 x i16> for instance, (refer to test/CodeGen/AArch64/sve-lsr-scaled-index-addressing-mode.ll added). Memory accesses are incremented by "16*vscale", while induction variable is incremented by "8*vscale". The scaling factor "2" needs to be extracted to build candidate formula i.e., "reg(%in) + 2*reg({0,+,(8 * %vscale)}". So that addrec register reg({0,+,(8*vscale)}) can be reused among Address and ICmpZero LSRUses to enable optimal solution selection. This patch allows LSR getExactSDiv to recognize special cases like "C1*X*Y /s C2*X*Y", and pull out "C1 /s C2" as scaling factor whenever possible. Without this change, LSR is missing candidate formula with proper scaled factor to leverage target scaled-index addressing mode. Note: This patch doesn't fully fix AArch64 isLegalAddressingMode for scalable vector, but allows simple valid scale to pass through. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D103939	2021-06-14 16:42:34 -07:00
Matt Morehouse	b87894a1d2	[HWASan] Enable globals support for LAM. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D104265	2021-06-14 14:20:44 -07:00
Sanjay Patel	8591640379	[InstCombine] add DeMorgan folds for logical ops in select form We canonicalized to these select patterns (poison-safe logic) with D101191, so we need to reduce 'not' ops when possible as we would with 'and'/'or' instructions. This is shown in a secondary example in: https://llvm.org/PR50389 https://alive2.llvm.org/ce/z/BvsESh	2021-06-14 12:54:35 -04:00
Florian Hahn	96ca03493a	[VectorCombine] Limit scalarization to non-poison indices for now. As Eli mentioned post-commit in D103378, the result of the freeze may still be out-of-range according to Alive2. So for now, just limit the transform to indices that are non-poison.	2021-06-14 16:40:14 +01:00
Jeroen Dobbelaere	bb8ce25e88	Intrinsic::getName: require a Module argument Ensure that we provide a `Module` when checking if a rename of an intrinsic is necessary. This fixes the issue that was detected by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32288 (as mentioned by @fhahn), after committing D91250. Note that the `LLVMIntrinsicCopyOverloadedName` is being deprecated in favor of `LLVMIntrinsicCopyOverloadedName2`. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99173	2021-06-14 14:52:29 +02:00
Aditya Kumar	dcbbc69cc5	Calculate getTerminator only when necessary Differential Revision: https://reviews.llvm.org/D104202	2021-06-13 20:16:07 -07:00
Simon Pilgrim	9efe89d82f	BoundsChecking.cpp - tidy implicit header dependencies. NFCI. We don't use <vector> but we do use std::pair (<utility>)	2021-06-13 17:08:15 +01:00
Simon Pilgrim	56541d1377	GVN.cpp - remove unused <vector> include. NFCI.	2021-06-13 14:06:32 +01:00
Simon Pilgrim	c14fd171fe	LoopUnrollAndJamPass.cpp - remove unused <vector> include. NFCI.	2021-06-13 14:06:32 +01:00
Sanjay Patel	afd44bb6f2	[InstCombine] fold ctlz/cttz of bool types https://alive2.llvm.org/ce/z/tX4pUT	2021-06-13 08:26:40 -04:00
Simon Pilgrim	2477b498f2	ArgumentPromotion.cpp - remove unused <string> include. NFCI.	2021-06-13 13:03:47 +01:00
Simon Pilgrim	b013c58e82	VPlanSLP.cpp - tidy implicit header dependencies. NFCI. We don't use std::string and std::vector, but we do use std::pair and std::max.	2021-06-13 12:37:17 +01:00
Xun Li	fae7debadc	[CHR] Don't run ControlHeightReduction if any BB has address taken This patch is to address https://bugs.llvm.org/show_bug.cgi?id=50610. In computed goto pattern, there is usually a list of basic blocks that are all targets of an indirectbr instruction, and each basic block also has its address taken and stored in a variable. CHR pass could potentially clone these basic blocks, which would generate a cloned version of the indirectbr and cloned versions of all basic blocks in the list. However these basic blocks will not have their addresses taken and stored anywhere. So a later SimplifyCFG pass will simply remove all these cloned basic blocks, resulting in incorrect code. To fix this, when searching for scopes, we skip scopes that contain BBs with addresses taken. Added a few test cases. Reviewed By: aeubanks, wenlei, hoy Differential Revision: https://reviews.llvm.org/D103867	2021-06-12 10:29:53 -07:00
Kevin Athey	1d22596b2f	[sanitizer] Remove numeric values from -asan-use-after-return flag. (NFC) for issue: https://github.com/google/sanitizers/issues/1394 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D104152	2021-06-11 15:14:51 -07:00
Kevin Athey	e0b469ffa1	[clang-cl][sanitizer] Add -fsanitize-address-use-after-return to clang. Also: - add driver test (fsanitize-use-after-return.c) - add basic IR test (asan-use-after-return.cpp) - (NFC) cleaned up logic for generating table of __asan_stack_malloc depending on flag. for issue: https://github.com/google/sanitizers/issues/1394 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D104076	2021-06-11 12:07:35 -07:00
Valery N Dmitriev	94a07c79cf	[SLP][NFC] Fix condition that was supposed to save a bit of compile time. It was found by chance revealing discrepancy between comment (few lines above), the condition and how re-ordering of instruction is done inside the if statement it guards. The condition was always evaluated to true. Differential Revision: https://reviews.llvm.org/D104064	2021-06-11 10:08:55 -07:00
Adam Nemet	e0efebb8eb	[Matrix] In transpose opts, handle a^t * a^t Without the fix the testcase crashes because we remove the same instruction twice. Differential Revision: https://reviews.llvm.org/D104127	2021-06-11 09:29:43 -07:00
Alexey Bataev	a010d4230e	[SLP]Allow reordering of insertelements. After we added support for non-ordered insertelements, we can allow their reordering. Differential Revision: https://reviews.llvm.org/D104057	2021-06-11 08:47:41 -07:00
Matt Morehouse	0867edfc64	[HWASan] Add basic stack tagging support for LAM. Adds the basic instrumentation needed for stack tagging. Currently does not support stack short granules or TLS stack histories, since a different code path is followed for the callback instrumentation we use. We may simply wait to support these two features until we switch to a custom calling convention. Patch By: xiangzhangllvm, morehouse Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D102901	2021-06-11 08:21:17 -07:00
Alexey Bataev	74af4bb1f4	[SLP]Remove unnecessary UndefValue in CreateShuffle. No need to use UndefValue in CreateShuffle call. Differential Revision: https://reviews.llvm.org/D104113	2021-06-11 08:08:30 -07:00
Sjoerd Meijer	9907746f5d	Move Function Specialization to its correct location. NFC. As a follow up of rGc4a0969b9c14, and as part of D104102, move it to the IPO transformations directory.	2021-06-11 15:00:10 +01:00
Sanjay Patel	602ab24833	[SimplifyCFG] avoid crash on degenerate loop The problematic code pattern in the test is based on: https://llvm.org/PR50638 If the IfCond is itself the phi that we are trying to remove, then the loop around line 2835 can end up with something like: %cmp = select i1 %cmp, i1 false, i1 true That can then lead to a use-after-free and assert (although I'm still not seeing that locally in my release + asserts build). I think this can only happen with unreachable code. Differential Revision: https://reviews.llvm.org/D104063	2021-06-11 09:37:06 -04:00
Simon Pilgrim	61cdaf66fe	[ADT] Remove APInt/APSInt toString() std::string variants <string> is currently the highest impact header in a clang+llvm build: https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps. This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code. Differential Revision: https://reviews.llvm.org/D103888	2021-06-11 13:19:15 +01:00
Roman Lebedev	20542b47d6	[VectorCombine] scalarizeLoadExtract(): use computeAlignmentAfterScalarization() helper This results in slightly more optimistic alignments in some cases	2021-06-11 12:47:10 +03:00
Roman Lebedev	abc0e0125c	[NFC][VectorCombine] Extract computeAlignmentAfterScalarization() helper function	2021-06-11 12:47:09 +03:00
Simon Pilgrim	5e6bfb661e	[Analysis] Pass RecurrenceDescriptor as const reference. NFCI. We were passing the RecurrenceDescriptor by value to most of the reduction analysis methods, despite it being rather bulky with TrackingVH members (that can be costly to copy). In all these cases we're only using the RecurrenceDescriptor for rather basic purposes (access to types/kinds etc.). Differential Revision: https://reviews.llvm.org/D104029	2021-06-11 10:24:14 +01:00
Sjoerd Meijer	c4a0969b9c	Function Specialization Pass This adds a function specialization pass to LLVM. Constant parameters like function pointers and constant globals are propagated to the callee by specializing the function. This is a first version with a number of limitations: - The pass is off by default, so needs to be enabled on the command line, - It does not handle specialization of recursive functions, - It does not yet handle constants and constant ranges, - Only 1 argument per function is specialised, - The cost-model could be further looked into, and perhaps related, - We are not yet caching analysis results. This is based on earlier work by Matthew Simpson (D36432) and Vinay Madhusudan. More recently this was also discussed on the list, see: https://lists.llvm.org/pipermail/llvm-dev/2021-March/149380.html. The motivation for this work is that function specialisation often comes up as a reason for performance differences of generated code between LLVM and GCC, which has this enabled by default from optimisation level -O3 and up. And while this certainly helps a few cpu benchmark cases, this also triggers in real world codes and is thus a generally useful transformation to have in LLVM. Function specialisation has great potential to increase compile-times and code-size. The summary from some investigations with this patch is: - Compile-time increases for short compile jobs is high relatively, but the increase in absolute numbers still low. - For longer compile-jobs, the extra compile time is around 1%, and very much in line with GCC. - It is difficult to blame one thing for compile-time increases: it looks like everywhere a little bit more time is spent processing more functions and instructions. - But the function specialisation pass itself is not very expensive; it doesn't show up very high in the profile of the optimisation passes. 
The goal of this work is to reach parity with GCC which means that eventually we would like to get this enabled by default. But first we would like to address some of the limitations before that. Differential Revision: https://reviews.llvm.org/D93838	2021-06-11 09:11:29 +01:00
Qiu Chaofan	2670c7dd5b	[VectorCombine] Fix alignment in single element store This fixes the concern in single element store scalarization that the alignment of new store may be larger than it should be. It calculates the largest alignment if index is constant, and a safe one if not. Reviewed By: lebedev.ri, spatel Differential Revision: https://reviews.llvm.org/D103419	2021-06-11 10:28:15 +08:00
Slava Nikolaev	119965865c	LoadStoreVectorizer: support different operand orders in the add sequence match First we refactor the code which does no wrapping add sequences match: we need to allow different operand orders for the key add instructions involved in the match. Then we use the refactored code trying 4 variants of matching operands. Originally the code relied on the fact that the matching operands of the two last add instructions of memory index calculations had the same LHS argument. But which operand is the same in the two instructions is actually not essential, so now we allow that to be any of LHS or RHS of each of the two instructions. This increases the chances of vectorization to happen. Reviewed By: volkan Differential Revision: https://reviews.llvm.org/D103912	2021-06-10 16:31:35 -07:00
Andy Kaylor	41555eaf65	Preserve more MD_mem_parallel_loop_access and MD_access_group in SROA SROA sometimes preserves MD_mem_parallel_loop_access and MD_access_group metadata on loads/stores, and sometimes fails to do so. This change adds copying of the MD after other CreateAlignedLoad/CreateAlignedStores. Also fix a case where the metadata was being copied from a load, rather than the store. Added a LIT test to catch one case. Patch by Mark Mendell Differential Revision: https://reviews.llvm.org/D103254	2021-06-10 15:47:03 -07:00
Joachim Meyer	4f01122c3f	[LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annotated parallel (`!llvm.loop.parallel_accesses`), is not expectable, the documentation for this behavior was since removed from the LangRef again, and can lead to invalid reads. This was observed in POCL (https://github.com/pocl/pocl/issues/757) and would require similar workarounds in current work at hipSYCL. The question remains why this was initially added and what the implications of removing this optimization would be. Do we need an alternative mechanism to propagate the information about legality of if-conversion? Or is the idea that conditional loads in `#pragma clang loop vectorize(assume_safety)` can be executed unmasked without additional checks flawed in general? I think this implication is not part of what a user of that pragma (and corresponding metadata) would expect and thus dangerous. Only two additional tests failed, which are adapted in this patch. Depending on the further direction force-ifcvt.ll should be removed or further adapted. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D103907	2021-06-10 23:37:57 +02:00
Philip Reames	7629b2a09c	[LI] Add a cover function for checking if a loop is mustprogress [nfc] Essentially, the cover function simply combines the loop level check and the function level scope into one call. This simplifies several callers and is (subjectively) less error prone.	2021-06-10 13:37:32 -07:00
Philip Reames	b6ee5f2b1d	Move code for checking loop metadata into Analysis [nfc] I need the mustprogress loop metadata in ScalarEvolution and it makes sense to keep all the accessors for querying loop metadata together.	2021-06-10 13:01:22 -07:00
Alexey Bataev	a893b44187	[SLP]Disable scheduling of insertelements. There is no need to schedule insertelement instructions. The compiler did not schedule them before it started to support their vectorization, and it should not do so after. We pre-schedule them manually when finding a build vector sequence. Disabling scheduling of insertelement instructions improves compile time and vectorization of the very large basic blocks by saving scheduling budget for other instructions. Differential Revision: https://reviews.llvm.org/D104026	2021-06-10 10:25:26 -07:00
Keith Smiley	026170d17d	Fix range-loop-analysis warning ``` llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8024:19: warning: loop variable 'VF' of type 'const llvm::ElementCount' creates a copy from type 'const llvm::ElementCount' [-Wrange-loop-analysis] for (const auto VF : VFCandidates) { ^ llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8024:8: note: use reference type 'const llvm::ElementCount &' to prevent copying for (const auto VF : VFCandidates) { ^~~~~~~~~~~~~~~ & 1 warning generated. ``` Differential Revision: https://reviews.llvm.org/D103970	2021-06-10 08:39:54 -07:00
Caroline Concatto	1ad52105eb	[InstCombine] Add fold for extracting known elements from a stepvector This patch allows folding stepvector + extract to the lane when the lane is lower than the minimum size of the scalable vector. This fold is possible because lane X of a stepvector is also X! For instance, extracting element 3 of a <vscale x 4 x i64>stepvector is 3. Differential Revision: https://reviews.llvm.org/D103153	2021-06-10 13:36:57 +01:00
Simon Pilgrim	b01d393fc0	Fix MSVC int64_t -> uint64_t "narrowing conversion" warning.	2021-06-10 10:55:24 +01:00
Jon Roelofs	f8f1c9c389	Annotate memcpy's of globals with info about the src/dst Differential revision: https://reviews.llvm.org/D103994	2021-06-09 18:11:08 -07:00
Joseph Huber	4c9471581f	[Attributor] Set floating point loads and stores as nofree in AANoFreeFloating Summary: The current implementation of AANoFreeFloating will incorrectly list floating point loads and stores as may-free. This prevents other attributor instances like HeapToStack from pushing some allocations to the stack. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D103975	2021-06-09 16:16:37 -04:00
Leonard Chan	314c049142	[compiler-rt][hwasan] Decouple use of the TLS global for getting the shadow base and using the frame record feature This allows for using the frame record feature (which uses __hwasan_tls) independently from however the user wants to access the shadow base, which prior was only usable if shadow wasn't accessed through the TLS variable or ifuncs. Frame recording can be explicitly set according to ShadowMapping::WithFrameRecord in ShadowMapping::init. Currently, it is only enabled on Fuchsia and if TLS is used, so this should mimic the old behavior. Added an extra case to prologue.ll that covers this new case. Differential Revision: https://reviews.llvm.org/D103841	2021-06-09 12:55:19 -07:00
LemonBoy	d3faef6eef	[SROA] Avoid splitting loads/stores with irregular type Upon encountering loads/stores on types whose size is not a multiple of 8 bits the SROA pass would either trip an assertion or use logic that was not meant to work with such irregularly-sized types. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D99435	2021-06-09 16:36:58 +02:00
Alexey Bataev	a0086add2e	[SLP]Improve gathering of scalar elements. 1. Better sorting of scalars to be gathered. Trying to insert constants/arguments/instructions-out-of-loop at first and only then the instructions which are inside the loop. It improves hoisting of invariant insertelements instructions. 2. Better detection of shuffle candidates in gathering function. 3. The cost of insertelement for constants is 0. Part of D57059. Differential Revision: https://reviews.llvm.org/D103458	2021-06-09 05:23:21 -07:00
Nico Weber	205cde63c7	Revert "[SROA] Avoid splitting loads/stores with irregular type" This reverts commit `905f4eb537`. Breaks check-llvm on most (all?) bots, see https://reviews.llvm.org/D99435	2021-06-09 06:32:58 -04:00
LemonBoy	905f4eb537	[SROA] Avoid splitting loads/stores with irregular type Upon encountering loads/stores on types whose size is not a multiple of 8 bits the SROA pass would either trip an assertion or use logic that was not meant to work with such irregularly-sized types. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D99435	2021-06-09 11:48:20 +02:00
Jingu Kang	8eee02020b	[LoopBoundSplit] Ignore phi node which is not scevable There was a bug in LoopBoundSplit. The pass should ignore phi node which is not scevable. Differential Revision: https://reviews.llvm.org/D103913	2021-06-09 09:44:36 +01:00
Kevin Athey	af8c59e06d	Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never|[runtime]|always). In addition: - optionally add global flag to capture compile intent for UAR: __asan_detect_use_after_return_always. The global is a SANITIZER_WEAK_ATTRIBUTE. for issue: https://github.com/google/sanitizers/issues/1394 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D103304	2021-06-08 14:39:06 -07:00
Sanjay Patel	d2012d965d	[InstCombine] fix nsz (fast-math) propagation from fneg-of-select As discussed in the post-commit comments for: `3cdd05e519` It seems to be safe to propagate all flags from the final fneg except for 'nsz' to the new select: https://alive2.llvm.org/ce/z/J_APDc nsz has unique FMF semantics: it is not poison, it is only "insignificant" in the calculation according to the LangRef.	2021-06-08 17:04:30 -04:00
David Green	297088d1ad	Revert "[DSE] Remove stores in the same loop iteration" Apparently non-dead stores are being removed, as noted in D100464. This reverts commit `222aeb4d51`.	2021-06-08 21:23:08 +01:00
Hans Wennborg	386b66b2fc	Revert "3rd Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"" > This reapplies `c0f3dfb9`, which was reverted following the discovery of > crashes on linux kernel and chromium builds - these issues have since > been fixed, allowing this patch to re-land. This reverts commit `36ec97f76a`. The change caused non-determinism in the compiler, see comments on the code review at https://reviews.llvm.org/D91722. Reverting to unbreak people's builds until that can be addressed. This also reverts the follow-up "[DebugInfo] Limit the number of values that may be referenced by a dbg.value" in `a0bd6105d8`.	2021-06-08 14:54:08 +02:00
maekawatoshiki	09e92c607c	[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass. The next patch will utilize LoopNest to effectively handle loop nests. Also, a crash problem on legacy pass manager is fixed. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D99149	2021-06-08 20:30:02 +09:00
Simon Pilgrim	596004a947	MemCpyOptimizer.cpp - hasUndefContentsMSSA - Pass DataLayout by reference. NFCI.	2021-06-08 10:41:02 +01:00
Kerry McLaughlin	14eeccfe9a	[LoopVectorize] Don't use strict reductions when reordering is allowed If the `-enable-strict-reductions` flag is set to true, then currently we will always choose to vectorize the loop with strict in-order reductions. This is not necessary where we allow the reordering of FP operations, such as when loop hints are passed via metadata. This patch moves useOrderedReductions so that we can also check whether loop hints allow reordering, in which case we should use the default behaviour of vectorizing with unordered reductions. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D103814	2021-06-08 10:39:29 +01:00
George Balatsouras	5b4dda550e	[dfsan] Add full fast8 support Complete support for fast8: - amend shadow size and mapping in runtime - remove fast16 mode and -dfsan-fast-16-labels flag - remove legacy mode and make fast8 mode the default - remove dfsan-fast-8-labels flag - remove functions in dfsan interface only applicable to legacy - remove legacy-related instrumentation code and tests - update documentation. Reviewed By: stephan.yichao.zhao, browneee Differential Revision: https://reviews.llvm.org/D103745	2021-06-07 17:20:54 -07:00
Arthur Eubanks	47211fa889	Revert "[TargetLowering] Only inspect attributes in the arguments for ArgListEntry" Needs to be discussed more. This reverts commit 255a5c1baa6020c009934b4fa342f9f6dbbcc46 This reverts commit df2056ff3730316f376f29d9986c9913b95ceb1 This reverts commit faff79b7ca144e505da6bc74aa2b2f7cffbbf23 This reverts commit d2a9020785c6e02afebc876aa2778fa64c5cafd	2021-06-07 16:07:44 -07:00
Nikita Popov	8fdd7c2ff1	[LoopUnroll] Clamp unroll count to MaxTripCount Unrolling with more iterations than MaxTripCount is pointless, as those iterations can never be executed. As such, we clamp ULO.Count to MaxTripCount if it is known. This means we no longer need to consider iterations after MaxTripCount for exit folding, and the CompletelyUnroll flag becomes independent of ULO.TripCount. Differential Revision: https://reviews.llvm.org/D103748	2021-06-07 21:08:42 +02:00
Philip Reames	c880d5e583	[RS4GC] Treat inttoptr as base pointer This is a modified version of a patch by tolziplohu with a style change, and most importantly, a revised commit message. inttoptr for a non-integral address space is currently ill defined in the LangRef. Figuring out exactly what the dynamic semantics of such a cast would be is hard, and not yet settled. Despite that, we still need to go ahead and implement something in RS4GC for a couple of reasons. First, as a simple consistency argument. We've apparently added support for constexpr inttoptrs a while back, and even have tests which exercised them. Having a lack of constant folding trigger a crash during lowering is non-ideal. Second, and more fundamentally, the optimizer is allowed to insert undefined constructs in unreachable code. At the same time, we can't assume that dynamically dead code is always pruned before lowering. As a result, we must assume that inttoptrs can occur (even if completely ill defined) along dead paths. We need the lowering to not crash. The stackmaps produced can be garbage (as the assumption is the code is dynamically dead), but the lowering itself can't crash. Differential Revision: https://reviews.llvm.org/D103492	2021-06-07 10:27:23 -07:00
Sanjay Patel	4675beaa21	[InstCombine] intersect nsz and ninf fast-math-flags (FMF) for fneg(fdiv) fold https://alive2.llvm.org/ce/z/3KPvih https://llvm.org/PR49654	2021-06-07 13:22:49 -04:00
Sanjay Patel	519e98cd9a	[InstCombine] refactor match clauses; NFC We need to adjust the FMF propagation on at least one of these transforms as discussed in: https://llvm.org/PR49654 ...so this should make it easier to intersect flags.	2021-06-07 13:22:49 -04:00
Florian Hahn	1465e7770b	[VPlan] Print successors of VPRegionBlocks. The non-DOT printing does not include the successors of VPRegionBlocks. This patch uses the same style for printing successors as for VPBasicBlock. I think the printing of successors could be a bit improved further, as at the moment it is hard to ensure a check line matches all successors. But that can be done as follow-up. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D103515	2021-06-07 17:57:21 +01:00
Fraser Cormack	ae3f6de3a8	[InstCombine] Support negation of scalable-vector splats This patch is an extension of D103421. It allows the InstCombiner to generate the negated form of integer scalable-vector splats. It can technically handle fixed-length vectors too but those are completely covered by the preceding logic. This enables extra combining opportunities for scalable vector types. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D103801	2021-06-07 15:14:00 +01:00
Daniil Seredkin	7736c1936a	[InstCombine] Missed optimization for pow(x, y) * pow(x, z) with fast-math If FP reassociation (fast-math) is allowed, then LLVM is free to do the following transformation pow(x, y) * pow(x, z) -> pow(x, y + z). This patch adds this transformation and tests for it. See more https://bugs.llvm.org/show_bug.cgi?id=47205 It handles two cases 1. When operands of fmul are different instructions %4 = call reassoc float @llvm.pow.f32(float %0, float %1) %5 = call reassoc float @llvm.pow.f32(float %0, float %2) %6 = fmul reassoc float %5, %4 --> %3 = fadd reassoc float %1, %2 %4 = call reassoc float @llvm.pow.f32(float %0, float %3) 2. When operands of fmul are the same instruction %4 = call reassoc float @llvm.pow.f32(float %0, float %1) %5 = fmul reassoc float %4, %4 --> %3 = fadd reassoc float %1, %1 %4 = call reassoc float @llvm.pow.f32(float %0, float %3) Differential Revision: https://reviews.llvm.org/D102574	2021-06-07 08:08:05 -04:00
Liqiang Tao	4a0de622c3	[llvm] Add interface to order inlining This patch abstracts Calls in Inliner::run() to InlineOrder. With this patch, it's possible to customize the inlining order, e.g. use queue or priority queue. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D103315	2021-06-07 18:27:55 +08:00
Jingu Kang	a2a0ac42ab	[SimpleLoopBoundSplit] Split Bound of Loop which has conditional branch with IV This pass transforms loops that contain a conditional branch with induction variable. For example, it transforms the code `while (iv < n) { A; if (iv < c) B; C; }` into `newbound = min(n, c); while (iv < newbound) { A; B; C; } if (iv != n) { while (iv < n) { A; C; } }`. Differential Revision: https://reviews.llvm.org/D102234	2021-06-07 10:55:25 +01:00
Florian Hahn	23c2f2e6b2	[LV] Mark increment of main vector loop induction variable as NUW. This patch marks the induction increment of the main induction variable of the vector loop as NUW when not folding the tail. If the tail is not folded, we know that End - Start >= Step (either statically or through the minimum iteration checks). We also know that both Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV + %Step == %End. Hence we must exit the loop before %IV + %Step unsigned overflows and we can mark the induction increment as NUW. This should make SCEV return more precise bounds for the created vector loops, used by later optimizations, like late unrolling. At the moment quite a few tests still need to be updated, but before doing so I'd like to get initial feedback to make sure I am not missing anything. Note that this could probably be further improved by using information from the original IV. Attempt of modeling of the assumption in Alive2: https://alive2.llvm.org/ce/z/H_DL_g Part of a set of fixes required for PR50412. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D103255	2021-06-07 10:47:52 +01:00
maekawatoshiki	0a9d079931	Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass" This reverts commit `2165360003`. To fix the crash problem in legacy pass manager	2021-06-07 01:26:47 +09:00
Simon Pilgrim	9ced408fe9	SimplifyCFG.cpp - remove dead early-return code added at rGcc63203908da. NFCI. We've already checked that ScanIdx == 0 a few lines above.	2021-06-06 14:15:11 +01:00
Liqiang Tao	48252d7570	Revert "[llvm] Add interface to order inlining"	2021-06-06 14:45:03 +08:00
Liqiang Tao	478dc47292	[llvm] Add interface to order inlining This patch abstracts Calls in Inliner::run() to InlineOrder. With this patch, it's possible to customize the inlining order, e.g. use queue or priority queue. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D103315	2021-06-06 12:03:02 +08:00
Roman Lebedev	e350494fb0	[NFC] Promote willNotOverflow() / getStrengthenedNoWrapFlagsFromBinOp() from IndVars into SCEV proper We might want to use it when creating SCEV proper in createSCEV(), now that we don't `forgetValue()` in `SimplifyIndvar::strengthenOverflowingOperation()`, which might have caused us to lose some optimization potential.	2021-06-05 12:17:51 +03:00
Nikita Popov	db45746821	[LoopUnroll] Separate peeling from unrolling Loop peeling is currently performed as part of UnrollLoop(). Outside test scenarios, it is always performed with an unroll count of 1. This means that unrolling doesn't actually do anything apart from performing post-unroll simplification. When testing, it's currently possible to specify both an explicit peel count and an explicit unroll count. This doesn't perform any sensible operation and may result in miscompiles, see https://bugs.llvm.org/show_bug.cgi?id=45939. This patch moves peeling from UnrollLoop() into tryToUnrollLoop(), so that peeling does not also perform a subsequent unroll. We only run the post-unroll simplifications. Specifying both an explicit peel count and unroll count is forbidden. In the future, we may want to support both (non-PGO) peeling a loop and unrolling it, but this needs to be done by first performing the peel and then recalculating unrolling heuristics on a now possibly analyzable loop. Differential Revision: https://reviews.llvm.org/D103362	2021-06-05 10:32:00 +02:00
Vitaly Buka	e3258b0894	Revert "Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never\|[runtime]\|always)." Windows is still broken. This reverts commit `927688a4cd`.	2021-06-05 00:39:50 -07:00
Kevin Athey	927688a4cd	Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never\|[runtime]\|always). In addition: - optionally add global flag to capture compile intent for UAR: __asan_detect_use_after_return_always. The global is a SANITIZER_WEAK_ATTRIBUTE. for issue: https://github.com/google/sanitizers/issues/1394 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D103304	2021-06-05 00:26:10 -07:00
Fangrui Song	06e7de795b	Fix some -Wunused-but-set-variable in -DLLVM_ENABLE_ASSERTIONS=off build	2021-06-04 23:34:43 -07:00
Vitaly Buka	d8a4a2cb93	Revert "Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never\|[runtime]\|always)." Reverts commits of D103304, it breaks Darwin. This reverts commit `60e5243e59`. This reverts commit `26b3ea224e`. This reverts commit `17600ec32a`.	2021-06-04 20:20:11 -07:00
Kevin Athey	60e5243e59	Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never\|[runtime]\|always). In addition: - optionally add global flag to capture compile intent for UAR: __asan_detect_use_after_return_always. The global is a SANITIZER_WEAK_ATTRIBUTE. for issue: https://github.com/google/sanitizers/issues/1394 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D103304	2021-06-04 16:30:47 -07:00
Fangrui Song	9e51d1f348	[InstrProfiling] If no value profiling, make data variable private and (for Windows) use one comdat `__profd_` variables are referenced by code only when value profiling is enabled. If disabled (e.g. default -fprofile-instr-generate), the symbols just waste space on ELF/Mach-O. We change the comdat symbol from `__profd_` to `__profc_` because an internal symbol does not provide deduplication features on COFF. The choice doesn't matter on ELF. (In -DLLVM_BUILD_INSTRUMENTED_COVERAGE=on build, there is now no `__profd_` symbols.) On Windows this enables further optimization. We are no longer affected by the link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE can cause duplicate definition error. https://lists.llvm.org/pipermail/llvm-dev/2021-May/150758.html We can thus use llvm.compiler.used instead of llvm.used like ELF (D97585). This avoids many `/INCLUDE:` directives in `.drectve`. Here is rnk's measurement for Chrome: ``` This reduced object file size of base_unittests.exe, compiled with coverage, optimizations, and gmlt debug info by 10%: #BEFORE $ find . -iname '.obj' \| xargs du -b \| awk '{ sum += $1 } END { print sum}' 1047758867 $ du -cksh base_unittests.exe 82M base_unittests.exe 82M total # AFTER $ find . -iname '.obj' \| xargs du -b \| awk '{ sum += $1 } END { print sum}' 937886499 $ du -cksh base_unittests.exe 78M base_unittests.exe 78M total ``` The change is NFC for Mach-O. Reviewed By: davidxl, rnk Differential Revision: https://reviews.llvm.org/D103372	2021-06-04 13:27:56 -07:00
Nikita Popov	14f350daf2	[IndVars] Don't forget value when inferring nowrap flags When SimplifyIndVars infers IR nowrap flags from SCEV, this may happen in two ways: Either nowrap flags were already present in SCEV and just get transferred to IR. Or zero/sign extension of addrecs infers additional nowrap flags, and those get transferred to IR. In the latter case, calling forgetValue() ensures that the newly inferred nowrap flags get propagated to any other SCEV expressions based on the addrec. However, the invalidation can also have a major compile-time effect in some cases. For https://bugs.llvm.org/show_bug.cgi?id=50384 with n=512 compile-time drops from 7.1s to 0.8s without this invalidation. At the same time, removing the invalidation doesn't affect any codegen in test-suite. Differential Revision: https://reviews.llvm.org/D103424	2021-06-04 20:57:22 +02:00
Rong Xu	8d581857d7	[SampleFDO] New hierarchical discriminator for FS SampleFDO (llvm-profdata part) This patch was split from https://reviews.llvm.org/D102246 [SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO This is for llvm-profdata part of change. It sets the bit masks for the profile reader in llvm-profdata. Also add an internal option "-fs-discriminator-pass" for show and merge command to process the profile offline. This patch also moved setDiscriminatorMaskedBitFrom() to SampleProfileReader::create() to simplify the interface. Differential Revision: https://reviews.llvm.org/D103550	2021-06-04 11:22:06 -07:00
Adam Nemet	ffde966cd9	[Matrix] Fix transpose-multiply folding if transpose has multiple uses Don't add it to FusedInsts in this case. Differential Revision: https://reviews.llvm.org/D103627	2021-06-04 10:55:03 -07:00
Joseph Huber	4a08163c73	[Attributor] Check HeapToStack's state for isKnownHeapToStack This patch changes the `isKnownHeapToStack` and `isAssumedHeapToStack` member functions to return if a function call is going to be altered by HeapToStack. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D103574	2021-06-04 12:38:33 -04:00
Nico Weber	e9a9c85098	Revert "[InstrProfiling] If no value profiling, make data variable private and (for Windows) use one comdat" This reverts commit `a14fc749aa`. Breaks check-profile on macOS. See https://reviews.llvm.org/D103372 for details.	2021-06-04 10:00:12 -04:00
Sanjay Patel	23a116c8c4	[InstCombine] convert lshr to ashr to eliminate cast op This is similar to `b865eead76` ( D103617 ) and fixes: https://llvm.org/PR50575 `41b71f718b` did this and more (noted with TODO comments in the tests), but it didn't handle the case where the destination is narrower than the source, so it got reverted. This is a simple match-and-replace. If there's evidence that the TODO cases are useful, we can revisit/extend.	2021-06-04 07:04:37 -04:00
Nico Weber	5c600dc6d4	Revert "Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never\|[runtime]\|always)." This reverts commit `41b3088c3f`. Doesn't build on macOS, see comments on https://reviews.llvm.org/D103304	2021-06-03 21:01:11 -04:00
Arthur Eubanks	edf2056ff3	[BuildLibCalls] Properly set ABI attributes on arguments Some floating point lib calls have ABI attributes that need to be set on the caller. Found via D103412. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D103415	2021-06-03 15:45:07 -07:00
Philip Reames	a4b924a017	Kill a variable which is unused after `cddcc4cf` [nfc]	2021-06-03 14:38:57 -07:00
Philip Reames	cddcc4cff5	A couple style tweaks on top of `5c0d1b2f9` [nfc]	2021-06-03 14:14:59 -07:00
Philip Reames	5c0d1b2f90	[LoopUnroll] Eliminate PreserveCondBr parameter and fix a bug in the process This builds on D103584. The change eliminates the coupling between unroll heuristic and implementation w.r.t. knowing when the passed in trip count is an exact trip count or a max trip count. In theory the new code is slightly less powerful (since it relies on exact computable trip counts), but in practice, it appears to cover all the same cases. It can also be extended if needed. The test change shows what appears to be a bug in the existing code around the interaction of peeling and unrolling. The original loop only ran 8 iterations. The previous output had the loop peeled by 2, and then an exact unroll of 8. This meant the loop ran a total of 10 iterations which appears to have been a miscompile. Differential Revision: https://reviews.llvm.org/D103620	2021-06-03 14:09:16 -07:00
Fangrui Song	a14fc749aa	[InstrProfiling] If no value profiling, make data variable private and (for Windows) use one comdat `__profd_` variables are referenced by code only when value profiling is enabled. If disabled (e.g. default -fprofile-instr-generate), the symbols just waste space on ELF/Mach-O. We change the comdat symbol from `__profd_` to `__profc_` because an internal symbol does not provide deduplication features on COFF. The choice doesn't matter on ELF. (In -DLLVM_BUILD_INSTRUMENTED_COVERAGE=on build, there is now no `__profd_` symbols.) On Windows this enables further optimization. We are no longer affected by the link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE can cause duplicate definition error. https://lists.llvm.org/pipermail/llvm-dev/2021-May/150758.html We can thus use llvm.compiler.used instead of llvm.used like ELF (D97585). This avoids many `/INCLUDE:` directives in `.drectve`. Here is rnk's measurement for Chrome: ``` This reduced object file size of base_unittests.exe, compiled with coverage, optimizations, and gmlt debug info by 10%: #BEFORE $ find . -iname '.obj' \| xargs du -b \| awk '{ sum += $1 } END { print sum}' 1047758867 $ du -cksh base_unittests.exe 82M base_unittests.exe 82M total # AFTER $ find . -iname '.obj' \| xargs du -b \| awk '{ sum += $1 } END { print sum}' 937886499 $ du -cksh base_unittests.exe 78M base_unittests.exe 78M total ``` Reviewed By: davidxl, rnk Differential Revision: https://reviews.llvm.org/D103372	2021-06-03 13:16:13 -07:00
Kevin Athey	41b3088c3f	Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never\|[runtime]\|always). In addition: - optionally add global flag to capture compile intent for UAR: __asan_detect_use_after_return_always. The global is a SANITIZER_WEAK_ATTRIBUTE. for issue: https://github.com/google/sanitizers/issues/1394 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D103304	2021-06-03 13:13:51 -07:00
Sanjay Patel	b865eead76	[InstCombine] eliminate sext and/or trunc if value has enough signbits If we have enough signbits in a source value, we can skip an intermediate cast for a trunc+sext pair: https://alive2.llvm.org/ce/z/A_mQt- This is the original problem shown in: https://llvm.org/PR49543 There's a test that shows we transformed what used to be a pair of shifts, so that suggests we could add another ComputeNumSignBits fold starting from a shift. There does not appear to be any change in compile-time from the extra analysis: https://llvm-compile-time-tracker.com/compare.php?from=3d2c9069dcafd0cbb641841aa3dd6e851fb7d760&to=b9513cdf2419704c7bb0c3a02a9ca06aae13d902&stat=instructions Differential Revision: https://reviews.llvm.org/D103617	2021-06-03 13:58:19 -04:00
Philip Reames	44d70d298a	[LoopUnroll] Eliminate PreserveOnlyFirst parameter [nfc] This is a first step towards simplifying the transform interface to be less error prone. The basic idea is that querying SCEV is cheap (since it's cached) and we can just check for properties related to branch folding in the transform method instead of relying on the heuristic part to pass everything in correctly. Differential Revision: https://reviews.llvm.org/D103584	2021-06-03 10:33:14 -07:00
Alexey Bataev	8c48d77cdf	[SLP]Improve cost estimation/emission of externally used extractelements. No need to recalculate the cost of extractelements, just no need to compensate the cost of all extractelements, need to check before if this is actually going to be removed at the vectorization. Also, no need to generate new extractelement instruction, we may just regenerate the original one. It may improve the final vectorization. Differential Revision: https://reviews.llvm.org/D102933	2021-06-03 10:26:59 -07:00
Philip Reames	bb5e1c6dcb	[LoopUnroll] Reorder code to make dom tree update more obvious [nfc] This cleans up the unroll action into two phases. Phase 1 does the mechanical act of unrolling, and leaves all conditional branches in place. Phase 2 optimizes away some of the conditional branches and then simplifies the loop. The primary benefit of the reordering is that we can delete some special-case dom tree update logic. Differential Revision: https://reviews.llvm.org/D103561	2021-06-03 10:19:56 -07:00
Alexey Bataev	89f3bc7698	[SLP]Allow to reorder nodes with >2 scalar values. tryToVectorizeList function allows to reorder only 2 scalars. Patch allows to reorder >2 scalars. Also, to avoid possible regressions, it allows extra vectorization of the remaining parts of the scalars elements if possible. Part of D57059. Differential Revision: https://reviews.llvm.org/D103247	2021-06-03 10:01:36 -07:00
Harald van Dijk	5d2b3de284	[SLP] Avoid std::stable_sort(properlyDominates()). As noticed by NAKAMURA Takumi back in 2017, we cannot use properlyDominates for std::stable_sort as properlyDominates only partially orders blocks. That is, for blocks A, B, C, D, where A dominates B and C dominates D, we have A == C, B == C, but A < B. This is not a valid comparison function for std::stable_sort and causes different results between libstdc++ and libc++. This change uses DFS numbering to give deterministic results for all reachable blocks. Unreachable blocks are ignored already, so do not need special consideration. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D103441	2021-06-03 17:51:52 +01:00
Hamza Mahfooz	83235b07e3	[Matrix] Preserve existing fast-math flags during lowering This patch makes it so, floating-point instructions created in LowerMatrixIntrinsics retain fast-math flags from instructions that are higher up the chain. Fixes https://bugs.llvm.org/show_bug.cgi?id=49738 Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D103233	2021-06-03 15:29:31 +01:00
Arthur Eubanks	1faff79b7c	[DFSan] Properly set argument ABI attributes Calls must properly match argument ABI attributes with the callee. Found via D103412. Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D103414	2021-06-02 22:24:46 -07:00
Fangrui Song	87c43f3aa9	[InstrProfiling] Delete linkage/visibility toggling for Windows The linkage/visibility of `__profn_` variables are derived from the profiled functions. extern_weak => linkonce available_externally => linkonce_odr internal => private extern => private _ => unchanged The linkage/visibility of `__profc_`/`__profd_` variables are derived from `__profn_` with linkage/visibility wrestling for Windows. The changes can be folded to the following without changing semantics. ``` if (TT.isOSBinFormatCOFF() && !NeedComdat) { Linkage = GlobalValue::InternalLinkage; Visibility = GlobalValue::DefaultVisibility; } ``` That said, I think we can just delete the code block. An extern/internal function will now use private `__profc_`/`__profd_` variables, instead of internal ones. This saves some symbol table entries. A non-comdat {linkonce,weak}_odr function will now use hidden external `__profc_`/`__profd_` variables instead of internal ones. There is potential object file size increase because such symbols need `/INCLUDE:` directives. However such non-comdat functions are rare (note that non-comdat weak definitions don't prevent duplicate definition error). The behavior changes match ELF. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D103355	2021-06-02 16:49:54 -07:00
Dave Lee	60ce8babf7	[coro] Preserve scope line for compiler generated functions Coro-split functions with an active suspend point have their scope line set to the line of the suspend point. However for compiler generated functions, this results in debug info with unconventional results: a file named `<compiler-generated>` with a non-zero line number. The convention for `<compiler-generated>` is that the line number is zero. This change propagates the scope line only for non-compiler generated functions. Differential Revision: https://reviews.llvm.org/D102412	2021-06-02 15:57:12 -07:00
Andrew Browne	70804f2a2f	Fix dfsan handling of musttail calls. Without this change, a callsite like: [[clang::musttail]] return func_call(x); will cause an error like: fatal error: error in backend: failed to perform tail call elimination on a call site marked musttail due to DFSan inserting instrumentation between the musttail call and the return. Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D103542	2021-06-02 11:38:35 -07:00
Rong Xu	6745ffe4fa	[SampleFDO] New hierarchical discriminator for FS SampleFDO (ProfileData part) This patch was split from https://reviews.llvm.org/D102246 [SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO This is mainly for ProfileData part of change. It will load FS Profile when such profile is detected. For an extbinary format profile, create_llvm_prof tool will add a flag to profile summary section. For other format profiles, the users need to use an internal option (-profile-isfs) to tell the compiler that the profile uses FS discriminators. This patch also simplified the bit API used by FS discriminators. Differential Revision: https://reviews.llvm.org/D103041	2021-06-02 10:32:52 -07:00
Stephen Tozer	4316b0e59c	[LoopStrengthReduce] Ensure that debug intrinsics do not affect LSR's output During Loop Strength Reduce, if the terminating condition for the loop is not immediately adjacent to the terminating branch and it has more than one use, a clone of the condition will be created just before the terminating branch and will be used as the branch condition. Currently, whether the instructions are "immediately adjacent" is determined by checking whether the next instruction after the condition is the terminating branch; this is incorrect however, as the presence of a debug intrinsic between the two will result in a change to the output. This is fixed by using getNextNonDebugInstruction() instead. Differential Revision: https://reviews.llvm.org/D103033	2021-06-02 15:56:23 +01:00
Arnold Schwaighofer	f1a0c5d67c	[coro async] Add the swiftasync attribute to the resume partial function Transfer the swiftasync attribute to the resume partial function according to suspend.async specification. Its first argument denotes which argument is the async context. rdar://71499498 Differential Revision: https://reviews.llvm.org/D103285	2021-06-02 07:44:33 -07:00
Sander de Smalen	d41cb6bb26	[LV] Build and cost VPlans for scalable VFs. This patch uses the calculated maximum scalable VFs to build VPlans, cost them and select a suitable scalable VF. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D98722	2021-06-02 14:47:47 +01:00
Sander de Smalen	034503e9d2	[LV] NFC: Remove redundant isLegalMasked(Gather\|Scatter) functions. This NFC change follows from conversation in D102437, where it was discussed to remove these functions as a separate patch.	2021-06-02 14:09:07 +01:00
Sander de Smalen	3472d3fd9d	[LV] NFC: Replace custom getMemInstValueType by llvm::getLoadStoreType. llvm::getLoadStoreType was added recently and has the same implementation as 'getMemInstValueType' in LoopVectorize.cpp. Since there is no value in having two implementations, this patch removes the custom LV implementation in favor of the generic one defined in Instructions.h.	2021-06-02 14:09:06 +01:00
Jingu Kang	f3a27511c9	[SimpleLoopUnswitch] Port partially invariant unswitch from LoopUnswitch to SimpleLoopUnswitch This re-enables commit `107d19eb01` with bug fixes. Differential Revision: https://reviews.llvm.org/D99354	2021-06-02 10:58:22 +01:00
Bjorn Pettersson	9c54ee4378	[SimplifyLibCalls] Take size of int into consideration when emitting ldexp/ldexpf When rewriting powf(2.0, itofp(x)) -> ldexpf(1.0, x) exp2(sitofp(x)) -> ldexp(1.0, sext(x)) exp2(uitofp(x)) -> ldexp(1.0, zext(x)) the wrong type was used for the second argument in the ldexp/ldexpf libc call, for target architectures with 16 bit "int" type. The transform incorrectly used a bitcasted function pointer with a 32-bit argument when emitting the ldexp/ldexpf call for such targets. The fault is solved by using the correct function prototype in the call, by asking TargetLibraryInfo about the size of "int". TargetLibraryInfo by default derives the size of the int type by assuming that it is 16 bits for 16-bit architectures, and 32 bits otherwise. If this isn't true for a target it should be possible to override that default in the TargetLibraryInfo initializer. Differential Revision: https://reviews.llvm.org/D99438	2021-06-02 11:40:34 +02:00
Daniil Fukalov	0b34acdab7	[NFC] Fix 'Load' name masking. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D103456	2021-06-02 11:09:53 +03:00
Arthur Eubanks	2983053d23	[NFC][OpaquePtr] Explicitly pass GEP source type to IRBuilder in more places	2021-06-01 13:13:37 -07:00
Harald van Dijk	f126e8ec28	[SLPVectorizer] Ignore unreachable blocks As the existing test unreachable.ll shows, we should be doing more work to avoid entering unreachable blocks: we should not stop vectorization just because a PHI incoming value from an unreachable block cannot be vectorized. We know that particular value will never be used so we can just replace it with poison.	2021-06-01 20:21:04 +01:00
Alexey Bataev	36911971a5	[SLP]Better detection of perfect/shuffled matches for gather nodes. Implemented a better scheme for perfect/shuffled matches of the gather nodes, which allows fixing the performance regressions introduced by earlier patches. Start detecting matches for broadcast nodes and extractelement gathering. Differential Revision: https://reviews.llvm.org/D102920	2021-06-01 07:08:07 -07:00
Daniil Seredkin	13140120dc	[InstCombine] Relax constraints of uses for exp(X) * exp(Y) -> exp(X + Y) InstCombine didn't perform the transformations when fmul's operands were the same instruction because it required to have one use for each of them which is false in the case. This patch fixes this + adds tests for them and introduces a new function isOnlyUserOfAnyOperand to check these cases in a single place. This patch is a result of discussion in D102574. Differential Revision: https://reviews.llvm.org/D102698	2021-06-01 08:33:23 -04:00
Florian Hahn	1b84acb23a	[LoopDeletion] Consider infinite loops alive, unless mustprogress. The current loop or any of its sub-loops may be infinite. Unless the function or the loops are marked as mustprogress, this in itself makes the loop not dead. This patch moves the logic to check whether the current loop is finite or mustprogress to `isLoopDead` and also extends it to check the sub-loops. This should fix PR50511. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D103382	2021-06-01 13:07:36 +01:00
Florian Hahn	d4c070d801	[VectorCombine] Freeze index unless it is known to be non-poison. If the index itself is already poison, the poison propagates through instructions clamping the index to a valid range. This still causes introducing a load of poison, as flagged by Alive2 and pointed out at `575e2aff55`. This patch updates the code to freeze the index, unless it is proven to not be poison. Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D103378	2021-06-01 10:40:57 +01:00
Nathan Chancellor	e6b086bef2	Revert "[InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210)" This reverts commit `4f2fd3818b`. The Linux kernel fails to build after this commit. See https://reviews.llvm.org/D99481 for a reproducer. Signed-off-by: Nathan Chancellor <nathan@kernel.org>	2021-05-31 20:21:26 -07:00
Arthur Eubanks	372237487e	[OpaquePtr] Remove some uses of PointerType::getElementType()	2021-05-31 16:11:25 -07:00
Congzhe Cao	bfefde22b6	[LoopInterchange] Handle movement of reduction phis appropriately This patch fixes pr43326 and pr48212. Currently when we move reduction phis to the right place, loop interchange assumes the first phi in loop headers is an induction phi, skips the first phi and assumes the rest of phis are candidate reduction phis to move. However, it may not always be the case. This patch loops over all phis in loop headers and considers a phi node as a candidate reduction phi to move only when it is indeed a reduction phi across outer and inner loop. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D102743	2021-05-31 16:27:38 -04:00
Florian Hahn	aa00b1d763	[LV] Try to sink users recursively for first-order recurrences. Update isFirstOrderRecurrence to explore all uses of a recurrence phi and check if we can sink them. If there are multiple users to sink, they are all mapped to the previous instruction. Fixes PR44286 (and another PR or two). Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D84951	2021-05-31 19:55:33 +01:00
Roman Lebedev	f7c95c3322	[NFC] ScalarEvolution: apply SSO to the ExprValueMap value ExprValueMap is a map from SCEV * to a set-vector of (Value *, ConstantInt *) pairs, and while the map itself will likely be big-ish (have many keys), it is a reasonable assumption that each key will refer to a small-ish number of pairs. In particular looking at n=512 case from https://bugs.llvm.org/show_bug.cgi?id=50384, the small-size of 4 appears to be the sweet spot, it results in the least allocations while minimizing memory footprint. ``` $ for i in $(ls heaptrack.opt.*.gz); do echo $i; heaptrack_print $i \| tail -n 6; echo ""; done heaptrack.opt.0-orig.gz total runtime: 14.32s. calls to allocation functions: 8222442 (574192/s) temporary memory allocations: 2419000 (168924/s) peak heap memory consumption: 190.98MB peak RSS (including heaptrack overhead): 239.65MB total memory leaked: 67.58KB heaptrack.opt.1-n1.gz total runtime: 13.72s. calls to allocation functions: 7184188 (523705/s) temporary memory allocations: 2419017 (176338/s) peak heap memory consumption: 191.38MB peak RSS (including heaptrack overhead): 239.64MB total memory leaked: 67.58KB heaptrack.opt.2-n2.gz total runtime: 12.24s. calls to allocation functions: 6146827 (502355/s) temporary memory allocations: 2418997 (197695/s) peak heap memory consumption: 163.31MB peak RSS (including heaptrack overhead): 211.01MB total memory leaked: 67.58KB heaptrack.opt.3-n4.gz total runtime: 12.28s. calls to allocation functions: 6068532 (494260/s) temporary memory allocations: 2418985 (197017/s) peak heap memory consumption: 155.43MB peak RSS (including heaptrack overhead): 201.77MB total memory leaked: 67.58KB heaptrack.opt.4-n8.gz total runtime: 12.06s. calls to allocation functions: 6068042 (503321/s) temporary memory allocations: 2418992 (200646/s) peak heap memory consumption: 166.03MB peak RSS (including heaptrack overhead): 213.55MB total memory leaked: 67.58KB heaptrack.opt.5-n16.gz total runtime: 12.14s.
calls to allocation functions: 6067993 (499958/s) temporary memory allocations: 2418999 (199307/s) peak heap memory consumption: 187.24MB peak RSS (including heaptrack overhead): 233.69MB total memory leaked: 67.58KB ``` While that test may be an edge worst-case scenario, https://llvm-compile-time-tracker.com/compare.php?from=dee85d47d9f15fc268f7b18f279dac2774836615&to=98a57e31b1947d5bcdf4a5605ac2ab32b4bd5f63&stat=instructions agrees that this also results in improvements in the usual situations.	2021-05-31 15:34:03 +03:00
Juneyoung Lee	7161bb87c9	[InstCombine] Fix a few remaining vec transforms to use poison instead of undef This is a patch that replaces shufflevector and insertelement's placeholder value with poison. The underlying motivation is to fix the semantics of shufflevector with undef mask to return poison instead (D93818). Consensus was reached in late 2020 via the mailing list as well as the thread in https://bugs.llvm.org/show_bug.cgi?id=44185 . This patch is a simple syntactic change to the existing code, hence directly pushed as a commit.	2021-05-31 18:47:09 +09:00
David Green	222aeb4d51	[DSE] Remove stores in the same loop iteration DSE will currently only remove stores in the same block unless they can be guaranteed to be loop invariant. This expands that to any stores that are in the same Loop, at the same loop level. This should still account for where AA/MSSA will not handle aliasing between loops, but allow the dead stores to be removed where they overlap in the same loop iteration. It requires adding loop info to DSE, but that looks fairly harmless. The test case this helps is from code like this, which can come up in certain matrix operations: for(i=..) dst[i] = 0; for(j=..) dst[i] += src[in+j]; After LICM, this becomes: for(i=..) dst[i] = 0; sum = 0; for(j=..) sum += src[in+j]; dst[i] = sum; The first store is dead, and with this patch is now removed. Differential Revision: https://reviews.llvm.org/D100464	2021-05-31 10:22:37 +01:00
Hyeongyu Kim	4f2fd3818b	[InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210) As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210 ...the bug is triggered, as Eli says, when sext(idx) * ElementSize overflows. ``` // assume that GV is an array of 4-byte elements GEP = gep GV, 0, Idx // this is accessing Idx * 4 L = load GEP ICI = icmp eq L, value => ICI = icmp eq Idx, NewIdx ``` The foldCmpLoadFromIndexedGlobal function simplifies GEP+load operation to icmp. And there is a problem because Idx * ElementSize can overflow. Let's assume that the wanted value is at offset 0. Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00. We should return true for all these values, but currently, the new icmp only returns true for 0x00..00. This problem can be solved by masking off (trailing zeros of ElementSize) bits from Idx. ``` ... => Idx' = and Idx, 0x3F..FF ICI = icmp eq Idx', NewIdx ``` Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D99481	2021-05-31 14:08:20 +09:00
Sanjay Patel	7bb8bfa062	[InstCombine] fix miscompile from vector select substitution This is similar to the fix in `c590a9880d` ( PR49832 ), but we missed handling the pattern for select of bools (no compare inst). We can't substitute a vector value because the equality condition replacement that we are attempting requires that the condition is true/false for the entire value. Vector select can be partly true/false. I added an assert for vector types, so we shouldn't hit this again. Fixed formatting while auditing the callers. https://llvm.org/PR50500	2021-05-30 07:11:58 -04:00
Mindong Chen	71acce68da	[NFCI] Move DEBUG_TYPE definition below #includes When you try to define a new DEBUG_TYPE in a header file, a DEBUG_TYPE definition placed around the #includes in files that include it could result in redefinition warnings or even compile errors. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D102594	2021-05-30 17:31:01 +08:00
Sanjay Patel	c7da0c383a	[InstCombine] fold zext of masked bit set/clear This does not solve PR17101, but it is one of the underlying diffs noted here: https://bugs.llvm.org/show_bug.cgi?id=17101#c8 We could ease the one-use checks for the 'clear' (no 'not' op) half of the transform, but I do not know if that asymmetry would make things better or worse. Proofs: https://rise4fun.com/Alive/uVB Name: masked bit set %sh1 = shl i32 1, %y %and = and i32 %sh1, %x %cmp = icmp ne i32 %and, 0 %r = zext i1 %cmp to i32 => %s = lshr i32 %x, %y %r = and i32 %s, 1 Name: masked bit clear %sh1 = shl i32 1, %y %and = and i32 %sh1, %x %cmp = icmp eq i32 %and, 0 %r = zext i1 %cmp to i32 => %xn = xor i32 %x, -1 %s = lshr i32 %xn, %y %r = and i32 %s, 1 Note: this is a re-post of a patch that I committed at: rGa041c4ec6f7a The commit was reverted because it exposed another bug: rGb212eb7159b40 But that has since been corrected with: rG8a156d1c2795189 ( D101191 ) Differential Revision: https://reviews.llvm.org/D72396	2021-05-29 08:52:26 -04:00
Sanjay Patel	52f2970036	[InstCombine] reduce code duplication; NFC	2021-05-29 08:33:25 -04:00
Nikita Popov	625920dabf	[LoopUnroll] Make DomTree explicitly required (NFC) Some of the code was already assuming that DT is non-null, so make that requirement more explicit and remove unnecessary null checks.	2021-05-29 09:37:32 +02:00
Fangrui Song	38dbdde792	[Internalize] Simplify comdat renaming with noduplicates after D103043 I realized that we can use `comdat noduplicates` which is available on ELF. Add a special case for wasm which doesn't support the feature.	2021-05-28 16:58:38 -07:00
Nikita Popov	90310dfff8	[LoopUnroll] Use changeToUnreachable() (NFC) When fully unrolling with a non-latch exit, the latch block is folded to unreachable. Replace this folding with the existing changeToUnreachable() helper, rather than performing it manually. This also moves the fold to happen after the manual DT update for exit blocks. I believe this is correct in that the conversion of an unconditional backedge into unreachable should not affect the DT at all. Differential Revision: https://reviews.llvm.org/D103340	2021-05-29 00:11:21 +02:00
Nikita Popov	f765445a69	[LoopUnroll] Clean up exit folding (NFC) This does some non-functional cleanup of exit folding during unrolling. The two main changes are: * First rewrite latch->header edges, which is unrelated to exit folding. * Combine folding for latch and non-latch exits. After the previous change, the only difference in their logic is that for non-latch exits we currently only fold "known non-exit" cases, but not "known exit" cases. I think this helps a lot to clarify this code and prepare it for future changes. Differential Revision: https://reviews.llvm.org/D103333	2021-05-28 22:31:13 +02:00
Bardia Mahjour	06eaffa858	[NFC] Remove confusing info about MainLoop VF/UF from debug message	2021-05-28 16:10:04 -04:00
Florian Hahn	007f268c35	[VectorCombine] Check indices for all extracts we scalarize. We need to make sure that the indices of all extracts we scalarize are valid.	2021-05-28 18:35:29 +01:00
Stefan Pintilie	0159652058	Revert "Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" (try 2)" This reverts commit `be1a23203b`.	2021-05-28 12:21:22 -05:00
Stefan Pintilie	24bd657202	Revert "[NFCI][LoopDeletion] Only query SCEV about loop successor if another successor is also in loop" This reverts commit `b0b2bf3b5d`.	2021-05-28 12:21:22 -05:00
Stefan Pintilie	fd55331203	Revert "[NFC] Formatting fix" This reverts commit `59d938e649`.	2021-05-28 12:21:22 -05:00
Stefan Pintilie	807fc7cdc9	Revert "[NFC] Reuse existing variables instead of re-requesting successors" This reverts commit `c467585682`.	2021-05-28 12:21:22 -05:00
Stefan Pintilie	dd226803c2	Revert "[NFCI][LoopDeletion] Do not call complex analysis for known non-zero BTC" This reverts commit `7d418dadf6`.	2021-05-28 12:21:21 -05:00
Sanjay Patel	403cfe5d70	[PassManager] unify late simplifycfg options between regular and LTO pipelines This is split off from D102002, and I think it is clear that the difference in behavior was not intended. Options were added to SimplifyCFG over time, but different chunks of the pass pipelines were not kept in sync.	2021-05-28 13:06:49 -04:00
eopXD	fa488ea864	[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass This patch changes LoopFlattenPass from FunctionPass to LoopNestPass. Utilize LoopNest and let function 'Flatten' generate information from it. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D102904	2021-05-28 15:43:12 +00:00
dongAxis	66ff1cbd71	[NFC][Transforms][Utils] remove useless variable in CloneBasicBlock	2021-05-28 17:50:38 +08:00
eopXD	e96d6f4821	Revert "[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass" This reverts commit `7952ddb21f`. Differential Revision: https://reviews.llvm.org/D103302	2021-05-28 07:58:06 +00:00
eopXD	7e06cf8f1b	Revert "[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass" This reverts commit `ffc4d3e068`.	2021-05-28 07:48:04 +00:00
eopXD	ffc4d3e068	[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass This patch changes LoopFlattenPass from FunctionPass to LoopNestPass. Utilize LoopNest and let function 'Flatten' generate information from it. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D102904	2021-05-28 07:25:53 +00:00
eopXD	7952ddb21f	[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass This patch changes LoopFlattenPass from FunctionPass to LoopNestPass. Utilize LoopNest and let function 'Flatten' generate information from it. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D102904	2021-05-28 07:11:26 +00:00
Max Kazantsev	6a2af607ad	Revert "[NFCI] Lazily evaluate SCEVs of PHIs" This reverts commit `51d334a845`. Reported failures, need to analyze.	2021-05-28 11:05:30 +07:00
Jinsong Ji	b2581196eb	[AIX] Enable stackprotect feature AIX uses `__ssp_canary_word` instead of `__stack_chk_guard`. This patch updates the target hook to use the correct symbol, so that the basic stackprotect feature can work. The traceback will be handled in a follow-up patch. Reviewed By: #powerpc, shchenz Differential Revision: https://reviews.llvm.org/D103100	2021-05-28 02:18:15 +00:00
Jianzhou Zhao	fc1d39849e	[dfsan] Add a flag about whether to propagate offset labels at gep DFSan has flags to control flows between pointers and objects referred by pointers. For example, a = p; L(a) = L(p) when -dfsan-combine-pointer-labels-on-load = false L(a) = L(p) + L(p) when -dfsan-combine-pointer-labels-on-load = true p = b; L(p) = L(b) when -dfsan-combine-pointer-labels-on-store = false L(p) = L(b) + L(p) when -dfsan-combine-pointer-labels-on-store = true The question is what to do with p += c. In practice we found many confusing flows if we propagate labels from c to p. So a new flag works like this p += c; L(p) = L(p) when -dfsan-propagate-via-pointer-arithmetic = false L(p) = L(p) + L(c) when -dfsan-propagate-via-pointer-arithmetic = true Reviewed-by: gbalats Differential Revision: https://reviews.llvm.org/D103176	2021-05-28 00:06:19 +00:00
Arthur Eubanks	2d2a902078	[SanCov] Properly set ABI parameter attributes Arguments need to have the proper ABI parameter attributes set. Followup to D101806. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D103288	2021-05-27 15:27:21 -07:00
Adrian Prantl	f3869a5c32	Support stripping indirectly referenced DILocations from !llvm.loop metadata in stripDebugInfo(). This patch fixes an oversight in https://reviews.llvm.org/D96181 and also takes into account loop metadata pointing to other MDNodes that point into the debug info. rdar://78487175 Differential Revision: https://reviews.llvm.org/D103220	2021-05-27 13:23:33 -07:00
maekawatoshiki	2165360003	[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass. The next patch will utilize LoopNest to effectively handle loop nests. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D99149	2021-05-28 01:17:23 +09:00
Florian Hahn	38641ddf3e	[VPlan] Do not sink uniform recipes in sinkScalarOperands. For uniform ReplicateRecipes, only the first lane should be used, so sinking them would mean we have to compute the value of the first lane multiple times. Also, at the moment, sinking them causes a crash because the value of the first lane is re-used by all users. Reported post-commit for D100258.	2021-05-27 14:07:48 +01:00
Max Kazantsev	7d418dadf6	[NFCI][LoopDeletion] Do not call complex analysis for known non-zero BTC	2021-05-27 15:29:37 +07:00
Max Kazantsev	c467585682	[NFC] Reuse existing variables instead of re-requesting successors	2021-05-27 15:29:37 +07:00
Max Kazantsev	51d334a845	[NFCI] Lazily evaluate SCEVs of PHIs Eager evaluation has cost of compile time. Only query them if they are required for proving predicates.	2021-05-27 13:35:31 +07:00
Max Kazantsev	59d938e649	[NFC] Formatting fix	2021-05-27 12:50:54 +07:00
Max Kazantsev	b0b2bf3b5d	[NFCI][LoopDeletion] Only query SCEV about loop successor if another successor is also in loop	2021-05-27 12:44:22 +07:00
Yevgeny Rouban	4d26f41f76	[RS4GC] Introduce intrinsics to get base ptr and offset There can be a need for some optimizations to get (base, offset) for any GC pointer. The base can be calculated by generating needed instructions as it is done by the RewriteStatepointsForGC::findBasePointer() function. The offset can be calculated in the same way. Though to not expose the base calculation and to make the offset calculation as simple as ptrtoint(derived_ptr) - ptrtoint(base_ptr), which is illegal outside RS4GC, this patch introduces 2 intrinsics: @llvm.experimental.gc.get.pointer.base(%derived_ptr) @llvm.experimental.gc.get.pointer.offset(%derived_ptr) These intrinsics are inlined by RS4GC along with generation of statepoint sequences. With these new intrinsics the GC parseable lowering for atomic memcpy intrinsics (`6ec2c5e402`) could be implemented as a separate pass. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D100445	2021-05-27 09:14:14 +07:00
Heejin Ahn	5bfe06ad35	[SimplifyCFG] Use make_early_inc_range() while deleting instructions We are deleting `phi` nodes within the for loop, so this makes sure we increment the iterator before we delete the instruction pointed by the iterator. This started to break in `a0be081646`. Reviewed By: dschuff, lebedev.ri Differential Revision: https://reviews.llvm.org/D103181	2021-05-26 11:43:11 -07:00
Alexey Bataev	27d3528acf	[SLP] Fix vectorization of insertelements with multiple uses. The SLP vectorizer should not consider insertelements with multiple uses as a part of a high-level build vector; such an instruction must be considered as a terminating insertelement in the vector build, otherwise it may produce incorrect code. Differential Revision: https://reviews.llvm.org/D103164	2021-05-26 09:42:18 -07:00
Stephen Tozer	a0bd6105d8	[DebugInfo] Limit the number of values that may be referenced by a dbg.value Following the addition of salvaging dbg.values using DIArgLists to reference multiple values, a case has been found where excessively large DIArgLists are produced as a result of this salvaging, resulting in large enough performance costs to effectively freeze the compiler. This patch introduces an upper bound of 16 to the number of values that may be salvaged into a dbg.value, to limit the impact of these extreme cases to performance. Differential Revision: https://reviews.llvm.org/D103162	2021-05-26 17:34:05 +01:00
Philip Reames	9cc2181ec3	[unroll] Use value domain for symbolic execution based cost model The current full unroll cost model does a symbolic evaluation of the loop up to a fixed limit. That symbolic evaluation currently simplifies to constants, but we can generalize to arbitrary Values using the InstructionSimplify infrastructure at very low cost. By itself, this enables some simplifications, but it's mainly useful when combined with the branch simplification over in D102928. Differential Revision: https://reviews.llvm.org/D102934	2021-05-26 08:41:25 -07:00
Kerry McLaughlin	9f76a85260	[LoopVectorize] Enable strict reductions when allowReordering() returns false When loop hints are passed via metadata, the allowReordering function in LoopVectorizationLegality will allow the order of floating point operations to be changed: bool allowReordering() const { // When enabling loop hints are provided we allow the vectorizer to change // the order of operations that is given by the scalar loop. This is not // enabled by default because can be unsafe or inefficient. The -enable-strict-reductions flag introduced in D98435 will currently only vectorize reductions in-loop if hints are used, since canVectorizeFPMath() will return false if reordering is not allowed. This patch changes canVectorizeFPMath() to query whether it is safe to vectorize the loop with ordered reductions if no hints are used. For testing purposes, an additional flag (-hints-allow-reordering) has been added to disable the reordering behaviour described above. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D101836	2021-05-26 13:59:12 +01:00
Max Kazantsev	be1a23203b	Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" (try 2) The patch was reverted due to compile time impact of contextual SCEV queries. It also appeared that it introduced a miscompile on irreducible CFG. Changes made: 1. isKnownPredicateAt is replaced with more lightweight isKnownPredicate; 2. Irreducible CFG in live code is now detected and excluded from processing. Differential Revision: https://reviews.llvm.org/D102615	2021-05-26 19:47:14 +07:00
Max Kazantsev	0de553dce0	Revert "Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration"" This reverts commit `43d2e51c2e`. Committed wrong version.	2021-05-26 19:29:07 +07:00
Max Kazantsev	43d2e51c2e	Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" The patch was reverted due to compile time impact of contextual SCEV queries. It also appeared that it introduced a miscompile on irreducible CFG. Changes made: 1. isKnownPredicateAt is replaced with more lightweight isKnownPredicate; 2. Irreducible CFG in live code is now detected and excluded from processing. Differential Revision: https://reviews.llvm.org/D102615	2021-05-26 19:23:21 +07:00
David Sherwood	70d8365e33	Fix warning introduced by `9c766f4090`	2021-05-26 10:20:39 +01:00
David Sherwood	9c766f4090	[InstCombine] Fold extractelement + vector GEP with one use We sometimes see code like this: Case 1: %gep = getelementptr i32, i32* %a, <2 x i64> %splat %ext = extractelement <2 x i32*> %gep, i32 0 or this: Case 2: %gep = getelementptr i32, <4 x i32*> %a, i64 1 %ext = extractelement <4 x i32*> %gep, i32 0 where there is only one use of the GEP. In such cases it makes sense to fold the two together such that we create a scalar GEP: Case 1: %ext = extractelement <2 x i64> %splat, i32 0 %gep = getelementptr i32, i32* %a, i64 %ext Case 2: %ext = extractelement <2 x i32*> %a, i32 0 %gep = getelementptr i32, i32* %ext, i64 1 This may create further folding opportunities as a result, i.e. the extract of a splat vector can be completely eliminated. Also, even for the general case where the vector operand is not a splat it seems beneficial to create a scalar GEP and extract the scalar element from the operand. Therefore, in this patch I've assumed that a scalar GEP is always preferable to a vector GEP and have added code to unconditionally fold the extract + GEP. I haven't added folds for the case when we have both a vector of pointers and a vector of indices, since this would require generating an additional extractelement operation. Tests have been added here: Transforms/InstCombine/gep-vector-indices.ll Differential Revision: https://reviews.llvm.org/D101900	2021-05-26 09:54:26 +01:00
Teresa Johnson	d35fe04fa3	[LTT] Handle merged llvm.assume when dropping type tests When the lower type test pass is invoked a second time with DropTypeTests set to true, it expects that all remaining type tests feed assume instructions, which are removed along with the type tests. In some cases the llvm.assume might have been merged with another one, i.e. from a builtin_assume instruction, in which case the type test would actually feed a phi that in turn feeds the merged assume instruction. In this case we can simply replace that operand of the phi with "true" before removing the type test. Differential Revision: https://reviews.llvm.org/D103073	2021-05-25 17:02:13 -07:00
Kevin Athey	52ac114771	LLVM Detailed IR tests for introduction of flag -fsanitize-address-detect-stack-use-after-return-mode. Rework all tests that interact with use after return to correctly handle the case where the mode has been explicitly set to Never or Always. for issue: https://github.com/google/sanitizers/issues/1394 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D102462	2021-05-25 16:17:39 -07:00
Fangrui Song	b426b45d10	[Internalize] Rename instead of removal if a to-be-internalized comdat has more than one member Beside the `comdat any` deduplication feature, instrumentations use comdat to establish dependencies among a group of sections, to prevent section based linker garbage collection from discarding some members without discarding all. LangRef acknowledges this usage with the following wording: > All global objects that specify this key will only end up in the final object file if the linker chooses that key over some other key. On ELF, for PGO instrumentation, a `__llvm_prf_cnts` section and its associated `__llvm_prf_data` section are placed in the same GRP_COMDAT group. A `__llvm_prf_data` is usually not referenced and expects the liveness of its associated `__llvm_prf_cnts` to retain it. The `setComdat(nullptr)` code (added by D10679) in InternalizePass can break the use case (a `__llvm_prf_data` may be dropped with its associated `__llvm_prf_cnts` retained). The main goal of this patch is to fix the dependency relationship. I think it makes sense for InternalizePass to internalize a comdat and thus suppress the deduplication feature, e.g. a relocatable link of a regular LTO can create an object file affected by InternalizePass. If a non-internal comdat in a.o is prevailed by an internal comdat in b.o, the a.o references to the comdat definitions will be non-resolvable (references cannot bind to STB_LOCAL definitions in b.o). On PE-COFF, for a non-external selection symbol, deduplication is naturally suppressed with link.exe and lld-link. However, this is fuzzy on ELF and I tend to believe the spec creator has not thought about this use case (see D102973). GNU ld and gold are still using the "signature is name based" interpretation. So even if D102973 for ld.lld is accepted, for portability, a better approach is to rename the comdat. 
A comdat with one single member is the common case, leaving the comdat can waste (sizeof(Elf64_Shdr)+4*2) bytes, so we optimize by deleting the comdat; otherwise we rename the comdat. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D103043	2021-05-25 14:15:27 -07:00
Matt Morehouse	832c99f727	Revert "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" This reverts commit `2531fd70d1` due to performance regression on the PPC buildbot.	2021-05-25 13:58:42 -07:00
Benjamin Kramer	d2d4f16806	[Matrix] Use LLVM_DEBUG for a debug flag dump() doesn't exist in release builds. ld.lld: error: undefined symbol: llvm::Value::dump() const >>> referenced by LowerMatrixIntrinsics.cpp >>> LowerMatrixIntrinsics.o:((anonymous namespace)::LowerMatrixIntrinsics::Visit())	2021-05-25 21:10:19 +02:00
Nikita Popov	9c91614959	[CVP] Guard against poison in common phi value transform (PR50399) The common phi value transform replaces constants with values that have the same value as the constant on a given edge. However, LVI generally only provides information that is correct up to poison, so this can end up replacing a well-defined value with poison. D69442 addressed an instance of this problem by clearing poison flags on the generating instruction, which was sufficient at the time. rGa917fb89dc28 made LVI's edge value analysis slightly more powerful, and clearing poison flags is no longer sufficient. This patch changes the transform to instead explicitly guard against a poison value instead. This should be satisfied for most cases due to a prior branch on poison. Fixes https://bugs.llvm.org/show_bug.cgi?id=50399. Differential Revision: https://reviews.llvm.org/D102966	2021-05-25 20:47:17 +02:00
Adam Nemet	dfd1bbd00a	[Matrix] Factor and distribute transposes across multiplies Now that we can fold some transposes into multiplies (CM: A * B^t and RM: A^t * B), we want to move them around to create the optimal expressions: * fold away double transposes while still using them to assert the shape * sink transposes hoping they cancel out * lift transposes when both operands are transposed This also modifies the matrix remarks to include the number of exposed transposes (i.e. transposes that we couldn't fold into a multiply). The adjustment to the test remarks-inlining is a bit subtle: I am changing the double transpose to a single transpose so that we don't remove it completely. More importantly this changes some of the total instruction count, most notable stores because we can no longer use a vector store. Differential Revision: https://reviews.llvm.org/D102733	2021-05-25 11:12:20 -07:00
Roman Lebedev	149e018d12	[LoopIdiom] 'arithmetic right-shift until zero': don't turn potentially infinite loops into finite ones Nowadays LLVM does not assume that all loops are finite, so if we want to produce a finite loop from a potentially-infinite one, we must ensure that the original loop is known to be a finite one. For this transform, it only matters for arithmetic right-shifts. For them, either the function or the loop must be known to be `mustprogress`, or the original value being shifted must be known to be non-negative (because iff the sign bit was set, it will never become zero, but will become `-1` in the "end"). It would be really good for alive2 to actually complain about this, but it currently does not: https://github.com/AliveToolkit/alive2/issues/726	2021-05-25 21:02:28 +03:00
Sanjay Patel	ae1bc9ebf3	[InstCombine] avoid infinite loop from vector select transforms The 2nd test is based on the fuzzer example in post-commit comments of D101191 - https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34661 The 1st test shows that we don't deal with this symmetrically. We should be able to reduce both examples (possibly in instsimplify instead of instcombine).	2021-05-25 13:28:38 -04:00
Florian Hahn	8e83ff58c9	[VectorCombine] Remove unneeded InsertPointGuard (NFCI). All users of the builder should set an insert point before using the builder. There should be no need for using InsertPointGuard here.	2021-05-25 17:01:05 +01:00
Florian Hahn	575e2aff55	[VectorCombine] Use constant range info for index scalarization legality. We can only scalarize memory accesses if we know the index is valid. This patch adjusts canScalarizeAccess to fall back to computeConstantRange to check if the index is known to be valid. Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D102476	2021-05-25 13:58:42 +01:00
Sanjay Patel	0bab0f6161	[InstCombine] canonicalize cast before unary shuffle We could go either direction on this transform. VectorCombine already goes this way for bitcasts (and handles more complicated cases using the cost model), so let's try cast-first. Deferring completely to VectorCombine is another possibility. But the backend should be able to invert this easily when the vectors have the same shape, so it doesn't seem like a transform that we need to avoid. The motivating example from https://llvm.org/PR49081 has an int-to-float sandwiched between 2 shuffles, and the backend currently does not reduce that, so on x86, we get something like: pshufd $249, %xmm0, %xmm0 cvtdq2ps %xmm0, %xmm0 shufps $144, %xmm0, %xmm0 ...instead of just a single conversion instruction. Differential Revision: https://reviews.llvm.org/D103038	2021-05-25 08:43:09 -04:00
Chuanqi Xu	400a9d3501	[NFC] [Coroutines] Remove unused variable: UnreachableCache	2021-05-25 20:33:46 +08:00
Roman Lebedev	8f4db14d1c	[LoopIdiom] Support 'left-shift until zero' idiom This adds support for the "count active bits" pattern, i.e.: ``` int countBits(unsigned val) { int cnt = 0; for( ; (val << cnt) != 0; ++cnt) ; return cnt; } ``` but a somewhat more general one: ``` int countBits(unsigned val, int start, int off) { int cnt; for (cnt = start; val << (cnt + off); cnt++) ; return cnt; } ``` alive2 is happy with all the tests there. Note that, again, much like with the right-shift cases, we don't require the `val != 0` guard. This is the last pattern that was supported by `detectShiftUntilZeroIdiom()`, which now becomes obsolete.	2021-05-25 15:26:35 +03:00
Roman Lebedev	f1c5f78d38	[LoopIdiom] Support 'arithmetic right-shift until zero' idiom This adds support for the "count active bits" pattern, i.e.: ``` int countActiveBits(signed val) { int cnt = 0; for( ; (val >> cnt) != 0; ++cnt) ; return cnt; } ``` but a somewhat more general one: ``` int countActiveBits(signed val, int start, int off) { int cnt; for (cnt = start; val >> (cnt + off); cnt++) ; return cnt; } ``` This directly matches the existing 'logical right-shift until zero' idiom. alive2 is happy with all the tests there. Note that, again, much like with the original unsigned case, we don't require the `val != 0` guard. The old `detectShiftUntilZeroIdiom()` already supports this pattern, the idea here is that the `val` must be positive (have at least one leading zero), because otherwise the loop is non-terminating, but since it is not `while(1)`, that would have been UB.	2021-05-25 14:30:49 +03:00
Marco Elver	280333021e	[SanitizeCoverage] Add support for NoSanitizeCoverage function attribute We really ought to support no_sanitize("coverage") in line with other sanitizers. This came up again in discussions on the Linux-kernel mailing lists, because we currently do workarounds using objtool to remove coverage instrumentation. Since that support is only on x86, to continue supporting coverage instrumentation on other architectures, we must support selectively disabling coverage instrumentation via function attributes. Unfortunately, for SanitizeCoverage, it has not been implemented as a sanitizer via fsanitize= and associated options in Sanitizers.def, but rolls its own option fsanitize-coverage. This meant that we never got "automatic" no_sanitize attribute support. Implement no_sanitize attribute support by special-casing the string "coverage" in the NoSanitizeAttr implementation. To keep the feature as unintrusive to existing IR generation as possible, define a new negative function attribute NoSanitizeCoverage to propagate the information through to the instrumentation pass. Fixes: https://bugs.llvm.org/show_bug.cgi?id=49035 Reviewed By: vitalybuka, morehouse Differential Revision: https://reviews.llvm.org/D102772	2021-05-25 12:57:14 +02:00
Alexey Lapshin	10c2e26159	[TRE] Reland: allow TRE for non-capturing calls. The D82085 "allow TRE for non-capturing calls" caused a failure during bootstrap. This patch does the same as D82085 plus fixes the bootstrap error. The problem with D82085 is that it does not create copies for byval operands while replacing a function call with a branch. Consider the following example: ``` int zoo ( S p1 ); int foo ( int count, S p1 ) { if ( count > 10 ) return zoo(p1); // temporary variable created for passing byvalue parameter // p1 could be used when zoo(p1) is called(after TRE is done). // lifetime.start p1.byvalue.temp return foo(count+1, p1); // lifetime.end p1.byvalue.temp } ``` After the recursive call to foo is replaced with a jump to the start of the function, its parameters could be passed to the zoo function, i.e. the temporary variable created for byvalue parameter "p1" could be passed to zoo. Finally zoo receives a broken operand: ``` int foo ( int count, S p1 ) { :tailrecurse p1_tr = phi p1, p1.byvalue.temp if ( count > 10 ) return zoo(p1_tr); // temporary variable created for passing byvalue parameter // p1 could be used when zoo(p1) is called(after TRE is done). lifetime.start p1.byvalue.temp memcpy (p1.byvalue.temp, p1_tr) count = count + 1 lifetime.end p1.byvalue.temp br tailrecurse } ``` To prevent using p1.byvalue.temp after its scope is finished by the lifetime.end marker, this patch copies the value from p1.byvalue.temp into another temporary variable and then copies this variable into the input parameter for the next iteration. This patch passes bootstrap build and bootstrap build with AddressSanitizer. Differential Revision: https://reviews.llvm.org/D85614	2021-05-25 11:35:48 +03:00
Max Kazantsev	2531fd70d1	[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration This patch handles one particular case of one-iteration loops for which SCEV cannot straightforwardly prove BECount = 1. The idea of the optimization is to symbolically execute conditional branches on the 1st iteration, moving in topological order, and only visiting blocks that may be reached on the first iteration. If we find out that we never reach header via the latch, then the backedge can be broken. Differential Revision: https://reviews.llvm.org/D102615 Reviewed By: reames	2021-05-25 12:43:31 +07:00
maekawatoshiki	e77d24f70a	Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass" This reverts commit `d65c32fb41`.	2021-05-25 11:39:49 +09:00
Anton Afanasyev	b2cd895011	[SLP] Fix "gathering" of insertelement instructions In a rare exceptional case, a vector tree node (insertelements only, for now) is marked as `NeedToGather`; this patch handles that case. Follow-up of D98714 to fix the bug reported here: https://reviews.llvm.org/D98714#2764135. Differential Revision: https://reviews.llvm.org/D102675	2021-05-25 01:35:43 +03:00
Jon Roelofs	095e91c973	[Remarks] Add analysis remarks for memset/memcpy/memmove lengths Re-landing now that the crasher this patch previously uncovered has been fixed in: https://reviews.llvm.org/D102935 Differential revision: https://reviews.llvm.org/D102452	2021-05-24 10:10:44 -07:00
Jon Roelofs	694068d0db	[Remarks] Look through inttoptr/ptrtoint for -ftrivial-auto-var-init remarks. The crasher is a related problem that @aemerson found broke spec2k6/403.gcc when I landed https://reviews.llvm.org/D102452. It has been reduced & modified to reproduce without that patch. Differential revision: https://reviews.llvm.org/D102935	2021-05-24 09:23:22 -07:00
Adrian Prantl	4cba0a4f11	CoroSplit: Replace ad-hoc implementation of reachability with API from CFG.h The current ad-hoc implementation used to determine whether a basic block is unreachable doesn't work correctly in the general case (for example it won't detect successors of unreachable blocks as unreachable). This patch replaces it with the correct API that uses a DominatorTree to answer the question correctly and quickly. rdar://77181156 Differential Revision: https://reviews.llvm.org/D102963	2021-05-24 09:18:33 -07:00
Florian Hahn	65d3dd7c88	[VPlan] Add first VPlan version of sinkScalarOperands. This patch adds a first VPlan-based implementation of sinking of scalar operands. The current version traverses a VPlan once and processes all operands of a predicated REPLICATE recipe. If one of those operands can be sunk, it is moved to the block containing the predicated REPLICATE recipe, and processing continues with the operands of the sunk recipe. The initial version does not re-process candidates after other recipes have been sunk. It also cannot partially sink induction increments at the moment. The VPlan only contains WIDEN-INDUCTION recipes and if the induction is used for example in a GEP, only the first lane is used and in the lowered IR the adds for the other lanes can be sunk into the predicated blocks. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100258	2021-05-24 15:29:58 +01:00
Florian Hahn	e9d97d7d9d	[VPlan] Add mayReadOrWriteMemory & friends. This patch adds initial implementation of mayReadOrWriteMemory, mayReadFromMemory and mayWriteToMemory to VPRecipeBase. Used by D100258.	2021-05-24 13:11:32 +01:00
Florian Hahn	4e8c28b6fb	Recommit "[VectorCombine] Scalarize vector load/extract." This reverts commit `94d54155e2`. This fixes a sanitizer failure by moving scalarizeLoadExtract(I) before foldSingleElementStore(I), which may remove instructions.	2021-05-24 11:35:07 +01:00
Roman Lebedev	32bee42719	[NFCI][LoopIdiom] 'left-shift until bittest': assert that BaseX is loop-invariant Given that BaseX is an incoming value when coming from the preheader, it should be loop-invariant, but let's just document this assumption.	2021-05-24 12:15:06 +03:00
Roman Lebedev	aa3dac95ed	[LoopIdiom] 'logical right shift until zero': the value must be loop-invariant As per the reproducer provided by Mikael Holmén in post-commit review.	2021-05-24 12:15:06 +03:00
Florian Hahn	94d54155e2	Revert "[VectorCombine] Scalarize vector load/extract." This reverts commit `86497785d5`. One of the tests causes an ASAN failure. https://lab.llvm.org/buildbot/#/builders/5/builds/7927/steps/12/logs/stdio	2021-05-24 10:11:00 +01:00
Florian Hahn	86497785d5	[VectorCombine] Scalarize vector load/extract. This patch adds a new combine that tries to scalarize chains of `extractelement (load %ptr), %idx` to `load (gep %ptr, %idx)`. This is profitable when extracting only a few elements out of a large vector. At the moment, `store (extractelement (load %ptr), %idx), %ptr` operations on large vectors result in huge code in the backend. This can easily be triggered by using the matrix extension, e.g. https://clang.godbolt.org/z/qsccPdPf4 This should complement D98240. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D100273	2021-05-24 09:29:08 +01:00
Johannes Doerfert	6caea8a7fa	[Attributor] Introduce a helper to deal with constant type mismatches If we simplify values we sometimes end up with type mismatches. If the value is a constant we can often cast it, though, to still allow propagation. The logic is now put into a helper and it replaces some ad hoc things we did before. This also introduces the AA namespace for abstract attribute related functions and types.	2021-05-23 23:00:40 -05:00
Johannes Doerfert	55e9c28212	[Attributor] Teach AAIsDead about undef values Not only if the branch or switch condition is dead but also if it is assumed `undef` we can delay AAIsDead exploration.	2021-05-23 23:00:40 -05:00
Johannes Doerfert	4878d73419	[Attributor] Deal with address spaces gracefully When we do value propagation we need to cast address spaces properly.	2021-05-23 23:00:39 -05:00
Johannes Doerfert	1ba2929bb8	[Attributor] Be more careful to not disturb the CG outside the SCC We have seen various problems when the call graph was not updated or the update did not succeed because it involved functions outside the SCC. This patch adds assertions and checks to avoid accidentally changing something outside the SCC that would impact the call graph. It also prevents us from reanalyzing functions outside the current SCC, which could cause problems on its own. Note that the transformations we do might cause the CG to be "more precise" but the original one would always be a superset of the most precise one. Since the call graph is by nature an approximation, it is good enough to have a superset of all call edges.	2021-05-23 23:00:39 -05:00
Johannes Doerfert	e93ac1e2de	[Attributor][FIX] Account for undef in the constant value lattice The constant value lattice looks like this ``` <None> \| <undef> / \| \ ... <0> ... \ \| / <unknown> ``` We did not account for the undef and assumed a value meant we could not change anymore. Now we actually check if we have the same value as before, which will signal CHANGED to the users when we go from undef to a specific constant. This fixes, among other things, the bug exposed by @ipccp4 in `value-simplify.ll`.	2021-05-23 20:47:06 -05:00
Johannes Doerfert	5cdc29f795	[Attributor][FIX] Ensure we replace undef if we see the first "real" value The state of AAPotentialValues tracks if undef is contained. It should fold undef into the first non-undef value. However we missed a case before. There was also a shadowing definition of two variables that caused trouble. The test exposes both problems.	2021-05-23 20:47:06 -05:00
Johannes Doerfert	2bc51d39db	[Attributor][NFC] Add helpful debug outputs	2021-05-23 20:47:05 -05:00
Johannes Doerfert	cb511531b9	[Attributor][NFC] Clang format the Attributor source files	2021-05-23 20:47:05 -05:00
maekawatoshiki	d65c32fb41	[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass. The next patch will utilize LoopNest to effectively handle loop nests. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D99149	2021-05-23 22:32:01 +09:00
Yaxun (Sam) Liu	bf6124580d	[HIP] support ThinLTO Add options -[no-]offload-lto and -foffload-lto=[thin,full] for controlling LTO for offload compilation. Allow LTO for AMDGPU target. AMDGPU target does not support codegen of object files containing call of external functions, therefore the LLVM module passed to AMDGPU backend needs to contain definitions of all the callees. An LLVM option is added to allow function importer to import functions with noinline attribute. HIP toolchain passes proper LLVM options to lld to make sure function importer imports definitions of all the callees. Reviewed by: Teresa Johnson, Artem Belevich Differential Revision: https://reviews.llvm.org/D99683	2021-05-22 10:48:34 -04:00
Nikita Popov	9a9421a461	Reapply [InstCombine] Fold multiuse shr eq zero This was reverted due to performance regressions in ARM benchmarks, which have since been addressed by D101196 (SCEV analysis improvement) and D101778 (CGP reverse transform). ----- The single-use case is handled implicitly by converting the icmp into a mask check first. When comparing with zero in particular, we don't need the one-use restriction, as we only produce a single icmp. https://alive2.llvm.org/ce/z/MSixcm https://alive2.llvm.org/ce/z/GwpG0M	2021-05-22 14:46:50 +02:00
Florian Hahn	a6de8d95db	[Matrix] Bail out early if there are no matrix intrinsics. If there are no matrix intrinsics in a function, we can directly bail out, as there's nothing left to do. Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D102931	2021-05-22 11:37:25 +01:00
Arthur Eubanks	f7788e1bff	Revert "[NewPM] Only invalidate modified functions' analyses in CGSCC passes" This reverts commit `d14d84af2f`. Causes unacceptable memory regressions.	2021-05-21 16:38:03 -07:00
Florian Hahn	a0ce6439ca	[Matrix] Remove unused matrix-propagate-shape option. The option was used during the initial bringup, but it does not add any value at this point. Remove it. Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D102930	2021-05-21 19:01:54 +01:00
maekawatoshiki	fd53cb4148	Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass" This reverts commit `cea7a3fe3d`. To investigate sanitizer-x86_64-linux-fast failure.	2021-05-22 01:40:43 +09:00
maekawatoshiki	cea7a3fe3d	[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass. The next patch will utilize LoopNest to effectively handle loop nests. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D99149	2021-05-21 23:57:39 +09:00
Alexey Bataev	8dab25954b	[SLP]Improve handling of compensate external uses cost. External insertelement users can also be represented as the result of a shuffle of the vectorized element and non-consecutive insertelements. Added support for handling non-consecutive insertelements. Differential Revision: https://reviews.llvm.org/D101555	2021-05-21 07:45:31 -07:00
Djordje Todorovic	cd49b3ae1a	[DebugInfo] Salvage dbg.value() during ADCE This was found by using [0]. [0] https://llvm.org/docs/HowToUpdateDebugInfo.html#test-original-debug-info-preservation-in-optimizations Differential Revision: https://reviews.llvm.org/D100844	2021-05-21 05:25:59 -07:00
Daniil Fukalov	e8e88c3353	[TTI] NFC: Change getRegUsageForType to return InstructionCost. This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D102541	2021-05-21 15:17:23 +03:00
Stephen Tozer	36ec97f76a	3rd Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands" This reapplies `c0f3dfb9`, which was reverted following the discovery of crashes on linux kernel and chromium builds - these issues have since been fixed, allowing this patch to re-land. This reverts commit `4397b7095d`.	2021-05-21 11:06:20 +01:00
Djordje Todorovic	b9076d119a	Recommit: "[Debugify][Original DI] Test dbg var loc preservation" [Debugify][Original DI] Test dbg var loc preservation This is an improvement of [0]. This adds checking of original llvm.dbg.values()/declares() instructions in optimizations. We have picked a real issue that has been found with this (actually, picked one variable location missing from [1] and resolved the issue), and the result is the fix for that -- D100844. Before applying D100844, using the options from [0] (but with this patch applied) on the compilation of GDB 7.11, the final HTML report for the debug-info issues can be found at [1] (please scroll down, and look for "Summary of Variable Location Bugs"). After applying D100844, the numbers have improved a bit -- please take a look at [2]. [0] https://llvm.org/docs/HowToUpdateDebugInfo.html#test-original-debug-info-preservation-in-optimizations [1] https://djolertrk.github.io/di-check-before-adce-fix/ [2] https://djolertrk.github.io/di-check-after-adce-fix/ Differential Revision: https://reviews.llvm.org/D100845 The unit test was failing because the pass from the test that modifies the IR didn't return 'true' in its runOnFunction(), so the expensive-check configuration triggered an assertion.	2021-05-21 02:04:29 -07:00
Xiang1 Zhang	5684851cb0	[HWASAN] No code changed, Only clang-format for HWAddressSanitizer.cpp	2021-05-21 14:00:34 +08:00
Jon Roelofs	0af3105b64	Revert "[Remarks] Add analysis remarks for memset/memcpy/memmove lengths" This reverts commit `4bf69fb52b`. This broke spec2k6/403.gcc under -global-isel. Details to follow once I've reduced the problem.	2021-05-20 12:19:16 -07:00
Kevin P. Neal	f21f1eea05	[FPEnv] EarlyCSE support for constrained intrinsics, default FP environment edition EarlyCSE cannot distinguish between floating point instructions and constrained floating point intrinsics that are marked as running in the default FP environment. Said intrinsics are supposed to behave exactly the same as the regular FP instructions. Teach EarlyCSE to handle them in that case. Differential Revision: https://reviews.llvm.org/D99962	2021-05-20 14:40:51 -04:00
Reid Kleckner	8f20ac9595	[PGO] Don't reference functions unless value profiling is enabled This reduces the size of chrome.dll.pdb built with optimizations, coverage, and line table info from 4,690,210,816 to 2,181,128,192, which makes it possible to fit under the 4GB limit. This change can greatly reduce binary size in coverage builds, which do not need value profiling. IR PGO builds are unaffected. There is a minor behavior change for frontend PGO. PGO and coverage both use InstrProfiling to create profile data with counters. PGO records the address of each function in the __profd_ global. It is used later to map runtime function pointer values back to source-level function names. Coverage does not appear to use this information. Recording the address of every function with code coverage drastically increases code size. Consider this program: void foo(); void bar(); inline void inlineMe(int x) { if (x > 0) foo(); else bar(); } int getVal(); int main() { inlineMe(getVal()); } With code coverage, the InstrProfiling pass runs before inlining, and it captures the address of inlineMe in the __profd_ global. This greatly increases code size, because now the compiler can no longer delete trivial code. One downside to this approach is that users of frontend PGO must apply the -mllvm -enable-value-profiling flag globally in TUs that enable PGO. Otherwise, some inline virtual method addresses may not be recorded and will not be able to be promoted. My assumption is that this mllvm flag is not popular, and most frontend PGO users don't enable it. Differential Revision: https://reviews.llvm.org/D102818	2021-05-20 11:09:24 -07:00
Sanjay Patel	f34311c402	[GlobalOpt] recompute alignments for loads and stores of updated globals GlobalOpt can slice structs/arrays and change GEPs in the process, but it was not updating alignments for load/store users. This eventually causes the crashing seen in: https://llvm.org/PR49661 https://llvm.org/PR50253 On x86, this required SLP+codegen to create an aligned vector store on an invalid address. The bugs would be easier to demonstrate on a target with stricter alignment requirements. I'm not sure if this is a complete solution. The alignment updating code is adapted from InstCombine, so I assume that part is tested and good. Differential Revision: https://reviews.llvm.org/D102552	2021-05-20 12:12:21 -04:00
Alexey Bataev	182162b616	[SLP]Try to vectorize tiny trees with shuffled gathers of extractelements. If we gather extract elements and they actually are just shuffles, it might be profitable to vectorize them even if the tree is tiny. Differential Revision: https://reviews.llvm.org/D101460	2021-05-20 08:36:16 -07:00
Djordje Todorovic	0ae3c1d4d7	Revert "[Debugify][Original DI] Test dbg var loc preservation" This reverts commit `76f375f3d9`. This will be pushed again, after investigating a test failure: https://lab.llvm.org/buildbot/#/builders/16/builds/11254	2021-05-20 07:11:35 -07:00
Djordje Todorovic	76f375f3d9	[Debugify][Original DI] Test dbg var loc preservation This is an improvement of [0]. This adds checking of original llvm.dbg.values()/declares() instructions in optimizations. We have picked a real issue that has been found with this (actually, picked one variable location missing from [1] and resolved the issue), and the result is the fix for that -- D100844. Before applying D100844, using the options from [0] (but with this patch applied) on the compilation of GDB 7.11, the final HTML report for the debug-info issues can be found at [1] (please scroll down, and look for "Summary of Variable Location Bugs"). After applying D100844, the numbers have improved a bit -- please take a look at [2]. [0] https://llvm.org/docs/HowToUpdateDebugInfo.html\ [1] https://djolertrk.github.io/di-check-before-adce-fix/ [2] https://djolertrk.github.io/di-check-after-adce-fix/ Differential Revision: https://reviews.llvm.org/D100845	2021-05-20 06:42:02 -07:00
Xiang1 Zhang	02f2d739e0	Revert "[HWASAN] Update the tag info for X86_64." This reverts commit `81c18ce03c`.	2021-05-20 13:12:59 +08:00
Xiang1 Zhang	81c18ce03c	[HWASAN] Update the tag info for X86_64. In LAM model X86_64 will use bits 57-62 (of 0-63) as HWASAN tag. So here we make sure the tag shift position and tag mask is correct for x86-64. Differential Revision: https://reviews.llvm.org/D102472	2021-05-20 11:22:12 +08:00
Zhiwei Chen	dbc641deb9	[sanitizer] Reduce redzone size for small size global objects Currently 1 byte global object has a ridiculous 63 bytes redzone. This patch reduces the redzone size to be less than 32 if the size of global object is less than or equal to half of 32 (the minimal size of redzone). A 12 bytes object has a 20 bytes redzone, a 20 bytes object has a 44 bytes redzone. Reviewed By: MaskRay, #sanitizers, vitalybuka Differential Revision: https://reviews.llvm.org/D102469	2021-05-19 19:18:50 -07:00
Jon Roelofs	3d2ffc88e6	Fix warnings in windows bots. NFC	2021-05-19 17:42:34 -07:00
Jon Roelofs	4bf69fb52b	[Remarks] Add analysis remarks for memset/memcpy/memmove lengths Differential revision: https://reviews.llvm.org/D102452	2021-05-19 15:09:18 -07:00
wlei	6539a80bc9	[CSSPGO] Avoid deleting probe instruction in FoldValueComparisonIntoPredecessors This change fixes a place that was missing a call to `moveAndDanglePseudoProbes`. FoldValueComparisonIntoPredecessors folds the BB into its predecessors and then marks the BB unreachable. However, the original logic from the BB is still alive, so deleting the probe would mislead the SampleLoader into marking it as a zero-count sample. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D102721	2021-05-19 13:39:05 -07:00
Joseph Huber	2db182ff8d	[Diagnostics] Allow emitting analysis and missed remarks on functions Summary: Currently, only `OptimizationRemark` can be emitted using a Function. Add constructors to allow this for `OptimizationRemarkAnalysis` and `OptimizationRemarkMissed` as well. Reviewed By: jdoerfert thegameg Differential Revision: https://reviews.llvm.org/D102784	2021-05-19 15:10:20 -04:00
Roman Lebedev	40fb4eeff9	[NFCI][Local] TryToSimplifyUncondBranchFromEmptyBlock(): use DeleteDeadBlocks()	2021-05-19 20:38:30 +03:00
Roman Lebedev	c60ca9856c	[NFCI][Local] MergeBlockIntoPredecessor(): use DeleteDeadBlocks()	2021-05-19 20:38:30 +03:00
Roman Lebedev	b0bb2149b3	[NFCI][Local] removeUnreachableBlocks(): use DeleteDeadBlocks()	2021-05-19 20:38:30 +03:00
Philip Reames	449d14ebd2	Do actual DCE in LoopUnroll (try 4) Turns out simplifyLoopIVs sometimes returns a non-dead instruction in its DeadInsts out param. I had done a bit of NFC cleanup which was only NFC if simplifyLoopIVs obeyed its documentation. I'm simply dropping that part of the change. Commit message from try 3: Recommitting after fixing a bug found post commit. Amusingly, try 1 had been correct, and by reverting to incorporate last minute review feedback, I introduced the bug. Oops. :) Original commit message: The problem was that recursively deleting an instruction can delete instructions beyond the current iterator (via a dead phi), thus invalidating iteration. Test case added in LoopUnroll/dce.ll to cover this case. LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions. Differential Revision: https://reviews.llvm.org/D102511	2021-05-19 10:25:31 -07:00
Hongtao Yu	4ca6e37b98	[CSSPGO] Overwrite branch weight annotated in previous pass. Sample profile loader can be run in both LTO prelink and postlink. Currently the counts annotation in postlink doesn't fully overwrite what's done in prelink. I'm adding a switch (`-overwrite-existing-weights=1`) to enable a full overwrite, which includes: 1. Clear old metadata for calls when their parent block has a zero count. This could be caused by prelink code duplication. 2. Clear indirect call metadata if somehow all the remaining targets have a sum of zero count. 3. Overwrite branch weight for basic blocks. With a CS profile, I was seeing #1 and #2 help reduce code size by preventing post-sample ICP and CGSCC inliner working on obsolete metadata, which comes from a partial global inlining in prelink. It's not expected to work well for the non-CS case with a less-accurate post-inline count quality. It's worth calling out that some prelink optimizations can damage counts quality in an irreversible way. One example is the loop rotate optimization. Due to the lack of an exact loop entry count (profiling can only give loop iteration count and loop exit count), moving one iteration out of the loop body leaves the remaining iteration count unknown. We had to turn off prelink loop rotate to achieve a better postlink counts quality. An even better postlink counts quality can be achieved by turning off prelink CGSCC inlining, which is not context-sensitive. Reviewed By: wenlei, wmi Differential Revision: https://reviews.llvm.org/D102537	2021-05-19 09:12:24 -07:00
Amy Huang	517857421d	Revert "Do actual DCE in LoopUnroll (try 3)" This reverts commit `b6320eeb86` as it causes clang to assert; see https://reviews.llvm.org/rGb6320eeb8622f05e4a5d4c7f5420523357490fca.	2021-05-19 08:53:38 -07:00
David Sherwood	7e95a563c8	Remove scalable vector assert from InnerLoopVectorizer::setDebugLocFromInst In InnerLoopVectorizer::setDebugLocFromInst we were previously asserting that the VF is not scalable. This is because we want to use the number of elements to create a duplication factor for the debug profiling data. However, for scalable vectors we only know the minimum number of elements. I've simply removed the assert for now and added a FIXME saying that we assume vscale is always 1. When vscale is not 1 it just means that the profiling data isn't as accurate, but shouldn't cause any functional problems.	2021-05-19 13:33:10 +01:00
Roman Lebedev	8c2b535d6c	[NFCI][SimplifyCFG] removeEmptyCleanup(): use DeleteDeadBlock() This required some changes to, instead of eagerly making PHI's in the UnwindDest valid as-if the BB is already not a predecessor, to be valid while BB is still a predecessor.	2021-05-19 14:08:25 +03:00
Roman Lebedev	bb5d613aba	[NFCI][SimplifyCFG] removeEmptyCleanup(): streamline PHI node updating	2021-05-19 14:08:25 +03:00
Roman Lebedev	a0be081646	[NFC][SimplifyCFG] removeEmptyCleanup(): use BasicBlock::phis()	2021-05-19 14:08:24 +03:00
Sander de Smalen	4f86aa650c	[LV] Add -scalable-vectorization=<option> flag. This patch adds a new option to the LoopVectorizer to control how scalable vectors can be used. Initially, this suggests three levels to control scalable vectorization, although other more aggressive options can be added in the future. The possible options are: - Disabled: Disables vectorization with scalable vectors. - Enabled: Vectorize loops using scalable vectors or fixed-width vectors, but favors fixed-width vectors when the cost is a tie. - Preferred: Like 'Enabled', but favoring scalable vectors when the cost-model is inconclusive. Reviewed By: paulwalker-arm, vkmr Differential Revision: https://reviews.llvm.org/D101945	2021-05-19 10:40:56 +01:00
Roman Lebedev	57d20cbf46	[NFCI][SimplifyCFG] simplifyUnreachable(): use DeleteDeadBlock()	2021-05-19 12:04:22 +03:00
Roman Lebedev	69a43e5fc5	[NFCI][SimplifyCFG] simplifyReturn(): use DeleteDeadBlock()	2021-05-19 12:04:22 +03:00
Roman Lebedev	00f90e3fca	[NFCI][SimplifyCFG] simplifySingleResume(): use DeleteDeadBlock()	2021-05-19 12:04:22 +03:00
Roman Lebedev	a4eb24c688	[NFCI][SimplifyCFG] simplifyCommonResume(): use DeleteDeadBlock()	2021-05-19 12:04:22 +03:00
Roman Lebedev	729e18cbf4	[NFCI] SimplifyCFGPass: mergeEmptyReturnBlocks(): use DeleteDeadBlocks() In this case, it does the same thing as the original pattern does. SimplifyCFG has a few lurking miscompilations about deleting blocks that have their address taken, and consistently using DeleteDeadBlocks() instead of a hand-rolled pattern will make it easier to weed those cases out.	2021-05-19 11:32:24 +03:00
Joseph Huber	68abc3d264	[Attributor] Change AAExecutionDomain to only accept intrinsics Summary: The OpenMP runtime functions don't always provide unique thread IDs to determine if a basic block is truly single-threaded. Change the implementation to only check NVPTX intrinsics for now. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102700	2021-05-18 21:19:26 -04:00
Rong Xu	886629a8c9	[SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO This patch implements the first part of Flow Sensitive SampleFDO (FSAFDO). It has the following changes: (1) disable current discriminator encoding scheme, (2) new hierarchical discriminator for FSAFDO. For this patch, option "-enable-fs-discriminator=true" turns on the new functionality. Option "-enable-fs-discriminator=false" (the default) keeps the current SampleFDO behavior. When the fs-discriminator is enabled, we insert a flag variable, namely, llvm_fs_discriminator, to the object. This symbol will be checked by the create_llvm_prof tool, and used to generate a profile with FS-AFDO discriminators enabled. If this happens, for an extbinary format profile, the create_llvm_prof tool will add a flag to the profile summary section. Differential Revision: https://reviews.llvm.org/D102246	2021-05-18 16:23:43 -07:00
Arthur Eubanks	b86302e500	[MSan] Set zeroext on call arguments to msan functions with zeroext parameter attribute ABI attributes need to match between the caller and callee. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D102667	2021-05-18 14:07:39 -07:00
Arthur Eubanks	6b9524a05b	[NewPM] Don't mark AA analyses as preserved Currently all AA analyses marked as preserved are stateless, not taking into account their dependent analyses. So there's no need to mark them as preserved, they won't be invalidated unless their analyses are. SCEVAAResults was the one exception to this, it was treated like a typical analysis result. Make it like the others and don't invalidate unless SCEV is invalidated. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D102032	2021-05-18 13:49:03 -07:00
Nikita Popov	e81334a754	[LICM] Remove MaybePromotable set (PR50367) The MaybePromotable set keeps track of loads/stores for which promotion was not attempted yet. Normally, any load/stores that are promoted in the current iteration will be removed from this set, because they naturally MustAlias with the promoted value. However, if the source program has UB with metadata claiming that a store is NoAlias, while it is actually MustAlias, and multiple different pointers are promoted in the same iteration, it can happen that a store is removed that is still in the MaybePromotable set, causing a use-after-free. While this could be fixed by explicitly invalidating values in MaybePromotable in the LoopPromoter, I'm going with the more radical option of dropping the set entirely here and check all load/stores on each promotion iteration. As promotion, and especially repeated promotion, are quite rare, this doesn't seem to have any impact on compile-time. Fixes https://bugs.llvm.org/show_bug.cgi?id=50367.	2021-05-18 20:26:01 +02:00
Sanjay Patel	6d949a9c8f	[InstCombine] restrict funnel shift match to avoid miscompile As noted in the post-commit discussion for: https://reviews.llvm.org/rGabd7529625a73f405e40a63dcc446c41d51a219e ...that change exposed a logic hole that allows a miscompile if the shift amount could exceed the narrow width: https://alive2.llvm.org/ce/z/-i_CiM https://alive2.llvm.org/ce/z/NaYz28 The restriction isn't necessary for a rotate (same operand for both shifts), so we should adjust the matching for the shift value as a follow-up enhancement: https://alive2.llvm.org/ce/z/ahuuQb	2021-05-18 13:32:07 -04:00
Florian Hahn	cc1a6361d3	[VPlan] Add VPUserID to distinguish between recipes and others. This allows cast/dyn_cast'ing from VPUser to recipes. This is needed because there are VPUsers that are not recipes. Reviewed By: gilr, a.elovikov Differential Revision: https://reviews.llvm.org/D100257	2021-05-18 09:17:28 +01:00
Sander de Smalen	81fdc73e5d	[LV] Return both fixed and scalable Max VF from computeMaxVF. This patch introduces a new class, MaxVFCandidates, that holds the maximum vectorization factors that have been computed for both scalable and fixed-width vectors. This patch is intended to be NFC for fixed-width vectors, although considering a scalable max VF (which is disabled by default) pessimises tail-loop elimination, since it can no longer determine if any chosen VF (less than fixed/scalable MaxVFs) is guaranteed to handle all vector iterations if the trip-count is known. This issue will be addressed in a future patch. Reviewed By: fhahn, david-arm Differential Revision: https://reviews.llvm.org/D98721	2021-05-18 08:03:48 +01:00
Adam Nemet	ab1f6ffa56	[GVN] Improve analysis for missed optimization remark This change tries to handle multiple dominating users of the pointer operand by choosing the most immediately dominating one, if possible. While making this change I also found that the previous implementation had a missing break statement, making all loads with an odd number of dominating users emit an OtherAccess value, so that has also been fixed. Patch by Henrik G Olsson! Differential Revision: https://reviews.llvm.org/D79097	2021-05-17 21:51:15 -07:00
Philip Reames	ed9d70781b	Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 3)" This reverts commit `6d3e3ae8a9`. Still seeing PPC build bot failures, and one arm self host bot failing. I'm officially stumped, and need help from a bot owner to reduce.	2021-05-17 20:53:28 -07:00
Serguei Katkov	7bed58d28f	[Inliner] Copy attributes when deoptimize intrinsic is inlined During inlining of a call site with a deoptimize intrinsic callee, we miss attributes set on this call site. As a result, attributes like deopt-lowering disappear, resulting in inefficient behavior of the register allocator in codegen. Just copy attributes for the deoptimize call like we do for other calls. Reviewers: reames, apilipenko Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D102602	2021-05-18 10:08:37 +07:00
Adam Nemet	fcffd087c6	[Matrix] Fold the transpose into the matmul operand used to fetch scalars For column-major this is: A * B^t whereas for row-major: A^t * B Differential Revision: https://reviews.llvm.org/D101762	2021-05-17 17:40:46 -07:00
Philip Reames	6d3e3ae8a9	[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 3) Resubmit after fixing test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll Previous commit message... This is a resubmit of 3e5ce4 (which was reverted by `7fe41ac`). The original commit caused a PPC build bot failure we never really got to the bottom of. I can't reproduce the issue, and the bot owner was non-responsive. In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in `80e8025`. My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess. Original commit message follows... If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block. The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and which exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed. This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way. Differential Revision: https://reviews.llvm.org/D94892	2021-05-17 16:59:25 -07:00
Philip Reames	d16da7343d	Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute" This reverts commit `c23ce54b36`. I apparently missed some newly added non-x86 tests.	2021-05-17 16:49:32 -07:00
Philip Reames	c23ce54b36	[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute This is a resubmit of 3e5ce4 (which was reverted by `7fe41ac`). The original commit caused a PPC build bot failure we never really got to the bottom of. I can't reproduce the issue, and the bot owner was non-responsive. In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in `80e8025`. My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess. Original commit message follows... If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block. The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and which exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed. This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way. Differential Revision: https://reviews.llvm.org/D94892	2021-05-17 16:33:56 -07:00
Philip Reames	b6320eeb86	Do actual DCE in LoopUnroll (try 3) Recommitting after fixing a bug found post commit. Amusingly, try 1 had been correct, and by reverting to incorporate last minute review feedback, I introduced the bug. Oops. :) The problem was that recursively deleting an instruction can delete instructions beyond the current iterator (via a dead phi), thus invalidating iteration. Test case added in LoopUnroll/dce.ll to cover this case. LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions. Differential Revision: https://reviews.llvm.org/D102511	2021-05-17 14:47:02 -07:00
Sanjay Patel	3cdd05e519	[InstCombine] fold fnegs around select This is one of the folds requested in: https://llvm.org/PR39480 https://alive2.llvm.org/ce/z/NczU3V Note - this uses the normal FMF propagation logic (flags transfer from the final value to new/intermediate ops). It's not clear if this matches what Alive2 implements, so we may want to adjust one or the other.	2021-05-17 14:53:49 -04:00
Roman Lebedev	0633d5ce7b	[LoopIdiom] 'logical right-shift until zero' ('count active bits') "on steroids" idiom recognition. I think i've added exhaustive test coverage, and i have verified that alive2 is happy with all the tests, so in principle i'm fine with landing this without review, but just in case.. This adds support for the "count active bits" pattern, i.e.: ``` int countActiveBits(unsigned val) { int cnt = 0; for( ; (val >> cnt) != 0; ++cnt) ; return cnt; } ``` but a somewhat more general one, since that is what i need: ``` int countActiveBits(unsigned val, int start, int off) { int cnt; for (cnt = start; val >> (cnt + off); cnt++) ; return cnt; } ``` I've followed in footstep of 'left-shift until bittest' idiom (D91038), in the sense that iff the `ctlz` intrinsic is cheap, we'll transform, regardless of all other factors. This can have a shocking effect on certain benchmarks: ``` raw.pixls.us-unique/Olympus/XZ-1$ /repositories/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf RUNNING: /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmp49_28zcm 2021-05-09T01:06:05+03:00 Running /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench Run on (32 X 3600.24 MHz CPU s) CPU Caches: L1 Data 32 KiB (x16) L1 Instruction 32 KiB (x16) L2 Unified 512 KiB (x16) L3 Unified 32768 KiB (x2) Load Average: 5.26, 6.29, 3.49 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime 
Raws/CPUTime Raws/WallTime WallTime,s ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ p1319978.orf/threads:32/process_time/real_time_mean 145 ms 145 ms 128 0.145319 0.999981 10.1568M 69.8949M 69.8936M 6.88159 6.88146 0.145322 p1319978.orf/threads:32/process_time/real_time_median 145 ms 145 ms 128 0.145317 0.999986 10.1568M 69.8941M 69.8931M 6.88151 6.88141 0.145319 p1319978.orf/threads:32/process_time/real_time_stddev 0.766 ms 0.766 ms 128 766.586u 15.1302u 0 354.167k 354.098k 0.0348699 0.0348631 766.469u RUNNING: /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpwb9sw2x0 2021-05-09T01:06:24+03:00 Running /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench Run on (32 X 3599.95 MHz CPU s) CPU Caches: L1 Data 32 KiB (x16) L1 Instruction 32 KiB (x16) L2 Unified 512 KiB (x16) L3 Unified 32768 KiB (x2) Load Average: 4.05, 5.95, 3.43 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ p1319978.orf/threads:32/process_time/real_time_mean 99.8 ms 99.8 ms 128 0.0997758 0.999972 10.1568M 101.797M 101.794M 10.0225 10.0222 0.0997786 p1319978.orf/threads:32/process_time/real_time_median 99.7 ms 99.7 ms 128 0.0997165 0.999985 10.1568M 101.857M 
101.854M 10.0284 10.0281 0.0997195 p1319978.orf/threads:32/process_time/real_time_stddev 0.224 ms 0.224 ms 128 224.166u 34.345u 0 226.81k 227.231k 0.0223309 0.0223723 224.586u Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench Benchmark Time CPU Time Old Time New CPU Old CPU New ---------------------------------------------------------------------------------------------------------------------------------------------------- p1319978.orf/threads:32/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 128 vs 128 p1319978.orf/threads:32/process_time/real_time_mean -0.3134 -0.3134 145 100 145 100 p1319978.orf/threads:32/process_time/real_time_median -0.3138 -0.3138 145 100 145 100 p1319978.orf/threads:32/process_time/real_time_stddev -0.7073 -0.7078 1 0 1 0 ``` Reviewed By: craig.topper, zhuhan0 Differential Revision: https://reviews.llvm.org/D102116	2021-05-17 20:33:33 +03:00
Hongtao Yu	f28ee1a2b3	[CSSPGO] Update pseudo probe distribution factor based on inline context. With prelink inlining, pseudo probes with same ID can come from different inline contexts. Such probes should not share samples and their factors should be fixed up separately. I'm seeing 0.3% speedup for SPEC2017 overall. Benchmark 631.deepsjeng_s benefits the most, about 4%. Reviewed By: wenlei, wmi Differential Revision: https://reviews.llvm.org/D102429	2021-05-16 23:11:36 -07:00
Philip Reames	6ae9893ed2	Revert "Do actual DCE in LoopUnroll (try 2)" This reverts commit `653fa0b46a`. Reported to trigger pr50354. Reverting until investigated.	2021-05-16 09:38:36 -07:00
Kuter Dinel	64ef29bc66	[Attributor] Call site specific AAValueSimplification and AAIsDead. This patch makes it possible to do call site specific deductions for AAValueSimplification and AAIsDead. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D84722	2021-05-15 21:39:07 +00:00
Simon Pilgrim	e30540a603	SampleProfileLoader::inlineHotFunctionsWithPriority - Fix uninitialized variable warning. NFCI. findIndirectCallFunctionSamples will leave Sum uninitialized if it returns an empty vector, we don't really use Sum in this case (but we do make a copy that isn't used either) - so ensure we initialize the value to zero to at least silence the static analysis warning.	2021-05-15 15:02:52 +01:00
Nikita Popov	f9e9b0cdb4	[CFG] Move reachable from entry checks into basic block variant These checks are not specific to the instruction based variant of isPotentiallyReachable(), they are equally valid for the basic block based variant. Move them there, to make sure that switching between the instruction and basic block variants cannot introduce regressions.	2021-05-15 15:42:02 +02:00
Simon Pilgrim	f0660a977e	[Local] collectBitParts - bail out if we find more than one root input value. All the uses that we have for collectBitParts revolve around us matching down to an operation with a single root value - I don't think we're intending to change that (and a lot of collectBitParts assumes it). The binops cases (OR/FSHL/FSHR) already check if the providers are the same, but that would still mean we waste time collecting through unaryops before getting to them.	2021-05-15 13:58:42 +01:00
Simon Pilgrim	401d6685c0	[InstCombine] InstCombinerImpl::visitOr - enable bitreverse matching Currently we only match bswap intrinsics from or(shl(),lshr()) style patterns when we could often match bitreverse intrinsics almost as cheaply. Differential Revision: https://reviews.llvm.org/D90170	2021-05-15 13:39:09 +01:00
Simon Pilgrim	28aa7d378a	[Local] collectBitParts - early-out from binops. NFCI. Minor speedup by not bothering to attempt to collect the second operand's bit parts if we already know it has failed in the first operand.	2021-05-15 13:04:10 +01:00
Nikita Popov	fb9ed1979a	[IR] Add BasicBlock::isEntryBlock() (NFC) This is a recurring and somewhat awkward pattern. Add a helper method for it.	2021-05-15 12:41:58 +02:00
Vitaly Buka	6ce7b2f026	Fix "is not used" warning	2021-05-14 20:58:58 -07:00
Philip Reames	fcd12fed41	Extract a helper routine to simplify D91481 [NFC]	2021-05-14 18:40:23 -07:00
Nick Desaulniers	8c72749bd9	[LowerConstantIntrinsics] reuse isManifestLogic from ConstantFolding GlobalVariables are Constants, yet should not unconditionally be considered true for __builtin_constant_p. Via the LangRef https://llvm.org/docs/LangRef.html#llvm-is-constant-intrinsic: This intrinsic generates no code. If its argument is known to be a manifest compile-time constant value, then the intrinsic will be converted to a constant true value. Otherwise, it will be converted to a constant false value. In particular, note that if the argument is a constant expression which refers to a global (the address of which _is_ a constant, but not manifest during the compile), then the intrinsic evaluates to false. Move isManifestConstant from ConstantFolding to be a method of Constant so that we can reuse the same logic in LowerConstantIntrinsics. pr/41459 Reviewed By: rsmith, george.burgess.iv Differential Revision: https://reviews.llvm.org/D102367	2021-05-14 15:35:21 -07:00
wlei	e475d4d69f	[CSSPGO] Fix return value of getProbeWeight Currently we don't support multiple return types, so as a workaround we use error_code to represent: 1) The dangling probe. 2) A non-probe instruction whose weight should be ignored. While merging the instructions' weights for the whole BB, the error code is filtered out. But if all instructions of the BB give error_code, the outside logic will mark it as a BB requiring the inference algorithm to infer its weight. This is different from a zero value, which is treated as a cold block. Fix one place: if we can't find the FunctionSamples in the profile data, which indicates the BB is cold, we now return zero. Also refine the comments. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D102007	2021-05-14 14:06:09 -07:00
Sanjay Patel	e82db87fb1	[InstCombine] drop poison flags when simplifying 'shl' based on demanded bits As with other transforms in demanded bits, we must be careful not to wrongly propagate nsw/nuw if we are reducing values leading up to the shift. This bug was introduced with `1b24f35f84` and leads to the miscompile shown in: https://llvm.org/PR50341	2021-05-14 13:54:13 -04:00
Philip Reames	653fa0b46a	Do actual DCE in LoopUnroll (try 2) Recommitting after addressing a missed review comment, and updating an aarch64 test I'd missed. LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions. Differential Revision: https://reviews.llvm.org/D102511	2021-05-14 10:42:36 -07:00
Philip Reames	e488bf815f	Revert "Do actual DCE in LoopUnroll" This reverts commit `9d1a61e695`. I'd missed some review feedback, and had missed updating an aarch64 test. Reverting while I fix both.	2021-05-14 10:15:30 -07:00
Philip Reames	9d1a61e695	Do actual DCE in LoopUnroll LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions. Differential Revision: https://reviews.llvm.org/D102511	2021-05-14 10:05:25 -07:00
Philip Reames	3f1c218318	[rs4gc] Strip memory related attributes consistently I noticed that rs4gc is not stripping a number of memory aliasing related attributes. We do strip some from call sites, but don't strip the same ones from declarations or parameters. Why do we need to strip these? Two answers: Safepoints conceptually read and write to the entire garbage collected heap in the physical model. We need this to preserve ordering of all loads and stores with respect to possible relocation. We can infer other attributes from these. For instance, readnone can imply both nofree and nosync. Both of which don't hold after physical rewriting. Note: This exposed a latent issue which was fixed a couple weeks back in `01801d5274`. Differential Revision: https://reviews.llvm.org/D99802	2021-05-14 07:54:56 -07:00
Djordje Todorovic	01c90bbd4f	[Transforms][Debugify] Fix "Missing line" false alarm on PHI nodes This is a fix for https://bugs.llvm.org/show_bug.cgi?id=49959 The "Missing line" false alarm was introduced in D75242. Patch by Yilong Guo<yilong.guo@intel.com> Differential Revision: https://reviews.llvm.org/D100446	2021-05-14 14:06:13 +02:00
Sander de Smalen	f82966d19a	[LoopVectorizationLegality] NFC: Mark some interfaces as 'const' This patch marks blockNeedsPredication, isConsecutivePtr, isMaskRequired and getSymbolicStrides as 'const'.	2021-05-14 11:53:54 +01:00
Tim Northover	ea0eec69f1	IR+AArch64: add a "swiftasync" argument attribute. This extends any frame record created in the function to include that parameter, passed in X22. The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001 in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect of this is that tools walking the stack should expect to see one of three values there: * 0b0000 => a normal, non-extended record with just [FP, LR] * 0b0001 => the extended record [X22, FP, LR] * 0b1111 => kernel space, and a non-extended record. All other values are currently reserved. If compiling for arm64e this context pointer is address-discriminated with the discriminator 0xc31a and the DB (process-specific) key. There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing front-ends access to this slot (and forcing its creation initialized to nullptr if necessary).	2021-05-14 11:43:58 +01:00
Simon Pilgrim	079bbea2b2	[Local] collectBitParts - for bswap-only matches, limit shift amounts to whole bytes to reduce compile time.	2021-05-14 11:42:52 +01:00
Simon Pilgrim	78c8451cd7	[Local] collectBitParts - reduce maximum recursion depth. As noticed on D90170, the recursion depth for matching a maximum of a i128 bitwidth was too high. @lebedev.ri mentioned that we can probably do better by limiting the number of collected Values instead of just depth, but I'll look at that later.	2021-05-14 11:42:51 +01:00
Anton Afanasyev	207cdd7ed9	[SLP] Fix spill cost computation for insertelement tree node This is follow up for D98714, bugfixing.	2021-05-14 13:14:41 +03:00
Sander de Smalen	459c48e04f	NFCI: Remove VF argument from isScalarWithPredication As discussed in D102437, the VF argument to isScalarWithPredication seems redundant, so this is intended to be a non-functional change. It seems wrong to query the widening decision at this point. Removing the operand and code to get the widening decision causes no unit/regression tests to fail. I've also found no issues running the LLVM test-suite. This subsequently removes the VF argument from isPredicatedInst as well, since it is no longer required.	2021-05-14 10:34:40 +01:00
dfukalov	fdae3fc8b3	[GVN] Clobber partially aliased loads. Use offsets stored in `AliasResult` implemented in D98718. Updated with fix of issue reported in https://reviews.llvm.org/D95543#2745161 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95543	2021-05-14 11:17:14 +03:00
David Green	f7cb654763	[DSE] Move isOverwrite into DSEState. NFC This moves the isOverwrite function into the DSEState so that it can share the analyses and members from the state. A few extra loop tests were also added to test stores in and around multi block loops for D100464.	2021-05-14 09:16:51 +01:00
Joseph Huber	8b57ed09bd	[OpenMP] Prevent Attributor from deleting functions in OpenMPOptCGSCC pass Summary: This patch prevents the Attributor instances made in the CGSCC pass from deleting functions. This prevents the attributor from changing the call graph while OpenMPOpt is working with it. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102363	2021-05-13 16:35:23 -04:00
cynecx	8ec9fd4839	Support unwinding from inline assembly I've taken the following steps to add unwinding support from inline assembly: 1) Add a new `unwind` "attribute" (like `sideeffect`) to the asm syntax: ``` invoke void asm sideeffect unwind "call thrower", "~{dirflag},~{fpsr},~{flags}"() to label %exit unwind label %uexit ``` 2.) Add Bitcode writing/reading support + LLVM-IR parsing. 3.) Emit EHLabels around inline assembly lowering (SelectionDAGBuilder + GlobalISel) when `InlineAsm::canThrow` is enabled. 4.) Tweak InstCombineCalls/InlineFunction pass to not mark inline assembly "calls" as nounwind. 5.) Add clang support by introducing a new clobber: "unwind", which lower to the `canThrow` being enabled. 6.) Don't allow unwinding callbr. Reviewed By: Amanieu Differential Revision: https://reviews.llvm.org/D95745	2021-05-13 19:13:03 +01:00
Florian Hahn	bdada7546e	[VPlan] Adjust assert in splitBlock to allow splitting at end. SplitAt should only be dereferenced in the assert if it does not point to the end of the block. This fixes a crash in the added test case.	2021-05-13 13:36:35 +01:00
Jingu Kang	107d19eb01	Revert "[SimpleLoopUnswitch] Port partially invariant unswitch from LoopUnswitch to SimpleLoopUnswitch" This reverts commit `88b259c014`. It needs to fix below bugs. https://bugs.llvm.org/show_bug.cgi?id=50279 https://bugs.llvm.org/show_bug.cgi?id=50302	2021-05-13 08:40:49 +01:00
Chuanqi Xu	c1359ef07e	[Coroutines] Salvage Debug.values Summary: The previous implementation of coro-split didn't collect values used by dbg instructions into the spills, which made a lot of debug info unavailable with optimization on. This patch tries to collect the uses made by dbg.values. In this way, the debuggability of coroutines could be as powerful as that of normal functions with optimization on. To avoid enlarging the coroutine frame, this patch only collects `dbg.value` whose value is already in the coroutine frame. This decision may make some debug info unavailable. But with optimization on, performance should be considered first. And this patch improves the debuggability of coroutines without changing the layout of the frame. Test-plan: check-llvm Reviewed By: aprantl, lxfind Differential Revision: https://reviews.llvm.org/D97673	2021-05-13 13:06:33 +08:00
Chuanqi Xu	6e5b8f489a	[Coroutines] Enable printing coroutine frame when dbg info is available Summary: This patch tries to build debug info for the coroutine frame in the middle end. Although the coroutine frame is constructed and maintained by the compiler, and by the design of C++20 coroutines the programmer shouldn't care about it, many programmers told me that they strongly want to see the layout of the coroutine frame. Although C++ is designed as an abstraction layer so that programmers shouldn't care about the actual memory in bits, many experienced C++ programmers are used to inspecting the memory layout with an assembler and a debugger. After being told about three times that they want to see the coroutine frame, I think it is an actual and desired demand. However, the debug information is constructed in the front end and the coroutine frame is constructed in the middle end. This is a natural and clear gap. So I could only try to construct the debug information in the middle end after the coroutine frame is constructed. It is unusual, but we are in consensus that this approach is the best one. One hard part is that we need to construct the names for variables, since there isn't a map from llvm variables to DIVar. Here is the strategy this patch uses: - The names `__resume_fn`, `__destroy_fn` and `__coro_index` are constructed by the patch. - The name `__promise` comes from the dbg.variable of the corresponding dbg.declare of PromiseAlloca, which has the highest priority when constructing the debug information for a member of the coroutine frame. - If the member is a struct, we try to get the name of the llvm struct directly, then replace ':' and '.' with '_' to make it printable for the debugger. - If the member is a basic type like integer or double, we try to emit the corresponding name. - If the member is a pointer type, we add `Ptr` after the corresponding pointee type. - Otherwise, we name it 'UnknownType'. 
Reviewed by: lxfind, aprantl, rjmcall, dblaikie Differential Revision: https://reviews.llvm.org/D99179	2021-05-13 12:43:08 +08:00
Anton Afanasyev	ab2c499d3a	[SLP] Add insertelement instructions to vectorizable tree Add a new type of tree node for an `InsertElementInst` chain forming a vector. These instructions can be either removed or replaced by shuffles during vectorization, and we can add this node to the cost model, naturally estimating their cost, getting rid of `CompensateCost` tricks and reducing further work for InstCombine. This fixes PR40522 and PR35732 in a natural way. This patch is also the first step towards revectorization of partially vectorized code (to fix PR42022 completely). After adding inserts to the tree, the next step is to add vector instructions there (for instance, to merge `store <2 x float>` and `store <2 x float>` to `store <4 x float>`). Fixes PR40522 and PR35732. Differential Revision: https://reviews.llvm.org/D98714	2021-05-13 07:41:45 +03:00
Justin Bogner	e7d26aceca	Change the context instruction for computeKnownBits in LoadStoreVectorizer pass This change enables cases for which the index value for the first load/store instruction in a pair could be a function argument. This allows using llvm.assume to provide known bits information in such cases. Patch by Viacheslav Nikolaev. Thanks! Differential Revision: https://reviews.llvm.org/D101680	2021-05-12 15:29:29 -07:00
Nikita Popov	a8f7dee1df	[InstCombine] Support one-hot merge for logical and/or If a logical and/or is used, we need to be careful not to propagate a potential poison value from the RHS by inserting a freeze instruction. Otherwise it works the same way as bitwise and/or. This is intended to address the regression reported at https://reviews.llvm.org/D101191#2751002. Differential Revision: https://reviews.llvm.org/D102279	2021-05-12 21:01:18 +02:00
Stelios Ioannou	1124ad2f5d	[LoopFlatten] Simplify loops so that the pass can operate on unsimplified loops. The loop flattening pass requires loops to be in simplified form. If the loops are not in simplified form, the pass cannot operate. This patch simplifies all loops before flattening. As a result, all loops will be simplified regardless of whether anything ends up being flattened. This change was inspired by observing a certain loop that was not flattened because the loops were not in simplified form. This loop is added as a test to verify that it is now flattened. Differential Revision: https://reviews.llvm.org/D102249 Change-Id: I45bcabe70fb99b0d89f0effafc82eb9e0585ec30	2021-05-12 19:22:01 +01:00
Roman Lebedev	554b1bced3	[InstCombine] ~(C + X) --> ~C - X (PR50308) We can not rely on (C+X)-->(X+C) already happening, because we might not have visited that `add` yet. The added testcase would get stuck in an endless combine loop.	2021-05-12 16:10:55 +03:00
David Sherwood	b7a11274f9	[LoopVectorize] Fix scalarisation crash in widenPHIInstruction for scalable vectors In InnerLoopVectorizer::widenPHIInstruction there are cases where we have to scalarise a pointer induction variable after vectorisation. For scalable vectors we already deal with the case where the pointer induction variable is uniform, but we currently crash if not uniform. For fixed width vectors we calculate every lane of the scalarised pointer induction variable for a given VF, however this cannot work for scalable vectors. In this case I have added support for caching the whole vector value for each unrolled part so that we can always extract an arbitrary element. Additionally, we still continue to cache the known minimum number of lanes too in order to improve code quality by avoiding an extractelement operation. I have adapted an existing test `pointer_iv_mixed` from the file: Transforms/LoopVectorize/consecutive-ptr-uniforms.ll and added it here for scalable vectors instead: Transforms/LoopVectorize/AArch64/sve-widen-phi.ll Differential Revision: https://reviews.llvm.org/D101294	2021-05-12 11:02:11 +01:00
Qiu Chaofan	6d2df18163	[VectorCombine] Restrict single-element-store index to inbounds constant The vector single-element update optimization landed in `2db4979`, but its scope needs restriction. This patch restricts the index to inbounds constants and requires the vector to be fixed-size. In the future, we may use value tracking to relax the constant restriction. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D102146	2021-05-12 13:18:20 +08:00
Congzhe Cao	3f8be15f29	[LoopInterchange] Handle lcssa PHIs with multiple predecessors This is a bugfix in the transformation phase. If the original outer loop header branches to both the inner loop (header) and the outer loop latch, and if there is an lcssa PHI node outside the loop nest, then after interchange the new outer latch will have an lcssa PHI node inserted which has two predecessors, i.e., the original outer header and the original outer latch. Currently the transformation assumes it has only one predecessor (the original outer latch) and crashes, since the inserted lcssa PHI node does not take both predecessors as incoming BBs. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D100792	2021-05-11 21:30:54 -04:00
Jordan Rupprecht	fec2945998	Revert "[GVN] Clobber partially aliased loads." This reverts commit `6c57044231`. It causes assertion errors due to widening atomic loads, and potentially causes miscompile elsewhere too. Repro, also posted to D95543: ``` $ cat repro.ll ; ModuleID = 'repro.ll' source_filename = "repro.ll" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" %struct.widget = type { i32 } %struct.baz = type { i32, %struct.snork } %struct.snork = type { %struct.spam } %struct.spam = type { i32, i32 } @global = external local_unnamed_addr global %struct.widget, align 4 @global.1 = external local_unnamed_addr global i8, align 1 @global.2 = external local_unnamed_addr global i32, align 4 define void @zot(%struct.baz* %arg) local_unnamed_addr align 2 { bb: %tmp = getelementptr inbounds %struct.baz, %struct.baz* %arg, i64 0, i32 1 %tmp1 = bitcast %struct.snork* %tmp to i64* %tmp2 = load i64, i64* %tmp1, align 4 %tmp3 = getelementptr inbounds %struct.baz, %struct.baz* %arg, i64 0, i32 1, i32 0, i32 1 %tmp4 = icmp ugt i64 %tmp2, 4294967295 br label %bb5 bb5: ; preds = %bb14, %bb %tmp6 = load i32, i32* %tmp3, align 4 %tmp7 = icmp ne i32 %tmp6, 0 %tmp8 = select i1 %tmp7, i1 %tmp4, i1 false %tmp9 = zext i1 %tmp8 to i8 store i8 %tmp9, i8* @global.1, align 1 %tmp10 = load i32, i32* @global.2, align 4 switch i32 %tmp10, label %bb11 [ i32 1, label %bb12 i32 2, label %bb12 ] bb11: ; preds = %bb5 br label %bb14 bb12: ; preds = %bb5, %bb5 %tmp13 = load atomic i32, i32* getelementptr inbounds (%struct.widget, %struct.widget* @global, i64 0, i32 0) acquire, align 4 br label %bb14 bb14: ; preds = %bb12, %bb11 br label %bb5 } $ opt -O2 repro.ll -disable-output opt: /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Utils/VNCoercion.cpp:496: llvm::Value *llvm::VNCoercion::getLoadValueForLoad(llvm::LoadInst *, unsigned int, llvm::Type *, llvm::Instruction *, const llvm::DataLayout &): Assertion `SrcVal->isSimple() && "Cannot widen volatile/atomic load!"' failed. PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace. Stack dump: 0. Program arguments: /home/rupprecht/dev/opt -O2 repro.ll -disable-output ... ```	2021-05-11 16:08:53 -07:00
Congzhe Cao	40e3aa39bd	[LoopInterchange] Fix legality for triangular loops This is a bug fix in legality check. When we encounter triangular loops such as the following form: for (int i = 0; i < m; i++) for (int j = 0; j < i; j++), or for (int i = 0; i < m; i++) for (int j = 0; j*i < n; j++), we should not perform interchange since the number of executions of the loop body will be different before and after interchange, resulting in incorrect results. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D101305	2021-05-11 18:36:53 -04:00
Congzhe Cao	d3f89d4d16	Revert "[LoopInterchange] Fix legality for triangular loops" This reverts commit `29342291d2`. The test case requires an assert build. Will add REQUIRES and re-commit.	2021-05-11 18:10:58 -04:00
Nikita Popov	1556540372	[InstCombine] Clean up one-hot merge optimization (NFC) Remove the requirement that the instruction is a BinaryOperator, make the predicate check more compact and use slightly more meaningful naming for the and operands.	2021-05-11 23:22:11 +02:00
Fangrui Song	129f466e22	[GlobalOpt] Remove heap SROA GlobalOpt implements a heap SROA (SROA for a malloc-allocated struct or array of structs) which is largely undertested (heap-sra-[1234].ll are basically the same test with very little difference) and does not trigger at all when bootstrapping clang (it only supports the case of one single store). The heap SROA implementation causes PR50027 (GEP is not properly handled; crash or miscompile). Just drop the implementation. I have deleted some obviously duplicated tests but kept `heap-sra-[12]{,-no-nullopt}.ll`. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D102257	2021-05-11 11:34:37 -07:00
Eli Friedman	61cbbba7a6	[ArgumentPromotion] Fix byval alignment handling. Make sure the alignment of the generated operations matches the alignment of the byval argument. Previously, we were just ignoring alignment and getting lucky. While I'm here, also delete the unnecessary "tail" handling. Passing a pointer to a byval argument to a "tail" call is UB, so rewriting to an alloca doesn't require any special handling. Differential Revision: https://reviews.llvm.org/D89819	2021-05-11 11:22:18 -07:00
Congzhe Cao	29342291d2	[LoopInterchange] Fix legality for triangular loops This is a bug fix in legality check. When we encounter triangular loops such as the following form: for (int i = 0; i < m; i++) for (int j = 0; j < i; j++), or for (int i = 0; i < m; i++) for (int j = 0; j*i < n; j++), we should not perform interchange since the number of executions of the loop body will be different before and after interchange, resulting in incorrect results. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D101305	2021-05-11 11:00:46 -04:00
Florian Hahn	faebc6bf10	[VPlan] Register recipe for instr if the simplified value is recipe. If the simplified VPValue is a recipe, we need to register it for Instr, in case it needs to be recorded. The way this is handled in general may change soon, following some post-commit comments. This fixes PR50298.	2021-05-11 14:32:34 +01:00
Sanjay Patel	49950cb1f6	[SLP] restrict matching of load combine candidates The test example from https://llvm.org/PR50256 (and reduced here) shows that we can match a load combine candidate even when there are no "or" instructions. We can avoid that by confirming that we do see an "or". This doesn't apply when matching an or-reduction because that match begins from the operands of the reduction. Differential Revision: https://reviews.llvm.org/D102074	2021-05-11 08:46:40 -04:00
Sanjay Patel	5577e86691	[InstCombine] fold extract subvector of bitcast insertelt This is visible in the original example from: https://llvm.org/PR50055 (but this change doesn't solve the bug) https://alive2.llvm.org/ce/z/vM_Yq-	2021-05-10 17:20:10 -04:00
Nikita Popov	463ea28e96	[InstCombine] Fold comparison of integers by parts Let's say you represent (i32, i32) as an i64 from which the parts are extracted with lshr/trunc. Then, if you compare two tuples by parts you get something like A[0] == B[0] && A[1] == B[1], just that the part extraction happens by lshr/trunc and not a narrow load or similar. The fold implemented here reduces such equality comparisons by converting them into a comparison on a larger part of the integer (which might be the whole integer). It handles both the "and of eq" and the conjugated "or of ne" case. I'm being conservative with one-use for now, though this could be relaxed if profitable (the base pattern converts 11 instructions into 5 instructions, but there's quite a few variations on how it can play out). Differential Revision: https://reviews.llvm.org/D101232	2021-05-10 22:22:39 +02:00
Nikita Popov	aa9b02ac75	[Inliner] Fix noalias metadata handling for instructions simplified during cloning (PR50270) Instead of using VMap, which may include instructions from the caller as a result of simplification, iterate over the (FirstNewBlock, Caller->end()) range, which will only include new instructions. Fixes https://bugs.llvm.org/show_bug.cgi?id=50270. Differential Revision: https://reviews.llvm.org/D102110	2021-05-10 21:59:59 +02:00
Sanjay Patel	88d8f10baf	[PassManager] add helper function to hold set of vector passes (2nd try) This is better no-functional-change-intended than the 1st attempt. As noted in D102002, there were at least 2 diffs that went unchecked in pass manager regressions tests: different pass parameters (SimplifyCFG) and an extension point/callback. Those should be lifted from the original code blocks correctly now.	2021-05-10 14:43:00 -04:00
Sanjay Patel	822be4bec8	Revert "[PassManager] add helper function to hold set of vector passes" This reverts commit `fefcb1f878`. It was supposed to be NFC, but as noted in the post-commit comments in D102002, that was not true: SimplifyCFG uses different parameters and there's a difference in an extension point / callback.	2021-05-10 10:59:30 -04:00
Alexey Bataev	30463bc3f1	[SLP]Do not count perfect diamond matches for gathers several times. Need to remove the old code for avoiding double counting of the gather nodes with perfect diamond matches within the tree after we started detecting perfect/shuffled matching in the previous patch D100495. We may skip the cost for such nodes completely. Differential Revision: https://reviews.llvm.org/D102023	2021-05-10 07:08:07 -07:00
Teresa Johnson	220f6e5271	[SimplifyCFG] Ignore ephemeral values when counting insts for threading Ignore ephemeral values (only feeding llvm.assume intrinsics) when computing the instruction count to decide if a block is small enough for threading. This is similar to the handling of these values in the InlineCost computation. These instructions will eventually be removed and shouldn't count against code size (similar to the existing ignoring of phis). Without this change, when enabling -fwhole-program-vtables, which causes type test / assume sequences to be inserted by clang, we can get different threading decisions. In particular, when building with instrumentation FDO it can affect the optimization decisions before FDO matching, leading to some mismatches. Differential Revision: https://reviews.llvm.org/D101494	2021-05-09 19:06:54 -07:00
Roman Lebedev	1acd9a1a29	Revert "[LICM] Hoist loads with invariant.group metadata" This appears to miscompile google benchmark's GetCacheSizesFromKVFS() when compiling with -fstrict-vtable-pointers. Runnable reproducer: https://godbolt.org/z/f9ovKqTzb The "f.fail()" crashes with BUS error, it is compiled into testb, and the address it is testing is nonsensical. This reverts commit `4c89bcadf6`.	2021-05-08 15:44:49 +03:00


28315 Commits