llvm-project

Author	SHA1	Message	Date
Stephen Tozer	14b62f7e2f	[DebugInfo] CGP+HWasan: Handle dbg.values with duplicate location ops This patch fixes an issue which occurred in CodeGenPrepare and HWAddressSanitizer, which both at some point create a map of Old->New instructions and update dbg.value uses of these. They did this by iterating over the dbg.value's location operands, and if an instance of the old instruction was found, replaceVariableLocationOp would be called on that dbg.value. This would cause an error if the same operand appeared multiple times as a location operand, as the first call to replaceVariableLocationOp would update all uses of the old instruction, invalidating the old iterator and eventually hitting an assertion. This has been fixed by no longer iterating over the dbg.value's location operands directly, but by first collecting them into a set and then iterating over that, ensuring that we never attempt to replace a duplicated operand multiple times. Differential Revision: https://reviews.llvm.org/D105129	2021-07-05 10:35:19 +01:00
Nikita Popov	a213f735d8	[IR] Deprecate GetElementPtrInst::CreateInBounds without element type This API is not compatible with opaque pointers, the method accepting an explicit pointer element type should be used instead. Thankfully there were few in-tree users. The BPF case still ends up using the pointer element type for now and needs something like D105407 to avoid doing so.	2021-07-04 16:49:30 +02:00
Paul Walker	287d39dd5a	[NFC] Fix a few whitespace issues and typos.	2021-07-04 11:49:58 +01:00
Nikita Popov	fabc17192e	[IRBuilder] Add type argument to CreateMaskedLoad/Gather Same as other CreateLoad-style APIs, these need an explicit type argument to support opaque pointers. Differential Revision: https://reviews.llvm.org/D105395	2021-07-04 12:17:59 +02:00
Roman Lebedev	fc150cecd7	[SimplifyCFG] simplifyUnreachable(): erase instructions iff they are guaranteed to transfer execution to unreachable This replaces the current ad-hoc implementation, by syncing the code from InstCombine's implementation in `InstCombinerImpl::visitUnreachableInst()`, with one exception that here in SimplifyCFG we are allowed to remove EH instructions. Effectively, this now allows SimplifyCFG to remove calls (iff they won't throw and will return), arithmetic/logic operations, etc. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D105374	2021-07-03 10:45:44 +03:00
Fangrui Song	252a1eecc0	[ThinLTO] Respect ClearDSOLocalOnDeclarations for unimported functions D74751 added `ClearDSOLocalOnDeclarations` and dropped dso_local for isDeclarationForLinker `GlobalValue`s. It missed a case for imported declarations (`doImportAsDefinition` is false while `isPerformingImport` is true). This can lead to a linker error for a default visibility symbol in `ld.lld -shared`. When `ClearDSOLocalOnDeclarations` is true, we check `isPerformingImport() && !doImportAsDefinition(&GV)` along with `GV.isDeclarationForLinker()`. The new condition checks an imported declaration. This patch fixes a `LLVMPolly.so` link error using a trunk clang -DLLVM_ENABLE_LTO=Thin. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D104986	2021-07-02 17:08:25 -07:00
Roman Lebedev	53fef0b293	[NFCI][SimplifyCFG] simplifyUnreachable(): Use poison constant to represent the result of unreachable instrs Mimics similar change for InstCombine: `ce192ced2b` / D104602 All these uses are in blocks that aren't reachable from function's entry, and said blocks are removed by SimplifyCFG itself, so we can't really test this change.	2021-07-02 22:11:52 +03:00
Heejin Ahn	51fecd17bb	[InstCombine] Don't combine PHI before catchswitch This tries to bail out if the PHI is in a `catchswitch` BB in InstCombine. A PHI cannot be combined into a non-PHI instruction if it is in a `catchswitch` BB, because `catchswitch` BB cannot have any non-PHI instruction other than `catchswitch` itself. The given test case started crashing after D98058. Reviewed By: lebedev.ri, rnk Differential Revision: https://reviews.llvm.org/D105309	2021-07-02 12:10:24 -07:00
Roman Lebedev	da81ec6158	[SimplifyCFG] Volatile memory operations do not trap Somewhat related to D105338. While it is up for discussion whether or not volatile store traps, so far there have been no complaints that volatile load/cmpxchg/atomicrmw also may trap. And even if simplifycfg currently conservatively believes that to be the case, instcombine does not: https://godbolt.org/z/5vhv4K5b8 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D105343	2021-07-02 21:47:44 +03:00
Alexey Bataev	7f7e4aed21	[SLP][NFC]Refactor findLaneForValue and make it static member, NFC, by V.Dmitriev. Reduces number of arguments	2021-07-02 10:30:13 -07:00
Jon Roelofs	37b6e03c18	[Intrinsics] Make MemCpyInlineInst a MemCpyInst This opens up more optimization opportunities in passes that already handle MemCpyInst's. Differential revision: https://reviews.llvm.org/D105247	2021-07-02 10:25:24 -07:00
Roman Lebedev	13e35ac124	[NFC][InstCombine] visitUnreachableInst(): enhance comments somewhat	2021-07-02 17:30:01 +03:00
Roman Lebedev	dadedc99e9	[InstCombine] visitUnreachableInst(): iteratively erase instructions leading to unreachable In the original review D87149 it was mentioned that this approach was tried, and it led to infinite combine loops, but I'm not seeing anything like that now, neither in `check-llvm` nor on some codebases I tried. This is a recommit of `d9d65527c2`, which I immediately reverted because I messed something up during a branch switch, and `597ccc92ce` accidentally ended up being pushed, which was very much not the intention. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D105339	2021-07-02 17:20:21 +03:00
Roman Lebedev	24d271bb18	Revert "https://godbolt.org/z/5vhv4K5b8 " This reverts commit `597ccc92ce`.	2021-07-02 17:17:55 +03:00
Roman Lebedev	93a1642763	Revert "[NFCI][InstCombine] visitUnreachableInst(): iteratively erase instructions leading to unreachable" This reverts commit `d9d65527c2`.	2021-07-02 17:17:47 +03:00
Roman Lebedev	d9d65527c2	[NFCI][InstCombine] visitUnreachableInst(): iteratively erase instructions leading to unreachable In the original review D87149 it was mentioned that this approach was tried, and it led to infinite combine loops, but I'm not seeing anything like that now, neither in `check-llvm` nor on some codebases I tried. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D105339	2021-07-02 17:17:03 +03:00
Roman Lebedev	597ccc92ce	https://godbolt.org/z/5vhv4K5b8	2021-07-02 17:16:19 +03:00
Nico Weber	a92964779c	Revert "[InstrProfiling] Use external weak reference for bias variable" This reverts commit `33a7b4d9d8`. Breaks check-profile on macOS, see comments on https://reviews.llvm.org/D105176	2021-07-02 09:05:12 -04:00
Florian Hahn	a3ca578eb9	[Matrix] Fix crash during fusion if the same load is re-used. This patch fixes a crash when the same load is used for both operands of a fuseable multiply.	2021-07-02 14:00:17 +01:00
Alexey Bataev	28ac873bcb	[SLP]Fix gathering of the scalars by not ignoring UndefValues. The compiler should not ignore UndefValue when gathering the scalars, otherwise the resulting code may be less defined than the original one. Also, grouped scalars to insert them at first to reduce the analysis in further passes. Differential Revision: https://reviews.llvm.org/D105275	2021-07-02 04:46:48 -07:00
Florian Hahn	7655061cc6	[Matrix] Hoist address computation before multiply to enable fusion. If the store address does not dominate the matrix multiply, try to hoist address computation instructions without side-effects and/or memory reads before the multiply, to allow fusion. Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D105193	2021-07-02 09:52:11 +01:00
Evgeniy Brevnov	9568811cb8	[NFC][DSE]Change 'do-while' to 'for' loop to simplify code structure With a 'for' loop there is a single place where 'Current' is adjusted. This helps avoid copy-paste and makes the overall loop control flow a bit easier to understand. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D101044	2021-07-02 10:00:47 +07:00
Craig Topper	066524ea54	[ScalarizeMaskedMemIntrin][SelectionDAGBuilder] Use the element type to calculate alignment for gather/scatter when alignment operand is 0. Previously we used the vector type, but we're loading/storing individual elements so I think only element alignment should matter. Noticed while looking at the code for something else so I don't have a test case. Differential Revision: https://reviews.llvm.org/D105220	2021-07-01 19:08:47 -07:00
Petr Hosek	33a7b4d9d8	[InstrProfiling] Use external weak reference for bias variable We need the compiler generated variable to override the weak symbol of the same name inside the profile runtime, but using LinkOnceODRLinkage results in weak symbol being emitted which leads to an issue where the linker might choose either of the weak symbols potentially disabling the runtime counter relocation. This change replaces the use of weak definition inside the runtime with an external weak reference to address the issue. We also place the compiler generated symbol inside a COMDAT group so dead definition can be garbage collected by the linker. Differential Revision: https://reviews.llvm.org/D105176	2021-07-01 15:25:31 -07:00
Philip Reames	955f125899	[instcombine] Fold overflow check using overflow intrinsic to comparison This follows up to D104665 (which added umulo handling alongside the existing uaddo case), and generalizes for the remaining overflow intrinsics. I went to add analogous handling to LVI, and discovered that LVI already had a more general implementation. Instead, we can port what LVI does to instcombine. (For context, LVI uses makeExactNoWrapRegion to constrain the value 'x' in blocks reached after a branch on the condition `op.with.overflow(x, C).overflow`.) Differential Revision: https://reviews.llvm.org/D104932	2021-07-01 09:41:55 -07:00
Arnold Schwaighofer	4a361f5209	[coro async] Add support for specifying which parameter is swiftself in async resume functions Differential Revision: https://reviews.llvm.org/D104147	2021-07-01 07:33:15 -07:00
David Sherwood	51b4ab26ca	[NFC] Add new setDebugLocFromInst that uses the class Builder by default In lots of places we were calling setDebugLocFromInst and passing in the same Builder member variable found in InnerLoopVectorizer. I personally found this confusing so I've changed the interface to take an Optional<IRBuilder<> *> and we can now pass in None when we want to use the class member variable. Differential Revision: https://reviews.llvm.org/D105100	2021-07-01 14:23:34 +01:00
Roman Lebedev	333d3a3cdf	[NFC][PassBuilder] addVectorPasses(): clarify that 'IsLTO' is actually 'IsFullLTO' I.e. it will be `false` for thin lto.	2021-07-01 10:09:24 +03:00
Chuanqi Xu	51fbd18706	[Coroutine] Recommit Add statistics for the number of elided coroutine Now we lack a benchmark to measure the performance change for each commit. Since coro elide is the main optimization in coroutine module, I wonder it may be an estimation to count the number of elided coroutine in private code bases. e.g., for a certain commit, if we found that the number of elided goes down, we could find it before the commit check-in. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D105095	2021-07-01 11:01:28 +08:00
Sanjay Patel	0c400e8953	[InstCombine] fold icmp ult of offset value with constant This is one sibling of the fold added with `c7b658aeb5` . (X + C2) <u C --> X >s ~C2 (if C == C2 + SMIN) I'm still not sure how to describe it best, but we're translating 2 constants from an unsigned range comparison to signed because that eliminates the offset (add) op. This could be extended to handle the more general (non-constant) pattern too: https://alive2.llvm.org/ce/z/K-fMBf define i1 @src(i8 %a, i8 %c2) { %t = add i8 %a, %c2 %c = add i8 %c2, 128 ; SMIN %ov = icmp ult i8 %t, %c ret i1 %ov } define i1 @tgt(i8 %a, i8 %c2) { %not_c2 = xor i8 %c2, -1 %ov = icmp sgt i8 %a, %not_c2 ret i1 %ov }	2021-06-30 19:00:12 -04:00
Xun Li	822b92aae4	[Coroutines] Add the newly generated SCCs back to the CGSCC work queue after CoroSplit actually happened Relevant discussion can be found at: https://lists.llvm.org/pipermail/llvm-dev/2021-January/148197.html In the existing design, an SCC that contains a coroutine will go through the following passes: Inliner -> CoroSplitPass (fake) -> FunctionSimplificationPipeline -> Inliner -> CoroSplitPass (real) -> FunctionSimplificationPipeline The first CoroSplitPass doesn't do anything other than putting the SCC back to the queue so that the entire pipeline can repeat. As you can see, we run Inliner twice on the SCC consecutively without doing any real split, which is unnecessary and likely unintended. What we really wanted is this: Inliner -> FunctionSimplificationPipeline -> CoroSplitPass -> FunctionSimplificationPipeline (note that we don't really need to run Inliner again on the ramp function after split). Hence the way we do it here is to move CoroSplitPass to the end of the CGSCC pipeline, run it once for real, insert the newly generated SCCs (the clones) back to the pipeline so that they can be optimized, and also add a function simplification pipeline after CoroSplit to optimize the post-split ramp function. This approach also conforms to how the new pass manager works instead of relying on an ad hoc post-split cleanup, making it ready for a full switch to the new pass manager eventually. By looking at some of the changes to the tests, we can already observe that this change allows more optimizations to be applied to coroutines. Reviewed By: aeubanks, ChuanqiXu Differential Revision: https://reviews.llvm.org/D95807	2021-06-30 11:38:14 -07:00
Sanjay Patel	c7b658aeb5	[InstCombine] fold icmp of offset value with constant There must be a better way to describe this pattern in words? (X + C2) >u C --> X <s -C2 (if C == C2 + SMAX) This could be extended to handle the more general (non-constant) pattern too: https://alive2.llvm.org/ce/z/rdfNFP define i1 @src(i8 %a, i8 %c1) { %t = add i8 %a, %c1 %c2 = add i8 %c1, 127 ; SMAX %ov = icmp ugt i8 %t, %c2 ret i1 %ov } define i1 @tgt(i8 %a, i8 %c1) { %neg_c1 = sub i8 0, %c1 %ov = icmp slt i8 %a, %neg_c1 ret i1 %ov } The pattern was noticed as a by-product of D104932.	2021-06-30 13:37:31 -04:00
Philip Reames	c4fc2cb5b2	[instcombine] umin(x, 1) == zext(x != 0) We already implemented this for the select form, but the intrinsic form was missing. Note that this doesn't change poison behavior as 1 is non-poison, and the optimized form is still poison exactly when x is.	2021-06-30 10:20:01 -07:00
Joseph Huber	ecabc6684f	[OpenMP] Change analysis remarks to not emit on cold functions The remarks will trigger on some functions that are marked cold, such as the `__muldc3` intrinsic functions. Change the remarks to avoid these functions. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105196	2021-06-30 11:54:24 -04:00
Nico Weber	db86e5c914	Revert "[Coroutine] Add statistics for the number of elided coroutine" This reverts commit `1d9539cf49`. Test fails in LLVM_ENABLE_ASSERTIONS=OFF builds (such as regular release builds).	2021-06-30 10:22:45 -04:00
Joseph Huber	0edb87773b	[OpenMP] Add additional remarks for OpenMPOpt This patch adds additional remarks, suggesting the use of `noescape` for failed globalization and indicating when internalization failed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105150	2021-06-30 09:49:25 -04:00
David Sherwood	7b7b5b5a26	[NFC] Rename shadowed variable in InnerLoopVectorizer::createInductionVariable Avoid creating an IRBuilder stack variable with the same name as the class member.	2021-06-30 11:11:49 +01:00
Chuanqi Xu	801c2b9bba	[FuncSpec] Add an option to specializing literal constant Now the option is off by default. Since we are not sure if this option would make the compile time increase aggressively. Although we tested it on SPEC2017, we may need to test more to make it on by default. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D104365	2021-06-30 11:26:44 +08:00
Chuanqi Xu	1d9539cf49	[Coroutine] Add statistics for the number of elided coroutine Now we lack a benchmark to measure the performance change for each commit. Since coro elide is the main optimization in coroutine module, I wonder it may be an estimation to count the number of elided coroutine in private code bases. e.g., for a certain commit, if we found that the number of elided goes down, we could find it before the commit check-in. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D105095	2021-06-30 11:20:53 +08:00
Jianzhou Zhao	ae6648cee0	[dfsan] Expose dfsan_get_track_origins to get origin tracking status This allows application code to check whether origin tracking is on before printing out traces. -dfsan-track-origins can be 0, 1, or 2. The current code only distinguishes 1 and 2 at compile time, but not at runtime. This patch makes the runtime distinguish 1 and 2 too. Reviewed By: browneee Differential Revision: https://reviews.llvm.org/D105128	2021-06-29 20:32:39 +00:00
Nikita Popov	c4de78e91c	[SanitizerCoverage] Fix global type check with opaque pointers The code was previously relying on the fact that an incorrectly typed global would result in the insertion of a BitCast constant expression. With opaque pointers, this is no longer the case, so we should check the type explicitly.	2021-06-29 20:32:14 +02:00
Alexey Bataev	129ae515fb	[INSTCOMBINE] Transform reduction(shuffle V, poison, unique_mask) to reduction(V). After SLP + LTO we may have reduction(shuffle V, poison, mask). This can be simplified to just reduction(V) if the mask refers only to a single vector and merely permutes all elements of that vector, without reusing elements, replacing them with undefs and/or other values, etc. Differential Revision: https://reviews.llvm.org/D105053	2021-06-29 10:02:38 -07:00
Philip Reames	e49d65f36d	[LV] Fix bug when unrolling (only) a loop with non-latch exit If we unroll a loop in the vectorizer (without vectorizing), and the cost model requires an epilogue be generated for correctness, the code generation must actually do so. The included test case on an unmodified opt will access memory one past the expected bound. As a result, this patch is fixing a latent miscompile. Differential Revision: https://reviews.llvm.org/D103700	2021-06-29 08:04:26 -07:00
Johannes Doerfert	7af91a2b8f	[Attributor][NFCI] Make the state of AAValueSimplify explicit As we have done with other states we want the AAValueSimplify state to be explicit to use it more easily in our helpers.	2021-06-29 09:38:22 -05:00
Johannes Doerfert	dcbe58d94c	[Attributor][NFCI] Remove unneeded namespace	2021-06-29 09:38:20 -05:00
Johannes Doerfert	457bd5c8d5	[Attributor] Teach AAPotentialValues about constant select conditions There was a TODO but now we actually check if the select condition is assumed constant and only look at the relevant operand.	2021-06-29 09:38:18 -05:00
Johannes Doerfert	8dc9bb6d85	[Attributor][NFC] Clang format	2021-06-29 09:38:15 -05:00
Johannes Doerfert	a33e128012	[InstCombine] Gracefully handle an alloca outside the alloca-AS While we might eventually want to disallow allocas that do not have the alloca-AS set, it seems undesirable to crash on them. Add a cast when required so that we can support such allocas (at least here). Differential Revision: https://reviews.llvm.org/D104866	2021-06-29 09:38:13 -05:00
David Sherwood	9de63367d8	Revert "[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable" This reverts commit `9dde514162`.	2021-06-29 15:20:22 +01:00
David Sherwood	9dde514162	[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable Avoid creating an IRBuilder stack variable with the same name as the class member.	2021-06-29 14:34:30 +01:00
David Sherwood	8a3365fba2	Revert "[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable" This reverts commit `dcfc2c3fac`.	2021-06-29 14:04:42 +01:00
Florian Hahn	47215e1c62	[LV] Fix crash when target instruction for sinking is dead. This patch fixes a crash when the target instruction for sinking is dead. In that case, no recipe is created and trying to get the recipe for it results in a crash. To ensure all sink targets are alive, find & use the first previous alive instruction. Note that the case where the sink source is dead is already handled. Found by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35320 Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104603	2021-06-29 13:31:22 +01:00
David Sherwood	303b6d5e98	[LoopVectorize] Add support for scalable vectorization of invariant stores Previously in setCostBasedWideningDecision if we encountered an invariant store we just assumed that we could scalarize the store and called getUniformMemOpCost to get the associated cost. However, for scalable vectors this is not an option because it is not currently possible to scalarize the store. At the moment we crash in VPReplicateRecipe::execute when trying to scalarize the store. Therefore, I have changed setCostBasedWideningDecision so that if we are storing a scalable vector out to a uniform address and the target supports scatter instructions, then we should use those instead. Tests have been added here: Transforms/LoopVectorize/AArch64/sve-inv-store.ll Differential Revision: https://reviews.llvm.org/D104624	2021-06-29 11:56:09 +01:00
Roman Lebedev	6cf6f6f65f	[NFC][InstCombine] foldAggregateConstructionIntoAggregateReuse(): cast to Instruction eagerly In all of these, the value must be an instruction for us to succeed anyway, so change it to maybe hopefully make further changes more straight-forward.	2021-06-29 13:29:18 +03:00
David Sherwood	dcfc2c3fac	[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable Avoid creating an IRBuilder stack variable with the same name as the class member.	2021-06-29 09:14:35 +01:00
Sanjay Patel	9d0bf7699c	[InstCombine] don't try to fold a constant expression that can trap (PR50906) We could use a bigger hammer and bail out on any constant expression, but there's a regression test that appears to validly do the transform (although it may not have been intending to check that optimization).	2021-06-28 17:00:21 -04:00
Joseph Huber	57ad2e1067	[OpenMP] Prevent OpenMPOpt from internalizing uncalled functions Currently OpenMPOpt will only check if a function is a kernel before deciding not to internalize it. Any uncalled function that gets internalized will be trivially dead in the module so this is unnecessary. Depends on D102423 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104890	2021-06-28 16:47:53 -04:00
Nikita Popov	7ac0442fe5	[SanitizerCoverage] Support opaque pointers Pass element type rather than pointer type to some functions, so we know which type to use for the global variables.	2021-06-28 22:18:42 +02:00
Arnold Schwaighofer	3dee1e8a84	[coro] Fix rematerializable instruction sinking to coro.suspend blocks There is a constraint that coro.suspend instructions need to be in their own blocks. The coro split pass initially creates IR that obeys this constraint (which is later checked). Sinking rematerializable instructions into these blocks breaks that constraint. Instead rematerialize in the predecessor block to the suspend's single predecessor block. Differential Revision: https://reviews.llvm.org/D104051	2021-06-28 09:37:45 -07:00
Nico Weber	540b4a5fb3	Revert "[DebugInfo] Enable variadic debug value salvaging" This reverts commit `adace79652`. Still breaks things, see comment on https://reviews.llvm.org/D91722	2021-06-28 11:25:09 -04:00
Reshabh Sharma	ae983de6cc	[InferAddressSpaces] NFC: For noop IntToPtr/PtrToInt pair cast to operator instead of PtrToInt Compiler crashes at an assertion while casting operands to PtrToIntInst at some cases when ptrtoint is present as an explicit operand to inttoptr. Explicit instruction operator as operand can not be casted to an Instruction. This patch replaces cast from PtrToInst to Operator which are later checked for constant expressions. Differential Revision: https://reviews.llvm.org/D105002	2021-06-28 19:24:26 +05:30
Joseph Huber	13b2fba239	[OpenMP][NFC] Fix typo in OpenMPOpt	2021-06-28 09:49:14 -04:00
Joseph Huber	4024087731	[OpenMP][NFC] Fix missing argument	2021-06-28 09:15:01 -04:00
Joseph Huber	4a6bd8e3e7	[OpenMP] Increase attributor iterations on the GPU Increase the number of attributor iterations on a GPU target. I forgot to change this in D104416. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104920	2021-06-28 08:50:49 -04:00
Kerry McLaughlin	f99672568f	[LoopVectorize] Fix strict reductions where VF = 1 Currently we will allow loops with a fixed width VF of 1 to vectorize if the -enable-strict-reductions flag is set. However, the loop vectorizer will not use ordered reductions if `VF.isScalar()` and the resulting vectorized loop will be out of order. This patch removes `VF.isVector()` when checking if ordered reductions should be used. Also, instead of converting the FAdds to reductions if the VF = 1, operands of the FAdds are changed such that the order is preserved. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D104533	2021-06-28 11:27:10 +01:00
Florian Hahn	80aa7e147e	[VPlan] Merge predicated-triangle regions, after sinking. Sinking scalar operands into predicated-triangle regions may allow merging regions. This patch adds a VPlan-to-VPlan transform that tries to merge predicate-triangle regions after sinking. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100260	2021-06-28 11:10:38 +01:00
Max Kazantsev	d58514d41c	[LSR][NFC] Make sure that after the canonicalization the formula is canonical	2021-06-28 12:50:04 +07:00
Max Kazantsev	7c73c2ede8	[LoopDeletion] Benefit from branches by undef conditions when symbolically executing 1st iteration We can exploit branches by `undef` condition. Frankly, the LangRef says that such branches are UB, so we can assume that all outgoing edges of such blocks are dead. However, from practical perspective, we know that this is not supported correctly in some other places. So we are being conservative about it. Branch by undef is treated in the following way: - If it is a loop-exiting branch, we always assume it exits the loop; - If not, we arbitrarily assume it takes `true` value. Differential Revision: https://reviews.llvm.org/D104689 Reviewed By: nikic	2021-06-28 11:39:46 +07:00
Nikita Popov	e81702912e	[DSE] Preserve address space Preserve address space when inserting i8* cast.	2021-06-27 20:26:00 +02:00
Nikita Popov	9aa951e80e	[MemCpyOpt] Preserve address space Preserve address space when generating the cast to i8*.	2021-06-27 20:21:19 +02:00
Nikita Popov	f00941e061	[DSE] Support opaque pointers For the start shortening optimization, always use a i8 type for the GEP, as it is a raw offset calculation. Handling of non-i8* memset/memcpy arguments requires insertion of casts. These cases were previously miscompiled, as the offset calculation was performed on the wrong type.	2021-06-27 17:41:40 +02:00
Nikita Popov	f025053977	[MemCpyOpt] Handle unusual memcpy element type Apparently, it is legal to use memcpy/memset with pointer types other than i8. Prior to `81fcdae68c` this case was silently miscompiled, as the i8 offset calculation was performed on some other type. Now it would crash due to a type mismatch. Fix this by inserting an explicit bitcast to i8.	2021-06-27 16:21:44 +02:00
Sanjay Patel	153da08a6c	[InstCombine] hoist min/max intrinsics above select with constant op This is an extension of the handling for unary intrinsics and follows the logic that we use for binary ops. We don't canonicalize to min/max intrinsics yet, but this might help unlock other folds seen in D98152.	2021-06-27 10:02:23 -04:00
Nikita Popov	81fcdae68c	[MemCpyOpt] Support opaque pointers	2021-06-27 15:52:38 +02:00
Nikita Popov	a9129f8964	[LoadStoreVectorizer] Support opaque pointers There are remaining redundant bitcasts.	2021-06-27 15:42:16 +02:00
Florian Hahn	f1a6430272	[VPlan] Track both incoming values for first-order recurrence phis. This patch updates VPWidenPHI recipes for first-order recurrences to also track the incoming value from the back-edge. Similar to D99294, which did the same for reductions. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104197	2021-06-27 14:29:35 +01:00
Florian Hahn	7f36981977	[LV] Adjust trip count based on IsOrdered in widenPHIInstruction (NFC). Suggested in D104197, avoids the early exit.	2021-06-26 13:13:25 +01:00
Andrew Browne	45f6d5522f	[DFSan] Change shadow and origin memory layouts to match MSan. Previously on x86_64: +--------------------+ 0x800000000000 (top of memory) \| application memory \| +--------------------+ 0x700000008000 (kAppAddr) \| \| \| unused \| \| \| +--------------------+ 0x300000000000 (kUnusedAddr) \| origin \| +--------------------+ 0x200000008000 (kOriginAddr) \| unused \| +--------------------+ 0x200000000000 \| shadow memory \| +--------------------+ 0x100000008000 (kShadowAddr) \| unused \| +--------------------+ 0x000000010000 \| reserved by kernel \| +--------------------+ 0x000000000000 MEM_TO_SHADOW(mem) = mem & ~0x600000000000 SHADOW_TO_ORIGIN(shadow) = kOriginAddr - kShadowAddr + shadow Now for x86_64: +--------------------+ 0x800000000000 (top of memory) \| application 3 \| +--------------------+ 0x700000000000 \| invalid \| +--------------------+ 0x610000000000 \| origin 1 \| +--------------------+ 0x600000000000 \| application 2 \| +--------------------+ 0x510000000000 \| shadow 1 \| +--------------------+ 0x500000000000 \| invalid \| +--------------------+ 0x400000000000 \| origin 3 \| +--------------------+ 0x300000000000 \| shadow 3 \| +--------------------+ 0x200000000000 \| origin 2 \| +--------------------+ 0x110000000000 \| invalid \| +--------------------+ 0x100000000000 \| shadow 2 \| +--------------------+ 0x010000000000 \| application 1 \| +--------------------+ 0x000000000000 MEM_TO_SHADOW(mem) = mem ^ 0x500000000000 SHADOW_TO_ORIGIN(shadow) = shadow + 0x100000000000 Reviewed By: stephan.yichao.zhao, gbalats Differential Revision: https://reviews.llvm.org/D104896	2021-06-25 17:00:38 -07:00
Nikita Popov	fdd4c199a1	Revert "[InstCombine] Make indexed compare fold opaque ptr compatible" This reverts commit `5cb20ef8a2`. Assertion failures with this patch were reported on https://reviews.llvm.org/rG5cb20ef8a235, revert for now.	2021-06-26 00:32:59 +02:00
Eli Friedman	8d5bf0709d	[NFC] Prefer ConstantRange::makeExactICmpRegion over makeAllowedICmpRegion The implementation is identical, but it makes the semantics a bit more obvious.	2021-06-25 14:43:13 -07:00
Juneyoung Lee	1605593440	[SimplifyLibCalls] Fix memchr opt to use CreateLogicalAnd This fixes a bug at LibCallSimplifier::optimizeMemChr which does the following transformation: ``` // memchr("\r\n", C, 2) != nullptr -> (1 << C & ((1 << '\r') \| (1 << '\n'))) // != 0 // after bounds check. ``` As written above, a bounds check on C (whether it is less than integer bitwidth) is done before doing `1 << C` otherwise 1 << C will overflow. If the bounds check is false, the result of (1 << C & ...) must not be used at all, otherwise the result of shift (which is poison) will contaminate the whole results. A correct way to encode this is `select i1 (bounds check), (1 << C & ...), false` because select does not allow the unused operand to contaminate the result. However, this optimization was introducing `and (bounds check), (1 << C & ...)` which cannot do that. The bug was found from compilation of this C++ code: https://reviews.llvm.org/rG2fd3037ac615#1007197 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D104901	2021-06-26 05:59:35 +09:00
Joseph Huber	5ccb7424fa	[OpenMP] Change OpenMPOpt to check openmp metadata The metadata added in D102361 introduces a module flag that we can check to determine if the module was compiled with `-fopenmp` enabled. We can now check for the presence of this instead of scanning the call graph for OpenMP runtime functions. Depends on D102361 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102423	2021-06-25 16:34:22 -04:00
Hongtao Yu	3638085ff0	[Coroutines] Define __coro_frame_ty in function scope Types should be defined in function scope instead of a local lexical scope. Field types should be defined inside their parent type scope. We were seeing a type defined in a local scope causing trouble for the DWARF emitter, where a context is required to be a function scope, a namespace or a global scope. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D104937	2021-06-25 12:33:20 -07:00
Florian Hahn	cc5ee857f9	[LV] Doxygenize VectorizationFactor member comments (NFC). Minor cleanup for follow-up patch.	2021-06-25 18:35:00 +01:00
Philip Reames	2cd23eb243	[instcombine] Fold overflow check using umulo to comparison If we have a umul.with.overflow where the multiply result is not used and one of the operands is a constant, we can perform the overflow check more cheaply with a comparison than by performing the multiply and extracting the overflow flag. (Noticed when looking at the conditions SCEV emits for overflow checks.) Differential Revision: https://reviews.llvm.org/D104665	2021-06-25 10:25:45 -07:00
Florian Hahn	91053e327c	[LV] Reflow comment for VectorizationCostTy (NFC).	2021-06-25 14:20:06 +01:00
Arthur Eubanks	1aa02b37e7	Revert "[BuildLibCalls/SimplifyLibCalls] Fix attributes on created CallInst instructions." This reverts commit `1eda5453f2`. Causes https://crbug.com/1223647: Incompatible argument and return types for 'returned' attribute tail call void @llvm.memset.p0i8.i64(i8* noalias noundef returned writeonly align 1 dereferenceable(255) %arraydecay, i8 0, i64 255, i1 false), !dbg !985	2021-06-24 19:24:34 -07:00
Nikita Popov	5cb20ef8a2	[InstCombine] Make indexed compare fold opaque ptr compatible Rather than relying on pointer type equality (which, for a change, is silently incorrect with opaque pointers) check that the GEP source element types match.	2021-06-24 22:33:01 +02:00
Arthur Eubanks	7110510eca	[WPD] Don't optimize calls more than once WPD currently assumes that there is a one to one correspondence between type test assume sequences and virtual calls. However, with -fstrict-vtable-pointers this may not be true. This ends up causing crashes when we try to optimize a virtual call more than once (applyUniformRetValOpt()/applyUniqueRetValOpt()/applyVirtualConstProp()/applySingleImplDevirt()). applySingleImplDevirt() actually didn't previously crash because it would replace the devirtualized call with the same direct call. Adding an assert that the call is indirect causes the corresponding test to crash with the rest of the patch. This makes Chrome successfully build with -fstrict-vtable-pointers + WPD. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D104798	2021-06-24 13:28:09 -07:00
Nikita Popov	8e0ff44bf8	[InstCombine] Make varargs cast transform compatible with opaque ptrs The whole transform can be dropped once we have fully transitioned to opaque pointers (as its purpose is to remove no-op pointer casts). For now, make sure that it handles opaque pointers correctly.	2021-06-24 21:57:05 +02:00
Jonas Paulsson	1eda5453f2	[BuildLibCalls/SimplifyLibCalls] Fix attributes on created CallInst instructions. - When emitting libcalls, do not only pass the calling convention from the function prototype but also the attributes. - Do not pass attributes from e.g. libc memcpy to llvm.memcpy. Review: Reid Kleckner, Eli Friedman, Arthur Eubanks Differential Revision: https://reviews.llvm.org/D103992	2021-06-24 14:47:24 -05:00
Roman Lebedev	d064182612	[SimplifyCFG] Tail-merging all blocks with `resume` terminator Similar to what we already do for `ret` terminators. As noted by @rnk, clang seems to already generate a single `ret`/`resume`, so this isn't likely to cause widespread changes. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D104849	2021-06-24 21:25:06 +03:00
Florian Hahn	833bdbe93c	[LV] Support sinking recipe in replicate region after another region. This patch handles sinking a replicate region after another replicate region. In that case, we can connect the sink region after the target region. This properly handles the case for which an assertion has been added in `337d765282`. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34842. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D103514	2021-06-24 13:58:42 +01:00
Stephen Tozer	adace79652	[DebugInfo] Enable variadic debug value salvaging This patch enables the salvaging of debug values that may be calculated from more than one SSA value, such as with binary operators that do not use a constant argument. The actual functionality for this behaviour is added in a previous commit (`c7270567`), but with the ability to actually emit the resulting debug values switched off. The reason for this is that the prior patch has been reverted several times due to issues discovered downstream, some time after the actual landing of the patch. The patch in question is rather large and touches several widely used header files, and all issues discovered are more related to the handling of variadic debug values as a whole rather than the details of the patch itself. Therefore, to minimize the build time impact and risk of conflicts involved in any potential future revert/reapply of that patch, this significantly smaller patch (that touches no header files) will instead be used as the capstone to enable variadic debug value salvaging. The review linked to this patch is mostly implemented by the previous commit, `c7270567`, but also contains the changes in this patch. Differential Revision: https://reviews.llvm.org/D91722	2021-06-24 13:16:29 +01:00
Roman Lebedev	9c4c2f2472	[SimplifyCFG] Tail-merging all blocks with `ret` terminator Based on top of D104598, which is an NFCI-ish refactoring. Here, a restriction, that only empty blocks can be merged, is lifted. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D104597	2021-06-24 13:15:39 +03:00
Stephen Tozer	c72705678c	Partial Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands" This is a partial reapply of the original commit and the followup commit that were previously reverted; this reapply also includes a small fix for a potential source of non-determinism, but also has a small change to turn off variadic debug value salvaging, to ensure that any future revert/reapply steps to disable and re-enable this feature do not risk causing conflicts. Differential Revision: https://reviews.llvm.org/D91722 This reverts commit `386b66b2fc`.	2021-06-24 09:46:38 +01:00
Zequan Wu	9393894331	Revert "ThinLTO: Fix inline assembly references to static functions with CFI" This causes a compiler crash: Assertion `materialized_use_empty() && "Uses remain when a value is destroyed!"' This reverts commit `e3d24b45b8`.	2021-06-23 19:24:56 -07:00
Evgenii Stepanov	78f7e6d8d7	[hwasan] Respect llvm.asan.globals. This enables the no_sanitize C++ attribute to exclude globals from hwasan testing, and automatically excludes other sanitizers' globals (such as ubsan location descriptors). Differential Revision: https://reviews.llvm.org/D104825	2021-06-23 18:37:00 -07:00
Nikita Popov	8321335fd8	[InstCombine] Use getFunctionType() Avoid fetching pointer element type...	2021-06-23 20:28:34 +02:00
Sami Tolvanen	e3d24b45b8	ThinLTO: Fix inline assembly references to static functions with CFI Create an internal alias with the original name for static functions that are renamed in promoteInternals to avoid breaking inline assembly references to them. This relands commit `4474958d3a` with a fix to a use-of-uninitialized-value error that tripped MemorySanitizer. Link: https://github.com/ClangBuiltLinux/linux/issues/1354 Reviewed By: nickdesaulniers, pcc Differential Revision: https://reviews.llvm.org/D104058	2021-06-23 10:56:13 -07:00
Kuter Dinel	5d44d56f7d	[Attributor] Derive AAFunctionReachability attribute. This attribute uses Attributor's internal 'optimistic' call graph information to answer queries about function call reachability. Functions can become reachable over time as new call edges are discovered. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104599	2021-06-23 20:43:10 +03:00
Nikita Popov	00d3f7cc3c	[LAA] Make getPointersDiff() API compatible with opaque pointers Make getPointersDiff() and sortPtrAccesses() compatible with opaque pointers by explicitly passing in the element type instead of determining it from the pointer element type. The SLPVectorizer result is slightly non-optimal in that unnecessary pointer bitcasts are added. Differential Revision: https://reviews.llvm.org/D104784	2021-06-23 18:44:34 +02:00
Datta Nagraj	ad0085d338	[InstCombine] Eliminate casts to optimize ctlz operation If a ctlz operation is performed on higher datatype and then downcasted, then this can be optimized by doing a ctlz operation on a lower datatype and adding the difference bitsize to the result of ctlz to provide the same output: https://alive2.llvm.org/ce/z/8uup9M The original problem is shown in https://llvm.org/PR50173 Differential Revision: https://reviews.llvm.org/D103788	2021-06-23 11:19:12 -04:00
Sanjay Patel	1e9b6b89a7	[InstCombine] convert FP min/max with negated op to fabs This is part of improving floating-point patterns seen in: https://llvm.org/PR39480 We don't require any FMF because the 2 potential corner cases (-0.0 and NaN) are correctly handled without FMF: 1. -0.0 is treated as strictly less than +0.0 with maximum/minimum, so fabs/fneg work as expected. 2. +/- 0.0 with maxnum/minnum is indeterminate, so transforming to fabs/fneg is more defined. 3. The sign of a NaN may be altered by this transform, but that is allowed in the default FP environment. If there are FMF, they are propagated from the min/max call to one or both new operands which seems to agree with Alive2: https://alive2.llvm.org/ce/z/bem_xC	2021-06-23 10:41:39 -04:00
Roman Lebedev	ff4b1d379f	[NFCI-ish][SimplifyCFGPass] Rework and generalize `ret` block tail-merging This changes the approach taken to tail-merge the blocks to always create a new block instead of trying to reuse some block, and generalizes it to support terminators other than just `ret` in the future. This effectively lifts the CallBr restriction, although this isn't really intentional. That is the only non-NFC change here, I'm not sure if it's reasonable/feasible to temporarily retain it. Other restrictions of the transform remain. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D104598	2021-06-23 14:33:18 +03:00
Joe Ellis	3c4dbf6ea9	[Verifier] Fail on overrunning and invalid indices for {insert,extract} vector intrinsics With regards to overrunning, the langref (llvm/docs/LangRef.rst) specifies: (llvm.experimental.vector.insert) Elements ``idx`` through (``idx`` + num_elements(``subvec``) - 1) must be valid ``vec`` indices. If this condition cannot be determined statically but is false at runtime, then the result vector is undefined. (llvm.experimental.vector.extract) Elements ``idx`` through (``idx`` + num_elements(result_type) - 1) must be valid vector indices. If this condition cannot be determined statically but is false at runtime, then the result vector is undefined. For the non-mixed cases (e.g. inserting/extracting a scalable into/from another scalable, or inserting/extracting a fixed into/from another fixed), it is possible to statically check whether or not the above conditions are met. This was previously missing from the verifier, and if the conditions were found to be false, the result of the insertion/extraction would be replaced with an undef. With regards to invalid indices, the langref (llvm/docs/LangRef.rst) specifies: (llvm.experimental.vector.insert) ``idx`` represents the starting element number at which ``subvec`` will be inserted. ``idx`` must be a constant multiple of ``subvec``'s known minimum vector length. (llvm.experimental.vector.extract) The ``idx`` specifies the starting element number within ``vec`` from which a subvector is extracted. ``idx`` must be a constant multiple of the known-minimum vector length of the result type. Similarly, these conditions were not previously enforced in the verifier. In some circumstances, invalid indices were permitted silently, and in other circumstances, an undef was spawned where a verifier error would have been preferred. This commit adds verifier checks to enforce the constraints above. Differential Revision: https://reviews.llvm.org/D104468	2021-06-23 10:33:22 +00:00
Max Kazantsev	842b4c83cb	[LoopDeletion] Exploit undef Phi inputs when symbolically executing 1st iteration Follow-up on Roman's idea expressed in D103959. - If a Phi has undefined inputs from live blocks: - and no other inputs, assume it is undef itself; - and exactly one non-undef input, we can assume that all undefs are equal to this input. Differential Revision: https://reviews.llvm.org/D104618 Reviewed By: lebedev.ri, nikic	2021-06-23 11:53:48 +07:00
Max Kazantsev	b7d2c173eb	[LSR] Filter out zero factors. PR50765 Zero factor leads to division by zero and failure of corresponding assert as shown in PR50765. We should filter out such factors. Differential Revision: https://reviews.llvm.org/D104702 Reviewed By: huihuiz, reames	2021-06-23 10:43:06 +07:00
Jon Roelofs	493d6928fe	[Remarks] Make memsize remarks report as an analysis, not a missed opportunity. Differential revision: https://reviews.llvm.org/D104078	2021-06-22 18:22:47 -07:00
Liqiang Tao	a0d96fdd3a	[llvm][Inliner] Make PriorityInlineOrder lazily updated This patch makes PriorityInlineOrder lazily updated. The PriorityInlineOrder would lazily update the desirability of a call site if it's decreasing. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D104654	2021-06-23 08:59:53 +08:00
Joseph Huber	1cfdcae653	[Attributor] Fix AAExecutionDomain returning true on invalid states This patch fixes a problem with the AAExecutionDomain attributor not checking if it is in a valid state. This can cause it to incorrectly return that a block is executed in a single threaded context after the attributor failed for any reason. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D103186	2021-06-22 18:12:43 -04:00
Joseph Huber	44feacc736	[OpenMP] Change remaining globalization from an analysis remark to missed After landing the globalization optimizations, the presence of globalization on the device that was not put in shared or stack memory is a failed optimization with performance consequences so it should indicate a missed remark. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104735	2021-06-22 16:52:06 -04:00
Nikita Popov	7bb7fa12e7	[OpaquePtr] Support changing load type in InstCombine When the load type is changed to ptr, we need the load pointer type to also be ptr, because it's not allowed to create a pointer to an opaque pointer. This is achieved by adjusting the getPointerTo() API to return an opaque pointer for an opaque pointer base type. Differential Revision: https://reviews.llvm.org/D104718	2021-06-22 21:16:15 +02:00
Sami Tolvanen	33c9438f11	Revert "ThinLTO: Fix inline assembly references to static functions with CFI" This reverts commit `4474958d3a`. Breaks check-llvm on Mac.	2021-06-22 12:10:58 -07:00
Joseph Huber	ca1560da72	[OpenMP][NFC] Add new optimizations to OpenMPOpt comment header Summary: Adds mentions to the new globalization optimizations added to the OpenMPOpt comment header.	2021-06-22 14:40:31 -04:00
Joseph Huber	b54ccab509	[Attributor] Add an option to increase the max number of iterations Right now the Attributor defaults to 32 fixed point iterations unless it is set explicitly by a command line flag. This patch allows this to be configured when the attributor instance is created. The maximum is then increased in OpenMPOpt if the target is a kernel. This is because the globalization analysis can result in larger iteration counts due to many dependent instances running at once. Depends on D102444 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104416	2021-06-22 14:38:25 -04:00
Sanjay Patel	b1f6ef92ec	[InstCombine] reduce code duplication for FP min/max with casts fold; NFC	2021-06-22 14:15:04 -04:00
Joseph Huber	30e36c9b3c	[Attributor] Add interface to emit remarks in Attributor Summary: This patch adds support for the Attributor to emit remarks on behalf of some other pass. The attributor can now optionally take a callback function that returns an OptimizationRemarkEmitter object when given a Function pointer. If this is available then a remark will be emitted for the corresponding pass name. Depends on D102197 Reviewed By: sstefan1 thegameg Differential Revision: https://reviews.llvm.org/D102444	2021-06-22 14:12:46 -04:00
Joseph Huber	7d69da71dd	[OpenMP] Enable HeapToStack conversion in OpenMPOpt for new RTL globalization calls Summary: The changes to globalization introduced in D97680 introduce a large amount of overhead by default. The old globalization method would always ignore globalization code if executing in SPMD mode. This wasn't strictly correct as data sharing is still possible in SPMD mode. The new interface is correct but introduces globalization code even when unnecessary. This optimization will use the existing HeapToStack transformation in the attributor to allow for unneeded globalization to be replaced with thread-private stack memory. This is done using the newly introduced library instances for the RTL functions added in D102087. Depends on D97818 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102197	2021-06-22 13:23:05 -04:00
Sami Tolvanen	4474958d3a	ThinLTO: Fix inline assembly references to static functions with CFI Create an internal alias with the original name for static functions that are renamed in promoteInternals to avoid breaking inline assembly references to them. Link: https://github.com/ClangBuiltLinux/linux/issues/1354 Reviewed By: pcc Differential Revision: https://reviews.llvm.org/D104058	2021-06-22 10:01:55 -07:00
Joseph Huber	03d7e61c87	[OpenMP] Internalize functions in OpenMPOpt to improve IPO passes Summary: Currently the attributor needs to give up if a function has external linkage. This means that the optimization introduced in D97818 will only apply to static functions. This change uses the Attributor to internalize OpenMP device routines by making a copy of each function with private linkage and replacing the uses in the module with it. This allows for the optimization to be applied to any regular function. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102824	2021-06-22 12:38:10 -04:00
Joseph Huber	6fc51c9f7d	[OpenMP] Replace GPU globalization calls with shared memory in the middle-end Summary: The changes introduced in D97680 create a simpler interface to code that needs to be globalized. This interface is used to simplify the globalization calls in the middle end. We can check any globalization call that is only called by a single thread in the team and replace it with a static shared memory buffer. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97818	2021-06-22 11:55:44 -04:00
Nikita Popov	e790d3667e	[OpaquePtr] Handle addrspacecasts in InstCombine This adds support for addrspace casts involving opaque pointers to InstCombine, as well as the isEliminableCastPair() helper (otherwise the assertion failure would just move there). Add PointerType::hasSameElementTypeAs() to hide the element type details. Differential Revision: https://reviews.llvm.org/D104668	2021-06-22 17:45:30 +02:00
Jingu Kang	873ff5a728	[SimpleLoopUnswitch] Fix a bug in ComputeUnswitchedCost with partial unswitch There was a bug in the cost calculation for partially invariant unswitch. The costs of non-duplicated blocks are subtracted from the total LoopCost, so anything that is duplicated should not be counted. Differential Revision: https://reviews.llvm.org/D103816	2021-06-22 16:18:00 +01:00
Joseph Huber	68d133a3e8	[OpenMP] Simplify GPU memory globalization Summary: Memory globalization is required to maintain OpenMP standard semantics for data sharing between worker and master threads. The GPU cannot share data between its threads so must allocate global or shared memory to store the data in. Currently this is implemented fully in the frontend using the `__kmpc_data_sharing_push_stack` and `__kmpc_data_sharing_pop_stack` functions to emulate standard CPU stack sharing. The front-end scans the target region for variables that escape the region and must be shared between the threads. Each variable then has a field created for it in a global record type. This patch replaces this functionality with a single allocation command, effectively mimicking an alloca instruction for the variables that must be shared between the threads. This will be much slower than the current solution, but makes it much easier to optimize as we can analyze each variable independently and determine if it is not captured. In the future, we can replace these calls with an `alloca` and small allocations can be pushed to shared memory. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D97680	2021-06-22 10:52:46 -04:00
Max Kazantsev	4c4f1ae93e	Re-land "[LoopDeletion] Handle Phis with similar inputs from different blocks" Patch was reverted due to a bug that existed before it and was exposed by it. Returning after the underlying bug has been fixed. Differential Revision: https://reviews.llvm.org/D103959	2021-06-22 12:28:46 +07:00
Max Kazantsev	575253887b	[LoopDeletion] Require loop to have a predecessor when executing 1st iteration symbolically Two predecessors break the further logic, and the loop may come to the opt in non-canonicalized state.	2021-06-22 12:20:55 +07:00
Nikita Popov	39796e1ad0	Reapply [InstCombine] Don't try converting opaque pointer bitcast to GEP Reapplied without changes -- this was reverted together with an underlying patch. ----- Bitcasts having opaque pointer source or result type cannot be converted into a zero-index GEP, GEP source and result types always have the same opaque-ness.	2021-06-21 22:15:56 +02:00
Nikita Popov	e2c2124a4b	Reapply [InstCombine] Extract bitcast -> gep transform Relative to the original patch, an InstCombine test has been added to show a previously missed pattern, and the Coroutine test that resulted in the revert has been regenerated. ----- Move this into a separate function, to make sure that early returns do not accidentally skip other transforms. This previously happened for the isSized() check, which skipped folds like distributing a bitcast over a select.	2021-06-21 22:03:15 +02:00
Nikita Popov	6922ab73a5	Revert "[InstCombine] Extract bitcast -> gep transform" This reverts commit `d9f5d7b959`. This reverts commit `5780611d7e`. This causes a failure in Coroutine tests.	2021-06-21 21:34:17 +02:00
Nikita Popov	862313cf59	[LoopUnroll] Don't modify TripCount/TripMultiple in computeUnrollCount() (NFCI) As these are no longer passed to UnrollLoop(), there is no need to modify them in computeUnrollCount(). Make them non-reference parameters. Differential Revision: https://reviews.llvm.org/D104590	2021-06-21 21:34:17 +02:00
Alexey Bataev	908b753661	[SLP]Improve vectorization of PHI instructions. Perform better analysis when trying to vectorize PHIs. 1. Do not try to vectorize vector PHIs. 2. Do deeper analysis for more profitable nodes for the vectorization. Previously, we just tried to vectorize PHIs of the same type. This patch improves on that and tries to vectorize PHIs whose incoming values come from the same basic block and have the same and/or alternative opcodes. This saves compile time and provides better vectorization results in general. Part of D57059. Differential Revision: https://reviews.llvm.org/D103638	2021-06-21 12:26:24 -07:00
Nikita Popov	5780611d7e	[InstCombine] Don't try converting opaque pointer bitcast to GEP Bitcasts having opaque pointer source or result type cannot be converted into a zero-index GEP, GEP source and result types always have the same opaque-ness.	2021-06-21 21:24:50 +02:00
Nikita Popov	d9f5d7b959	[InstCombine] Extract bitcast -> gep transform Move this into a separate function, to make sure that early returns do not accidentally skip other transforms. There is already one isSized() check that could run into this issue, thus this change is not strictly NFC.	2021-06-21 21:24:50 +02:00
Nikita Popov	a969bdc56f	[InstCombine] Remove unnecessary address space check (NFC) It's not possible to bitcast between different address spaces, and this is ensured by the IR verifier. As such, this bitcast to addrspacecast canonicalization can never be hit.	2021-06-21 20:11:39 +02:00
Nathan Chancellor	f52666985d	Revert "[LoopDeletion] Handle Phis with similar inputs from different blocks" This reverts commit `bb1dc876eb`. This patch causes an assertion failure when building an arm64 defconfig Linux kernel. See https://reviews.llvm.org/D103959 for a link to the original bug report and a reduced reproducer.	2021-06-21 10:18:55 -07:00
Sanjay Patel	198b79caae	[InstCombine] move bitmanipulation-of-select folds This is no outwardly-visible-difference-intended, but it is obviously better to have all transforms for an intrinsic housed together since we already have helper functions in place. It is also potentially more efficient to zap a simple pattern match before trying to do expensive computeKnownBits() calls.	2021-06-21 11:32:16 -04:00
Sanjay Patel	64b2676ca8	[InstCombine] fold ctlz/cttz-of-select with 1 or more constant arms Building on: `4c44b02d87` ...and adding handling for the extra operand in these intrinsics. This pattern is discussed in: https://llvm.org/PR50140	2021-06-21 11:04:12 -04:00
Nikita Popov	80e0424b2c	[Mem2Reg] Use poison for unreachable cases Use poison instead of undef for cases dealing with unreachable code. This still leaves the more interesting case of "load from uninitialized memory" as undef.	2021-06-21 10:54:13 +02:00
Juneyoung Lee	c038845f58	[InstCombine] Fold icmp (select c,const,arg), null if icmp arg, null can be simplified This patch folds icmp (select c,const,arg), null if icmp arg, null can be simplified. Resolves llvm.org/pr48975. Reviewed By: nikic, xbolva00 Differential Revision: https://reviews.llvm.org/D96663	2021-06-21 17:39:05 +09:00
Sjoerd Meijer	342bbb7832	[FuncSpec] Don't specialise functions with NoDuplicate instructions. getSpecializationCost was returning INT_MAX for a case when specialisation shouldn't happen, but this wasn't properly checked if specialisation was forced. Differential Revision: https://reviews.llvm.org/D104461	2021-06-21 09:02:11 +01:00
Max Kazantsev	bb1dc876eb	[LoopDeletion] Handle Phis with similar inputs from different blocks This patch lifts the requirement to have the only incoming live block for Phis. There can be multiple live blocks if the same value comes to phi from all of them. Differential Revision: https://reviews.llvm.org/D103959 Reviewed By: nikic, lebedev.ri	2021-06-21 11:37:06 +07:00
Juneyoung Lee	ce192ced2b	[InstCombine] Use poison constant to represent the result of unreachable instrs This patch updates InstCombine to use poison constant to represent the resulting value of (either semantically or syntactically) unreachable instrs, or a don't-care value of an unreachable store instruction. This allows more aggressive folding of unused results, as shown in llvm/test/Transforms/InstCombine/getelementptr.ll . Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D104602	2021-06-21 09:58:44 +09:00
Nikita Popov	1ae266f452	[LoopUnroll] Use smallest exact trip count from any exit This is a more general alternative/extension to D102635. Rather than handling the special case of "header exit with non-exiting latch", this unrolls against the smallest exact trip count from any exit. The latch exit is no longer treated as privileged when it comes to full unrolling. The motivating case is in full-unroll-one-unpredictable-exit.ll. Here the header exit is an IV-based exit, while the latch exit is a data comparison. This kind of loop does not get rotated, because the latch is already exiting, and loop rotation doesn't try to distinguish IV-based/analyzable latches. Differential Revision: https://reviews.llvm.org/D102982	2021-06-20 20:58:26 +02:00
David Green	a24b02193a	[DSE] Remove stores in the same loop iteration DSE will currently only remove stores in the same block unless they can be guaranteed to be loop invariant. This expands that to any stores that are in the same Loop, at the same loop level. This should still account for where AA/MSSA will not handle aliasing between loops, but allow the dead stores to be removed where they overlap in the same loop iteration. It requires adding loop info to DSE, but that looks fairly harmless. The test case this helps is from code like this, which can come up in certain matrix operations: for(i=..) dst[i] = 0; for(j=..) dst[i] += src[in+j]; After LICM, this becomes: for(i=..) dst[i] = 0; sum = 0; for(j=..) sum += src[in+j]; dst[i] = sum; The first store is dead, and with this patch is now removed. Differential Revision: https://reviews.llvm.org/D100464	2021-06-20 17:03:30 +01:00
Sanjay Patel	4c44b02d87	[InstCombine] fold ctpop-of-select with 1 or more constant arms The general pattern is mentioned in: https://llvm.org/PR50140 ...but we need to do a bit more to handle intrinsics with extra operands like ctlz/cttz.	2021-06-20 11:28:45 -04:00
Sanjay Patel	240acb0cff	[InstCombine] avoid infinite loops with select folds of constant expressions This pair of transforms was added recently with: `8591640379` And could lead to conflicting folds: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35399	2021-06-20 09:46:25 -04:00
Roman Lebedev	c5b7335dc8	[SimplifyCFG] FoldTwoEntryPHINode(): don't fold if either block has its address taken Same as with HoistThenElseCodeToIf() (`ad87761925`).	2021-06-20 12:37:14 +03:00
Roman Lebedev	ad87761925	[SimplifyCFG] HoistThenElseCodeToIf(): don't hoist if either block has its address taken This problem is exposed by D104598, after it tail-merges `ret` in `@test_inline_constraint_S_label`, the verifier would start complaining `invalid operand for inline asm constraint 'S'`. Essentially, taking address of a block is mismodelled in IR. It should probably be an explicit instruction, a first one in block, that isn't identical to any other instruction of the same type, so that it can't be hoisted.	2021-06-20 12:18:15 +03:00
Nikita Popov	1bd4085e0b	[LoopUnroll] Push runtime unrolling decision up into tryToUnrollLoop() Currently, UnrollLoop() is passed an AllowRuntime flag and decides itself whether runtime unrolling should be used or not. This patch pushes the decision into the caller and allows us to eliminate the ULO.TripCount and ULO.TripMultiple parameters. Differential Revision: https://reviews.llvm.org/D104487	2021-06-19 09:25:57 +02:00
Liqiang Tao	671a87104b	[llvm][Inliner] Add an optional PriorityInlineOrder This patch adds an optional PriorityInlineOrder, which uses a heap to order inlining. A call site whose size is smaller has a higher priority. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D104028	2021-06-19 10:17:32 +08:00
Guozhi Wei	575ba6f425	[InstCombine] Don't transform code if DoTransform is false In patch https://reviews.llvm.org/D72396, it doesn't check DoTransform before transforming the code, and generates wrong result for the attached test case. Differential Revision: https://reviews.llvm.org/D104567	2021-06-18 18:01:34 -07:00
Fangrui Song	3307240f05	[InstrProfiling][ELF] Make __profd_ private if the function does not use value profiling On ELF, the D1003372 optimization can apply to more cases. There are two prerequisites for making `__profd_` private: * `__profc_` keeps `__profd_` live under compiler/linker GC * `__profd_` is not referenced by code The first is satisfied because all counters/data are in a section group (either `comdat any` or `comdat noduplicates`). The second requires that the function does not use value profiling. Regarding the second point: `__profd_` may be referenced by other text sections due to inlining. There will be a linker error if a prevailing text section references the non-prevailing local symbol. With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`) clang is 4.2% smaller (1-169620032/177066968). `stat -c %s */.o | awk '{s+=$1}END{print s}'` is 2.5% smaller. Reviewed By: davidxl, rnk Differential Revision: https://reviews.llvm.org/D103717	2021-06-18 17:01:17 -07:00
Hongtao Yu	bd52495518	[CSSPGO] Undoing the concept of dangling pseudo probe As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the dangling probe related code in both the compiler and llvm-profgen. I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Cinder. Certain benchmarks such as 602.gcc have a 20% size win. No obvious difference seen on build time for SPEC2017 and Cinder. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D104477	2021-06-18 15:14:11 -07:00
Nikita Popov	3308205ae9	[LoopUnroll] Simplify optimization remarks Remove dependence on ULO.TripCount/ULO.TripMultiple from ORE and debug code. For debug code, print information about all exits. For optimization remarks, only include the unroll count and the type of unroll (complete, partial or runtime), but omit detailed information about exit folding, now that more than one exit may be folded. Differential Revision: https://reviews.llvm.org/D104482	2021-06-18 23:47:03 +02:00
Nick Desaulniers	bef2992861	[GCOVProfiling] don't profile Fn's w/ noprofile attribute Similar to D104475, the Linux kernel would like to avoid compiler generated code in certain functions. The no_profile function attribute can be used in C to generate the noprofile fn attr in IR. Respect that from GCOVProfiling. Link: https://lore.kernel.org/lkml/CAKwvOdmPTi93n2L0_yQkrzLdmpxzrOR7zggSzonyaw2PGshApw@mail.gmail.com/ Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D104257	2021-06-18 13:58:34 -07:00
Andrew Browne	14407332de	[DFSan] Cleanup code for platforms other than Linux x86_64. These other platforms are unsupported and untested. They could be re-added later based on MSan code. Reviewed By: gbalats, stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D104481	2021-06-18 11:21:46 -07:00
Liqiang Tao	93183a41b9	Revert D104028 "[llvm][Inliner] Add an optional PriorityInlineOrder"	2021-06-18 18:52:00 +08:00
Max Kazantsev	de92287cf8	[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration (try 3) This patch handles one particular case of one-iteration loops for which SCEV cannot straightforwardly prove BECount = 1. The idea of the optimization is to symbolically execute conditional branches on the 1st iteration, moving in topological order, and only visiting blocks that may be reached on the first iteration. If we find out that we never reach the header via the latch, then the backedge can be broken. This implementation uses InstSimplify. The SCEV version was rejected due to high compile time impact. Differential Revision: https://reviews.llvm.org/D102615 Reviewed By: nikic	2021-06-18 17:31:57 +07:00
Haojian Wu	3f5d53a525	[Attributor] Fix UB behavior on uninitialized bool variables. Found by ASAN.	2021-06-18 11:49:42 +02:00
Daniil Seredkin	6643e51d79	[InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X) InstCombine didn't perform (sext bool X) * (sext bool X) --> zext (and X, X) which can result in just (zext X). The patch adds regression tests to check this transformation and adds a check for equality of mul's operands for that case. Differential Revision: https://reviews.llvm.org/D104193	2021-06-18 16:28:06 +07:00
Liqiang Tao	a740b707d1	[llvm][Inliner] Add an optional PriorityInlineOrder This patch adds an optional PriorityInlineOrder, which uses a heap to order inlining. A call site with a smaller size gets a higher priority. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D104028	2021-06-18 16:55:38 +08:00
Max Kazantsev	fa5eb22ad4	[NFC] Assert non-zero factor before division This is to ensure that zero denominator leads to controlled assertion failure rather than UB.	2021-06-18 15:50:50 +07:00
Haojian Wu	7670938bba	[Attributor] Don't print the call-graph in a hard-coded file. This does not look like a practical pattern in our codebase (it could fail in some sandbox environments). Instead we print it to standard output, controlled by the -attributor-print-call-graph flag; this follows a pattern similar to attributor-print-dep.	2021-06-18 09:38:07 +02:00
Daniil Seredkin	6de741de08	Revert "[InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X)" This reverts commit `31053338c9`.	2021-06-18 14:21:02 +07:00
Daniil Seredkin	31053338c9	[InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X) InstCombine didn't perform (sext bool X) * (sext bool X) --> zext (and X, X) which can result in just (zext X). The patch adds regression tests to check this transformation and adds a check for equality of mul's operands for that case. Differential Revision: https://reviews.llvm.org/D104193	2021-06-18 14:12:00 +07:00
Fangrui Song	5798be8458	Revert D103717 "[InstrProfiling] Make __profd_ unconditionally private for ELF" This reverts commit `76d0747e08`. If a group has `__llvm_prf_vals` due to static value profiler counters (`NS!=0`), we cannot make `__llvm_prf_data` private, because a prevailing text section may reference `__llvm_prf_data` and will cause a `relocation refers to a discarded section` linker error. Note: while a `__profc_` group is non-prevailing, it may be referenced by a prevailing text section due to inlining. ``` group section [ 66] `.group' [__profc__ZN5clang20EmitClangDeclContextERN4llvm12RecordKeeperERNS0_11raw_ostreamE] contains 4 sections: [Index] Name [ 67] __llvm_prf_cnts [ 68] __llvm_prf_vals [ 69] __llvm_prf_data [ 70] .rela__llvm_prf_data ```	2021-06-17 23:38:17 -07:00
Johannes Doerfert	30c9d68ad9	[Attributor][FIX] Arguments of unknown functions can be undef This should fix PR50683. The wrong assumption was that we could always know what the callee is when we replace a call site argument with undef. We wanted to know that to remove the `noundef` that might be attached to the argument. Since no callee means we did the propagation on the caller site, there is no need to remove an attribute. It is only needed if we replace all uses and therefore pass `undef` instead of the value that was passed in otherwise.	2021-06-18 01:07:53 -05:00
Johannes Doerfert	666dc6f126	[Attributor] Use a centralized value simplification interface To allow outside AAs that simplify values we need to ensure all value simplification goes through the Attributor, not AAValueSimplify (or any of the other AAs we have already like AAPotentialValues). This patch also introduces an interface for the outside AAs to register simplification callbacks for an IRPosition. To make this work as expected we have to pass IRPositions instead of Values in AAValueSimplify, which makes sense by itself.	2021-06-18 01:07:53 -05:00
Johannes Doerfert	d9194b6efb	[Attributor] Introduce a helper to deal with constant type mismatches If we simplify values we sometimes end up with type mismatches. If the value is a constant we can often cast it, though, to still allow propagation. The logic is now put into a helper and it replaces some ad hoc things we did before. This also introduces the AA namespace for abstract attribute related functions and types. Differential Revision: https://reviews.llvm.org/D103856	2021-06-18 01:07:52 -05:00
Johannes Doerfert	9959eee001	[Attributor] Make sure Heap2Stack works properly on a GPU target If the target stack is not accessible between different running "threads" we have to make sure not to create allocas for mallocs that might be used by multiple "threads". The "use check" is sufficient to prevent this but if we apply the "free check" we have to make sure the pointer is not communicated to others before the free is reached. Differential Revision: https://reviews.llvm.org/D98608	2021-06-18 01:07:52 -05:00
Johannes Doerfert	9a23e673ca	[OpenMP][NFC] Expose AAExecutionDomain and rename its getter The initial use for AAExecutionDomain was to determine if a single thread executes a block. While this is sometimes informative, most of the time, and for other reasons, we actually want to know if it is the "initial thread", that is, the thread that started execution on the current device. The deduction needs to be adjusted in a follow up as the methods we use right now are looking for the OpenMP thread id, which is reset whenever a thread enters a parallel region. What we basically want is to look for `llvm.nvvm.read.ptx.sreg.ntid.x` and equivalent functions.	2021-06-18 01:07:52 -05:00
Johannes Doerfert	8d7bace3b5	[Attributor][NFC] AAReachability is currently stateless, don't invalidate it We invalidated AAReachabilityImpl directly which is not helpful and confusing as we still used it regardless. We now avoid invalidating it (not needed anyway) and add checks for the state. This has by itself no actual effect but prepares for later extensions.	2021-06-18 01:07:51 -05:00
George Balatsouras	c6b5a25eeb	[dfsan] Replace dfs$ prefix with .dfsan suffix The current naming scheme adds the `dfs$` prefix to all DFSan-instrumented functions. This breaks mangling and prevents stack trace printers and other tools from automatically demangling function names. This new naming scheme is mangling-compatible, with the `.dfsan` suffix being a vendor-specific suffix: https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling-structure With this fix, demangling utils would work out-of-the-box. Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D104494	2021-06-17 22:42:47 -07:00
Xun Li	3522167efd	[Coroutine] Properly deal with byval and noalias parameters This patch is to address https://bugs.llvm.org/show_bug.cgi?id=48857. Previous attempts can be found in D104007 and D101980. A lot of discussions can be found in those two patches. To summarize the bug: When Clang emits IR for coroutines, the first thing it does is to make a copy of every argument to the local stack, so that uses of the arguments in the function will all refer to the local copies instead of the arguments directly. However, in some cases we find that arguments are still directly used: When Clang emits IR for a function that has pass-by-value arguments, sometimes it emits an argument with the byval attribute. A byval argument is considered to be local to the function (just like an alloca) and hence it can be easily determined that it does not alias other values. If in the IR there exists a memcpy from a byval argument to a local alloca, and then from that local alloca to another alloca, MemCpyOpt will optimize out the first memcpy because the byval argument's content will not change. This causes issues because after a coroutine suspension, the byval argument may die outside of the function, and later uses will lead to memory use-after-free. This is only a problem for arguments with either the byval attribute or the noalias attribute, because only these two kinds are considered local. Arguments without these two attributes will be considered to alias coro_suspend and hence we won't have this problem. So we need to be able to deal with these two attributes in coroutines properly. For noalias arguments, since coro_suspend may potentially change the value of any argument outside of the function, we simply shouldn't mark any argument in a coroutine as noalias. This can be taken care of in the CoroEarly pass. For byval arguments, if such an argument needs to live across suspensions, we will have to copy its value content to the frame, not just the pointer.
Differential Revision: https://reviews.llvm.org/D104184	2021-06-17 19:06:10 -07:00
Roman Lebedev	84eeb82888	[NFC][SimpleLoopUnswitch] unswitchTrivialBranch(): add debug output explaining unswitching failure It's not prohibitively verbose, and allows easier understanding why certain unswitching ultimately wasn't performed.	2021-06-18 00:46:04 +03:00
Kuter Dinel	eaf1b6810c	[Attributor] Derive AACallEdges attribute This attribute computes the optimistic live call edges using the attributor liveness information. This attribute will be used for deriving an inter-procedural function reachability attribute. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104059	2021-06-18 03:29:22 +03:00
Andrew Browne	39295e92f7	Revert "[DFSan] Cleanup code for platforms other than Linux x86_64." This reverts commit `8441b993bd`. Buildbot failures.	2021-06-17 14:19:18 -07:00
Fangrui Song	76d0747e08	[InstrProfiling] Make __profd_ unconditionally private for ELF For ELF, since all counters/data are in a section group (either `comdat any` or `comdat noduplicates`), and the signature for `comdat any` is `__profc_`, the D1003372 optimization prerequisite (linker GC cannot discard data variables while the text section is retained) is always satisfied, so we can make __profd_ unconditionally private. Reviewed By: davidxl, rnk Differential Revision: https://reviews.llvm.org/D103717	2021-06-17 14:16:54 -07:00
Craig Topper	99e95856fb	[PartiallyInlineLibCalls] Disable sqrt expansion for strictfp. This pass emits a floating point compare and a conditional branch, but if strictfp is enabled we don't emit a constrained compare intrinsic. The backend also won't expand the readonly sqrt call this pass inserts to a sqrt instruction under strictfp. So we end up with 2 libcalls as seen here. https://godbolt.org/z/oax5zMEWd Fix these things by disabling the pass. Differential Revision: https://reviews.llvm.org/D104479	2021-06-17 14:15:12 -07:00
Andrew Browne	8441b993bd	[DFSan] Cleanup code for platforms other than Linux x86_64. These other platforms are unsupported and untested. They could be re-added later based on MSan code. Reviewed By: gbalats, stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D104481	2021-06-17 14:08:40 -07:00
Nikita Popov	f7c54c4603	[LoopUnroll] Fold all exits based on known trip count/multiple Fold all exits based on known trip count/multiple information from SCEV. Previously only the latch exit or the single exit were folded. This doesn't yet eliminate ULO.TripCount and ULO.TripMultiple entirely: They're still used to a) decide whether runtime unrolling should be performed and b) for ORE remarks. However, the core unrolling logic is independent of them now. Differential Revision: https://reviews.llvm.org/D104203	2021-06-17 20:58:34 +02:00
Roman Lebedev	37dfc467ac	[NFC] LoopVectorizationCostModel::getMaximizedVFForTarget(): clarify debug msg This really isn't talking about vectors in general, but only about either fixed or scalable vectors, and it's pretty confusing to see it state that there aren't any vectors :)	2021-06-17 21:07:34 +03:00
hyeongyukim	69b0ed9a0a	[InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210) As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210 ...the bug is triggered, as Eli says, when sext(idx) * ElementSize overflows. ``` // assume that GV is an array of 4-byte elements GEP = gep GV, 0, Idx // this is accessing Idx * 4 L = load GEP ICI = icmp eq L, value => ICI = icmp eq Idx, NewIdx ``` The foldCmpLoadFromIndexedGlobal function simplifies a GEP+load operation to an icmp. And there is a problem because Idx * ElementSize can overflow. Let's assume that the wanted value is at offset 0. Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00. We should return true for all these values, but currently, the new icmp only returns true for 0x00..00. This problem can be solved by masking off (trailing zeros of ElementSize) bits from Idx. ``` ... => Idx' = and Idx, 0x3F..FF ICI = icmp eq Idx', NewIdx ``` Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D99481	2021-06-17 19:46:17 +09:00
Sjoerd Meijer	dcd23d875a	[FuncSpec] Don't specialise functions with attribute NoDuplicate. Differential Revision: https://reviews.llvm.org/D104378	2021-06-17 10:32:29 +01:00
Florian Hahn	80a403348b	[VPlan] Support PHIs as LastInst when inserting scalars in ::get(). At the moment, we create insertelement instructions directly after LastInst when inserting scalar values in a vector in VPTransformState::get. This results in invalid IR when LastInst is a phi, followed by another phi. In that case, the new instructions should be inserted just after the last PHI node in the block. At the moment, I don't think the problematic case can be triggered, but it can happen once predicate regions are merged and multiple VPredInstPHI recipes are in the same block (D100260). Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104188	2021-06-17 09:36:44 +01:00
Bjorn Pettersson	4c7f820b2b	Update @llvm.powi to handle different int sizes for the exponent This can be seen as a follow up to commit `0ee439b705`, which changed the second argument of __powidf2, __powisf2 and __powitf2 in compiler-rt from si_int to int. That was to align with how those runtimes are defined in libgcc. One thing that seems to have been missing in that patch was to make sure that the rest of LLVM also handles that the argument now depends on the size of int (not using the si_int machine mode for 32-bit). When using __builtin_powi for a target with 16-bit int, clang crashed. And when emitting libcalls to those rtlib functions (typically when lowering @llvm.powi), the backend would always prepare the exponent argument as an i32, which caused miscompiles when the rtlib was compiled with 16-bit int. The solution used here is to use an overloaded type for the second argument in @llvm.powi. This way clang can use the "correct" type when lowering __builtin_powi, and then later when emitting the libcall it is assumed that the type used in @llvm.powi matches the rtlib function. One thing that needed some extra attention was that when vectorizing calls several passes did not support that several arguments could be overloaded in the intrinsics. This patch allows overload of a scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with an entry for powi. Differential Revision: https://reviews.llvm.org/D99439	2021-06-17 09:38:28 +02:00
Sjoerd Meijer	49ab3b1735	[FuncSpec] Statistics Adds some bookkeeping for collecting the number of specialised functions and a test for that. Differential Revision: https://reviews.llvm.org/D104102	2021-06-16 09:11:51 +01:00
Evgeniy Brevnov	96cded5b79	[SLP] Incorrect handling of external scalar values Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D103954	2021-06-16 13:27:36 +07:00
Andrew Browne	e652d99169	[DFSan][NFC] Fix shadowing variable name.	2021-06-15 22:58:22 -07:00
Rong Xu	82a0bb1afc	[SampleFDO] Place the discriminator flag variable into the used list. We create flag variable "__llvm_fs_discriminator__" in the binary to indicate that FSAFDO hierarchical discriminators are used. This variable might be GC'ed by the linker since it is not explicitly referenced. I initially added the variable to the used list in pass MIRFSDiscriminator but it did not work. It turned out the used global list is collected in lowering (before the MIR passes) and then emitted at the end of the pass pipeline. Here I add the variable to the used list in the IR-level AddDiscriminators pass. The machine-level code is still kept in case the IR's AddDiscriminators is not invoked. If this is the case, just use -Wl,--export-dynamic-symbol=__llvm_fs_discriminator__ to force the emit. Differential Revision: https://reviews.llvm.org/D103988	2021-06-15 21:51:04 -07:00
Chuanqi Xu	86906304d8	[FuncSpec] Use std::pow instead of operator^ The original implementation calculating UserBonus uses operator ^, which means XOR in C++. At first glance during review, I thought it meant power, my bad. It doesn't make sense to use XOR here, so I believe it was a careless mistake, like the one I made. Test Plan: check-all Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D104282	2021-06-16 10:13:21 +08:00
Andrew Browne	af93157625	[DFSan] Handle landingpad inst explicitly as zero shadow. Before this change, DFSan was relying on fallback cases when getting the origin address. Differential Revision: https://reviews.llvm.org/D104266	2021-06-15 18:28:20 -07:00
Vitaly Buka	6478ef61b1	[asan] Remove Asan, Ubsan support of RTEMS and Myriad Differential Revision: https://reviews.llvm.org/D104279	2021-06-15 12:59:05 -07:00
Roman Lebedev	e52364532a	[NewPM] Remove SpeculateAroundPHIs pass Addition of this pass has been botched. There is no particular reason why it had to be sold as an inseparable part of the new-pm transition. It was added when old-pm was still the default, and very very few users were actually tracking new-pm, so its effects weren't measured. Which means, some of the turmoil of the new-pm transition is actually likely regressions due to this pass. Likewise, there has been post-commit feedback (post new-pm switch), namely * https://reviews.llvm.org/D37467#2787157 (regresses HW-loops) * https://reviews.llvm.org/D37467#2787259 (should not be in middle-end, should run after LSR, not before) * https://reviews.llvm.org/D95789 (an attempt to fix bad loop backedge metadata) and in the half year past, the pass authors (google) still haven't found time to respond to any of that. Hereby it is proposed to back out the pass from the pipeline, until someone who cares about it can address the issues reported, and properly start the process of adding a new pass into the pipeline, with proper performance evaluation. Furthermore, neither google nor facebook reports any perf changes from this change, so I'm dropping the pass completely. It can always be re-reverted should anyone want to pick it up again. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D104099	2021-06-15 20:35:55 +03:00
Andrew Litteken	2c21278e74	[IROutliner] Adding DebugInfo handling for IR Outlined Functions This adds support for functions outlined by the IR Outliner to be recognized by the debugger. The expected behavior is that it will skip over the instructions included in that section. This is due to the fact that we can not say which of the original locations the instructions originated from. These functions will show up in the call stack, but you cannot step through them. Reviewers: paquette, vsk, djtodoro Differential Revision: https://reviews.llvm.org/D87302	2021-06-15 10:57:08 -05:00
Florian Hahn	f7fc8927c0	[LoopDeletion] Check for irreducible cycles when deleting loops. Loops with irreducible cycles may loop infinitely. Those cannot be removed, unless the loop/function is marked as mustprogress. Also discussed in D103382. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D104238	2021-06-15 12:56:12 +01:00
Neil Henning	1540da3b78	ABI breaking changes fixes. This commit mostly just replaces bad uses of `NDEBUG` with uses of `LLVM_ENABLE_ABI_BREAKING_CHANGES` - the safe way to include ABI breaking changes (normally extra struct elements in headers). Differential Revision: https://reviews.llvm.org/D104216	2021-06-15 11:08:13 +01:00
Vitaly Buka	b8919fb0ea	[NFC][sanitizer] clang-format some code	2021-06-14 18:05:22 -07:00
Huihui Zhang	1c096bf09f	[SVE][LSR] Teach LSR to enable simple scaled-index addressing mode generation for SVE. Currently, Loop strength reduce is not handling loops with scalable stride very well. Take a loop vectorized with scalable vector type <vscale x 8 x i16> for instance (refer to test/CodeGen/AArch64/sve-lsr-scaled-index-addressing-mode.ll added). Memory accesses are incremented by "16*vscale", while the induction variable is incremented by "8*vscale". The scaling factor "2" needs to be extracted to build candidate formula, i.e., "reg(%in) + 2*reg({0,+,(8 * %vscale)})", so that addrec register reg({0,+,(8*vscale)}) can be reused among Address and ICmpZero LSRUses to enable optimal solution selection. This patch allows LSR getExactSDiv to recognize special cases like "C1*X*Y /s C2*X*Y", and pull out "C1 /s C2" as scaling factor whenever possible. Without this change, LSR is missing candidate formula with proper scaled factor to leverage target scaled-index addressing mode. Note: This patch doesn't fully fix AArch64 isLegalAddressingMode for scalable vectors, but allows a simple valid scale to pass through. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D103939	2021-06-14 16:42:34 -07:00
Matt Morehouse	b87894a1d2	[HWASan] Enable globals support for LAM. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D104265	2021-06-14 14:20:44 -07:00
Sanjay Patel	8591640379	[InstCombine] add DeMorgan folds for logical ops in select form We canonicalized to these select patterns (poison-safe logic) with D101191, so we need to reduce 'not' ops when possible as we would with 'and'/'or' instructions. This is shown in a secondary example in: https://llvm.org/PR50389 https://alive2.llvm.org/ce/z/BvsESh	2021-06-14 12:54:35 -04:00
Florian Hahn	96ca03493a	[VectorCombine] Limit scalarization to non-poison indices for now. As Eli mentioned post-commit in D103378, the result of the freeze may still be out-of-range according to Alive2. So for now, just limit the transform to indices that are non-poison.	2021-06-14 16:40:14 +01:00
Jeroen Dobbelaere	bb8ce25e88	Intrinsic::getName: require a Module argument Ensure that we provide a `Module` when checking if a rename of an intrinsic is necessary. This fixes the issue that was detected by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32288 (as mentioned by @fhahn), after committing D91250. Note that the `LLVMIntrinsicCopyOverloadedName` is being deprecated in favor of `LLVMIntrinsicCopyOverloadedName2`. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99173	2021-06-14 14:52:29 +02:00
Aditya Kumar	dcbbc69cc5	Calculate getTerminator only when necessary Differential Revision: https://reviews.llvm.org/D104202	2021-06-13 20:16:07 -07:00
Simon Pilgrim	9efe89d82f	BoundsChecking.cpp - tidy implicit header dependencies. NFCI. We don't use <vector> but we do use std::pair (<utility>)	2021-06-13 17:08:15 +01:00
Simon Pilgrim	56541d1377	GVN.cpp - remove unused <vector> include. NFCI.	2021-06-13 14:06:32 +01:00
Simon Pilgrim	c14fd171fe	LoopUnrollAndJamPass.cpp - remove unused <vector> include. NFCI.	2021-06-13 14:06:32 +01:00
Sanjay Patel	afd44bb6f2	[InstCombine] fold ctlz/cttz of bool types https://alive2.llvm.org/ce/z/tX4pUT	2021-06-13 08:26:40 -04:00
Simon Pilgrim	2477b498f2	ArgumentPromotion.cpp - remove unused <string> include. NFCI.	2021-06-13 13:03:47 +01:00
Simon Pilgrim	b013c58e82	VPlanSLP.cpp - tidy implicit header dependencies. NFCI. We don't use std::string and std::vector, but we do use std::pair and std::max.	2021-06-13 12:37:17 +01:00
Xun Li	fae7debadc	[CHR] Don't run ControlHeightReduction if any BB has address taken This patch is to address https://bugs.llvm.org/show_bug.cgi?id=50610. In the computed goto pattern, there is usually a list of basic blocks that are all targets of an indirectbr instruction, and each basic block also has its address taken and stored in a variable. The CHR pass could potentially clone these basic blocks, which would generate a cloned version of the indirectbr and cloned versions of all basic blocks in the list. However, these cloned basic blocks will not have their addresses taken and stored anywhere. So a later SimplifyCFG pass will simply remove all these cloned basic blocks, resulting in incorrect code. To fix this, when searching for scopes, we skip scopes that contain BBs with addresses taken. Added a few test cases. Reviewed By: aeubanks, wenlei, hoy Differential Revision: https://reviews.llvm.org/D103867	2021-06-12 10:29:53 -07:00
Kevin Athey	1d22596b2f	[sanitizer] Remove numeric values from -asan-use-after-return flag. (NFC) for issue: https://github.com/google/sanitizers/issues/1394 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D104152	2021-06-11 15:14:51 -07:00
Kevin Athey	e0b469ffa1	[clang-cl][sanitizer] Add -fsanitize-address-use-after-return to clang. Also: - add driver test (fsanitize-use-after-return.c) - add basic IR test (asan-use-after-return.cpp) - (NFC) cleaned up logic for generating table of __asan_stack_malloc depending on flag. for issue: https://github.com/google/sanitizers/issues/1394 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D104076	2021-06-11 12:07:35 -07:00
Valery N Dmitriev	94a07c79cf	[SLP][NFC] Fix condition that was supposed to save a bit of compile time. It was found by chance revealing discrepancy between comment (few lines above), the condition and how re-ordering of instruction is done inside the if statement it guards. The condition was always evaluated to true. Differential Revision: https://reviews.llvm.org/D104064	2021-06-11 10:08:55 -07:00
Adam Nemet	e0efebb8eb	[Matrix] In transpose opts, handle a^t * a^t Without the fix the testcase crashes because we remove the same instruction twice. Differential Revision: https://reviews.llvm.org/D104127	2021-06-11 09:29:43 -07:00
Alexey Bataev	a010d4230e	[SLP]Allow reordering of insertelements. After we added support for non-ordered insertelements, we can allow their reordering. Differential Revision: https://reviews.llvm.org/D104057	2021-06-11 08:47:41 -07:00
Matt Morehouse	0867edfc64	[HWASan] Add basic stack tagging support for LAM. Adds the basic instrumentation needed for stack tagging. Currently does not support stack short granules or TLS stack histories, since a different code path is followed for the callback instrumentation we use. We may simply wait to support these two features until we switch to a custom calling convention. Patch By: xiangzhangllvm, morehouse Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D102901	2021-06-11 08:21:17 -07:00
Alexey Bataev	74af4bb1f4	[SLP]Remove unnecessary UndefValue in CreateShuffle. No need to use UndefValue in CreateShuffle call. Differential Revision: https://reviews.llvm.org/D104113	2021-06-11 08:08:30 -07:00
Sjoerd Meijer	9907746f5d	Move Function Specialization to its correct location. NFC. As a follow up of rGc4a0969b9c14, and as part of D104102, move it to the IPO transformations directory.	2021-06-11 15:00:10 +01:00
Sanjay Patel	602ab24833	[SimplifyCFG] avoid crash on degenerate loop The problematic code pattern in the test is based on: https://llvm.org/PR50638 If the IfCond is itself the phi that we are trying to remove, then the loop around line 2835 can end up with something like: %cmp = select i1 %cmp, i1 false, i1 true That can then lead to a use-after-free and assert (although I'm still not seeing that locally in my release + asserts build). I think this can only happen with unreachable code. Differential Revision: https://reviews.llvm.org/D104063	2021-06-11 09:37:06 -04:00
Simon Pilgrim	61cdaf66fe	[ADT] Remove APInt/APSInt toString() std::string variants <string> is currently the highest impact header in a clang+llvm build: https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps. This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code. Differential Revision: https://reviews.llvm.org/D103888	2021-06-11 13:19:15 +01:00
Roman Lebedev	20542b47d6	[VectorCombine] scalarizeLoadExtract(): use computeAlignmentAfterScalarization() helper This results in slightly more optimistic alignments in some cases	2021-06-11 12:47:10 +03:00
Roman Lebedev	abc0e0125c	[NFC][VectorCombine] Extract computeAlignmentAfterScalarization() helper function	2021-06-11 12:47:09 +03:00
Simon Pilgrim	5e6bfb661e	[Analysis] Pass RecurrenceDescriptor as const reference. NFCI. We were passing the RecurrenceDescriptor by value to most of the reduction analysis methods, despite it being rather bulky with TrackingVH members (that can be costly to copy). In all these cases we're only using the RecurrenceDescriptor for rather basic purposes (access to types/kinds etc.). Differential Revision: https://reviews.llvm.org/D104029	2021-06-11 10:24:14 +01:00
Sjoerd Meijer	c4a0969b9c	Function Specialization Pass This adds a function specialization pass to LLVM. Constant parameters like function pointers and constant globals are propagated to the callee by specializing the function. This is a first version with a number of limitations: - The pass is off by default, so needs to be enabled on the command line, - It does not handle specialization of recursive functions, - It does not yet handle constants and constant ranges, - Only 1 argument per function is specialised, - The cost-model could be further looked into, and perhaps related, - We are not yet caching analysis results. This is based on earlier work by Matthew Simpson (D36432) and Vinay Madhusudan. More recently this was also discussed on the list, see: https://lists.llvm.org/pipermail/llvm-dev/2021-March/149380.html. The motivation for this work is that function specialisation often comes up as a reason for performance differences of generated code between LLVM and GCC, which has this enabled by default from optimisation level -O3 and up. And while this certainly helps a few cpu benchmark cases, this also triggers in real world codes and is thus a generally useful transformation to have in LLVM. Function specialisation has great potential to increase compile-times and code-size. The summary from some investigations with this patch is: - The compile-time increase for short compile jobs is relatively high, but the increase in absolute numbers is still low. - For longer compile-jobs, the extra compile time is around 1%, and very much in line with GCC. - It is difficult to blame one thing for compile-time increases: it looks like everywhere a little bit more time is spent processing more functions and instructions. - But the function specialisation pass itself is not very expensive; it doesn't show up very high in the profile of the optimisation passes. The goal of this work is to reach parity with GCC which means that eventually we would like to get this enabled by default. But first we would like to address some of the limitations before that. Differential Revision: https://reviews.llvm.org/D93838	2021-06-11 09:11:29 +01:00
Qiu Chaofan	2670c7dd5b	[VectorCombine] Fix alignment in single element store This fixes the concern in single element store scalarization that the alignment of new store may be larger than it should be. It calculates the largest alignment if index is constant, and a safe one if not. Reviewed By: lebedev.ri, spatel Differential Revision: https://reviews.llvm.org/D103419	2021-06-11 10:28:15 +08:00
Slava Nikolaev	119965865c	LoadStoreVectorizer: support different operand orders in the add sequence match First we refactor the code which does no wrapping add sequences match: we need to allow different operand orders for the key add instructions involved in the match. Then we use the refactored code trying 4 variants of matching operands. Originally the code relied on the fact that the matching operands of the two last add instructions of memory index calculations had the same LHS argument. But which operand is the same in the two instructions is actually not essential, so now we allow that to be any of LHS or RHS of each of the two instructions. This increases the chances of vectorization to happen. Reviewed By: volkan Differential Revision: https://reviews.llvm.org/D103912	2021-06-10 16:31:35 -07:00
Andy Kaylor	41555eaf65	Preserve more MD_mem_parallel_loop_access and MD_access_group in SROA SROA sometimes preserves MD_mem_parallel_loop_access and MD_access_group metadata on loads/stores, and sometimes fails to do so. This change adds copying of the MD after other CreateAlignedLoad/CreateAlignedStores. Also fix a case where the metadata was being copied from a load, rather than the store. Added a LIT test to catch one case. Patch by Mark Mendell Differential Revision: https://reviews.llvm.org/D103254	2021-06-10 15:47:03 -07:00
Joachim Meyer	4f01122c3f	[LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annotated parallel (`!llvm.loop.parallel_accesses`), is not expectable, the documentation for this behavior was since removed from the LangRef again, and can lead to invalid reads. This was observed in POCL (https://github.com/pocl/pocl/issues/757) and would require similar workarounds in current work at hipSYCL. The question remains why this was initially added and what the implications of removing this optimization would be. Do we need an alternative mechanism to propagate the information about legality of if-conversion? Or is the idea that conditional loads in `#pragma clang loop vectorize(assume_safety)` can be executed unmasked without additional checks flawed in general? I think this implication is not part of what a user of that pragma (and corresponding metadata) would expect and thus dangerous. Only two additional tests failed, which are adapted in this patch. Depending on the further direction force-ifcvt.ll should be removed or further adapted. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D103907	2021-06-10 23:37:57 +02:00
Philip Reames	7629b2a09c	[LI] Add a cover function for checking if a loop is mustprogress [nfc] Essentially, the cover function simply combines the loop level check and the function level scope into one call. This simplifies several callers and is (subjectively) less error prone.	2021-06-10 13:37:32 -07:00
Philip Reames	b6ee5f2b1d	Move code for checking loop metadata into Analysis [nfc] I need the mustprogress loop metadata in ScalarEvolution and it makes sense to keep all the accessors for querying loop metadata together.	2021-06-10 13:01:22 -07:00
Alexey Bataev	a893b44187	[SLP]Disable scheduling of insertelements. There is no need to schedule insertelement instructions. The compiler did not schedule them before it started supporting their vectorization and it should not do it after. We pre-schedule them manually when finding a build vector sequence. Disabling scheduling of insertelement instructions improves compile time and vectorization of the very large basic blocks by saving scheduling budget for other instructions. Differential Revision: https://reviews.llvm.org/D104026	2021-06-10 10:25:26 -07:00
Keith Smiley	026170d17d	Fix range-loop-analysis warning ``` llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8024:19: warning: loop variable 'VF' of type 'const llvm::ElementCount' creates a copy from type 'const llvm::ElementCount' [-Wrange-loop-analysis] for (const auto VF : VFCandidates) { ^ llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8024:8: note: use reference type 'const llvm::ElementCount &' to prevent copying for (const auto VF : VFCandidates) { ^~~~~~~~~~~~~~~ & 1 warning generated. ``` Differential Revision: https://reviews.llvm.org/D103970	2021-06-10 08:39:54 -07:00
Caroline Concatto	1ad52105eb	[InstCombine] Add fold for extracting known elements from a stepvector This patch allows folding stepvector + extract to the lane when the lane is lower than the minimum size of the scalable vector. This fold is possible because lane X of a stepvector is also X! For instance, extracting element 3 of a <vscale x 4 x i64> stepvector is 3. Differential Revision: https://reviews.llvm.org/D103153	2021-06-10 13:36:57 +01:00
Simon Pilgrim	b01d393fc0	Fix MSVC int64_t -> uint64_t "narrowing conversion" warning.	2021-06-10 10:55:24 +01:00
Jon Roelofs	f8f1c9c389	Annotate memcpy's of globals with info about the src/dst Differential revision: https://reviews.llvm.org/D103994	2021-06-09 18:11:08 -07:00
Joseph Huber	4c9471581f	[Attributor] Set floating point loads and stores as nofree in AANoFreeFloating Summary: The current implementation of AANoFreeFloating will incorrectly list floating point loads and stores as may-free. This prevents other attributor instances like HeapToStack from pushing some allocations to the stack. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D103975	2021-06-09 16:16:37 -04:00
Leonard Chan	314c049142	[compiler-rt][hwasan] Decouple use of the TLS global for getting the shadow base and using the frame record feature This allows for using the frame record feature (which uses __hwasan_tls) independently from however the user wants to access the shadow base, which prior was only usable if shadow wasn't accessed through the TLS variable or ifuncs. Frame recording can be explicitly set according to ShadowMapping::WithFrameRecord in ShadowMapping::init. Currently, it is only enabled on Fuchsia and if TLS is used, so this should mimic the old behavior. Added an extra case to prologue.ll that covers this new case. Differential Revision: https://reviews.llvm.org/D103841	2021-06-09 12:55:19 -07:00
LemonBoy	d3faef6eef	[SROA] Avoid splitting loads/stores with irregular type Upon encountering loads/stores on types whose size is not a multiple of 8 bits the SROA pass would either trip an assertion or use logic that was not meant to work with such irregularly-sized types. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D99435	2021-06-09 16:36:58 +02:00
Alexey Bataev	a0086add2e	[SLP]Improve gathering of scalar elements. 1. Better sorting of scalars to be gathered. Trying to insert constants/arguments/instructions-out-of-loop at first and only then the instructions which are inside the loop. It improves hoisting of invariant insertelements instructions. 2. Better detection of shuffle candidates in gathering function. 3. The cost of insertelement for constants is 0. Part of D57059. Differential Revision: https://reviews.llvm.org/D103458	2021-06-09 05:23:21 -07:00
Nico Weber	205cde63c7	Revert "[SROA] Avoid splitting loads/stores with irregular type" This reverts commit `905f4eb537`. Breaks check-llvm on most (all?) bots, see https://reviews.llvm.org/D99435	2021-06-09 06:32:58 -04:00
LemonBoy	905f4eb537	[SROA] Avoid splitting loads/stores with irregular type Upon encountering loads/stores on types whose size is not a multiple of 8 bits the SROA pass would either trip an assertion or use logic that was not meant to work with such irregularly-sized types. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D99435	2021-06-09 11:48:20 +02:00
Jingu Kang	8eee02020b	[LoopBoundSplit] Ignore phi node which is not scevable There was a bug in LoopBoundSplit. The pass should ignore phi node which is not scevable. Differential Revision: https://reviews.llvm.org/D103913	2021-06-09 09:44:36 +01:00
Kevin Athey	af8c59e06d	Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never|[runtime]|always). In addition: - optionally add global flag to capture compile intent for UAR: __asan_detect_use_after_return_always. The global is a SANITIZER_WEAK_ATTRIBUTE. for issue: https://github.com/google/sanitizers/issues/1394 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D103304	2021-06-08 14:39:06 -07:00
Sanjay Patel	d2012d965d	[InstCombine] fix nsz (fast-math) propagation from fneg-of-select As discussed in the post-commit comments for: `3cdd05e519` It seems to be safe to propagate all flags from the final fneg except for 'nsz' to the new select: https://alive2.llvm.org/ce/z/J_APDc nsz has unique FMF semantics: it is not poison, it is only "insignificant" in the calculation according to the LangRef.	2021-06-08 17:04:30 -04:00
David Green	297088d1ad	Revert "[DSE] Remove stores in the same loop iteration" Apparently non-dead stores are being removed, as noted in D100464. This reverts commit `222aeb4d51`.	2021-06-08 21:23:08 +01:00
Hans Wennborg	386b66b2fc	Revert "3rd Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"" > This reapplies `c0f3dfb9`, which was reverted following the discovery of > crashes on linux kernel and chromium builds - these issues have since > been fixed, allowing this patch to re-land. This reverts commit `36ec97f76a`. The change caused non-determinism in the compiler, see comments on the code review at https://reviews.llvm.org/D91722. Reverting to unbreak people's builds until that can be addressed. This also reverts the follow-up "[DebugInfo] Limit the number of values that may be referenced by a dbg.value" in `a0bd6105d8`.	2021-06-08 14:54:08 +02:00
maekawatoshiki	09e92c607c	[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass. The next patch will utilize LoopNest to effectively handle loop nests. Also, a crash problem on legacy pass manager is fixed. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D99149	2021-06-08 20:30:02 +09:00
Simon Pilgrim	596004a947	MemCpyOptimizer.cpp - hasUndefContentsMSSA - Pass DataLayout by reference. NFCI.	2021-06-08 10:41:02 +01:00
Kerry McLaughlin	14eeccfe9a	[LoopVectorize] Don't use strict reductions when reordering is allowed If the `-enable-strict-reductions` flag is set to true, then currently we will always choose to vectorize the loop with strict in-order reductions. This is not necessary where we allow the reordering of FP operations, such as when loop hints are passed via metadata. This patch moves useOrderedReductions so that we can also check whether loop hints allow reordering, in which case we should use the default behaviour of vectorizing with unordered reductions. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D103814	2021-06-08 10:39:29 +01:00
George Balatsouras	5b4dda550e	[dfsan] Add full fast8 support Complete support for fast8: - amend shadow size and mapping in runtime - remove fast16 mode and -dfsan-fast-16-labels flag - remove legacy mode and make fast8 mode the default - remove dfsan-fast-8-labels flag - remove functions in dfsan interface only applicable to legacy - remove legacy-related instrumentation code and tests - update documentation. Reviewed By: stephan.yichao.zhao, browneee Differential Revision: https://reviews.llvm.org/D103745	2021-06-07 17:20:54 -07:00
Arthur Eubanks	47211fa889	Revert "[TargetLowering] Only inspect attributes in the arguments for ArgListEntry" Needs to be discussed more. This reverts commit 255a5c1baa6020c009934b4fa342f9f6dbbcc46 This reverts commit df2056ff3730316f376f29d9986c9913b95ceb1 This reverts commit faff79b7ca144e505da6bc74aa2b2f7cffbbf23 This reverts commit d2a9020785c6e02afebc876aa2778fa64c5cafd	2021-06-07 16:07:44 -07:00
Nikita Popov	8fdd7c2ff1	[LoopUnroll] Clamp unroll count to MaxTripCount Unrolling with more iterations than MaxTripCount is pointless, as those iterations can never be executed. As such, we clamp ULO.Count to MaxTripCount if it is known. This means we no longer need to consider iterations after MaxTripCount for exit folding, and the CompletelyUnroll flag becomes independent of ULO.TripCount. Differential Revision: https://reviews.llvm.org/D103748	2021-06-07 21:08:42 +02:00
Philip Reames	c880d5e583	[RS4GC] Treat inttoptr as base pointer This is a modified version of a patch by tolziplohu with a style change, and most importantly, a revised commit message. inttoptr for a non-integral address space is currently ill defined in the LangRef. Figuring out exactly what the dynamic semantics of such a cast would be is hard, and not yet settled. Despite that, we still need to go ahead and implement something in RS4GC for a couple of reasons. First, as a simple consistency argument. We've apparently added support for constexpr inttoptrs a while back, and even have tests which exercised them. Having a lack of constant folding trigger a crash during lowering is non-ideal. Second, and more fundamentally, the optimizer is allowed to insert undefined constructs in unreachable code. At the same time, we can't assume that dynamically dead code is always pruned before lowering. As a result, we must assume that inttoptrs can occur (even if completely ill defined) along dead paths. We need the lowering to not crash. The stackmaps produced can be garbage (as the assumption is the code is dynamically dead), but the lowering itself can't crash. Differential Revision: https://reviews.llvm.org/D103492	2021-06-07 10:27:23 -07:00
Sanjay Patel	4675beaa21	[InstCombine] intersect nsz and ninf fast-math-flags (FMF) for fneg(fdiv) fold https://alive2.llvm.org/ce/z/3KPvih https://llvm.org/PR49654	2021-06-07 13:22:49 -04:00
Sanjay Patel	519e98cd9a	[InstCombine] refactor match clauses; NFC We need to adjust the FMF propagation on at least one of these transforms as discussed in: https://llvm.org/PR49654 ...so this should make it easier to intersect flags.	2021-06-07 13:22:49 -04:00
Florian Hahn	1465e7770b	[VPlan] Print successors of VPRegionBlocks. The non-DOT printing does not include the successors of VPRegionBlocks. This patch uses the same style for printing successors as for VPBasicBlock. I think the printing of successors could be a bit improved further, as at the moment it is hard to ensure a check line matches all successors. But that can be done as follow-up. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D103515	2021-06-07 17:57:21 +01:00
Fraser Cormack	ae3f6de3a8	[InstCombine] Support negation of scalable-vector splats This patch is an extension of D103421. It allows the InstCombiner to generate the negated form of integer scalable-vector splats. It can technically handle fixed-length vectors too but those are completely covered by the preceding logic. This enables extra combining opportunities for scalable vector types. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D103801	2021-06-07 15:14:00 +01:00
Daniil Seredkin	7736c1936a	[InstCombine] Missed optimization for pow(x, y) * pow(x, z) with fast-math If FP reassociation (fast-math) is allowed, then LLVM is free to do the following transformation pow(x, y) * pow(x, z) -> pow(x, y + z). This patch adds this transformation and tests for it. See more https://bugs.llvm.org/show_bug.cgi?id=47205 It handles two cases 1. When operands of fmul are different instructions %4 = call reassoc float @llvm.pow.f32(float %0, float %1) %5 = call reassoc float @llvm.pow.f32(float %0, float %2) %6 = fmul reassoc float %5, %4 --> %3 = fadd reassoc float %1, %2 %4 = call reassoc float @llvm.pow.f32(float %0, float %3) 2. When operands of fmul are the same instruction %4 = call reassoc float @llvm.pow.f32(float %0, float %1) %5 = fmul reassoc float %4, %4 --> %3 = fadd reassoc float %1, %1 %4 = call reassoc float @llvm.pow.f32(float %0, float %3) Differential Revision: https://reviews.llvm.org/D102574	2021-06-07 08:08:05 -04:00
Liqiang Tao	4a0de622c3	[llvm] Add interface to order inlining This patch abstracts Calls in Inliner::run() to InlineOrder. With this patch, it's possible to customize the inlining order, e.g. use queue or priority queue. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D103315	2021-06-07 18:27:55 +08:00
Jingu Kang	a2a0ac42ab	[SimpleLoopBoundSplit] Split Bound of Loop which has conditional branch with IV This pass transforms loops that contain a conditional branch with induction variable. For example, it transforms the original loop: while (iv < n) { A if (iv < c) B C } into: newbound = min(n, c) while (iv < newbound) { A B C } if (iv != n) { while (iv < n) { A C } } Differential Revision: https://reviews.llvm.org/D102234	2021-06-07 10:55:25 +01:00
Florian Hahn	23c2f2e6b2	[LV] Mark increment of main vector loop induction variable as NUW. This patch marks the induction increment of the main induction variable of the vector loop as NUW when not folding the tail. If the tail is not folded, we know that End - Start >= Step (either statically or through the minimum iteration checks). We also know that both Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV + %Step == %End. Hence we must exit the loop before %IV + %Step unsigned overflows and we can mark the induction increment as NUW. This should make SCEV return more precise bounds for the created vector loops, used by later optimizations, like late unrolling. At the moment quite a few tests still need to be updated, but before doing so I'd like to get initial feedback to make sure I am not missing anything. Note that this could probably be further improved by using information from the original IV. Attempt of modeling of the assumption in Alive2: https://alive2.llvm.org/ce/z/H_DL_g Part of a set of fixes required for PR50412. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D103255	2021-06-07 10:47:52 +01:00
maekawatoshiki	0a9d079931	Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass" This reverts commit `2165360003`. To fix the crash problem in legacy pass manager	2021-06-07 01:26:47 +09:00
Simon Pilgrim	9ced408fe9	SimplifyCFG.cpp - remove dead early-return code added at rGcc63203908da. NFCI. We've already checked that ScanIdx == 0 a few lines above.	2021-06-06 14:15:11 +01:00
Liqiang Tao	48252d7570	Revert "[llvm] Add interface to order inlining"	2021-06-06 14:45:03 +08:00
Liqiang Tao	478dc47292	[llvm] Add interface to order inlining This patch abstracts Calls in Inliner::run() to InlineOrder. With this patch, it's possible to customize the inlining order, e.g. use queue or priority queue. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D103315	2021-06-06 12:03:02 +08:00
Roman Lebedev	e350494fb0	[NFC] Promote willNotOverflow() / getStrengthenedNoWrapFlagsFromBinOp() from IndVars into SCEV proper We might want to use it when creating SCEV proper in createSCEV(), now that we don't `forgetValue()` in `SimplifyIndvar::strengthenOverflowingOperation()`, which might have caused us to lose some optimization potential.	2021-06-05 12:17:51 +03:00
Nikita Popov	db45746821	[LoopUnroll] Separate peeling from unrolling Loop peeling is currently performed as part of UnrollLoop(). Outside test scenarios, it is always performed with an unroll count of 1. This means that unrolling doesn't actually do anything apart from performing post-unroll simplification. When testing, it's currently possible to specify both an explicit peel count and an explicit unroll count. This doesn't perform any sensible operation and may result in miscompiles, see https://bugs.llvm.org/show_bug.cgi?id=45939. This patch moves peeling from UnrollLoop() into tryToUnrollLoop(), so that peeling does not also perform a subsequent unroll. We only run the post-unroll simplifications. Specifying both an explicit peel count and unroll count is forbidden. In the future, we may want to support both (non-PGO) peeling a loop and unrolling it, but this needs to be done by first performing the peel and then recalculating unrolling heuristics on a now possibly analyzable loop. Differential Revision: https://reviews.llvm.org/D103362	2021-06-05 10:32:00 +02:00
Vitaly Buka	e3258b0894	Revert "Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never|[runtime]|always)." Windows is still broken. This reverts commit `927688a4cd`.	2021-06-05 00:39:50 -07:00
Kevin Athey	927688a4cd	Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never|[runtime]|always). In addition: - optionally add global flag to capture compile intent for UAR: __asan_detect_use_after_return_always. The global is a SANITIZER_WEAK_ATTRIBUTE. for issue: https://github.com/google/sanitizers/issues/1394 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D103304	2021-06-05 00:26:10 -07:00
Fangrui Song	06e7de795b	Fix some -Wunused-but-set-variable in -DLLVM_ENABLE_ASSERTIONS=off build	2021-06-04 23:34:43 -07:00
Vitaly Buka	d8a4a2cb93	Revert "Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never|[runtime]|always)." Reverts commits of D103304, it breaks Darwin. This reverts commit `60e5243e59`. This reverts commit `26b3ea224e`. This reverts commit `17600ec32a`.	2021-06-04 20:20:11 -07:00
Kevin Athey	60e5243e59	Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never|[runtime]|always). In addition: - optionally add global flag to capture compile intent for UAR: __asan_detect_use_after_return_always. The global is a SANITIZER_WEAK_ATTRIBUTE. for issue: https://github.com/google/sanitizers/issues/1394 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D103304	2021-06-04 16:30:47 -07:00
Fangrui Song	9e51d1f348	[InstrProfiling] If no value profiling, make data variable private and (for Windows) use one comdat `__profd_` variables are referenced by code only when value profiling is enabled. If disabled (e.g. default -fprofile-instr-generate), the symbols just waste space on ELF/Mach-O. We change the comdat symbol from `__profd_` to `__profc_` because an internal symbol does not provide deduplication features on COFF. The choice doesn't matter on ELF. (In -DLLVM_BUILD_INSTRUMENTED_COVERAGE=on build, there is now no `__profd_` symbols.) On Windows this enables further optimization. We are no longer affected by the link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE can cause duplicate definition error. https://lists.llvm.org/pipermail/llvm-dev/2021-May/150758.html We can thus use llvm.compiler.used instead of llvm.used like ELF (D97585). This avoids many `/INCLUDE:` directives in `.drectve`. Here is rnk's measurement for Chrome: ``` This reduced object file size of base_unittests.exe, compiled with coverage, optimizations, and gmlt debug info by 10%: #BEFORE $ find . -iname '.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}' 1047758867 $ du -cksh base_unittests.exe 82M base_unittests.exe 82M total # AFTER $ find . -iname '.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}' 937886499 $ du -cksh base_unittests.exe 78M base_unittests.exe 78M total ``` The change is NFC for Mach-O. Reviewed By: davidxl, rnk Differential Revision: https://reviews.llvm.org/D103372	2021-06-04 13:27:56 -07:00
Nikita Popov	14f350daf2	[IndVars] Don't forget value when inferring nowrap flags When SimplifyIndVars infers IR nowrap flags from SCEV, this may happen in two ways: Either nowrap flags were already present in SCEV and just get transferred to IR. Or zero/sign extension of addrecs infers additional nowrap flags, and those get transferred to IR. In the latter case, calling forgetValue() ensures that the newly inferred nowrap flags get propagated to any other SCEV expressions based on the addrec. However, the invalidation can also have a major compile-time effect in some cases. For https://bugs.llvm.org/show_bug.cgi?id=50384 with n=512 compile-time drops from 7.1s to 0.8s without this invalidation. At the same time, removing the invalidation doesn't affect any codegen in test-suite. Differential Revision: https://reviews.llvm.org/D103424	2021-06-04 20:57:22 +02:00
Rong Xu	8d581857d7	[SampleFDO] New hierarchical discriminator for FS SampleFDO (llvm-profdata part) This patch was split from https://reviews.llvm.org/D102246 [SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO This is for llvm-profdata part of change. It sets the bit masks for the profile reader in llvm-profdata. Also add an internal option "-fs-discriminator-pass" for show and merge command to process the profile offline. This patch also moved setDiscriminatorMaskedBitFrom() to SampleProfileReader::create() to simplify the interface. Differential Revision: https://reviews.llvm.org/D103550	2021-06-04 11:22:06 -07:00
Adam Nemet	ffde966cd9	[Matrix] Fix transpose-multiply folding if transpose has multiple uses Don't add it to FusedInsts in this case. Differential Revision: https://reviews.llvm.org/D103627	2021-06-04 10:55:03 -07:00
Joseph Huber	4a08163c73	[Attributor] Check HeapToStack's state for isKnownHeapToStack This patch changes the `isKnownHeapToStack` and `isAssumedHeapToStack` member functions to return if a function call is going to be altered by HeapToStack. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D103574	2021-06-04 12:38:33 -04:00
Nico Weber	e9a9c85098	Revert "[InstrProfiling] If no value profiling, make data variable private and (for Windows) use one comdat" This reverts commit `a14fc749aa`. Breaks check-profile on macOS. See https://reviews.llvm.org/D103372 for details.	2021-06-04 10:00:12 -04:00
Sanjay Patel	23a116c8c4	[InstCombine] convert lshr to ashr to eliminate cast op This is similar to `b865eead76` ( D103617 ) and fixes: https://llvm.org/PR50575 `41b71f718b` did this and more (noted with TODO comments in the tests), but it didn't handle the case where the destination is narrower than the source, so it got reverted. This is a simple match-and-replace. If there's evidence that the TODO cases are useful, we can revisit/extend.	2021-06-04 07:04:37 -04:00
Nico Weber	5c600dc6d4	Revert "Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never|[runtime]|always)." This reverts commit `41b3088c3f`. Doesn't build on macOS, see comments on https://reviews.llvm.org/D103304	2021-06-03 21:01:11 -04:00
Arthur Eubanks	edf2056ff3	[BuildLibCalls] Properly set ABI attributes on arguments Some floating point lib calls have ABI attributes that need to be set on the caller. Found via D103412. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D103415	2021-06-03 15:45:07 -07:00
Philip Reames	a4b924a017	Kill a variable which is unused after `cddcc4cf` [nfc]	2021-06-03 14:38:57 -07:00
Philip Reames	cddcc4cff5	A couple style tweaks on top of `5c0d1b2f9` [nfc]	2021-06-03 14:14:59 -07:00
Philip Reames	5c0d1b2f90	[LoopUnroll] Eliminate PreserveCondBr parameter and fix a bug in the process This builds on D103584. The change eliminates the coupling between unroll heuristic and implementation w.r.t. knowing when the passed in trip count is an exact trip count or a max trip count. In theory the new code is slightly less powerful (since it relies on exact computable trip counts), but in practice, it appears to cover all the same cases. It can also be extended if needed. The test change shows what appears to be a bug in the existing code around the interaction of peeling and unrolling. The original loop only ran 8 iterations. The previous output had the loop peeled by 2, and then an exact unroll of 8. This meant the loop ran a total of 10 iterations which appears to have been a miscompile. Differential Revision: https://reviews.llvm.org/D103620	2021-06-03 14:09:16 -07:00
Fangrui Song	a14fc749aa	[InstrProfiling] If no value profiling, make data variable private and (for Windows) use one comdat `__profd_` variables are referenced by code only when value profiling is enabled. If disabled (e.g. default -fprofile-instr-generate), the symbols just waste space on ELF/Mach-O. We change the comdat symbol from `__profd_` to `__profc_` because an internal symbol does not provide deduplication features on COFF. The choice doesn't matter on ELF. (In -DLLVM_BUILD_INSTRUMENTED_COVERAGE=on build, there is now no `__profd_` symbols.) On Windows this enables further optimization. We are no longer affected by the link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE can cause duplicate definition error. https://lists.llvm.org/pipermail/llvm-dev/2021-May/150758.html We can thus use llvm.compiler.used instead of llvm.used like ELF (D97585). This avoids many `/INCLUDE:` directives in `.drectve`. Here is rnk's measurement for Chrome: ``` This reduced object file size of base_unittests.exe, compiled with coverage, optimizations, and gmlt debug info by 10%: #BEFORE $ find . -iname '.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}' 1047758867 $ du -cksh base_unittests.exe 82M base_unittests.exe 82M total # AFTER $ find . -iname '.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}' 937886499 $ du -cksh base_unittests.exe 78M base_unittests.exe 78M total ``` Reviewed By: davidxl, rnk Differential Revision: https://reviews.llvm.org/D103372	2021-06-03 13:16:13 -07:00
Kevin Athey	41b3088c3f	Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never|[runtime]|always). In addition: - optionally add global flag to capture compile intent for UAR: __asan_detect_use_after_return_always. The global is a SANITIZER_WEAK_ATTRIBUTE. for issue: https://github.com/google/sanitizers/issues/1394 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D103304	2021-06-03 13:13:51 -07:00
Sanjay Patel	b865eead76	[InstCombine] eliminate sext and/or trunc if value has enough signbits If we have enough signbits in a source value, we can skip an intermediate cast for a trunc+sext pair: https://alive2.llvm.org/ce/z/A_mQt- This is the original problem shown in: https://llvm.org/PR49543 There's a test that shows we transformed what used to be a pair of shifts, so that suggests we could add another ComputeNumSignBits fold starting from a shift. There does not appear to be any change in compile-time from the extra analysis: https://llvm-compile-time-tracker.com/compare.php?from=3d2c9069dcafd0cbb641841aa3dd6e851fb7d760&to=b9513cdf2419704c7bb0c3a02a9ca06aae13d902&stat=instructions Differential Revision: https://reviews.llvm.org/D103617	2021-06-03 13:58:19 -04:00
Philip Reames	44d70d298a	[LoopUnroll] Eliminate PreserveOnlyFirst parameter [nfc] This is a first step towards simplifying the transform interface to be less error prone. The basic idea is that querying SCEV is cheap (since it's cached) and we can just check for properties related to branch folding in the transform method instead of relying on the heuristic part to pass everything in correctly. Differential Revision: https://reviews.llvm.org/D103584	2021-06-03 10:33:14 -07:00
Alexey Bataev	8c48d77cdf	[SLP]Improve cost estimation/emission of externally used extractelements. There is no need to recalculate the cost of extractelements; instead of compensating the cost of all extractelements, check first whether each one is actually going to be removed by vectorization. Also, there is no need to generate a new extractelement instruction; we may just regenerate the original one. It may improve the final vectorization. Differential Revision: https://reviews.llvm.org/D102933	2021-06-03 10:26:59 -07:00
Philip Reames	bb5e1c6dcb	[LoopUnroll] Reorder code to make dom tree update more obvious [nfc] This cleans up the unroll action into two phases. Phase 1 does the mechanical act of unrolling, and leaves all conditional branches in place. Phase 2 optimizes away some of the conditional branches and then simplifies the loop. The primary benefit of the reordering is that we can delete some special-case dom tree update logic. Differential Revision: https://reviews.llvm.org/D103561	2021-06-03 10:19:56 -07:00
Alexey Bataev	89f3bc7698	[SLP]Allow to reorder nodes with >2 scalar values. tryToVectorizeList function allows to reorder only 2 scalars. Patch allows to reorder >2 scalars. Also, to avoid possible regressions, it allows extra vectorization of the remaining parts of the scalars elements if possible. Part of D57059. Differential Revision: https://reviews.llvm.org/D103247	2021-06-03 10:01:36 -07:00
Harald van Dijk	5d2b3de284	[SLP] Avoid std::stable_sort(properlyDominates()). As noticed by NAKAMURA Takumi back in 2017, we cannot use properlyDominates for std::stable_sort as properlyDominates only partially orders blocks. That is, for blocks A, B, C, D, where A dominates B and C dominates D, we have A == C, B == C, but A < B. This is not a valid comparison function for std::stable_sort and causes different results between libstdc++ and libc++. This change uses DFS numbering to give deterministic results for all reachable blocks. Unreachable blocks are ignored already, so do not need special consideration. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D103441	2021-06-03 17:51:52 +01:00
Hamza Mahfooz	83235b07e3	[Matrix] Preserve existing fast-math flags during lowering This patch makes it so, floating-point instructions created in LowerMatrixIntrinsics retain fast-math flags from instructions that are higher up the chain. Fixes https://bugs.llvm.org/show_bug.cgi?id=49738 Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D103233	2021-06-03 15:29:31 +01:00
Arthur Eubanks	1faff79b7c	[DFSan] Properly set argument ABI attributes Calls must properly match argument ABI attributes with the callee. Found via D103412. Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D103414	2021-06-02 22:24:46 -07:00
Fangrui Song	87c43f3aa9	[InstrProfiling] Delete linkage/visibility toggling for Windows The linkage/visibility of `__profn_` variables are derived from the profiled functions. extern_weak => linkonce available_externally => linkonce_odr internal => private extern => private _ => unchanged The linkage/visibility of `__profc_`/`__profd_` variables are derived from `__profn_` with linkage/visibility wrestling for Windows. The changes can be folded to the following without changing semantics. ``` if (TT.isOSBinFormatCOFF() && !NeedComdat) { Linkage = GlobalValue::InternalLinkage; Visibility = GlobalValue::DefaultVisibility; } ``` That said, I think we can just delete the code block. An extern/internal function will now use private `__profc_`/`__profd_` variables, instead of internal ones. This saves some symbol table entries. A non-comdat {linkonce,weak}_odr function will now use hidden external `__profc_`/`__profd_` variables instead of internal ones. There is potential object file size increase because such symbols need `/INCLUDE:` directives. However such non-comdat functions are rare (note that non-comdat weak definitions don't prevent duplicate definition error). The behavior changes match ELF. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D103355	2021-06-02 16:49:54 -07:00
Dave Lee	60ce8babf7	[coro] Preserve scope line for compiler generated functions Coro-split functions with an active suspend point have their scope line set to the line of the suspend point. However for compiler generated functions, this results in debug info with unconventional results: a file named `<compiler-generated>` with a non-zero line number. The convention for `<compiler-generated>` is that the line number is zero. This change propagates the scope line only for non-compiler generated functions. Differential Revision: https://reviews.llvm.org/D102412	2021-06-02 15:57:12 -07:00
Andrew Browne	70804f2a2f	Fix dfsan handling of musttail calls. Without this change, a callsite like: [[clang::musttail]] return func_call(x); will cause an error like: fatal error: error in backend: failed to perform tail call elimination on a call site marked musttail due to DFSan inserting instrumentation between the musttail call and the return. Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D103542	2021-06-02 11:38:35 -07:00
Rong Xu	6745ffe4fa	[SampleFDO] New hierarchical discriminator for FS SampleFDO (ProfileData part) This patch was split from https://reviews.llvm.org/D102246 [SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO This is mainly for ProfileData part of change. It will load FS Profile when such profile is detected. For an extbinary format profile, create_llvm_prof tool will add a flag to profile summary section. For other format profiles, the users need to use an internal option (-profile-isfs) to tell the compiler that the profile uses FS discriminators. This patch also simplified the bit API used by FS discriminators. Differential Revision: https://reviews.llvm.org/D103041	2021-06-02 10:32:52 -07:00
Stephen Tozer	4316b0e59c	[LoopStrengthReduce] Ensure that debug intrinsics do not affect LSR's output During Loop Strength Reduce, if the terminating condition for the loop is not immediately adjacent to the terminating branch and it has more than one use, a clone of the condition will be created just before the terminating branch and will be used as the branch condition. Currently, whether the instructions are "immediately adjacent" is determined by checking whether the next instruction after the condition is the terminating branch; this is incorrect however, as the presence of a debug intrinsic between the two will result in a change to the output. This is fixed by using getNextNonDebugInstruction() instead. Differential Revision: https://reviews.llvm.org/D103033	2021-06-02 15:56:23 +01:00
Arnold Schwaighofer	f1a0c5d67c	[coro async] Add the swiftasync attribute to the resume partial function Transfer the swiftasync attribute to the resume partial function according to suspend.async specification. Its first argument denotes which argument is the async context. rdar://71499498 Differential Revision: https://reviews.llvm.org/D103285	2021-06-02 07:44:33 -07:00
Sander de Smalen	d41cb6bb26	[LV] Build and cost VPlans for scalable VFs. This patch uses the calculated maximum scalable VFs to build VPlans, cost them and select a suitable scalable VF. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D98722	2021-06-02 14:47:47 +01:00
Sander de Smalen	034503e9d2	[LV] NFC: Remove redundant isLegalMasked(Gather|Scatter) functions. This NFC change follows from conversation in D102437, where it was discussed to remove these functions as a separate patch.	2021-06-02 14:09:07 +01:00
Sander de Smalen	3472d3fd9d	[LV] NFC: Replace custom getMemInstValueType by llvm::getLoadStoreType. llvm::getLoadStoreType was added recently and has the same implementation as 'getMemInstValueType' in LoopVectorize.cpp. Since there is no value in having two implementations, this patch removes the custom LV implementation in favor of the generic one defined in Instructions.h.	2021-06-02 14:09:06 +01:00
Jingu Kang	f3a27511c9	[SimpleLoopUnswitch] Port partially invariant unswitch from LoopUnswitch to SimpleLoopUnswitch This re-enables commit `107d19eb01` with bug fixes. Differential Revision: https://reviews.llvm.org/D99354	2021-06-02 10:58:22 +01:00
Bjorn Pettersson	9c54ee4378	[SimplifyLibCalls] Take size of int into consideration when emitting ldexp/ldexpf When rewriting powf(2.0, itofp(x)) -> ldexpf(1.0, x) exp2(sitofp(x)) -> ldexp(1.0, sext(x)) exp2(uitofp(x)) -> ldexp(1.0, zext(x)) the wrong type was used for the second argument in the ldexp/ldexpf libc call, for target architectures with 16 bit "int" type. The transform incorrectly used a bitcasted function pointer with a 32-bit argument when emitting the ldexp/ldexpf call for such targets. The fault is solved by using the correct function prototype in the call, by asking TargetLibraryInfo about the size of "int". TargetLibraryInfo by default derives the size of the int type by assuming that it is 16 bits for 16-bit architectures, and 32 bits otherwise. If this isn't true for a target it should be possible to override that default in the TargetLibraryInfo initializer. Differential Revision: https://reviews.llvm.org/D99438	2021-06-02 11:40:34 +02:00
Daniil Fukalov	0b34acdab7	[NFC] Fix 'Load' name masking. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D103456	2021-06-02 11:09:53 +03:00
Arthur Eubanks	2983053d23	[NFC][OpaquePtr] Explicitly pass GEP source type to IRBuilder in more places	2021-06-01 13:13:37 -07:00
Harald van Dijk	f126e8ec28	[SLPVectorizer] Ignore unreachable blocks As the existing test unreachable.ll shows, we should be doing more work to avoid entering unreachable blocks: we should not stop vectorization just because a PHI incoming value from an unreachable block cannot be vectorized. We know that particular value will never be used so we can just replace it with poison.	2021-06-01 20:21:04 +01:00
Alexey Bataev	36911971a5	[SLP]Better detection of perfect/shuffled matches for gather nodes. Implemented a better scheme for perfect/shuffled matches of the gather nodes, which fixes the performance regressions introduced by earlier patches. Start detecting matches for broadcast nodes and extractelement gathering. Differential Revision: https://reviews.llvm.org/D102920	2021-06-01 07:08:07 -07:00
Daniil Seredkin	13140120dc	[InstCombine] Relax constraints of uses for exp(X) * exp(Y) -> exp(X + Y) InstCombine didn't perform the transformation when the fmul's operands were the same instruction, because it required each operand to have exactly one use, which is false in that case. This patch fixes this, adds tests, and introduces a new function isOnlyUserOfAnyOperand to check these cases in a single place. This patch is a result of discussion in D102574. Differential Revision: https://reviews.llvm.org/D102698	2021-06-01 08:33:23 -04:00
Florian Hahn	1b84acb23a	[LoopDeletion] Consider infinite loops alive, unless mustprogress. The current loop or any of its sub-loops may be infinite. Unless the function or the loops are marked as mustprogress, this in itself makes the loop not dead. This patch moves the logic to check whether the current loop is finite or mustprogress to `isLoopDead` and also extends it to check the sub-loops. This should fix PR50511. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D103382	2021-06-01 13:07:36 +01:00
Florian Hahn	d4c070d801	[VectorCombine] Freeze index unless it is known to be non-poison. If the index itself is already poison, the poison propagates through instructions clamping the index to a valid range. This still introduces a load of poison, as flagged by Alive2 and pointed out at `575e2aff55`. This patch updates the code to freeze the index, unless it is proven to not be poison. Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D103378	2021-06-01 10:40:57 +01:00
Nathan Chancellor	e6b086bef2	Revert "[InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210)" This reverts commit `4f2fd3818b`. The Linux kernel fails to build after this commit. See https://reviews.llvm.org/D99481 for a reproducer. Signed-off-by: Nathan Chancellor <nathan@kernel.org>	2021-05-31 20:21:26 -07:00
Arthur Eubanks	372237487e	[OpaquePtr] Remove some uses of PointerType::getElementType()	2021-05-31 16:11:25 -07:00
Congzhe Cao	bfefde22b6	[LoopInterchange] Handle movement of reduction phis appropriately This patch fixes pr43326 and pr48212. Currently when we move reduction phis to the right place, loop interchange assumes the first phi in loop headers is an induction phi, skips the first phi and assumes the rest of phis are candidate reduction phis to move. However, it may not always be the case. This patch loops over all phis in loop headers and considers a phi node as a candidate reduction phi to move only when it is indeed a reduction phi across outer and inner loop. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D102743	2021-05-31 16:27:38 -04:00
Florian Hahn	aa00b1d763	[LV] Try to sink users recursively for first-order recurrences. Update isFirstOrderRecurrence to explore all uses of a recurrence phi and check if we can sink them. If there are multiple users to sink, they are all mapped to the previous instruction. Fixes PR44286 (and another PR or two). Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D84951	2021-05-31 19:55:33 +01:00
Roman Lebedev	f7c95c3322	[NFC] ScalarEvolution: apply SSO to the ExprValueMap value ExprValueMap is a map from SCEV * to a set-vector of (Value *, ConstantInt *) pair, and while the map itself will likely be big-ish (have many keys), it is a reasonable assumption that each key will refer to a small-ish number of pairs. In particular looking at n=512 case from https://bugs.llvm.org/show_bug.cgi?id=50384, the small-size of 4 appears to be the sweet spot, it results in the least allocations while minimizing memory footprint. ``` $ for i in $(ls heaptrack.opt.*.gz); do echo $i; heaptrack_print $i | tail -n 6; echo ""; done heaptrack.opt.0-orig.gz total runtime: 14.32s. calls to allocation functions: 8222442 (574192/s) temporary memory allocations: 2419000 (168924/s) peak heap memory consumption: 190.98MB peak RSS (including heaptrack overhead): 239.65MB total memory leaked: 67.58KB heaptrack.opt.1-n1.gz total runtime: 13.72s. calls to allocation functions: 7184188 (523705/s) temporary memory allocations: 2419017 (176338/s) peak heap memory consumption: 191.38MB peak RSS (including heaptrack overhead): 239.64MB total memory leaked: 67.58KB heaptrack.opt.2-n2.gz total runtime: 12.24s. calls to allocation functions: 6146827 (502355/s) temporary memory allocations: 2418997 (197695/s) peak heap memory consumption: 163.31MB peak RSS (including heaptrack overhead): 211.01MB total memory leaked: 67.58KB heaptrack.opt.3-n4.gz total runtime: 12.28s. calls to allocation functions: 6068532 (494260/s) temporary memory allocations: 2418985 (197017/s) peak heap memory consumption: 155.43MB peak RSS (including heaptrack overhead): 201.77MB total memory leaked: 67.58KB heaptrack.opt.4-n8.gz total runtime: 12.06s. calls to allocation functions: 6068042 (503321/s) temporary memory allocations: 2418992 (200646/s) peak heap memory consumption: 166.03MB peak RSS (including heaptrack overhead): 213.55MB total memory leaked: 67.58KB heaptrack.opt.5-n16.gz total runtime: 12.14s. calls to allocation functions: 6067993 (499958/s) temporary memory allocations: 2418999 (199307/s) peak heap memory consumption: 187.24MB peak RSS (including heaptrack overhead): 233.69MB total memory leaked: 67.58KB ``` While that test may be an edge worst-case scenario, https://llvm-compile-time-tracker.com/compare.php?from=dee85d47d9f15fc268f7b18f279dac2774836615&to=98a57e31b1947d5bcdf4a5605ac2ab32b4bd5f63&stat=instructions agrees that this also results in improvements in the usual situations.	2021-05-31 15:34:03 +03:00
Juneyoung Lee	7161bb87c9	[InstCombine] Fix a few remaining vec transforms to use poison instead of undef This is a patch that replaces shufflevector and insertelement's placeholder value with poison. The underlying motivation is to fix the semantics of shufflevector with undef mask to return poison instead (D93818). Consensus was reached in late 2020 via the mailing list as well as the thread in https://bugs.llvm.org/show_bug.cgi?id=44185 . This patch is a simple syntactic change to the existing code, hence directly pushed as a commit.	2021-05-31 18:47:09 +09:00
David Green	222aeb4d51	[DSE] Remove stores in the same loop iteration DSE will currently only remove stores in the same block unless they can be guaranteed to be loop invariant. This expands that to any stores that are in the same Loop, at the same loop level. This should still account for where AA/MSSA will not handle aliasing between loops, but allow the dead stores to be removed where they overlap in the same loop iteration. It requires adding loop info to DSE, but that looks fairly harmless. The test case this helps is from code like this, which can come up in certain matrix operations: for(i=..) dst[i] = 0; for(j=..) dst[i] += src[in+j]; After LICM, this becomes: for(i=..) dst[i] = 0; sum = 0; for(j=..) sum += src[in+j]; dst[i] = sum; The first store is dead, and with this patch is now removed. Differential Revision: https://reviews.llvm.org/D100464	2021-05-31 10:22:37 +01:00
Hyeongyu Kim	4f2fd3818b	[InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210) As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210 ...the bug is triggered, as Eli says, when sext(idx) * ElementSize overflows. ``` // assume that GV is an array of 4-byte elements GEP = gep GV, 0, Idx // this is accessing Idx * 4 L = load GEP ICI = icmp eq L, value => ICI = icmp eq Idx, NewIdx ``` The foldCmpLoadFromIndexedGlobal function simplifies GEP+load operation to icmp. And there is a problem because Idx * ElementSize can overflow. Let's assume that the wanted value is at offset 0. Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00. We should return true for all these values, but currently, the new icmp only returns true for 0x00..00. This problem can be solved by masking off (trailing zeros of ElementSize) bits from Idx. ``` ... => Idx' = and Idx, 0x3F..FF ICI = icmp eq Idx', NewIdx ``` Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D99481	2021-05-31 14:08:20 +09:00
Sanjay Patel	7bb8bfa062	[InstCombine] fix miscompile from vector select substitution This is similar to the fix in `c590a9880d` ( PR49832 ), but we missed handling the pattern for select of bools (no compare inst). We can't substitute a vector value because the equality condition replacement that we are attempting requires that the condition is true/false for the entire value. Vector select can be partly true/false. I added an assert for vector types, so we shouldn't hit this again. Fixed formatting while auditing the callers. https://llvm.org/PR50500	2021-05-30 07:11:58 -04:00
Mindong Chen	71acce68da	[NFCI] Move DEBUG_TYPE definition below #includes When you try to define a new DEBUG_TYPE in a header file, a DEBUG_TYPE definition placed around the #includes in files that include it could result in redefinition warnings or even compile errors. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D102594	2021-05-30 17:31:01 +08:00
Sanjay Patel	c7da0c383a	[InstCombine] fold zext of masked bit set/clear This does not solve PR17101, but it is one of the underlying diffs noted here: https://bugs.llvm.org/show_bug.cgi?id=17101#c8 We could ease the one-use checks for the 'clear' (no 'not' op) half of the transform, but I do not know if that asymmetry would make things better or worse. Proofs: https://rise4fun.com/Alive/uVB Name: masked bit set %sh1 = shl i32 1, %y %and = and i32 %sh1, %x %cmp = icmp ne i32 %and, 0 %r = zext i1 %cmp to i32 => %s = lshr i32 %x, %y %r = and i32 %s, 1 Name: masked bit clear %sh1 = shl i32 1, %y %and = and i32 %sh1, %x %cmp = icmp eq i32 %and, 0 %r = zext i1 %cmp to i32 => %xn = xor i32 %x, -1 %s = lshr i32 %xn, %y %r = and i32 %s, 1 Note: this is a re-post of a patch that I committed at: rGa041c4ec6f7a The commit was reverted because it exposed another bug: rGb212eb7159b40 But that has since been corrected with: rG8a156d1c2795189 ( D101191 ) Differential Revision: https://reviews.llvm.org/D72396	2021-05-29 08:52:26 -04:00
Sanjay Patel	52f2970036	[InstCombine] reduce code duplication; NFC	2021-05-29 08:33:25 -04:00
Nikita Popov	625920dabf	[LoopUnroll] Make DomTree explicitly required (NFC) Some of the code was already assuming that DT is non-null, so make that requirement more explicit and remove unnecessary null checks.	2021-05-29 09:37:32 +02:00
Fangrui Song	38dbdde792	[Internalize] Simplify comdat renaming with noduplicates after D103043 I realized that we can use `comdat noduplicates` which is available on ELF. Add a special case for wasm which doesn't support the feature.	2021-05-28 16:58:38 -07:00
Nikita Popov	90310dfff8	[LoopUnroll] Use changeToUnreachable() (NFC) When fully unrolling with a non-latch exit, the latch block is folded to unreachable. Replace this folding with the existing changeToUnreachable() helper, rather than performing it manually. This also moves the fold to happen after the manual DT update for exit blocks. I believe this is correct in that the conversion of an unconditional backedge into unreachable should not affect the DT at all. Differential Revision: https://reviews.llvm.org/D103340	2021-05-29 00:11:21 +02:00
Nikita Popov	f765445a69	[LoopUnroll] Clean up exit folding (NFC) This does some non-functional cleanup of exit folding during unrolling. The two main changes are: * First rewrite latch->header edges, which is unrelated to exit folding. * Combine folding for latch and non-latch exits. After the previous change, the only difference in their logic is that for non-latch exits we currently only fold "known non-exit" cases, but not "known exit" cases. I think this helps a lot to clarify this code and prepare it for future changes. Differential Revision: https://reviews.llvm.org/D103333	2021-05-28 22:31:13 +02:00
Bardia Mahjour	06eaffa858	[NFC] Remove confusing info about MainLoop VF/UF from debug message	2021-05-28 16:10:04 -04:00
Florian Hahn	007f268c35	[VectorCombine] Check indices for all extracts we scalarize. We need to make sure that the indices of all extracts we scalarize are valid.	2021-05-28 18:35:29 +01:00
Stefan Pintilie	0159652058	Revert "Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" (try 2)" This reverts commit `be1a23203b`.	2021-05-28 12:21:22 -05:00
Stefan Pintilie	24bd657202	Revert "[NFCI][LoopDeletion] Only query SCEV about loop successor if another successor is also in loop" This reverts commit `b0b2bf3b5d`.	2021-05-28 12:21:22 -05:00
Stefan Pintilie	fd55331203	Revert "[NFC] Formatting fix" This reverts commit `59d938e649`.	2021-05-28 12:21:22 -05:00
Stefan Pintilie	807fc7cdc9	Revert "[NFC] Reuse existing variables instead of re-requesting successors" This reverts commit `c467585682`.	2021-05-28 12:21:22 -05:00
Stefan Pintilie	dd226803c2	Revert "[NFCI][LoopDeletion] Do not call complex analysis for known non-zero BTC" This reverts commit `7d418dadf6`.	2021-05-28 12:21:21 -05:00
Sanjay Patel	403cfe5d70	[PassManager] unify late simplifycfg options between regular and LTO pipelines This is split off from D102002, and I think it is clear that the difference in behavior was not intended. Options were added to SimplifyCFG over time, but different chunks of the pass pipelines were not kept in sync.	2021-05-28 13:06:49 -04:00
eopXD	fa488ea864	[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass This patch changes LoopFlattenPass from FunctionPass to LoopNestPass. Utilize LoopNest and let function 'Flatten' generate information from it. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D102904	2021-05-28 15:43:12 +00:00
dongAxis	66ff1cbd71	[NFC][Transforms][Utils] remove useless variable in CloneBasicBlock	2021-05-28 17:50:38 +08:00
eopXD	e96d6f4821	Revert "[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass" This reverts commit `7952ddb21f`. Differential Revision: https://reviews.llvm.org/D103302	2021-05-28 07:58:06 +00:00
eopXD	7e06cf8f1b	Revert "[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass" This reverts commit `ffc4d3e068`.	2021-05-28 07:48:04 +00:00
eopXD	ffc4d3e068	[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass This patch changes LoopFlattenPass from FunctionPass to LoopNestPass. Utilize LoopNest and let function 'Flatten' generate information from it. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D102904	2021-05-28 07:25:53 +00:00
eopXD	7952ddb21f	[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass This patch changes LoopFlattenPass from FunctionPass to LoopNestPass. Utilize LoopNest and let function 'Flatten' generate information from it. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D102904	2021-05-28 07:11:26 +00:00
Max Kazantsev	6a2af607ad	Revert "[NFCI] Lazily evaluate SCEVs of PHIs" This reverts commit `51d334a845`. Reported failures, need to analyze.	2021-05-28 11:05:30 +07:00
Jinsong Ji	b2581196eb	[AIX] Enable stackprotect feature AIX uses `__ssp_canary_word` instead of `__stack_chk_guard`. This patch updates the target hook to use the correct symbol, so that the basic stackprotect feature can work. The traceback will be handled in a follow-up patch. Reviewed By: #powerpc, shchenz Differential Revision: https://reviews.llvm.org/D103100	2021-05-28 02:18:15 +00:00
Jianzhou Zhao	fc1d39849e	[dfsan] Add a flag about whether to propagate offset labels at gep DFSan has flags to control flows between pointers and objects referred by pointers. For example, a = p; L(a) = L(p) when -dfsan-combine-pointer-labels-on-load = false L(a) = L(p) + L(p) when -dfsan-combine-pointer-labels-on-load = true p = b; L(p) = L(b) when -dfsan-combine-pointer-labels-on-store = false L(p) = L(b) + L(p) when -dfsan-combine-pointer-labels-on-store = true The question is what to do with p += c. In practice we found many confusing flows if we propagate labels from c to p. So a new flag works like this p += c; L(p) = L(p) when -dfsan-propagate-via-pointer-arithmetic = false L(p) = L(p) + L(c) when -dfsan-propagate-via-pointer-arithmetic = true Reviewed-by: gbalats Differential Revision: https://reviews.llvm.org/D103176	2021-05-28 00:06:19 +00:00
Arthur Eubanks	2d2a902078	[SanCov] Properly set ABI parameter attributes Arguments need to have the proper ABI parameter attributes set. Followup to D101806. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D103288	2021-05-27 15:27:21 -07:00
Adrian Prantl	f3869a5c32	Support stripping indirectly referenced DILocations from !llvm.loop metadata in stripDebugInfo(). This patch fixes an oversight in https://reviews.llvm.org/D96181 and also takes into account loop metadata pointing to other MDNodes that point into the debug info. rdar://78487175 Differential Revision: https://reviews.llvm.org/D103220	2021-05-27 13:23:33 -07:00
maekawatoshiki	2165360003	[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass. The next patch will utilize LoopNest to effectively handle loop nests. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D99149	2021-05-28 01:17:23 +09:00
Florian Hahn	38641ddf3e	[VPlan] Do not sink uniform recipes in sinkScalarOperands. For uniform ReplicateRecipes, only the first lane should be used, so sinking them would mean we have to compute the value of the first lane multiple times. Also, at the moment, sinking them causes a crash because the value of the first lane is re-used by all users. Reported post-commit for D100258.	2021-05-27 14:07:48 +01:00
Max Kazantsev	7d418dadf6	[NFCI][LoopDeletion] Do not call complex analysis for known non-zero BTC	2021-05-27 15:29:37 +07:00
Max Kazantsev	c467585682	[NFC] Reuse existing variables instead of re-requesting successors	2021-05-27 15:29:37 +07:00
Max Kazantsev	51d334a845	[NFCI] Lazily evaluate SCEVs of PHIs Eager evaluation has cost of compile time. Only query them if they are required for proving predicates.	2021-05-27 13:35:31 +07:00
Max Kazantsev	59d938e649	[NFC] Formatting fix	2021-05-27 12:50:54 +07:00
Max Kazantsev	b0b2bf3b5d	[NFCI][LoopDeletion] Only query SCEV about loop successor if another successor is also in loop	2021-05-27 12:44:22 +07:00
Yevgeny Rouban	4d26f41f76	[RS4GC] Introduce intrinsics to get base ptr and offset There can be a need for some optimizations to get (base, offset) for any GC pointer. The base can be calculated by generating needed instructions as it is done by the RewriteStatepointsForGC::findBasePointer() function. The offset can be calculated in the same way. Though to not expose the base calculation and to make the offset calculation as simple as ptrtoint(derived_ptr) - ptrtoint(base_ptr), which is illegal outside RS4GC, this patch introduces 2 intrinsics: @llvm.experimental.gc.get.pointer.base(%derived_ptr) @llvm.experimental.gc.get.pointer.offset(%derived_ptr) These intrinsics are inlined by RS4GC along with generation of statepoint sequences. With these new intrinsics the GC parseable lowering for atomic memcpy intrinsics (`6ec2c5e402`) could be implemented as a separate pass. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D100445	2021-05-27 09:14:14 +07:00
Heejin Ahn	5bfe06ad35	[SimplifyCFG] Use make_early_inc_range() while deleting instructions We are deleting `phi` nodes within the for loop, so this makes sure we increment the iterator before we delete the instruction pointed by the iterator. This started to break in `a0be081646`. Reviewed By: dschuff, lebedev.ri Differential Revision: https://reviews.llvm.org/D103181	2021-05-26 11:43:11 -07:00
Alexey Bataev	27d3528acf	[SLP]Fix vectorization of insertelements with multiple uses. SLP vectorizer should not consider insertelements with multiple uses as a part of high level build vector, it must be considered as a terminating insertelement in the vector build, otherwise it may produce incorrect code. Differential Revision: https://reviews.llvm.org/D103164	2021-05-26 09:42:18 -07:00
Stephen Tozer	a0bd6105d8	[DebugInfo] Limit the number of values that may be referenced by a dbg.value Following the addition of salvaging dbg.values using DIArgLists to reference multiple values, a case has been found where excessively large DIArgLists are produced as a result of this salvaging, resulting in large enough performance costs to effectively freeze the compiler. This patch introduces an upper bound of 16 to the number of values that may be salvaged into a dbg.value, to limit the impact of these extreme cases to performance. Differential Revision: https://reviews.llvm.org/D103162	2021-05-26 17:34:05 +01:00
Philip Reames	9cc2181ec3	[unroll] Use value domain for symbolic execution based cost model The current full unroll cost model does a symbolic evaluation of the loop up to a fixed limit. That symbolic evaluation currently simplifies to constants, but we can generalize to arbitrary Values using the InstructionSimplify infrastructure at very low cost. By itself, this enables some simplifications, but it's mainly useful when combined with the branch simplification over in D102928. Differential Revision: https://reviews.llvm.org/D102934	2021-05-26 08:41:25 -07:00
Kerry McLaughlin	9f76a85260	[LoopVectorize] Enable strict reductions when allowReordering() returns false When loop hints are passed via metadata, the allowReordering function in LoopVectorizationLegality will allow the order of floating point operations to be changed: bool allowReordering() const { // When enabling loop hints are provided we allow the vectorizer to change // the order of operations that is given by the scalar loop. This is not // enabled by default because can be unsafe or inefficient. The -enable-strict-reductions flag introduced in D98435 will currently only vectorize reductions in-loop if hints are used, since canVectorizeFPMath() will return false if reordering is not allowed. This patch changes canVectorizeFPMath() to query whether it is safe to vectorize the loop with ordered reductions if no hints are used. For testing purposes, an additional flag (-hints-allow-reordering) has been added to disable the reordering behaviour described above. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D101836	2021-05-26 13:59:12 +01:00
Max Kazantsev	be1a23203b	Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" (try 2) The patch was reverted due to compile time impact of contextual SCEV queries. It also appeared that it introduced a miscompile on irreducible CFG. Changes made: 1. isKnownPredicateAt is replaced with more lightweight isKnownPredicate; 2. Irreducible CFG in live code is now detected and excluded from processing. Differential Revision: https://reviews.llvm.org/D102615	2021-05-26 19:47:14 +07:00
Max Kazantsev	0de553dce0	Revert "Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration"" This reverts commit `43d2e51c2e`. Committed wrong version.	2021-05-26 19:29:07 +07:00
Max Kazantsev	43d2e51c2e	Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" The patch was reverted due to compile time impact of contextual SCEV queries. It also appeared that it introduced a miscompile on irreducible CFG. Changes made: 1. isKnownPredicateAt is replaced with more lightweight isKnownPredicate; 2. Irreducible CFG in live code is now detected and excluded from processing. Differential Revision: https://reviews.llvm.org/D102615	2021-05-26 19:23:21 +07:00
David Sherwood	70d8365e33	Fix warning introduced by `9c766f4090`	2021-05-26 10:20:39 +01:00
David Sherwood	9c766f4090	[InstCombine] Fold extractelement + vector GEP with one use We sometimes see code like this: Case 1: %gep = getelementptr i32, i32* %a, <2 x i64> %splat %ext = extractelement <2 x i32> %gep, i32 0 or this: Case 2: %gep = getelementptr i32, <4 x i32> %a, i64 1 %ext = extractelement <4 x i32> %gep, i32 0 where there is only one use of the GEP. In such cases it makes sense to fold the two together such that we create a scalar GEP: Case 1: %ext = extractelement <2 x i64> %splat, i32 0 %gep = getelementptr i32, i32 %a, i64 %ext Case 2: %ext = extractelement <2 x i32> %a, i32 0 %gep = getelementptr i32, i32 %ext, i64 1 This may create further folding opportunities as a result, i.e. the extract of a splat vector can be completely eliminated. Also, even for the general case where the vector operand is not a splat it seems beneficial to create a scalar GEP and extract the scalar element from the operand. Therefore, in this patch I've assumed that a scalar GEP is always preferable to a vector GEP and have added code to unconditionally fold the extract + GEP. I haven't added folds for the case when we have both a vector of pointers and a vector of indices, since this would require generating an additional extractelement operation. Tests have been added here: Transforms/InstCombine/gep-vector-indices.ll Differential Revision: https://reviews.llvm.org/D101900	2021-05-26 09:54:26 +01:00
Teresa Johnson	d35fe04fa3	[LTT] Handle merged llvm.assume when dropping type tests When the lower type test pass is invoked a second time with DropTypeTests set to true, it expects that all remaining type tests feed assume instructions, which are removed along with the type tests. In some cases the llvm.assume might have been merged with another one, i.e. from a builtin_assume instruction, in which case the type test would actually feed a phi that in turn feeds the merged assume instruction. In this case we can simply replace that operand of the phi with "true" before removing the type test. Differential Revision: https://reviews.llvm.org/D103073	2021-05-25 17:02:13 -07:00
Kevin Athey	52ac114771	LLVM Detailed IR tests for introduction of flag -fsanitize-address-detect-stack-use-after-return-mode. Rework all tests that interact with use after return to correctly handle the case where the mode has been explicitly set to Never or Always. for issue: https://github.com/google/sanitizers/issues/1394 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D102462	2021-05-25 16:17:39 -07:00
Fangrui Song	b426b45d10	[Internalize] Rename instead of removal if a to-be-internalized comdat has more than one member Besides the `comdat any` deduplication feature, instrumentations use comdat to establish dependencies among a group of sections, to prevent section based linker garbage collection from discarding some members without discarding all. LangRef acknowledges this usage with the following wording: > All global objects that specify this key will only end up in the final object file if the linker chooses that key over some other key. On ELF, for PGO instrumentation, a `__llvm_prf_cnts` section and its associated `__llvm_prf_data` section are placed in the same GRP_COMDAT group. A `__llvm_prf_data` is usually not referenced and expects the liveness of its associated `__llvm_prf_cnts` to retain it. The `setComdat(nullptr)` code (added by D10679) in InternalizePass can break the use case (a `__llvm_prf_data` may be dropped with its associated `__llvm_prf_cnts` retained). The main goal of this patch is to fix the dependency relationship. I think it makes sense for InternalizePass to internalize a comdat and thus suppress the deduplication feature, e.g. a relocatable link of a regular LTO can create an object file affected by InternalizePass. If a non-internal comdat in a.o is prevailed by an internal comdat in b.o, the a.o references to the comdat definitions will be non-resolvable (references cannot bind to STB_LOCAL definitions in b.o). On PE-COFF, for a non-external selection symbol, deduplication is naturally suppressed with link.exe and lld-link. However, this is fuzzy on ELF and I tend to believe the spec creator has not thought about this use case (see D102973). GNU ld and gold are still using the "signature is name based" interpretation. So even if D102973 for ld.lld is accepted, for portability, a better approach is to rename the comdat. A comdat with a single member is the common case; leaving the comdat can waste (sizeof(Elf64_Shdr)+4*2) bytes, so we optimize by deleting the comdat; otherwise we rename the comdat. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D103043	2021-05-25 14:15:27 -07:00
Matt Morehouse	832c99f727	Revert "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" This reverts commit `2531fd70d1` due to performance regression on the PPC buildbot.	2021-05-25 13:58:42 -07:00
Benjamin Kramer	d2d4f16806	[Matrix] Use LLVM_DEBUG for a debug flag dump() doesn't exist in release builds. ld.lld: error: undefined symbol: llvm::Value::dump() const >>> referenced by LowerMatrixIntrinsics.cpp >>> LowerMatrixIntrinsics.o:((anonymous namespace)::LowerMatrixIntrinsics::Visit())	2021-05-25 21:10:19 +02:00
Nikita Popov	9c91614959	[CVP] Guard against poison in common phi value transform (PR50399) The common phi value transform replaces constants with values that have the same value as the constant on a given edge. However, LVI generally only provides information that is correct up to poison, so this can end up replacing a well-defined value with poison. D69442 addressed an instance of this problem by clearing poison flags on the generating instruction, which was sufficient at the time. rGa917fb89dc28 made LVI's edge value analysis slightly more powerful, and clearing poison flags is no longer sufficient. This patch changes the transform to explicitly guard against a poison value instead. This should be satisfied for most cases due to a prior branch on poison. Fixes https://bugs.llvm.org/show_bug.cgi?id=50399. Differential Revision: https://reviews.llvm.org/D102966	2021-05-25 20:47:17 +02:00
Adam Nemet	dfd1bbd00a	[Matrix] Factor and distribute transposes across multiplies Now that we can fold some transposes into multiplies (CM: A * B^t and RM: A^t * B), we want to move them around to create the optimal expressions: * fold away double transposes while still using them to assert the shape * sink transposes hoping they cancel out * lift transposes when both operands are transposed This also modifies the matrix remarks to include the number of exposed transposes (i.e. transposes that we couldn't fold into a multiply). The adjustment to the test remarks-inlining is a bit subtle: I am changing the double transpose to a single transpose so that we don't remove it completely. More importantly this changes some of the total instruction count, most notable stores because we can no longer use a vector store. Differential Revision: https://reviews.llvm.org/D102733	2021-05-25 11:12:20 -07:00
Roman Lebedev	149e018d12	[LoopIdiom] 'arithmetic right-shift until zero': don't turn potentially infinite loops into finite ones Nowadays LLVM does not assume that all loops are finite, so if we want to produce a finite loop from a potentially-infinite one, we must ensure that the original loop is known to be a finite one. For this transform, it only matters for arithmetic right-shifts. For them, either the function or the loop must be known to be `mustprogress`, or the original value being shifted must be known to be non-negative (because iff the sign bit was set, it will never become zero, but will become `-1` in the "end"). It would be really good for alive2 to actually complain about this, but it currently does not: https://github.com/AliveToolkit/alive2/issues/726	2021-05-25 21:02:28 +03:00
Sanjay Patel	ae1bc9ebf3	[InstCombine] avoid infinite loop from vector select transforms The 2nd test is based on the fuzzer example in post-commit comments of D101191 - https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34661 The 1st test shows that we don't deal with this symmetrically. We should be able to reduce both examples (possibly in instsimplify instead of instcombine).	2021-05-25 13:28:38 -04:00
Florian Hahn	8e83ff58c9	[VectorCombine] Remove unneeded InsertPointGuard (NFCI). All users of the builder should set an insert point before using the builder. There should be no need for using InsertPointGuard here.	2021-05-25 17:01:05 +01:00
Florian Hahn	575e2aff55	[VectorCombine] Use constant range info for index scalarization legality. We can only scalarize memory accesses if we know the index is valid. This patch adjusts canScalarizeAcceess to fall back to computeConstantRange to check if the index is known to be valid. Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D102476	2021-05-25 13:58:42 +01:00
Sanjay Patel	0bab0f6161	[InstCombine] canonicalize cast before unary shuffle We could go either direction on this transform. VectorCombine already goes this way for bitcasts (and handles more complicated cases using the cost model), so let's try cast-first. Deferring completely to VectorCombine is another possibility. But the backend should be able to invert this easily when the vectors have the same shape, so it doesn't seem like a transform that we need to avoid. The motivating example from https://llvm.org/PR49081 has an int-to-float sandwiched between 2 shuffles, and the backend currently does not reduce that, so on x86, we get something like: pshufd $249, %xmm0, %xmm0 cvtdq2ps %xmm0, %xmm0 shufps $144, %xmm0, %xmm0 ...instead of just a single conversion instruction. Differential Revision: https://reviews.llvm.org/D103038	2021-05-25 08:43:09 -04:00
Chuanqi Xu	400a9d3501	[NFC] [Coroutines] Remove unused variable: UnreachableCache	2021-05-25 20:33:46 +08:00
Roman Lebedev	8f4db14d1c	[LoopIdiom] Support 'left-shift until zero' idiom This adds support for the "count active bits" pattern, i.e.: ``` int countBits(unsigned val) { int cnt = 0; for( ; (val << cnt) != 0; ++cnt) ; return cnt; } ``` but a somewhat more general one: ``` int countBits(unsigned val, int start, int off) { int cnt; for (cnt = start; val << (cnt + off); cnt++) ; return cnt; } ``` alive2 is happy with all the tests there. Note that, again, much like with the right-shift cases, we don't require the `val != 0` guard. This is the last pattern that was supported by `detectShiftUntilZeroIdiom()`, which now becomes obsolete.	2021-05-25 15:26:35 +03:00
Roman Lebedev	f1c5f78d38	[LoopIdiom] Support 'arithmetic right-shift until zero' idiom This adds support for the "count active bits" pattern, i.e.: ``` int countActiveBits(signed val) { int cnt = 0; for( ; (val >> cnt) != 0; ++cnt) ; return cnt; } ``` but a somewhat more general one: ``` int countActiveBits(signed val, int start, int off) { int cnt; for (cnt = start; val >> (cnt + off); cnt++) ; return cnt; } ``` This directly matches the existing 'logical right-shift until zero' idiom. alive2 is happy with all the tests there. Note that, again, much like with the original unsigned case, we don't require the `val != 0` guard. The old `detectShiftUntilZeroIdiom()` already supports this pattern, the idea here is that the `val` must be positive (have at least one leading zero), because otherwise the loop is non-terminating, but since it is not `while(1)`, that would have been UB.	2021-05-25 14:30:49 +03:00
Marco Elver	280333021e	[SanitizeCoverage] Add support for NoSanitizeCoverage function attribute We really ought to support no_sanitize("coverage") in line with other sanitizers. This came up again in discussions on the Linux-kernel mailing lists, because we currently do workarounds using objtool to remove coverage instrumentation. Since that support is only on x86, to continue supporting coverage instrumentation on other architectures, we must support selectively disabling coverage instrumentation via function attributes. Unfortunately, for SanitizeCoverage, it has not been implemented as a sanitizer via fsanitize= and associated options in Sanitizers.def, but rolls its own option fsanitize-coverage. This meant that we never got "automatic" no_sanitize attribute support. Implement no_sanitize attribute support by special-casing the string "coverage" in the NoSanitizeAttr implementation. To keep the feature as unintrusive to existing IR generation as possible, define a new negative function attribute NoSanitizeCoverage to propagate the information through to the instrumentation pass. Fixes: https://bugs.llvm.org/show_bug.cgi?id=49035 Reviewed By: vitalybuka, morehouse Differential Revision: https://reviews.llvm.org/D102772	2021-05-25 12:57:14 +02:00
Alexey Lapshin	10c2e26159	[TRE] Reland: allow TRE for non-capturing calls. D82085 ("allow TRE for non-capturing calls") caused a failure during bootstrap. This patch does the same as D82085 and also fixes the bootstrap error. The problem with D82085 is that it does not create copies for byval operands when replacing a function call with a branch. Consider the following example: ``` int zoo ( S p1 ); int foo ( int count, S p1 ) { if ( count > 10 ) return zoo(p1); // temporary variable created for passing byvalue parameter // p1 could be used when zoo(p1) is called (after TRE is done). // lifetime.start p1.byvalue.temp return foo(count+1, p1); // lifetime.end p1.byvalue.temp } ``` After the recursive call to foo is replaced with a jump to the start of the function, its parameters could be passed to the zoo function, i.e. the temporary variable created for the byvalue parameter "p1" could be passed to zoo. As a result, zoo receives a broken operand: ``` int foo ( int count, S p1 ) { :tailrecurse p1_tr = phi p1, p1.byvalue.temp if ( count > 10 ) return zoo(p1_tr); // temporary variable created for passing byvalue parameter // p1 could be used when zoo(p1) is called (after TRE is done). lifetime.start p1.byvalue.temp memcpy (p1.byvalue.temp, p1_tr) count = count + 1 lifetime.end p1.byvalue.temp br tailrecurse } ``` To prevent p1.byvalue.temp from being used after its scope is ended by the lifetime.end marker, this patch copies the value from p1.byvalue.temp into another temporary variable and then copies that variable into the input parameter for the next iteration. This patch passes a bootstrap build and a bootstrap build with AddressSanitizer. Differential Revision: https://reviews.llvm.org/D85614	2021-05-25 11:35:48 +03:00
Max Kazantsev	2531fd70d1	[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration This patch handles one particular case of one-iteration loops for which SCEV cannot straightforwardly prove BECount = 1. The idea of the optimization is to symbolically execute conditional branches on the 1st iteration, moving in topological order, and only visiting blocks that may be reached on the first iteration. If we find out that we never reach header via the latch, then the backedge can be broken. Differential Revision: https://reviews.llvm.org/D102615 Reviewed By: reames	2021-05-25 12:43:31 +07:00
maekawatoshiki	e77d24f70a	Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass" This reverts commit `d65c32fb41`.	2021-05-25 11:39:49 +09:00
Anton Afanasyev	b2cd895011	[SLP] Fix "gathering" of insertelement instructions In rare exceptional cases a vector tree node (for now only insertelements) is marked as `NeedToGather`; this patch handles that case. Follow-up of D98714 to fix the bug reported here: https://reviews.llvm.org/D98714#2764135. Differential Revision: https://reviews.llvm.org/D102675	2021-05-25 01:35:43 +03:00
Jon Roelofs	095e91c973	[Remarks] Add analysis remarks for memset/memcpy/memmove lengths Re-landing now that the crasher this patch previously uncovered has been fixed in: https://reviews.llvm.org/D102935 Differential revision: https://reviews.llvm.org/D102452	2021-05-24 10:10:44 -07:00
Jon Roelofs	694068d0db	[Remarks] Look through inttoptr/ptrtoint for -ftrivial-auto-var-init remarks. The crasher is a related problem that @aemerson found broke spec2k6/403.gcc when I landed https://reviews.llvm.org/D102452. It has been reduced & modified to reproduce without that patch. Differential revision: https://reviews.llvm.org/D102935	2021-05-24 09:23:22 -07:00
Adrian Prantl	4cba0a4f11	CoroSplit: Replace ad-hoc implementation of reachability with API from CFG.h The current ad-hoc implementation used to determine whether a basic block is unreachable doesn't work correctly in the general case (for example it won't detect successors of unreachable blocks as unreachable). This patch replaces it with the correct API that uses a DominatorTree to answer the question correctly and quickly. rdar://77181156 Differential Revision: https://reviews.llvm.org/D102963	2021-05-24 09:18:33 -07:00
Florian Hahn	65d3dd7c88	[VPlan] Add first VPlan version of sinkScalarOperands. This patch adds a first VPlan-based implementation of sinking of scalar operands. The current version traverses a VPlan once and processes all operands of a predicated REPLICATE recipe. If one of those operands can be sunk, it is moved to the block containing the predicated REPLICATE recipe. Processing then continues with the operands of the sunk recipe. The initial version does not re-process candidates after other recipes have been sunk. It also cannot partially sink induction increments at the moment. The VPlan only contains WIDEN-INDUCTION recipes and if the induction is used for example in a GEP, only the first lane is used and in the lowered IR the adds for the other lanes can be sunk into the predicated blocks. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100258	2021-05-24 15:29:58 +01:00
Florian Hahn	e9d97d7d9d	[VPlan] Add mayReadOrWriteMemory & friends. This patch adds initial implementation of mayReadOrWriteMemory, mayReadFromMemory and mayWriteToMemory to VPRecipeBase. Used by D100258.	2021-05-24 13:11:32 +01:00
Florian Hahn	4e8c28b6fb	Recommit "[VectorCombine] Scalarize vector load/extract." This reverts commit `94d54155e2`. This fixes a sanitizer failure by moving scalarizeLoadExtract(I) before foldSingleElementStore(I), which may remove instructions.	2021-05-24 11:35:07 +01:00
Roman Lebedev	32bee42719	[NFCI][LoopIdiom] 'left-shift until bittest': assert that BaseX is loop-invariant Given that BaseX is an incoming value when coming from the preheader, it should be loop-invariant, but let's just document this assumption.	2021-05-24 12:15:06 +03:00
Roman Lebedev	aa3dac95ed	[LoopIdiom] 'logical right shift until zero': the value must be loop-invariant As per the reproducer provided by Mikael Holmén in post-commit review.	2021-05-24 12:15:06 +03:00
Florian Hahn	94d54155e2	Revert "[VectorCombine] Scalarize vector load/extract." This reverts commit `86497785d5`. One of the tests causes an ASAN failure. https://lab.llvm.org/buildbot/#/builders/5/builds/7927/steps/12/logs/stdio	2021-05-24 10:11:00 +01:00
Florian Hahn	86497785d5	[VectorCombine] Scalarize vector load/extract. This patch adds a new combine that tries to scalarize chains of `extractelement (load %ptr), %idx` to `load (gep %ptr, %idx)`. This is profitable when extracting only a few elements out of a large vector. At the moment, `store (extractelement (load %ptr), %idx), %ptr` operations on large vectors result in huge code in the backend. This can easily be triggered by using the matrix extension, e.g. https://clang.godbolt.org/z/qsccPdPf4 This should complement D98240. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D100273	2021-05-24 09:29:08 +01:00
Johannes Doerfert	6caea8a7fa	[Attributor] Introduce a helper to deal with constant type mismatches If we simplify values we sometimes end up with type mismatches. If the value is a constant we can often cast it though to still allow propagation. The logic is now put into a helper and it replaces some ad hoc things we did before. This also introduces the AA namespace for abstract attribute related functions and types.	2021-05-23 23:00:40 -05:00
Johannes Doerfert	55e9c28212	[Attributor] Teach AAIsDead about undef values Not only if the branch or switch condition is dead but also if it is assumed `undef` we can delay AAIsDead exploration.	2021-05-23 23:00:40 -05:00
Johannes Doerfert	4878d73419	[Attributor] Deal with address spaces gracefully When we do value propagation we need to cast address spaces properly.	2021-05-23 23:00:39 -05:00
Johannes Doerfert	1ba2929bb8	[Attributor] Be more careful to not disturb the CG outside the SCC We have seen various problems when the call graph was not updated or the update did not succeed because it involved functions outside the SCC. This patch adds assertions and checks to avoid accidentally changing something outside the SCC that would impact the call graph. It also prevents us from reanalyzing functions outside the current SCC which could cause problems on its own. Note that the transformations we do might cause the CG to be "more precise" but the original one would always be a super set of the most precise one. Since the call graph is by nature an approximation, it is good enough to have a super set of all call edges.	2021-05-23 23:00:39 -05:00
Johannes Doerfert	e93ac1e2de	[Attributor][FIX] Account for undef in the constant value lattice The constant value lattice looks like this ``` <None> \| <undef> / \| \ ... <0> ... \ \| / <unknown> ``` We did not account for the undef and assumed a value meant we could not change anymore. Now we actually check if we have the same value as before, which will signal CHANGED to the users when we go from undef to a specific constant. This fixes, among other things, the bug exposed by @ipccp4 in `value-simplify.ll`.	2021-05-23 20:47:06 -05:00
Johannes Doerfert	5cdc29f795	[Attributor][FIX] Ensure we replace undef if we see the first "real" value The state of AAPotentialValues tracks if undef is contained. It should fold undef into the first non-undef value. However we missed a case before. There was also a shadowing definition of two variables that caused trouble. The test exposes both problems.	2021-05-23 20:47:06 -05:00
Johannes Doerfert	2bc51d39db	[Attributor][NFC] Add helpful debug outputs	2021-05-23 20:47:05 -05:00
Johannes Doerfert	cb511531b9	[Attributor][NFC] Clang format the Attributor source files	2021-05-23 20:47:05 -05:00
maekawatoshiki	d65c32fb41	[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass. The next patch will utilize LoopNest to effectively handle loop nests. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D99149	2021-05-23 22:32:01 +09:00
Yaxun (Sam) Liu	bf6124580d	[HIP] support ThinLTO Add options -[no-]offload-lto and -foffload-lto=[thin,full] for controlling LTO for offload compilation. Allow LTO for AMDGPU target. AMDGPU target does not support codegen of object files containing call of external functions, therefore the LLVM module passed to AMDGPU backend needs to contain definitions of all the callees. An LLVM option is added to allow function importer to import functions with noinline attribute. HIP toolchain passes proper LLVM options to lld to make sure function importer imports definitions of all the callees. Reviewed by: Teresa Johnson, Artem Belevich Differential Revision: https://reviews.llvm.org/D99683	2021-05-22 10:48:34 -04:00
Nikita Popov	9a9421a461	Reapply [InstCombine] Fold multiuse shr eq zero This was reverted due to performance regressions in ARM benchmarks, which have since been addressed by D101196 (SCEV analysis improvement) and D101778 (CGP reverse transform). ----- The single-use case is handled implicitly by converting the icmp into a mask check first. When comparing with zero in particular, we don't need the one-use restriction, as we only produce a single icmp. https://alive2.llvm.org/ce/z/MSixcm https://alive2.llvm.org/ce/z/GwpG0M	2021-05-22 14:46:50 +02:00
Florian Hahn	a6de8d95db	[Matrix] Bail out early if there are no matrix intrinsics. If there are no matrix intrinsics in a function, we can directly bail out, as there's nothing left to do. Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D102931	2021-05-22 11:37:25 +01:00
Arthur Eubanks	f7788e1bff	Revert "[NewPM] Only invalidate modified functions' analyses in CGSCC passes" This reverts commit `d14d84af2f`. Causes unacceptable memory regressions.	2021-05-21 16:38:03 -07:00
Florian Hahn	a0ce6439ca	[Matrix] Remove unused matrix-propagate-shape option. The option was used during the initial bringup, but it does not add any value at this point. Remove it. Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D102930	2021-05-21 19:01:54 +01:00
maekawatoshiki	fd53cb4148	Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass" This reverts commit `cea7a3fe3d`. To investigate sanitizer-x86_64-linux-fast failure.	2021-05-22 01:40:43 +09:00
maekawatoshiki	cea7a3fe3d	[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass. The next patch will utilize LoopNest to effectively handle loop nests. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D99149	2021-05-21 23:57:39 +09:00
Alexey Bataev	8dab25954b	[SLP]Improve handling of compensate external uses cost. External insertelement users can be represented as a result of shuffle of the vectorized element and non-consecutive insertelements too. Added support for handling non-consecutive insertelements. Differential Revision: https://reviews.llvm.org/D101555	2021-05-21 07:45:31 -07:00
Djordje Todorovic	cd49b3ae1a	[DebugInfo] Salvage dbg.value() during ADCE This has been found by using the [0]. [0] https://llvm.org/docs/HowToUpdateDebugInfo.html#test-original-debug-info-preservation-in-optimizations Differential Revision: https://reviews.llvm.org/D100844	2021-05-21 05:25:59 -07:00
Daniil Fukalov	e8e88c3353	[TTI] NFC: Change getRegUsageForType to return InstructionCost. This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D102541	2021-05-21 15:17:23 +03:00
Stephen Tozer	36ec97f76a	3rd Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands" This reapplies `c0f3dfb9`, which was reverted following the discovery of crashes on linux kernel and chromium builds - these issues have since been fixed, allowing this patch to re-land. This reverts commit `4397b7095d`.	2021-05-21 11:06:20 +01:00
Djordje Todorovic	b9076d119a	Recommit: "[Debugify][Original DI] Test dbg var loc preservation" [Debugify][Original DI] Test dbg var loc preservation This is an improvement of [0]. This adds checking of original llvm.dbg.values()/declares() instructions in optimizations. We have picked a real issue that has been found with this (actually, picked one variable location missing from [1] and resolved the issue), and the result is the fix for that -- D100844. Before applying D100844, using the options from [0] (but with this patch applied) on the compilation of GDB 7.11, the final HTML report for the debug-info issues can be found at [1] (please scroll down, and look for "Summary of Variable Location Bugs"). After applying D100844, the numbers have improved a bit -- please take a look into [2]. [0] https://llvm.org/docs/HowToUpdateDebugInfo.html#test-original-debug-info-preservation-in-optimizations [1] https://djolertrk.github.io/di-check-before-adce-fix/ [2] https://djolertrk.github.io/di-check-after-adce-fix/ Differential Revision: https://reviews.llvm.org/D100845 The Unit test was failing because the pass from the test that modifies the IR, in its runOnFunction() didn't return 'true', so the expensive-check configuration triggered an assertion.	2021-05-21 02:04:29 -07:00
Xiang1 Zhang	5684851cb0	[HWASAN] No code changed, Only clang-format for HWAddressSanitizer.cpp	2021-05-21 14:00:34 +08:00
Jon Roelofs	0af3105b64	Revert "[Remarks] Add analysis remarks for memset/memcpy/memmove lengths" This reverts commit `4bf69fb52b`. This broke spec2k6/403.gcc under -global-isel. Details to follow once I've reduced the problem.	2021-05-20 12:19:16 -07:00
Kevin P. Neal	f21f1eea05	[FPEnv] EarlyCSE support for constrained intrinsics, default FP environment edition EarlyCSE cannot distinguish between floating point instructions and constrained floating point intrinsics that are marked as running in the default FP environment. Said intrinsics are supposed to behave exactly the same as the regular FP instructions. Teach EarlyCSE to handle them in that case. Differential Revision: https://reviews.llvm.org/D99962	2021-05-20 14:40:51 -04:00
Reid Kleckner	8f20ac9595	[PGO] Don't reference functions unless value profiling is enabled This reduces the size of chrome.dll.pdb built with optimizations, coverage, and line table info from 4,690,210,816 to 2,181,128,192, which makes it possible to fit under the 4GB limit. This change can greatly reduce binary size in coverage builds, which do not need value profiling. IR PGO builds are unaffected. There is a minor behavior change for frontend PGO. PGO and coverage both use InstrProfiling to create profile data with counters. PGO records the address of each function in the __profd_ global. It is used later to map runtime function pointer values back to source-level function names. Coverage does not appear to use this information. Recording the address of every function with code coverage drastically increases code size. Consider this program: void foo(); void bar(); inline void inlineMe(int x) { if (x > 0) foo(); else bar(); } int getVal(); int main() { inlineMe(getVal()); } With code coverage, the InstrProfiling pass runs before inlining, and it captures the address of inlineMe in the __profd_ global. This greatly increases code size, because now the compiler can no longer delete trivial code. One downside to this approach is that users of frontend PGO must apply the -mllvm -enable-value-profiling flag globally in TUs that enable PGO. Otherwise, some inline virtual method addresses may not be recorded and will not be able to be promoted. My assumption is that this mllvm flag is not popular, and most frontend PGO users don't enable it. Differential Revision: https://reviews.llvm.org/D102818	2021-05-20 11:09:24 -07:00
Sanjay Patel	f34311c402	[GlobalOpt] recompute alignments for loads and stores of updated globals GlobalOpt can slice structs/arrays and change GEPs in the process, but it was not updating alignments for load/store users. This eventually causes the crashing seen in: https://llvm.org/PR49661 https://llvm.org/PR50253 On x86, this required SLP+codegen to create an aligned vector store on an invalid address. The bugs would be easier to demonstrate on a target with stricter alignment requirements. I'm not sure if this is a complete solution. The alignment updating code is adapted from InstCombine, so I assume that part is tested and good. Differential Revision: https://reviews.llvm.org/D102552	2021-05-20 12:12:21 -04:00
Alexey Bataev	182162b616	[SLP]Try to vectorize tiny trees with shuffled gathers of extractelements. If we gather extract elements and they actually are just shuffles, it might be profitable to vectorize them even if the tree is tiny. Differential Revision: https://reviews.llvm.org/D101460	2021-05-20 08:36:16 -07:00
Djordje Todorovic	0ae3c1d4d7	Revert "[Debugify][Original DI] Test dbg var loc preservation" This reverts commit `76f375f3d9`. This will be pushed again, after investigating a test failure: https://lab.llvm.org/buildbot/#/builders/16/builds/11254	2021-05-20 07:11:35 -07:00
Djordje Todorovic	76f375f3d9	[Debugify][Original DI] Test dbg var loc preservation This is an improvement of [0]. This adds checking of original llvm.dbg.values()/declares() instructions in optimizations. We have picked a real issue that has been found with this (actually, picked one variable location missing from [1] and resolved the issue), and the result is the fix for that -- D100844. Before applying D100844, using the options from [0] (but with this patch applied) on the compilation of GDB 7.11, the final HTML report for the debug-info issues can be found at [1] (please scroll down, and look for "Summary of Variable Location Bugs"). After applying D100844, the numbers have improved a bit -- please take a look at [2]. [0] https://llvm.org/docs/HowToUpdateDebugInfo.html\ [1] https://djolertrk.github.io/di-check-before-adce-fix/ [2] https://djolertrk.github.io/di-check-after-adce-fix/ Differential Revision: https://reviews.llvm.org/D100845	2021-05-20 06:42:02 -07:00
Xiang1 Zhang	02f2d739e0	Revert "[HWASAN] Update the tag info for X86_64." This reverts commit `81c18ce03c`.	2021-05-20 13:12:59 +08:00
Xiang1 Zhang	81c18ce03c	[HWASAN] Update the tag info for X86_64. In LAM model X86_64 will use bits 57-62 (of 0-63) as HWASAN tag. So here we make sure the tag shift position and tag mask is correct for x86-64. Differential Revision: https://reviews.llvm.org/D102472	2021-05-20 11:22:12 +08:00
Zhiwei Chen	dbc641deb9	[sanitizer] Reduce redzone size for small size global objects Currently 1 byte global object has a ridiculous 63 bytes redzone. This patch reduces the redzone size to be less than 32 if the size of global object is less than or equal to half of 32 (the minimal size of redzone). A 12 bytes object has a 20 bytes redzone, a 20 bytes object has a 44 bytes redzone. Reviewed By: MaskRay, #sanitizers, vitalybuka Differential Revision: https://reviews.llvm.org/D102469	2021-05-19 19:18:50 -07:00
Jon Roelofs	3d2ffc88e6	Fix warnings in windows bots. NFC	2021-05-19 17:42:34 -07:00
Jon Roelofs	4bf69fb52b	[Remarks] Add analysis remarks for memset/memcpy/memmove lengths Differential revision: https://reviews.llvm.org/D102452	2021-05-19 15:09:18 -07:00
wlei	6539a80bc9	[CSSPGO] Avoid deleting probe instruction in FoldValueComparisonIntoPredecessors This change tries to fix a place that is missing a call to `moveAndDanglePseudoProbes`. FoldValueComparisonIntoPredecessors folds the BB into its predecessors and then marks the BB unreachable. However, the original logic from the BB is still alive, so deleting the probe would mislead the SampleLoader into marking it as a zero-count sample. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D102721	2021-05-19 13:39:05 -07:00
Joseph Huber	2db182ff8d	[Diagnostics] Allow emitting analysis and missed remarks on functions Summary: Currently, only `OptimizationRemark` can be emitted using a Function. Add constructors to allow this for `OptimizationRemarkAnalysis` and `OptimizationRemarkMissed` as well. Reviewed By: jdoerfert thegameg Differential Revision: https://reviews.llvm.org/D102784	2021-05-19 15:10:20 -04:00
Roman Lebedev	40fb4eeff9	[NFCI][Local] TryToSimplifyUncondBranchFromEmptyBlock(): use DeleteDeadBlocks()	2021-05-19 20:38:30 +03:00
Roman Lebedev	c60ca9856c	[NFCI][Local] MergeBlockIntoPredecessor(): use DeleteDeadBlocks()	2021-05-19 20:38:30 +03:00
Roman Lebedev	b0bb2149b3	[NFCI][Local] removeUnreachableBlocks(): use DeleteDeadBlocks()	2021-05-19 20:38:30 +03:00
Philip Reames	449d14ebd2	Do actual DCE in LoopUnroll (try 4) Turns out simplifyLoopIVs sometimes returns a non-dead instruction in its DeadInsts out param. I had done a bit of NFC cleanup which was only NFC if simplifyLoopIVs obeyed its documentation. I'm simply dropping that part of the change. Commit message from try 3: Recommitting after fixing a bug found post commit. Amusingly, try 1 had been correct, and by reverting to incorporate last minute review feedback, I introduced the bug. Oops. :) Original commit message: The problem was that recursively deleting an instruction can delete instructions beyond the current iterator (via a dead phi), thus invalidating iteration. Test case added in LoopUnroll/dce.ll to cover this case. LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions. Differential Revision: https://reviews.llvm.org/D102511	2021-05-19 10:25:31 -07:00
Hongtao Yu	4ca6e37b98	[CSSPGO] Overwrite branch weight annotated in previous pass. Sample profile loader can be run in both LTO prelink and postlink. Currently the counts annotation in postlink doesn't fully overwrite what's done in prelink. I'm adding a switch (`-overwrite-existing-weights=1`) to enable a full overwrite, which includes: 1. Clear old metadata for calls when their parent block has a zero count. This could be caused by prelink code duplication. 2. Clear indirect call metadata if somehow all the remaining targets have a sum of zero count. 3. Overwrite branch weight for basic blocks. With a CS profile, I was seeing #1 and #2 help reduce code size by preventing the post-sample ICP and CGSCC inliner from working on obsolete metadata, which comes from a partial global inlining in prelink. It's not expected to work well for the non-CS case with a less accurate post-inline count quality. It's worth calling out that some prelink optimizations can damage counts quality in an irreversible way. One example is the loop rotate optimization. Due to the lack of an exact loop entry count (profiling can only give loop iteration count and loop exit count), moving one iteration out of the loop body leaves the remaining iteration count unknown. We had to turn off prelink loop rotate to achieve a better postlink counts quality. An even better postlink counts quality can be achieved by turning off prelink CGSCC inlining, which is not context-sensitive. Reviewed By: wenlei, wmi Differential Revision: https://reviews.llvm.org/D102537	2021-05-19 09:12:24 -07:00
Amy Huang	517857421d	Revert "Do actual DCE in LoopUnroll (try 3)" This reverts commit `b6320eeb86` as it causes clang to assert; see https://reviews.llvm.org/rGb6320eeb8622f05e4a5d4c7f5420523357490fca.	2021-05-19 08:53:38 -07:00
David Sherwood	7e95a563c8	Remove scalable vector assert from InnerLoopVectorizer::setDebugLocFromInst In InnerLoopVectorizer::setDebugLocFromInst we were previously asserting that the VF is not scalable. This is because we want to use the number of elements to create a duplication factor for the debug profiling data. However, for scalable vectors we only know the minimum number of elements. I've simply removed the assert for now and added a FIXME saying that we assume vscale is always 1. When vscale is not 1 it just means that the profiling data isn't as accurate, but shouldn't cause any functional problems.	2021-05-19 13:33:10 +01:00
Roman Lebedev	8c2b535d6c	[NFCI][SimplifyCFG] removeEmptyCleanup(): use DeleteDeadBlock() This required some changes to, instead of eagerly making PHI's in the UnwindDest valid as-if the BB is already not a predecessor, to be valid while BB is still a predecessor.	2021-05-19 14:08:25 +03:00
Roman Lebedev	bb5d613aba	[NFCI][SimplifyCFG] removeEmptyCleanup(): streamline PHI node updating	2021-05-19 14:08:25 +03:00
Roman Lebedev	a0be081646	[NFC][SimplifyCFG] removeEmptyCleanup(): use BasicBlock::phis()	2021-05-19 14:08:24 +03:00
Sander de Smalen	4f86aa650c	[LV] Add -scalable-vectorization=<option> flag. This patch adds a new option to the LoopVectorizer to control how scalable vectors can be used. Initially, this suggests three levels to control scalable vectorization, although other more aggressive options can be added in the future. The possible options are: - Disabled: Disables vectorization with scalable vectors. - Enabled: Vectorize loops using scalable vectors or fixed-width vectors, but favors fixed-width vectors when the cost is a tie. - Preferred: Like 'Enabled', but favoring scalable vectors when the cost-model is inconclusive. Reviewed By: paulwalker-arm, vkmr Differential Revision: https://reviews.llvm.org/D101945	2021-05-19 10:40:56 +01:00
Roman Lebedev	57d20cbf46	[NFCI][SimplifyCFG] simplifyUnreachable(): use DeleteDeadBlock()	2021-05-19 12:04:22 +03:00
Roman Lebedev	69a43e5fc5	[NFCI][SimplifyCFG] simplifyReturn(): use DeleteDeadBlock()	2021-05-19 12:04:22 +03:00
Roman Lebedev	00f90e3fca	[NFCI][SimplifyCFG] simplifySingleResume(): use DeleteDeadBlock()	2021-05-19 12:04:22 +03:00
Roman Lebedev	a4eb24c688	[NFCI][SimplifyCFG] simplifyCommonResume(): use DeleteDeadBlock()	2021-05-19 12:04:22 +03:00
Roman Lebedev	729e18cbf4	[NFCI] SimplifyCFGPass: mergeEmptyReturnBlocks(): use DeleteDeadBlocks() In this case, it does the same thing as the original pattern does. SimplifyCFG has a few lurking miscompilations about deleting blocks that have their address taken, and consistently using DeleteDeadBlocks() instead of a hand-rolled pattern will make it easier to weed those cases out.	2021-05-19 11:32:24 +03:00
Joseph Huber	68abc3d264	[Attributor] Change AAExecutionDomain to only accept intrinsics Summary: The OpenMP runtime functions don't always provide unique thread ID's to determine if a basic block is truly single-threaded. Change the implementation to only check NVPTX intrinsics for now. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102700	2021-05-18 21:19:26 -04:00
Rong Xu	886629a8c9	[SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO This patch implements the first part of Flow Sensitive SampleFDO (FSAFDO). It has the following changes: (1) disable current discriminator encoding scheme, (2) new hierarchical discriminator for FSAFDO. For this patch, option "-enable-fs-discriminator=true" turns on the new functionality. Option "-enable-fs-discriminator=false" (the default) keeps the current SampleFDO behavior. When the fs-discriminator is enabled, we insert a flag variable, namely, llvm_fs_discriminator, to the object. This symbol will be checked by the create_llvm_prof tool, and used to generate a profile with FS-AFDO discriminators enabled. If this happens, for an extbinary format profile, the create_llvm_prof tool will add a flag to the profile summary section. Differential Revision: https://reviews.llvm.org/D102246	2021-05-18 16:23:43 -07:00
Arthur Eubanks	b86302e500	[MSan] Set zeroext on call arguments to msan functions with zeroext parameter attribute ABI attributes need to match between the caller and callee. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D102667	2021-05-18 14:07:39 -07:00
Arthur Eubanks	6b9524a05b	[NewPM] Don't mark AA analyses as preserved Currently all AA analyses marked as preserved are stateless, not taking into account their dependent analyses. So there's no need to mark them as preserved, they won't be invalidated unless their analyses are. SCEVAAResults was the one exception to this, it was treated like a typical analysis result. Make it like the others and don't invalidate unless SCEV is invalidated. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D102032	2021-05-18 13:49:03 -07:00
Nikita Popov	e81334a754	[LICM] Remove MaybePromotable set (PR50367) The MaybePromotable set keeps track of loads/stores for which promotion was not attempted yet. Normally, any load/stores that are promoted in the current iteration will be removed from this set, because they naturally MustAlias with the promoted value. However, if the source program has UB with metadata claiming that a store is NoAlias, while it is actually MustAlias, and multiple different pointers are promoted in the same iteration, it can happen that a store is removed that is still in the MaybePromotable set, causing a use-after-free. While this could be fixed by explicitly invalidating values in MaybePromotable in the LoopPromoter, I'm going with the more radical option of dropping the set entirely here and check all load/stores on each promotion iteration. As promotion, and especially repeated promotion, are quite rare, this doesn't seem to have any impact on compile-time. Fixes https://bugs.llvm.org/show_bug.cgi?id=50367.	2021-05-18 20:26:01 +02:00
Sanjay Patel	6d949a9c8f	[InstCombine] restrict funnel shift match to avoid miscompile As noted in the post-commit discussion for: https://reviews.llvm.org/rGabd7529625a73f405e40a63dcc446c41d51a219e ...that change exposed a logic hole that allows a miscompile if the shift amount could exceed the narrow width: https://alive2.llvm.org/ce/z/-i_CiM https://alive2.llvm.org/ce/z/NaYz28 The restriction isn't necessary for a rotate (same operand for both shifts), so we should adjust the matching for the shift value as a follow-up enhancement: https://alive2.llvm.org/ce/z/ahuuQb	2021-05-18 13:32:07 -04:00
Florian Hahn	cc1a6361d3	[VPlan] Add VPUserID to distinguish between recipes and others. This allows cast/dyn_cast'ing from VPUser to recipes. This is needed because there are VPUsers that are not recipes. Reviewed By: gilr, a.elovikov Differential Revision: https://reviews.llvm.org/D100257	2021-05-18 09:17:28 +01:00
Sander de Smalen	81fdc73e5d	[LV] Return both fixed and scalable Max VF from computeMaxVF. This patch introduces a new class, MaxVFCandidates, that holds the maximum vectorization factors that have been computed for both scalable and fixed-width vectors. This patch is intended to be NFC for fixed-width vectors, although considering a scalable max VF (which is disabled by default) pessimises tail-loop elimination, since it can no longer determine if any chosen VF (less than fixed/scalable MaxVFs) is guaranteed to handle all vector iterations if the trip-count is known. This issue will be addressed in a future patch. Reviewed By: fhahn, david-arm Differential Revision: https://reviews.llvm.org/D98721	2021-05-18 08:03:48 +01:00
Adam Nemet	ab1f6ffa56	[GVN] Improve analysis for missed optimization remark This change tries to handle multiple dominating users of the pointer operand by choosing the most immediately dominating one, if possible. While making this change I also found that the previous implementation had a missing break statement, making all loads with an odd number of dominating users emit an OtherAccess value, so that has also been fixed. Patch by Henrik G Olsson! Differential Revision: https://reviews.llvm.org/D79097	2021-05-17 21:51:15 -07:00
Philip Reames	ed9d70781b	Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 3)" This reverts commit `6d3e3ae8a9`. Still seeing PPC build bot failures, and one arm self host bot failing. I'm officially stumped, and need help from a bot owner to reduce.	2021-05-17 20:53:28 -07:00
Serguei Katkov	7bed58d28f	[Inliner] Copy attributes when deoptimize intrinsic is inlined During inlining of a call site with a deoptimize intrinsic callee, we miss the attributes set on this call site. As a result, attributes like deopt-lowering disappear, resulting in inefficient behavior of the register allocator in codegen. Just copy attributes for the deoptimize call like we do for other calls. Reviewers: reames, apilipenko Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D102602	2021-05-18 10:08:37 +07:00
Adam Nemet	fcffd087c6	[Matrix] Fold the transpose into the matmul operand used to fetch scalars For column-major this is: A * B^t whereas for row-major: A^t * B Differential Revision: https://reviews.llvm.org/D101762	2021-05-17 17:40:46 -07:00
Philip Reames	6d3e3ae8a9	[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 3) Resubmit after fixing test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll Previous commit message... This is a resubmit of 3e5ce4 (which was reverted by `7fe41ac`). The original commit caused a PPC build bot failure we never really got to the bottom of. I can't reproduce the issue, and the bot owner was non-responsive. In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in `80e8025`. My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess. Original commit message follows... If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block. The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and which exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed. This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way. Differential Revision: https://reviews.llvm.org/D94892	2021-05-17 16:59:25 -07:00
Philip Reames	d16da7343d	Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute" This reverts commit `c23ce54b36`. I apparently missed some newly added non-x86 tests.	2021-05-17 16:49:32 -07:00
Philip Reames	c23ce54b36	[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute This is a resubmit of 3e5ce4 (which was reverted by `7fe41ac`). The original commit caused a PPC build bot failure we never really got to the bottom of. I can't reproduce the issue, and the bot owner was non-responsive. In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in `80e8025`. My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess. Original commit message follows... If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block. The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and which exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed. This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way. Differential Revision: https://reviews.llvm.org/D94892	2021-05-17 16:33:56 -07:00
Philip Reames	b6320eeb86	Do actual DCE in LoopUnroll (try 3) Recommitting after fixing a bug found post commit. Amusingly, try 1 had been correct, and by reverting to incorporate last minute review feedback, I introduced the bug. Oops. :) The problem was that recursively deleting an instruction can delete instructions beyond the current iterator (via a dead phi), thus invalidating iteration. Test case added in LoopUnroll/dce.ll to cover this case. LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions. Differential Revision: https://reviews.llvm.org/D102511	2021-05-17 14:47:02 -07:00
Sanjay Patel	3cdd05e519	[InstCombine] fold fnegs around select This is one of the folds requested in: https://llvm.org/PR39480 https://alive2.llvm.org/ce/z/NczU3V Note - this uses the normal FMF propagation logic (flags transfer from the final value to new/intermediate ops). It's not clear if this matches what Alive2 implements, so we may want to adjust one or the other.	2021-05-17 14:53:49 -04:00
Roman Lebedev	0633d5ce7b	[LoopIdiom] 'logical right-shift until zero' ('count active bits') "on steroids" idiom recognition. I think i've added exhaustive test coverage, and i have verified that alive2 is happy with all the tests, so in principle i'm fine with landing this without review, but just in case.. This adds support for the "count active bits" pattern, i.e.: ``` int countActiveBits(unsigned val) { int cnt = 0; for( ; (val >> cnt) != 0; ++cnt) ; return cnt; } ``` but a somewhat more general one, since that is what i need: ``` int countActiveBits(unsigned val, int start, int off) { int cnt; for (cnt = start; val >> (cnt + off); cnt++) ; return cnt; } ``` I've followed in footstep of 'left-shift until bittest' idiom (D91038), in the sense that iff the `ctlz` intrinsic is cheap, we'll transform, regardless of all other factors. This can have a shocking effect on certain benchmarks: ``` raw.pixls.us-unique/Olympus/XZ-1$ /repositories/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf RUNNING: /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmp49_28zcm 2021-05-09T01:06:05+03:00 Running /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench Run on (32 X 3600.24 MHz CPU s) CPU Caches: L1 Data 32 KiB (x16) L1 Instruction 32 KiB (x16) L2 Unified 512 KiB (x16) L3 Unified 32768 KiB (x2) Load Average: 5.26, 6.29, 3.49 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime 
Raws/CPUTime Raws/WallTime WallTime,s ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ p1319978.orf/threads:32/process_time/real_time_mean 145 ms 145 ms 128 0.145319 0.999981 10.1568M 69.8949M 69.8936M 6.88159 6.88146 0.145322 p1319978.orf/threads:32/process_time/real_time_median 145 ms 145 ms 128 0.145317 0.999986 10.1568M 69.8941M 69.8931M 6.88151 6.88141 0.145319 p1319978.orf/threads:32/process_time/real_time_stddev 0.766 ms 0.766 ms 128 766.586u 15.1302u 0 354.167k 354.098k 0.0348699 0.0348631 766.469u RUNNING: /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpwb9sw2x0 2021-05-09T01:06:24+03:00 Running /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench Run on (32 X 3599.95 MHz CPU s) CPU Caches: L1 Data 32 KiB (x16) L1 Instruction 32 KiB (x16) L2 Unified 512 KiB (x16) L3 Unified 32768 KiB (x2) Load Average: 4.05, 5.95, 3.43 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ p1319978.orf/threads:32/process_time/real_time_mean 99.8 ms 99.8 ms 128 0.0997758 0.999972 10.1568M 101.797M 101.794M 10.0225 10.0222 0.0997786 p1319978.orf/threads:32/process_time/real_time_median 99.7 ms 99.7 ms 128 0.0997165 0.999985 10.1568M 101.857M 
101.854M 10.0284 10.0281 0.0997195 p1319978.orf/threads:32/process_time/real_time_stddev 0.224 ms 0.224 ms 128 224.166u 34.345u 0 226.81k 227.231k 0.0223309 0.0223723 224.586u Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench Benchmark Time CPU Time Old Time New CPU Old CPU New ---------------------------------------------------------------------------------------------------------------------------------------------------- p1319978.orf/threads:32/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 128 vs 128 p1319978.orf/threads:32/process_time/real_time_mean -0.3134 -0.3134 145 100 145 100 p1319978.orf/threads:32/process_time/real_time_median -0.3138 -0.3138 145 100 145 100 p1319978.orf/threads:32/process_time/real_time_stddev -0.7073 -0.7078 1 0 1 0 ``` Reviewed By: craig.topper, zhuhan0 Differential Revision: https://reviews.llvm.org/D102116	2021-05-17 20:33:33 +03:00
Hongtao Yu	f28ee1a2b3	[CSSPGO] Update pseudo probe distribution factor based on inline context. With prelink inlining, pseudo probes with same ID can come from different inline contexts. Such probes should not share samples and their factors should be fixed up separately. I'm seeing 0.3% speedup for SPEC2017 overall. Benchmark 631.deepsjeng_s benefits the most, about 4%. Reviewed By: wenlei, wmi Differential Revision: https://reviews.llvm.org/D102429	2021-05-16 23:11:36 -07:00
Philip Reames	6ae9893ed2	Revert "Do actual DCE in LoopUnroll (try 2)" This reverts commit `653fa0b46a`. Reported to trigger pr50354. Reverting until investigated.	2021-05-16 09:38:36 -07:00
Kuter Dinel	64ef29bc66	[Attributor] Call site specific AAValueSimplification and AAIsDead. This patch makes it possible to do call site specific deductions for AAValueSimplification and AAIsDead. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D84722	2021-05-15 21:39:07 +00:00
Simon Pilgrim	e30540a603	SampleProfileLoader::inlineHotFunctionsWithPriority - Fix uninitialized variable warning. NFCI. findIndirectCallFunctionSamples will leave Sum uninitialized if it returns an empty vector, we don't really use Sum in this case (but we do make a copy that isn't used either) - so ensure we initialize the value to zero to at least silence the static analysis warning.	2021-05-15 15:02:52 +01:00
Nikita Popov	f9e9b0cdb4	[CFG] Move reachable from entry checks into basic block variant These checks are not specific to the instruction based variant of isPotentiallyReachable(), they are equally valid for the basic block based variant. Move them there, to make sure that switching between the instruction and basic block variants cannot introduce regressions.	2021-05-15 15:42:02 +02:00
Simon Pilgrim	f0660a977e	[Local] collectBitParts - bail out if we find more than one root input value. All the uses that we have for collectBitParts revolve around us matching down to an operation with a single root value - I don't think we're intending to change that (and a lot of collectBitParts assumes it). The binops cases (OR/FSHL/FSHR) already check if the providers are the same, but that would still mean we waste time collecting through unaryops before getting to them.	2021-05-15 13:58:42 +01:00
Simon Pilgrim	401d6685c0	[InstCombine] InstCombinerImpl::visitOr - enable bitreverse matching Currently we only match bswap intrinsics from or(shl(),lshr()) style patterns when we could often match bitreverse intrinsics almost as cheaply. Differential Revision: https://reviews.llvm.org/D90170	2021-05-15 13:39:09 +01:00
Simon Pilgrim	28aa7d378a	[Local] collectBitParts - early-out from binops. NFCI. Minor speedup by not bothering to attempt to collect the second operand's bit parts if we already know it has failed in the first operand.	2021-05-15 13:04:10 +01:00
Nikita Popov	fb9ed1979a	[IR] Add BasicBlock::isEntryBlock() (NFC) This is a recurring and somewhat awkward pattern. Add a helper method for it.	2021-05-15 12:41:58 +02:00
Vitaly Buka	6ce7b2f026	Fix "is not used" warning	2021-05-14 20:58:58 -07:00
Philip Reames	fcd12fed41	Extract a helper routine to simplify D91481 [NFC]	2021-05-14 18:40:23 -07:00
Nick Desaulniers	8c72749bd9	[LowerConstantIntrinsics] reuse isManifestLogic from ConstantFolding GlobalVariables are Constants, yet should not unconditionally be considered true for __builtin_constant_p. Via the LangRef https://llvm.org/docs/LangRef.html#llvm-is-constant-intrinsic: This intrinsic generates no code. If its argument is known to be a manifest compile-time constant value, then the intrinsic will be converted to a constant true value. Otherwise, it will be converted to a constant false value. In particular, note that if the argument is a constant expression which refers to a global (the address of which _is_ a constant, but not manifest during the compile), then the intrinsic evaluates to false. Move isManifestConstant from ConstantFolding to be a method of Constant so that we can reuse the same logic in LowerConstantIntrinsics. pr/41459 Reviewed By: rsmith, george.burgess.iv Differential Revision: https://reviews.llvm.org/D102367	2021-05-14 15:35:21 -07:00
wlei	e475d4d69f	[CSSPGO] Fix return value of getProbeWeight Since multiple return types are not supported, we work around this by using error_code to represent: 1) A dangling probe. 2) A non-probe instruction whose weight is ignored. When merging the instructions' weights for the whole BB, error codes are filtered out. But if all instructions of the BB give error_code, the outside logic will mark it as a BB requiring the inference algorithm to infer its weight. This is different from a zero value, which is treated as a cold block. Fix one place: if we can't find the FunctionSamples in the profile data, which indicates the BB is cold, we now return zero. Also refine the comments. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D102007	2021-05-14 14:06:09 -07:00
Sanjay Patel	e82db87fb1	[InstCombine] drop poison flags when simplifying 'shl' based on demanded bits As with other transforms in demanded bits, we must be careful not to wrongly propagate nsw/nuw if we are reducing values leading up to the shift. This bug was introduced with `1b24f35f84` and leads to the miscompile shown in: https://llvm.org/PR50341	2021-05-14 13:54:13 -04:00
Philip Reames	653fa0b46a	Do actual DCE in LoopUnroll (try 2) Recommitting after addressing a missed review comment, and updating an aarch64 test I'd missed. LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions. Differential Revision: https://reviews.llvm.org/D102511	2021-05-14 10:42:36 -07:00
Philip Reames	e488bf815f	Revert "Do actual DCE in LoopUnroll" This reverts commit `9d1a61e695`. I'd missed some review feedback, and had missed updating an aarch64 test. Reverting while I fix both.	2021-05-14 10:15:30 -07:00
Philip Reames	9d1a61e695	Do actual DCE in LoopUnroll LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions. Differential Revision: https://reviews.llvm.org/D102511	2021-05-14 10:05:25 -07:00
Philip Reames	3f1c218318	[rs4gc] Strip memory related attributes consistently I noticed that rs4gc is not stripping a number of memory aliasing related attributes. We do strip some from call sites, but don't strip the same ones from declarations or parameters. Why do we need to strip these? Two answers: Safepoints conceptually read and write to the entire garbage collected heap in the physical model. We need this to preserve ordering of all loads and stores with respect to possible relocation. We can infer other attributes from these. For instance, readnone can imply both nofree and nosync. Both of which don't hold after physical rewriting. Note: This exposed a latent issue which was fixed a couple weeks back in `01801d5274`. Differential Revision: https://reviews.llvm.org/D99802	2021-05-14 07:54:56 -07:00
Djordje Todorovic	01c90bbd4f	[Transforms][Debugify] Fix "Missing line" false alarm on PHI nodes This is a fix for https://bugs.llvm.org/show_bug.cgi?id=49959 The "Missing line" false alarm was introduced in D75242. Patch by Yilong Guo<yilong.guo@intel.com> Differential Revision: https://reviews.llvm.org/D100446	2021-05-14 14:06:13 +02:00
Sander de Smalen	f82966d19a	[LoopVectorizationLegality] NFC: Mark some interfaces as 'const' This patch marks blockNeedsPredication, isConsecutivePtr, isMaskRequired and getSymbolicStrides as 'const'.	2021-05-14 11:53:54 +01:00
Tim Northover	ea0eec69f1	IR+AArch64: add a "swiftasync" argument attribute. This extends any frame record created in the function to include that parameter, passed in X22. The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001 in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect of this is that tools walking the stack should expect to see one of three values there: * 0b0000 => a normal, non-extended record with just [FP, LR] * 0b0001 => the extended record [X22, FP, LR] * 0b1111 => kernel space, and a non-extended record. All other values are currently reserved. If compiling for arm64e this context pointer is address-discriminated with the discriminator 0xc31a and the DB (process-specific) key. There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing front-ends access to this slot (and forcing its creation initialized to nullptr if necessary).	2021-05-14 11:43:58 +01:00
Simon Pilgrim	079bbea2b2	[Local] collectBitParts - for bswap-only matches, limit shift amounts to whole bytes to reduce compile time.	2021-05-14 11:42:52 +01:00
Simon Pilgrim	78c8451cd7	[Local] collectBitParts - reduce maximum recursion depth. As noticed on D90170, the recursion depth for matching a maximum of an i128 bitwidth was too high. @lebedev.ri mentioned that we can probably do better by limiting the number of collected Values instead of just depth, but I'll look at that later.	2021-05-14 11:42:51 +01:00
Anton Afanasyev	207cdd7ed9	[SLP] Fix spill cost computation for insertelement tree node This is a bugfix follow-up to D98714.	2021-05-14 13:14:41 +03:00
Sander de Smalen	459c48e04f	NFCI: Remove VF argument from isScalarWithPredication As discussed in D102437, the VF argument to isScalarWithPredication seems redundant, so this is intended to be a non-functional change. It seems wrong to query the widening decision at this point. Removing the operand and code to get the widening decision causes no unit/regression tests to fail. I've also found no issues running the LLVM test-suite. This subsequently removes the VF argument from isPredicatedInst as well, since it is no longer required.	2021-05-14 10:34:40 +01:00
dfukalov	fdae3fc8b3	[GVN] Clobber partially aliased loads. Use offsets stored in `AliasResult` implemented in D98718. Updated with fix of issue reported in https://reviews.llvm.org/D95543#2745161 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95543	2021-05-14 11:17:14 +03:00
David Green	f7cb654763	[DSE] Move isOverwrite into DSEState. NFC This moves the isOverwrite function into the DSEState so that it can share the analyses and members from the state. A few extra loop tests were also added to test stores in and around multi block loops for D100464.	2021-05-14 09:16:51 +01:00
Joseph Huber	8b57ed09bd	[OpenMP] Prevent Attributor from deleting functions in OpenMPOptCGSCC pass Summary: This patch prevents the Attributor instances made in the CGSCC pass from deleting functions. This prevents the attributor from changing the call graph while OpenMPOpt is working with it. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102363	2021-05-13 16:35:23 -04:00
cynecx	8ec9fd4839	Support unwinding from inline assembly I've taken the following steps to add unwinding support from inline assembly: 1.) Add a new `unwind` "attribute" (like `sideeffect`) to the asm syntax: ``` invoke void asm sideeffect unwind "call thrower", "~{dirflag},~{fpsr},~{flags}"() to label %exit unwind label %uexit ``` 2.) Add Bitcode writing/reading support + LLVM-IR parsing. 3.) Emit EHLabels around inline assembly lowering (SelectionDAGBuilder + GlobalISel) when `InlineAsm::canThrow` is enabled. 4.) Tweak InstCombineCalls/InlineFunction pass to not mark inline assembly "calls" as nounwind. 5.) Add clang support by introducing a new clobber: "unwind", which lowers to `canThrow` being enabled. 6.) Don't allow unwinding callbr. Reviewed By: Amanieu Differential Revision: https://reviews.llvm.org/D95745	2021-05-13 19:13:03 +01:00
Florian Hahn	bdada7546e	[VPlan] Adjust assert in splitBlock to allow splitting at end. SplitAt should only be dereferenced in the assert if it does not point to the end of the block. This fixes a crash in the added test case.	2021-05-13 13:36:35 +01:00
Jingu Kang	107d19eb01	Revert "[SimpleLoopUnswitch] Port partially invariant unswitch from LoopUnswitch to SimpleLoopUnswitch" This reverts commit `88b259c014`. The following bugs need to be fixed first: https://bugs.llvm.org/show_bug.cgi?id=50279 https://bugs.llvm.org/show_bug.cgi?id=50302	2021-05-13 08:40:49 +01:00
Chuanqi Xu	c1359ef07e	[Coroutines] Salvage Debug.values Summary: The previous implementation of coro-split didn't collect values used by dbg instructions into the spills, which made a lot of debug info unavailable with optimization on. This patch tries to collect the values that are used by dbg.values. In this way, the debuggability of coroutines could be as good as that of normal functions with optimization on. To avoid enlarging the coroutine frame, this patch only collects a `dbg.value` whose value is already in the coroutine frame. This decision may make some debug info unavailable, but with optimization on, performance should be considered first. This patch thus improves the debuggability of coroutines without changing the layout of the frame. Test-plan: check-llvm Reviewed By: aprantl, lxfind Differential Revision: https://reviews.llvm.org/D97673	2021-05-13 13:06:33 +08:00
Chuanqi Xu	6e5b8f489a	[Coroutines] Enable printing coroutine frame when dbg info is available Summary: This patch tries to build debug info for the coroutine frame in the middle end. Although the coroutine frame is constructed and maintained by the compiler and, by the design of C++20 coroutines, the programmer shouldn't need to care about it, a lot of programmers have told me that they strongly want to see the layout of the coroutine frame. Although C++ is designed as an abstraction layer so that programmers shouldn't care about the actual memory in bits, many experienced C++ programmers are in fact used to inspecting the memory layout with an assembler and debugger. After being told about three times that they want to see the coroutine frame, I think it is an actual and desired demand. However, the debug information is constructed in the front end and the coroutine frame is constructed in the middle end. This is a natural and clear gap. So I could only try to construct the debug information in the middle end after the coroutine frame is constructed. It is unusual, but we are in consensus that this approach is the best one. One hard part is that we need to construct the names for variables, since there isn't a map from LLVM variables to DIVar. Here is the strategy this patch uses: - The names `__resume_fn`, `__destroy_fn` and `__coro_index` are constructed by the patch. - The name `__promise` comes from the dbg.variable of the corresponding dbg.declare of PromiseAlloca, which has the highest priority when constructing the debug information for a member of the coroutine frame. - If the member is a struct, we try to get the name of the LLVM struct directly, then replace ':' and '.' with '_' to make it printable for the debugger. - If the member is a basic type like integer or double, we try to emit the corresponding name. - If the member is a pointer type, we add `Ptr` after the corresponding pointee type. - Otherwise, we name it 'UnknownType'. 
Reviewed By: lxfind, aprantl, rjmcall, dblaikie Differential Revision: https://reviews.llvm.org/D99179	2021-05-13 12:43:08 +08:00
Anton Afanasyev	ab2c499d3a	[SLP] Add insertelement instructions to vectorizable tree Add a new type of tree node for an `InsertElementInst` chain forming a vector. These instructions could be either removed or replaced by shuffles during vectorization, and we can add this node to the cost model, thus naturally estimating their cost, getting rid of `CompensateCost` tricks and reducing further work for InstCombine. This fixes PR40522 and PR35732 in a natural way. This patch is also the first step towards revectorization of partially vectorized code (to fix PR42022 completely). After adding inserts to the tree, the next step is to add vector instructions there (for instance, to merge `store <2 x float>` and `store <2 x float>` to `store <4 x float>`). Fixes PR40522 and PR35732. Differential Revision: https://reviews.llvm.org/D98714	2021-05-13 07:41:45 +03:00
Justin Bogner	e7d26aceca	Change the context instruction for computeKnownBits in LoadStoreVectorizer pass This change enables cases for which the index value for the first load/store instruction in a pair could be a function argument. This allows using llvm.assume to provide known bits information in such cases. Patch by Viacheslav Nikolaev. Thanks! Differential Revision: https://reviews.llvm.org/D101680	2021-05-12 15:29:29 -07:00
Nikita Popov	a8f7dee1df	[InstCombine] Support one-hot merge for logical and/or If a logical and/or is used, we need to be careful not to propagate a potential poison value from the RHS by inserting a freeze instruction. Otherwise it works the same way as bitwise and/or. This is intended to address the regression reported at https://reviews.llvm.org/D101191#2751002. Differential Revision: https://reviews.llvm.org/D102279	2021-05-12 21:01:18 +02:00
Stelios Ioannou	1124ad2f5d	[LoopFlatten] Simplify loops so that the pass can operate on unsimplified loops. The loop flattening pass requires loops to be in simplified form. If the loops are not in simplified form, the pass cannot operate. This patch simplifies all loops before flattening. As a result, all loops will be simplified regardless of whether anything ends up being flattened. This change was inspired by observing a certain loop that was not flattened because the loops were not in simplified form. This loop is added as a test to verify that it is now flattened. Differential Revision: https://reviews.llvm.org/D102249 Change-Id: I45bcabe70fb99b0d89f0effafc82eb9e0585ec30	2021-05-12 19:22:01 +01:00
Roman Lebedev	554b1bced3	[InstCombine] ~(C + X) --> ~C - X (PR50308) We can not rely on (C+X)-->(X+C) already happening, because we might not have visited that `add` yet. The added testcase would get stuck in an endless combine loop.	2021-05-12 16:10:55 +03:00
David Sherwood	b7a11274f9	[LoopVectorize] Fix scalarisation crash in widenPHIInstruction for scalable vectors In InnerLoopVectorizer::widenPHIInstruction there are cases where we have to scalarise a pointer induction variable after vectorisation. For scalable vectors we already deal with the case where the pointer induction variable is uniform, but we currently crash if not uniform. For fixed width vectors we calculate every lane of the scalarised pointer induction variable for a given VF, however this cannot work for scalable vectors. In this case I have added support for caching the whole vector value for each unrolled part so that we can always extract an arbitrary element. Additionally, we still continue to cache the known minimum number of lanes too in order to improve code quality by avoiding an extractelement operation. I have adapted an existing test `pointer_iv_mixed` from the file: Transforms/LoopVectorize/consecutive-ptr-uniforms.ll and added it here for scalable vectors instead: Transforms/LoopVectorize/AArch64/sve-widen-phi.ll Differential Revision: https://reviews.llvm.org/D101294	2021-05-12 11:02:11 +01:00
Qiu Chaofan	6d2df18163	[VectorCombine] Restrict single-element-store index to inbounds constant Vector single element update optimization landed in `2db4979`, but its scope needs restriction. This patch restricts the index to inbounds constants and requires the vector to be fixed-sized. In future, we may use value tracking to relax constant restrictions. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D102146	2021-05-12 13:18:20 +08:00
Congzhe Cao	3f8be15f29	[LoopInterchange] Handle lcssa PHIs with multiple predecessors This is a bugfix in the transformation phase. If the original outer loop header branches to both the inner loop (header) and the outer loop latch, and if there is an lcssa PHI node outside the loop nest, then after interchange the new outer latch will have an lcssa PHI node inserted which has two predecessors, i.e., the original outer header and the original outer latch. Currently the transformation assumes it has only one predecessor (the original outer latch) and crashes, since the inserted lcssa PHI node does not take both predecessors as incoming BBs. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D100792	2021-05-11 21:30:54 -04:00
Jordan Rupprecht	fec2945998	Revert "[GVN] Clobber partially aliased loads." This reverts commit `6c57044231`. It causes assertion errors due to widening atomic loads, and potentially causes miscompile elsewhere too. Repro, also posted to D95543: ``` $ cat repro.ll ; ModuleID = 'repro.ll' source_filename = "repro.ll" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" %struct.widget = type { i32 } %struct.baz = type { i32, %struct.snork } %struct.snork = type { %struct.spam } %struct.spam = type { i32, i32 } @global = external local_unnamed_addr global %struct.widget, align 4 @global.1 = external local_unnamed_addr global i8, align 1 @global.2 = external local_unnamed_addr global i32, align 4 define void @zot(%struct.baz* %arg) local_unnamed_addr align 2 { bb: %tmp = getelementptr inbounds %struct.baz, %struct.baz* %arg, i64 0, i32 1 %tmp1 = bitcast %struct.snork* %tmp to i64* %tmp2 = load i64, i64* %tmp1, align 4 %tmp3 = getelementptr inbounds %struct.baz, %struct.baz* %arg, i64 0, i32 1, i32 0, i32 1 %tmp4 = icmp ugt i64 %tmp2, 4294967295 br label %bb5 bb5: ; preds = %bb14, %bb %tmp6 = load i32, i32* %tmp3, align 4 %tmp7 = icmp ne i32 %tmp6, 0 %tmp8 = select i1 %tmp7, i1 %tmp4, i1 false %tmp9 = zext i1 %tmp8 to i8 store i8 %tmp9, i8* @global.1, align 1 %tmp10 = load i32, i32* @global.2, align 4 switch i32 %tmp10, label %bb11 [ i32 1, label %bb12 i32 2, label %bb12 ] bb11: ; preds = %bb5 br label %bb14 bb12: ; preds = %bb5, %bb5 %tmp13 = load atomic i32, i32* getelementptr inbounds (%struct.widget, %struct.widget* @global, i64 0, i32 0) acquire, align 4 br label %bb14 bb14: ; preds = %bb12, %bb11 br label %bb5 } $ opt -O2 repro.ll -disable-output opt: /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Utils/VNCoercion.cpp:496: llvm::Value llvm::VNCoercion::getLoadValueForLoad(llvm::LoadInst , unsigned int, llvm::Type , llvm::Instruction , const llvm::DataLayout &): Assertion 
`SrcVal->isSimple() && "Cannot widen volatile/atomic load!"' failed. PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace. Stack dump: 0. Program arguments: /home/rupprecht/dev/opt -O2 repro.ll -disable-output ... ```	2021-05-11 16:08:53 -07:00
Congzhe Cao	40e3aa39bd	[LoopInterchange] Fix legality for triangular loops This is a bug fix in legality check. When we encounter triangular loops such as the following form: for (int i = 0; i < m; i++) for (int j = 0; j < i; j++), or for (int i = 0; i < m; i++) for (int j = 0; j*i < n; j++), we should not perform interchange since the number of executions of the loop body will be different before and after interchange, resulting in incorrect results. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D101305	2021-05-11 18:36:53 -04:00
Congzhe Cao	d3f89d4d16	Revert "[LoopInterchange] Fix legality for triangular loops" This reverts commit `29342291d2`. The test case requires an assert build. Will add REQUIRES and re-commit.	2021-05-11 18:10:58 -04:00
Nikita Popov	1556540372	[InstCombine] Clean up one-hot merge optimization (NFC) Remove the requirement that the instruction is a BinaryOperator, make the predicate check more compact and use slightly more meaningful naming for the and operands.	2021-05-11 23:22:11 +02:00
Fangrui Song	129f466e22	[GlobalOpt] Remove heap SROA GlobalOpt implements a heap SROA (SROA for a malloc-allocated struct or array of structs) which is largely undertested (heap-sra-[1234].ll are basically the same test with very little difference) and does not trigger at all when bootstrapping clang (it only supports the case of one single store). The heap SROA implementation causes PR50027 (GEP is not properly handled; crash or miscompile). Just drop the implementation. I have deleted some obviously duplicated tests but kept `heap-sra-[12]{,-no-nullopt}.ll`. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D102257	2021-05-11 11:34:37 -07:00
Eli Friedman	61cbbba7a6	[ArgumentPromotion] Fix byval alignment handling. Make sure the alignment of the generated operations matches the alignment of the byval argument. Previously, we were just ignoring alignment and getting lucky. While I'm here, also delete the unnecessary "tail" handling. Passing a pointer to a byval argument to a "tail" call is UB, so rewriting to an alloca doesn't require any special handling. Differential Revision: https://reviews.llvm.org/D89819	2021-05-11 11:22:18 -07:00
Congzhe Cao	29342291d2	[LoopInterchange] Fix legality for triangular loops This is a bug fix in legality check. When we encounter triangular loops such as the following form: for (int i = 0; i < m; i++) for (int j = 0; j < i; j++), or for (int i = 0; i < m; i++) for (int j = 0; j*i < n; j++), we should not perform interchange since the number of executions of the loop body will be different before and after interchange, resulting in incorrect results. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D101305	2021-05-11 11:00:46 -04:00
Florian Hahn	faebc6bf10	[VPlan] Register recipe for instr if the simplified value is recipe. If the simplified VPValue is a recipe, we need to register it for Instr, in case it needs to be recorded. The way this is handled in general may change soon, following some post-commit comments. This fixes PR50298.	2021-05-11 14:32:34 +01:00
Sanjay Patel	49950cb1f6	[SLP] restrict matching of load combine candidates The test example from https://llvm.org/PR50256 (and reduced here) shows that we can match a load combine candidate even when there are no "or" instructions. We can avoid that by confirming that we do see an "or". This doesn't apply when matching an or-reduction because that match begins from the operands of the reduction. Differential Revision: https://reviews.llvm.org/D102074	2021-05-11 08:46:40 -04:00
Sanjay Patel	5577e86691	[InstCombine] fold extract subvector of bitcast insertelt This is visible in the original example from: https://llvm.org/PR50055 (but this change doesn't solve the bug) https://alive2.llvm.org/ce/z/vM_Yq-	2021-05-10 17:20:10 -04:00
Nikita Popov	463ea28e96	[InstCombine] Fold comparison of integers by parts Let's say you represent (i32, i32) as an i64 from which the parts are extracted with lshr/trunc. Then, if you compare two tuples by parts you get something like A[0] == B[0] && A[1] == B[1], just that the part extraction happens by lshr/trunc and not a narrow load or similar. The fold implemented here reduces such equality comparisons by converting them into a comparison on a larger part of the integer (which might be the whole integer). It handles both the "and of eq" and the conjugated "or of ne" case. I'm being conservative with one-use for now, though this could be relaxed if profitable (the base pattern converts 11 instructions into 5 instructions, but there's quite a few variations on how it can play out). Differential Revision: https://reviews.llvm.org/D101232	2021-05-10 22:22:39 +02:00
Nikita Popov	aa9b02ac75	[Inliner] Fix noalias metadata handling for instructions simplified during cloning (PR50270) Instead of using VMap, which may include instructions from the caller as a result of simplification, iterate over the (FirstNewBlock, Caller->end()) range, which will only include new instructions. Fixes https://bugs.llvm.org/show_bug.cgi?id=50270. Differential Revision: https://reviews.llvm.org/D102110	2021-05-10 21:59:59 +02:00
Sanjay Patel	88d8f10baf	[PassManager] add helper function to hold set of vector passes (2nd try) This is better no-functional-change-intended than the 1st attempt. As noted in D102002, there were at least 2 diffs that went unchecked in pass manager regression tests: different pass parameters (SimplifyCFG) and an extension point/callback. Those should be lifted from the original code blocks correctly now.	2021-05-10 14:43:00 -04:00
Sanjay Patel	822be4bec8	Revert "[PassManager] add helper function to hold set of vector passes" This reverts commit `fefcb1f878`. It was supposed to be NFC, but as noted in the post-commit comments in D102002, that was not true: SimplifyCFG uses different parameters and there's a difference in an extension point / callback.	2021-05-10 10:59:30 -04:00
Alexey Bataev	30463bc3f1	[SLP]Do not count perfect diamond matches for gathers several times. Need to remove the old code for avoiding double counting of the gather nodes with perfect diamond matches within the tree after we started detecting perfect/shuffled matching in the previous patch D100495. We may skip the cost for such nodes completely. Differential Revision: https://reviews.llvm.org/D102023	2021-05-10 07:08:07 -07:00
Teresa Johnson	220f6e5271	[SimplifyCFG] Ignore ephemeral values when counting insts for threading Ignore ephemeral values (only feeding llvm.assume intrinsics) when computing the instruction count to decide if a block is small enough for threading. This is similar to the handling of these values in the InlineCost computation. These instructions will eventually be removed and shouldn't count against code size (similar to the existing ignoring of phis). Without this change, when enabling -fwhole-program-vtables, which causes type test / assume sequences to be inserted by clang, we can get different threading decisions. In particular, when building with instrumentation FDO it can affect the optimizations decisions before FDO matching, leading to some mismatches. Differential Revision: https://reviews.llvm.org/D101494	2021-05-09 19:06:54 -07:00
Roman Lebedev	1acd9a1a29	Revert "[LICM] Hoist loads with invariant.group metadata" This appears to miscompile google benchmark's GetCacheSizesFromKVFS() when compiling with -fstrict-vtable-pointers. Runnable reproducer: https://godbolt.org/z/f9ovKqTzb The "f.fail()" crashes with a BUS error; it is compiled into testb, and the address it is testing is nonsensical. This reverts commit `4c89bcadf6`.	2021-05-08 15:44:49 +03:00
Qiu Chaofan	2db4979c0f	[VectorCombine] Simplify to scalar store if only one element updated This patch simplifies load-insertelt-store pattern into getelementptr-store. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D98240	2021-05-08 18:14:51 +08:00
Arthur Eubanks	34a8a437bf	[NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose Printing pass manager invocations is fairly verbose and not super useful. This allows us to remove DebugLogging from pass managers and PassBuilder since all logging (aside from analysis managers) goes through instrumentation now. This has the downside of never being able to print the top level pass manager via instrumentation, but that seems like a minor downside. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D101797	2021-05-07 21:51:47 -07:00
Adrian Prantl	c6ddf669dc	Fix the module-enabled build by removing a redundant type definition.	2021-05-07 14:45:17 -07:00
Florian Hahn	75b9997760	[LV] Remove reference of PHI from comment, they are not recorded (NFC). The comment incorrectly states that the PHI is recorded. That's not accurate, only the recipe for the incoming value is recorded. Suggested post-commit for `4ba8720f88`.	2021-05-07 21:34:23 +01:00
Florian Hahn	337d765282	[LV] Assert if trying to sink replicate region into another region (NFC) Currently sinking a replicate region into another replicate region is not supported. Add an assert, to make the problem more obvious, should it occur. Discussed post-commit for `ccebf7a109`.	2021-05-07 21:25:35 +01:00
Florian Hahn	01c26d4e04	[LV] Rename Region to TargetRegion, similar to SinkRegion (NFC). Adjust the name to make it clearer this is the region containing the target recipe, similar to SinkRegion below. Suggested post-commit for `ccebf7a109`.	2021-05-07 21:25:35 +01:00
Fangrui Song	d8aba75a76	Internalize some cl::opt global variables or move them under namespace llvm	2021-05-07 11:15:43 -07:00
Caroline Concatto	cf06c8eee3	[LoopVectorize][SVE] Remove assert for scalable vector in InnerLoopVectorizer::fixReduction The function fixReduction used to assert/crash for scalable vector when a vector reduce could be done with a smaller vector. This patch removes this assertion as it is safe to use scalable vector for vector reduce and truncate. Differential Revision: https://reviews.llvm.org/D101260	2021-05-07 09:37:37 +01:00
Sanjay Patel	fefcb1f878	[PassManager] add helper function to hold set of vector passes This is no-functional-change-intended (NFC) and split off from D102002 (which proposes to eliminate the LTO-based differences).	2021-05-06 15:36:15 -04:00
Simon Pilgrim	338c1b701f	[SLP] Constify the TreeEntry* input into getEntryCost() + setInsertPointAfterBundle(). NFCI.	2021-05-06 16:20:19 +01:00
Simon Pilgrim	2dab059021	[SLP] Constify the TreeEntry* input into dumpTreeCosts(). NFCI.	2021-05-06 16:20:19 +01:00
Simon Pilgrim	1b47489fd0	[SLP] Use empty() instead of size() == 0. NFCI.	2021-05-06 16:20:18 +01:00
David Green	4979c90458	[LV] Account for tripcount when calculating vectorization profitability The loop vectorizer will currently assume a large trip count when calculating which of several vectorization factors is more profitable. That is often not a terrible assumption to make, as small trip count loops will usually have been fully unrolled. There are cases, however, where we will try to vectorize them, and, especially when folding the tail by masking, we can incorrectly choose to vectorize loops that are not beneficial, due to the folded tail rounding the iteration count up for the vectorized loop. The motivating example here has a trip count of 5, so either performs 5 scalar iterations or 2 vector iterations (with VF=4). At a high enough trip count the vectorization becomes profitable, but the rounding up to 2 vector iterations vs only 5 scalar makes it unprofitable. This adds an alternative cost calculation when we know the max trip count and are folding tail by masking, rounding the iteration count up to the correct number for the vector width. We still do not account for anything like setup cost or the mixture of vector and scalar loops, but this is at least an improvement in a few cases that we have had reported. Differential Revision: https://reviews.llvm.org/D101726	2021-05-06 12:36:46 +01:00
Kerry McLaughlin	8c9742bd23	[SVE][LoopVectorize] Add support for scalable vectorization of first-order recurrences Adds support for scalable vectorization of loops containing first-order recurrences, e.g: ``` for(int i = 0; i < n; i++) b[i] = a[i] + a[i - 1] ``` This patch changes fixFirstOrderRecurrence for scalable vectors to take vscale into account when inserting into and extracting from the last lane of a vector. CreateVectorSplice has been added to construct a vector for the recurrence, which returns a splice intrinsic for scalable types. For fixed-width the behaviour remains unchanged as CreateVectorSplice will return a shufflevector instead. The tests included here are the same as test/Transform/LoopVectorize/first-order-recurrence.ll Reviewed By: david-arm, fhahn Differential Revision: https://reviews.llvm.org/D101076	2021-05-06 11:35:39 +01:00
Juneyoung Lee	8a156d1c27	[InstCombine] Fully disable select to and/or i1 folding This is a patch that disables the poison-unsafe select -> and/or i1 folding. It has been blocking D72396 and also has been the source of a few miscompilations described in llvm.org/pr49688 . D99674 conditionally blocked this folding and successfully fixed the latter one. The former one was still blocked, and this patch addresses it. Note that a few test functions that have the `_logical` suffix are now deoptimized. These were created by @nikic to check the impact of disabling this optimization by copying existing original functions and replacing and/or with select. I can see that most of these are poison-unsafe; they can be revived by introducing a freeze instruction. I left comments at the fcmp + select optimizations (or-fcmp.ll, and-fcmp.ll) because I think they are good targets for a freeze fix. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D101191	2021-05-06 09:29:52 +09:00
Coplin, Jared	6251b2f7f6	Attach metadata to simplified masked loads and stores	2021-05-05 18:01:49 -05:00
Roman Lebedev	8048005739	[NFC][SimplifyCFG] Update documentation comments for SinkCommonCodeFromPredecessors() after `1886aad`	2021-05-05 20:34:59 +03:00
Philip Reames	80e8025083	[LV] Workaround PR49900 (a crash due to analyzing partially mutated IR) LoopVectorize has a fairly deeply baked in design problem where it will try to query analysis (primarily SCEV, but also ValueTracking) in the midst of mutating IR. In particular, the intermediate IR state does not represent the semantics of the original (or final) program. Fixing this for real is hard, but all of the cases seen so far share a common symptom. In cases seen to date, the analysis being queried is the computation of the original loop's trip count. We can fix this particular instance of the issue by simply computing the trip count early, and caching it. I want to be really clear that this is nothing but a workaround. It does nothing to fix the root issue, and at best, delays the time until we have to fix this for real. Florian and I have discussed an eventual solution in the review comments for https://reviews.llvm.org/D100663, but it's a lot of work. Test taken from https://reviews.llvm.org/D100663. Differential Revision: https://reviews.llvm.org/D101487	2021-05-05 09:56:28 -07:00
Sanjay Patel	0034197874	[InstCombine] improve readability; NFC	2021-05-05 11:05:47 -04:00
Juneyoung Lee	1fef5c88a6	[InstCombine] Fold more select of selects using isImpliedCondition This is a simple folding that does these: ``` select x_inv, true, (select y, x, false) => select x_inv, true, y ``` https://alive2.llvm.org/ce/z/-STJ2d ``` select (select y, x, false), true, x_inv => select y, true, x_inv ``` https://alive2.llvm.org/ce/z/6ruYt6 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D101807	2021-05-05 13:44:58 +09:00
Han Zhu	da1cdffbb1	[loop-idiom] Hoist loop memcpys to loop preheader For a simple loop like: ``` struct S { int x; int y; char b; }; unsigned foo(S* __restrict__ a, S* b, int n) { for (int i = 0; i < n; i++) a[i] = b[i]; return sizeof(a[0]); } ``` We could eliminate the loop and convert it to a large memcpy of 12n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll` ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %0 = bitcast %struct.S* %arrayidx2 to i8* %1 = bitcast %struct.S* %arrayidx to i8* call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false) %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic. 
With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass. ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %a1 = bitcast %struct.S* %a to i8* %b2 = bitcast %struct.S* %b to i8* %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry %0 = zext i32 %n to i64 %1 = mul nuw nsw i64 %0, 12 call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false) br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %2 = bitcast %struct.S* %arrayidx2 to i8* %3 = bitcast %struct.S* %arrayidx to i8* %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` Reviewed By: zino Differential Revision: https://reviews.llvm.org/D97667	2021-05-04 17:05:04 -07:00
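At the source level, the effect of the hoisting in the commit above can be sketched as replacing an element-by-element struct copy with one bulk copy. This is only an illustration of the equivalence, not the pass itself; the function names `copyLoop` and `copyBulk` are hypothetical, and `__restrict__` is the GCC/Clang spelling used in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct S { int x; int y; char b; };

// Element-by-element copy, as in the loop from the commit message.
void copyLoop(S *__restrict__ a, const S *b, int n) {
  for (int i = 0; i < n; i++)
    a[i] = b[i];
}

// One bulk copy of n * sizeof(S) bytes, mirroring the single memcpy that
// ends up in the loop preheader (guarded by the same n > 0 check).
void copyBulk(S *__restrict__ a, const S *b, int n) {
  if (n > 0)
    std::memcpy(a, b, static_cast<std::size_t>(n) * sizeof(S));
}
```

The `noalias` on the destination matters here: without it the two buffers could overlap, and the loop and the single memcpy would no longer be interchangeable.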
Florian Hahn	ccebf7a109	[VPlan] Properly handle sinking of replicate regions. This patch updates the code that sinks recipes required for first-order recurrences to properly handle replicate-regions. At the moment, the code would just move the replicate recipe out of its replicate-region, producing an invalid VPlan. When sinking a recipe in a replicate-region, we have to sink the whole region. To do that, we first need to split the block at the target recipe and move the region in between. This patch also adds a splitAt helper to VPBasicBlock to split a VPBasicBlock at a given iterator. Fixes PR50009. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100751	2021-05-04 22:36:01 +01:00
Xun Li	def86413d4	[Coroutines] Do not add alloca to the frame if the size is 0 This patch is to address https://bugs.llvm.org/show_bug.cgi?id=49916. When the size of an alloca is 0, it will trigger an assertion in OptimizedStructLayout when being added to the frame. Fix it by not adding it at all. We return index 0 (beginning of the frame) for all 0-sized allocas. Differential Revision: https://reviews.llvm.org/D101841	2021-05-04 12:55:40 -07:00
Nikita Popov	e20897726f	[SimplifyCFG] Create logical or in SimplifyCondBranchToCondBranch() We need to use a logical or instead of a bitwise or to preserve poison behavior. Poison from the second condition should not propagate if the first condition is true. We were already handling this correctly in FoldBranchToCommonDest(), but not in this fold. (There are still other folds with this issue.)	2021-05-04 19:51:30 +02:00
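The difference between the two forms of "or" in the commit above can be shown with a toy model in which `std::nullopt` stands in for poison. This is a deliberately simplified sketch, not LLVM's semantics engine; the names `I1`, `bitwiseOr`, and `logicalOr` are assumptions for the example.

```cpp
#include <cassert>
#include <optional>

// Toy model: an i1 value is either a concrete bool or poison (nullopt).
using I1 = std::optional<bool>;

// Bitwise `or i1 a, b`: poison in either operand poisons the result.
I1 bitwiseOr(I1 a, I1 b) {
  if (!a || !b)
    return std::nullopt;
  return *a || *b;
}

// Logical or, i.e. `select i1 a, i1 true, i1 b`: b is only observed
// when a is false, so poison in b is masked when a is true.
I1 logicalOr(I1 a, I1 b) {
  if (!a)
    return std::nullopt;
  return *a ? I1(true) : b;
}
```

In this model `logicalOr(true, poison)` is `true`, while `bitwiseOr(true, poison)` is poison, which is the behaviour the fold has to preserve.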
Nikita Popov	44fd4575b3	[SimplifyCFG] Extract helper for creating logical op (NFC)	2021-05-04 19:51:30 +02:00
Sanjay Patel	a6f79b5671	[InstCombine] avoid infinite loops with select/icmp transforms This fixes https://llvm.org/PR48900 , but as seen in the regression tests prevents some optimizations. There are a few options to restore those (switch to min/max intrinsics, add larger pattern matching for select with dominating condition, improve CVP), but we need to prevent the bug 1st.	2021-05-04 11:54:06 -04:00
Florian Hahn	4ba8720f88	[VPlan] Representing backedge def-use feeding reduction phis. This patch updates the code handling reduction recipes to also keep track of the incoming value from the latch in the recipe. This is needed to model the def-use chains completely in VPlan, so that it is possible to replace the incoming value with an arbitrary VPValue. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D99294	2021-05-04 16:33:22 +01:00
Sander de Smalen	9931ae645e	Reland "[LV] Calculate max feasible scalable VF." Relands https://reviews.llvm.org/D98509 This reverts commit `51d648c119`.	2021-05-04 15:44:41 +01:00
Simon Pilgrim	b04148f777	Local.cpp - Avoid DebugLoc copies - use const reference from getDebugLoc. NFCI.	2021-05-04 14:31:50 +01:00
Simon Pilgrim	2bb41851a1	[Utils] recognizeBSwapOrBitReverseIdiom - support matching from funnel shift roots (PR40058) We were missing bitreverse matches in cases where InstCombine had seen a byte-level rotation at the end of a bitreverse sequence (replacing or() with fshl()), hindering the exhaustive bitreverse matching in CodeGenPrepare later on.	2021-05-04 13:46:45 +01:00
Alexey Bataev	369cd2ae52	Revert "[SLP]Allow masked gathers only if allowed by target." This reverts commit `fd18547e07`. Need to add a check for the size of the vectorization tree to avoid some extra vectorization.	2021-05-04 04:53:22 -07:00
Dávid Bolvanský	80b897e21b	[InstCombine] ctpop(X) ^ ctpop(Y) & 1 --> ctpop(X^Y) & 1 (PR50094) Original pattern: (__builtin_parity(x) ^ __builtin_parity(y)) LLVM rewrites it as: (__builtin_popcount(x) ^ __builtin_popcount(y)) & 1 Optimized form: __builtin_popcount(X^Y) & 1 Alive proof: https://alive2.llvm.org/ce/z/-GdWFr Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D101802	2021-05-04 13:16:18 +02:00
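The identity behind the fold above can be checked by brute force on small inputs. The sketch below uses the GCC/Clang builtin `__builtin_popcount` named in the commit message; the wrapper names are hypothetical. The two sides agree because popcount(x ^ y) = popcount(x) + popcount(y) - 2 * popcount(x & y), so the parities match.

```cpp
#include <cassert>
#include <cstdint>

// The pattern as LLVM sees it before the fold: two popcounts, xor, mask.
unsigned parityXorBefore(uint32_t x, uint32_t y) {
  return (__builtin_popcount(x) ^ __builtin_popcount(y)) & 1;
}

// The folded form: a single popcount of x ^ y.
unsigned parityXorAfter(uint32_t x, uint32_t y) {
  return __builtin_popcount(x ^ y) & 1;
}

// Exhaustive comparison over all pairs of 8-bit inputs.
bool foldHoldsFor8BitInputs() {
  for (uint32_t x = 0; x < 256; ++x)
    for (uint32_t y = 0; y < 256; ++y)
      if (parityXorBefore(x, y) != parityXorAfter(x, y))
        return false;
  return true;
}
```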
Philip Reames	11326cbcdb	[IndVarSimplify][NFC] Removed mayThrow from if-condition in predicateLoopExits of IndVarSimplify Instruction::mayHaveSideEffects already returns true whenever mayThrow returns true, because it calls mayThrow internally. As such, the call being removed is redundant. Patch By: vdsered (Daniil Seredkin) Differential Revision: https://reviews.llvm.org/D101685	2021-05-03 18:25:07 -07:00
Juneyoung Lee	24ce194cfe	[InstCombine] generalize select + select/and/or folding using implied conditions This patch optimizes the remaining possible cases in D101191 by generalizing isImpliedCondition()-based foldings. Assume that there is `op a, (select b, _, _)` where op is one of `and i1`, `or i1` or their select forms. We can do the following optimization based on the result of `isImpliedCondition(a, b)`: If a = true implies… - b = true: - select a, (select b, A, B), false => select a, A, false : https://alive2.llvm.org/ce/z/WCnZYh - and a, (select b, A, B) => select a, A, false : https://alive2.llvm.org/ce/z/uZhcMG - b = false: - select a, (select b, A, B), false => select a, B, false : https://alive2.llvm.org/ce/z/c2hJpV - and a, (select b, A, B) => select a, B, false : https://alive2.llvm.org/ce/z/5ggwMM If a = false implies… - b = true: - select a, true, (select b, A, B) => select a, true, A : https://alive2.llvm.org/ce/z/tidKvH - or a, (select b, A, B) => select a, true, A : https://alive2.llvm.org/ce/z/cC-uyb - b = false: - select a, true, (select b, A, B) => select a, true, B : https://alive2.llvm.org/ce/z/ZXpJq9 - or a, (select b, A, B) => select a, true, B : https://alive2.llvm.org/ce/z/hnDrJj Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D101720	2021-05-04 09:42:06 +09:00
Arthur Eubanks	d14d84af2f	[NewPM] Only invalidate modified functions' analyses in CGSCC passes Previously, any change in any function in an SCC would cause all analyses for all functions in the SCC to be invalidated. With this change, we now manually invalidate analyses for functions we modify, then let the pass manager know that all function analyses should be preserved. So far this only touches the inliner, argpromotion, funcattrs, and updateCGAndAnalysisManager(), since they are the most used. Slight compile time improvements: http://llvm-compile-time-tracker.com/compare.php?from=326da4adcb8def2abdd530299d87ce951c0edec9&to=8942c7669f330082ef159f3c6c57c3c28484f4be&stat=instructions Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D100917	2021-05-03 17:21:44 -07:00
Joseph Huber	182831258b	[Attributor] Add AAExecutionDomainInfo interface to OpenMPOpt Summary: Add the AAExecutionDomainInfo attributor instance to OpenMPOpt. This will infer information about the execution domain an instruction is expected to run in. Right now this only includes a very crude check for instructions that will be executed by the master thread by comparing a thread-id function with a constant zero. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D101578	2021-05-03 19:24:19 -04:00
Dávid Bolvanský	08c08577f9	[InstCombine] cttz(sext(x)) -> cttz(zext(x)) ``` ---------------------------------------- define i32 @src(i16 %x, i1 %b) { %0: %z = sext i16 %x to i32 %p = cttz i32 %z, %b ret i32 %p } => define i32 @tgt(i16 %x, i1 %b) { %0: %z = zext i16 %x to i32 %p = cttz i32 %z, %b ret i32 %p } Transformation seems to be correct! ``` https://alive2.llvm.org/ce/z/evomeg Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D101764	2021-05-03 23:59:30 +02:00
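The transform above can be sanity-checked exhaustively for the i16 to i32 case from the Alive listing. This is a sketch, not the InstCombine code: `cttz32` is a hypothetical helper that returns the bit width for a zero input, which corresponds to the intrinsic with its zero-is-undef flag set to false.

```cpp
#include <cassert>
#include <cstdint>

// Count trailing zeros of a 32-bit value; returns 32 for zero.
unsigned cttz32(uint32_t v) {
  if (v == 0)
    return 32;
  unsigned n = 0;
  while ((v & 1) == 0) {
    v >>= 1;
    ++n;
  }
  return n;
}

// Compare cttz(sext(x)) with cttz(zext(x)) for every 16-bit x.
bool sextAndZextAgree() {
  for (int32_t i = INT16_MIN; i <= INT16_MAX; ++i) {
    int16_t x = static_cast<int16_t>(i);
    uint32_t s = static_cast<uint32_t>(static_cast<int32_t>(x));  // sext
    uint32_t z = static_cast<uint32_t>(static_cast<uint16_t>(x)); // zext
    if (cttz32(s) != cttz32(z))
      return false;
  }
  return true;
}
```

The two extensions differ only in the upper 16 bits, and those bits are reached by a trailing-zero count only when the low 16 bits are all zero, in which case both extensions are zero.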
Teresa Johnson	ea817d79be	[SimplifyCFG] Look for control flow changes instead of side effects. When passingValueIsAlwaysUndefined scans for an instruction between an inst with a null or undef argument and its first use, it was checking for instructions that may have side effects, which is a superset of the instructions it intended to find (as per the comments, control flow changing instructions that would prevent reaching the uses). Switch to using isGuaranteedToTransferExecutionToSuccessor() instead. Without this change, when enabling -fwhole-program-vtables, which causes assumes to be inserted by clang, we can get different simplification decisions. In particular, when building with instrumentation FDO it can affect the optimizations decisions before FDO matching, leading to some mismatches. I had to modify d83507-knowledge-retention-bug.ll since this fix enables more aggressive optimization of that code such that it no longer tested the original bug it was meant to test. I removed the undef which still provokes the original failure (confirmed by temporarily reverting the fix) and also changed it to just invoke the passes of interest to narrow the testing. Similarly I needed to adjust code for UnreachableEliminate.ll to avoid an undef which was causing the function body to get optimized away with this fix. Differential Revision: https://reviews.llvm.org/D101507	2021-05-03 13:32:22 -07:00
Alexey Bataev	fd18547e07	[SLP]Allow masked gathers only if allowed by target. Need to check if target allows/supports masked gathers before trying to estimate its cost, otherwise we may fail to vectorize some of the patterns because of too pessimistic cost model. Part of D57059. Differential Revision: https://reviews.llvm.org/D101297	2021-05-03 08:06:20 -07:00
Dávid Bolvanský	27b651ca47	[InstCombine] cttz(zext(x)) -> zext(cttz(x)) if the 'ZeroIsUndef' parameter is 'true' (PR50172) Zext doesn't change the number of trailing zeros, so narrow cttz(zext(x)) -> zext(cttz(x)) if the 'ZeroIsUndef' parameter is 'true'. Proofs: https://alive2.llvm.org/ce/z/o2dnjY Solves https://bugs.llvm.org/show_bug.cgi?id=50172 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D101582	2021-05-03 17:05:12 +02:00
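The need for the 'ZeroIsUndef' condition in the commit above shows up in a small brute-force check. The template `cttz` below is a hypothetical helper, not an LLVM API; it returns the bit width for a zero input so that the zero case can be compared directly.

```cpp
#include <cassert>
#include <cstdint>

// Count trailing zeros of an unsigned value; returns the bit width for zero.
template <typename T> unsigned cttz(T v) {
  if (v == 0)
    return sizeof(T) * 8;
  unsigned n = 0;
  while ((v & 1) == 0) {
    v >>= 1;
    ++n;
  }
  return n;
}

// For every non-zero 16-bit x, counting in 32 bits after zext gives the
// same result as counting in 16 bits and widening the count.
bool narrowingHoldsForNonZero() {
  for (uint32_t i = 1; i <= UINT16_MAX; ++i) {
    uint16_t x = static_cast<uint16_t>(i);
    if (cttz<uint32_t>(x) != cttz<uint16_t>(x))
      return false;
  }
  return true;
}
```

For x = 0 the two sides differ (32 for the wide count against 16 for the narrow one), which is why the narrowing is only valid when a zero input is allowed to produce an undefined result.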
Alexey Bataev	2e4cc9a725	Revert "[SLP]Allow masked gathers only if allowed by target." This reverts commit `b5f64768cf` to fix a compiler crash revealed by buildbots.	2021-05-03 07:20:00 -07:00
Alexey Bataev	b5f64768cf	[SLP]Allow masked gathers only if allowed by target. Need to check if target allows/supports masked gathers before trying to estimate its cost, otherwise we may fail to vectorize some of the patterns because of too pessimistic cost model. Part of D57059. Differential Revision: https://reviews.llvm.org/D101297	2021-05-03 06:45:42 -07:00
Florian Hahn	2b7fa7f744	[LV] Iterate over recipes in VPlan to fix PHI (NFC). As we gradually move more elements of LV to VPlan, we are trying to reduce the number of places that still have to check IR of the original loop. This patch adjusts the code to fix cross iteration phis to get the PHIs to fix directly from the VPlan that is executed. We still need the original PHI to check for first-order recurrences, but we can get rid of that once we model that explicitly in VPlan as well. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D99293	2021-05-03 14:09:46 +01:00
Sanjay Patel	1b24f35f84	[InstCombine] improve demanded bits analysis of left-shifted operand If we don't demand high bits, then we also don't care about those high bits of a left-shift operand regardless of shift amount. I noticed the sext/trunc pattern in a motivating example. It seems like there should be a low-bits with right-shift sibling, but I haven't looked at that yet. https://alive2.llvm.org/ce/z/JuS6jc https://rise4fun.com/Alive/Trm (not sure how to use 'width' with Alive1) https://alive2.llvm.org/ce/z/gRadbF Differential Revision: https://reviews.llvm.org/D101489	2021-05-03 08:39:20 -04:00
Reshabh Sharma	9f51f1b927	[ASAN][AMDGPU] Add support for accesses to global and constant addrspaces Add address sanitizer instrumentation support for accesses to global and constant address spaces in AMDGPU. It strictly avoids instrumenting the stack and assumes x86 as the host. Reviewed by: vitalybuka Differential Revision: https://reviews.llvm.org/D99071	2021-05-03 09:01:15 +05:30
Florian Hahn	942e068d7a	[VPlan] Add VPBasicBlock::phis() helper (NFC). This patch introduces a helper to obtain an iterator range for the PHI-like recipes in a block. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100101	2021-05-02 19:20:13 +01:00
Juneyoung Lee	39eb2665d9	[InstCombine] Add a few more patterns for folding select of select This is a patch that folds select of select to salvage some optimizations after select -> and/or folding is disabled. ``` select (select a, true, b), c, false -> select a, c, false select c, (select a, true, b), false -> select c, a, false if c implies that b is false (isImpliedCondition). ``` https://alive2.llvm.org/ce/z/ANatjt, https://alive2.llvm.org/ce/z/rv8zTB ``` sel (sel c, a, false), true, (sel !c, b, false) -> sel c, a, b sel (sel !c, a, false), true, (sel c, b, false) -> sel c, b, a ``` https://alive2.llvm.org/ce/z/U2kp-t, https://alive2.llvm.org/ce/z/bc88EE See D101191 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D101375	2021-05-02 19:00:42 +09:00
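The second pair of folds in the commit above can be checked over all boolean inputs. The sketch below ignores poison entirely, so it only confirms the value-level equivalence; poison-safety is what the linked Alive2 proofs cover. All function names are hypothetical.

```cpp
#include <cassert>

// `select c, t, f` on i1 values, with poison left out of the model.
bool sel(bool c, bool t, bool f) { return c ? t : f; }

// sel (sel c, a, false), true, (sel !c, b, false) -> sel c, a, b
bool lhs1(bool a, bool b, bool c) {
  return sel(sel(c, a, false), true, sel(!c, b, false));
}
bool rhs1(bool a, bool b, bool c) { return sel(c, a, b); }

// sel (sel !c, a, false), true, (sel c, b, false) -> sel c, b, a
bool lhs2(bool a, bool b, bool c) {
  return sel(sel(!c, a, false), true, sel(c, b, false));
}
bool rhs2(bool a, bool b, bool c) { return sel(c, b, a); }

// Try all eight combinations of a, b, c.
bool foldsAgreeOnAllInputs() {
  for (int bits = 0; bits < 8; ++bits) {
    bool a = (bits & 1) != 0, b = (bits & 2) != 0, c = (bits & 4) != 0;
    if (lhs1(a, b, c) != rhs1(a, b, c) || lhs2(a, b, c) != rhs2(a, b, c))
      return false;
  }
  return true;
}
```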
Juneyoung Lee	1977c53b2a	[InstCombine] Fold overflow bit of [u\|s]mul.with.overflow in a poison-safe way As discussed in D101191, this patch adds a poison-safe folding of overflow bit check: ``` %Op0 = icmp ne i4 %X, 0 %Agg = call { i4, i1 } @llvm.[us]mul.with.overflow.i4(i4 %X, i4 %Y) %Op1 = extractvalue { i4, i1 } %Agg, 1 %ret = select i1 %Op0, i1 %Op1, i1 false => %Y.fr = freeze %Y %Agg = call { i4, i1 } @llvm.[us]mul.with.overflow.i4(i4 %X, i4 %Y.fr) %Op1 = extractvalue { i4, i1 } %Agg, 1 %ret = %Op1 ``` https://alive2.llvm.org/ce/z/zgPUGT https://alive2.llvm.org/ce/z/h2gZ_6 Note that there are cases where inserting freeze is not necessary: e.g. %Y is `noundef`. In this case, LLVM is already good because `%ret` is already successfully folded into `and`, triggering the pre-existing optimization in InstSimplify: https://godbolt.org/z/v6qena15K Differential Revision: https://reviews.llvm.org/D101423	2021-05-02 11:54:12 +09:00
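The value-level reason the `%X != 0` guard in the commit above can be dropped is that a multiplication by zero never overflows. The sketch below checks this for all i4 inputs, matching the type in the commit message; it does not model poison or `freeze`, and the helper names are assumptions for the example.

```cpp
#include <cassert>

// Overflow bit of an unsigned 4-bit multiply; x and y are in [0, 15].
bool umulOverflowsI4(unsigned x, unsigned y) { return x * y > 15u; }

// Overflow bit of a signed 4-bit multiply; x and y are in [-8, 7].
bool smulOverflowsI4(int x, int y) {
  int p = x * y;
  return p < -8 || p > 7;
}

// Check that `x != 0 && overflow(x, y)` equals `overflow(x, y)` everywhere.
bool guardIsRedundant() {
  for (unsigned x = 0; x < 16; ++x)
    for (unsigned y = 0; y < 16; ++y)
      if (((x != 0) && umulOverflowsI4(x, y)) != umulOverflowsI4(x, y))
        return false;
  for (int x = -8; x <= 7; ++x)
    for (int y = -8; y <= 7; ++y)
      if (((x != 0) && smulOverflowsI4(x, y)) != smulOverflowsI4(x, y))
        return false;
  return true;
}
```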
Nikita Popov	ffa5a402a9	[IndVars] Remove redundant loop invariance check (NFC) This is checked again directly below this condition.	2021-05-01 17:22:00 +02:00
Nathan Chancellor	4397b7095d	Revert "Re-reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"" This reverts commit `791930d740`, as per https://llvm.org/docs/DeveloperPolicy.html#patch-reversion-policy. I observed breakage with the Linux kernel, as reported at https://reviews.llvm.org/D91722#2724321 Fixes exist at https://reviews.llvm.org/D101523 and https://reviews.llvm.org/D101540 but they have not landed, so to unbreak the tree for the weekend, revert this commit. Commit `b11e4c9907` ("Revert "[DebugInfo] Drop DBG_VALUE_LISTs with an excessive number of debug operands"") only reverted one follow-up fix, not the original patch that broke the kernel.	2021-04-30 20:23:21 -07:00
George Balatsouras	a45fd436ae	[dfsan] Fix origin tracking for fast8 The problem is the following. With fast8, we broke an important invariant when loading shadows. A wide shadow of 64 bits used to correspond to 4 application bytes with fast16; so, generating a single load was okay since those 4 application bytes would share a single origin. Now, using fast8, a wide shadow of 64 bits corresponds to 8 application bytes that should be backed by 2 origins (but we kept generating just one). Let’s say our wide shadow is 64-bit and consists of the following: 0xABCDEFGH. To check if we need the second origin value, we could do the following (on the 64-bit wide shadow) case: - bitwise shift the wide shadow left by 32 bits (yielding 0xEFGH0000) - push the result along with the first origin load to the shadow/origin vectors - load the second 32-bit origin of the 64-bit wide shadow - push the wide shadow along with the second origin to the shadow/origin vectors. The combineOrigins would then select the second origin if the wide shadow is of the form 0xABCDE0000. The tests illustrate how this change affects the generated bitcode. Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D101584	2021-04-30 15:57:33 -07:00
Justin Bogner	9542721085	Add support for llvm.assume intrinsic to the LoadStoreVectorizer pass Patch by Viacheslav Nikolaev. Thanks!	2021-04-30 13:39:46 -07:00
Alexey Bataev	a3fd82c289	[SLP]Fix the crash on cost calculation if non-compatible vectors shuffled. If the extracts from non-power-of-2 vectors are recognized as shuffles, some extra checks are needed so that the cost calculation does not crash when trying to get the cost for subvector extracts. In this case we need to check carefully that we do not go out of bounds of the original vector, otherwise the TTI's cost model will crash on an assert. Differential Revision: https://reviews.llvm.org/D101477	2021-04-30 09:34:20 -07:00
Jingu Kang	88b259c014	[SimpleLoopUnswitch] Port partially invariant unswitch from LoopUnswitch to SimpleLoopUnswitch Differential Revision: https://reviews.llvm.org/D99354	2021-04-30 15:55:56 +01:00
Evgeniy Brevnov	7861cb600c	[NARY] Don't optimize min/max if there are side uses (part2) The previous attempt to fix infinite recursion in min/max reassociation was not fully successful (D100170). The newly discovered failing case arises because the single-use case is not handled properly. It should be processed separately from the two-use case. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D101359	2021-04-30 19:02:02 +07:00
Florian Hahn	ed9df5bd2f	[Passes] Run sinking/hoisting in SimplifyCFG earlier. Hoisting and sinking instructions out of conditional blocks enables additional vectorization by: 1. Executing memory accesses unconditionally. 2. Reducing the number of instructions that need predication. After disabling early hoisting / sinking, we miss out on a few vectorization opportunities. One of those is causing a ~10% performance regression in one of the Geekbench benchmarks on AArch64. This patch tries to recover the regression by running hoisting/sinking as part of a SimplifyCFG run after LoopRotate and before LoopVectorize. Note that in the legacy pass-manager, we run LoopRotate just before vectorization again and there's no SimplifyCFG run in between, so the sinking/hoisting may impact the later run on LoopRotate. But the impact should be limited and the benefit of hoisting/sinking at this stage should outweigh the risk of not rotating. Compile-time impact looks slightly positive for most cases. http://llvm-compile-time-tracker.com/compare.php?from=2ea7fb7b1c045a7d60fcccf3df3ebb26aa3699e5&to=e58b4a763c691da651f25996aad619cb3d946faf&stat=instructions NewPM-O3: geomean -0.19% NewPM-ReleaseThinLTO: geomean -0.54% NewPM-ReleaseLTO-g: geomean -0.03% With a few benchmarks seeing a notable increase, but also some improvements. Alternative to D101290. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D101468	2021-04-30 12:23:57 +01:00
Alexey Bataev	12c51f2358	[COST] Improve shuffle kind detection if shuffle mask is provided. Added an extra analysis for better choosing of shuffle kind in getShuffleCost functions for better cost estimation if mask was provided. Differential Revision: https://reviews.llvm.org/D100865	2021-04-29 12:48:00 -07:00
Alexey Bataev	6e859f3cd4	Revert "[COST] Improve shuffle kind detection if shuffle mask is provided." This reverts commit `9239932221` to fix a compiler crash on mask checks.	2021-04-29 12:40:33 -07:00
Sanjay Patel	0f8b6686ac	[InstCombine] narrow popcount with zext operand https://llvm.org/PR50141	2021-04-29 15:07:16 -04:00
Roman Lebedev	cc63203908	[SimplifyCFG] Common code sinking: fix application of profitability check The profitability check is: we don't want to create more than a single PHI per instruction sunk. We need to create the PHI unless we'll sink all of its would-be incoming values. But there is a caveat there. This profitability check doesn't converge on the first iteration! If we first decide that we want to sink 10 instructions, but then determine that the 5th one is unprofitable to sink, that may result in us not sinking some instructions that resulted in determining that some other instruction we've determined to be profitable to sink becoming unprofitable. So we need to iterate until we converge, as in determine that all leftover instructions are profitable to sink. But, the direct approach of just re-iterating seems dumb, because in the worst case we'd find that the last instruction is unprofitable, which would result in revisiting instructions many many times. Instead, I think we can get away with just two passes - forward and backward. However then it isn't obvious what is the most performant way to update InstructionsToSink.	2021-04-29 21:11:40 +03:00
Alexey Bataev	9239932221	[COST] Improve shuffle kind detection if shuffle mask is provided. Added an extra analysis for better choosing of shuffle kind in getShuffleCost functions for better cost estimation if mask was provided. Differential Revision: https://reviews.llvm.org/D100865	2021-04-29 09:42:56 -07:00
Sander de Smalen	51d648c119	Revert "[LV] Calculate max feasible scalable VF." Temporarily reverting this patch due to some unexpected issue found by one of the PPC buildbots. This reverts commit `584e9b6e4b`.	2021-04-29 16:04:37 +01:00
Florian Hahn	a0e1313c23	[VPlan] Add getVPSingleValue helper. As suggested in D99294, this adds a getVPSingleValue helper to use for recipes that are guaranteed to define a single value. This replaces uses of getVPValue() which used to default to I = 0.	2021-04-29 13:37:38 +01:00
Reshabh Sharma	60c60dd138	[ASAN] NFC: Use addrspace cast for pointers in non-zero addrspace Pointers in non-zero address spaces need to be address space casted before appending to the used list. Reviewed by: vitalybuka Differential Revision: https://reviews.llvm.org/D101363	2021-04-29 11:06:00 +05:30
Reshabh Sharma	fc1df36e6e	[ASAN] NFC: Copy address space when creating globals with redzones This patch makes sure that globals in supported address spaces will be replaced by globals with red zones in the same address space by copying the address space. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D101362	2021-04-29 10:21:43 +05:30
Amanieu d'Antras	ad9ce8142d	[ConstantMerge] Don't merge thread_local constants with non-thread_local constants Fixes PR49932 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D100322	2021-04-28 23:44:20 +01:00
Roman Lebedev	707ad01399	[SimplifyCFG] Common code sinking: fixup variable name As noticed in post-commit review. I've gone through several iterations of that name, and somehow managed to end up with an incorrect one.	2021-04-29 01:24:16 +03:00
Dávid Bolvanský	e20b32ff3b	[BuildLibCalls] Remove inaccessiblememonly inference for calloc Solves regression mentioned in PR50143. As noted in D101440, proper modelling for calloc would require new attribute inaccessible_or_returned_memonly.	2021-04-29 00:17:37 +02:00
Roman Lebedev	1886aad9d0	[SimplifyCFG] Common code sinking: relax restriction on non-uncond predecessors While we have a known profitability issue for sinking in presence of non-unconditional predecessors, there isn't any known issues for having multiple such non-unconditional predecessors, so said restriction appears to be artificial. Lift it.	2021-04-29 01:01:01 +03:00
Roman Lebedev	a8e273f2ed	[NFC][SimplifyCFG] Add test showing that profitability check for sinking is broken Essentially, we can't promise that the instruction is sinkable without introducing PHI's until we know that it is profitable to sink.	2021-04-29 01:01:00 +03:00
Roman Lebedev	12c8027ce3	[NFC][SimplifyCFG] Common code sinking: check profitability once We can just eagerly pre-check all the instructions that we could sink that we'd actually want to sink them, clamping the number of instructions that we'll sink to stop just before the first unprofitable one.	2021-04-29 01:01:00 +03:00
Roman Lebedev	4c27ca21d9	[NFC][SimplifyCFG] SinkCommonCodeFromPredecessors(): reword comment about PR30244	2021-04-29 01:01:00 +03:00
Bardia Mahjour	ddb3b26a12	[LV] Consider Loop Unroll Hints When Making Interleave Decisions This patch causes the loop vectorizer to not interleave loops that have nounroll loop hints (llvm.loop.unroll.disable and llvm.loop.unroll_count(1)). Note that if a particular interleave count is being requested (through llvm.loop.interleave_count), it will still be honoured, regardless of the presence of nounroll hints. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D101374	2021-04-28 17:27:52 -04:00
Sanjay Patel	abd7529625	[InstCombine] relax masking requirement for truncated funnel/rotate match I was investigating a seemingly unrelated improvement in demanded bits for shift-left, but that caused regressions on these tests because we were able to look through/eliminate the mask. https://alive2.llvm.org/ce/z/Ztdr22 define i8 @src(i32 %x, i32 %y, i32 %shift) { %and = and i32 %shift, 3 %conv = and i32 %x, 255 %shr = lshr i32 %conv, %and %sub = sub i32 8, %and %shl = shl i32 %y, %sub %or = or i32 %shr, %shl %conv2 = trunc i32 %or to i8 ret i8 %conv2 } define i8 @tgt(i32 %x, i32 %y, i32 %shift) { %x8 = trunc i32 %x to i8 %y8 = trunc i32 %y to i8 %shift8 = trunc i32 %shift to i8 %and = and i8 %shift8, 3 %conv2 = call i8 @llvm.fshr.i8(i8 %y8, i8 %x8, i8 %and) ret i8 %conv2 } declare i8 @llvm.fshr.i8(i8,i8,i8)	2021-04-28 16:49:50 -04:00
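The `@src` and `@tgt` functions in the commit above can be transcribed into C++ and compared by brute force. This is a sketch of the equivalence only, not the InstCombine matcher; `fshr8` is a hand-written stand-in for `llvm.fshr.i8`, and the high-bit patterns OR-ed into the inputs are arbitrary values chosen to show that the upper bits of `%x` and `%y` do not matter.

```cpp
#include <cassert>
#include <cstdint>

// Transcription of @src: 32-bit shifts followed by a truncate to i8.
uint8_t src(uint32_t x, uint32_t y, uint32_t shift) {
  uint32_t amt = shift & 3;
  uint32_t shr = (x & 255) >> amt;
  uint32_t shl = y << (8 - amt); // shift amount is in [5, 8]
  return static_cast<uint8_t>(shr | shl);
}

// Stand-in for llvm.fshr.i8: shift the 16-bit concatenation hi:lo right
// by amt modulo 8 and keep the low 8 bits.
uint8_t fshr8(uint8_t hi, uint8_t lo, uint8_t amt) {
  uint16_t concat = static_cast<uint16_t>((hi << 8) | lo);
  return static_cast<uint8_t>(concat >> (amt % 8));
}

// Transcription of @tgt: truncate first, then funnel-shift in i8.
uint8_t tgt(uint32_t x, uint32_t y, uint32_t shift) {
  uint8_t amt = static_cast<uint8_t>(shift) & 3;
  return fshr8(static_cast<uint8_t>(y), static_cast<uint8_t>(x), amt);
}

// Compare both forms for every low byte of x and y and shifts 0..7.
bool srcMatchesTgt() {
  for (uint32_t x = 0; x < 256; ++x)
    for (uint32_t y = 0; y < 256; ++y)
      for (uint32_t s = 0; s < 8; ++s)
        if (src(x | 0xABCD0000u, y | 0x12345600u, s) !=
            tgt(x | 0xABCD0000u, y | 0x12345600u, s))
          return false;
  return true;
}
```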
Roman Lebedev	d16d820c2e	[SimplifyCFG] Try 2: sink all-indirect indirect calls Note that we don't want to turn a partially-direct call into an indirect one, that will break ICP amongst other things.	2021-04-28 19:08:54 +03:00
Roman Lebedev	262c679d32	Revert "[SimplifyCFG] Sinking indirect calls - they're already indirect anyways" Seems to break indirect call promotion, LTO/Resolution/X86/load-sample-prof-icp.ll fails. This reverts commit `e57cf128b3`.	2021-04-28 17:46:59 +03:00
Roman Lebedev	e57cf128b3	[SimplifyCFG] Sinking indirect calls - they're already indirect anyways	2021-04-28 17:36:23 +03:00
Dawid Jurczak	5f5974aeac	[SimplifyLibCalls] Transform printf("%s", str) --> puts(str)/noop Before this change, LLVM could not simplify printf in the following cases: printf("%s", "") --> noop printf("%s", str"\n") --> puts(str) On the other hand, GCC has been able to perform such transformations for many years: https://godbolt.org/z/7nnqbedfe Differential Revision: https://reviews.llvm.org/D100724	2021-04-28 10:29:07 -04:00
David Sherwood	00e65f3345	[LoopVectorize][SVE] Fix crash when vectorising FP negation This patch fixes a crash encountered when vectorising the following loop: void foo(float *dst, float *src, long long n) { for (long long i = 0; i < n; i++) dst[i] = -src[i]; } using scalable vectors. I've added a test to Transforms/LoopVectorize/AArch64/sve-basic-vec.ll as well as cleaned up the other tests in the same file. Differential Revision: https://reviews.llvm.org/D98054	2021-04-28 15:22:35 +01:00
Tres Popp	f0e848e63d	Silence unused variable warning	2021-04-28 15:46:09 +02:00
Alexey Bataev	8af4723c58	[SLP]Try to vectorize tiny trees with shuffled gathers. If the first tree element is vectorized and the second is a gather, it still might be profitable to vectorize the tree if the gather node contains fewer scalars to vectorize than the original tree node. It might be profitable to use shuffles. Differential Revision: https://reviews.llvm.org/D101397	2021-04-28 06:35:31 -07:00
David Sherwood	6998f8ae2d	[LoopVectorize] Simplify scalar cost calculation in getInstructionCost This patch simplifies the calculation of certain costs in getInstructionCost when isScalarAfterVectorization() returns a true value. There are a few places where we multiply a cost by a number N, i.e. unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1; return N * TTI.getArithmeticInstrCost(... After some investigation it seems that there are only these cases that occur in practice: 1. VF is a scalar, in which case N = 1. 2. VF is a vector. We can only get here if: a) the instruction is a GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction variable that remains scalar. I have changed the code so that N is assumed to always be 1. For GEPs the cost is always 0, since this is calculated later on as part of the load/store cost. PHI nodes are costed separately and were never previously multiplied by VF. For all other cases I have added an assert that none of the users needs scalarising, which didn't fire in any unit tests. Only one test required fixing and I believe the original cost for the scalar add instruction to have been wrong, since only one copy remains after vectorisation. I have also added a new test for the case when a pointer PHI feeds directly into a store that will be scalarised as we were previously never testing it. Differential Revision: https://reviews.llvm.org/D99718	2021-04-28 13:41:07 +01:00
Sander de Smalen	584e9b6e4b	[LV] Calculate max feasible scalable VF. This patch also refactors the way the feasible max VF is calculated, although this is NFC for fixed-width vectors. After this change scalable VF hints are no longer truncated/clamped to a shorter scalable VF, nor does it drop the 'scalable flag' from the suggested VF to vectorize with a similar VF that is fixed. Instead, the hint is ignored which means the vectorizer is free to find a more suitable VF, using the CostModel to determine the best possible VF. Reviewed By: c-rhodes, fhahn Differential Revision: https://reviews.llvm.org/D98509	2021-04-28 12:30:00 +01:00
Tres Popp	efce19c3b0	Revert "[loop-idiom] Hoist loop memcpys to loop preheader" This reverts commit `75d6b8bb40`. The reasoning is mentioned in https://reviews.llvm.org/D97667	2021-04-28 13:16:34 +02:00
Kerry McLaughlin	9cc217ab36	[LoopVectorize] Prevent multiple Phis being generated with in-order reductions When using the -enable-strict-reductions flag where UF>1 we generate multiple Phi nodes, though only one of these is used as an input to the vector.reduce.fadd intrinsics. The unused Phi nodes are removed later by instcombine. This patch changes widenPHIInstruction/fixReduction to only generate one Phi, and adds an additional test for unrolling to strict-fadd.ll Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D100570	2021-04-28 11:29:01 +01:00
Benjamin Kramer	7e5682ee62	[ADT] Make TrackingStatistic's ctor constexpr This lets clang diagnose unused statistics, so remove them.	2021-04-28 12:00:17 +02:00
Dávid Bolvanský	e81819377e	[DSE] Eliminate zero memset after calloc Solves PR11896 As noted, this can be improved further (calloc -> malloc) in some cases. But for now, this is the first step. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D101391	2021-04-28 03:30:52 +02:00
Han Zhu	75d6b8bb40	[loop-idiom] Hoist loop memcpys to loop preheader For a simple loop like: ``` struct S { int x; int y; char b; }; unsigned foo(S* __restrict__ a, S* b, int n) { for (int i = 0; i < n; i++) a[i] = b[i]; return sizeof(a[0]); } ``` We could eliminate the loop and convert it to a large memcpy of 12n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll` ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %0 = bitcast %struct.S* %arrayidx2 to i8* %1 = bitcast %struct.S* %arrayidx to i8* call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false) %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic. 
With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass. ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %a1 = bitcast %struct.S* %a to i8* %b2 = bitcast %struct.S* %b to i8* %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry %0 = zext i32 %n to i64 %1 = mul nuw nsw i64 %0, 12 call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false) br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %2 = bitcast %struct.S* %arrayidx2 to i8* %3 = bitcast %struct.S* %arrayidx to i8* %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` Reviewed By: zino Differential Revision: https://reviews.llvm.org/D97667	2021-04-27 17:37:51 -07:00
Han Zhu	cd13f19031	[loop-idiom][NFC] Extract processLoopStoreOfLoopLoad into a helper function Differential Revision: https://reviews.llvm.org/D100979	2021-04-27 13:42:30 -07:00
Sanjay Patel	025bb52903	[InstCombine] fold clamp to 2 values from min/max intrinsics The "select" versions of these folds are also missing and can cause infinite loops as shown in: https://llvm.org/PR48900 ...but it seems easier to match these as max/min as a first fix. https://alive2.llvm.org/ce/z/wv-_dT	2021-04-27 15:35:49 -04:00
David Sherwood	6968520c3b	Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost" This reverts commit `4afeda9157`.	2021-04-27 15:46:03 +01:00
David Sherwood	4afeda9157	[LoopVectorize] Simplify scalar cost calculation in getInstructionCost This patch simplifies the calculation of certain costs in getInstructionCost when isScalarAfterVectorization() returns a true value. There are a few places where we multiply a cost by a number N, i.e. unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1; return N * TTI.getArithmeticInstrCost(... After some investigation it seems that there are only these cases that occur in practice: 1. VF is a scalar, in which case N = 1. 2. VF is a vector. We can only get here if: a) the instruction is a GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction variable that remains scalar. I have changed the code so that N is assumed to always be 1. For GEPs the cost is always 0, since this is calculated later on as part of the load/store cost. PHI nodes are costed separately and were never previously multiplied by VF. For all other cases I have added an assert that none of the users needs scalarising, which didn't fire in any unit tests. Only one test required fixing and I believe the original cost for the scalar add instruction to have been wrong, since only one copy remains after vectorisation. I have also added a new test for the case when a pointer PHI feeds directly into a store that will be scalarised as we were previously never testing it. Differential Revision: https://reviews.llvm.org/D99718	2021-04-27 15:26:15 +01:00
Alexey Bataev	24590d8d67	[SLP]Improved isGatherShuffledEntry, NFC. Reworked the isGatherShuffledEntry function: simplified it and moved common code to the lambda (it will go away once the non-power-2 patch lands).	2021-04-27 05:59:46 -07:00
Florian Hahn	cb96d802d4	[LV] Hoist code to get vector loop latch (NFC). Address suggestion from D99294.	2021-04-27 13:30:17 +01:00
Sanjay Patel	e808289fe6	[IndVars] avoid crash in LFTR when assuming an add recurrence The test is a crasher reduced from: https://llvm.org/PR49993 linearFunctionTestReplace() assumes that we have an add recurrence, so check for that as a condition of matching a loop counter. Differential Revision: https://reviews.llvm.org/D101291	2021-04-27 08:26:02 -04:00
Florian Hahn	160e729cf0	[VPlan] Use recursive traversal iterator in VPSlotTracker. This patch simplifies VPSlotTracker by using the recursive traversal iterator to traverse all blocks in a VPlan in reverse post-order when numbering VPValues in a plan. This depends on a fix to RPOT (D100169). It also extends the traversal unit tests to check RPOT. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D100176	2021-04-27 12:39:06 +01:00
Vitaly Buka	f2a585e6d3	[NFC] Fix "not used" warning	2021-04-26 22:09:23 -07:00
Arthur Eubanks	fd1ff5ee03	[Inliner] Make ModuleInlinerWrapperPass return PreservedAnalyses::all() The ModulePassManager should already have taken care of all analysis invalidation. Without this change, upcoming changes will cause more invalidation than necessary. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D101320	2021-04-26 17:22:35 -07:00
William S. Moses	7aa3cad46a	[NVPTX] Enable lowering of atomics on local memory LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store. Differential Revision: https://reviews.llvm.org/D98650	2021-04-26 20:12:12 -04:00
Hongtao Yu	30bb5be389	[CSSPGO] Unblock optimizations with pseudo probe instrumentation part 2. As a follow-up to D95982, this patch continues unblocking optimizations that are blocked by pseudo probe instrumentation. The optimizations unblocked are: - In-block load propagation. - In-block dead store elimination. - Memory copy optimization that turns stores to consecutive memories into a memset. These optimizations are local to a block, so they shouldn't affect the profile quality. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D100075	2021-04-26 16:52:33 -07:00
Fangrui Song	18839be9c5	[ADT] Remove StatisticBase and make NoopStatistic empty In LLVM_ENABLE_STATS=0 builds, `llvm::Statistic` maps to `llvm::NoopStatistic` but has 3 mostly unused pointers. GlobalOpt considers that the pointers can potentially retain allocated objects, so GlobalOpt cannot optimize out the `NoopStatistic` variables (see D69428 for more context), wasting 23KiB for stage 2 clang. This patch makes `NoopStatistic` empty and thus reclaims the wasted space. The clang size is even smaller than applying D69428 (slightly smaller in both .bss and .text). ``` # This means the D69428 optimization on clang is mostly nullified by this patch. HEAD+D69428: size(.bss) = 0x0725a8 HEAD+D101211: size(.bss) = 0x072238 # bloaty - HEAD+D69428 vs HEAD+D101211 # With D101211, we also save a lot of string table space (.rodata). FILE SIZE VM SIZE -------------- -------------- -0.0% -32 -0.0% -24 .eh_frame -0.0% -336 [ = ] 0 .symtab -0.0% -360 [ = ] 0 .strtab [ = ] 0 -0.2% -880 .bss -0.0% -2.11Ki -0.0% -2.11Ki .rodata -0.0% -2.89Ki -0.0% -2.89Ki .text -0.0% -5.71Ki -0.0% -5.88Ki TOTAL ``` Note: LoopFuse is a disabled pass. For now this patch adds `#if LLVM_ENABLE_STATS` so `OptimizationRemarkMissed` is skipped in LLVM_ENABLE_STATS==0 builds. If these `OptimizationRemarkMissed` are useful in LLVM_ENABLE_STATS==0 builds, we can replace `llvm::Statistic` with `llvm::TrackingStatistic`, or use a different abstraction to keep track of the strings. Similarly, skip the code in `mlir/lib/Pass/PassStatistics.cpp` which calls `getName`/`getDesc`/`getValue`. Reviewed By: lattner Differential Revision: https://reviews.llvm.org/D101211	2021-04-26 16:47:32 -07:00
William S. Moses	8ede96493c	Revert "[NVPTX] Enable lowering of atomics on local memory" This reverts commit `fede99d386`.	2021-04-26 19:33:01 -04:00
William S. Moses	fede99d386	[NVPTX] Enable lowering of atomics on local memory LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store. Differential Revision: https://reviews.llvm.org/D98650	2021-04-26 19:27:27 -04:00
Lei Zhang	254e289d45	Revert "[ADT] Remove StatisticBase and make NoopStatistic empty" This reverts commit `b540311781` because it breaks MLIR build: https://buildkite.com/mlir/mlir-core/builds/13299#ad0f8901-dfa4-43cf-81b8-7940e2c6c15b	2021-04-26 18:31:04 -04:00
Michael Kruse	b99466eb45	[SimplifyCFG] Preserve metadata when unconditionalizing branches (same target). When replacing a conditional branch by an unconditional one because the targets are identical, transfer the metadata to the new branch instruction. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D101226	2021-04-26 17:23:01 -05:00
Fangrui Song	b540311781	[ADT] Remove StatisticBase and make NoopStatistic empty In LLVM_ENABLE_STATS=0 builds, `llvm::Statistic` maps to `llvm::NoopStatistic` but has 3 unused pointers. GlobalOpt considers that the pointers can potentially retain allocated objects, so GlobalOpt cannot optimize out the `NoopStatistic` variables (see D69428 for more context), wasting 23KiB for stage 2 clang. This patch makes `NoopStatistic` empty and thus reclaims the wasted space. The clang size is even smaller than applying D69428 (slightly smaller in both .bss and .text). ``` # This means the D69428 optimization on clang is mostly nullified by this patch. HEAD+D69428: size(.bss) = 0x0725a8 HEAD+D101211: size(.bss) = 0x072238 # bloaty - HEAD+D69428 vs HEAD+D101211 # With D101211, we also save a lot of string table space (.rodata). FILE SIZE VM SIZE -------------- -------------- -0.0% -32 -0.0% -24 .eh_frame -0.0% -336 [ = ] 0 .symtab -0.0% -360 [ = ] 0 .strtab [ = ] 0 -0.2% -880 .bss -0.0% -2.11Ki -0.0% -2.11Ki .rodata -0.0% -2.89Ki -0.0% -2.89Ki .text -0.0% -5.71Ki -0.0% -5.88Ki TOTAL ``` Note: LoopFuse is a disabled pass. This patch adds `#if LLVM_ENABLE_STATS` so `OptimizationRemarkMissed` is skipped in LLVM_ENABLE_STATS==0 builds. If these `OptimizationRemarkMissed` are useful and not noisy, we can replace `llvm::Statistic` with `llvm::TrackingStatistic` in the future. Reviewed By: lattner Differential Revision: https://reviews.llvm.org/D101211	2021-04-26 13:39:35 -07:00
Fangrui Song	614de225c9	[gcov] Set nounwind and respect module flags metadata "frame-pointer" & "uwtable" for synthesized functions This applies the D100251 mechanism to the gcov instrumentation pass. With this patch, `-fno-omit-frame-pointer` in `clang -fprofile-arcs -O1 -fno-omit-frame-pointer` will be respected for synthesized `__llvm_gcov_writeout,__llvm_gcov_reset,__llvm_gcov_init` functions: the frame pointer will be kept (note: on many targets -O1 eliminates the frame pointer by default). `clang -fno-exceptions -fno-asynchronous-unwind-tables -g -fprofile-arcs` will produce .debug_frame instead of .eh_frame. Fix: https://github.com/ClangBuiltLinux/linux/issues/955 Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D101129	2021-04-26 13:30:21 -07:00
Michael Kruse	153144be40	[SimplifyCFG] Preserve metadata when unconditionalizing branches (constant condition). When replacing a conditional branch by an unconditional one because the condition is a constant, transfer the metadata to the new branch instruction. Part of fix for llvm.org/PR50060 Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D101141	2021-04-26 10:57:31 -05:00
Dávid Bolvanský	691badc3d6	[InstCombine] C - ctpop(a) -> ctpop(~a) if C is bitwidth (PR50104) Proof: https://alive2.llvm.org/ce/z/mncA9K Solves https://bugs.llvm.org/show_bug.cgi?id=50104 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D101257	2021-04-26 15:40:54 +02:00
Yuanbo Li	cc7803ee3f	[LSR][DebugInfo] Don't unnecessarily drop DebugLocs When transforming a loop terminating condition into a "max" comparison, the DebugLoc from the old condition should be set on the newly created comparison. They are the same operation, just optimized. Fixes PR48067. Differential Revision: https://reviews.llvm.org/D98218	2021-04-26 13:14:42 +01:00
Florian Hahn	7302fe4328	[VPlan] Make blocksOnly work properly with ranges over const pointers. When iterating over const blocks, the base type in the lambdas needs to use const VPBlockBase *, otherwise it cannot be used with input iterators over const VPBlockBase. Also adjust the type of the input iterator range to const &, as it does not take ownership of the input range.	2021-04-26 10:52:35 +01:00
Florian Hahn	4b9be5ac08	[VPlan] Add VPBlockUtils::blocksOnly helper. This patch adds a blocksOnly helper which takes an iterator range over VPBlockBase * or const VPBlockBase * and returns an iterator range that only includes BlockTy blocks. The accesses are cast to BlockTy. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D101093	2021-04-25 17:38:09 +01:00
Florian Hahn	fa2f162e76	[NewGVN] Properly transfer PredDep in move constructor.	2021-04-25 11:22:59 +01:00
Florian Hahn	1d8ef761be	[NewGVN] Use ExprResult to add extra predicate users. This patch updates performSymbolicPredicateInfoEvaluation to manage registering additional dependencies using ExprResult. Similar to D99987, this fixes an issue where we failed to track the correct dependency for a phi-of-ops value, which is marked as temporary. Fixes PR49873. Reviewed By: asbirlea, ruiling Differential Revision: https://reviews.llvm.org/D100560	2021-04-25 11:13:32 +01:00
Florian Hahn	1cc5946cc8	[NewGVN] Use performSymbolicEvaluation instead of createExpression. performSymbolicEvaluation is used to obtain the symbolic expression when visiting instructions and this is used to determine their congruence class. performSymbolicEvaluation only creates expressions for certain instructions (via createExpression). For unsupported instructions, 'unknown' expressions are created. The use of createExpression in processOutgoingEdges means we may simplify the condition in processOutgoingEdges to a constant in the initial round of processing, but we use Unknown(I) for the congruence class. If an operand of I changes, the expression Unknown(I) stays the same, so there is no update of the congruence class of I. Hence it won't get re-visited. So if an operand of I changes in a way that causes createExpression to return a different result, this update is missed. This patch updates the code to use performSymbolicEvaluation, to be symmetric with the congruence class updating code. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D99990	2021-04-24 18:49:07 +01:00
Dávid Bolvanský	137568e579	[InstCombine] Fixed UB in foldCtpop	2021-04-24 19:44:16 +02:00
Dávid Bolvanský	de3fa35cdb	[InstCombine] ctpop(rot(X)) -> ctpop(X) Proof: https://alive2.llvm.org/ce/z/ss2zyt - rotl https://alive2.llvm.org/ce/z/ZM7Aue - rotr Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D101235	2021-04-24 18:25:03 +02:00
Dávid Bolvanský	d4ec8ea19c	[InstCombine] ctpop(X) + ctpop(Y) => ctpop(X | Y) if X and Y have no common bits (PR48999) For example: ``` int src(unsigned int a, unsigned int b) { return __builtin_popcount(a << 16) + __builtin_popcount(b >> 16); } int tgt(unsigned int a, unsigned int b) { return __builtin_popcount((a << 16) | (b >> 16)); } ``` Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D101210	2021-04-24 17:52:10 +02:00
dfukalov	6c57044231	[GVN] Clobber partially aliased loads. Use offsets stored in `AliasResult` implemented in D98718. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95543	2021-04-24 14:14:20 +03:00
wlei	3d1aecbd28	[CSSPGO] Fix missing debug info of dangling pseudo probe While doing speculative execution opt, it conservatively drops all insn's debug info in the merged `ThenBB` (see the loop at line 2384), including the dangling probe. The missing debug info of the dangling probe will cause the wrong inference computation. So we should avoid dropping the debug info from the pseudo probe; this change tries to fix this by moving the to-be dangling probe to the merging target BB before the debug info is dropped. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D101195	2021-04-23 14:26:47 -07:00
Dávid Bolvanský	9aee07abd0	[InstCombine] X - usub.sat(X, Y) => umin(X, Y) Pattern regressed in LLVM 9 with the introduction of usub.sat. Fixes https://bugs.llvm.org/show_bug.cgi?id=42178#c2 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D101184	2021-04-23 21:13:07 +02:00
Hongtao Yu	5f2d730073	[CSSPGO] Fix incorrect prorating indirect call distribution factor that leads to target count loss. Pseudo probe distribution factor is used to scale down profile samples to avoid misleading the counts inference due to the usage of "maximum" in `getBlockWeight`. For callsites, the scaling down can come from code duplication prior to the sample profile loader (prelink or postlink), or due to the indirect call promotion in sample loader inliner. This patch fixes an issue in sample loader ICP where the leftover indirect callsite scaling down causes the loss of non-promoted call target samples unexpectedly. While the scaling down is to favor BFI/BPI with an accurate callsite count, it doesn't fit in the current distribution factor that represents code duplication changes. Ideally, we would need two factors, one for code duplication and the other for ICP. However, this seems overcomplicated. I'm going to trade one usage (callsite counts) for the other (call target counts). Seeing perf win on one benchmark (mcf) of SPEC2017 with others unchanged. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D100993	2021-04-23 11:09:22 -07:00
Sanjay Patel	e10d7d455d	[InstCombine] fold 'not' of ctpop in parity pattern As discussed in https://llvm.org/PR50096 , we could convert the 'not' into a 'sub' and see the same fold. That's because we already have another demanded bits optimization for 'sub'. We could add a related transform for odd-number-of-type-bits, but that seems unlikely to be practical. https://alive2.llvm.org/ce/z/TWJZXr	2021-04-23 13:23:24 -04:00
Florian Hahn	89c4dda076	[VPlan] Add GraphTraits impl to traverse through VPRegionBlock. This patch adds a new iterator to traverse through VPRegionBlocks and a GraphTraits specialization using the iterator to traverse through VPRegionBlocks. Because there is already a GraphTraits specialization for VPBlockBase * and co, a new VPBlockRecursiveTraversalWrapper helper is introduced. This allows us to provide a new GraphTraits specialization for that type. Users can use the new recursive traversal by using this wrapper. The graph trait visits both the entry block of a region, as well as all its successors. Exit blocks of a region implicitly have their parent region's successors. This ensures all blocks in a region are visited before any blocks in a successor region when doing a reverse post-order traversal of the graph. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D100175	2021-04-23 17:26:47 +01:00
Sander de Smalen	f9a50f04ba	[TTI] NFC: Change getIntImmCost[Inst|Intrin] to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D100565	2021-04-23 16:06:36 +01:00
Sander de Smalen	43ace8b5ce	[TTI] NFC: Change getScalingFactorCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D100564	2021-04-23 16:06:36 +01:00
Timm Bäder	e60d6e91e1	[llvm][NFC] Fix assert indentation This triggers GCC's misleading-indentation checker.	2021-04-23 14:44:05 +02:00
Dávid Bolvanský	5f77e7708a	[InstCombine] Fixed crash when setting align attr for memalign	2021-04-23 14:04:08 +02:00
Florian Hahn	2b15262f89	Recommit "[NewGVN] Track simplification dependencies for phi-of-ops." This recommits `4f5da356ff`, including explicit implementations of a move constructor and deleted copy constructors/assignment operators, to fix failures with some compilers. This reverts the revert `74854d00e8`.	2021-04-23 11:27:43 +01:00
Stephen Tozer	791930d740	Re-reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands" Previous build failures were caused by an error in bitcode reading and writing for DIArgList metadata, which has been fixed in `e5d844b587`. There were also some unnecessary asserts that were being triggered on certain builds, which have been removed. This reverts commit `dad5caa59e`.	2021-04-23 10:54:01 +01:00
Florian Hahn	74854d00e8	Revert "[NewGVN] Track simplification dependencies for phi-of-ops." This reverts commit `4f5da356ff`. This causes some buildbot failures, e.g. https://lab.llvm.org/buildbot/#/builders/139/builds/3019	2021-04-23 09:56:17 +01:00
Florian Hahn	4f5da356ff	[NewGVN] Track simplification dependencies for phi-of-ops. If we are using a simplified value, we need to add an extra dependency on this value, because changes to the class of the simplified value may require us to invalidate any decision based on that value. This is done by adding such values as additional users, however the current code does not exclude temporary instructions. At the moment, this means that we miss those dependencies for phi-of-ops, because they are temporary instructions at this point. We instead need to add the extra dependencies to the root instruction of the phi-of-ops. This patch pushes the responsibility of adding extra users to the callers of createExpression & performSymbolicEvaluation. At those points, it is clearer which real instruction to pick. Alternatively we could either pass the 'real' instruction as additional argument or use another map, but I think the approach in the patch makes things a bit easier to follow. Fixes PR35074. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D99987	2021-04-23 09:48:38 +01:00
KAWASHIMA Takahiro	d9a9c992d1	[LoopReroll] Fix rerolling loop with extra instructions Fixes PR47627 This fix suppresses rerolling a loop which has an unrerollable instruction. Sample IR for the explanation below: ``` define void @foo([2 x i32]* nocapture %a) { entry: br label %loop loop: ; base instruction %indvar = phi i64 [ 0, %entry ], [ %indvar.next, %loop ] ; unrerollable instructions %stptrx = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %indvar, i64 0 store i32 999, i32* %stptrx, align 4 ; extra simple arithmetic operations, used by root instructions %plus20 = add nuw nsw i64 %indvar, 20 %plus10 = add nuw nsw i64 %indvar, 10 ; root instruction 0 %ldptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus20, i64 0 %value0 = load i32, i32* %ldptr0, align 4 %stptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus10, i64 0 store i32 %value0, i32* %stptr0, align 4 ; root instruction 1 %ldptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus20, i64 1 %value1 = load i32, i32* %ldptr1, align 4 %stptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus10, i64 1 store i32 %value1, i32* %stptr1, align 4 ; loop-increment and latch %indvar.next = add nuw nsw i64 %indvar, 1 %exitcond = icmp eq i64 %indvar.next, 5 br i1 %exitcond, label %exit, label %loop exit: ret void } ``` In the loop rerolling pass, `%indvar` and `%indvar.next` are appended to the `LoopIncs` vector in the `LoopReroll::DAGRootTracker::findRoots` function. Before this fix, two instructions with `unrerollable instructions` comment above are marked as `IL_All` at the end of the `LoopReroll::DAGRootTracker::collectUsedInstructions` function, as well as instructions with `extra simple arithmetic operations` comment and `loop-increment and latch` comment. It is incorrect because `IL_All` means that the instruction should be executed in all iterations of the rerolled loop but the `store` instruction should not. 
This fix rejects instructions which may have side effects and don't belong to def-use chains of any root instructions and reductions. See https://bugs.llvm.org/show_bug.cgi?id=47627 for more information.	2021-04-23 15:14:46 +09:00
Elia Geretto	2627f99613	[dfsan] Fix Len argument type in call to __dfsan_mem_transfer_callback This patch is supposed to solve: https://bugs.llvm.org/show_bug.cgi?id=50075 The function `__dfsan_mem_transfer_callback` takes a `Len` argument of type `i64`; however, when processing a `MemTransferInst` such as `llvm.memcpy.p0i8.p0i8.i32`, the `len` argument has type `i32`. In order to make the type of `len` compatible with the one of the callback argument, this change zero-extends it when necessary. Reviewed By: stephan.yichao.zhao, gbalats Differential Revision: https://reviews.llvm.org/D101048	2021-04-22 21:12:20 +00:00
Arthur Eubanks	16ff1a7023	[GlobalOpt] Don't replace alias with aliasee if aliasee is interposable Both the alias and aliasee linkage are important. PR27866 provides some background. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D99629	2021-04-22 13:12:34 -07:00
Philip Reames	15e19a2599	Revert "[instcombine] Exploit UB implied by nofree attributes" This change effectively reverts `86664638`, but since there have been some changes on top and I wanted to leave the tests in, it's not a mechanical revert. Why revert this now? Two main reasons: 1) There is continuing discussion around the semantics of nofree. I am getting increasingly uncomfortable with the seeming possibility we might redefine nofree in a way incompatible with these changes. 2) There was a reported miscompile triggered by this change (https://github.com/emscripten-core/emscripten/issues/9443). At first, I was making good progress on tracking down the issues exposed and those issues appeared to be unrelated latent bugs. Now that we've found at least one bug in the original change, and the investigation has stalled, I'm no longer comfortable leaving this in tree. In retrospect, I probably should have reverted this earlier and investigated the issues once the triggering change was out of tree.	2021-04-22 10:53:17 -07:00
Jianzhou Zhao	7fdf270965	[dfsan] Track origin at loads The first version of origin tracking tracks only memory stores. Although this is sufficient for understanding correct flows, it is hard to figure out where an undefined value is read from. To find where undefined values are read, we still have to do a reverse binary search from the last store in the chain with printing and logging at possible code paths. This is quite inefficient. Tracking memory load instructions can help in this case. The main issues of tracking loads are performance and code size overheads. With tracking only stores, the code size overhead is 38%, memory overhead is 1x, and cpu overhead is 3x. In practice #load is much larger than #store, so both code size and cpu overhead increase. The first blocker is code size overhead: link fails if we inline tracking loads. The workaround is using external function calls to propagate metadata. This is also the workaround ASan uses. The cpu overhead is ~10x. This is a trade-off between debuggability and performance, and will be used only when debugging cases where tracking only stores is not enough. Reviewed By: gbalats Differential Revision: https://reviews.llvm.org/D100967	2021-04-22 16:25:24 +00:00
Alexey Bataev	18c61fc498	[SLP]Skip undefs trying to find perfect/shuffled tree entries matching. We can skip check for undefs trying to find perfect/shuffled tree entries matching, they can be ignored completely improving the final cost/vectorization results. Differential Revision: https://reviews.llvm.org/D101061	2021-04-22 08:59:07 -07:00
Joe Ellis	2c551aedcf	[LoopVectorize] Fix bug where predicated loads/stores were dropped This commit fixes a bug where the loop vectoriser fails to predicate loads/stores when interleaving for targets that support masked loads and stores. Code such as: `void foo(int *restrict data1, int *restrict data2) { int counter = 1024; while (counter--) if (data1[counter] > data2[counter]) data1[counter] = data2[counter]; }` ... could previously be transformed in such a way that the predicated store implied by: `if (data1[counter] > data2[counter]) data1[counter] = data2[counter];` ... was lost, resulting in miscompiles. This bug was causing some tests in llvm-test-suite to fail when built for SVE. Differential Revision: https://reviews.llvm.org/D99569	2021-04-22 15:05:54 +00:00
Alexey Bataev	d4f5f23bbb	[SLP]Replace more `TTI` with `TTIRef`, NFC. To pacify MSVC buildbots.	2021-04-22 07:53:20 -07:00
Alexey Bataev	da2cdfd421	[SLP]Added explicit ref to TargetTransformInfo to try to pacify MSVC buildbots, NFC.	2021-04-22 07:49:48 -07:00
Alexey Bataev	e99b98cb1b	[SLP]Improve cost model for the vectorized extractelements. 1. No need to call `areAllUsersVectorized` as later the cost is calculated only if the instruction has one use and gets vectorized. 2. Need to calculate the cost of the dead extractelement more precisely, taking the vector type of the vector operand, not the resulting vector type. Part of D57059. Differential Revision: https://reviews.llvm.org/D99980	2021-04-22 07:40:17 -07:00
Dawid Jurczak	57f443c348	[SimplifyLibCalls][NFC] Use StringRef::back instead of explicit indexing. Split off from D100724. Reviewed By: xbolva00 Differential Revision: https://reviews.llvm.org/D101032	2021-04-22 15:02:47 +02:00
David Sherwood	5a229a6702	[LoopVectorize] Don't create unnecessary vscale intrinsic calls In quite a few cases in LoopVectorize.cpp we call createStepForVF with a step value of 0, which leads to unnecessary generation of llvm.vscale intrinsic calls. I've optimised IRBuilder::CreateVScale and createStepForVF to return 0 when attempting to multiply vscale by 0. Differential Revision: https://reviews.llvm.org/D100763	2021-04-22 09:01:52 +01:00
Max Kazantsev	8fe62b7af1	[GVN] Introduce loop load PRE This patch allows PRE of the following type of loads: ``` preheader: br label %loop loop: br i1 ..., label %merge, label %clobber clobber: call foo() // Clobbers %p br label %merge merge: ... br i1 ..., label %loop, label %exit ``` Into ``` preheader: %x0 = load %p br label %loop loop: %x.pre = phi(x0, x2) br i1 ..., label %merge, label %clobber clobber: call foo() // Clobbers %p %x1 = load %p br label %merge merge: x2 = phi(x.pre, x1) ... br i1 ..., label %loop, label %exit ``` So instead of loading from %p on every iteration, we load only when the actual clobber happens. The typical pattern which it is trying to address is: hot loop, with all code inlined and provably having no side effects, and some side-effecting calls on cold path. The worst overhead from it is, if we always take clobber block, we make 1 more load overall (in preheader). It only matters if the loop has very few iterations. If clobber block is not taken at least once, the transform is neutral or profitable. Several prospective improvements open up: - We can sometimes be smarter in loop-exiting blocks via split of critical edges; - If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that we don't know if their sum is colder than the header. Differential Revision: https://reviews.llvm.org/D99926 Reviewed By: reames	2021-04-22 12:50:38 +07:00
Chuanqi Xu	77ca2a6893	[Coroutine] Collect CoroBegin if all of terminators are dominated by one coro.destroy Summary: The original logic seems to be that we could collect a CoroBegin if one of the terminators could be dominated by one of coro.destroy, which doesn't make sense. This patch rewrites the logic to collect CoroBegin if all of terminators are dominated by one coro.destroy. If there is no such coro.destroy, we would call hasEscapePath to evaluate if we should collect it. Test Plan: check-llvm Reviewed by: lxfind Differential Revision: https://reviews.llvm.org/D100614	2021-04-22 11:21:37 +08:00
Giorgis Georgakoudis	a2dbfb6b72	[OpenMP] Simplify offloading parallel call codegen This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target, host-side parallel regions), data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions. Reviewed By: jdoerfert, Meinersbur Differential Revision: https://reviews.llvm.org/D95976	2021-04-21 18:46:07 -07:00
Fangrui Song	775a9483e5	[IR][sanitizer] Set nounwind on module ctor/dtor, additionally set uwtable if -fasynchronous-unwind-tables On ELF targets, if a function has uwtable or personality, or does not have nounwind (`needsUnwindTableEntry`), it marks that `.eh_frame` is needed in the module. Then, a function gets `.eh_frame` if `needsUnwindTableEntry` or `-g[123]` is specified. (i.e. If -g[123], every function gets `.eh_frame`. This behavior is strange but that is the status quo on GCC and Clang.) Let's take asan as an example. Other sanitizers are similar. `asan.module_[cd]tor` has no attribute. `needsUnwindTableEntry` returns true, so every function gets `.eh_frame` if `-g[123]` is specified. This is the root cause that `-fno-exceptions -fno-asynchronous-unwind-tables -g` produces .debug_frame while `-fno-exceptions -fno-asynchronous-unwind-tables -g -fsanitize=address` produces .eh_frame. This patch * sets the nounwind attribute on sanitizer module ctor/dtor. * let Clang emit a module flag metadata "uwtable" for -fasynchronous-unwind-tables. If "uwtable" is set, sanitizer module ctor/dtor additionally get the uwtable attribute. The "uwtable" mechanism is generic: synthesized functions not cloned/specialized from existing ones should consider `Function::createWithDefaultAttr` instead of `Function::create` if they want to get some default attributes which have more of module semantics. Other candidates: "frame-pointer" (https://github.com/ClangBuiltLinux/linux/issues/955 https://github.com/ClangBuiltLinux/linux/issues/1238), dso_local, etc. Differential Revision: https://reviews.llvm.org/D100251	2021-04-21 15:58:20 -07:00
Olle Fredriksson	f5446b769a	[MemCpyOpt] Allow variable lengths in memcpy optimizer This makes the memcpy-memcpy and memcpy-memset optimizations work for variable sizes as long as they are equal, relaxing the old restriction that they are constant integers. If they're not equal, the old requirement that they are constant integers with certain size restrictions is used. The implementation works by pushing the length tests further down in the code, which reveals some places where it's enough that the lengths are equal (but not necessarily constant). Differential Revision: https://reviews.llvm.org/D100870	2021-04-21 23:23:38 +02:00
Arthur Eubanks	b606e2df4d	[Evaluator] Bitcast result of pointer stripping Trying to evaluate a GEP would assert with "Ty == cast<PointerType>(C->getType()->getScalarType())->getElementType()" because the type of the pointer we would evaluate the GEP argument to would be a different type than the GEP was expecting. We should treat pointer stripping as a bitcast. The test adds a redundant GEP that would crash due to type mismatch. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D100970	2021-04-21 13:32:29 -07:00
Nikita Popov	24e9fbc1a3	Revert "[InstCombine] Fold multiuse shr eq zero" This reverts commit `9423f78240`. A performance regression with this patch has been reported at https://reviews.llvm.org/rG9423f78240a2#990953. Reverting for now.	2021-04-21 21:40:52 +02:00
sstefan1	62cdcd6c5a	[FuncAttrs] Don't infer willreturn for nonexact definitions Discovered during attributor testing comparing stats with and without the attributor. Willreturn should not be inferred for nonexact definitions. Differential Revision: https://reviews.llvm.org/D100988	2021-04-21 21:26:09 +02:00
sstefan1	656ebd519e	[SimplifyLibCalls] Don't change alignment when creating memset Fix for PR49984 This was discovered during Attributor testing. Memset was always created with alignment of 1 and in case when strncpy alignment was changed it triggered an assertion in the AttrBuilder. Memset will now be created with appropriate alignment. Differential Revision: https://reviews.llvm.org/D100875	2021-04-21 20:34:13 +02:00
Nico Weber	ba7a92c01e	[Support] Don't include VirtualFileSystem.h in CommandLine.h CommandLine.h is indirectly included in ~50% of TUs when building clang, and VirtualFileSystem.h is large. (Already remarked by jhenderson on D70769.) No behavior change. Differential Revision: https://reviews.llvm.org/D100957	2021-04-21 10:19:01 -04:00
George Balatsouras	79b5280a6c	[dfsan] Enable origin tracking with fast8 mode All related instrumentation tests have been updated. Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D100903	2021-04-20 18:10:32 -07:00
Arthur Eubanks	326da4adcb	[FuncAttrs] Always preserve FunctionAnalysisManagerCGSCCProxy FunctionAnalysisManagerCGSCCProxy should not be preserved if any of its keys may be invalid. Since we are not removing/adding functions in FuncAttrs, it's fine to preserve it. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D100893	2021-04-20 16:37:45 -07:00
Reid Kleckner	91f7a4fff7	Revert "[InstCombine] Recognize `((x * y) s/ x) !=/== y` as an signed multiplication overflow check (PR48769)" This reverts commit `13ec913bdf`. This commit introduces new uses of the overflow checking intrinsics that depend on implementations in compiler-rt, which Windows users generally do not link against. I filed an issue (somewhere) to make clang auto-link the builtins library to resolve this situation, but until that happens, it isn't reasonable for the optimizer to introduce new link time dependencies.	2021-04-20 15:53:34 -07:00
Philip Reames	4824d876f0	Revert "Allow invokable sub-classes of IntrinsicInst" This reverts commit `d87b9b81cc`. Post commit review raised concerns, reverting while discussion happens.	2021-04-20 15:38:38 -07:00
Roman Lebedev	5a654bfeab	Revert "[InstCombine] `sext(trunc(x)) --> sext(x)` iff trunc is NSW (PR49543)" I forgot about the case where we sign-extend to width smaller than the original. This reverts commit `1e6ca23ab8`.	2021-04-21 01:11:15 +03:00
Roman Lebedev	1e68d338c1	Revert "[InstCombine] "Bypass" NUW trunc of lshr if we are going to sext the result (PR49543)" I forgot about the case where we sign-extend to width smaller than the original. This reverts commit `41b71f718b`.	2021-04-21 01:11:14 +03:00
Philip Reames	d87b9b81cc	Allow invokable sub-classes of IntrinsicInst It used to be that all of our intrinsics were call instructions, but over time, we've added more and more invokable intrinsics. According to the verifier, we're up to 8 right now. As IntrinsicInst is a sub-class of CallInst, this puts us in an awkward spot where the idiomatic means to check for intrinsic has a false negative if the intrinsic is invoked. This change switches IntrinsicInst from being a sub-class of CallInst to being a subclass of CallBase. This allows invoked intrinsics to be instances of IntrinsicInst, at the cost of requiring a few more casts to CallInst in places where the intrinsic really is known to be a call, not an invoke. After this lands and has baked for a couple days, planned cleanups: Make GCStatepointInst a IntrinsicInst subclass. Merge intrinsic handling in InstCombine and use idiomatic visitIntrinsicInst entry point for InstVisitor. Do the same in SelectionDAG. Do the same in FastISEL. Differential Revision: https://reviews.llvm.org/D99976	2021-04-20 15:03:49 -07:00
Roman Lebedev	41b71f718b	[InstCombine] "Bypass" NUW trunc of lshr if we are going to sext the result (PR49543) This is a more convoluted form of the same pattern "sext of NSW trunc", but in this case the operand of trunc was a right-shift, and the truncation chops off just the zero bits that were shifted-in.	2021-04-21 00:31:46 +03:00
Roman Lebedev	1e6ca23ab8	[InstCombine] `sext(trunc(x)) --> sext(x)` iff trunc is NSW (PR49543) If we can tell that trunc only chops off sign bits, and not all of them, then we can simply sign-extend the trunc's source.	2021-04-21 00:31:45 +03:00
Sanjay Patel	1e202e8f39	[InstCombine] fold shift-of-srem-by-2 to mask+shift There are several potential srem-by-2 folds because the result is known {-1,0,1}. https://alive2.llvm.org/ce/z/LuVyeK	2021-04-20 17:10:16 -04:00
Roman Lebedev	13ec913bdf	[InstCombine] Recognize `((x * y) s/ x) !=/== y` as an signed multiplication overflow check (PR48769) We already had support for its unsigned variant, so simply extend it to also handle the signed variant. Fixes https://bugs.llvm.org/show_bug.cgi?id=48769	2021-04-20 21:29:43 +03:00
Joseph Huber	b2ad63d3cf	[OpenMP] Add OpenMPOpt as a Module pass Summary: This patch registers OpenMPOpt as a Module pass in addition to a CGSCC pass. This is so certain optimizations that are sensitive to intact call-sites can happen before inlining. The old `openmpopt` pass name is changed to `openmp-opt-cgscc` and `openmp-opt` calls the Module pass. The current module pass only runs a single check but will be expanded in the future. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D99202	2021-04-20 12:28:58 -04:00
Alexey Bataev	af870e11ae	[SLP] Add detection of shuffled/perfect matching of tree entries. SLP supports perfect diamond matching for the vectorized tree entries but does not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve cost of the vectorized tree. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100495	2021-04-20 09:08:46 -07:00
Philip Reames	3b1474cab2	free(nullptr) does not violate the nofree specification This fixes a subtle and nasty bug in my `86664638`. The problem is that free(nullptr) is well defined (and common). The specification for the nofree attributes talks about memory objects, and doesn't explicitly address null, but I think it's reasonable to assume that nofree doesn't disallow a call to free(nullptr). If it did, we'd have to prove nonnull on an argument to ever infer nofree which doesn't seem to be the intent. This was found by Nuno and Alive2 over in https://reviews.llvm.org/D100141#2697374. Differential Revision: https://reviews.llvm.org/D100779	2021-04-20 09:08:05 -07:00
Alexey Bataev	b82344a019	Revert "[SLP] Add detection of shuffled/perfect matching of tree entries." This reverts commit `daf6e18c55` to fix the compiler crash.	2021-04-20 08:29:32 -07:00
Alexey Bataev	daf6e18c55	[SLP] Add detection of shuffled/perfect matching of tree entries. SLP supports perfect diamond matching for the vectorized tree entries but does not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve cost of the vectorized tree. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100495	2021-04-20 07:46:49 -07:00
Alexey Bataev	cf00cb8bed	Revert "[SLP] Add detection of shuffled/perfect matching of tree entries." This reverts commit `b232771aca` to fix buildbots.	2021-04-20 07:16:11 -07:00
Alexey Bataev	b232771aca	[SLP] Add detection of shuffled/perfect matching of tree entries. SLP supports perfect diamond matching for the vectorized tree entries but does not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve cost of the vectorized tree. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100495	2021-04-20 06:55:55 -07:00
Sander de Smalen	86729538bd	[LV] Let selectVectorizationFactor reason directly on VectorizationFactor. Rather than maintaining two separate values, a `float` for the per-lane cost and a Width for the VF, maintain a single VectorizationFactor which comprises the two and also removes the need for converting an integer value to float. This simplifies the query when asking if one VF is more profitable than another when we want to extend this for scalable vectors (which may require additional options to determine if e.g. a scalable VF of the same cost, is more profitable than a fixed VF of the same cost). The patch isn't entirely NFC because it also fixes an issue in selectEpilogueVectorizationFactor, where the cost passed to ProfitableVFs no longer truncates the floating-point cost from `float` to `unsigned` to then perform the calculation on the truncated cost. It now does a cost comparison with the correct precision. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100121	2021-04-20 09:54:45 +01:00
Luo, Yuanke	bcdaccfe34	[X86][AMX] Verify illegal types or instructions for x86_amx. This patch is related to https://reviews.llvm.org/D100032 which defines some illegal types or operations for x86_amx. There are no arguments, arrays, pointers, vectors or constants of x86_amx. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D100472	2021-04-20 16:14:22 +08:00
Arthur Eubanks	5e71b9fa93	Explicitly pass type to cast load constant folding result Previously we would use the type of the pointee to determine what to cast the result of constant folding a load. To aid with opaque pointer types, we should explicitly pass the type of the load rather than looking at pointee types. ConstantFoldLoadThroughBitcast() converts the const prop'd value to the proper load type (e.g. [1 x i32] -> i32). Instead of calling this in every intermediate step like bitcasts, we only call this when we actually see the global initializer value. In some existing uses of this API, we don't know the exact type we're loading from immediately (e.g. first we visit a bitcast, then we visit the load using the bitcast). In those cases we have to manually call ConstantFoldLoadThroughBitcast() when simplifying the load to make sure that we cast to the proper type. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D100718	2021-04-20 00:53:21 -07:00
Dávid Bolvanský	324d641b75	[InstCombine] Enhance deduction of alignment for aligned_alloc This patch improves https://reviews.llvm.org/D76971 (Deduce attributes for aligned_alloc in InstCombine) and implements "TODO" item mentioned in the review of that patch. > The function aligned_alloc() is the same as memalign(), except for the added restriction that size should be a multiple of alignment. Currently, we simply bail out if we see a non-constant size - change that. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D100785	2021-04-20 02:04:18 +02:00
Alexey Bataev	8030481065	Revert "[SLP]Add detection of shuffled/perfect matching of tree entries." This reverts commit `d6fde91379` to fix compiler crashes.	2021-04-19 14:10:04 -07:00
Zequan Wu	e28435caf6	[ThinLTO] Copy UnnamedAddr when splitting module. The unnamedaddr property of a function is lost when using `-fwhole-program-vtables` and ThinLTO, which causes a size increase under the linker's safe icf mode. The size increase of chrome on Linux when switching from all icf to safe icf drops from 5 MB to 3 MB after this change, and from 6 MB to 4 MB on Windows. There is a repro: ``` # a.h struct A { virtual int f(); virtual int g(); }; # a.cpp #include "a.h" int A::f() { return 10; } int A::g() { return 10; } # main.cpp #include "a.h" int g(A* a) { return a->f(); } int main(int argv, char** args) { A a; return g(&a); } $ clang++ -O2 -ffunction-sections -flto=thin -fwhole-program-vtables -fsplit-lto-unit -c main.cpp -o main.o && clang++ -Wl,--icf=safe -fuse-ld=lld -flto=thin main.o -o a.out && llvm-readobj -t a.out | grep -A 1 -e _ZN1A1fEv -e _ZN1A1gEv Name: _ZN1A1fEv (480) Value: 0x201830 -- Name: _ZN1A1gEv (490) Value: 0x201840 ``` Differential Revision: https://reviews.llvm.org/D100498	2021-04-19 14:04:58 -07:00
Alexey Bataev	d6fde91379	[SLP]Add detection of shuffled/perfect matching of tree entries. SLP supports perfect diamond matching for the vectorized tree entries but does not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve cost of the vectorized tree. Differential Revision: https://reviews.llvm.org/D100495	2021-04-19 13:29:30 -07:00
Philip Reames	3c54762226	[funcattrs] Consistently check call site attributes This is mostly stylistic cleanup after D100226, but not entirely. When skimming the code, I found one case where we weren't accounting for attributes on the callsite at all. I'm also suspicious we had some latent bugs related to operand bundles (which are supposed to be able to override attributes on declarations), but I don't have concrete test cases for those, just suspicions. Aside: The only case left in the file which directly checks attributes on the declaration is the norecurse logic. I left that because I didn't understand it; it looks obviously wrong, so I suspect I'm misinterpreting the intended semantics of the attribute. Differential Revision: https://reviews.llvm.org/D100689	2021-04-19 13:20:50 -07:00
Philip Reames	01801d5274	[rs4gc] Fix a latent bug around attribute stripping for intrinsics This change fixes a latent bug which was exposed by a change currently in review (https://reviews.llvm.org/D99802#2685032). The story on this is a bit involved. Without this change, what ended up happening with the pending review was that we'd strip attributes off intrinsics, and then selectiondag would fail to lower the intrinsic. Why? Because the lowering of the intrinsic relies on the presence of the readonly attribute. We don't have a matcher to select the case where there's a glue node needed. Now, on the surface, this still seems like a codegen bug. However, here it gets fun. I was unable to reproduce this with a standalone test at all, and was pretty much stuck until skatkov provided the critical detail. This reproduces only when RS4GC and codegen are run in the same process and context. Why? Because it turns out we can't roundtrip the stripped attribute through serialized IR! We'll happily print out the missing attribute, but when we parse it back, the auto-upgrade logic has a side effect of blindly overwriting attributes on intrinsics with those specified in Intrinsics.td. This makes it impossible to exercise SelectionDAG from a standalone test case. At this point, I decided to treat this as an RS4GC bug as a) we don't need to strip in this case, and b) I could write a test which shows the correct behavior to ensure this doesn't break again in the future. As an aside, I'd originally set out to handle libfuncs too - since in theory they might have the same issues - but backed away quickly when I realized how the semantics of builtin, nobuiltin, and no-builtin-x all interacted. I'm utterly convinced that no part of the optimizer handles that correctly, and decided not to open that can of worms here.	2021-04-19 13:14:07 -07:00
Nikita Popov	9423f78240	[InstCombine] Fold multiuse shr eq zero The single-use case is handled implicitly by converting the icmp into a mask check first. When comparing with zero in particular, we don't need the one-use restriction, as we only produce a single icmp. https://alive2.llvm.org/ce/z/MSixcm https://alive2.llvm.org/ce/z/GwpG0M	2021-04-19 22:13:11 +02:00
Nikita Popov	d440f9a326	[LICM] Make capture check more precise During store promotion, we check whether the pointer was captured to exclude potential reads from other threads. However, we're only interested in captures before or inside the loop. Check this using PointerMayBeCapturedBefore against the loop header. Differential Revision: https://reviews.llvm.org/D100706	2021-04-19 20:34:23 +02:00
Roman Lebedev	d746fefb6f	[SCEVExpander] ReuseOrCreateCast(): use IRBuilder to actually create the cast In particular, this allows creating constant expressions instead of IR Instructions if the argument is a constant.	2021-04-19 18:38:39 +03:00
Roman Lebedev	ecc9d7e913	[SCEVExpander] Expand explicit PtrToInt casts just like we would implicit ones I.e., use GetOptimalInsertionPointForCastOf() helper to get the insertion point, and try to reuse casts first.	2021-04-19 18:38:39 +03:00
Roman Lebedev	442c408e0e	[SCEVExpander] GetOptimalInsertionPointForCastOf(): gracefully handle Constant's I guess this case hasn't come up thus far, and I'm not sure if it can really happen for the existing usages, thus no test in this commit. But the following commit adds test coverage; there we'd experience a crash without this fix.	2021-04-19 18:38:39 +03:00
Roman Lebedev	b8a3705896	[NFCI][SCEVExpander] Extract GetOptimalInsertionPointForCastOf() helper	2021-04-19 18:38:38 +03:00
Roman Lebedev	73f60e3988	[SCEVExpander] generateOverflowCheck(): explicitly PtrToInt the Start Currently, InsertNoopCastOfTo() would implicitly insert that cast, but now that we have SCEVPtrToIntExpr, I'm hoping we could stop InsertNoopCastOfTo() from doing that. But first all users must be fixed.	2021-04-19 18:38:38 +03:00
Cullen Rhodes	f0bc2782f2	[TTI] NFC: Remove unused 'OptSize' parameter from shouldMaximizeVectorBandwidth Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D100377	2021-04-19 11:01:34 +00:00
OCHyams	0ebf9a8e34	[DebugInfo] Move the findDbg* functions into DebugInfo.cpp Move the findDbg* functions into lib/IR/DebugInfo.cpp from lib/Transforms/Utils/Local.cpp. D99169 adds a call to a function (findDbgUsers) that lives in lib/Transforms/Utils/Local.cpp (LLVMTransformUtils) from lib/IR/Value.cpp (LLVMCore). The Core lib doesn't include TransformUtils. The buildbots caught this here: https://lab.llvm.org/buildbot/#/builders/109/builds/12664. This patch moves the function, and the 3 similar ones for consistency, into DebugInfo.cpp which is part of LLVMCore. Reviewed By: dblaikie, rnk Differential Revision: https://reviews.llvm.org/D100632	2021-04-19 10:30:25 +01:00
Evgeniy Brevnov	35e95c6817	[CVP] processCallSite returns wrong status Recently processMinMaxIntrinsic has been added and we started to observe a number of analyses getting invalidated after CVP. The problem is that CVP conservatively returns 'true' even if there were no modifications to the IR. I found one more place besides processMinMaxIntrinsic which has the same problem. I think processMinMaxIntrinsic and similar functions should have a boolean return status to prevent similar issues from reappearing in the future. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D100538	2021-04-19 12:13:22 +07:00
Xun Li	5faba87938	Revert "[Coroutines] Set presplit attribute in Clang instead of CoroEarly pass" This reverts commit `fa6b54c44a`. The committed patch broke mlir tests. It seems that mlir tests depend on coroutine function properties set in CoroEarly pass.	2021-04-18 17:22:28 -07:00
Xun Li	fa6b54c44a	[Coroutines] Set presplit attribute in Clang instead of CoroEarly pass Presplit coroutines cannot be inlined. During AlwaysInliner we check if a function is a presplit coroutine, if so we skip inlining. The presplit coroutine attributes are set in CoroEarly pass. However in O0 pipeline, AlwaysInliner runs before CoroEarly, so the attribute isn't set yet and will still inline the coroutine. This causes Clang to crash: https://bugs.llvm.org/show_bug.cgi?id=49920 To fix this, we set the attributes in the Clang front-end instead of in CoroEarly pass. Reviewed By: rjmccall, ChuanqiXu Differential Revision: https://reviews.llvm.org/D100282	2021-04-18 15:41:09 -07:00
Xun Li	c0211e8d7d	Revert "[Coroutines] Move CoroEarly pass to before AlwaysInliner" This reverts commit `2b50f5a434`. Forgot to update the description of the commit to sync with phabricator. Going to redo the commit.	2021-04-18 15:38:19 -07:00
Xun Li	2b50f5a434	[Coroutines] Move CoroEarly pass to before AlwaysInliner Presplit coroutines cannot be inlined. During AlwaysInliner we check if a function is a presplit coroutine, if so we skip inlining. The presplit coroutine attributes are set in CoroEarly pass. However in O0 pipeline, AlwaysInliner runs before CoroEarly, so the attribute isn't set yet and will still inline the coroutine. This causes Clang to crash: https://bugs.llvm.org/show_bug.cgi?id=49920 Differential Revision: https://reviews.llvm.org/D100282	2021-04-18 14:54:04 -07:00
Juneyoung Lee	1c10201d96	Update InstCombine to use undef matcher instead This is a patch to use m_Undef() matcher instead of isa<UndefValue>(). As suggested in D100122, this update is separately committed.	2021-04-18 11:05:36 +09:00
Florian Hahn	af523514c4	[SimplifyCFG] Skip dbg intrinsics when checking for branch-only BBs. Debug intrinsics are free to hoist and should be skipped when looking for terminator-only blocks. As a consequence, we have to delegate to the main hoisting loop to hoist any dbg intrinsics instead of jumping to the terminator case directly. This fixes PR49982. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D100640	2021-04-17 15:17:50 +01:00
Nikita Popov	e68b12c99e	[Inline] Don't add noalias metadata to inaccessiblememonly calls It will not do anything useful for them, as we already know that they don't modref with any accessible memory. In particular, this prevents noalias metadata from being placed on noalias.scope.decl intrinsics. This reduces the amount of metadata needed, and makes it more likely that unnecessary decls can be eliminated.	2021-04-17 14:56:13 +02:00
Serge Guelton	d6de1e1a71	Normalize interaction with boolean attributes Such attributes can either be unset, or set to "true" or "false" (as string). throughout the codebase, this led to inelegant checks ranging from if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true") to if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true") Introduce a getValueAsBool that normalize the check, with the following behavior: no attributes or attribute set to "false" => return false attribute set to "true" => return true Differential Revision: https://reviews.llvm.org/D99299	2021-04-17 08:17:33 +02:00
Philip Reames	11707435cc	[inferattrs] Don't infer lib func attributes for nobuiltin functions If we have a nobuiltin function, we can't assume we know anything about the implementation. I noticed this when tracing through a log from an in-the-wild miscompile (https://github.com/emscripten-core/emscripten/issues/9443) triggered after `8666463`. We were incorrectly assuming that a custom allocator could not free. (It's not clear yet this is the only problem in said issue.) I also noticed something similar mentioned in the commit message of ab243e when scrolling back through history. Though, from what I can tell, that commit fixed the symptom, not the root cause. The interface we have for library function detection is extremely error prone, but given the interaction between ``nobuiltin`` decls and ``builtin`` callsites, it's really hard to imagine something much cleaner. I may iterate on that, but it'll be invasive enough I didn't want to hold an obvious functional fix on it.	2021-04-16 15:36:15 -07:00
Philip Reames	f549176ad9	[funcattrs] Add the maximal set of implied attributes to definitions Have funcattrs expand all implied attributes into the IR. This expands the infrastructure from D100400, but for definitions not declarations this time. Somewhat subtly, this mostly isn't semantic. Because the accessors did the inference, any client which used the accessor was already getting the stronger result. Clients that directly checked presence of attributes (there are some), will see a stronger result now. The old behavior can end up quite confusing for two reasons: * Without this change, we have situations where function-attrs appears to fail when inferring an attribute (as seen by a human reading IR), but that consuming code will see that it should have been implied. As a human trying to sanity check test results and study IR for optimization possibilities, this is exceedingly error prone and confusing. (I'll note that I wasted several hours recently because of this.) * We can have transforms which trigger without the IR appearing (on inspection) to meet the preconditions. This change doesn't prevent this from happening (as the accessors still involve multiple checks), but it should make it less frequent. I'd argue in favor of deleting the extra checks out of the accessors after this lands, but I want that in its own review as a) it's purely stylistic, and b) I already know there's some disagreement. Once this lands, I'm also going to do a cleanup change which will delete some now redundant duplicate predicates in the inference code, but again, that deserves to be a change of its own. Differential Revision: https://reviews.llvm.org/D100226	2021-04-16 14:22:19 -07:00
Philip Reames	ff55d01a8e	[nofree] Restrict semantics to memory visible to caller This patch clarifies the semantics of the nofree function attribute to make clear that it provides an "as if" semantic. That is, a nofree function is guaranteed not to free memory which existed before the call, but might allocate and then deallocate that same memory within the lifetime of the callee. This is the result of the discussion on llvm-dev under the thread "Ambiguity in the nofree function attribute". The most important part of this change is the LangRef wording. The rest is minor comment changes to emphasize the new semantics where code was accidentally consistent, and fix one place which wasn't consistent. That one place is currently narrowly used as it is primarily part of the ongoing (and not yet enabled) deref-at-point semantics work. Differential Revision: https://reviews.llvm.org/D100141	2021-04-16 11:38:55 -07:00
Marcythm	f8cf3b9931	[LICM][NFC] Fix typo fixed some typos which may lead to misunderstandings in LICM.cpp Reviewed By: nikic, asbirlea Differential Revision: https://reviews.llvm.org/D100470	2021-04-16 09:42:00 +08:00
Arthur Eubanks	9c776c2fa2	[NFC][NewPM] Remove some AnalysisManager invalidate methods These were misleading, they're more of a "clear" than an "invalidate". We shouldn't be individually clearing analysis results. Either we clear all analyses when some IR becomes invalid, or we properly go through invalidation. There was only one use of this, which can be simulated with AM.invalidate(F, PA). Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D100519	2021-04-15 16:51:26 -07:00
Florian Hahn	3e7ee5428d	[InferAttrs] Do not mark first argument of str(n)cat as writeonly. str(n)cat appends a copy of the second argument to the end of the first argument. To find the end of the first argument, str(n)cat has to read from it until it finds the terminating 0. So it should not be marked as writeonly. I think this means the argument should not be marked as writeonly. (This is causing a mis-compile with legacy DSE, before it got removed) Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D100601	2021-04-15 23:00:21 +01:00
Florian Hahn	49999d4364	[VPlan] Replace a few unnecessary includes with forward decls.	2021-04-15 20:08:31 +01:00
Danilo C. Grael	55487079a9	[LoopUnrollAndJam] Avoid repeated instructions for UAJ analysis Avoid visiting repeated instructions for processHeaderPhiOperands as it can cause an endless loop. A test case is attached and can be run with `opt -basic-aa -tbaa -loop-unroll-and-jam -allow-unroll-and-jam -unroll-and-jam-count=4`. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D97407	2021-04-15 12:59:42 -04:00
Mark Johnston	f511dc75e4	[asan] Add an offset for the kernel address sanitizer on FreeBSD This is based on a port of the sanitizer runtime to the FreeBSD kernel that has been committed as https://cgit.freebsd.org/src/commit/?id=38da497a4dfcf1979c8c2b0e9f3fa0564035c147 and the following commits. Reviewed By: emaste, dim Differential Revision: https://reviews.llvm.org/D98285	2021-04-15 17:49:00 +01:00
Stelios Ioannou	bf147c4653	[LSR] Fix for pre-indexed generated constant offset This patch changed the isLegalUse check to ensure that LSRInstance::GenerateConstantOffsetsImpl generates an offset that results in a legal addressing mode and formula. The check is changed to look similar to the assert check used for illegal formulas. Differential Revision: https://reviews.llvm.org/D100383 Change-Id: Iffb9e32d59df96b8f072c00f6c339108159a009a	2021-04-15 16:44:42 +01:00
Florian Hahn	6adebe3fd2	[VPlan] Add VPRecipeBase::mayHaveSideEffects. Add an initial version of a helper to determine whether a recipe may have side-effects. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D100259	2021-04-15 11:49:40 +01:00
David Sherwood	ea14df695e	[SVE][LoopVectorize] Fix crash in InnerLoopVectorizer::widenPHIInstruction There were a few places in widenPHIInstruction where calculations of offsets were failing to take the runtime calculation of VF into account for scalable vectors. I've fixed those cases in this patch as well as adding an assert that we should not be scalarising for scalable vectors. Tests are added here: Transforms/LoopVectorize/AArch64/sve-widen-phi.ll Differential Revision: https://reviews.llvm.org/D99254	2021-04-15 10:51:49 +01:00
David Sherwood	7120f89f7d	[NFC][LoopVectorize] Remove unnecessary VF.isScalable asserts There are a few places in LoopVectorize.cpp where we have been too cautious in adding VF.isScalable() asserts and it can be confusing. It also makes it more difficult to see the genuine places where work needs doing to improve scalable vectorization support. This patch changes getMemInstScalarizationCost to return an invalid cost instead of firing an assert for scalable vectors. Also, vectorizeInterleaveGroup had multiple asserts all for the same thing. I have removed all but one assert near the start of the function, and added a new assert that we aren't dealing with masks for scalable vectors. Differential Revision: https://reviews.llvm.org/D99727	2021-04-15 09:41:03 +01:00
Florian Hahn	5a3ff24b12	[NewGVN] Add phi-of-ops operands if no real PHI is created. If the PHI-of-ops simplifies to an existing value, no real PHI is created, which means the dependencies between the PHI-of-ops and its operands are not materialized in IR. At the moment, we fail to create a real PHI node for the PHI-of-ops, because the PHI-of-ops root instruction is not re-visited if one of the PHI-of-ops operands changes. We need to add the operands as additional users in this case. Even with this patch, there are still some dependencies missing. I will continue tackling the outstanding reported crashes in this area. Fixes PR36501, PR42422, PR42557. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D66924	2021-04-15 08:25:10 +01:00
Philip Reames	dd985551c2	Reapply "[InferAttributes] Materialize all infered attributes for declaration"" and follow on patches. This reverts commit `ab98f2c712` and `98eea392cd`. It includes a fix for the clang test which triggered the revert. I failed to notice this one because there was another AMDGPU llvm test with a similar name and the exact same text in the error message. Odd. Since only one build bot reported the clang test, I didn't notice that one.	2021-04-14 16:38:07 -07:00
Nico Weber	ab98f2c712	Revert "[InferAttributes] Materialize all infered attributes for declaration" Breaks check-clang, see comments on D100400 Also revert follow-up "[NFC] Move a recently added utility into a location to enable reuse" This reverts commit `3ce61fb6d6`. This reverts commit `61a85da882`.	2021-04-14 18:41:20 -04:00
Philip Reames	3ce61fb6d6	[NFC] Move a recently added utility into a location to enable reuse About to refresh a patch that uses this in FunctionAttrs, doing the move separately to control build times.	2021-04-14 15:05:16 -07:00
Philip Reames	61a85da882	[InferAttributes] Materialize all infered attributes for declaration We have some cases today where attributes can be inferred from another on access, but the result is not explicitly materialized in IR. This change is a step towards changing that. Why? Two main reasons: * Human clarity. It's really confusing trying to figure out why a transform is triggering when the IR doesn't appear to have the required attributes. * This avoids the need to special case declarations in e.g. functionattrs. Since we can assume the attribute is present, we can work directly from attributes (and only attributes) without also needing to query accessors on Function to avoid missing cases due to unannotated (but inferred on use) declarations. (This piece will appear much easier to follow once D100226 also lands.) Differential Revision: https://reviews.llvm.org/D100400	2021-04-14 14:45:24 -07:00
Mehrnoosh Heidarpour	29f189f90d	[InstCombine] Conditionally emit nowrap flags when combining two adds Currently, the InstCombineCompare is combining two add operations into a single add operation which always has a nsw flag, without checking the conditions to see if this flag should be present according to the original two add operations or not. This patch will change the InstCombineCompare to emit the nsw or nuw only when these flags are allowed to be generated according to the original add operations and remove the possibility of applying wrong optimization with passes that will perform on the IR later in the pipeline. To confirm that the current results are buggy and the results after proposed patch are the correct IR the following examples from Alive2 are attached; the same results can be seen in the case of nuw flag and nsw is just used as an example. The following link shows that the generated IR with current LLVM is a buggy IR when none of the original add operations have nsw flag. https://alive2.llvm.org/ce/z/WGaDrm The following link proves that the generated IR after the patch in the former case is the correct IR. https://alive2.llvm.org/ce/z/wQ7G_e Differential Revision: https://reviews.llvm.org/D100095	2021-04-14 20:53:06 +02:00
Sjoerd Meijer	39d29817f3	[SCCP] Follow up of rGbbab9f986c6d. NFC. This addresses the linter messages, mainly the inconsistent capitalisation of member functions.	2021-04-14 17:14:46 +01:00
Benjamin Kramer	cf4161673c	[Instcombine] Disable memcpy of alloca bypass for instruction sources This transformation is fundamentally broken when it comes to dominance, it just happened to work when the source of the memcpy can be moved into the place of the alloca. The bug shows up a lot more often since `077bff39d4` allows the source to be a switch. It would be possible to check dominance of the source and all its operands, but that seems very heavy for instcombine.	2021-04-14 16:52:09 +02:00
Simon Pilgrim	b49c41afba	[SLP] createOp - fix null dereference warning. NFCI. Only attempt to propagateIRFlags if we have both SelectInst - afaict we shouldn't have matched a min/max reduction without both SelectInst, but static analyzer doesn't know that.	2021-04-14 15:24:41 +01:00
Sjoerd Meijer	bbab9f986c	[SCCP] Create SCCP Solver This refactors SCCP and creates a SCCPSolver interface and class so that it can be used by other passes and transformations. We will use this in D93838, which adds a function specialisation pass. This is based on an early version by Vinay Madhusudan. Differential Revision: https://reviews.llvm.org/D93762	2021-04-14 14:58:03 +01:00
Roman Lebedev	2fea5d5d4a	[InstCombine] tmp alloca bypass: ensure that the replacement dominates all alloca uses After `077bff39d4`, isDereferenceableForAllocaSize() can recurse into selects, which is causing a problem for the new test case, reduced from https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20210412/904154.html because the replacement (the select) is defined after the first use of an alloca, so we'd end up with a verifier error. Now, this new check is too restrictive. We likely can handle some cases, by trying to sink all uses of an alloca to after the def.	2021-04-14 13:04:12 +03:00
Sterling Augustine	32e264921b	Revert "[GlobalOpt] Revert valgrind hacks" This reverts commit `dbc16ed199`.	2021-04-13 17:47:07 -07:00
Evgeny Leviant	dbc16ed199	[GlobalOpt] Revert valgrind hacks Differential revision: https://reviews.llvm.org/D69428	2021-04-13 19:11:10 +03:00
Sander de Smalen	bd86824d98	[TTI] NFC: Change getArithmeticReductionCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html This patch is practically NFC, with the exception of an AArch64 SVE related cost-model change, where we can now return an Invalid cost instead of some bogus number. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100201	2021-04-13 14:20:59 +01:00
Sander de Smalen	92d8421f49	[TTI] NFC: Change getCastInstrCost and getExtractWithExtendCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100199	2021-04-13 14:20:58 +01:00
Florian Hahn	467b1f1cd2	[SimplifyCFG] Allow hoisting terminators only with HoistCommonInsts=false. As a side-effect of the change to default HoistCommonInsts to false early in the pipeline, we fail to convert conditional branch & phis to selects early on, which prevents vectorization for loops that contain conditional branches that effectively are selects (or if the loop gets vectorized, it will get vectorized very inefficiently). This patch updates SimplifyCFG to perform hoisting if the only instruction in both BBs is an equal branch. In this case, the only additional instructions are selects for phis, which should be cheap. Even though we perform hoisting, the benefits of this kind of hoisting should by far outweigh the negatives. For example, the loop in the code below will not get vectorized on AArch64 with the current default, but will with the patch. This is a fundamental pattern we should definitely vectorize. Besides that, I think the select variants should be easier to use for reasoning across other passes as well. https://clang.godbolt.org/z/sbjd8Wshx ``` double clamp(double v) { if (v < 0.0) return 0.0; if (v > 6.0) return 6.0; return v; } void loop(double* X, double *Y) { for (unsigned i = 0; i < 20000; i++) { X[i] = clamp(Y[i]); } } ``` Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D100329	2021-04-13 10:33:35 +01:00
Amy Huang	dad5caa59e	Revert "Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"" This change causes an assert / segmentation fault in LTO builds. This reverts commit `f2e4f3eff3`.	2021-04-12 20:10:17 -07:00
Evgeniy Brevnov	e50aa1af2d	[NARY][NFC] Use hasNUsesOrMore instead of getNumUses since it's more efficient.	2021-04-13 09:29:49 +07:00
Gulfem Savrun Yeniceri	e96df3e531	[Passes] Add relative lookup table converter pass Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in: https://bugs.llvm.org/show_bug.cgi?id=45244 This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly. Differential Revision: https://reviews.llvm.org/D94355	2021-04-13 01:29:41 +00:00
Nick Desaulniers	237d4ee835	[JumpThreading] merge debug info when merging select+br Jump threading can replace select then unconditional branch with conditional branch, but when doing so loses debug info. This destructive transform is eventually leading to a failed Verifier run during full LTO builds of the Linux kernel with CFI and KCOV enabled, as reported in PR39531. ModuleSanitizerCoveragePass will insert calls to __sanitizer_cov_trace_pc, and sometimes split critical edges, using whatever debug info may or may not exist for the branch for the added libcall. Since we can inline calls to __sanitizer_cov_trace_pc due to LTO, this can lead to the error observed in PR39531 when the debug info isn't propagated to the libcall, because of prior destructive transforms that failed to retain debug info. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D100137	2021-04-12 17:51:21 -07:00
Arthur Eubanks	a8ab1f98d2	[Evaluator] Look through invariant.group intrinsics Turning on -fstrict-vtable-pointers in Chrome caused an extra global initializer. Turns out that a llvm.strip.invariant.group intrinsic was causing GlobalOpt to fail to step through some simple code. We can treat .invariant.group uses as simply their operand. Value::stripPointerCastsForAliasAnalysis() does exactly this. This should be safe because the Evaluator does not skip memory accesses due to invariants or alias analysis. However, we don't want to leak that we've stripped arbitrary pointer casts to users of Evaluator, so we bail out if we evaluate a function to any constant, since we may have looked through .invariant.group calls and aliasing pointers cannot be arbitrarily substituted. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D98843	2021-04-12 16:12:15 -07:00
Nick Desaulniers	4914c98367	[SantizerCoverage] handle missing DBG MD when inserting libcalls Instruction::getDebugLoc can return an invalid DebugLoc. For such cases where metadata was accidentally removed from the libcall insertion point, simply insert a DILocation with line 0 scoped to the caller. When we can inline the libcall, such as during LTO, then we won't fail a Verifier check that all calls to functions with debug metadata themselves must have debug metadata. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D100158	2021-04-12 15:55:58 -07:00
Yuanfang Chen	c5fda0e662	Reland "Revert "[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom"" This reverts commit `a3fabc79ae` (relands `f4d682d6ce` with fix for the compile-time regression issue).	2021-04-12 14:50:54 -07:00
Nikita Popov	a3fabc79ae	Revert "[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom" This reverts commit `f4d682d6ce`. This caused a significant compile-time regression: https://llvm-compile-time-tracker.com/compare.php?from=4b7bad9eaea2233521a94f6b096aaa88dc584e23&to=f4d682d6ce6c5b3a41a0acf297507c82f5c21eef&stat=instructions Possibly this is due to overeager parsing of target triples.	2021-04-12 22:55:59 +02:00
Sanjay Patel	5354a213a0	[InstCombine] fold shift+trunc signbit check https://alive2.llvm.org/ce/z/6vQvrP This solves: https://llvm.org/PR49866	2021-04-12 16:19:43 -04:00
Sanjay Patel	661cc71a1c	[PassManager][PhaseOrdering] lower expects before running simplifyCFG Retry of `330619a3a6` that includes a clang test update. Original commit message: If we run passes before lowering llvm.expect intrinsics to metadata, then those passes have no way to act on the hints provided by llvm.expect. SimplifyCFG is the known offender, and we made it smarter about profile metadata in D98898 <https://reviews.llvm.org/D98898>. In the motivating example from https://llvm.org/PR49336 , this means we were ignoring the recommended method for a programmer to tell the compiler that a compare+branch is expensive. This change appears to solve that case - the metadata survives to the backend, the compare order is as expected in IR, and the backend does not do anything to reverse it. We make the same change to the old pass manager to keep things synchronized. Differential Revision: https://reviews.llvm.org/D100213	2021-04-12 15:07:53 -04:00
Sanjay Patel	23ac9d1e6e	Revert "[PassManager][PhaseOrdering] lower expects before running simplifyCFG" This reverts commit `330619a3a6`. There are clang tests that also need to be updated.	2021-04-12 13:58:54 -04:00
Yuanfang Chen	f4d682d6ce	[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom D24453 enabled libcalls simplification for ARM PCS. This may cause caller/callee calling conventions mismatch in some situations such as LTO. This patch makes instcombine aware that the compatible calling conventions differences are benign (not emitting undef idiom). Differential Revision: https://reviews.llvm.org/D99773	2021-04-12 09:32:23 -07:00
Sanjay Patel	330619a3a6	[PassManager][PhaseOrdering] lower expects before running simplifyCFG If we run passes before lowering llvm.expect intrinsics to metadata, then those passes have no way to act on the hints provided by llvm.expect. SimplifyCFG is the known offender, and we made it smarter about profile metadata in D98898. In the motivating example from https://llvm.org/PR49336 , this means we were ignoring the recommended method for a programmer to tell the compiler that a compare+branch is expensive. This change appears to solve that case - the metadata survives to the backend, the compare order is as expected in IR, and the backend does not do anything to reverse it. We make the same change to the old pass manager to keep things synchronized. Differential Revision: https://reviews.llvm.org/D100213	2021-04-12 12:23:31 -04:00
Stephen Tozer	f2e4f3eff3	Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands" The causes of the previous build errors have been fixed in revisions `aa3e78a59f`, and `140757bfaa` This reverts commit `f40976bd01`.	2021-04-12 16:57:29 +01:00
Evgeniy Brevnov	36b932d6a3	[NARY] Don't optimize min/max if there are side uses Say we have %1=min(%a,%b) %2=min(%b,%c) %3=min(%2,%a) The optimization will try to reassociate the later one so that we can rewrite it to %3=min(%1, %c) and remove %2. But if %2 has other uses outside of %3 then we can't remove %2 and end up with: %1=min(%a,%b) %2=min(%b,%c) %3=min(%1, %c) This is not harmful by itself, except that it is not profitable and changes IR for no good reason. What is bad is that it triggers the next iteration, which finds that the optimization is applicable to %2 and %3 and generates: %1=min(%a,%b) %2=min(%b,%c) %3=min(%1,%c) %4=min(%2,%a) and so on... The solution is to prevent the optimization in the first place if the intermediate result (%2) has side uses and is known not to be removed. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D100170	2021-04-12 12:43:54 +07:00
Roman Lebedev	8fc8c745cf	[NFCI][SimplifyCFG] PerformValueComparisonIntoPredecessorFolding(): improve Dominator Tree updating Same as with previous patches.	2021-04-11 23:56:23 +03:00
Roman Lebedev	13fca9d816	[NFCI][SimplifyCFG] mergeEmptyReturnBlocks(): improve Dominator Tree updating Same as with previous patches.	2021-04-11 23:56:23 +03:00
Roman Lebedev	0699da1569	[NFCI][Local] MergeBasicBlockIntoOnlyPred(): improve Dominator Tree updating Same as with TryToSimplifyUncondBranchFromEmptyBlock()/MergeBlockIntoPredecessor() patch.	2021-04-11 23:56:23 +03:00
Roman Lebedev	e5692a564a	[NFCI][BasicBlockUtils] MergeBlockIntoPredecessor(): improve Dominator Tree updating Same as with TryToSimplifyUncondBranchFromEmptyBlock() patch.	2021-04-11 23:56:23 +03:00
Roman Lebedev	2def9c3d8e	[NFCI][Local] TryToSimplifyUncondBranchFromEmptyBlock(): improve Dominator Tree updating First, we don't need vector-ness for the predecessor lists. Secondly, like elsewhere, do insertions before deletions. Lastly, the check that we actually need to insert an edge, that it doesn't exist already, is backwards. Instead of looking at successors of every single 'PredOfBB', just always look at predecessors of the 'Succ'. The result is always the same, but we avoid really inefficient code.	2021-04-11 23:56:22 +03:00
Roman Lebedev	91248e2db9	[InstCombine] Improve "get low bit mask upto and including bit X" pattern https://alive2.llvm.org/ce/z/3u-48R	2021-04-11 18:08:08 +03:00
Roman Lebedev	a36bb7fd76	[InstCombine] (X | Op01C) + Op1C --> X + (Op01C + Op1C) iff the or is actually an add https://alive2.llvm.org/ce/z/Coc5yf	2021-04-11 18:08:08 +03:00
Roman Lebedev	005881e96e	[LoopIdiom] left-shift-until-bittest: set all allowed no-wrap flags on add/sub I've checked each one of these with alive2, and this is both correct and precise.	2021-04-11 18:08:07 +03:00
Roman Lebedev	9829f5e6b1	[CVP] @llvm.[us]{min,max}() intrinsics handling If we can tell that either one of the arguments is taken, bypass the intrinsic. Notably, we are indeed fine with non-strict predicate: * UL: https://alive2.llvm.org/ce/z/69qVW9 https://alive2.llvm.org/ce/z/kNFTKf https://alive2.llvm.org/ce/z/AvaPw2 https://alive2.llvm.org/ce/z/oxo53i * UG: https://alive2.llvm.org/ce/z/wxHeGH https://alive2.llvm.org/ce/z/Lf76qx * SL: https://alive2.llvm.org/ce/z/hkeTGS https://alive2.llvm.org/ce/z/eR_b-W * SG: https://alive2.llvm.org/ce/z/wEqRm7 https://alive2.llvm.org/ce/z/FpAsVr Much like with all other comparison handling in CVP, while we could sort-of handle two Value's, at least for plain ICmpInst it does not appear to be worthwhile. This only fires 78 times on test-suite + dt + rs, but we don't canonicalize to these yet. (only SCEV produces them)	2021-04-11 00:33:47 +03:00
Roman Lebedev	f041757e9c	[NFC][JumpThreading] Increment 'NumFolds' statistic all places terminator becomes uncond	2021-04-10 21:24:29 +03:00
Roman Lebedev	a407738def	[NFC][CVP] Add statistic for function pointer argument non-null-ness deduction	2021-04-10 21:23:20 +03:00
Roman Lebedev	fe7b3ad8d5	[CVP] LVI: Use in-block values when checking value signedness domain This has a huge positive impact on all the folds that use these helpers, as it can be seen on vanilla test-suite + rawspeed + darktable: correlated-value-propagation.NumSRems +75.68% (+ 28) correlated-value-propagation.NumAShrs +63.87% (+198) correlated-value-propagation.NumSDivs +49.42% (+127) correlated-value-propagation.NumSExt + 8.85% (+593) correlated-value-propagation.NumUDivURemsNarrowed + 8.65% (+34) ... while having pretty minimal compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=e8c7f43e2c2c6f3581ec1c6489ec21ad9f98958a&to=4cd197711e58ee1b2faeee0c35eea54540185569&stat=instructions	2021-04-10 21:10:59 +03:00
Roman Lebedev	257eda0794	[NFC][LVI] getPredicateAt(): drop default value for UseBlockValue The default is likely wrong. Out of all the callees, only a single one needs to pass-in false (JumpThread), everything else either already passes true, or should pass true. Until the default is flipped, at least make it harder to unintentionally add new callees with UseBlockValue=false.	2021-04-10 20:46:01 +03:00
Roman Lebedev	e8c7f43e2c	[NFC][ConstantRange] Add 'icmp' helper method "Does the predicate hold between two ranges?" Not very surprisingly, some places were already doing this check, without explicitly naming the algorithm; clean them all up.	2021-04-10 19:38:55 +03:00
Roman Lebedev	7b12c8c59d	Revert "[NFC][ConstantRange] Add 'icmp' helper method" This reverts commit `17cf2c9423`.	2021-04-10 19:37:53 +03:00
Roman Lebedev	17cf2c9423	[NFC][ConstantRange] Add 'icmp' helper method "Does the predicate hold between two ranges?" Not very surprisingly, some places were already doing this check, without explicitly naming the algorithm; clean them all up.	2021-04-10 19:09:52 +03:00
Roman Lebedev	c329a47d9e	[CVP] @llvm.abs() handling Iff we know the signedness domain of the argument, we can either skip @llvm.abs, or do negation directly. Notably, INT_MIN can belong to either domain: * X u<= INT_MIN --> X is always fine https://alive2.llvm.org/ce/z/QB8j-C https://alive2.llvm.org/ce/z/7sFKpS * X s<= 0 --> -X is always fine https://alive2.llvm.org/ce/z/QbGSyq https://alive2.llvm.org/ce/z/APsN84 If all else fails, try to infer the NSW flag: https://alive2.llvm.org/ce/z/qCJfYm	2021-04-10 16:47:31 +03:00
Adrian Prantl	6ce76ff7eb	Update the linkage name of coro-split functions in the debug info. This patch updates the linkage name in the DISubprogram of coro-split functions, which is particularly important for Swift, where the funclets have a special name mangling. This patch does not affect C++ coroutines, since the DW_AT_specification is expected to hold the (original) linkage name. I believe this is mostly due to limitations in AsmPrinter, so we might be able to relax this restriction in the future. Differential Revision: https://reviews.llvm.org/D99693	2021-04-09 09:50:56 -07:00
Sanjay Patel	84cdccc9dc	[InstCombine] try to eliminate an instruction in min/max -> abs fold As suggested in the review thread for `5094e12` and seen in the motivating example from https://llvm.org/PR49885, it's not clear if we have a way to create the optimal code without this heuristic.	2021-04-09 10:34:03 -04:00
dfukalov	c1a88e007b	[AA][NFC] Convert AliasResult to class containing offset for PartialAlias case. Add an ability to store `Offset` between partially aliased location. Use this storage within returned `ResultAlias` instead of caching it in `AAQueryInfo`. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98718	2021-04-09 13:26:09 +03:00
dfukalov	d066079728	[NFC][AA] Prepare to convert AliasResult to class with PartialAlias offset. Main reason is preparation to transform AliasResult to class that contains offset for PartialAlias case. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98027	2021-04-09 12:54:22 +03:00
Max Kazantsev	baf17e2cc9	[NFC] Move statistic increment out of helper	2021-04-09 16:32:35 +07:00
Max Kazantsev	275f3a2540	[GVN][NFC] Factor out load elimination logic via PRE for reuse	2021-04-09 16:12:25 +07:00
Arthur Eubanks	4c89bcadf6	[LICM] Hoist loads with invariant.group metadata Previously loading the vtable used in calling a virtual method in a loop was not hoisted out of the loop. This fixes that. canSinkOrHoistInst() itself doesn't check that the load operands are loop invariant, callers also check that separately. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D99784	2021-04-08 21:57:37 -07:00
Serguei Katkov	d2e15a83a6	[RS4GC] Cleanup meetBDVState. NFC. meetBDVState looks pretty difficult to read and follow. This is purely NFC but doing several things: 1) Combine meet and meetBDVState 2) Move the function to be a member of BDVState 3) Make BDVState be a mutable object 4) Convert switch to sequence of ifs 5) Adds comments. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D99064	2021-04-09 10:20:25 +07:00
Alexey Bataev	ab124bbe2a	[SLP]Fix PR49898: Infinite loop in SLP vectorizer. We should not retry finding the consecutive store chain if it was already tried before. Differential Revision: https://reviews.llvm.org/D100131	2021-04-08 14:18:06 -07:00
Philip Reames	35393c865c	[funcattrs] Infer nosync from instruction walk Pretty straightforward use of existing infrastructure and port of the attributor inference rules for nosync. A couple points of interest: * I deliberately switched from "monotonic or better" to "unordered or better". This is simply me being conservative and is better in line with the rest of the optimizer. We treat monotonic conservatively pretty much everywhere. * The operand bundle test change is suspicious. It looks like we might have missed something here, but if so, it's an issue with the existing nofree inference as well. I'm going to take a closer look at that separately. * I needed to keep the previous inference from readnone. This surprised me, but made sense once I realized readonly inference goes to lengths to reason about local vs non-local memory and that writes to local memory are okay. This is fine for the purpose of nosync, but would e.g. prevent us from inferring nofree from readnone - which is slightly surprising. Differential Revision: https://reviews.llvm.org/D99769	2021-04-08 14:05:00 -07:00
Arthur Eubanks	c5d1ccbcdf	[GVN] Properly invalidate ICF cache when we simplify a value This fixes a "Cached first special instruction is wrong!" assert. The assert fires because replacing a value with another can cause an instruction to no longer be "special" to ICF. In this case, devirtualization happened, turning an indirect call to a call to a willreturn function which is no longer special. Reviewed By: nikic, rnk Differential Revision: https://reviews.llvm.org/D99977	2021-04-08 14:01:57 -07:00
Nikita Popov	59a2f67011	[LoopRotate] Don't split loop pass manager After D99249 we use three different loop pass managers for LICM, LoopRotate and LICM+LoopUnswitch. This happens because LazyBFI and LazyBPI are not preserved by LoopRotate (note that D74640 is no longer needed). Avoid this by marking them as preserved. My understanding of D86156 is that it is okay to simply preserve them (which LoopUnswitch already does for the same reason) and rely on callbacks to deal with deleted blocks. Differential Revision: https://reviews.llvm.org/D99843	2021-04-08 22:05:18 +02:00
Congzhe Cao	ce2db9005d	[LoopInterchange] Fix transformation bugs in loop interchange After loop interchange, the (old) outer loop header should not jump to the `LoopExit`. Note that the old outer loop becomes the new inner loop after interchange. If we branched to `LoopExit` then after interchange we would jump directly from the (new) inner loop header to `LoopExit` without executing the rest of outer loop. This patch modifies adjustLoopBranches() such that the old outer loop header (which becomes the new inner loop header) jumps to the old inner loop latch which becomes the new outer loop latch after interchange. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D98475	2021-04-08 14:58:13 -04:00
Sanjay Patel	5094e1279e	[InstCombine] fold min/max intrinsic with negated operand to abs The smax case shows up in https://llvm.org/PR49885 . The others seem unlikely, but we might as well try for uniformity (although that could mean an extra instruction to create "nabs"). smax -- https://alive2.llvm.org/ce/z/8yYaGy smin -- https://alive2.llvm.org/ce/z/0_7zc_ umax -- https://alive2.llvm.org/ce/z/EcsZWs umin -- https://alive2.llvm.org/ce/z/Xw6WvB	2021-04-08 14:37:39 -04:00
Florian Hahn	e4de3cdf3d	[LV] Pass VPWidenPHIRecipe to widenPHIInstruction (NFC). Instead of passing the start value and the defined value to widenPHIInstruction, pass the VPWidenPHIRecipe directly, which can be used to get both (and more in future patches).	2021-04-08 14:25:10 +01:00
Stephen Tozer	140757bfaa	[DebugInfo] Prevent invalid debug info being produced during LoopStrengthReduce During LoopStrengthReduce, some of the SSA values that are used by debug values may be lost and/or salvaged. After LSR we attempt to recover any undef debug values, including any that were salvaged but then lost their values afterwards, by replacing the lost values with any live equal values (plus a possible constant offset) that have been gathered prior to running LSR. When we do this we restore the debug value's original DIExpression, to undo any salvaging (as we have gone back to using the original debug value). This process can currently produce invalid debug info if the number of operands has changed by salvaging during LSR. Replacing old values during the applyEqualValues step does not change the number of location operands, which means that when we restore the old DIExpression we may have a mismatch between the number of operands used by the debug value and the number of operands referenced by the DIExpression. This patch fixes this by restoring the full original location metadata at the start of the applyEqualValues step, so that there is no mismatch in operand count between the debug value and its DIExpression. Differential Revision: https://reviews.llvm.org/D98644	2021-04-08 13:04:48 +01:00
David Green	8675ef100f	[LV] Logical and/or select costs D99674 stopped the folding of certain select operations into and/or, due to incorrect folding in the presence of poison. D97360 added some costs to attempt to account for the change, but only worked at the getUserCost level, not the getCmpSelInstrCost that the vectorizer will use directly. This adds similar logic into the vectorizer to handle these logical and/or selects, treating them like and/or directly. This fixes 60% performance regressions from code like the attached test case. Differential Revision: https://reviews.llvm.org/D99884	2021-04-08 10:39:47 +01:00
Congzhe Cao	593cb46550	Revert "[LoopInterchange] Fix transformation bugs in loop interchange" This reverts commit 6ec68bd815d00c1eec2a6b9766452554f0e6cb61.	2021-04-07 21:17:30 -04:00
CongzheUalberta	f5645ea65f	[LoopInterchange] Fix transformation bugs in loop interchange After loop interchange, the (old) outer loop header should not jump to `LoopExit`. Note that the old outer loop becomes the new inner loop after interchange. If we branched to `LoopExit` then after interchange we would jump directly from the (new) inner loop header to `LoopExit` without executing the rest of (new) outer loop. This patch modifies adjustLoopBranches() such that the old outer loop header (which becomes the new inner loop header) jumps to the old inner loop latch which becomes the new outer loop latch after interchange. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D98475	2021-04-07 20:55:44 -04:00
Sanjay Patel	c0bbd0cc35	[InstCombine] fold not ops around min/max intrinsics This is another step towards parity with the existing cmp+select folds (see D98152).	2021-04-07 17:31:36 -04:00
Craig Topper	5fc0e98d9a	[LoopIdiomRecognize] Minor cleanups to the FFS idiom matching. NFC -Make use of the CreateShl/LShr/AShr methods that take a uint64_t instead of creating a ConstantInt for 1 ourselves. -Use Builder.getInt1 or ConstantInt::getBool instead of a conditional. -Pull out repeated calls to getType.	2021-04-07 10:03:14 -07:00
Roman Lebedev	24f67473dd	[InstCombine] foldAddWithConstant(): don't deal with non-immediate constants All of the code that handles general constant here (other than the more restrictive APInt-dealing code) expects that it is an immediate, because otherwise we won't actually fold the constants, and increase instruction count. And it isn't obvious why we'd be okay with increasing the number of constant expressions, those still will have to be run.. But after `2829094a8e` this could also cause endless combine loops. So actually properly restrict this code to immediates.	2021-04-07 19:50:19 +03:00
Sanjay Patel	1894c6c59e	[InstCombine] avoid infinite loop from partial undef vectors This fixes the examples from D99674 and https://llvm.org/PR49878 The matchers succeed on partial undef/poison vector constants, but the transform creates a full 'not' (-1) constant, so it would undo a demanded vector elements change triggered by the extractelement. Differential Revision: https://reviews.llvm.org/D100044	2021-04-07 12:18:12 -04:00
wlei	6d5132b426	[CSSPGO] Fix incorrect probe distribution factor computation in top-down inliner We see a regression related to a low probe factor (0.01) which prevents some callsites from being promoted in ICPPass and later causes a missed inline in the CGSCC inliner. The root cause is a redundant (second) multiplication of the probe factor, and this change tries to fix it. `Sum` is multiplied by the factor right after findCallSamples, but later, when it is used as the parameter in setProbeDistributionFactor, it is multiplied once again. This change could get ~2% perf back on the mcf benchmark. In mcf, the corresponding factor was previously 1; the recent feature that introduced factors <1 triggered this bug. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D99787	2021-04-07 08:48:59 -07:00
Alexey Bataev	a78e86e6be	[SLP]Avoid multiple attempts to vectorize CmpInsts. No need to lookup through and/or try to vectorize operands of the CmpInst instructions during attempts to find/vectorize min/max reductions. Compiler implements postanalysis of the CmpInsts so we can skip extra attempts in tryToVectorizeHorReductionOrInstOperands and save compile time. Differential Revision: https://reviews.llvm.org/D99950	2021-04-07 06:15:42 -07:00
Sanjay Patel	0333ed8e0c	[InstCombine] move abs transform to helper function; NFC The swap of the operands can affect later transforms that are expecting a constant as operand 1. I don't think we can trigger a bug with the current code, but I hit that problem while drafting a new transform for min/max intrinsics.	2021-04-07 08:35:07 -04:00
Roman Lebedev	2829094a8e	Reland [InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858) This reverts commit `a547b4e26b`, relanding commit `31d219d299`, which was reverted because there was a conflicting inverse transform, which was causing an endless combine loop, which has now been adjusted. Original commit message: https://alive2.llvm.org/ce/z/67w-wQ We prefer `add`s over `sub`, and this particular xform allows further folds to happen: Fixes https://bugs.llvm.org/show_bug.cgi?id=49858	2021-04-07 12:06:25 +03:00
Roman Lebedev	93d1d94b74	[InstCombine] Restrict "C-(X+C2) --> (C-C2)-X" fold to immediate constants I.e., if any/all of the constants is an expression, don't do it. Since those constants won't reduce into an immediate, but would be left as a constant expression, they could cause endless combine loops after `31d219d299` added an inverse transformation.	2021-04-07 12:06:24 +03:00
Petr Hosek	a547b4e26b	Revert "[InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858)" This reverts commit `31d219d299` which causes an infinite loop when compiling the XRay runtime.	2021-04-06 22:30:28 -07:00
Sidharth Baveja	d81d9e8b86	[SplitEdge] Update SplitCriticalEdge to return a nullptr only when the edge is not critical Summary: The function SplitCriticalEdge (called by SplitEdge) can return a nullptr in cases where the edge is critical. SplitEdge uses SplitCriticalEdge assuming it can always split all critical edges, which is an incorrect assumption. The three cases where the function SplitCriticalEdge will return a nullptr are: 1. DestBB is an exception block 2. Options.IgnoreUnreachableDests is set to true and isa(DestBB->getFirstNonPHIOrDbgOrLifetime()) is not equal to a nullptr 3. LoopSimplify form must be preserved (Options.PreserveLoopSimplify is true) and it cannot be maintained for a loop due to indirect branches These situations are handled in the following way: 1. Modified the function ehAwareSplitEdge originally from llvm/lib/Transforms/Coroutines/CoroFrame.cpp to handle the cases when the DestBB is an exception block. This function is called directly in SplitEdge. SplitEdge does not call SplitCriticalEdge in this case 2. Options.IgnoreUnreachableDests is set to false by default, so this situation does not apply. 3. Return a nullptr in this situation since the SplitCriticalEdge also returned nullptr. Nothing we can do in this case. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D94619	2021-04-06 21:24:40 +00:00
Philip Reames	4bf8985f4f	Replace calls to IntrinsicInst::Create with CallInst::Create [nfc] There is no IntrinsicInst::Create. These are binding to the method in the super type. Be explicit about which method is being called.	2021-04-06 13:23:58 -07:00
Philip Reames	908215b346	Use AssumeInst in a few more places [nfc] Follow up to `a6d2a8d6f5`. These were found by simply grepping for "::assume", and are the subset of that result which looked cleaner to me using the isa/dyn_cast patterns.	2021-04-06 13:18:53 -07:00
Philip Reames	9ef6aa020b	Plumb AssumeInst through operand bundle apis [nfc] Follow up to `a6d2a8d6f5`. This covers all the public interfaces of the bundle related code. I tried to cleanup the internals where the changes were obvious, but there's definitely more room for improvement.	2021-04-06 12:53:53 -07:00
Luís Marques	0c3bc1f3a4	[ASan][RISCV] Fix RISC-V memory mapping Fixes the ASan RISC-V memory mapping (originally introduced by D87580 and D87581). This should be an improvement both in terms of first principles soundness and observed test failures --- test failures would occur non-deterministically depending on the ASLR random offset. On RISC-V Linux (64-bit), `TASK_UNMAPPED_BASE` is currently defined as `PAGE_ALIGN(TASK_SIZE / 3)`. The non-power-of-two divisor makes the result be the not very round number 0x1555556000. That address had to be further rounded to ensure page alignment after the shadow scale shifting is applied. Still, that value explains why the mapping table may look less regular than expected. Further cleanups: - Moved the mapping table comment, to ensure that the two Linux/AArch64 tables stayed together; - Removed mention of Sv48. Neither the original mapping nor this one are compatible with an actual Linux Sv48 address space (mainline Linux still operates Sv48 in Sv39 mode). A future patch can improve this; - Removed the additional comments, for consistency. Differential Revision: https://reviews.llvm.org/D97646	2021-04-06 20:46:17 +01:00
Philip Reames	a6d2a8d6f5	Add a subclass of IntrinsicInst for llvm.assume [nfc] Add the subclass, update a few places which check for the intrinsic to use idiomatic dyn_cast, and update the public interface of AssumptionCache to use the new class. A follow up change will do the same for the newer assumption query/bundle mechanisms.	2021-04-06 11:16:22 -07:00
Arthur Eubanks	4e83e59eb8	[GVN] Add missing ICF update performScalarPREInsertion() inserts instructions into blocks that we need to tell ImplicitControlFlowTracking about, otherwise the ICF cache may be invalid. Fixes PR49193. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99909	2021-04-06 10:13:42 -07:00
Philip Reames	21d4839948	Move GCRelocateInst and GCResultInst to IntrinsicInst.h [nfc] These two are part of the IntrinsicInst class hierarchy and it helps to cut down on some redundant includes.	2021-04-06 08:33:15 -07:00
Philip Reames	52ecd94cfb	Remove last remnants of PR49607 migration [NFC] The key change (`4f5e92c`) to switch gc.result and gc.relocate to being readnone landed nearly two weeks ago, and we haven't seen any fallout. Time to remove the code added to make reverting easy.	2021-04-06 07:56:55 -07:00
Jan Svoboda	fb6a5237aa	Revert "[IR] Ignore bitcasts of function pointers which are only used as callees in callbase instruction" This reverts commit `167ea67d` This causes a bunch of build failures: * http://lab.llvm.org:8011/#/builders/121/builds/6287 * http://green.lab.llvm.org/green/job/clang-stage1-RA/19915	2021-04-06 16:33:28 +02:00
Benjamin Kramer	ce4acb01b3	Avoid unused variable warning in Release builds	2021-04-06 16:25:19 +02:00
Kerry McLaughlin	7344f3d39a	[LoopVectorize] Add strict in-order reduction support for fixed-width vectorization Previously we could only vectorize FP reductions if fast math was enabled, as this allows us to reorder FP operations. However, it may still be beneficial to vectorize the loop by moving the reduction inside the vectorized loop and making sure that the scalar reduction value be an input to the horizontal reduction, e.g: %phi = phi float [ 0.0, %entry ], [ %reduction, %vector_body ] %load = load <8 x float> %reduction = call float @llvm.vector.reduce.fadd.v8f32(float %phi, <8 x float> %load) This patch adds a new flag (IsOrdered) to RecurrenceDescriptor and makes use of the changes added by D75069 as much as possible, which already teaches the vectorizer about in-loop reductions. For now in-order reduction support is off by default and controlled with the `-enable-strict-reductions` flag. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D98435	2021-04-06 14:45:34 +01:00
Roman Lebedev	31d219d299	[InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858) https://alive2.llvm.org/ce/z/67w-wQ We prefer `add`s over `sub`, and this particular xform allows further folds to happen: Fixes https://bugs.llvm.org/show_bug.cgi?id=49858	2021-04-06 15:58:14 +03:00
Simon Pilgrim	b8aba76a4e	LoopFlatten - CanWidenIV - Fix uninitialized variable warnings and use for-range loop. NFCI. Fix static analysis uninitialized variable warnings, and use for-range loop iteration across WideIVs array.	2021-04-06 12:24:20 +01:00
Abhina Sreeskantharajan	82b3e28e83	[SystemZ][z/OS][Windows] Add new OF_TextWithCRLF flag and use this flag instead of OF_Text Problem: On SystemZ we need to open text files in text mode. On Windows, files opened in text mode adds a CRLF '\r\n' which may not be desirable. Solution: This patch adds two new flags - OF_CRLF which indicates that CRLF translation is used. - OF_TextWithCRLF = OF_Text \| OF_CRLF indicates that the file is text and uses CRLF translation. Developers should now use either the OF_Text or OF_TextWithCRLF for text files and OF_None for binary files. If the developer doesn't want carriage returns on Windows, they should use OF_Text, if they do want carriage returns on Windows, they should use OF_TextWithCRLF. So this is the behaviour per platform with my patch: z/OS: OF_None: open in binary mode OF_Text : open in text mode OF_TextWithCRLF: open in text mode Windows: OF_None: open file with no carriage return OF_Text: open file with no carriage return OF_TextWithCRLF: open file with carriage return The Major change is in llvm/lib/Support/Windows/Path.inc to only set text mode if the OF_CRLF is set. ``` if (Flags & OF_CRLF) CrtOpenFlags \|= _O_TEXT; ``` These following files are the ones that still use OF_Text which I left unchanged. I modified all these except raw_ostream.cpp in recent patches so I know these were previously in Binary mode on Windows. ./llvm/lib/Support/raw_ostream.cpp ./llvm/lib/TableGen/Main.cpp ./llvm/tools/dsymutil/DwarfLinkerForBinary.cpp ./llvm/unittests/Support/Path.cpp ./clang/lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp ./clang/lib/Frontend/CompilerInstance.cpp ./clang/lib/Driver/Driver.cpp ./clang/lib/Driver/ToolChains/Clang.cpp Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D99426	2021-04-06 07:23:31 -04:00
Kerry McLaughlin	857b8a73da	[LoopVectorize] Change the identity element for FAdd Changes getRecurrenceIdentity to always return a neutral value of -0.0 for FAdd. Reviewed By: dmgreen, spatel Differential Revision: https://reviews.llvm.org/D98963	2021-04-06 12:13:43 +01:00
Florian Hahn	a6b06b785c	[VPlan] Print VPValue operands for VPWidenPHI if possible. For VPWidenPHIRecipes that model all incoming values as VPValue operands, print those operands instead of printing the original PHI. D99294 updates recipes of reduction PHIs to use the VPValue for the incoming value from the loop backedge, making use of this new printing.	2021-04-06 12:11:21 +01:00
madhur13490	167ea67d76	[IR] Ignore bitcasts of function pointers which are only used as callees in callbase instruction This patch enhances hasAddressTaken() to ignore bitcasts as a callee in callbase instruction. Such bitcast usage doesn't really take the address in a useful meaningful way. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D98884	2021-04-06 09:23:46 +00:00
Arthur Eubanks	ea0e2ca1ac	[SROA] Allow SROA on pointers with invariant group intrinsic uses When we are able to SROA an alloca, we know all uses of it, meaning we don't have to preserve the invariant group intrinsics and metadata. It's possible that we could lose information regarding redundant loads/stores, but that's unlikely to have any real impact since right now the only user is Clang and vtables. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D99760	2021-04-05 19:53:40 -07:00
Ta-Wei Tu	6a82ace5f2	[LoopFusion] Bails out if only the second candidate is guarded (PR48060) If only the second candidate loop is guarded while the first one is not, fusing the two loops might not be valid, but this check is currently missing. Fixes https://bugs.llvm.org/show_bug.cgi?id=48060 Reviewed By: sidbav Differential Revision: https://reviews.llvm.org/D99716	2021-04-06 01:08:56 +08:00
Sanjay Patel	c590a9880d	[InstCombine] fix potential miscompile in select value equivalence As shown in the example based on: https://llvm.org/PR49832 ...and the existing test, we can't substitute a vector value because the equality compare replacement that we are attempting requires that the comparison is true for the entire value. Vector select can be partly true/false.	2021-04-05 12:25:40 -04:00
Alexey Bataev	00a84f9a7f	[SLP]Improve vectorization of the CmpInst instructions. During vectorization it is better to postpone the vectorization of the CmpInst instructions till the end of the basic block. Otherwise we may vectorize it too early and may miss some vectorization patterns, like reductions. Reworked part of D57059 Differential Revision: https://reviews.llvm.org/D99796	2021-04-05 06:22:51 -07:00
Roman Lebedev	2760a808b9	[InstCombine] dropRedundantMaskingOfLeftShiftInput(): check that adding shift amounts doesn't overflow (PR49778) This is identical to `781d077afb`, but for the other function. For certain shift amount bit widths, we must first ensure that adding shift amounts is safe, that the sum won't have an unsigned overflow. Fixes https://bugs.llvm.org/show_bug.cgi?id=49778	2021-04-04 23:26:41 +03:00
Roman Lebedev	dceb3e5996	[NFC][InstCombine] Extract canTryToConstantAddTwoShiftAmounts() as helper	2021-04-04 23:26:41 +03:00
Sanjay Patel	c0645f1324	[InstCombine] fold popcount of exactly one bit to shift This is discussed in https://llvm.org/PR48999 , but it does not solve that request. The difference in the vector test shows that some other logic transform is limited to scalar types.	2021-04-04 11:43:49 -04:00
Nikita Popov	9bad7de9a3	[SimplifyCFG] Handle two equal cases in switch to select When converting a switch with two cases and a default into a select, also handle the degenerate case where two cases have the same value. Generate this case directly as %or = or i1 %cmp1, %cmp2 %res = select i1 %or, i32 %val, i32 %default rather than %sel1 = select i1 %cmp1, i32 %val, i32 %default %res = select i1 %cmp2, i32 %val, i32 %sel1 as InstCombine is going to canonicalize to the former anyway.	2021-04-04 17:27:28 +02:00
Juneyoung Lee	5207cde5cb	[InstCombine] Conditionally fold select i1 into and/or This patch fixes llvm.org/pr49688 by conditionally folding select i1 into and/or: ``` select cond, cond2, false -> and cond, cond2 ``` This is not safe if cond2 is poison whereas cond isn’t. Unconditionally disabling this transformation affects later pipelines that depend on and/or i1s. To minimize its impact, this patch conservatively checks whether cond2 is an instruction that creates a poison or its operand creates a poison. This approach is similar to what InstSimplify's SimplifyWithOpReplaced is doing. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99674	2021-04-04 14:11:28 +09:00
Fangrui Song	8e5f3d04f2	[SLPVectorizer] Fix divide-by-zero after D99719 Will add a test case later.	2021-04-02 11:13:51 -07:00
Sanjay Patel	412fc74140	[InstCombine] fold not+or+neg ~((-X) \| Y) --> (X - 1) & (~Y) We generally prefer 'add' over 'sub', this reduces the dependency chain, and this looks better for codegen on x86, ARM, and AArch64 targets. https://llvm.org/PR45755 https://alive2.llvm.org/ce/z/cxZDSp	2021-04-02 13:16:36 -04:00
Dimitry Andric	6abb92f210	[SCCP] Avoid modifying AdditionalUsers while iterating over it When run under valgrind, or with a malloc that poisons freed memory, this can lead to segfaults or other problems. To avoid modifying the AdditionalUsers DenseMap while still iterating, save the instructions to be notified in a separate SmallPtrSet, and use this to later call OperandChangedState on each instruction. Fixes PR49582. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D98602	2021-04-02 19:05:59 +02:00
Florian Hahn	8867fc69f0	[LV] Hoist mapping of IR operands to VPValues (NFC). This patch moves mapping of IR operands to VPValues out of tryToCreateWidenRecipe. This allows using existing VPValue operands when widening recipes directly, which will be introduced in future patches.	2021-04-02 17:57:20 +01:00
Philip Reames	2c4548e18e	[rs4gc] Use loops instead of straightline code for attribute stripping [nfc] Mostly because I'm about to add more attributes and the straightline copies get much uglier. What's currently there isn't too bad.	2021-04-02 09:25:15 -07:00
Philip Reames	a505801e2b	[rs4gc] Strip nofree and nosync attributes when lowering from abstract model The safepoints being inserted exist to free memory, or coordinate with another thread to do so. Thus, we must strip any inferred attributes and reinfer them after the lowering. I'm not aware of any active miscompiles caused by this, but since I'm working on strengthening inference of both and leveraging them in the optimization decisions, I figured a bit of future proofing was warranted.	2021-04-02 09:12:24 -07:00
Alexey Bataev	5fcb07a070	[SLP]Fix a bug in min/max reduction, number of condition uses. The ultimate reduction node may have multiple uses, but if the ultimate reduction is min/max reduction and based on SelectInstruction, the condition of this select instruction must have only single use. Differential Revision: https://reviews.llvm.org/D99753	2021-04-02 07:09:44 -07:00
Jeroen Dobbelaere	b82b305cf9	[InstCombine] Fix out-of-bounds ashr(shl) optimization This fixes a crash found by the oss fuzzer and reported by @fhahn. The suggestion of @RKSimon seems to be the correct fix here. (See D91343). The oss fuzz report can be found here: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32759 Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D99792	2021-04-02 13:45:11 +02:00
Florian Hahn	0f3230390b	[SLP] Better estimate cost of no-op extracts on target vectors. The motivation for this patch is to better estimate the cost of extractelement instructions in cases where they are going to be free, because the source vector can be used directly. A simple example is %v1.lane.0 = extractelement <2 x double> %v.1, i32 0 %v1.lane.1 = extractelement <2 x double> %v.1, i32 1 %a.lane.0 = fmul double %v1.lane.0, %x %a.lane.1 = fmul double %v1.lane.1, %y Currently we only consider the extracts free, if there are no other users. In this particular case, on AArch64 which can fit <2 x double> in a vector register, the extracts should be free, independently of other users, because the source vector of the extracts will be in a vector register directly, so it should be free to use the vector directly. The SLP vectorized version of noop_extracts_9_lanes is 30%-50% faster on certain AArch64 CPUs. It looks like this does not impact any code in SPEC2000/SPEC2006/MultiSource both on X86 and AArch64 with -O3 -flto. This originally regressed after D80773, so if there's a better alternative to explore, I'd be more than happy to do that. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D99719	2021-04-02 10:40:12 +01:00
Evgeniy Brevnov	2388aae401	[NARY-REASSOCIATE] Support reassociation of min/max Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available. Reviewed By: mkazantsev, lebedev.ri Differential Revision: https://reviews.llvm.org/D88287	2021-04-02 15:30:13 +07:00
Roman Lebedev	a26f1bf67e	[PassManager] Run additional LICM before LoopRotate Loop rotation often has to perform code duplication from header into preheader, which introduces PHI nodes. >>! In D99204, @thopre wrote: > > With loop peeling, it is important that unnecessary PHIs be avoided or > it will leads to spurious peeling. One source of such PHIs is loop > rotation which creates PHIs for invariant loads. Those PHIs are > particularly problematic since loop peeling is now run as part of simple > loop unrolling before GVN is run, and are thus a source of spurious > peeling. > > Note that while some of the load can be hoisted and eventually > eliminated by instruction combine, this is not always possible due to > alignment issue. In particular, the motivating example [1] was a load > inside a class instance which cannot be hoisted because the `this' > pointer has an alignment of 1. > > [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/attachment.cpp Now, we could enhance LoopRotate to avoid duplicating code when not needed, but instead hoist loop-invariant code, but isn't that a code duplication? (sic) We have LICM, and in fact we already run it right after LoopRotation. 
We could try to move it to before LoopRotation, that is basically free from compile-time perspective: https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b915bc955c03693b2cbb58&to=a4bee6d07732b1184c436da489040b912f0dc271&stat=instructions But, looking at stats, i think it isn't great that we would no longer do LICM after LoopRotation, in particular: \| statistic name \| LoopRotate-LICM \| LICM-LoopRotate \| Δ \| % \| abs(%) \| \| asm-printer.EmittedInsts \| 9015930 \| 9015799 \| -131 \| 0.00% \| 0.00% \| \| indvars.NumElimCmp \| 3536 \| 3544 \| 8 \| 0.23% \| 0.23% \| \| indvars.NumElimExt \| 36725 \| 36580 \| -145 \| -0.39% \| 0.39% \| \| indvars.NumElimIV \| 1197 \| 1187 \| -10 \| -0.84% \| 0.84% \| \| indvars.NumElimIdentity \| 143 \| 136 \| -7 \| -4.90% \| 4.90% \| \| indvars.NumElimRem \| 4 \| 5 \| 1 \| 25.00% \| 25.00% \| \| indvars.NumLFTR \| 29842 \| 29890 \| 48 \| 0.16% \| 0.16% \| \| indvars.NumReplaced \| 2293 \| 2227 \| -66 \| -2.88% \| 2.88% \| \| indvars.NumSimplifiedSDiv \| 6 \| 8 \| 2 \| 33.33% \| 33.33% \| \| indvars.NumWidened \| 26438 \| 26329 \| -109 \| -0.41% \| 0.41% \| \| instcount.TotalBlocks \| 1178338 \| 1173840 \| -4498 \| -0.38% \| 0.38% \| \| instcount.TotalFuncs \| 111825 \| 111829 \| 4 \| 0.00% \| 0.00% \| \| instcount.TotalInsts \| 9905442 \| 9896139 \| -9303 \| -0.09% \| 0.09% \| \| lcssa.NumLCSSA \| 425871 \| 423961 \| -1910 \| -0.45% \| 0.45% \| \| licm.NumHoisted \| 378357 \| 378753 \| 396 \| 0.10% \| 0.10% \| \| licm.NumMovedCalls \| 2193 \| 2208 \| 15 \| 0.68% \| 0.68% \| \| licm.NumMovedLoads \| 35899 \| 31821 \| -4078 \| -11.36% \| 11.36% \| \| licm.NumPromoted \| 11178 \| 11154 \| -24 \| -0.21% \| 0.21% \| \| licm.NumSunk \| 13359 \| 13587 \| 228 \| 1.71% \| 1.71% \| \| loop-delete.NumDeleted \| 8547 \| 8402 \| -145 \| -1.70% \| 1.70% \| \| loop-instsimplify.NumSimplified \| 12876 \| 11890 \| -986 \| -7.66% \| 7.66% \| \| loop-peel.NumPeeled \| 1008 \| 925 \| -83 \| -8.23% \| 8.23% \| \| 
loop-rotate.NumNotRotatedDueToHeaderSize \| 368 \| 365 \| -3 \| -0.82% \| 0.82% \| \| loop-rotate.NumRotated \| 42015 \| 42003 \| -12 \| -0.03% \| 0.03% \| \| loop-simplifycfg.NumLoopBlocksDeleted \| 240 \| 242 \| 2 \| 0.83% \| 0.83% \| \| loop-simplifycfg.NumLoopExitsDeleted \| 497 \| 20 \| -477 \| -95.98% \| 95.98% \| \| loop-simplifycfg.NumTerminatorsFolded \| 618 \| 336 \| -282 \| -45.63% \| 45.63% \| \| loop-unroll.NumCompletelyUnrolled \| 11028 \| 11032 \| 4 \| 0.04% \| 0.04% \| \| loop-unroll.NumUnrolled \| 12608 \| 12529 \| -79 \| -0.63% \| 0.63% \| \| mem2reg.NumDeadAlloca \| 10222 \| 10221 \| -1 \| -0.01% \| 0.01% \| \| mem2reg.NumPHIInsert \| 192110 \| 192106 \| -4 \| 0.00% \| 0.00% \| \| mem2reg.NumSingleStore \| 637650 \| 637643 \| -7 \| 0.00% \| 0.00% \| \| scalar-evolution.NumBruteForceTripCountsComputed \| 814 \| 812 \| -2 \| -0.25% \| 0.25% \| \| scalar-evolution.NumTripCountsComputed \| 283108 \| 282934 \| -174 \| -0.06% \| 0.06% \| \| scalar-evolution.NumTripCountsNotComputed \| 106712 \| 106718 \| 6 \| 0.01% \| 0.01% \| \| simple-loop-unswitch.NumBranches \| 5178 \| 4752 \| -426 \| -8.23% \| 8.23% \| \| simple-loop-unswitch.NumCostMultiplierSkipped \| 914 \| 503 \| -411 \| -44.97% \| 44.97% \| \| simple-loop-unswitch.NumSwitches \| 20 \| 18 \| -2 \| -10.00% \| 10.00% \| \| simple-loop-unswitch.NumTrivial \| 183 \| 95 \| -88 \| -48.09% \| 48.09% \| ... but that actually regresses LICM (-12% `licm.NumMovedLoads`), loop-simplifycfg (`NumLoopExitsDeleted`, `NumTerminatorsFolded`), simple-loop-unswitch (`NumTrivial`). What if we instead have LICM both before and after LoopRotate? 
\| statistic name \| LoopRotate-LICM \| LICM-LoopRotate-LICM \| Δ \| % \| abs(%) \| \| asm-printer.EmittedInsts \| 9015930 \| 9014474 \| -1456 \| -0.02% \| 0.02% \| \| indvars.NumElimCmp \| 3536 \| 3546 \| 10 \| 0.28% \| 0.28% \| \| indvars.NumElimExt \| 36725 \| 36681 \| -44 \| -0.12% \| 0.12% \| \| indvars.NumElimIV \| 1197 \| 1185 \| -12 \| -1.00% \| 1.00% \| \| indvars.NumElimIdentity \| 143 \| 146 \| 3 \| 2.10% \| 2.10% \| \| indvars.NumElimRem \| 4 \| 5 \| 1 \| 25.00% \| 25.00% \| \| indvars.NumLFTR \| 29842 \| 29899 \| 57 \| 0.19% \| 0.19% \| \| indvars.NumReplaced \| 2293 \| 2299 \| 6 \| 0.26% \| 0.26% \| \| indvars.NumSimplifiedSDiv \| 6 \| 8 \| 2 \| 33.33% \| 33.33% \| \| indvars.NumWidened \| 26438 \| 26404 \| -34 \| -0.13% \| 0.13% \| \| instcount.TotalBlocks \| 1178338 \| 1173652 \| -4686 \| -0.40% \| 0.40% \| \| instcount.TotalFuncs \| 111825 \| 111829 \| 4 \| 0.00% \| 0.00% \| \| instcount.TotalInsts \| 9905442 \| 9895452 \| -9990 \| -0.10% \| 0.10% \| \| lcssa.NumLCSSA \| 425871 \| 425373 \| -498 \| -0.12% \| 0.12% \| \| licm.NumHoisted \| 378357 \| 383352 \| 4995 \| 1.32% \| 1.32% \| \| licm.NumMovedCalls \| 2193 \| 2204 \| 11 \| 0.50% \| 0.50% \| \| licm.NumMovedLoads \| 35899 \| 35755 \| -144 \| -0.40% \| 0.40% \| \| licm.NumPromoted \| 11178 \| 11163 \| -15 \| -0.13% \| 0.13% \| \| licm.NumSunk \| 13359 \| 14321 \| 962 \| 7.20% \| 7.20% \| \| loop-delete.NumDeleted \| 8547 \| 8538 \| -9 \| -0.11% \| 0.11% \| \| loop-instsimplify.NumSimplified \| 12876 \| 12041 \| -835 \| -6.48% \| 6.48% \| \| loop-peel.NumPeeled \| 1008 \| 924 \| -84 \| -8.33% \| 8.33% \| \| loop-rotate.NumNotRotatedDueToHeaderSize \| 368 \| 365 \| -3 \| -0.82% \| 0.82% \| \| loop-rotate.NumRotated \| 42015 \| 42005 \| -10 \| -0.02% \| 0.02% \| \| loop-simplifycfg.NumLoopBlocksDeleted \| 240 \| 241 \| 1 \| 0.42% \| 0.42% \| \| loop-simplifycfg.NumTerminatorsFolded \| 618 \| 619 \| 1 \| 0.16% \| 0.16% \| \| loop-unroll.NumCompletelyUnrolled \| 11028 \| 11029 \| 1 \| 0.01% \| 
0.01% \| \| loop-unroll.NumUnrolled \| 12608 \| 12525 \| -83 \| -0.66% \| 0.66% \| \| mem2reg.NumPHIInsert \| 192110 \| 192073 \| -37 \| -0.02% \| 0.02% \| \| mem2reg.NumSingleStore \| 637650 \| 637652 \| 2 \| 0.00% \| 0.00% \| \| scalar-evolution.NumTripCountsComputed \| 283108 \| 282998 \| -110 \| -0.04% \| 0.04% \| \| scalar-evolution.NumTripCountsNotComputed \| 106712 \| 106691 \| -21 \| -0.02% \| 0.02% \| \| simple-loop-unswitch.NumBranches \| 5178 \| 5185 \| 7 \| 0.14% \| 0.14% \| \| simple-loop-unswitch.NumCostMultiplierSkipped \| 914 \| 925 \| 11 \| 1.20% \| 1.20% \| \| simple-loop-unswitch.NumTrivial \| 183 \| 179 \| -4 \| -2.19% \| 2.19% \| I.e. we end up with fewer instructions, less peeling, and more LICM activity; also note that none of those 4 regressions appear here. 
Namely: \| statistic name \| LICM-LoopRotate \| LICM-LoopRotate-LICM \| Δ \| % \| abs(%) \| \| asm-printer.EmittedInsts \| 9015799 \| 9014474 \| -1325 \| -0.01% \| 0.01% \| \| indvars.NumElimCmp \| 3544 \| 3546 \| 2 \| 0.06% \| 0.06% \| \| indvars.NumElimExt \| 36580 \| 36681 \| 101 \| 0.28% \| 0.28% \| \| indvars.NumElimIV \| 1187 \| 1185 \| -2 \| -0.17% \| 0.17% \| \| indvars.NumElimIdentity \| 136 \| 146 \| 10 \| 7.35% \| 7.35% \| \| indvars.NumLFTR \| 29890 \| 29899 \| 9 \| 0.03% \| 0.03% \| \| indvars.NumReplaced \| 2227 \| 2299 \| 72 \| 3.23% \| 3.23% \| \| indvars.NumWidened \| 26329 \| 26404 \| 75 \| 0.28% \| 0.28% \| \| instcount.TotalBlocks \| 1173840 \| 1173652 \| -188 \| -0.02% \| 0.02% \| \| instcount.TotalInsts \| 9896139 \| 9895452 \| -687 \| -0.01% \| 0.01% \| \| lcssa.NumLCSSA \| 423961 \| 425373 \| 1412 \| 0.33% \| 0.33% \| \| licm.NumHoisted \| 378753 \| 383352 \| 4599 \| 1.21% \| 1.21% \| \| licm.NumMovedCalls \| 2208 \| 2204 \| -4 \| -0.18% \| 0.18% \| \| licm.NumMovedLoads \| 31821 \| 35755 \| 3934 \| 12.36% \| 12.36% \| \| licm.NumPromoted \| 11154 \| 11163 \| 9 \| 0.08% \| 0.08% \| \| licm.NumSunk \| 13587 \| 14321 \| 734 \| 5.40% \| 5.40% \| \| loop-delete.NumDeleted \| 8402 \| 8538 \| 136 \| 1.62% \| 1.62% \| \| loop-instsimplify.NumSimplified \| 11890 \| 12041 \| 151 \| 1.27% \| 1.27% \| \| loop-peel.NumPeeled \| 925 \| 924 \| -1 \| -0.11% \| 0.11% \| \| loop-rotate.NumRotated \| 42003 \| 42005 \| 2 \| 0.00% \| 0.00% \| \| loop-simplifycfg.NumLoopBlocksDeleted \| 242 \| 241 \| -1 \| -0.41% \| 0.41% \| \| loop-simplifycfg.NumLoopExitsDeleted \| 20 \| 497 \| 477 \| 2385.00% \| 2385.00% \| \| loop-simplifycfg.NumTerminatorsFolded \| 336 \| 619 \| 283 \| 84.23% \| 84.23% \| \| loop-unroll.NumCompletelyUnrolled \| 11032 \| 11029 \| -3 \| -0.03% \| 0.03% \| \| loop-unroll.NumUnrolled \| 12529 \| 12525 \| -4 \| -0.03% \| 0.03% \| \| mem2reg.NumDeadAlloca \| 10221 \| 10222 \| 1 \| 0.01% \| 0.01% \| \| mem2reg.NumPHIInsert \| 192106 \| 192073 \| 
-33 \| -0.02% \| 0.02% \| \| mem2reg.NumSingleStore \| 637643 \| 637652 \| 9 \| 0.00% \| 0.00% \| \| scalar-evolution.NumBruteForceTripCountsComputed \| 812 \| 814 \| 2 \| 0.25% \| 0.25% \| \| scalar-evolution.NumTripCountsComputed \| 282934 \| 282998 \| 64 \| 0.02% \| 0.02% \| \| scalar-evolution.NumTripCountsNotComputed \| 106718 \| 106691 \| -27 \| -0.03% \| 0.03% \| \| simple-loop-unswitch.NumBranches \| 4752 \| 5185 \| 433 \| 9.11% \| 9.11% \| \| simple-loop-unswitch.NumCostMultiplierSkipped \| 503 \| 925 \| 422 \| 83.90% \| 83.90% \| \| simple-loop-unswitch.NumSwitches \| 18 \| 20 \| 2 \| 11.11% \| 11.11% \| \| simple-loop-unswitch.NumTrivial \| 95 \| 179 \| 84 \| 88.42% \| 88.42% \| {F15983613} {F15983615} {F15983616} (this is vanilla llvm testsuite + rawspeed + darktable) As an example of the code where early LICM only is bad, see: https://godbolt.org/z/GzEbacs4K This does have an observable compile-time regression of +~0.5% geomean https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e5f614c9aefd0fa505f1e&to=5d81826c3411982ca26e46b9d0aff34c80577664&stat=instructions but i think that's basically nothing, and there's potential that it might be avoidable in the future by fixing clang to produce alignment information on function arguments, thus making the second run unneeded. Differential Revision: https://reviews.llvm.org/D99249	2021-04-02 11:11:42 +03:00
Juneyoung Lee	c664769330	[AssumeBundles] offset should be added to correctly calculate align This is a patch to fix the bug in alignment calculation (see https://reviews.llvm.org/D90529#2619492). Consider this code: ``` call void @llvm.assume(i1 true) ["align"(i32* %a, i32 32, i32 28)] %arrayidx = getelementptr inbounds i32, i32* %a, i64 -1 ; alignment of %arrayidx? ``` The llvm.assume guarantees that `%a - 28` is 32-byte aligned, meaning that `%a` is 32k + 28 for some k. Therefore `a - 4` cannot be 32-byte aligned, but the existing code was calculating the pointer as 32-byte aligned. The reason why this happened is as follows. `DiffSCEV` stores `%arrayidx - %a` which is -4. `OffSCEV` stores the offset value of “align”, which is 28. `DiffSCEV` + `OffSCEV` = 24 should be used for `a - 4`'s offset from 32k, but `DiffSCEV` - `OffSCEV` = -32 (a multiple of 32) was being used instead. Reviewed By: Tyker Differential Revision: https://reviews.llvm.org/D98759	2021-04-02 12:32:05 +09:00
Philip Reames	91790c6785	[indvars] Fix pr49802 by checking for SCEVCouldNotCompute The code is assuming that having an exact exit count for the loop implies that exit counts for every exit are known. This used to be true, but when we added handling for dead exits we broke this invariant. The new invariant is that an exact loop count implies that any exits not trivially dead have exit counts. We could have fixed this by either a) explicitly checking for a dead exit, or b) just testing for SCEVCouldNotCompute. I chose the second as it was simpler. (Debugging this took longer than it should have since I'd mistyped the original assert and it wasn't checking what it was meant to...) p.s. Sorry for the lack of test case. Getting things into a state to actually hit this is difficult and fragile. The original repro involves loop-deletion leaving SCEV in a slightly imprecise state which lets us bypass other transforms in IndVarSimplify on the way to this one. All of my attempts to separate it into a standalone test failed.	2021-04-01 17:53:44 -07:00
Philip Reames	b23a314146	[funcattrs] Respect nofree attribute on callsites (not just callee)	2021-04-01 14:45:49 -07:00
Philip Reames	1e69a5af92	[Attributor] Cleanup detection of non-relaxed atomics in nosync inference The code was checking for cases which are disallowed by the verifier. Delete dead code and adjust style.	2021-04-01 12:01:29 -07:00
Philip Reames	8e596f7e27	[Attributor] Cleanup intrinsic handling in nosync inference [mostly NFC] Mostly stylistic adjustment, but the old code didn't handle the memcpy.inline intrinsic. By using the matcher class, we now do.	2021-04-01 11:49:59 -07:00
Philip Reames	6ef4505298	[funcattrs] Infer nosync from readnone and non-convergent This implements the most basic possible nosync inference. The choice of inference rule is taken from the comments in attributor and the discussion on the review of the change which introduced the nosync attribute (`0626367202`). This is deliberately minimal. As noted in code comments, I do plan to add a more robust inference which actually scans the function IR directly, but a) I need to do some refactoring of the attributor code to use common interfaces, and b) I wanted to get something in. I also wanted to minimize the "interesting" analysis discussion since that's time intensive. Context: This combines with existing nofree attribute inference to help prove dereferenceability in the ongoing deref-at-point semantics work. Differential Revision: https://reviews.llvm.org/D99749	2021-04-01 11:37:34 -07:00
Philip Reames	ffa15e9463	Extract isVolatile helper on Instruction [NFCI] We have this logic duplicated in several cases, none of which were exhaustive. Consolidate it in one place. I don't believe this actually impacts behavior of the callers. I think they all filter their inputs such that their partial implementations were correct. If not, this might be fixing a corner-case bug.	2021-04-01 11:24:02 -07:00
Philip Reames	6b05d753e0	Mark unordered memset/memmove/memcpy as nosync Mostly a means to remove a bit of code from attributor in advance of implementing a FuncAttr inference for nosync.	2021-04-01 10:38:54 -07:00
Alexey Bataev	c03696da5e	[SLP]Improve and fix getVectorElementSize. 1. Need to clean up the InstrElementSize map for each new tree, otherwise we might use sizes from the previous vectorization attempt. 2. No need to include instructions from different basic blocks in the analysis; skipping them saves compile time. Differential Revision: https://reviews.llvm.org/D99677	2021-04-01 06:51:26 -07:00
Alexey Bataev	ce98a0556a	[SLP]Remove `else` after `return`, NFC.	2021-04-01 05:33:01 -07:00
Yevgeny Rouban	1ed53d44d8	[LoopFlatten] Do not report CFG analyses as up-to-date Removes CFGAnalyses from the preserved analyses set returned by LoopFlattenPass::run(). Reviewed By: Dave Green, Ta-Wei Tu Differential Revision: https://reviews.llvm.org/D99700	2021-04-01 15:52:36 +07:00
Max Kazantsev	a1d83776bf	[NFC] Undo some erroneous renamings Some vars renamed by mistake during auto-replacements. Undoing them.	2021-04-01 13:10:10 +07:00
Max Kazantsev	630818a850	[NFC] Disambiguate LI in GVN GVN uses the name 'LI' for two different, unrelated things: LoadInst and LoopInfo. This patch renames the variables with the former meaning to 'Load' to disambiguate the code.	2021-04-01 12:40:35 +07:00
KAWASHIMA Takahiro	5fac7c6046	[GVN] Propagate llvm.access.group metadata of loads Before this change, the `llvm.access.group` metadata was dropped when moving a load instruction in GVN. This prevents vectorizing a C/C++ loop with `#pragma clang loop vectorize(assume_safety)`. This change propagates the metadata as well as other metadata if it is safe (the move-destination basic block and source basic block belong to the same loop). Differential Revision: https://reviews.llvm.org/D93503	2021-04-01 10:00:48 +09:00
qixingxue	62b74f7564	[GVN][NFC] Refactor analyzeLoadFromClobberingWrite This commit adjusts the order of two swappable if statements to make code cleaner. Reviewed By: lattner, nikic Differential Revision: https://reviews.llvm.org/D99648	2021-04-01 08:35:35 +08:00
Roman Lebedev	43ded90094	[NFC][LoopRotation] Count the number of instructions hoisted/cloned into preheader	2021-03-31 23:27:36 +03:00
Huihui Zhang	fe5c4a06a4	[LoopVectorize] Use SetVector to track uniform uses to prevent non-determinism. Use SetVector instead of SmallPtrSet to track values with uniform use. Doing this can help avoid non-determinism caused by iterating over unordered containers. This bug was found with reverse iteration turned on, --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON". Failing LLVM test consecutive-ptr-uniforms.ll . Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D99549	2021-03-31 11:21:07 -07:00
Sanjay Patel	1462bdf1b9	[InstCombine] fold abs(srem X, 2) This is a missing optimization based on an example in: https://llvm.org/PR49763 As noted there and the test here, we could add a more general fold if that is shown useful. https://alive2.llvm.org/ce/z/xEHdTv https://alive2.llvm.org/ce/z/97dcY5	2021-03-31 11:29:20 -04:00
Sander de Smalen	7108b2dec1	[SVE] Fix LoopVectorizer test scalable-call.ll This marks FSIN and other operations to EXPAND for scalable vectors, so that they are not assumed to be legal by the cost-model. Depends on D97470 Reviewed By: dmgreen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D97471	2021-03-31 14:52:49 +01:00
Chuanqi Xu	eb51dd719f	[Coroutine] [Debug] Insert dbg.declare to entry.resume to print alloca in the coroutine frame under O2 Summary: Try to insert dbg.declare into the entry.resume basic block of the resume function. In this way, we can print allocas such as __promise in gdb/lldb under O2, which helps when debugging coroutine programs. Test Plan: check-llvm Reviewed by: aprantl Differential Revision: https://reviews.llvm.org/D96938	2021-03-31 10:37:06 +08:00
Fangrui Song	3e5ee194c0	[SimpleLoopUnswitch] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after `431a40e1e2`	2021-03-30 19:27:10 -07:00
Juneyoung Lee	431a40e1e2	[LoopUnswitch] Assert that branch condition is either and/or but not both as suggested at https://reviews.llvm.org/rG5bb38e84d3d0#986321	2021-03-31 10:35:22 +09:00
Sanjay Patel	c2ebad8d55	[InstCombine] add fold for demand of low bit of abs() This is one problem shown in https://llvm.org/PR49763 https://alive2.llvm.org/ce/z/cV6-4K https://alive2.llvm.org/ce/z/9_3g-L	2021-03-30 15:14:37 -04:00
Huihui Zhang	d857a81437	[VPlan] Use SetVector for VPExternalDefs to prevent non-determinism. Use SetVector instead of SmallPtrSet for external definitions created for VPlan. Doing this can help avoid non-determinism caused by iterating over unordered containers. This bug was found with reverse iteration turned on, --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON". Failing LLVM-Unit test VPRecipeTest.dump. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D99544	2021-03-30 12:10:56 -07:00
spupyrev	22998738e8	[SamplePGO] Keeping prof metadata for IndirectBrInst Currently prof metadata with branch counts is added only for BranchInst and SwitchInst, but not for IndirectBrInst. As a result, BPI/BFI make incorrect inferences for indirect branches, which can be very hot. This diff adds metadata for IndirectBrInst, in addition to BranchInst and SwitchInst. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D99550	2021-03-30 10:44:48 -07:00
Hongtao Yu	3e3fc431df	[CSSPGO] Top-down processing order based on full profile. Use profiled call edges to augment the top-down order. There are cases where the top-down order computed based on the static call graph doesn't reflect real execution order. For example: 1. Incomplete static call graph due to unknown indirect call targets. Adjusting the order by considering indirect call edges from the profile can enable the inlining of indirect call targets by allowing the caller to be processed before them. 2. Mutual call edges in an SCC. The static processing order computed for an SCC may not reflect the call contexts in the context-sensitive profile, thus may cause potential inlining to be overlooked. The function order in one SCC is being adjusted to a top-down order based on the profile to favor more inlining. 3. Transitive indirect call edges due to inlining. When a callee function is inlined into a caller function in LTO prelink, every call edge originated from the callee will be transferred to the caller. If any of the transferred edges is indirect, the original profiled indirect edge, even if considered, would not enforce a top-down order from the caller to the potential indirect call target in LTO postlink since the inlined callee is gone from the static call graph. 4. #3 can happen even for direct call targets, due to functions defined in header files. Header functions, when included into source files, are defined multiple times but only one definition survives due to ODR. Therefore, the LTO prelink inlining done on those dropped definitions can be useless based on a local file scope. More importantly, the inlinee, once fully inlined to a to-be-dropped inliner, will have no profile to consume when its outlined version is compiled. This can lead to a profile-less prelink compilation for the outlined version of the inlinee function which may be called from external modules. 
While this isn't easy to fix, we rely on the postlink AutoFDO pipeline to optimize the inlinee. Since the survived copy of the inliner (defined in headers) can be inlined in its local scope in prelink, it may not exist in the merged IR in postlink, and we'll need the profiled call edges to enforce a top-down order for the rest of the functions. Considering those cases, a profiled call graph completely independent of the static call graph is constructed based on profile data, where function objects are not even needed to handle case #3 and case #4. I'm seeing an average 0.4% perf win out of SPEC2017. For certain benchmarks such as Xalancbmk and GCC, the win is bigger, above 2%. The change is an enhancement to https://reviews.llvm.org/D95988. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D99351	2021-03-30 10:42:22 -07:00
Krasimir Georgiev	c51e91e046	Revert "[Passes] Add relative lookup table converter pass" This reverts commit `5178ffc7cf`. Compiling `llvm-profdata` with a compiler build from this produces a crashing binary.	2021-03-30 14:13:37 +02:00
Juneyoung Lee	6b4b1dc6ec	[LoopUnswitch] Simplify branch condition if it is select with constant operands This fixes the miscompilation reported in https://reviews.llvm.org/rG5bb38e84d3d0#986154 . `select _, true, false` matches both m_LogicalAnd and m_LogicalOr, which confuses later transformations. Simplify the branch condition so that it no longer has this form.	2021-03-30 20:09:42 +09:00
Sander de Smalen	f71ed5dfe2	NFC: Migrate PartialInlining to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D97382	2021-03-30 11:59:45 +01:00
David Sherwood	a08c7736a7	[LoopVectorize] Add support for scalable vectorization of induction variables This patch adds support for the vectorization of induction variables when using scalable vectors, which required the following changes: 1. Removed assert from InnerLoopVectorizer::getStepVector. 2. Modified InnerLoopVectorizer::createVectorIntOrFpInductionPHI to use a runtime determined value for VF and removed an assert. 3. Modified InnerLoopVectorizer::buildScalarSteps to work for scalable vectors. I did this by calculating the full vector value for each Part of the unroll factor (UF) and caching this in the VP state. This means that we are always able to extract an arbitrary element from the vector if necessary. In addition to this, I also permitted the caching of the individual lane values themselves for the known minimum number of elements in the same way we do for fixed width vectors. This is a further optimisation that improves the code quality since it avoids unnecessary extractelement operations when extracting the first lane. 4. Added an assert to InnerLoopVectorizer::widenPHIInstruction, since while testing some code paths I noticed this is currently broken for scalable vectors. Various tests to support different cases have been added here: Transforms/LoopVectorize/AArch64/sve-inductions.ll Differential Revision: https://reviews.llvm.org/D98715	2021-03-30 11:13:31 +01:00
Krasimir Georgiev	8e7df996e3	Revert "[loop-idiom] Hoist loop memcpys to loop preheader" This reverts commit `92ddd3c1b6`. Causes multistage clang crashes, e.g.: https://lab.llvm.org/buildbot/#/builders/36/builds/6678	2021-03-30 11:47:12 +02:00
Han Zhu	92ddd3c1b6	[loop-idiom] Hoist loop memcpys to loop preheader For a simple loop like: ``` struct S { int x; int y; char b; }; unsigned foo(S* __restrict__ a, S* b, int n) { for (int i = 0; i < n; i++) a[i] = b[i]; return sizeof(a[0]); } ``` We could eliminate the loop and convert it to a large memcpy of 12n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll` ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %0 = bitcast %struct.S* %arrayidx2 to i8* %1 = bitcast %struct.S* %arrayidx to i8* call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false) %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic. 
With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass. ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %a1 = bitcast %struct.S* %a to i8* %b2 = bitcast %struct.S* %b to i8* %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry %0 = zext i32 %n to i64 %1 = mul nuw nsw i64 %0, 12 call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false) br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %2 = bitcast %struct.S* %arrayidx2 to i8* %3 = bitcast %struct.S* %arrayidx to i8* %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` Reviewed By: zino Differential Revision: https://reviews.llvm.org/D97667	2021-03-29 23:36:26 -07:00
Han Zhu	2bd4049ceb	Revert "[loop-idiom] Hoist loop memcpys to loop preheader" This reverts commit `deb5095833`. Bad commit message.	2021-03-29 23:35:35 -07:00
Han Zhu	deb5095833	[loop-idiom] Hoist loop memcpys to loop preheader Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Blame Revision: Differential Revision: https://phabricator.intern.facebook.com/D26380397	2021-03-29 23:14:42 -07:00
Huihui Zhang	ca721042f1	[IPO][SampleContextTracker] Use SmallVector to track context profiles to prevent non-determinism. Use SmallVector instead of SmallSet to track the context profiles mapped. Doing this can help avoid non-determinism caused by iterating over unordered containers. This bug was found with reverse iteration turned on, --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON". Failing LLVM test profile-context-tracker-debug.ll . Reviewed By: MaskRay, wenlei Differential Revision: https://reviews.llvm.org/D99547	2021-03-29 16:37:10 -07:00
Gulfem Savrun Yeniceri	5178ffc7cf	[Passes] Add relative lookup table converter pass Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in: https://bugs.llvm.org/show_bug.cgi?id=45244 This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly. Differential Revision: https://reviews.llvm.org/D94355	2021-03-29 21:53:32 +00:00
Wenlei He	30b0232336	[CSSPGO][llvm-profgen] Context-sensitive global pre-inliner This change sets up a framework in llvm-profgen to estimate inline decision and adjust context-sensitive profile based on that. We call it a global pre-inliner in llvm-profgen. It will serve two purposes: 1) Since context profile for not inlined context will be merged into base profile, if we estimate a context will not be inlined, we can merge the context profile in the output to save profile size. 2) For thinLTO, when a context involving functions from different modules is not inlined, we can't merge function profiles across modules, leading to suboptimal post-inline count quality. By estimating some inline decisions, we would be able to adjust/merge context profiles beforehand as a mitigation. Compiler inline heuristic uses inline cost which is not available in llvm-profgen. But since inline cost is closely related to size, we could get an estimate through function size from debug info. Because the size we have in llvm-profgen is the final size, it could also be more accurate than the inline cost estimation in the compiler. This change only has the framework, with a few TODOs left for follow up patches for a complete implementation: 1) We need to retrieve size for function/inlinee from debug info for inlining estimation. Currently we use number of samples in a profile as placeholder for size estimation. 2) Currently the thresholds are using the values used by sample loader inliner. But they need to be tuned since the size here is fully optimized machine code size, instead of inline cost based on not yet fully optimized IR. Differential Revision: https://reviews.llvm.org/D99146	2021-03-29 09:46:14 -07:00
Florian Hahn	c773d0f973	Recommit "[LV] Move runtime pointer size check to LVP::plan()." Re-apply `25fbe803d4`, with a small update to emit the right remark class. Original message: [LV] Move runtime pointer size check to LVP::plan(). This removes the need for the remaining doesNotMeet check and instead directly checks if there are too many runtime checks for vectorization in the planner. A subsequent patch will adjust the logic used to decide whether to vectorize with runtime checks to consider their cost more accurately. Reviewed By: lebedev.ri	2021-03-29 16:14:27 +01:00
Florian Hahn	485c8ce733	Revert "[LV] Move runtime pointer size check to LVP::plan()." This reverts commit `25fbe803d4`. This breaks a clang test which filters for the wrong remark type.	2021-03-29 14:41:53 +01:00
Sanjay Patel	da381cf7ce	[SLP] allow matching integer min/max intrinsics as reduction ops This is a 2nd try of: `3c8473ba53` which was reverted at: `a26312f9d4` because of crashing. This version includes extra code and tests to avoid the known crashing examples as discussed in PR49730. Original commit message: As noted in D98152, we need to patch SLP to avoid regressions when we start canonicalizing to integer min/max intrinsics. Most of the real work to make this possible was in: `7202f47508` Differential Revision: https://reviews.llvm.org/D98981	2021-03-29 09:38:18 -04:00
Florian Hahn	25fbe803d4	[LV] Move runtime pointer size check to LVP::plan(). This removes the need for the remaining doesNotMeet check and instead directly checks if there are too many runtime checks for vectorization in the planner. A subsequent patch will adjust the logic used to decide whether to vectorize with runtime checks to consider their cost more accurately. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98634	2021-03-29 14:12:29 +01:00
Matt Arsenault	9a0c9402fa	Reapply "OpaquePtr: Turn inalloca into a type attribute" This reverts commit `07e46367ba`.	2021-03-29 08:55:30 -04:00
Jingu Kang	e4abb64100	[LoopUnswitch] Use reference variables instead of pointer ones Differential Revision: https://reviews.llvm.org/D99496	2021-03-29 13:08:46 +01:00
Hans Wennborg	c6e5c4654b	Don't use $ as suffix for symbol names in ThinLTOBitcodeWriter and other places Using $ breaks demangling of the symbols. For example, $ c++filt _Z3foov\$123 _Z3foov$123 This causes problems for developers who would like to see nice stack traces etc., but also for automatic crash tracking systems which try to organize crashes based on the stack traces. Instead, use the period as suffix separator, since Itanium demanglers normally ignore such suffixes: $ c++filt _Z3foov.123 foo() [clone .123] This is already done in some places; try to do it everywhere. Differential revision: https://reviews.llvm.org/D97484	2021-03-29 13:03:52 +02:00
Oliver Stannard	07e46367ba	Revert "Reapply "OpaquePtr: Turn inalloca into a type attribute"" Reverting because test 'Bindings/Go/go.test' is failing on most buildbots. This reverts commit `fc9df30991`.	2021-03-29 11:32:22 +01:00
Jingu Kang	cfe87d4edd	[NFC][LoopUnswitch] Move hasPartialIVCondition to LoopUtils Differential revision: https://reviews.llvm.org/D99490	2021-03-29 10:29:45 +01:00
Matt Arsenault	fc9df30991	Reapply "OpaquePtr: Turn inalloca into a type attribute" This reverts commit `20d5c42e0e`.	2021-03-28 13:35:21 -04:00
Sanjay Patel	01ae6e5ead	[InstCombine] sink min/max intrinsics with common op after select This is another step towards parity with cmp+select min/max idioms. See D98152.	2021-03-28 13:13:04 -04:00
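
The alignment arithmetic described in the `c664769330` entry ([AssumeBundles]) can be checked with a small standalone sketch. This is only an illustration of the numbers quoted in that message, not the pass's code: `floorMod` and `residueOfDerived` are hypothetical helper names, and plain integers stand in for the SCEV expressions `DiffSCEV` and `OffSCEV`.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): non-negative remainder of x modulo m,
// assuming m > 0.
constexpr int64_t floorMod(int64_t x, int64_t m) {
  return ((x % m) + m) % m;
}

// An "align"(ptr, Align, Off) bundle asserts that (ptr - Off) is a multiple of
// Align, i.e. ptr = Align * k + Off. For a derived pointer q = ptr + Diff,
// q = Align * k + (Diff + Off), so its residue modulo Align is (Diff + Off).
constexpr int64_t residueOfDerived(int64_t Align, int64_t Off, int64_t Diff) {
  return floorMod(Diff + Off, Align);
}

// Values from the commit message: Align = 32, Off = 28, Diff = -4.
// Diff + Off = 24, so the derived pointer is not 32-byte aligned.
static_assert(residueOfDerived(32, 28, -4) == 24,
              "the derived pointer sits 24 bytes past a 32-byte boundary");

// The form the message identifies as the bug, Diff - Off = -32, is a multiple
// of 32 and would therefore wrongly report the pointer as aligned.
static_assert(floorMod(-4 - 28, 32) == 0,
              "Diff - Off looks aligned even though the pointer is not");
```

Both checks are evaluated at compile time, so the file only needs to compile for the arithmetic to be confirmed.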


28640 Commits