llvm-project

Commit Graph

Author	SHA1	Message	Date
Wei Mi	5fb65c02ca	[SampleFDO] Stop repeated indirect call promotion for the same target. Found a problem in indirect call promotion in sample loader pass. Currently if an indirect call is promoted for a target, and if the parent function is inlined into some other function, the indirect call can be promoted for the same target again. That is redundent which can harm performance and can cause excessive compile time in some extreme case. The patch fixes the issue. If a target is promoted for an indirect call, the patch will write ICP metadata with the target call count being set to 0. In the later ICP in sample profile loader, if it sees a target has 0 count for an indirect call, it knows the target has been promoted and won't do indirect call promotion for the indirect call. The fix brings 0.1~0.2% performance on our search benchmark. Differential Revision: https://reviews.llvm.org/D96806	2021-02-18 17:01:32 -08:00
Jianzhou Zhao	7e658b2fdc	[dfsan] Instrument origin variable and function definitions This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse, gbalats Differential Revision: https://reviews.llvm.org/D96977	2021-02-18 23:50:05 +00:00
Hongtao Yu	e87b1b1d4e	[CSSPGO] Use callsite sample counts to annotate indirect call sites. With CSSPGO all indirect call targets are counted torwards the original indirect call site in the profile, including both inlined and non-inlined targets. Therefore no need to look for callee entry counts. This also fixes the issue where callee entry count doesn't match callsite count due to the nature of CS sampling. I'm also cleaning up the orginal code that called `findIndirectCallFunctionSamples` just to compute the sum, the return value of which was disgarded. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D96990	2021-02-18 14:52:34 -08:00
Nikita Popov	70e3c9a8b6	[BasicAA] Always strip single-argument phi nodes We can always look through single-argument (LCSSA) phi nodes when performing alias analysis. getUnderlyingObject() already does this, but stripPointerCastsAndInvariantGroups() does not. We still look through these phi nodes with the usual aliasPhi() logic, but sometimes get sub-optimal results due to the restrictions on value equivalence when looking through arbitrary phi nodes. I think it's generally beneficial to keep the underlying object logic and the pointer cast stripping logic in sync, insofar as it is possible. With this patch we get marginally better results: aa.NumMayAlias \| 5010069 \| 5009861 aa.NumMustAlias \| 347518 \| 347674 aa.NumNoAlias \| 27201336 \| 27201528 ... licm.NumPromoted \| 1293 \| 1296 I've renamed the relevant strip method to stripPointerCastsForAliasAnalysis(), as we're past the point where we can explicitly spell out everything that's getting stripped. Differential Revision: https://reviews.llvm.org/D96668	2021-02-18 23:07:50 +01:00
Ta-Wei Tu	f70cdc5b5c	[NPM] Properly reset parent loop after loop passes This fixes https://bugs.llvm.org/show_bug.cgi?id=49185 When `NDEBUG` is not set, `LPMUpdater` checks if the added loops have the same parent loop as the current one in `addSiblingLoops`. If multiple loop passes are executed through `LoopPassManager`, `U.ParentL` will be the same across all passes. However, the parent loop might change after running a loop pass, resulting in assertion failures in subsequent passes. This patch resets `U.ParentL` after running individual loop passes in `LoopPassManager`. Reviewed By: asbirlea, ychen Differential Revision: https://reviews.llvm.org/D96727	2021-02-19 02:50:53 +08:00
Jianzhou Zhao	406dc54903	[dfsan] Refactor defining TLS variables This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D96941	2021-02-18 18:04:21 +00:00
Jianzhou Zhao	2e6cd338c6	[dfsan] Refactor runtime functions checking This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D96940	2021-02-18 18:01:46 +00:00
Philip Reames	8666463889	[instcombine] Exploit UB implied by nofree attributes This patch simply implements the documented UB of the current nofree attributes as specified. It doesn't try to be fancy about inference (yet), it just implements the cases already specified and inferred. Note: When this lands, it may expose miscompiles. If so, please revert and provide a test case. It's likely the bug is in the existing inference code and without a relatively complete test case, it will be hard to debug. Differential Revision: https://reviews.llvm.org/D96349	2021-02-18 08:34:22 -08:00
Djordje Todorovic	c1e23894fc	Revert "[Debugify] Make the debugify aware of the original (-g) Debug Info" This reverts rG8ee7c7e02953. One test is failing, I'll reland this as soon as possible.	2021-02-18 02:04:27 -08:00
Djordje Todorovic	8ee7c7e029	[Debugify] Make the debugify aware of the original (-g) Debug Info As discussed on the RFC [0], I am sharing the set of patches that enables checking of original Debug Info metadata preservation in optimizations. The proof-of-concept/proposal can be found at [1]. The implementation from the [1] was full of duplicated code, so this set of patches tries to merge this approach into the existing debugify utility. For example, the utility pass in the original-debuginfo-check mode could be invoked as follows: $ opt -verify-debuginfo-preserve -pass-to-test sample.ll Since this is very initial stage of the implementation, there is a space for improvements such as: - Add support for the new pass manager - Add support for metadata other than DILocations and DISubprograms [0] https://groups.google.com/forum/#!msg/llvm-dev/QOyF-38YPlE/G213uiuwCAAJ [1] https://github.com/djolertrk/llvm-di-checker Differential Revision: https://reviews.llvm.org/D82545	2021-02-18 01:52:16 -08:00
Joseph Huber	c3a3d20093	[LV] Add analysis remark for mixed precision conversions Floating point conversions inside vectorized loops have performance implications but are very subtle. The user could specify a floating point constant, or call a function without realizing that it will force a change in the vector width. An example of this behaviour is seen in https://godbolt.org/z/M3nT6c . The vectorizer should indicate when this happens becuase it is most likely unintended behaviour. This patch adds a simple check for this behaviour by following floating point stores in the original loop and checking if a floating point conversion operation occurs. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D95539	2021-02-17 21:37:08 -05:00
Teresa Johnson	d55d46f43b	[WPD] Add an optional checking mode for debugging devirtualization This adds an internal option -wholeprogramdevirt-check which if enabled will guard each devirtualization with a runtime check against the expected target, and an invocation of a debug trap if the check fails. This is useful for debugging WPD failures involving undefined behavior (e.g. casting to another class type not in the inheritance chain). Differential Revision: https://reviews.llvm.org/D95969	2021-02-17 16:46:15 -08:00
Rong Xu	7397905ab0	[SampleFDO] Third Try: Refactor SampleProfile.cpp Apply the patch for the third time after fixing buildbot failures. Refactor SampleProfile.cpp to use the core code in CodeGen. The main changes are: (1) Move SampleProfileLoaderBaseImpl class to a header file. (2) Split SampleCoverageTracker to a head file and a cpp file. (3) Move the common codes (common options and callsiteIsHot()) to the common cpp file. (4) Add inline keyword to avoid duplicated symbols -- they will be removed later when the class is changed to a template. Differential Revision: https://reviews.llvm.org/D96455	2021-02-17 15:31:50 -08:00
Teresa Johnson	3c4c205060	[WPD][lld] Test handling of vtable definition from shared libraries Adds a lld test for a case that the handling added for dynamically exported symbols in `1487747e99` already fixes. Because isExportDynamic returns true when the symbol is SharedKind with default visibility, it will treat as dynamically exported and block devirtualization when the definition of a vtable comes from a shared library. This is desireable as it is dangerous to devirtualize in that case, since there could be hidden overrides in the shared library. Typically that happens when the shared library header contains available externally definitions, which applications can override. An example is std::error_category, which is overridden in LLVM and causing failures after a self build with WPD enabled, because libstdc++ contains hidden overrides of the virtual base class methods. The regular LTO case in the new test already worked, but there are 2 fixes in this patch needed for the index-only case and the hybrid LTO case. For the index-only case, WPD should not simply ignore available externally vtables. A follow on fix will be made to clang to emit type metadata for those vtables, which the new test is modeling. For the hybrid case, we need to ensure when the module is split that any llvm.*used globals are cloned to the regular LTO split module so available externally vtable definitions are not prematurely deleted. Another follow on fix will add the equivalent gold test, which requires a small fix to the plugin to treat symbols in dynamic libraries the same way lld already is. Differential Revision: https://reviews.llvm.org/D96721	2021-02-17 12:49:24 -08:00
Vedant Kumar	c28622fbf3	Revert "[SampleFDO] Reapply: Refactor SampleProfile.cpp" Revert "[SampleFDO] Add missing #includes to unbreak modules build after D96455" This reverts commit `c73cbf218a`. Revert "[SampleFDO] Fix MSVC "namespace uses itself" warning (NFC)" This reverts commit `a23e6b321c`. Revert "[SampleFDO] Reapply: Refactor SampleProfile.cpp" This reverts commit `6fd5ccff72`. Still seeing link failures when building llc (or other tools), due to the new SampleProfileLoaderBaseImpl.h containing definitions that get duplicated across multiple TU's. ``` duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::findEquivalenceClasses(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::buildEdges(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::computeDominanceAndLoopInfo(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::getFunctionLoc(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::getBlockWeight(llvm::BasicBlock const)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printBlockWeight(llvm::raw_ostream&, llvm::BasicBlock const) const' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printBlockEquivalence(llvm::raw_ostream&, llvm::BasicBlock const)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printEdgeWeight(llvm::raw_ostream&, std::__1::pair<llvm::BasicBlock const, llvm::BasicBlock const*>)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) ```	2021-02-17 10:22:24 -08:00
William S. Moses	40862b1a74	[SROA] Propagate correct TBAA/TBAA Struct offsets SROA does not correctly account for offsets in TBAA/TBAA struct metadata. This patch creates functionality for generating new MD with the corresponding offset and updates SROA to use this functionality. Differential Revision: https://reviews.llvm.org/D95826	2021-02-17 11:59:00 -05:00
Ta-Wei Tu	0eeaec2a6d	[NFC] Refactor LoopInterchange into a loop-nest pass This is the preliminary patch of converting `LoopInterchange` pass to a loop-nest pass and has no intended functional change. Changes that are not loop-nest related are split to D96650. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D96644	2021-02-18 00:55:38 +08:00
Sjoerd Meijer	f78aa8b2c2	[LSR] Add a flag that overrides the target's preferred addressing mode This adds a new flag -lsr-preferred-addressing-mode to override the target's preferred addressing mode. It replaces flag -lsr-backedge-indexing, which is equivalent to preindexed addressing that is one of the options that -lsr-preferred-addressing-mode accepts. Differential Revision: https://reviews.llvm.org/D96855	2021-02-17 16:50:21 +00:00
Sanjay Patel	85294703a7	[InstCombine] fold fcmp-of-copysign idiom As discussed in: https://llvm.org/PR49179 ...this pattern shows up in library code. There are several potential generalizations as noted, but we need to be careful that we get FP special-values right, and it's not clear how much variation we should expect to see from this exact idiom.	2021-02-17 10:32:33 -05:00
Sjoerd Meijer	5a641cf194	Follow up of rGdea4a63e6359, which committed a slightly different version than intended.	2021-02-17 10:07:26 +00:00
Sjoerd Meijer	dea4a63e63	[LSR] Cleanup of getPreferredAddresingMode. NFC. This is a follow up D96600 and cleans up most calls to getPreferredAddresingMode. I.e., we really don't need to query the same things again and again, but get the preferred addressing mode once for each loop. So this should be a lot friendlier for compile times, especially if we start implementing getPreferredAddresingMode. Differential Revision: https://reviews.llvm.org/D96772	2021-02-17 09:45:29 +00:00
Rong Xu	6fd5ccff72	[SampleFDO] Reapply: Refactor SampleProfile.cpp Reapply patch after fixing buildbot failure. Refactor SampleProfile.cpp to use the core code in CodeGen. The main changes are: (1) Move SampleProfileLoaderBaseImpl class to a header file. (2) Split SampleCoverageTracker to a head file and a cpp file. (3) Move the common codes (common options and callsiteIsHot()) to the common cpp file. Differential Revision: https://reviews.llvm.org/D96455	2021-02-16 16:43:21 -08:00
Mehdi Amini	c761fe77bd	Revert "[SampleFDO][NFC] Refactor SampleProfile.cpp" This reverts commit `310b35304c`. The build is broken with -DBUILD_SHARED_LIBS=ON : lib/ProfileData/CMakeFiles/LLVMProfileData.dir/SampleProfileLoaderBaseUtil.cpp.o: In function `llvm::sampleprofutil::callsiteIsHot(llvm::sampleprof::FunctionSamples const, llvm::ProfileSummaryInfo, bool)': SampleProfileLoaderBaseUtil.cpp:(.text._ZN4llvm14sampleprofutil13callsiteIsHotEPKNS_10sampleprof15FunctionSamplesEPNS_18ProfileSummaryInfoEb+0x1a): undefined reference to `llvm::ProfileSummaryInfo::isColdCount(unsigned long) const' SampleProfileLoaderBaseUtil.cpp:(.text._ZN4llvm14sampleprofutil13callsiteIsHotEPKNS_10sampleprof15FunctionSamplesEPNS_18ProfileSummaryInfoEb+0x28): undefined reference to `llvm::ProfileSummaryInfo::isHotCount(unsigned long) const' ...	2021-02-16 22:11:42 +00:00
Rong Xu	310b35304c	[SampleFDO][NFC] Refactor SampleProfile.cpp Refactor SampleProfile.cpp to use the core code in CodeGen. The main changes are: (1) Move SampleProfileLoaderBaseImpl class to a header file. (2) Split SampleCoverageTracker to a head file and a cpp file. (3) Move the common codes (common options and callsiteIsHot()) to the common cpp file. Differential Revision: https://reviews.llvm.org/D96455	2021-02-16 11:18:21 -08:00
Arnold Schwaighofer	627cfd4394	[coro async] Don't promote allocas to the frame or rewrite swifterror if there are no suspend points Also don't call function to update the call graph if there are no clones. The function will fail. rdar://74277860 Differential Revision: https://reviews.llvm.org/D96620	2021-02-16 09:05:38 -08:00
Ta-Wei Tu	6b612a7baf	[NFC][LoopInterchange] Explicitly pass both `InnerLoop` and `OuterLoop` to `processLoop` This is a split patch of D96644. Explicitly pass both `InnerLoop` and `OuterLoop` to function `processLoop` to remove the need to swap elements in loop list and allow making loop list an `ArrayRef`. Also, fix inconsistent spellings of `OuterLoopId` and `Inner Loop Id` in debug log. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96650	2021-02-16 22:17:44 +08:00
Florian Hahn	f64c626069	[VPlan] Remove unused Phi member from VPWidenPHIRecipe (NFC). The member is not needed any longer after recent changes.	2021-02-16 13:53:06 +00:00
Kerry McLaughlin	ba1e150d03	[SVE] Add support for scalable vectorization of loops with int/fast FP reductions This patch enables scalable vectorization of loops with integer/fast reductions, e.g: ``` unsigned sum = 0; for (int i = 0; i < n; ++i) { sum += a[i]; } ``` A new TTI interface, isLegalToVectorizeReduction, has been added to prevent reductions which are not supported for scalable types from vectorizing. If the reduction is not supported for a given scalable VF, computeFeasibleMaxVF will fall back to using fixed-width vectorization. Reviewed By: david-arm, fhahn, dmgreen Differential Revision: https://reviews.llvm.org/D95245	2021-02-16 13:50:06 +00:00
Sander de Smalen	00fe10c6a6	[SCEVExpander] Migrate costAndCollectOperands to use InstructionCost. This patch changes costAndCollectOperands to use InstructionCost for accumulated cost values. isHighCostExpansion will return true if the cost has exceeded the budget. Reviewed By: CarolineConcatto, ctetreau Differential Revision: https://reviews.llvm.org/D92238	2021-02-16 09:27:34 +00:00
Florian Hahn	54a14c264a	[VPlan] Manage scalarized values using VPValues. This patch updates codegen to use VPValues to manage the generated scalarized instructions. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D92285	2021-02-16 09:04:10 +00:00
Akira Hatanaka	32dc79c5ef	[ObjC][ARC] Do not perform code motion on precise release calls This fixes a bug where an object can get deallocated before reaching the end of its full formal lifetime. rdar://72110887 rdar://74123176	2021-02-15 17:39:37 -08:00
Duncan P. N. Exon Smith	22a52dfddc	TransformUtils: Fix metadata handling in CloneModule (and improve CloneFunctionInto) This commit fixes how metadata is handled in CloneModule to be sound, and improves how it's handled in CloneFunctionInto (although the latter is still awkward when called within a module). Ruiling Song pointed out in PR48841 that CloneModule was changed to unsoundly use the RF_ReuseAndMutateDistinctMDs flag (renamed in `fa35c1f80f` for clarity). This flag papered over a crash caused by other various changes made to CloneFunctionInto over the past few years that made it unsound to use cloning between different modules. (This commit partially addresses PR48841, fixing the repro from preprocessed source but not textual IR. MDNodeMapper::mapDistinctNode became unsound in `df763188c9` and this commit does not address that regression.) RF_ReuseAndMutateDistinctMDs is designed for the IRMover to use, avoiding unnecessary clones of all referenced metadata when linking between modules (with IRMover, the source module is discarded after linking). It never makes sense to use when you're not discarding the source. This commit drops its incorrect use in CloneModule. Sadly, the right thing to do with metadata when cloning a function is complicated, and this patch doesn't totally fix it. The first problem is that there are two different types of referenceable metadata and it's not obvious what to with one of them when remapping. - `!0 = !{!1}` is metadata's version of a constant. Programatically it's called "uniqued" (probably a better term would be "constant") because, like `ConstantArray`, it's stored in uniquing tables. Once it's constructed, it's illegal to change its arguments. - `!0 = distinct !{!1}` is a bit closer to a global variable. It's legal to change the operands after construction. What should be done with distinct metadata when cloning functions within the same module? - Should new, cloned nodes be created? - Should all references point to the same, old nodes? The answer depends on whether that metadata is effectively owned by a function. And that's the second problem. Referenceable metadata's ownership model is not clear or explicit. Technically, it's all stored on an LLVMContext. However, any metadata that is `distinct`, that transitively references a `distinct` node, or that transitively references a GlobalValue is specific to a Module and is effectively owned by it. More specifically, some metadata is effectively owned by a specific Function within a module. Effectively function-local metadata was introduced somewhere around `c10d0e5ccd`, which made it illegal for two functions to share a DISubprogram attachment. When cloning a function within a module, you need to clone the function-local debug info and suppress cloning of global debug info (the status quo suppresses cloning some global debug info but not all). When cloning a function to a new/different module, you need to clone all of the debug info. Here's what I think we should do (eventually? soon? not this patch though): - Distinguish explicitly (somehow) between pure constant metadata owned by the LLVMContext, global metadata owned by the Module, and local metadata owned by a GlobalValue (such as a function). - Update CloneFunctionInto to trigger cloning of all "local" metadata (only), perhaps by adding a bit to RemapFlag. Alternatively, split out a separate function CloneFunctionMetadataInto to prime the metadata map that callers are updated to call ahead of time as appropriate. Here's the somewhat more isolated fix in this patch: - Converted the `ModuleLevelChanges` parameter to `CloneFunctionInto` to an enum called `CloneFunctionChangeType` that is one of LocalChangesOnly, GlobalChanges, DifferentModule, and ClonedModule. - The code maintaining the "functions uniquely own subprograms" invariant is now only active in the first two cases, where a function is being cloned within a single module. That's necessary because this code inhibits cloning of (some) "global" metadata that's effectively owned by the module. - The code maintaining the "all compile units must be explicitly referenced by !llvm.dbg.cu" invariant is now only active in the DifferentModule case, where a function is being cloned into a new module in isolation. - CoroSplit.cpp's call to CloneFunctionInto in CoroCloner::create uses LocalChangeOnly, since `fa635d730f` only set `ModuleLevelChanges` to trigger cloning of local metadata. - CloneModule drops its unsound use of RF_ReuseAndMutateDistinctMDs and special handling of !llvm.dbg.cu. - Fixed some outdated header docs and left a couple of FIXMEs. Differential Revision: https://reviews.llvm.org/D96531	2021-02-15 11:56:00 -08:00
Sjoerd Meijer	357237e93e	Recommit "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode" This reverts commit `effc3b0799`, with the build problem fixed.	2021-02-15 11:33:00 +00:00
Max Kazantsev	e3c759bd58	[LoopLoadElim] Pass ScalarEvolution in old pass manager. PR49141 Loop canonicalization may end up deleting blocks from CFG. And Scalar Evolution may still keep cached referenced to those blocks unless updated properly.	2021-02-15 18:08:23 +07:00
Sjoerd Meijer	effc3b0799	Revert "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode" This reverts commit `cd6de0e8de`.	2021-02-15 11:01:23 +00:00
Sjoerd Meijer	cd6de0e8de	[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode This refactors shouldFavorPostInc() and shouldFavorBackedgeIndex() into getPreferredAddressingMode() so that we have one interface to steer LSR in generating the preferred addressing mode. Differential Revision: https://reviews.llvm.org/D96600	2021-02-15 10:44:15 +00:00
Florian Hahn	3df5d5aace	[ConstraintElimination] Fix variables used for pattern matching. Re-using the matched variable in the pattern does not work as expected. This patch fixes that by introducing a new variable for the 2nd level match.	2021-02-14 18:42:37 +00:00
Kazu Hirata	910e2d1e57	[llvm] Use llvm::is_contained (NFC)	2021-02-14 08:36:20 -08:00
Sanjay Patel	b40fde062c	[InstCombine] fold fdiv with pow divisor (PR49147) This is unusual in the general (non-reciprocal) case because we need an extra instruction, but that should be better for general FP reassociation and codegen. We conservatively check for "arcp" FMF here as we do with existing fdiv folds, but it is not strictly necessary to have that. This is part of solving: https://llvm.org/PR49147 (The powi variant potentially has a different constraint.) Differential Revision: https://reviews.llvm.org/D96648	2021-02-14 08:07:36 -05:00
Juneyoung Lee	ed253ef772	[LoopVectorize] Fix VPRecipeBuilder::createEdgeMask to correctly generate the mask This patch fixes pr48832 by correctly generating the mask when a poison value is involved. Consider this CFG (which is a part of the input): ``` for.body: ; preds = %for.cond br i1 true, label %cond.false, label %land.rhs land.rhs: ; preds = %for.body br i1 poison, label %cond.end, label %cond.false cond.false: ; preds = %for.body, %land.rhs br label %cond.end cond.end: ; preds = %land.rhs, %cond.false %cond = phi i32 [ 0, %cond.false ], [ 1, %land.rhs ] ``` The path for.body -> land.rhs -> cond.end should be taken when 'select i1 false, i1 poison, i1 false' holds (which means it's never taken); but VPRecipeBuilder::createEdgeMask was emitting 'and i1 false, poison' instead. The former one successfully blocks poison propagation whereas the latter one doesn't, making the condition poison and thus causing the miscompilation. SimplifyCFG has a similar bug (which didn't expose a real-world bug yet), and a patch for this is also ongoing (see https://reviews.llvm.org/D95026). Reviewed By: bjope Differential Revision: https://reviews.llvm.org/D95217	2021-02-14 21:12:34 +09:00
Teresa Johnson	a80232bd5f	[LTT] Address post-review comments (NFC) Implement some post-review cleanup suggestions for D96083.	2021-02-13 15:52:59 -08:00
Tyker	642e9225c6	reland [InstCombine] convert assumes to operand bundles Instcombine will convert the nonnull and alignment assumption that use the boolean condtion to an assumption that uses the operand bundles when knowledge retention is enabled. Differential Revision: https://reviews.llvm.org/D82703	2021-02-13 13:03:11 +01:00
Wei Wang	80dc0661bd	[LTO] Perform DSOLocal propagation in combined index Perform DSOLocal propagation within summary list of every GV. This avoids the repeated query of this information during function importing. Differential Revision: https://reviews.llvm.org/D96398	2021-02-12 22:58:26 -08:00
Arnold Schwaighofer	e760ec2a01	[coro] Add support for polymorphic return typed coro.suspend.async This allows for suspend point specific resume function types. Return values from a suspend point can therefore be modelled as arguments to the resume function. Allowing for directly passed return types. Differential Revision: https://reviews.llvm.org/D96136	2021-02-12 10:08:00 -08:00
Akira Hatanaka	ed4718eccb	[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR Background: This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it. - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use). - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-12 09:51:57 -08:00
Kerry McLaughlin	fea06efe7c	[SVE][LoopVectorize] Support for vectorization of loops with function calls Changes `getScalarizationOverhead` to return an invalid cost for scalable VFs and adds some simple tests for loops containing a function for which there is a vectorized variant available. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D96356	2021-02-12 13:47:43 +00:00
Sanjay Patel	79b1b4a581	[Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics The vector reduction intrinsics started life as experimental ops, so backend support was lacking. As part of promoting them to 1st-class intrinsics, however, codegen support was added/improved: D58015 D90247 So I think it is safe to now remove this complication from IR. Note that we still have an IR-level codegen expansion pass for these as discussed in D95690. Removing that is another step in simplifying the logic. Also note that x86 was already unconditionally forming reductions in IR, so there should be no difference for x86. I spot checked a couple of the tests here by running them through opt+llc and did not see any asm diffs. If we do find functional differences for other targets, it should be possible to (at least temporarily) restore the shuffle IR with the ExpandReductions IR pass. Differential Revision: https://reviews.llvm.org/D96552	2021-02-12 08:13:50 -05:00
Florian Hahn	85fe5c9345	[VPlan] Make VPRecipeBase inherit from VPUser directly (NFC). The individual recipes have been updated to manage their operands using VPUser a while back. Now that the transition is done, we can instead make VPRecipeBase a VPUser and get rid of the toVPUser helper.	2021-02-12 13:06:58 +00:00
David Sherwood	01b87444cb	[NFC][Analysis] Change struct VecDesc to use ElementCount This patch changes the VecDesc struct to use ElementCount instead of an unsigned VF value, in preparation for future work that adds support for vectorized versions of math functions using scalable vectors. Since all I'm doing in this patch is switching the type I believe it's a non-functional change. I changed getWidestVF to now return both the widest fixed-width and scalable VF values, but currently the widest scalable value will be zero. Differential Revision: https://reviews.llvm.org/D96011	2021-02-12 11:07:58 +00:00
David Sherwood	9700228abc	[Analysis] Change VFABI::mangleTLIVectorName to use ElementCount Adds support for mangling TLI vector names for scalable vectors. Differential Revision: https://reviews.llvm.org/D96338	2021-02-12 09:38:12 +00:00
Kazu Hirata	9dc62d1dc1	[PGO] Drop unnecessary const from return types (NFC)	2021-02-11 23:31:29 -08:00
Hongtao Yu	de40f6d623	[CSSPGO] Process functions in a top-down order on a dynamic call graph. Functions are currently processed by the sample profiler loader in a top-down order defined by the static call graph. The order is being adjusted to be a top-down order based on the input context-sensitive profile. One benefit is that the processing order of caller and callee in one SCC would follow the context order in the profile to favor more inlining. Another benefit is that the processing order of caller and callee through an indirect call (which is not on the static call graph) can be honored which in turn allows for more inlining. The profile top-down order for SCC is also extended to support non-CS profiles. Two switches `-mllvm -use-profile-indirect-call-edges` and `-mllvm -use-profile-top-down-order` are being introduced. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D95988	2021-02-11 12:36:59 -08:00
Michael Kruse	606aa622b2	Revert "[AssumptionCache] Avoid dangling llvm.assume calls in the cache" This reverts commit `b7d870eae7` and the subsequent fix "[Polly] Fix build after AssumptionCache change (D96168)" (commit `e6810cab09`). It caused indeterminism in the output, such that e.g. the polly-x86_64-linux buildbot failed accasionally.	2021-02-11 12:17:38 -06:00
Sander de Smalen	703130fb01	[TTI] Change TargetTransformInfo::getMinimumVF to return ElementCount This will be needed in the loop-vectorizer where the minimum VF requested may be a scalable VF. getMinimumVF now takes an additional operand 'IsScalableVF' that indicates whether a scalable VF is required. Reviewed By: kparzysz, rampitec Differential Revision: https://reviews.llvm.org/D96020	2021-02-11 09:08:48 +00:00
Markus Lavin	9498315c9b	Expand masked mem intrinsics correctly wrt big-endian Need to take endianness into account when doing vector to scalar casts such as %bc = bitcast <8 x i1> %v to i8 Companion commit for https://reviews.llvm.org/D94867 Upload in response to https://lists.llvm.org/pipermail/llvm-dev/2021-January/147862.html Attempting to document the actual memory layout rules for vectors in https://reviews.llvm.org/D94964 Differential Revision: https://reviews.llvm.org/D94765	2021-02-11 08:59:52 +00:00
Sander de Smalen	be9bbb57f4	[LoopVectorize] NFC: Change selectVectorizationFactor to work on ElementCount. This patch is NFC and changes occurrences of `unsigned Width` and `unsigned i` to work on type ElementCount instead. This patch is a preparatory patch with the ultimate goal of making `computeMaxVF()` return both a max fixed VF and a max scalable VF, so that `selectVectorizationFactor()` can pick the most cost-effective vectorization factor. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D96019	2021-02-11 08:47:59 +00:00
Kazu Hirata	d12a0f4fc0	[GCOV] Drop unnecessary const from return types (NFC) Identified with readability-const-return-type.	2021-02-10 20:01:18 -08:00
Duncan P. N. Exon Smith	fa35c1f80f	ValueMapper: Rename RF_MoveDistinctMDs => RF_ReuseAndMutateDistinctMDs, NFC Rename the `RF_MoveDistinctMDs` flag passed into `MapValue` and `MapMetadata` to `RF_ReuseAndMutateDistinctMDs` in order to more precisely describe its effect and clarify the header documentation. Found this while helping to investigate PR48841, which pointed out an unsound use of the flag in `CloneModule()`. For now I've just added a FIXME there, but I'm hopeful that the new (more precise) name will prevent other similar errors.	2021-02-10 16:53:21 -08:00
Benjamin Kramer	8fb4a4f7bb	[SampleFDO] Silence -Wnon-virtual-dtor warning There's no polymorphic deletion happening here.	2021-02-10 23:37:15 +01:00
Rong Xu	db0d7d0ba9	[SampleFDO][NFC] Refactor SampleProfileLoader to reuse in CodeGen Break SampleProfileLoader into to a base and a derived class. Base class (SampleProfileLoaderBaseImpl) includes the common code for IR and MachineIR (CodeGen) sample loader. It will be templatelized in the later patch. Inline and Probe related code will remain in the derived class of SampleProfileLoader and stays in SampleProfile.cpp. We need to refactor some functions: (1) getInstWeight() to enable the code sharing -- put the core into getInstWeightImpl(). (2) emitAnnotation() and propagateWeights() to carve out the code specific to SampleProfileLoader. (3) make getInstWeight() and findFunctionSamples() virtual and override in SampleProfileLoader as they need to access the fields in the derived class. Differential Revision: https://reviews.llvm.org/D95832	2021-02-10 13:29:15 -08:00
Hongtao Yu	1cb47a063e	[CSSPGO] Unblock optimizations with pseudo probe instrumentation. The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking are intentional (such as blocking code merge) for better counts quality while the others are accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include: 1. IR InstCombine, sinking load operation to shorten lifetimes. 2. MIR LiveRangeShrink, similar to #1 3. MIR TwoAddressInstructionPass, i.e, opeq transform 4. MIR function argument copy elision 5. IR stack protection. (though not perf-critical but nice to have). Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D95982	2021-02-10 12:43:17 -08:00
Adrian Prantl	19fc8eede4	Add missing nullptr check. salvageDebugInfoImpl() may fail and return a nullptr.	2021-02-10 12:15:24 -08:00
Sanjay Patel	6e2053983e	[InstCombine] fold lshr(mul X, SplatC), C2 This is a special-case multiply that replicates bits of the source operand. We need this fold to avoid regression if we make canonicalization to `mul` more aggressive for shl+or patterns. I did not see a way to make Alive generalize the bit width condition for even-number-of-bits only, but an example of the proof is: Name: i32 Pre: isPowerOf2(C1 - 1) && log2(C1) == C2 && (C2 * 2 == width(C2)) %m = mul nuw i32 %x, C1 %t = lshr i32 %m, C2 => %t = and i32 %x, C1 - 2 Name: i14 %m = mul nuw i14 %x, 129 %t = lshr i14 %m, 7 => %t = and i14 %x, 127 https://rise4fun.com/Alive/e52	2021-02-10 15:02:31 -05:00
Sander de Smalen	9db6e97a86	[LoopVectorize] NFC: Change computeFeasibleMaxVF to operate on ElementCount. This patch is NFC and changes occurrences of `unsigned MaxVectorSize` to work on type ElementCount. This patch is a preparatory patch with the ultimate goal of making `computeMaxVF()` return both a max fixed VF and a max scalable VF, so that `selectVectorizationFactor()` can pick the most cost-effective vectorization factor. Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D96018	2021-02-10 08:52:10 +00:00
Tyker	5652e192fc	Revert "[InstCombine] convert assumes to operand bundles" This reverts commit `5eb2e994f9`.	2021-02-10 01:32:00 +01:00
Florian Hahn	fd8afa41eb	[VPlan] Use VPUser to manage CondBit VP blocks keep track of a condition, which is a VPValue. This patch updates VPBlockBase to manage the value using VPUser, so replaceAllUsesWith properly updates the condition bit as well. This is required to enable VP2VP transformations and it helps with simplifying some of the code required to manage condition bits. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D95382	2021-02-09 21:53:50 +00:00
Johannes Doerfert	81429abd83	[Attributor][FIX] Do not create UB by introducing a `noundef undef` This was reported as PR49104. The reproducer uses varargs but the issue is the same, we know an argument is dead but can't change the signature for some reason. The PR49104 situation was: We are in an CG-SCC traversal and we remove all the uses of an argument and proof it thereby dead. However, if we do not remove the argument, via signature rewrite, we need to ensure that the `undef` we introduce at the call site doesn't clash with a `noundef` attribute.	2021-02-09 13:02:38 -06:00
Tyker	5eb2e994f9	[InstCombine] convert assumes to operand bundles Instcombine will convert the nonnull and alignment assumption that use the boolean condtion to an assumption that uses the operand bundles when knowledge retention is enabled. Differential Revision: https://reviews.llvm.org/D82703	2021-02-09 19:33:53 +01:00
Jianzhou Zhao	9887fdebd6	[dfsan] Refactor loadShadow To simplify the review of https://reviews.llvm.org/D95835. Reviewed-by: gbalats, morehouse Differential Revision: https://reviews.llvm.org/D96180	2021-02-09 17:21:41 +00:00
Nico Weber	de1966e542	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly" This reverts commit `4a64d8fe39`. Makes clang crash when buildling trivial iOS programs, see comment after https://reviews.llvm.org/D92808#2551401	2021-02-09 11:06:32 -05:00
Chuanqi Xu	88d7876e1e	[NFC] [Coroutine] Remove Unused Variables	2021-02-09 15:55:06 +08:00
Kazu Hirata	302313a264	[Transforms] Use range-based for loops (NFC)	2021-02-08 22:33:53 -08:00
Kazu Hirata	de6c49ae31	[Transforms/Utils] Drop unnecessary const from a return type (NFC) Identified with const-return-type.	2021-02-08 22:33:49 -08:00
Jinsong Ji	9202806241	Revert "[CostModel] Remove VF from IntrinsicCostAttributes" This reverts commit `502a67dd7f`. This expose a failure in test-suite build on PowerPC, revert to unblock buildbot first, Dave will re-commit in https://reviews.llvm.org/D96287. Thanks Dave.	2021-02-09 02:14:14 +00:00
Arthur Eubanks	0eda454796	[SimpleLoopUnswitch] Don't non-trivially unswitch loops that are unsafe to clone Non-trivial unswitching can clone loops. The legacy -loop-unswitch pass also checks for this. Fixes PR49085. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D96288	2021-02-08 13:19:24 -08:00
Jianzhou Zhao	64b448b983	[dfsan] Refactor visitCallBase To simplify the review of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D96177	2021-02-08 19:55:18 +00:00
Florian Hahn	68dc90b347	[ConstraintElimination] Decompose a few more GEP indices. This patch adds handling for zero-extended GEP indices.	2021-02-08 18:06:38 +00:00
Florian Hahn	1f1f037ed3	[ConstraintElimination] Improve index handing during constraint building. This patch improves the index management during constraint building. Previously, the code rejected constraints which used values that were not part of Value2Index, but after combining the coefficients of the new indices were 0 (if ShouldAdd was 0). In those cases, no new indices need to be added. Instead of adding to Value2Index directly, add new indices to the NewIndices map. The caller can then check if it needs to add any new indices. This enables checking constraints like `a + x <= a + n` to `x <= n`, even if there is no constraint for `a` directly.	2021-02-08 13:05:13 +00:00
Florian Hahn	ca268ed285	[ConstraintElimination] Decompose zext for unsigned compares. For unsigned compares, zext should be a no-op and we can add the extended value to the constraint system.	2021-02-07 20:53:06 +00:00
Florian Hahn	3bb6dc0b26	[LV] Replace some uses of VectorLoopValueMap with VPTransformState (NFC) This patch updates some places where VectorLoopValueMap is accessed directly to instead go through VPTransformState. As we move towards managing created values exclusively in VPTransformState, this ensures the use always can fetch the correct value. This is in preparation for D92285, which switches to managing scalarized values through VPValues. In the future, the various fix* functions should be moved directly into the VPlan codegen stage. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D95757	2021-02-07 18:28:21 +00:00
Kazu Hirata	be23012d5a	[Transforms/Utils] Use range-based for loops (NFC)	2021-02-07 09:49:36 -08:00
Sanjay Patel	6fd91be354	[Reassociate] allow or->add with shl operands As discussed in: https://llvm.org/PR49055 We invert instcombine's add->or transform here because it makes it easier to identify factorization transforms like the mul in the motivating test. This extends the logic added with: https://reviews.llvm.org/rG70472f3 https://reviews.llvm.org/rG93f3d7f (I intentionally kept the formatting fix in this patch to provide more context about the calling logic.)	2021-02-07 09:45:19 -05:00
Florian Hahn	853c52c988	[ConstraintElimination] Require GEPs to be inbounds for decomposition. During decomposition, we assume the no-wrap properties of inbound GEPs. Make sure the decomposed GEP is actually inbounds.	2021-02-07 11:08:53 +00:00
Johannes Doerfert	b7d870eae7	[AssumptionCache] Avoid dangling llvm.assume calls in the cache PR49043 exposed a problem when it comes to RAUW llvm.assumes. While D96106 would fix it for GVNSink, it seems a more general concern. To avoid future problems this patch moves away from the vector of weak reference model used in the assumption cache. Instead, we track the llvm.assume calls with a callback handle which will remove itself from the cache if the call is deleted. Fixes PR49043. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D96168	2021-02-06 12:18:39 -06:00
Teresa Johnson	3a27933ec2	[LTT] Don't attempt to lower type tests used only by assumes Type tests used only by assumes were original for devirtualization, but are meant to be kept through the first invocation of LTT so that they can be used for additional optimization. In the regular LTO case where the IR is analyzed we may find a resolution for the type test and end up rewriting the associated vtable global, which can have implications on section splitting. Simply ignore these type tests. Fixes PR48245. Differential Revision: https://reviews.llvm.org/D96083	2021-02-06 09:02:10 -08:00
Sander de Smalen	79a6cfc29e	NFC: Migrate LoopIdiomRecognize to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html	2021-02-06 14:39:19 +00:00
Sander de Smalen	ae27274b2f	NFC: Migrate LoopFlatten to work on InstructionCost. This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D96029	2021-02-06 11:47:04 +00:00
Kazu Hirata	ea3175c15b	[Transforms/Instrumentation] Use range-based for loops (NFC)	2021-02-05 21:02:08 -08:00
Sidharth Baveja	22ebbc4765	LoopUnrollAndJam] Only allow loops with single exit(ing) blocks Summary: This resolves an issue posted on Bugzilla. https://bugs.llvm.org/show_bug.cgi?id=48764 In this issue, the loop had multiple exit blocks, which resulted in the function getExitBlock to return a nullptr, which resulted in hitting the assert. This patch ensures that loops which only have one exit block as allowed to be unrolled and jammed. Reviewed By: Whitney, Meinersbur, dmgreen Differential Revision: https://reviews.llvm.org/D95806	2021-02-05 16:10:53 +00:00
Akira Hatanaka	4a64d8fe39	[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly emitting retainRV or claimRV calls in the IR This reapplies `3fe3946d9a` without the changes made to lib/IR/AutoUpgrade.cpp, which was violating layering. Original commit message: Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.rv" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if the call is annotated with claimRV since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the implicit call is a call to retainRV and does nothing if it's a call to claimRV. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-05 06:09:42 -08:00
Arnold Schwaighofer	8a7f5ad0fd	We can only move static allocas into the resume entry points Dynamic allocas that still exist have been verified to be only used 'locally' not accross a suspend point. rdar://73903220 Differential Revision: https://reviews.llvm.org/D96071	2021-02-05 06:06:10 -08:00
Akira Hatanaka	2fbbb18c1d	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly" This reverts commit `3fe3946d9a`. The commit violates layering by including a header from Analysis in lib/IR/AutoUpgrade.cpp.	2021-02-05 06:00:05 -08:00
Akira Hatanaka	3fe3946d9a	[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly emitting retainRV or claimRV calls in the IR Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.rv" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if the call is annotated with claimRV since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the implicit call is a call to retainRV and does nothing if it's a call to claimRV. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-05 05:55:18 -08:00
Adrian Kuegel	7fe41ac3df	Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute" This reverts commit `3e5ce49e53`. Tests started failing on PPC, for example: http://lab.llvm.org:8011/#/builders/105/builds/5569	2021-02-05 12:51:03 +01:00
Simon Pilgrim	f7d07dbb29	IROutliner.cpp - fix Wdocumentation warning. NFCI. Remove duplicate param	2021-02-05 11:38:09 +00:00
Simon Pilgrim	476b912e7c	SampleProfile.cpp - fix Wdocumentation warning. NFCI. Remove duplicate param	2021-02-05 11:31:17 +00:00
Simon Pilgrim	89edda7084	IROutliner.cpp - fix Wdocumentation warnings. NFCI.	2021-02-05 11:21:00 +00:00
David Green	502a67dd7f	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying work with those two values separately this removes the VF parameter, widening the RetTy/ArgTys by VF used called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. Most backends look like this will be an improvement (or were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-05 09:34:24 +00:00
Kazu Hirata	fb74e1e78a	[Transforms/Scalar] Use range-based for loops (NFC)	2021-02-04 21:18:05 -08:00
Craig Topper	11ef356d9e	[TargetLowering] Use Align in allowsMisalignedMemoryAccesses. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96097	2021-02-04 19:22:06 -08:00
Philip Reames	3e5ce49e53	[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block. The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and which exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed. This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCI-ish prep work, but the changes are a bit too involved for me to feel comfortable tagging the change that way. Differential Revision: https://reviews.llvm.org/D94892	2021-02-04 17:28:30 -08:00
Richard Smith	ab243efb26	Don't infer attributes on '::operator new'. These attributes were all incorrect or inappropriate for LLVM to infer: - inaccessiblememonly is generally wrong; user replacement operator new can access memory that's visible to the caller, as can a new_handler function. - willreturn is generally wrong; a custom new_handler is not guaranteed to terminate. - noalias is inappropriate: Clang has a flag to determine whether this attribute should be present and adds it itself when appropriate. - noundef and nonnull on the return value should be specified by the frontend on all 'operator new' functions if we want them, not here. In any case, inferring attributes on functions declared 'nobuiltin' (as these are when Clang emits them) seems questionable.	2021-02-04 13:59:49 -08:00
Richard Smith	1484ad4137	Revert "[BuildLibcalls, Attrs] Support more variants of C++'s new, add attributes for C++'s delete" Several of the new attributes here were incorrect, and even the ones that are generally correct were being added even to nobuiltin calls. This reverts commit `bb3f169b59`.	2021-02-04 13:59:49 -08:00
Sander de Smalen	75b2555d6e	NFC: Migrate LoopUnrollPass to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: david-arm, fhahn Differential Revision: https://reviews.llvm.org/D95817	2021-02-04 14:05:40 +00:00
Florian Hahn	703f6a6828	[ConstraintElimination] Support conditions from loop preheaders This patch extends the condition collection logic to allow adding conditions from pre-headers to loop headers, by allowing cases where the target block dominates some of its predecessors.	2021-02-04 13:58:32 +00:00
Chuanqi Xu	9511fa2dda	[NFC][Coroutine] Remove redundant comment The functionallity in the TODO was added before: https://reviews.llvm.org/rGb3a722e66b75328ab5e2eb5c8572022cb083855b	2021-02-04 12:54:30 +08:00
Kazu Hirata	be37475897	[Transforms/IPO] Use range-based for loops (NFC)	2021-02-03 20:41:20 -08:00
Nico Weber	b995314143	Revert "[InstrProfiling] Use !associated metadata for counters, data and values" This reverts commit `97ba5cde52`. Still breaks tests: https://reviews.llvm.org/D76802#2540647	2021-02-03 19:14:34 -05:00
Arthur Eubanks	f020544601	[NewPM][HelloWorld] Move HelloWorld to Utils To prevent creating a new component, which creates a new library. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D95907	2021-02-03 12:59:40 -08:00
Rong Xu	b8f13db5b7	[SampleFDO][NFC] Detach SampleProfileLoader from SampleCoverageTracker This patch detaches SampleProfileLoader from class SampleCoverageTracker. We plan to move SampleProfileLoader to a template class. This would remain SampleCoverageTracker as a class. Also make callsiteIsHot() as a file static function. Differential Revision: https://reviews.llvm.org/D95823	2021-02-03 11:38:04 -08:00
Florian Hahn	daaa0e3501	[VPlan] Manage induction value creation using VPValues. This patch updates the induction value creation to use VPValues of recipes to map the created values. This should bring is one step closer to being able to optimize induction recipes directly in VPlan. Currently widenIntOrFpInduction also generates vector values for a cast of the induction, if it exists. Make this explicit by adding the cast instruction to the values defined by the recipe. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D92284	2021-02-03 17:45:03 +00:00
David Sherwood	d4626eb0bd	[VPlan][NFC] Introduce constructors for VPIteration This patch adds constructors to VPIteration as a cleaner way of initialising the struct and replaces existing constructions of the form: {Part, Lane} with VPIteration(Part, Lane) I have also added a default constructor, which is used by VPlan.cpp when deciding whether to replicate a block or not. This refactoring will be required in a later patch that adds more members and functions to VPIteration. Differential Revision: https://reviews.llvm.org/D95676	2021-02-03 08:52:27 +00:00
Petr Hosek	97ba5cde52	[InstrProfiling] Use !associated metadata for counters, data and values C identifier name input sections such as __llvm_prf_* are GC roots so they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the C identifier name semantics. The !associated metadata may be attached to a global object declaration with a single argument that references another global object, and it gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded by the linker, setting up !associated metadata allows linker to discard counters, data and values associated with that function symbol. Note that !associated metadata is only supported by ELF, it does not have any effect on non-ELF targets. Differential Revision: https://reviews.llvm.org/D76802	2021-02-02 23:19:51 -08:00
Kazu Hirata	dc3d5453bc	[Transforms/Utils] Use range-based for loops (NFC)	2021-02-02 22:52:47 -08:00
Florian Hahn	d8e90716df	[ConstraintElimination] Skip pointer casts. We should be able to look through pointer casts that do not impact the value.	2021-02-02 21:25:29 +00:00
Hongtao Yu	3d89b3cbec	[CSSPGO] Introducing distribution factor for pseudo probe. Sample re-annotation is required in LTO time to achieve a reasonable post-inline profile quality. However, we have seen that such LTO-time re-annotation degrades profile quality. This is mainly caused by preLTO code duplication that is done by passes such as loop unrolling, jump threading, indirect call promotion etc, where samples corresponding to a source location are aggregated multiple times due to the duplicates. In this change we are introducing a concept of distribution factor for pseudo probes so that samples can be distributed for duplicated probes scaled by a factor. We hope that optimizations duplicating code well-maintain the branch frequency information (BFI) based on which probe distribution factors are calculated. Distribution factors are updated at the end of preLTO pipeline to reflect an estimated portion of the real execution count. This change also introduces a pseudo probe verifier that can be run after each IR passes to detect duplicated pseudo probes. A saturated distribution factor stands for 1.0. A pesudo probe will carry a factor with the value ranged from 0.0 to 1.0. A 64-bit integral distribution factor field that represents [0.0, 1.0] is associated to each block probe. Unfortunately this cannot be done for callsite probes due to the size limitation of a 32-bit Dwarf discriminator. A 7-bit distribution factor is used instead. Changes are also needed to the sample profile inliner to deal with prorated callsite counts. Call sites duplicated by PreLTO passes, when later on inlined in LTO time, should have the callees’s probe prorated based on the Prelink-computed distribution factors. The distribution factors should also be taken into account when computing hotness for inline candidates. Also, Indirect call promotion results in multiple callisites. The original samples should be distributed across them. This is fixed by adjusting the callisites' distribution factors. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D93264	2021-02-02 11:55:01 -08:00
Fangrui Song	51da12680f	[ConstraintElimination] Fix -Wunused-function in -DLLVM_ENABLE_ASSERTIONS=off build	2021-02-02 10:23:14 -08:00
Jeroen Dobbelaere	50c523a9d4	[InlineFunction] Only update noalias scopes once for an instruction. Inlining sometimes maps different instructions to be inlined onto the same instruction. We must ensure to only remap the noalias scopes once. Otherwise the scope might disappear (at best). This patch ensures that we only replace scopes for which the mapping is known. This approach is preferred over tracking which instructions we already handled in a SmallPtrSet, as that one will need more memory. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95862	2021-02-02 17:57:10 +01:00
Florian Hahn	3e09bc2500	[ConstraintElimination] Add nicer way to dump constraints (NFC). Use ConstraintSystem::dump(Names) to display the result of decomposing a condition.	2021-02-02 16:36:45 +00:00
Wenlei He	1645f465be	[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path. This is resubmit of D95024, with build break and overtighten assertion fixed. Test Plan:	2021-02-02 07:55:08 -08:00
Roman Lebedev	485c4b552b	[InstCombine] Host inversion out of ashr's value operand (PR48995) This is a yet another hint that we will eventually need InstCombineInverter, which would consistently sink inversions, but but for that we'll need to consistently hoist inversions where possible, so let's do that here. Example of a proof: https://alive2.llvm.org/ce/z/78SbDq See https://bugs.llvm.org/show_bug.cgi?id=48995	2021-02-02 17:56:43 +03:00
Tom Weaver	4f1320b77d	Revert "[InstrProfiling] Use !associated metadata for counters, data and values" This reverts commit `df3e39f60b`. introduced failing test instrprof-gc-sections.c causing build bot to fail: http://lab.llvm.org:8011/#/builders/53/builds/1184	2021-02-02 14:19:31 +00:00
Sander de Smalen	3d3ca8f8eb	NFC: Migrate SpeculateAroundPHIs to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: ctetreau Differential Revision: https://reviews.llvm.org/D95353	2021-02-02 13:32:45 +00:00
Sander de Smalen	00da322788	NFC: Migrate SimpleLoopUnswitch to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D95352	2021-02-02 13:32:44 +00:00
Adrian Kuegel	48ca6da9d2	Revert "[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline" This reverts commit `9a03058d63`.	2021-02-02 11:51:04 +01:00
Adrian Kuegel	3a65ec4bf9	Revert "Fix build break from D95024" This reverts commit `09cd849fde`.	2021-02-02 11:51:04 +01:00
David Sherwood	d4d4ceeb8f	[SVE][LoopVectorize] Add masked load/store and gather/scatter support for SVE This patch updates IRBuilder::CreateMaskedGather/Scatter to work with ScalableVectorType and adds isLegalMaskedGather/Scatter functions to AArch64TargetTransformInfo. In addition I've fixed up isLegalMaskedLoad/Store to return true for supported scalar types, since this is what the vectorizer asks for. In LoopVectorize.cpp I've changed LoopVectorizationCostModel::getInterleaveGroupCost to return an invalid cost for scalable vectors, since currently this relies upon using shuffle vector for reversing vectors. In addition, in LoopVectorizationCostModel::setCostBasedWideningDecision I have assumed that the cost of scalarising memory ops is infinitely expensive. I have added some simple masked load/store and gather/scatter tests, including cases where we use gathers and scatters for conditional invariant loads and stores. Differential Revision: https://reviews.llvm.org/D95350	2021-02-02 09:52:39 +00:00
Wenlei He	09cd849fde	Fix build break from D95024	2021-02-02 01:01:12 -08:00
Wenlei He	9a03058d63	[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path. Test Plan: Differential Revision: https://reviews.llvm.org/D95024	2021-02-02 00:34:06 -08:00
Wenlei He	6bae5973c4	[CSSPGO] Call site prioritized inlining for sample PGO This change implemented call site prioritized BFS profile guided inlining for sample profile loader. The new inlining strategy maximize the benefit of context-sensitive profile as mentioned in the follow up discussion of CSSPGO RFC. The change will not affect today's AutoFDO as it's opt-in. CSSPGO now defaults to the new FDO inliner, but can fall back to today's replay inliner using a switch (`-sample-profile-prioritized-inline=0`). Motivation With baseline AutoFDO, the inliner in sample profile loader only replays previous inlining, and the use of profile is only for pruning previous inlining that turned out to be cold. Due to the nature of replay, the FDO inliner is simple with hotness being the only decision factor. It has the following limitations that we're improving now for CSSPGO. - It doesn't take inline candidate size into account. Since it's doing replay, the size growth is bounded by previous CGSCC inlining. With context-sensitive profile, FDO inliner is no longer limited by previous inlining, so we need to take size into account to avoid significant size bloat. - The way it looks at hotness is not accurate. It uses total samples in an inlinee as proxy for hotness, while what really matters for an inline decision is the call site count. This is an unfortunate fall back because call site count and callee entry count are not reliable due to dwarf based correlation, especially for inlinees. Now paired with pseudo-probe, we have accurate call site count and callee's entry count, so we can use that to gauge hotness more accurately. - It treats all call sites from a block as hot as long as there's one call site considered hot. This is normally true, but since total samples is used as hotness proxy, this transitiveness within block magnifies the inacurate hotness heuristic. With pseduo-probe and the change above, this is no longer an issue for CSSPGO. New FDO Inliner Putting all the requirement for CSSPGO together, we need a top-down call site prioritized BFS inliner. Here're reasons why each component is needed. - Top-down: We need a top-down inliner to better leverage context-sensitive profile, so inlining is driven by accurate context profile, and post-inline is also accurate. This is already implemented in https://reviews.llvm.org/D70655. - Size Cap: For top-down inliner, taking function size into account for inline decision alone isn't sufficient to control size growth. We also need to explicitly cap size growth because with top-down inlining, we can grow inliner size significantly with large number of smaller inlinees even if each individually passes the cost/size check. - Prioritize call sites: With size cap, inlining order also becomes important, because if we stop inlining due to size budget limit, we'd want to use budget towards the most beneficial call sites. - BFS inline: Same as call site prioritization, if we stop inlining due to size budget limit, we want a balanced inline tree, rather than going deep on one call path. Note that the new inliner avoids repeatedly evaluating same set of call site, so it should help with compile time too. For this reason, we could transition today's FDO inliner to use a queue with equal priority to avoid wasted reevaluation of same call site (TODO). Speculative indirect call promotion and inlining is also supported now with CSSPGO just like baseline AutoFDO. Tunings and knobs I created tuning knobs for size growth/cap control, and for hot threshold separate from CGSCC inliner. The default values are selected based on initial tuning with CSSPGO. Results Evaluated with an internal LLVM fork couple months ago, plus another change to adjust hot-threshold cutoff for context profile (will send up after this one), the new inliner show ~1% geomean perf win on spec2006 with CSSPGO, while reducing code size too. The measurement was done using train-train setup, MonoLTO w/ new pass manager and pseudo-probe. Note that this is just a starting point - we hope that the new inliner will open up more opportunity with CSSPGO, but it will certainly take more time and effort to make it fully calibrated and ready for bigger workloads (we're working on it). Differential Revision: https://reviews.llvm.org/D94001	2021-02-01 23:46:34 -08:00
Gil Rapaport	d475030dc2	[SCEV] Apply loop guards to divisibility tests Extend applyLoopGuards() to take into account conditions/assumes proving some value %v to be divisible by D by rewriting %v to (%v / D) * D. This lets the loop unroller and the loop vectorizer identify more loops as not requiring remainder loops. Differential Revision: https://reviews.llvm.org/D95521	2021-02-02 08:09:39 +02:00
Petr Hosek	df3e39f60b	[InstrProfiling] Use !associated metadata for counters, data and values C identifier name input sections such as __llvm_prf_* are GC roots so they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the C identifier name semantics. The !associated metadata may be attached to a global object declaration with a single argument that references another global object, and it gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded by the linker, setting up !associated metadata allows linker to discard counters, data and values associated with that function symbol. Note that !associated metadata is only supported by ELF, it does not have any effect on non-ELF targets. Differential Revision: https://reviews.llvm.org/D76802	2021-02-01 15:01:43 -08:00
Hongtao Yu	224fee8219	[CSSPGO] Tweaking inlining with pseudo probes. Fixing up a couple places where `getCallSiteIdentifier` is needed to support pseudo-probe-based callsites. Also fixing an issue in the extbinary profile reader where the metadata section is not fully scanned based on the number of profiles loaded only for the current module. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D95791	2021-02-01 13:56:40 -08:00
Sanjay Patel	bbed5f2f8a	[LoopVectorize] improve IR fast-math-flags propagation in reductions This is another step (see D95452) towards correcting fast-math-flags bugs in vector reductions. There are multiple bugs visible in the test diffs, and this is still not working as it should. We still use function attributes (rather than FMF) to drive part of the logic, but we are not checking for the correct FP function attributes. Note that FMF may not be propagated optimally on selects (example in https://llvm.org/PR35607 ). That's why I'm proposing to union the FMF of a fcmp+select pair and avoid regressions on existing vectorizer tests. Differential Revision: https://reviews.llvm.org/D95690	2021-02-01 16:21:36 -05:00
Florian Hahn	0b28d756af	[ConstraintElimination] Add support for EQ predicates. A == B map to A >= B && A <= B (https://alive2.llvm.org/ce/z/_dwxKn). This extends the constraint construction to return a list of constraints, which can be used to properly de-compose nested AND & OR.	2021-02-01 20:48:31 +00:00
Michael Holman	8bfef78722	[ConstantHoisting] Fix bug where constant materialization could insert into EH pad If the incoming block to a phi node is an EH pad, then we will materialize into an EH pad, which is not supposed to happen. To fix this, I added a check to see if incoming block of a phi node is an EH pad before using it as the insertion point. Differential Revision: https://reviews.llvm.org/D95019	2021-02-01 11:23:56 -08:00
Sanjay Patel	0ce2920f17	[InstCombine] try to narrow min/max intrinsics with constant operand The constant trunc/ext may not be the optimal pre-condition, but I think that handles the common cases. Example of Alive2 proof: https://alive2.llvm.org/ce/z/sREeLC This is another step towards canonicalizing to the intrinsics. Narrowing was identified as source of potential regression for abs(), so we need to handle this for min/max - see: https://llvm.org/PR48816 If this is not enough, we could process intrinsics in the trunc-driven matching in canEvaluateTruncated().	2021-02-01 13:44:13 -05:00
Florian Hahn	ce190e4144	[ConstraintElimination] Negate IR condition directly. Instead of using ConstraintSystem::negate when adding new constraints, flip the condition in IR. The main advantage is that EQ predicates can be represented by 2 constraints, which makes negating based on the constraint tricky. The IR condition can easily negated.	2021-02-01 17:21:40 +00:00
Sander de Smalen	bf294953e7	NFC: Migrate SimplifyCFG to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D95351	2021-02-01 16:14:05 +00:00
Sander de Smalen	880b64aa22	[SimplifyCFG] NFC: Rename static methods to clang-tidy standards. This patch is a precursor to D95351, which changes the signature of these methods.	2021-02-01 16:14:05 +00:00
Cullen Rhodes	8cda227432	[LV] Fix crash when computing max VF too early D90687 introduced a crash: llvm::LoopVectorizationCostModel::computeMaxVF(llvm::ElementCount, unsigned int): Assertion `WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() && "No decisions should have been taken at this point"' failed. when compiling the following C code: typedef struct { char a; } b; b *c; int d, e; int f() { int g = 0; for (; d; d++) { e = 0; for (; e < c[d].a; e++) g++; } return g; } with: clang -Os -target hexagon -mhvx -fvectorize -mv67 testcase.c -S -o - This occurred since prior to D90687 computeFeasibleMaxVF would only be called in computeMaxVF when a scalar epilogue was allowed, but now it's always called. This causes the assert above since computeFeasibleMaxVF collects all viable VFs larger than the default MaxVF, and for each VF calculates the register usage which results in analysis being done the assert above guards against. This can occur in computeFeasibleMaxVF if TTI.shouldMaximizeVectorBandwidth and this target hook is implemented in the hexagon backend to always return true. Reported by @iajbar. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94869	2021-02-01 12:14:59 +00:00
Sander de Smalen	3b8a1d581e	NFC: Migrate SpeculativeExecution to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D95356	2021-02-01 12:13:23 +00:00
Florian Hahn	a9583a1923	[LoopUnswitch] Pacify compiler warnings. Attempt to fix some compiler warnings on some bots after `b8c81fa5c7`.	2021-02-01 09:13:06 +00:00
Florian Hahn	b8c81fa5c7	[LoopUnswitch] Add shortcut if unswitched path is a no-op. If we determine that the invariant path through the loop has no effects, we can directly branch to the exit block, instead to unswitching first. Besides avoiding some extra work (unswitching first, then deleting the loop again) this allows to be more aggressive than regular unswitching with respect to cost-modeling. This approach should always be be desirable. This is similar in spirit to D93734, just that it uses the previously added checks for loop-unswitching. I tried to add the required no-op checks from scratch, as we only check a subset of the loop. There is potential to unify the checks with LoopDeletion, at the cost of adding a predicate whether a block should be considered. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95468	2021-02-01 09:03:30 +00:00
Jeroen Dobbelaere	80cdd30eb9	[LoopPeel] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed. The reduction of a sanitizer build failure when enabling the dominance check (D95335) showed that loop peeling also needs to take care of scope duplication, just like loop unrolling (D92887). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95544	2021-02-01 10:01:17 +01:00
Kazu Hirata	3d1200b9f6	[llvm] Drop unnecessary const from return types (NFC) Identified with const-return-type.	2021-01-31 10:23:43 -08:00
Florian Hahn	39486753d5	[ConstraintElimination] Verify CS and DFSInStack are in sync.(NFC) After the main loop is done, we should have one constraint per item in DFSInStack. Otherwise we added a constraint without a proper DFSInStack item.	2021-01-30 18:27:04 +00:00
Florian Hahn	10c57268c0	[LoopUnswitch] Properly update MSSA if header has non-clobbering stores. This patch fixes updating MemorySSA if the header contains memory defs that do not clobber a duplicated instruction. We need to find the first defining access outside the loop body and use that as defining access of the duplicated instruction. This fixes a crash caused by `bee486851c`.	2021-01-30 13:51:05 +00:00
Kazu Hirata	8ed1636184	[llvm] Use isa instead of dyn_cast (NFC)	2021-01-29 23:23:37 -08:00
Roman Lebedev	c2534a7097	[ShadowStackGCLowering] Preserve Dominator Tree, if avaliable This doesn't help avoid any Dominator Tree recalculations just yet, there's one more pass to go..	2021-01-30 01:14:51 +03:00
Roman Lebedev	a78d8feb48	[LowerConstantIntrinsics] Preserve Dominator Tree, if avaliable	2021-01-30 01:14:50 +03:00
Sriraman Tallam	9a81a4ef79	Emit metadata when instr. profiles hash mismatch occurs. This patch emits "instr_prof_hash_mismatch" function annotation metadata if there is a hash mismatch while applying instrumented profiles. During the PGO optimized build using instrumented profiles, if the CFG of the function has changed since generating the profile, a hash mismatch is encountered. This patch emits this information as annotation metadata. We plan to use this with Propeller which is done at the machine IR level. Propeller is usually applied on top of PGO and a hash mismatch during PGO could be used to detect source drift. Differential Revision: https://reviews.llvm.org/D95495	2021-01-29 12:56:01 -08:00
Florian Hahn	f3a710cade	[LTO] Update splitCodeGen to take a reference to the module. (NFC) splitCodeGen does not need to take ownership of the module, as it currently clones the original module for each split operation. There is an ~4 year old fixme to change that, but until this is addressed, the function can just take a reference to the module. This makes the transition of LTOCodeGenerator to use LTOBackend a bit easier, because under some circumstances, LTOCodeGenerator needs to write the original module back after codegen. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D95222	2021-01-29 11:53:11 +00:00
Yang Fan	59bd2068e9	[NFC][ScalarizeMaskedMemIntrin] Fix unused variable warning GCC warning: ``` /llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp: In function ‘void scalarizeMaskedStore(llvm::CallInst, llvm::DomTreeUpdater, bool&)’: /llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp:295:15: warning: variable ‘IfBlock’ set but not used [-Wunused-but-set-variable] 295 \| BasicBlock IfBlock = CI->getParent(); \| ^~~~~~~ /llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp: In function ‘void scalarizeMaskedScatter(llvm::CallInst, llvm::DomTreeUpdater, bool&)’: /llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp:555:15: warning: variable ‘IfBlock’ set but not used [-Wunused-but-set-variable] 555 \| BasicBlock IfBlock = CI->getParent(); \| ^~~~~~~ ```	2021-01-29 15:15:58 +08:00
Roman Lebedev	056385921d	[ScalarizeMaskedMemIntrin] Preserve Dominator Tree, if avaliable This de-pessimizes the arguably more usual case of no masked mem intrinsics, and gets rid of one more Dominator Tree recalculation. As per llvm/test/CodeGen/X86/opt-pipeline.ll, there's one more Dominator Tree recalculation left, we could get rid of.	2021-01-29 01:11:36 +03:00
Roman Lebedev	577fdcaa93	[PartiallyInlineLibCalls] Preserve Dominator Tree, if avaliable This doesn't get rid of any Dominator Tree recalculations just yet, there is one more pass to update..	2021-01-29 01:11:36 +03:00
Roman Lebedev	573f74117b	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedCompressStore(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:35 +03:00
Roman Lebedev	2e4bb3f119	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedExpandLoad(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:35 +03:00
Roman Lebedev	e8efc03a1e	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedScatter(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:35 +03:00
Roman Lebedev	1356399a11	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedGather(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:34 +03:00
Roman Lebedev	22b8421156	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedStore(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:34 +03:00
Roman Lebedev	0ea45a412a	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedLoad(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:34 +03:00
Roman Lebedev	394685481c	[NFC][PartiallyInlineLibCalls] Port to SplitBlockAndInsertIfThen() This makes follow-up patch for Dominator Tree preservation somewhat more straight-forward.	2021-01-29 01:11:33 +03:00
Roman Lebedev	2de2d84ed0	[NFC][EntryExitInstrumenter] Mark Dominator Tree as preserved in legacy-PM too This is correctly handled in new-PM wrappers, but not in old-PM.	2021-01-29 01:11:33 +03:00
Adrian Prantl	62140d943c	Better document the limitations of coro::salvageDebugInfo() and fix a few edge cases that show up in the Swift compiler but weren't caught by the existing tests. Most notably the old code wasn't salvaging load operations correctly. The patch also gets rid of the LoadFromFramePtr argument and replaces it with a more generalized mechanism.	2021-01-28 09:53:19 -08:00
Roman Lebedev	8cfa963463	[SimplifyCFG] If provided, preserve Dominator Tree SimplifyCFG is an utility pass, and the fact that it does not preserve DomTree's, forces it's users to somehow workaround that, likely by not preserving DomTrees's themselves. Indeed, simplifycfg pass didn't know how to preserve dominator tree, it took me just under a month (starting with `e113317958`) do rectify that, now it fully knows how to, there's likely some problems with that still, but i've dealt with everything i can spot so far. I think we now can flip the switch. Note that this is functionally an NFC change, since this doesn't change the users to pass in the DomTree, that is a separate question. Reviewed By: kuhar, nikic Differential Revision: https://reviews.llvm.org/D94827	2021-01-28 14:11:34 +03:00
Yang Fan	8644eb024b	[NFC][Transforms][Coroutines] Remove unused variable	2021-01-28 16:42:30 +08:00
Kazu Hirata	0da15ea581	[llvm] Use append_range (NFC)	2021-01-27 23:25:41 -08:00
Hongtao Yu	7e99bddfea	[CSSPGO] Support of CS profiles in extended binary format. This change brings up support of context-sensitive profiles in the format of extended binary. Existing sample profile reader/writer/merger code is being tweaked to reflect the fact of bracketed input contexts, like (`[...]`). The paired brackets are also needed in extbinary profiles because we don't yet have an otherwise good way to tell calling contexts apart from regular function names since the context delimiter `@` can somehow serve as a part of the C++ mangled names. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D95547	2021-01-27 21:29:46 -08:00
Teresa Johnson	1487747e99	[LTO] Prevent devirtualization for symbols dynamically exported Identify dynamically exported symbols (--export-dynamic[-symbol=], --dynamic-list=, or definitions needed to preempt shared objects) and prevent their LTO visibility from being upgraded. This helps avoid use of whole program devirtualization when there may be overrides in dynamic libraries. Differential Revision: https://reviews.llvm.org/D91583	2021-01-27 15:54:13 -08:00
Sanjay Patel	ab93c18c12	[LoopVectorize] use IR fast-math-flags exclusively (not FP function attributes) I am trying to untangle the fast-math-flags propagation logic in the vectorizers (see `a6f022127` for SLP). The loop vectorizer has a mix of checking FP function attributes, IR-level FMF, and just wrong assumptions. I am trying to avoid regressions while fixing this, and I think the IR-level logic is good enough for that, but it's hard to say for sure. This would be the 1st step in the clean-up. The existing test that I changed to include 'fast' actually shows a miscompile: the function only had the equivalent of nnan, but we created new instructions that had fast (all FMF set). This is similar to the example in https://llvm.org/PR35538 Differential Revision: https://reviews.llvm.org/D95452	2021-01-27 14:17:11 -05:00
Fangrui Song	54fb3ca96e	[ThinLTO] Add Visibility bits to GlobalValueSummary::GVFlags Imported functions and variable get the visibility from the module supplying the definition. However, non-imported definitions do not get the visibility from (ELF) the most constraining visibility among all modules (Mach-O) the visibility of the prevailing definition. This patch * adds visibility bits to GlobalValueSummary::GVFlags * computes the result visibility and propagates it to all definitions Protected/hidden can imply dso_local which can enable some optimizations (this is stronger than GVFlags::DSOLocal because the implied dso_local can be leveraged for ELF -shared while default visibility dso_local has to be cleared for ELF -shared). Note: we don't have summaries for declarations, so for ELF if a declaration has the most constraining visibility, the result visibility may not be that one. Differential Revision: https://reviews.llvm.org/D92900	2021-01-27 10:43:51 -08:00
Florian Hahn	28410d17f5	[LoopUtils] Pass SCEVExpander instead SE to addRuntimeChecks. This gives the user control over which expander to use, which in turn allows the user to decide what to do with the expanded instructions. Used in D75980. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D94295	2021-01-27 17:36:19 +00:00
Petr Hosek	bb9eb19829	Support for instrumenting only selected files or functions This change implements support for applying profile instrumentation only to selected files or functions. The implementation uses the sanitizer special case list format to select which files and functions to instrument, and relies on the new noprofile IR attribute to exclude functions from instrumentation. Differential Revision: https://reviews.llvm.org/D94820	2021-01-26 17:13:34 -08:00
Adrian Prantl	0554541b44	Salvage debug info for function arguments in coro-split funclets. This patch improves the availability for variables stored in the coroutine frame by emitting an alloca to hold the pointer to the frame object and rewriting dbg.declare intrinsics to point inside the frame object using salvaged DIExpressions. Finally, a new alloca is created in the funclet to hold the FramePtr pointer to ensure that it is available throughout the entire function at -O0. This path also effectively reverts D90772. The testcase updates highlight nicely how every removed CHECK for a dbg.value is preceded by a new CHECK for a dbg.declare. Thanks to JunMa, Yifeng, and Bruno for their thoughtful reviews! Differential Revision: https://reviews.llvm.org/D93497 rdar://71866936	2021-01-26 15:01:26 -08:00
Bjorn Pettersson	a9bd3d37bd	[NewPM] Add ExtraVectorizerPasses support As it looks like NewPM generally is using SimpleLoopUnswitch instead of LoopUnswitch, this patch also use SimpleLoopUnswitch in the ExtraVectorizerPasses sequence (compared with LegacyPM which use the LoopUnswitch pass). Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D95457	2021-01-26 22:59:10 +01:00
Valery N Dmitriev	716b9dd0d8	[InstCombine] Preserve FMF for powi simplifications. Differential Revision: https://reviews.llvm.org/D95455	2021-01-26 13:26:06 -08:00
Petr Hosek	1e634f3952	Revert "Support for instrumenting only selected files or functions" This reverts commit `4edf35f11a` because the test fails on Windows bots.	2021-01-26 12:25:28 -08:00
Petr Hosek	4edf35f11a	Support for instrumenting only selected files or functions This change implements support for applying profile instrumentation only to selected files or functions. The implementation uses the sanitizer special case list format to select which files and functions to instrument, and relies on the new noprofile IR attribute to exclude functions from instrumentation. Differential Revision: https://reviews.llvm.org/D94820	2021-01-26 11:11:39 -08:00
Sanjay Patel	09b1c56366	[LoopUtils] do not initialize Cmp predicate unnecessarily; NFC The switch must set the predicate correctly; anything else should lead to unreachable/assert. I'm trying to fix FMF propagation here and the callers, so this is a preliminary cleanup.	2021-01-26 11:22:51 -05:00
Florian Hahn	1272f16d14	[LoopUnswitch] Avoid partially unswitching too aggressively. This patch adds additional checks to avoid partial unswitching in cases where it won't be profitable, e.g. because the path directly exits the loop anyways.	2021-01-26 15:18:41 +00:00
Florian Hahn	35b3989a30	[Passes] Run peeling as part of simple/full loop unrolling. Loop peeling removes conditions from loop bodies that become invariant after a small number of iterations. When triggered, this leads to fewer compares and possibly PHIs in loop bodies, enabling further optimizations. The current cost-model of loop peeling should be quite conservative/safe, i.e. only peel if a condition in the loop becomes known after peeling. For example, see PR47671, where loop peeling enables vectorization by removing a PHI the vectorizer does not understand. Granted, the loop-vectorizer could also be taught about constant PHIs, but loop peeling is likely to enable other optimizations as well. This has an impact on quite a few benchmarks from MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto, for example Same hash: 186 (filtered out) Remaining: 51 Metric: loop-vectorize.LoopsVectorized Program base patch diff test-suite...ve-susan/automotive-susan.test 8.00 9.00 12.5% test-suite...nal/skidmarks10/skidmarks.test 35.00 31.00 -11.4% test-suite...lications/sqlite3/sqlite3.test 41.00 43.00 4.9% test-suite...s/ASC_Sequoia/AMGmk/AMGmk.test 25.00 26.00 4.0% test-suite...006/450.soplex/450.soplex.test 88.00 89.00 1.1% test-suite...TimberWolfMC/timberwolfmc.test 120.00 119.00 -0.8% test-suite.../CINT2006/403.gcc/403.gcc.test 215.00 216.00 0.5% test-suite...006/447.dealII/447.dealII.test 957.00 958.00 0.1% test-suite...ternal/HMMER/hmmcalibrate.test 75.00 75.00 0.0% Same hash: 186 (filtered out) Remaining: 51 Metric: loop-vectorize.LoopsAnalyzed Program base patch diff test-suite...ks/Prolangs-C/agrep/agrep.test 440.00 434.00 -1.4% test-suite...nal/skidmarks10/skidmarks.test 312.00 308.00 -1.3% test-suite...marks/7zip/7zip-benchmark.test 6399.00 6323.00 -1.2% test-suite...lications/minisat/minisat.test 134.00 135.00 0.7% test-suite...rks/FreeBench/pifft/pifft.test 295.00 297.00 0.7% test-suite...TimberWolfMC/timberwolfmc.test 1879.00 1869.00 -0.5% test-suite...pplications/treecc/treecc.test 689.00 691.00 0.3% test-suite...T2000/300.twolf/300.twolf.test 1593.00 1597.00 0.3% test-suite.../Benchmarks/Bullet/bullet.test 1394.00 1392.00 -0.1% test-suite...ications/JM/ldecod/ldecod.test 1431.00 1429.00 -0.1% test-suite...6/464.h264ref/464.h264ref.test 2229.00 2230.00 0.0% test-suite...lications/sqlite3/sqlite3.test 2590.00 2589.00 -0.0% test-suite...ications/JM/lencod/lencod.test 2732.00 2733.00 0.0% test-suite...006/453.povray/453.povray.test 3395.00 3394.00 -0.0% Note the -11% regression in number of loops vectorized for skidmarks. I suspect this corresponds to the fact that those loops are gone now (see the reduction in number of loops analyzed by LV). Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D88471	2021-01-26 13:52:30 +00:00
Sergey Dmitriev	13cedcaf45	[llvm-link] Fix crash when materializing appending global This patch fixes llvm-link crash when materializing global variable with appending linkage and initializer that depends on another global with appending linkage. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D95329	2021-01-25 18:08:07 -08:00
modimo	ce7f9cdb50	[InlineAdvisor] Allow replay of inline decisions for the CGSCC inliner from optimization remarks This change leverages the work done in D83743 to replay in the SampleProfile inliner to also be used in the CGSCC inliner. NOTE: currently restricted to non-ML advisors only. The added switch `-cgscc-inline-replay=<remarks file>` will replay the inlining decisions in that file where the remarks file is generated via `-Rpass=inline`. The aim here is to make it easier to analyze changes that would modify inlining heuristics to be separated from this behavior. Doing so allows easier examination of assembly and runtime behavior compared to the baseline rather than trying to dig through the large churn caused by inlining. In LTO compilation, since inlining is done twice you can separately specify replay by passing the flag to the FE (`-cgscc-inline-replay=`) and to the linker (`-Wl,cgscc-inline-replay=`) with the remarks generated from their respective places. Testing on mysqld by comparing the inline decisions between base (generates remarks.txt) and diff (replay using identical input/tools with remarks.txt) and examining the inlining sites with `diff` shows 14,000 mismatches out of 247,341 for a ~94% replay accuracy. I believe this gap can be narrowed further though for the general case we may never achieve full accuracy. For my personal use, this is close enough to be representative: I set the baseline as the one generated by the replay on identical input/toolset and compare that to my modified input/toolset using the same replay. Testing: ninja check-llvm newly added test correctly replays CGSCC inlining decisions Reviewed By: mtrofin, wenlei Differential Revision: https://reviews.llvm.org/D94334	2021-01-25 15:38:57 -08:00
Nikita Popov	835104a114	[LSR] Drop potentially invalid nowrap flags when switching to post-inc IV (PR46943) When LSR converts a branch on the pre-inc IV into a branch on the post-inc IV, the nowrap flags on the addition may no longer be valid. Previously, a poison result of the addition might have been ignored, in which case the program was well defined. After branching on the post-inc IV, we might be branching on poison, which is undefined behavior. Fix this by discarding nowrap flags which are not present on the SCEV expression. Nowrap flags on the SCEV expression are proven by SCEV to always hold, independently of how the expression will be used. This is essentially the same fix we applied to IndVars LFTR, which also performs this kind of pre-inc to post-inc conversion. I believe a similar problem can also exist for getelementptr inbounds, but I was not able to come up with a problematic test case. The inbounds case would have to be addressed in a differently anyway (as SCEV does not track this property). Fixes https://bugs.llvm.org/show_bug.cgi?id=46943. Differential Revision: https://reviews.llvm.org/D95286	2021-01-25 23:13:48 +01:00
Richard Smith	925ae8c790	Revert "[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV" This reverts commit `53176c1680`, which introduceed a layering violation. LLVM's IR library can't include headers from Analysis.	2021-01-25 13:53:38 -08:00
Akira Hatanaka	53176c1680	[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV or claimRV calls in the IR Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end annotates calls with attribute "clang.arc.rv"="retain" or "clang.arc.rv"="claim", which indicates the call is implicitly followed by a marker instruction and a retainRV/claimRV call that consumes the call result. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the annotated calls in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the annotated calls. It doesn't remove the attribute on the call since the backend needs it to emit the marker instruction. The retainRV/claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes the autoreleaseRV call in the callee that returns the result if nothing in the callee prevents it from being paired up with the calls annotated with "clang.arc.rv"="retain/claim" in the caller. If the call is annotated with "claim", a release call is inserted since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the attributes to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV call returning the callee result, which makes it impossible to pair it up with the retainRV or claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the call is annotated with "retain" and does nothing if it's annotated with "claim". - This patch teaches dead argument elimination pass not to change the return type of a function if any of the calls to the function are annotated with attribute "clang.arc.rv". This is necessary since the pass can incorrectly determine nothing in the IR uses the function return, which can happen since the front-end no longer explicitly emits retainRV/claimRV calls in the IR, and change its return type to 'void'. Future work: - Use the attribute on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the attributes. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-01-25 11:57:08 -08:00
Florian Hahn	76afbf60ed	[VPlan] Replace uses with new value in VPInstructionsToVPRecipe (NFC). Now that VPRecipeBase inherits from VPDef, we can always use the new VPValue for replacement, if the recipe defines one. Given the recipes that are supported at the moment, all new recipes must have either 0 or 1 defined values.	2021-01-25 19:38:08 +00:00
Nick Desaulniers	d36812892c	[GVN] do not repeat PRE on failure to split critical edge Fixes an infinite loop encountered in GVN. GVN will delay PRE if it encounters critical edges, attempt to split them later via calls to SplitCriticalEdge(), then restart. The caller of GVN::splitCriticalEdges() assumed a return value of true meant that critical edges were split, that the IR had changed, and that PRE should be re-attempted, upon which we loop infinitely. This was exposed after D88438, by compiling the Linux kernel for s390, but the test case is reproducible on x86. Fixes: https://github.com/ClangBuiltLinux/linux/issues/1261 Reviewed By: void Differential Revision: https://reviews.llvm.org/D94996	2021-01-25 11:23:44 -08:00
Wei Mi	c9cd9a0066	[SampleFDO] Report error when reading a bad/incompatible profile instead of turning off SampleFDO silently. Currently sample loader pass turns off SampleFDO optimization silently when it sees error in reading the profile. This behavior will defeat the tests which could have caught those bad/incompatible profile problems. This patch change the behavior to report error. Differential Revision: https://reviews.llvm.org/D95269	2021-01-25 10:28:23 -08:00
Xun Li	17c3538aef	Revert "Fix unused variable in CoroFrame.cpp when building Release with GCC 10" This reverts commit `ff5e896425`.	2021-01-25 08:37:45 -08:00
Florian Hahn	3201274dea	[VPlan] Handle scalarized values in VPTransformState. This patch adds plumbing to handle scalarized values directly in VPTransformState. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D92282	2021-01-25 14:21:56 +00:00
Sanjay Patel	09a136bcc6	[InstCombine] narrow min/max intrinsics with extended inputs We can sink extends after min/max if they match and would not change the sign-interpreted compare. The only combo that doesn't work is zext+smin/smax because the zexts could change a negative number into positive: https://alive2.llvm.org/ce/z/D6sz6J Sext+umax/umin works: define i32 @src(i8 %x, i8 %y) { %0: %sx = sext i8 %x to i32 %sy = sext i8 %y to i32 %m = umax i32 %sx, %sy ret i32 %m } => define i32 @tgt(i8 %x, i8 %y) { %0: %m = umax i8 %x, %y %r = sext i8 %m to i32 ret i32 %r } Transformation seems to be correct!	2021-01-25 07:52:50 -05:00
Sander de Smalen	171d12489f	[SLPVectorizer] NFC: Migrate getVectorCallCosts to use InstructionCost. This change also changes getReductionCost to return InstructionCost, and it simplifies two expressions by removing a redundant 'isValid' check.	2021-01-25 12:27:01 +00:00
Nikita Popov	8b9df70bf7	[Utils] Use NoAliasScopeDeclInst in a few more places (NFC) In the cloning infrastructure, only track an MDNode mapping, without explicitly storing the Metadata mapping, same as is done during inlining. This makes things slightly simpler.	2021-01-24 16:24:11 +01:00
Sanjay Patel	77adbe6a8c	[SLP] fix fast-math requirements for fmin/fmax reductions `a6f0221276` enabled intersection of FMF on reduction instructions, so it is safe to ease the check here. There is still some room to improve here - it looks like we have nearly duplicate flags propagation logic inside of the LoopUtils helper but it is limited targets that do not form reduction intrinsics (they form the shuffle expansion).	2021-01-24 08:55:56 -05:00
Jeroen Dobbelaere	dcc7706fcf	[InstCombine] Remove unused llvm.experimental.noalias.scope.decl A @llvm.experimental.noalias.scope.decl is only useful if there is !alias.scope and !noalias metadata that uses the declared scope. When that is not the case for at least one of the two, the intrinsic call can as well be removed. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95141	2021-01-24 13:55:50 +01:00
Jeroen Dobbelaere	659c7bcde6	[LoopRotate] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed Similar to D92887, LoopRotation also needs duplicate the noalias scopes when rotating a `@llvm.experimental.noalias.scope.decl` across a block boundary. This is based on the version from the Full Restrict paches (D68511). The problem it fixes also showed up in Transforms/Coroutines/ex5.ll after D93040 (when enabling strict checking with -verify-noalias-scope-decl-dom). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94306	2021-01-24 13:53:13 +01:00
Jeroen Dobbelaere	774629641b	[LoopUnroll] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed This is a fix for https://bugs.llvm.org/show_bug.cgi?id=39282. Compared to D90104, this version is based on part of the full restrict patched (D68484) and uses the `@llvm.experimental.noalias.scope.decl` intrinsic to track the location where !noalias and !alias.scope scopes have been introduced. This allows us to only duplicate the scopes that are really needed. Notes: - it also includes changes and tests from D90104 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D92887	2021-01-24 13:48:20 +01:00
Roman Lebedev	6f2753273e	[NFC][SimplifyCFG] Extract CloneInstructionsIntoPredecessorBlockAndUpdateSSAUses() out of PerformBranchToCommonDestFolding() To be used in PerformValueComparisonIntoPredecessorFolding()	2021-01-24 00:54:55 +03:00
Roman Lebedev	67f9c87a65	[NFC][SimplifyCFG] Perform early-continue in FoldValueComparisonIntoPredecessors() per-pred loop	2021-01-24 00:54:54 +03:00
Roman Lebedev	a4e6c2e647	[NFC][SimplifyCFG] Extract PerformValueComparisonIntoPredecessorFolding() out of FoldValueComparisonIntoPredecessors() Less nested code is much easier to follow and modify.	2021-01-24 00:54:54 +03:00
Nikita Popov	c83cff45c7	[IR] Add NoAliasScopeDeclInst (NFC) Add an intrinsic type class to represent the llvm.experimental.noalias.scope.decl intrinsic, to make code working with it a bit nicer by hiding the metadata extraction from view.	2021-01-23 22:40:32 +01:00
Kazu Hirata	1238378f18	[llvm] Use pop_back_val (NFC)	2021-01-23 10:56:33 -08:00
Florian Hahn	d60b74c28a	[InstCombine] Set MadeIRChange in replaceInstUsesWith. Some utilities used by InstCombine, like SimplifyLibCalls, may add new instructions and replace the uses of a call, but return nullptr because the inserted call produces multiple results. Previously, the replaced library calls would get removed by InstCombine's deleter, but after `292077072e` this may not happen, if the willreturn attribute is missing. As a work-around, update replaceInstUsesWith to set MadeIRChange, if it replaces any uses. This catches the cases where it is used as replacer by utilities used by InstCombine and seems useful in general; updating uses will modify the IR. This fixes an expensive-check failure when replacing @__sinpif/@__cospifi with @__sincospif_sret.	2021-01-23 17:52:59 +00:00
Sanjay Patel	a6f0221276	[SLP] fix fast-math-flag propagation on FP reductions As shown in the test diffs, we could miscompile by propagating flags that did not exist in the original code. The flags required for fmin/fmax reductions will be fixed in a follow-up patch.	2021-01-23 11:17:20 -05:00
Florian Hahn	292077072e	[Local] Treat calls that may not return as being alive. With the addition of the `willreturn` attribute, functions that may not return (e.g. due to an infinite loop) are well defined, if they are not marked as `willreturn`. This patch updates `wouldInstructionBeTriviallyDead` to not consider calls that may not return as dead. This patch still provides an escape hatch for intrinsics, which are still assumed as willreturn unconditionally. It will be removed once all intrinsics definitions have been reviewed and updated. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94106	2021-01-23 16:05:14 +00:00
Roman Lebedev	022da61f6b	[SimplifyCFG] Change 'LoopHeaders' to be ArrayRef<WeakVH>, not a naked set, thus avoiding dangling pointers If i change it to AssertingVH instead, a number of existing tests fail, which means we don't consistently remove from the set when deleting blocks, which means newly-created blocks may happen to appear in that set if they happen to occupy the same memory chunk as did some block that was in the set originally. There are many places where we delete blocks, and while we could probably consistently delete from LoopHeaders when deleting a block in transforms located in SimplifyCFG.cpp itself, transforms located elsewhere (Local.cpp/BasicBlockUtils.cpp) also may delete blocks, and it doesn't seem good to teach them to deal with it. Since we at most only ever delete from LoopHeaders, let's just delegate to WeakVH to do that automatically. But to be honest, personally, i'm not sure that the idea behind LoopHeaders is sound.	2021-01-23 16:48:35 +03:00
Jeroen Dobbelaere	2b9a834c43	[InlineFunction] Use llvm.experimental.noalias.scope.decl for noalias arguments. Insert a llvm.experimental.noalias.scope.decl intrinsic that identifies where a noalias argument was inlined. This patch includes some refactorings from D90104. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93040	2021-01-23 12:10:57 +01:00
Zequan Wu	867bdfeff1	[InstCombine] remove incompatible attribute when simplifying some lib calls Like D95088, remove incompatible attribute in more lib calls. Differential Revision: https://reviews.llvm.org/D95278	2021-01-22 17:27:36 -08:00
Philip Reames	ef51eed37b	[LoopDeletion] Handle inner loops w/untaken backedges This builds on the restricted after initial revert form of D93906, and adds back support for breaking backedges of inner loops. It turns out the original invalidation logic wasn't quite right, specifically around the handling of LCSSA. When breaking the backedge of an inner loop, we can cause blocks which were in the outer loop only because they were also included in a sub-loop to be removed from both loops. This results in the exit block set for our original parent loop changing, and thus a need for new LCSSA phi nodes. This case happens when the inner loop has an exit block which is also an exit block of the parent, and there's a block in the child which reaches an exit to said block without also reaching an exit to the parent loop. (I'm describing this in terms of the immediate parent, but the problem is general for any transitive parent in the nest.) The approach implemented here involves a potentially expensive LCSSA rebuild. Perf testing during review didn't show anything concerning, but we may end up needing to revert this if anyone encounters a practical compile time issue. Differential Revision: https://reviews.llvm.org/D94378	2021-01-22 16:31:29 -08:00
Francis Visoiu Mistrih	0cc38acfc4	[Matrix] Propagate shape information through fneg Similar to binary operators like fadd/fmul/fsub, propagate shape info through unary operators (fneg is the only one?). Differential Revision: https://reviews.llvm.org/D95252	2021-01-22 14:34:28 -08:00
Roman Lebedev	1742203844	[SimplifyCFG] FoldBranchToCommonDest(): re-lift restrictions on liveout uses of bonus instructions I have previously tried doing that in `b33fbbaa34` / `d38205144f`, but eventually it was pointed out that the approach taken there was just broken wrt how the uses of bonus instructions are updated to account for the fact that they should now use either bonus instruction or the cloned bonus instruction. In particluar, all that manual handling of PHI nodes in successors was just wrong. But, the fix is actually much much simpler than my initial approach: just tell SSAUpdate about both instances of bonus instruction, and let it deal with all the PHI handling. Alive2 confirms that the reproducers from the original bugs (@pr48450*) are now handled correctly. This effectively reverts commit `59560e8589`, effectively relanding `b33fbbaa34`.	2021-01-23 01:29:05 +03:00
Roman Lebedev	eae1cc0de5	[NFC][SimplifyCFG] PerformBranchToCommonDestFolding(): move instruction cloning to after CFG update This simplifies follow-up patch, and is NFC otherwise.	2021-01-23 01:29:04 +03:00
Roman Lebedev	9bd8bcf993	[NFC][SimplifyCFG] PerformBranchToCommonDestFolding(): fix instruction name preservation NewBonusInst just took name from BonusInst, so BonusInst has no name, so BonusInst.getName() makes no sense. So we need to ask NewBonusInst for the name.	2021-01-23 01:29:03 +03:00
Shimin Cui	99a0aa07e9	[Analysis] Support AIX vec_malloc routines This is to support the memory routines vec_malloc, vec_calloc, vec_realloc, and vec_free. These routines manage memory that is 16-byte aligned. And they are only available on AIX. Differential Revision: https://reviews.llvm.org/D94710	2021-01-22 16:03:01 -05:00
Nikita Popov	45b259f995	[SimplifyLibCalls] Skip unused calls in sincos transform If the call result is unused, we should let it get DCEd rather than replacing it. Also, don't try to replace an existing sincos with another one (unless it's as part of combining sin and cos). This avoids an infinite combine loop if the calls are not DCEd as expected, which can happen with D94106 and lack of willreturn annotation in hand-crafted IR.	2021-01-22 20:57:13 +01:00
Sanjay Patel	411c144e4c	[InstCombine] narrow abs with sign-extended input In the motivating cases from https://llvm.org/PR48816 , we have a trailing trunc. But that is not required to reduce the abs width: https://alive2.llvm.org/ce/z/ECaz-p ...as long as we clear the int-min-is-poison bit (nsw). We have some existing tests that are affected, and I'm not sure what the overall implications are, but in general we favor narrowing operations over preserving nsw/nuw. If that causes problems, we could restrict this transform based on type (shouldChangeType() and/or vector vs. scalar). Differential Revision: https://reviews.llvm.org/D95235	2021-01-22 13:36:04 -05:00
Florian Hahn	86991d3231	[LoopUnswitch] Fix logic to avoid unswitching with atomic loads. The existing code did not deal with atomic loads correctly. Such loads are represented as MemoryDefs. Bail out on any MemoryAccess that is not a MemoryUse.	2021-01-22 15:10:12 +00:00
Arnold Schwaighofer	87b628dadd	[coro.async] Make sure we process async coroutines Because we were not looking for the llvm.coro.id.async intrinsic in the early coro pass which triggers follow-up passes we relied on the llvm.coro.end intrinsic being present. This might not be the case in functions that end in unreachable code. Differential Revision: https://reviews.llvm.org/D95144	2021-01-22 07:04:01 -08:00
Roman Lebedev	85e7578c6d	Revert "[NFCI-ish][SimplifyCFG] FoldBranchToCommonDest(): really don't deal with uncond branches" Does not build in XCode: http://green.lab.llvm.org/green/job/clang-stage1-RA/17963/consoleFull#-1704658317a1ca8a51-895e-46c6-af87-ce24fa4cd561 This reverts commit `aabed3718a`.	2021-01-22 17:37:11 +03:00
Roman Lebedev	d1a6f92fd5	[InstCombine] Fold `(~x) \| y` --> `~(x & (~y))` iff it is free to do so Iff we know we can get rid of the inversions in the new pattern, we can thus get rid of the inversion in the old pattern, this decreasing instruction count. Note that we could position this transformation as just hoisting of the `not` (still, iff y is freely negatible), but the test changes show a number of regressions, so let's not do that.	2021-01-22 17:23:54 +03:00
Roman Lebedev	79b0d21ce9	[InstCombine] Fold `(~x) & y` --> `~(x \| (~y))` iff it is free to do so Iff we know we can get rid of the inversions in the new pattern, we can thus get rid of the inversion in the old pattern, this decreasing instruction count.	2021-01-22 17:23:54 +03:00
Roman Lebedev	4ed0d8f2f0	[NFC][InstCombine] Extract freelyInvertAllUsersOf() out of canonicalizeICmpPredicate() I'd like to use it in an upcoming fold.	2021-01-22 17:23:53 +03:00
Roman Lebedev	efeb8caf8b	[NFC][SimplifyCFG] FoldBranchToCommonDest(): extract the actual transform into helper function I'm intentionally structuring it this way, so that the actual fold only does the fold, and no legality/correctness checks, all of which must be done by the caller. This allows for the fold code to be more compact and more easily grokable.	2021-01-22 17:23:53 +03:00
Roman Lebedev	b482560a59	[NFC][SimplifyCFG] FoldBranchToCommonDest(): extract check for destination sharing into a helper function As a follow-up, i'll extract the actual transform into a function, and this helper will be called from both places, so this avoids code duplication.	2021-01-22 17:23:53 +03:00
Roman Lebedev	7b89efb55e	[NFC][SimplifyCFG] FoldBranchToCommonDest(): somewhat better structure weight updating code Hoist the successor updating out of the code that deals with branch weight updating, and hoist the 'has weights' check from the latter, making code more consistent and easier to follow.	2021-01-22 17:23:41 +03:00
Roman Lebedev	256a035752	[NFC][SimplifyCFG] FoldBranchToCommonDest(): unclutter Cond/CondInPred handling We don't need those variables, we can just get the final value directly.	2021-01-22 17:23:11 +03:00
Roman Lebedev	aabed3718a	[NFCI-ish][SimplifyCFG] FoldBranchToCommonDest(): really don't deal with uncond branches While we already ignore uncond branches, we could still potentially end up with a conditional branches with identical destinations due to the visitation order, or because we were called as an utility. But if we have such a disguised uncond branch, we still probably shouldn't deal with it here.	2021-01-22 17:23:10 +03:00
Roman Lebedev	0895b836d7	[SimplifyCFG] FoldBranchToCommonDest(): don't deal with unconditional branches The case where BB ends with an unconditional branch, and has a single predecessor w/ conditional branch to BB and a single successor of BB is exactly the pattern SpeculativelyExecuteBB() transform deals with. (and in this case they both allow speculating only a single instruction) Well, or FoldTwoEntryPHINode(), if the final block has only those two predecessors. Here, in FoldBranchToCommonDest(), only a weird subset of that transform is supported, and it's glued on the side in a weird way. In particular, it took me a bit to understand that the Cond isn't actually a branch condition in that case, but just the value we allow to speculate (otherwise it reads as a miscompile to me). Additionally, this only supports for the speculated instruction to be an ICmp. So let's just unclutter FoldBranchToCommonDest(), and leave this transform up to SpeculativelyExecuteBB(). As far as i can tell, this shouldn't really impact optimization potential, but if it does, improving SpeculativelyExecuteBB() will be more beneficial anyways. Notably, this only affects a single test, but EarlyCSE should have run beforehand in the pipeline, and then FoldTwoEntryPHINode() would have caught it. This reverts commit rL158392 / commit `d33f4efbfd`.	2021-01-22 17:22:49 +03:00
Anton Rapetov	a4914dc1f2	[SLP] do not traverse constant uses Walking the use list of a Constant (particularly, ConstantData) is not scalable, since a given constant may be used by many instructinos in many functions in many modules. Differential Revision: https://reviews.llvm.org/D94713	2021-01-22 08:14:09 -05:00
David Sherwood	2e080eb00a	[SVE] Add support for scalable vectorization of loops with selects and cmps I have removed an unnecessary assert in LoopVectorizationCostModel::getInstructionCost that prevented a cost being calculated for select instructions when using scalable vectors. In addition, I have changed AArch64TTIImpl::getCmpSelInstrCost to only do special cost calculations for fixed width vectors and fall back to the base version for scalable vectors. I have added a simple cost model test for cmps and selects: test/Analysis/CostModel/sve-cmpsel.ll and some simple tests that show we vectorize loops with cmp and select: test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll Differential Revision: https://reviews.llvm.org/D95039	2021-01-22 09:48:13 +00:00
Xun Li	bd3ca6666d	[Inlining] Delete redundant optnone/alwaysinline check The same check is done in InlineCost: `8b0bd54d0e/llvm/lib/Analysis/InlineCost.cpp (L2537-L2552)` Also, doing a check on the callee here is confusing, because anything that deals with callee should be done in the inner loop where we proecss all calls from the same caller. Differential Revision: https://reviews.llvm.org/D95186	2021-01-21 18:38:10 -08:00
David Green	39db5753f9	[LV][ARM] Inloop reduction cost modelling This adds cost modelling for the inloop vectorization added in `745bf6cf44`. Up until now they have been modelled as the original underlying instruction, usually an add. This happens to works OK for MVE with instructions that are reducing into the same type as they are working on. But MVE's instructions can perform the equivalent of an extended MLA as a single instruction: %sa = sext <16 x i8> A to <16 x i32> %sb = sext <16 x i8> B to <16 x i32> %m = mul <16 x i32> %sa, %sb %r = vecreduce.add(%m) -> R = VMLADAV A, B There are other instructions for performing add reductions of v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64 (VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV). The i64 are particularly interesting as there are no native i64 add/mul instructions, leading to the i64 add and mul naturally getting very high costs. Also worth mentioning, under NEON there is the concept of a sdot/udot instruction which performs a partial reduction from a v16i8 to a v4i32. They extend and mul/sum the first four elements from the inputs into the first element of the output, repeating for each of the four output lanes. They could possibly be represented in the same way as above in llvm, so long as a vecreduce.add could perform a partial reduction. The vectorizer would then produce a combination of in and outer loop reductions to efficiently use the sdot and udot instructions. Although this patch does not do that yet, it does suggest that separating the input reduction type from the produced result type is a useful concept to model. It also shows that a MLA reduction as a single instruction is fairly common. This patch attempt to improve the costmodelling of in-loop reductions by: - Adding some pattern matching in the loop vectorizer cost model to match extended reduction patterns that are optionally extended and/or MLA patterns. This marks the cost of the reduction instruction correctly and the sext/zext/mul leading up to it as free, which is otherwise difficult to tell and may get a very high cost. (In the long run this can hopefully be replaced by vplan producing a single node and costing it correctly, but that is not yet something that vplan can do). - getExtendedAddReductionCost is added to query the cost of these extended reduction patterns. - Expanded the ARM costs to account for these expanded sizes, which is a fairly simple change in itself. - Some minor alterations to allow inloop reduction larger than the highest vector width and i64 MVE reductions. - An extra InLoopReductionImmediateChains map was added to the vectorizer for it to efficiently detect which instructions are reductions in the cost model. - The tests have some updates to show what I believe is optimal vectorization and where we are now. Put together this can greatly improve performance for reduction loop under MVE. Differential Revision: https://reviews.llvm.org/D93476	2021-01-21 21:03:41 +00:00
Sanjay Patel	2f03528f5e	[SLP] rename reduction variable to avoid shadowing; NFC The code structure can likely be improved now that 'OperationData' is gone.	2021-01-21 16:02:38 -05:00
Anton Rapetov	bfec9148a0	Scalar: Don't visit constants in findInnerReductionPhi in LoopInterchange In LoopInterchange, `findInnerReductionPhi()` looks for reduction variables, which cannot be constants. Update it to return early in that case. This also addresses a blocker for removing use-lists from ConstantData, whose users could be spread across arbitrary modules in the same LLVMContext. Differential Revision: https://reviews.llvm.org/D94712	2021-01-21 12:33:06 -08:00
Sanjay Patel	d77753381f	[SLP] simplify reduction matching This is NFC-intended and removes the "OperationData" class which had become nothing more than a recurrence (reduction) type. I adjusted the matching logic to distinguish instructions from non-instructions - that's all that the "IsLeafValue" member was keeping track of.	2021-01-21 14:58:57 -05:00
Nikita Popov	65fd034b95	[FunctionAttrs] Infer willreturn for functions without loops If a function doesn't contain loops and does not call non-willreturn functions, then it is willreturn. Loops are detected by checking for backedges in the function. We don't attempt to handle finite loops at this point. Differential Revision: https://reviews.llvm.org/D94633	2021-01-21 20:29:33 +01:00
Sanjay Patel	070af1b788	[InstCombine] avoid crashing on attribute propagation In https://llvm.org/PR48810 , we are crashing while trying to propagate attributes from mempcpy (returns void*) to memcpy (returns nothing - void). We can avoid the crash by removing known incompatible attributes for the void return type. I'm not sure if this goes far enough (should we just drop all attributes since this isn't the same function?). We also need to audit other transforms in LibCallSimplifier to make sure there are no other cases that have the same problem. Differential Revision: https://reviews.llvm.org/D95088	2021-01-21 08:13:26 -05:00
Florian Hahn	bee486851c	[LoopUnswitch] Implement first version of partial unswitching. This patch applies the idea from D93734 to LoopUnswitch. It adds support for unswitching on conditions that are only invariant along certain paths through a loop. In particular, it targets conditions in the loop header that depend on values loaded from memory. If either path from the true or false successor through the loop does not modify memory, perform partial loop unswitching. That is, duplicate the instructions feeding the condition in the pre-header. Then unswitch on the duplicated condition. The condition is now known in the unswitched version for the 'invariant' path through the original loop. On caveat of this approach is that one of the loops created can be partially unswitched again. To avoid this behavior, `llvm.loop.unswitch.partial.disable` metadata is added to the unswitched loops, to avoid subsequent partial unswitching. If that's the approach to go, I can move the code handling the metadata kind into separate functions. This increases the cases we unswitch quite a bit in SPEC2006/SPEC2000 & MultiSource. It also allows us to eliminate a dead loop in SPEC2017's omnetpp ``` Tests: 236 Same hash: 170 (filtered out) Remaining: 66 Metric: loop-unswitch.NumBranches Program base patch diff test-suite...000/255.vortex/255.vortex.test 2.00 23.00 1050.0% test-suite...T2006/401.bzip2/401.bzip2.test 7.00 55.00 685.7% test-suite :: External/Nurbs/nurbs.test 5.00 26.00 420.0% test-suite...s-C/unix-smail/unix-smail.test 1.00 3.00 200.0% test-suite.../Prolangs-C++/ocean/ocean.test 1.00 3.00 200.0% test-suite...tions/lambda-0.1.3/lambda.test 1.00 3.00 200.0% test-suite...yApps-C++/PENNANT/PENNANT.test 2.00 5.00 150.0% test-suite...marks/Ptrdist/yacr2/yacr2.test 1.00 2.00 100.0% test-suite...lications/viterbi/viterbi.test 1.00 2.00 100.0% test-suite...plications/d/make_dparser.test 12.00 24.00 100.0% test-suite...CFP2006/433.milc/433.milc.test 14.00 27.00 92.9% test-suite.../Applications/lemon/lemon.test 7.00 12.00 71.4% test-suite...ce/Applications/Burg/burg.test 6.00 10.00 66.7% test-suite...T2006/473.astar/473.astar.test 16.00 26.00 62.5% test-suite...marks/7zip/7zip-benchmark.test 78.00 121.00 55.1% ``` Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93764	2021-01-21 09:46:41 +00:00
Kazu Hirata	e53472de68	[Transforms] Use llvm::append_range (NFC)	2021-01-20 21:35:54 -08:00
Kazu Hirata	8f5da41c4d	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-01-20 21:35:52 -08:00
Dávid Bolvanský	bb3f169b59	[BuildLibcalls, Attrs] Support more variants of C++'s new, add attributes for C++'s delete Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95095	2021-01-21 00:12:37 +01:00
Mircea Trofin	ccec2cf1d9	Reland "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor" This reverts commit `d97f776be5`. The original problem was due to build failures in shared lib builds. D95079 moved ImportedFunctionsInliningStatistics under Analysis, unblocking this.	2021-01-20 13:33:43 -08:00
Mircea Trofin	95ce32c787	[NFC] Move ImportedFunctionsInliningStatistics to Analysis This is related to D94982. We want to call these APIs from the Analysis component, so we can't leave them under Transforms. Differential Revision: https://reviews.llvm.org/D95079	2021-01-20 13:18:03 -08:00
Nikita Popov	1c6d1e57c1	[PredicateInfo] Handle logical and/or Teach PredicateInfo to handle logical and/or the same way as bitwise and/or. This allows handling logical and/or inside IPSCCP and NewGVN.	2021-01-20 21:03:07 +01:00
Nikita Popov	ca4ed1e7ae	[PredicateInfo] Generalize processing of conditions Branch/assume conditions in PredicateInfo are currently handled in a rather ad-hoc manner, with some arbitrary limitations. For example, an `and` of two `icmp`s will be handled, but an `and` of an `icmp` and some other condition will not. That also includes the case where more than two conditions and and'ed together. This patch makes the handling more general by looking through and/ors up to a limit and considering all kinds of conditions (though operands will only be taken for cmps of course). Differential Revision: https://reviews.llvm.org/D94447	2021-01-20 20:40:41 +01:00
Mircea Trofin	d97f776be5	Revert "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor" This reverts commit `e8aec763a5`.	2021-01-20 11:19:34 -08:00
Mircea Trofin	e8aec763a5	[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor When using 2 InlinePass instances in the same CGSCC - one for other mandatory inlinings, the other for the heuristic-driven ones - the order in which the ImportedFunctionStats would be output-ed would depend on the destruction order of the inline passes, which is not deterministic. This patch moves the ImportedFunctionStats responsibility to the InlineAdvisor to address this problem. Differential Revision: https://reviews.llvm.org/D94982	2021-01-20 11:07:36 -08:00
Dávid Bolvanský	16d6e85271	[BuildLibcalls] Mark some libcalls with inaccessiblememonly and inaccessiblemem_or_argmemonly Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94850	2021-01-20 19:45:23 +01:00
Sanjay Patel	c09be0d2a0	[SLP] reduce reduction code for checking vectorizable ops; NFC This is another step towards removing `OperationData` and fixing FMF matching/propagation bugs when forming reductions.	2021-01-20 11:14:48 -05:00
Sanjay Patel	1c54112a57	[SLP] refactor more reduction functions; NFC We were able to remove almost all of the state from OperationData, so these don't make sense as members of that class - just pass the RecurKind in as a param. More streamlining is possible, but I'm trying to avoid logic/typo bugs while fixing this. Eventually, we should not need the `OperationData` class.	2021-01-20 11:14:48 -05:00
Sanjay Patel	8590d24543	[SLP] move reduction createOp functions; NFC We were able to remove almost all of the state from OperationData, so these don't make sense as members of that class - just pass the RecurKind in as a param.	2021-01-20 11:14:48 -05:00
Joseph Tremoulet	40cd262c43	Loop peeling: check that latch is conditional branch Loop peeling assumes that the loop's latch is a conditional branch. Add a check to canPeel that explicitly checks for this, and testcases that otherwise fail an assertion when trying to peel a loop whose back-edge is a switch case or the non-unwind edge of an invoke. Reviewed By: skatkov, fhahn Differential Revision: https://reviews.llvm.org/D94995	2021-01-20 11:01:16 -05:00
Chuanqi Xu	c1bc7981ba	[Coroutine] Remain alignment information when merging frame variables Summary: This is to address bug48712. The solution in this patch is that when we want to merge two variable a into the storage frame of variable b only if the alignment of a is multiple of b. There may be other strategies. But now I think they are hard to handle and benefit little. Or we can implement them in the future. Test-plan: check-llvm Reviewers: jmorse, lxfind, junparser Differential Revision: https://reviews.llvm.org/D94891	2021-01-20 18:59:00 +08:00
David Sherwood	255a507716	[NFC][InstructionCost] Use InstructionCost in lib/Transforms/IPO/IROutliner.cpp In places where we call a TTI.getXXCost() function I have changed the code to use InstructionCost instead of unsigned. This is in preparation for later on when we will change the TTI interfaces to return InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D94427	2021-01-20 08:33:59 +00:00
Kazu Hirata	b023cdeacc	[llvm] Use llvm::all_of (NFC)	2021-01-19 20:19:17 -08:00
Kazu Hirata	8857202489	[llvm] Use llvm::find (NFC)	2021-01-19 20:19:14 -08:00
Juneyoung Lee	4479c0c2c0	Allow nonnull/align attribute to accept poison Currently LLVM is relying on ValueTracking's `isKnownNonZero` to attach `nonnull`, which can return true when the value is poison. To make the semantics of `nonnull` consistent with the behavior of `isKnownNonZero`, this makes the semantics of `nonnull` to accept poison, and return poison if the input pointer isn't null. This makes many transformations like below legal: ``` %p = gep inbounds %x, 1 ; % p is non-null pointer or poison call void @f(%p) ; instcombine converts this to call void @f(nonnull %p) ``` Instead, this semantics makes propagation of `nonnull` to caller illegal. The reason is that, passing poison to `nonnull` does not immediately raise UB anymore, so such program is still well defined, if the callee does not use the argument. Having `noundef` attribute there re-allows this. ``` define void @f(i8* %p) { ; functionattr cannot mark %p nonnull here anymore call void @g(i8* nonnull %p) ; .. because @g never raises UB if it never uses %p. ret void } ``` Another attribute that needs to be updated is `align`. This patch updates the semantics of align to accept poison as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D90529	2021-01-20 11:31:23 +09:00
Wei Mi	21b1ad0340	[SampleFDO] Add the support to split the function profiles with context into separate sections. For ThinLTO, all the function profiles without context has been annotated to outline functions if possible in prelink phase. In postlink phase, profile annotation in postlink phase is only meaningful for function profile with context. If the profile is large, it is better to split the profile into two parts, one with context and one without, so the profile reading in postlink phase only has to read the part with context. To have the profile splitting, we extend the ExtBinary format to support different section arrangement. It will be flexible to add other section layout in the future without the need to create new class inheriting from ExtBinary class. Differential Revision: https://reviews.llvm.org/D94435	2021-01-19 15:16:19 -08:00
Alexey Bataev	e463bd53c0	Revert "[SLP]Merge reorder and reuse shuffles." This reverts commit `438682de6a` to fix the bug with the reducing size of the resulting vector for the entry node with multiple users.	2021-01-19 11:48:04 -08:00
Mariya Podchishchaeva	7113de301a	[ScalarizeMaskedMemIntrin] Add missing dependency The pass has dependency on 'TargetTransformInfoWrapperPass', but the corresponding call to INITIALIZE_PASS_DEPENDENCY was missing. Differential Revision: https://reviews.llvm.org/D94916	2021-01-19 22:33:47 +03:00
Nikita Popov	21443381c0	Reapply [InstCombine] Replace one-use select operand based on condition Relative to the original change, this adds a check that the instruction on which we're replacing operands is safe to speculatively execute, because that's what we're effectively doing. We're executing the instruction with the replaced operand, which is fine if it's pure, but not fine if can cause side-effects or UB (aka is not speculatable). Additionally, we cannot (generally) replace operands in phi nodes, as these may refer to a different loop iteration. This is also covered by the speculation check. ----- InstCombine already performs a fold where X == Y ? f(X) : Z is transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However, if f(X) only has one use, then we can always directly replace the use inside the instruction. To actually be profitable, limit it to the case where Y is a non-expr constant. This could be further extended to replace uses further up a one-use instruction chain, but for now this only looks one level up. Among other things, this also subsumes D94860. Differential Revision: https://reviews.llvm.org/D94862	2021-01-19 20:26:38 +01:00
Jeroen Dobbelaere	121cac01e8	[noalias.decl] Look through llvm.experimental.noalias.scope.decl Just like llvm.assume, there are a lot of cases where we can just ignore llvm.experimental.noalias.scope.decl. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93042	2021-01-19 20:09:42 +01:00
Hans Wennborg	58bdfcfac0	Revert `5238e7b302` "[InstCombine] Replace one-use select operand based on condition" This caused a miscompile in Chromium, see comments on the codereview for discussion and pointer to a reproducer. > InstCombine already performs a fold where X == Y ? f(X) : Z is > transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However, > if f(X) only has one use, then we can always directly replace the > use inside the instruction. To actually be profitable, limit it to > the case where Y is a non-expr constant. > > This could be further extended to replace uses further up a one-use > instruction chain, but for now this only looks one level up. > > Among other things, this also subsumes D94860. > > Differential Revision: https://reviews.llvm.org/D94862 This also reverts the follow-up a003f26539cf4db744655e76c41f4c4a8913f116: > [llvm] Prevent infinite loop in InstCombine of select statements > > This fixes an issue where the RHS and LHS the comparison operation > creating the predicate were swapped back and forth forever. > > Differential Revision: https://reviews.llvm.org/D94934	2021-01-19 11:50:56 +01:00
Florian Hahn	83daa49758	[LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands. D84108 exposed a bad interaction between inlining and loop-rotation during regular LTO, which is causing notable regressions in at least CINT2006/473.astar. The problem boils down to: we now rotate a loop just before the vectorizer which requires duplicating a function call in the preheader when compiling the individual files ('prepare for LTO'). But this then prevents further inlining of the function during LTO. This patch tries to resolve this issue by making LoopRotate more conservative with respect to rotating loops that have inline-able calls during the 'prepare for LTO' stage. I think this change intuitively improves the current situation in general. Loop-rotate tries hard to avoid creating headers that are 'too big'. At the moment, it assumes all inlining already happened and the cost of duplicating a call is equal to just doing the call. But with LTO, inlining also happens during full LTO and it is possible that a previously duplicated call is actually a huge function which gets inlined during LTO. From the perspective of LV, not much should change overall. Most loops calling user-provided functions won't get vectorized to start with (unless we can infer that the function does not touch memory, has no other side effects). If we do not inline the 'inline-able' call during the LTO stage, we merely delayed loop-rotation & vectorization. If we inline during LTO, chances should be very high that the inlined code is itself vectorizable or the user call was not vectorizable to start with. There could of course be scenarios where we inline a sufficiently large function with code not profitable to vectorize, which would have be vectorized earlier (by scalarzing the call). But even in that case, there probably is no big performance impact, because it should be mostly down to the cost-model to reject vectorization in that case. And then the version with scalarized calls should also not be beneficial. In a way, LV should have strictly more information after inlining and make more accurate decisions (barring cost-model issues). There is of course plenty of room for things to go wrong unexpectedly, so we need to keep a close look at actual performance and address any follow-up issues. I took a look at the impact on statistics for MultiSource/SPEC2000/SPEC2006. There are a few benchmarks with fewer loops rotated, but no change to the number of loops vectorized. Reviewed By: sanwou01 Differential Revision: https://reviews.llvm.org/D94232	2021-01-19 10:15:29 +00:00
Tres Popp	a003f26539	[llvm] Prevent infinite loop in InstCombine of select statements This fixes an issue where the RHS and LHS the comparison operation creating the predicate were swapped back and forth forever. Differential Revision: https://reviews.llvm.org/D94934	2021-01-19 10:31:48 +01:00
David Sherwood	c3ce262794	[NFC] Make remaining cost functions in LoopVectorize.cpp use InstructionCost A previous patch has already changed getInstructionCost to return an InstructionCost type. This patch changes the other various getXXXCost functions to return an InstructionCost too. This is a non-functional change - I've added a few asserts that the costs are valid in places where we're selecting between vector call and intrinsic costs. However, since we don't yet return invalid costs from any of the TTI implementations these asserts should not fire. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D94065	2021-01-19 09:08:40 +00:00
Juneyoung Lee	2d89ebd5d1	Address unused variable warning	2021-01-19 09:30:16 +09:00
Juneyoung Lee	0441df94ad	[InstCombine,InstSimplify] Optimize select followed by and/or/xor This patch adds `A & (A && B)` -> `A && B` (similarly for or + logical or) Also, this patch adds `~(select C, (icmp pred X, Y), const)` -> `select C, (icmp pred' X, Y), ~const`. Alive2 proof: merge_and: https://alive2.llvm.org/ce/z/teMR97 merge_or: https://alive2.llvm.org/ce/z/b4yZUp xor_and: https://alive2.llvm.org/ce/z/_-TXHi xor_or: https://alive2.llvm.org/ce/z/2uYx_a Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94861	2021-01-19 09:14:17 +09:00
Juneyoung Lee	395c737d9f	[SimplifyCFG] Update SimplifyBranchOnICmpChain to recognize select form of and/or This patch teaches SimplifyCFG::SimplifyBranchOnICmpChain to understand select form of (x == C1 \|\| x == C2 \|\| ...) / (x != C1 && x != C2 && ...) and optimize them into switch if possible. D93065 has more context about the transition, including links to the list of optimizations being updated. Differential Revision: https://reviews.llvm.org/D93943	2021-01-19 08:53:40 +09:00
Sanjay Patel	5b77ac32b1	[SLP] match maxnum/minnum intrinsics as FP reduction ops After much refactoring over the last 2 weeks to the reduction matching code, I think this change is finally ready. We effectively broke fmax/fmin vector reduction optimization when we started canonicalizing to intrinsics in instcombine, so this should restore that functionality for SLP. There are still FMF problems here as noted in the code comments, but we should be avoiding miscompiles on those for fmax/fmin by restricting to full 'fast' ops (negative tests are included). Fixing FMF propagation is a planned follow-up. Differential Revision: https://reviews.llvm.org/D94913	2021-01-18 17:37:16 -05:00
Kazu Hirata	23b0ab2acb	[llvm] Use the default value of drop_begin (NFC)	2021-01-18 10:16:36 -08:00
Kazu Hirata	dc300beba7	[STLExtras] Add a default value to drop_begin This patch adds the default value of 1 to drop_begin. In the llvm codebase, 70% of calls to drop_begin have 1 as the second argument. The interface similar to with std::next should improve readability. This patch converts a couple of calls to drop_begin as examples. Differential Revision: https://reviews.llvm.org/D94858	2021-01-18 10:16:34 -08:00
Xun Li	1d04dc52dd	[Coroutine] Do not CoroElide if there are musttail calls This is to address https://bugs.llvm.org/show_bug.cgi?id=48626. When there are musttail calls that use parameters aliasing the newly created coroutine frame, the existing implementation will fatal. We simply cannot perform CoroElide in such cases. In theory a precise analysis can be done to check whether the parameters of the musttail call actually alias the frame, but it's very hard to do it before the transformation happens. Also in most cases the existence of musttail call is generated due to symmetric transfers, and in those cases alias analysis won't be able to tell that they don't alias anyway. Differential Revision: https://reviews.llvm.org/D94834	2021-01-18 09:06:21 -08:00
Sanjay Patel	3dbbadb8ef	[SLP] rename reduction query for min/max ops; NFC This will avoid confusion once we start matching min/max intrinsics. All of these hacks to accomodate cmp+sel idioms should disappear once we canonicalize to min/max intrinsics.	2021-01-18 09:32:57 -05:00
Sanjay Patel	d1c4e859ce	[SLP] reduce opcode API dependency in reduction cost calc; NFC The icmp opcode is now hard-coded in the cost model call. This will make it easier to eventually remove all opcode queries for min/max patterns as we transition to intrinsics.	2021-01-18 09:32:57 -05:00
Florian Hahn	e6d758de82	[InferAttrs] Mark some library functions as willreturn. This patch marks some library functions as willreturn. On the first pass, I excluded most functions that interact with streams/the filesystem. Along with willreturn, it also adds nounwind to a set of math functions. There probably are a few additional attributes we can add for those, but that should be done separately. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94684	2021-01-18 13:40:21 +00:00
Caroline Concatto	36710c38c1	[NFC]Migrate VectorCombine.cpp to use InstructionCost This patch changes these functions: vectorizeLoadInsert isExtractExtractCheap foldExtractedCmps scalarizeBinopOrCmp getShuffleExtract foldBitcastShuf to use the class InstructionCost when calling TTI.get<something>Cost(). This patch is part of a series of patches to use InstructionCost instead of unsigned/int for the cost model functions. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html See this patch for the introduction of the type: https://reviews.llvm.org/D91174 ps.:This patch adds the test \|\| !NewCost.isValid(), because we want to return false when: !NewCost.isValid && !OldCost.isValid()->the cost to transform it expensive and !NewCost.isValid() && OldCost.isValid() Therefore for simplication we only add test for !NewCost.isValid() Differential Revision: https://reviews.llvm.org/D94069	2021-01-18 13:37:21 +00:00
Dávid Bolvanský	ed396212da	[InstCombine] Transform abs pattern using multiplication to abs intrinsic (PR45691) ``` unsigned r(int v) { return (1 \| -(v < 0)) * v; } `r` is equivalent to `abs(v)`. ``` ``` define <4 x i8> @src(<4 x i8> %0) { %1: %2 = ashr <4 x i8> %0, { 31, undef, 31, 31 } %3 = or <4 x i8> %2, { 1, 1, 1, undef } %4 = mul nsw <4 x i8> %3, %0 ret <4 x i8> %4 } => define <4 x i8> @tgt(<4 x i8> %0) { %1: %2 = icmp slt <4 x i8> %0, { 0, 0, 0, 0 } %3 = sub nsw <4 x i8> { 0, 0, 0, 0 }, %0 %4 = select <4 x i1> %2, <4 x i8> %3, <4 x i8> %0 ret <4 x i8> %4 } Transformation seems to be correct! ``` Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94874	2021-01-17 17:06:14 +01:00
Nikita Popov	5238e7b302	[InstCombine] Replace one-use select operand based on condition InstCombine already performs a fold where X == Y ? f(X) : Z is transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However, if f(X) only has one use, then we can always directly replace the use inside the instruction. To actually be profitable, limit it to the case where Y is a non-expr constant. This could be further extended to replace uses further up a one-use instruction chain, but for now this only looks one level up. Among other things, this also subsumes D94860. Differential Revision: https://reviews.llvm.org/D94862	2021-01-16 23:25:02 +01:00
Roman Lebedev	32fc32317a	[SimplifyCFG] markAliveBlocks(): catchswitch: preserve PostDomTree When removing catchpad's from catchswitch, if that removes a successor, we need to record that in DomTreeUpdater. This fixes PostDomTree preservation failure in an existing test. This appears to be the single issue that i see in my current test coverage.	2021-01-17 01:21:05 +03:00
Sanjay Patel	49b96cd9ef	[SLP] remove opcode field from reduction data class This is NFC-intended and another step towards supporting intrinsics as reduction candidates. The remaining bits of the OperationData class do not make much sense as-is, so I will try to improve that, but I'm trying to take minimal steps because it's still not clear how this was intended to work.	2021-01-16 13:55:52 -05:00
Sanjay Patel	fcfcc3cc6b	[SLP] fix typos; NFC	2021-01-16 13:55:52 -05:00
Sanjay Patel	48dbac5b6b	[SLP] remove unnecessary use of 'OperationData' This is another NFC-intended patch to allow matching intrinsics (example: maxnum) as candidates for reductions. It's possible that the loop/if logic can be reduced now, but it's still difficult to understand how this all works.	2021-01-16 13:55:52 -05:00
Kazu Hirata	2082b10d10	[llvm] Use *::empty (NFC)	2021-01-16 09:40:55 -08:00
Kazu Hirata	19aacdb715	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-01-16 09:40:53 -08:00
Mircea Trofin	e8049dc3c8	[NewPM][Inliner] Move the 'always inliner' case in the same CGSCC pass as 'regular' inliner Expanding from D94808 - we ensure the same InlineAdvisor is used by both InlinerPass instances. The notion of mandatory inlining is moved into the core InlineAdvisor: advisors anyway have to handle that case, so this change also factors out that a bit better. Differential Revision: https://reviews.llvm.org/D94825	2021-01-15 17:59:38 -08:00
Dávid Bolvanský	a1500105ee	[SimplifyCFG] Optimize CFG when null is passed to a function with nonnull argument Example: ``` __attribute__((nonnull,noinline)) char * pinc(char p) { return ++p; } char foo(bool b, char a) { return pinc(b ? 0 : a); } ``` optimize to ``` char foo(bool b, char *a) { return pinc(a); } ``` Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94180	2021-01-15 23:53:43 +01:00
Sanjay Patel	ceb3cdccd0	[SLP] remove dead code in reduction matching; NFC To get into this block we had: !A \|\| B \|\| C and we checked C in the first 'if' clause leaving !A \|\| B. But the 2nd 'if' is checking: A && !B --> !(!A \|\| B)	2021-01-15 17:03:26 -05:00
Nick Desaulniers	ed0fd567eb	BreakCriticalEdges: do not split the critical edge from a CallBr indirect successor Otherwise we'll fail the assertion in SplitBlockPredecessors() related to splitting the edges from CallBr's. Fixes: https://github.com/ClangBuiltLinux/linux/issues/1161 Fixes: https://github.com/ClangBuiltLinux/linux/issues/1252 Reviewed By: void, MaskRay, jyknight Differential Revision: https://reviews.llvm.org/D88438	2021-01-15 13:51:47 -08:00
Roman Lebedev	a14c36fe27	[SimplifyCFG] switchToSelect(): don't forget to insert DomTree edge iff needed DestBB might or might not already be a successor of SelectBB, and it wasn't we need to ensure that we record the fact in DomTree. The testcase used to crash in lazy domtree updater mode + non-per-function domtree validity checks disabled.	2021-01-15 23:35:57 +03:00
Roman Lebedev	c6654a4cda	[SimplifyCFG][BasicBlockUtils] Port SplitBlockPredecessors()/SplitLandingPadPredecessors() to DomTreeUpdater This is not nice, but it's the best transient solution possible, and is better than just duplicating the whole function. The problem is, this function is widely used, and it is not at all obvious that all the users could be painlessly switched to operate on DomTreeUpdater, and somehow i don't feel like porting all those users first. This function is one of last three that not operate on DomTreeUpdater.	2021-01-15 23:35:56 +03:00
Roman Lebedev	286cf6cb02	[SimplifyCFG] Port SplitBlockAndInsertIfThen() to DomTreeUpdater This is not nice, but it's the best transient solution possible, and is better than just duplicating the whole function. The problem is, this function is widely used, and it is not at all obvious that all the users could be painlessly switched to operate on DomTreeUpdater, and somehow i don't feel like porting all those users first. This function is one of last three that not operate on DomTreeUpdater.	2021-01-15 23:35:56 +03:00
Roman Lebedev	c845c724c2	[Utils][SimplifyCFG] Port SplitBlock() to DomTreeUpdater This is not nice, but it's the best transient solution possible, and is better than just duplicating the whole function. The problem is, this function is widely used, and it is not at all obvious that all the users could be painlessly switched to operate on DomTreeUpdater, and somehow i don't feel like porting all those users first. This function is one of last three that not operate on DomTreeUpdater.	2021-01-15 23:35:56 +03:00
Roman Lebedev	b81f75fa79	[Utils] splitBlockBefore() always operates on DomTreeUpdater, so take it, not DomTree Even though not all it's users operate on DomTreeUpdater, it itself internally operates on DomTreeUpdater, so it must mean everything is fine with that, so just do that globally.	2021-01-15 23:35:56 +03:00
Sanjay Patel	1f21de535d	[SLP] remove unused reduction functions; NFC These were made obsolete by simplifying the code in recent patches.	2021-01-15 14:59:33 -05:00
Jamie Schmeiser	17d0fb7f57	Set option default for enabling memory ssa for new pass manager loop sink pass to true. Summary: Set the default for the option enabling memory ssa use in the loop sink pass to true for the new pass manager. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: asbirlea (Alina Sbirlea) Differential Revision: https://reviews.llvm.org/D92486	2021-01-15 09:56:44 -05:00
Kazu Hirata	7dc3575ef2	[llvm] Remove redundant return and continue statements (NFC) Identified with readability-redundant-control-flow.	2021-01-14 20:30:34 -08:00
Kazu Hirata	2efcbe24a7	[llvm] Use llvm::drop_begin (NFC)	2021-01-14 20:30:33 -08:00
Kazu Hirata	9bcc0d1040	[CodeGen, Transforms] Use llvm::sort (NFC)	2021-01-14 20:30:31 -08:00
Sanjay Patel	b21905dfe3	[SLP] remove unnecessary state in matching reductions This is NFC-intended. I'm still trying to figure out how the loop where this is used works. It does not seem like we require this data at all, but it's hard to confirm given the complicated predicates.	2021-01-14 18:32:37 -05:00
Bjorn Pettersson	d58512b2e3	[SLP] Don't vectorize stores of non-packed types (like i1, i2) In the spirit of commit `fc783e91e0` (llvm-svn: 248943) we shouldn't vectorize stores of non-packed types (i.e. types that has padding between consecutive variables in a scalar layout, but being packed in a vector layout). The problem was detected as a miscompile in a downstream test case. Reviewed By: anton-afanasyev Differential Revision: https://reviews.llvm.org/D94446	2021-01-14 11:30:33 +01:00
Daniel Paoliello	ff5e896425	Fix unused variable in CoroFrame.cpp when building Release with GCC 10 When building with GCC 10, the following warning is reported: ``` /llvm-project/llvm/lib/Transforms/Coroutines/CoroFrame.cpp:1527:28: warning: unused variable ‘CS’ [-Wunused-variable] 1527 \| if (CatchSwitchInst *CS = ``` This change adds a cast to `void` to avoid the warning. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D94456	2021-01-13 22:53:25 -08:00
Kazu Hirata	125ea20d55	[llvm] Use llvm::stable_sort (NFC)	2021-01-13 19:14:43 -08:00
Kazu Hirata	5c1c39e8d8	[llvm] Use *Set::contains (NFC)	2021-01-13 19:14:41 -08:00
Wei Mi	86341247c4	[NFC] Rename ThinLTOPhase to ThinOrFullLTOPhase and move it from PassBuilder.h to Pass.h. In some compiler passes like SampleProfileLoaderPass, we want to know which LTO/ThinLTO phase the pass is in. Currently the phase is represented in enum class PassBuilder::ThinLTOPhase, so it is only available in PassBuilder and it also cannot represent phase in full LTO. The patch extends it to include full LTO phases and move it from PassBuilder.h to Pass.h, then it is much easier for PassBuilder to communiate with each pass about current LTO phase. Differential Revision: https://reviews.llvm.org/D94613	2021-01-13 15:55:40 -08:00
Arthur Eubanks	39e6d24237	[NewPM] Only non-trivially loop unswitch at -O3 and for non-optsize functions This matches the legacy pipeline/pass. Reviewed By: asbirlea, SjoerdMeijer Differential Revision: https://reviews.llvm.org/D94559	2021-01-13 14:54:49 -08:00
Kazu Hirata	fb98a1be43	Fix the warnings on unused variables (NFC)	2021-01-13 13:32:40 -08:00
Sanjay Patel	123674a816	[SLP] simplify type check for reductions This is NFC-intended. The 'valid' call allows int/FP/pointers for other parts of SLP. The difference here is that we can't reduce pointers.	2021-01-13 13:30:46 -05:00
Andrew Litteken	05b1a15f70	[IROutliner] Adapting to hoisted bitcasts in CodeExtractor In commit `700d2417d8` the CodeExtractor was updated so that bitcasts that have lifetime markers that beginning outside of the region are deduplicated outside the region and are not used as an output. This caused a discrepancy in the IROutliner, where in these cases there were arguments added to the aggregate function that were not needed causing assertion errors. The IROutliner queries the CodeExtractor twice to determine the inputs and outputs, before and after `findAllocas` is called with the same ValueSet for the outputs causing the duplication. This has been fixed with a dummy ValueSet for the first call. However, the additional bitcasts prevent us from using the same similarity relationships that were previously defined by the IR Similarity Analysis Pass. In these cases, we check whether the initial version of the region being analyzed for outlining is still the same as it was previously. If it is not, i.e. because of the additional bitcast instructions from the CodeExtractor, we discard the region. Reviewers: yroux Differential Revision: https://reviews.llvm.org/D94303	2021-01-13 11:10:37 -06:00
Nikita Popov	17863614da	[InstCombine] Fold select -> and/or using impliesPoison We can fold a ? b : false to a & b if is_poison(b) implies that is_poison(a), at which point we're able to reuse all the usual fold on ands. In particular, this covers the very common case of icmp X, C && icmp X, C'. The same applies to ors. This currently only has an effect if the -instcombine-unsafe-select-transform=0 option is set. Differential Revision: https://reviews.llvm.org/D94550	2021-01-13 17:45:40 +01:00
David Sherwood	4cd48535ec	[NFC][InstructionCost] Use InstructionCost in Transforms/Scalar/RewriteStatepointsForGC.cpp In places where we calculate costs using TTI.getXXXCost() interfaces I have changed the code to use InstructionCost instead of unsigned. The change is non functional since InstructionCost behaves in the same way as an integer for valid costs. Currently the getXXXCost() functions used in this file do not return invalid costs. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential revision: https://reviews.llvm.org/D94484	2021-01-13 09:42:58 +00:00
Kazu Hirata	8a20e2b3d3	[llvm] Use Optional::getValueOr (NFC)	2021-01-12 21:43:50 -08:00
Kazu Hirata	12fc9ca3a4	[llvm] Remove redundant string initialization (NFC) Identified with readability-redundant-string-init.	2021-01-12 21:43:46 -08:00
Yuanfang Chen	5c7dcd7aea	[Coroutine] Update promise object's final layout index promise is a header field but it is not guaranteed that it would be the third field of the frame due to `performOptimizedStructLayout`. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D94137	2021-01-12 17:44:02 -08:00
Luo, Yuanke	055644cc45	[X86][AMX] Prohibit pointer cast on load. The load/store instruction will be transformed to amx intrinsics in the pass of AMX type lowering. Prohibiting the pointer cast make that pass happy. Differential Revision: https://reviews.llvm.org/D94372	2021-01-13 09:39:19 +08:00
Hongtao Yu	175288a1af	Add sample-profile-suffix-elision-policy attribute with -funique-internal-linkage-names. Adding sample-profile-suffix-elision-policy attribute to functions whose linkage names are uniquefied so that their unique name suffix won't be trimmed when applying AutoFDO profiles. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D94455	2021-01-12 15:15:53 -08:00
modimo	2a49b7c64a	[Inliner] Change inline remark format and update ReplayInlineAdvisor to use it This change modifies the source location formatting from: LineNumber.Discriminator to: LineNumber:ColumnNumber.Discriminator The motivation here is to enhance location information for inline replay that currently exists for the SampleProfile inliner. This will be leveraged further in inline replay for the CGSCC inliner in the related diff. The ReplayInlineAdvisor is also modified to read the new format and now takes into account the callee for greater accuracy. Testing: ninja check-llvm Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D94333	2021-01-12 13:43:48 -08:00
Nikita Popov	23390e7a13	[InstCombine] Handle logical and/or in assume optimization assume(a && b) can be converted to assume(a); assume(b) even if the condition is logical. Same for assume(!(a \|\| b)).	2021-01-12 22:36:40 +01:00
Sanjay Patel	9e7895a868	[SLP] reduce code duplication while processing reductions; NFC	2021-01-12 16:03:57 -05:00
Sanjay Patel	92fb5c49e8	[SLP] rename variable to improve readability; NFC The OperationData in the 2nd block (visiting the operands) is completely independent of the 1st block.	2021-01-12 16:03:57 -05:00
Sanjay Patel	554be30a42	[SLP] reduce code duplication in processing reductions; NFC	2021-01-12 16:03:57 -05:00
Sanjay Patel	46507a96fc	[SLP] reduce code duplication while matching reductions; NFC	2021-01-12 16:03:57 -05:00
Philip Reames	caafdf07bb	[LV] Weaken spuriously strong assert in LoopVersioning LoopVectorize uses some utilities on LoopVersioning, but doesn't actually use it for, you know, versioning. As a result, the precondition LoopVersioning expects is too strong for this user. At the moment, LoopVectorize supports any loop with a unique exit block, so check the same precondition here. Really, the whole class structure here is a mess. We should separate the actual versioning from the metadata updates, but that's a bigger problem.	2021-01-12 12:57:13 -08:00
Philip Reames	9f61fbd75a	[LV] Relax assumption that LCSSA implies single entry This relates to the ongoing effort to support vectorization of multiple exit loops (see D93317). The previous code assumed that LCSSA phis were always single entry before the vectorizer ran. This was correct, but only because the vectorizer allowed only a single exiting edge. There's nothing in the definition of LCSSA which requires single entry phis. A common case where this comes up is with a loop with multiple exiting blocks which all reach a common exit block. (e.g. see the test updates) Differential Revision: https://reviews.llvm.org/D93725	2021-01-12 12:34:52 -08:00
Florian Hahn	6cd44b204c	[FunctionAttrs] Derive willreturn for fns with readonly` & `mustprogress`. Similar to D94125, derive `willreturn` for functions that are `readonly` and `mustprogress` in FunctionAttrs. To quote the reasoning from D94125: Since D86233 we have `mustprogress` which, in combination with `readonly`, implies `willreturn`. The idea is that every side-effect has to be modeled as a "write". Consequently, `readonly` means there is no side-effect, and `mustprogress` guarantees that we cannot "loop" forever without side-effect. Reviewed By: jdoerfert, nikic Differential Revision: https://reviews.llvm.org/D94502	2021-01-12 20:02:34 +00:00
Dávid Bolvanský	0529946b5b	[instCombine] Add (A ^ B) \| ~(A \| B) -> ~(A & B) define i32 @src(i32 %x, i32 %y) { %0: %xor = xor i32 %y, %x %or = or i32 %y, %x %neg = xor i32 %or, 4294967295 %or1 = or i32 %xor, %neg ret i32 %or1 } => define i32 @tgt(i32 %x, i32 %y) { %0: %and = and i32 %x, %y %neg = xor i32 %and, 4294967295 ret i32 %neg } Transformation seems to be correct! https://alive2.llvm.org/ce/z/Cvca4a	2021-01-12 19:29:17 +01:00
Quentin Colombet	905623b64d	[NFC][LICM] Minor improvements to debug output Added a utility function in Value class to print block name and use block labels for unnamed blocks. Changed LICM to call this function in its debug output. Patch by Xiaoqing Wu <xiaoqing_wu@apple.com> Differential Revision: https://reviews.llvm.org/D93577	2021-01-11 18:02:49 -08:00
Roman Lebedev	ec8a6c11db	[SimplifyCFGPass] iterativelySimplifyCFG(): support lazy DomTreeUpdater This boils down to how we deal with early-increment iterator over function's basic blocks: not only we need to early-increment, after that we also need to skip all the blocks that are scheduled for removal, as per DomTreeUpdater.	2021-01-12 02:09:47 +03:00
Roman Lebedev	81afeacd37	[SimplifyCFGPass] mergeEmptyReturnBlocks(): skip blocks scheduled for removal as per DomTreeUpdater Thus supporting lazy DomTreeUpdater mode, where the domtree updates (and thus block removals) aren't applied immediately, but are delayed until last possible moment.	2021-01-12 02:09:47 +03:00
Roman Lebedev	90a92f8b4d	[NFCI][Utils/Local] removeUnreachableBlocks(): cleanup support for lazy DomTreeUpdater When DomTreeUpdater is in lazy update mode, the blocks that were scheduled to be removed, won't be removed until the updates are flushed, e.g. by asking DomTreeUpdater for a up-to-date DomTree. From the function's current code, it is pretty evident that the support for the lazy mode is an afterthought, see e.g. how we roll-back NumRemoved statistic.. So instead of considering all the unreachable blocks as the blocks-to-be-removed, simply additionally skip all the blocks that are already scheduled to be removed	2021-01-12 02:09:47 +03:00
Roman Lebedev	f9ba347706	[SimplifyCFG] FoldValueComparisonIntoPredecessors(): don't insert a DomTree edge if it already exists When we are adding edges to the terminator and potentially turning it into a switch (if it wasn't already), it is possible that the case we're adding will share it's destination with one of the preexisting cases, in which case there is no domtree edge to add. Indeed, this change does not have a test coverage change. This failure has been exposed in an existing test coverage by a follow-up patch that switches to lazy domtreeupdater mode, and removes domtree verification from SimplifyCFGOpt::simplifyOnce()/SimplifyCFGOpt::run(), IOW it does not appear feasible to add dedicated test coverage here.	2021-01-12 02:09:47 +03:00
Roman Lebedev	c0de0a1b72	[SimplifyCFG] SimplifyBranchOnICmpChain(): don't insert a DomTree edge that already exists BB was already always branching to EdgeBB, there is no edge to add. Indeed, this change does not have a test coverage change. This failure has been exposed in an existing test coverage by a follow-up patch that switches to lazy domtreeupdater mode, and removes domtree verification from SimplifyCFGOpt::simplifyOnce()/SimplifyCFGOpt::run(), IOW it does not appear feasible to add dedicated test coverage here.	2021-01-12 02:09:46 +03:00
Roman Lebedev	c22bc5f1f8	[SimplifyCFG] SwitchToLookupTable(): don't insert a DomTree edge that already exists SI is the terminator of BB, so the edge we are adding obviously already existed. Indeed, this change does not have a test coverage change. This failure has been exposed in an existing test coverage by a follow-up patch that switches to lazy domtreeupdater mode, and removes domtree verification from SimplifyCFGOpt::simplifyOnce()/SimplifyCFGOpt::run(), IOW it does not appear feasible to add dedicated test coverage here.	2021-01-12 02:09:46 +03:00
Hongtao Yu	32bcfcda4e	Rename debug linkage name with -funique-internal-linkage-names Functions that are renamed under -funique-internal-linkage-names have their debug linkage name updated as well. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D93747	2021-01-11 13:56:07 -08:00
Sanjay Patel	288f3fc5df	[InstCombine] reduce icmp(ashr X, C1), C2 to sign-bit test This is a more basic pattern that we should handle before trying to solve: https://llvm.org/PR48640 There might be a better way to think about this because the pre-condition that I came up with (number of sign bits in the compare constant) misses a potential transform for each of ugt and ult as commented on in the test file. Tried to model this is in Alive: https://rise4fun.com/Alive/juX1 ...but I couldn't get the ComputeNumSignBits() pre-condition to work as expected, so replaced with leading 0/1 preconditions instead. Name: ugt Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1 %a = ashr %x, C1 %r = icmp ugt i8 %a, C2 => %r = icmp slt i8 %x, 0 Name: ult Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1 %a = ashr %x, C1 %r = icmp ult i4 %a, C2 => %r = icmp sgt i4 %x, -1 Also approximated in Alive2: https://alive2.llvm.org/ce/z/u5hCcz https://alive2.llvm.org/ce/z/__szVL Differential Revision: https://reviews.llvm.org/D94014	2021-01-11 15:53:39 -05:00
Sriraman Tallam	d8c6d24359	-funique-internal-linkage-names appends a hex md5hash suffix to the symbol name which is not demangler friendly, convert it to decimal. Please see D93747 for more context which tries to make linkage names of internal linkage functions to be the uniqueified names. This causes a problem with gdb because breaking using the demangled function name will not work if the new uniqueified name cannot be demangled. The problem is the generated suffix which is a mix of integers and letters which do not demangle. The demangler accepts either all numbers or all letters. This patch simply converts the hash to decimal. There is no loss of uniqueness by doing this as the precision is maintained. The symbol names get longer by a few characters though. Differential Revision: https://reviews.llvm.org/D94154	2021-01-11 11:10:29 -08:00
Giorgis Georgakoudis	9751705512	[OpenMPOpt][WIP] Expand parallel region merging The existing implementation of parallel region merging applies only to consecutive parallel regions that have speculatable sequential instructions in-between. This patch lifts this limitation to expand merging with any sequential instructions in-between, except calls to unmergable OpenMP runtime functions. In-between sequential instructions in the merged region are sequentialized in a "master" region and any output values are broadcasted to the following parallel regions and the sequential region continuation of the merged region. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D90909	2021-01-11 08:06:23 -08:00
Florian Hahn	eb0371e403	[VPlan] Unify value/recipe printing after VPDef transition. This patch unifies the way recipes and VPValues are printed after the transition to VPDef. VPSlotTracker has been updated to iterate over all recipes and all their defined values to number those. There is no need to number values in Value2VPValue. It also updates a few places that only used slot numbers for VPInstruction. All recipes now can produce numbered VPValues.	2021-01-11 14:42:46 +00:00
Florian Hahn	a94497a342	[VPlan] Move initial quote emission from ::print to ::dumpBasicBlock. This means there will be no stray " when printing individual recipes using print()/dump() in a debugger, for example.	2021-01-11 12:22:15 +00:00
Bjorn Pettersson	675be65106	Require chained analyses in BasicAA and AAResults to be transitive This patch fixes a bug that could result in miscompiles (at least in an OOT target). The problem could be seen by adding checks that the DominatorTree used in BasicAliasAnalysis and ValueTracking was valid (e.g. by adding DT->verify() call before every DT dereference and then running all tests in test/CodeGen). Problem was that the LegacyPassManager calculated "last user" incorrectly for passes such as the DominatorTree when not telling the pass manager that there was a transitive dependency between the different analyses. And then it could happen that an incorrect dominator tree was used when doing alias analysis (which was a pretty serious bug as the alias analysis result could be invalid). Fixes: https://bugs.llvm.org/show_bug.cgi?id=48709 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94138	2021-01-11 11:50:07 +01:00
David Sherwood	40abeb11f4	[NFC][InstructionCost] Change LoopVectorizationCostModel::getInstructionCost to return InstructionCost This patch is part of a series of patches that migrate integer instruction costs to use InstructionCost. In the function selectVectorizationFactor I have simply asserted that the cost is valid and extracted the value as is. In future we expect to encounter invalid costs, but we should filter out those vectorization factors that lead to such invalid costs. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D92178	2021-01-11 09:22:37 +00:00
David Sherwood	b7ccaca537	[NFC] Remove min/max functions from InstructionCost Removed the InstructionCost::min/max functions because it's fine to use std::min/max instead. Differential Revision: https://reviews.llvm.org/D94301	2021-01-11 09:00:12 +00:00
Serguei Katkov	7f69860243	[LoopUnroll] Fix a crash Loop peeling as a last step triggers loop simplification and this can change the loop structure. As a result all cashed values like latch branch becomes invalid. Patch re-structure the code to take into account the possible changes caused by peeling. Reviewers: dmgreen, Meinersbur, etiotto, fhahn, efriedma, bmahjour Reviewed By: Meinersbur, fhahn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D93686	2021-01-11 10:19:26 +07:00
Philip Reames	4739dd67e7	[LoopDeletion] Break backedge of outermost loops when known not taken This is a resubmit of `dd6bb367` (which was reverted due to stage2 build failures in `7c63aac`), with the additional restriction added to the transform to only consider outer most loops. As shown in the added test case, ensuring LCSSA is up to date when deleting an inner loop is tricky as we may actually need to remove blocks from any outer loops, thus changing the exit block set. For the moment, just avoid transforming this case. I plan to return to this case in a follow up patch and see if we can do better. Original commit message follows... The basic idea is that if SCEV can prove the backedge isn't taken, we can go ahead and get rid of the backedge (and thus the loop) while leaving the rest of the control in place. This nicely handles cases with dispatch between multiple exits and internal side effects. Differential Revision: https://reviews.llvm.org/D93906	2021-01-10 16:02:33 -08:00
Roman Lebedev	8e8d214c4a	[NFCI][SimplifyCFG] Prefer to add Insert edges before Delete edges into DomTreeUpdater, if reasonable This has a measurable impact on the number of DomTree recalculations. While this doesn't handle all the cases, it deals with the most obvious ones.	2021-01-11 00:30:44 +03:00
Sanjay Patel	3f09c77d33	[SLP] fix typo in assert This snuck into `0aa75fb12f` , but I didn't catch it locally.	2021-01-10 13:15:04 -05:00
Sanjay Patel	0aa75fb12f	[SLP] put verifyFunction call behind EXPENSIVE_CHECKS A severe compile-time slowdown from this call is noted in: https://llvm.org/PR48689 My naive fix was to put it under LLVM_DEBUG ( `267ff79` ), but that's not limiting in the way we want. This is a quick fix (or we could just remove the call completely and rely on some later pass to discover potentially wrong IR?). A bigger/better fix would be to improve/limit verifyFunction() as noted in: https://llvm.org/PR47712 Differential Revision: https://reviews.llvm.org/D94328	2021-01-10 12:32:21 -05:00
Florian Hahn	c701f85c45	[STLExtras] Use return type from operator* of the wrapped iter. Currently make_early_inc_range cannot be used with iterators with operator* implementations that do not return a reference. Most notably in the LLVM codebase, this means the User iterator ranges cannot be used with make_early_inc_range, which slightly simplifies iterating over ranges while elements are removed. Instead of directly using BaseT::reference as return type of operator, this patch uses decltype to get the actual return type of the operator implementation in WrappedIteratorT. This patch also updates a few places to use make use of make_early_inc_range. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D93992	2021-01-10 14:41:13 +00:00
Florian Hahn	d98fc62ae6	[SimplifyCFG] Keep !dgb metadata of moved instruction, if they match. Currently SimplifyCFG drops the debug locations of 'bonus' instructions. Such instructions are moved before the first branch. The reason for the current behavior is that this could lead to surprising debug stepping, if the block that's folded is dead. In case the first branch and the instructions to be folded have the same debug location, this shouldn't be an issue and we can keep the debug location. Reviewed By: vsk Differential Revision: https://reviews.llvm.org/D93662	2021-01-09 19:15:16 +00:00
Kazu Hirata	6a6e382161	[llvm] Drop unnecessary make_range (NFC)	2021-01-09 09:25:00 -08:00
Kazu Hirata	4d92ab1669	[Transforms] Use llvm::find_if (NFC)	2021-01-09 09:24:58 -08:00
Kazu Hirata	9a7c03b800	[SCEV] Remove unused getOrInsertCanonicalInductionVariable (NFC) The last use was removed on Mar 22, 2012 in commit `f47d0af551`.	2021-01-09 09:24:56 -08:00
Florian Hahn	65f578fc0e	[VPlan] Keep start value of VPWidenPHIRecipe as VPValue. Similar to D92129, update VPWidenPHIRecipe to manage the start value as VPValue. This allows adjusting the start value as a VPlan transform, which will be used in a follow-up patch to support reductions during epilogue vectorization. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D93975	2021-01-09 16:34:15 +00:00
Kazu Hirata	f62b93b9a2	[SCEV] Remove unused getExactExistingExpansion (NFC) The last use was removed on Sep 4, 2018 in commit `2cbba56337`.	2021-01-08 18:39:57 -08:00
Kazu Hirata	b7c5e0b02c	[Target, Transforms] Use *Set::contains (NFC)	2021-01-08 18:39:54 -08:00
Arthur Eubanks	756dd70766	[NewPM] Run ObjC ARC passes Match the legacy PM in running various ObjC ARC passes. This requires making some module passes into function passes. These were initially ported as module passes since they add function declarations (e.g. https://reviews.llvm.org/D86178), but that's still up for debate and other passes do so. Reviewed By: ahatanak Differential Revision: https://reviews.llvm.org/D93743	2021-01-08 15:47:11 -08:00
Florian Hahn	c493e9216b	[VPlan] Move reduction start value creation to widenPHIRecipe. This was suggested to prepare for D93975. By moving the start value creation to widenPHInstruction, we set the stage to manage the start value directly in VPWidenPHIRecipe, which be used subsequently to set the 'resume' value for reductions during epilogue vectorization. It also moves RdxDesc to the recipe, so we do not have to rely on Legal to look it up later. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D94175	2021-01-08 17:49:43 +00:00
Alexander Belyaev	bcbdeafa9c	Revert "[SLP]Need shrink the load vector after reordering." This reverts commit `4284afdf94`. This changes computed values in fused_batchnorm_test_cpu. Not equal to tolerance rtol=1e-06, atol=0.001 Mismatched value: a is different from b. not close where = (array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]), array([0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5])) not close lhs = [-0.6636615 -0.9804948 -1.148275 -0.68193716 -0.8572368 -0.65046215 -0.6993756 -1.2244141 -1.0938729 -0.50369143 -0.51830524 -0.738452 -0.7214286 -0.48115745 -0.9380924 -0.9341769 -0.5916775 -1.2896856 -0.7264182 -0.9746917 -0.783249 -0.7659018 -0.86214024 -0.47784212] not close rhs = [ 0.44102234 0.12418899 -0.04359123 0.42274666 0.24744703 0.45422167 0.40530816 -0.11973029 0.01081094 0.6009924 0.5863786 0.3662318 0.38325527 0.62352633 0.1665914 0.1705069 0.5130063 -0.18500176 0.37826565 0.12999213 0.3214348 0.338782 0.24254355 0.62684166] not close dif = [1.1046839 1.1046838 1.1046838 1.1046839 1.1046839 1.1046839 1.1046838 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046838 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046838 1.1046838] not close tol = [0.00100044 0.00100012 0.00100004 0.00100042 0.00100025 0.00100045 0.00100041 0.00100012 0.00100001 0.0010006 0.00100059 0.00100037 0.00100038 0.00100062 0.00100017 0.00100017 0.00100051 0.00100019 0.00100038 0.00100013 0.00100032 0.00100034 0.00100024 0.00100063]	2021-01-08 14:42:26 +01:00
Sanjay Patel	267ff7901c	[SLP] limit verifyFunction to debug build (PR48689) As noted in PR48689, the verifier may have some kind of exponential behavior that should be addressed separately. For now, only run it in debug mode to prevent problems for release+asserts. That limit is what we had before D80401, and I'm not sure if there was a reason to change it in that patch.	2021-01-08 08:10:17 -05:00
Cullen Rhodes	1e7efd397a	[LV] Legalize scalable VF hints In the following loop: void foo(int a, int b, int N) { for (int i=0; i<N; ++i) a[i + 4] = a[i] + b[i]; } The loop dependence constrains the VF to a maximum of (4, fixed), which would mean using <4 x i32> as the vector type in vectorization. Extending this to scalable vectorization, a VF of (4, scalable) implies a vector type of <vscale x 4 x i32>. To determine if this is legal vscale must be taken into account. For this example, unless max(vscale)=1, it's unsafe to vectorize. For SVE, the number of bits in an SVE register is architecturally defined to be a multiple of 128 bits with a maximum of 2048 bits, thus the maximum vscale is 16. In the loop above it is therefore unfeasible to vectorize with SVE. However, in this loop: void foo(int a, int b, int N) { #pragma clang loop vectorize_width(X, scalable) for (int i=0; i<N; ++i) a[i + 32] = a[i] + b[i]; } As long as max(vscale) multiplied by the number of lanes 'X' doesn't exceed the dependence distance, it is safe to vectorize. For SVE a VF of (2, scalable) is within this constraint, since a vector of <16 x 2 x 32> will have no dependencies between lanes. For any number of lanes larger than this it would be unsafe to vectorize. This patch extends 'computeFeasibleMaxVF' to legalize scalable VFs specified as loop hints, implementing the following behaviour: * If the backend does not support scalable vectors, ignore the hint. * If scalable vectorization is unfeasible given the loop dependence, like in the first example above for SVE, then use a fixed VF. * Accept scalable VFs if it's safe to do so. * Otherwise, clamp scalable VFs that exceed the maximum safe VF. Reviewed By: sdesmalen, fhahn, david-arm Differential Revision: https://reviews.llvm.org/D91718	2021-01-08 10:49:44 +00:00
David Green	72fb5ba079	[LV] Don't sink into replication regions The new test case here contains a first order recurrences and an instruction that is replicated. The first order recurrence forces an instruction to be sunk _into_, as opposed to after the replication region. That causes several things to go wrong including registering vector instructions multiple times and failing to create dominance relations correctly. Instead we should be sinking to after the replication region, which is what this patch makes sure happens. Differential Revision: https://reviews.llvm.org/D93629	2021-01-08 09:50:10 +00:00
Kazu Hirata	33bf1cad75	[llvm] Use *Set::contains (NFC)	2021-01-07 20:29:34 -08:00
Ruiling Song	8dddcc762d	[Cloning] Copy metadata of global declarations We have modules with metadata on declarations, and out-of-tree passes use that metadata, and we need to clone those modules. We really expect such metadata is kept during the clone operation. Reviewed by: arsenm, aprantl Differential Revision: https://reviews.llvm.org/D93451	2021-01-08 08:21:18 +08:00
Roman Lebedev	f2f81c554b	[SimplifyCFG] markAliveBlocks(): switch to non-permissive DomTree updates No actual changes needed, invoke can't have the same block as an unwind destination and a normal destination.	2021-01-08 02:15:27 +03:00
Roman Lebedev	d59f97bb3a	[SimplifyCFG] removeUnwindEdge(): switch to non-permissive DomTree updates No actual changes needed, Catchswitch cannot unwind to one of its catchpads.	2021-01-08 02:15:27 +03:00
Roman Lebedev	f0eba8ce2d	[SimplifyCFG] changeToCall(): switch to non-permissive DomTree updates No actual changes needed, normal and unwind destinations of an invoke can never be identical.	2021-01-08 02:15:27 +03:00
Roman Lebedev	be0a31d13b	[SimplifyCFG] DeleteDeadBlocks(): switch to non-permissive DomTree updates No actual changes needed, DetatchDeadBlocks() was already doing the right thing.	2021-01-08 02:15:27 +03:00
Roman Lebedev	66189212bb	[SimplifyCFG] MergeBlockIntoPredecessor(): switch to non-permissive DomTree updates ... which requires not deleting edges that were just deleted already, by not processing the same successor more than once.	2021-01-08 02:15:26 +03:00
Roman Lebedev	05adc73db0	[SimplifyCFG] changeToUnreachable(): switch to non-permissive DomTree updates ... which requires not deleting edges that were just deleted already, by not processing the same predecessor more than once.	2021-01-08 02:15:26 +03:00
Roman Lebedev	7600d7c7be	[SimplifyCFG] removeUnreachableBlocks(): switch to non-permissive DomTree updates ... which requires not deleting edges that were just deleted already, by not processing the same predecessor more than once.	2021-01-08 02:15:26 +03:00
Roman Lebedev	1f9b591ee6	[SimplifyCFG] TryToSimplifyUncondBranchFromEmptyBlock(): switch to non-permissive DomTree updates ... which requires not deleting edges that were just deleted already, by not processing the same predecessor more than once.	2021-01-08 02:15:25 +03:00
Roman Lebedev	b3822728fa	[SimplifyCFG] ConstantFoldTerminator(): switch to non-permissive DomTree updates in `indirectbr` handling ... which requires not deleting edges that were just deleted already.	2021-01-08 02:15:25 +03:00
Roman Lebedev	36593a30a4	[SimplifyCFG] ConstantFoldTerminator(): switch to non-permissive DomTree updates in `SwitchInst` handling ... which requires not deleting edges that will still be present.	2021-01-08 02:15:24 +03:00
Roman Lebedev	16ab8e5f6d	[SimplifyCFG] ConstantFoldTerminator(): handle matching destinations of condbr earlier We need to handle this case before dealing with the case of constant branch condition, because if the destinations match, latter fold would try to remove the DomTree edge that would still be present. This allows to make that particular DomTree update non-permissive	2021-01-08 02:15:24 +03:00
Arthur Eubanks	1a2eaebc09	[CoroSplit][NewPM] Don't call LazyCallGraph functions to split when no clones Apparently there can be no clones, as happens in coro-retcon-unreachable.ll. The alternative is to allow no split functions in addSplitRefRecursiveFunctions(), but it seems better to have the caller make sure it's not accidentally splitting no functions out. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D94258	2021-01-07 14:06:35 -08:00
dfukalov	6a87e9b08b	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
Sanjay Patel	4c7148d75c	[SLP] remove opcode identifier for reduction; NFC Another step towards allowing intrinsics in reduction matching.	2021-01-07 14:07:27 -05:00
Hiroshi Yamauchi	cf5415c727	[PGO][PGSO] Let unroll hints take precedence over PGSO. Differential Revision: https://reviews.llvm.org/D94199	2021-01-07 10:10:31 -08:00
Roman Lebedev	6be1fd6b20	[SimplifyCFG] FoldValueComparisonIntoPredecessors(): drop reachable errneous assert I have added it in `d15d81c` because it seemed correct, was holding for all the tests so far, and was validating the fix added in the same commit, but as David Major is pointing out (with a reproducer), the assertion isn't really correct after all. So remove it. Note that the `d15d81c` still fine.	2021-01-07 18:05:04 +03:00
Sidharth Baveja	048f184ee4	[SplitEdge] Add new parameter to SplitEdge to name the newly created basic block Summary: Currently SplitEdge does not support passing in parameter which allows you to name the newly created BasicBlock. This patch updates the function such that the name of the block can be passed in, if users of this utility decide to do so. Reviewed By: Whitney, bmahjour, asbirlea, jamieschmeiser Differential Revision: https://reviews.llvm.org/D94176	2021-01-07 14:49:23 +00:00
Alexey Bataev	4284afdf94	[SLP]Need shrink the load vector after reordering. After merging the shuffles, we cannot rely on the previous shuffle anymore and need to shrink the final shuffle, if it is required. Reported in D92668 Differential Revision: https://reviews.llvm.org/D93967	2021-01-07 04:50:48 -08:00
Oliver Stannard	76f6b125ce	Revert "[llvm] Use BasicBlock::phis() (NFC)" Reverting because this causes crashes on the 2-stage buildbots, for example http://lab.llvm.org:8011/#/builders/7/builds/1140. This reverts commit `9b228f107d`.	2021-01-07 09:43:33 +00:00
Kazu Hirata	cfeecdf7b6	[llvm] Use llvm::all_of (NFC)	2021-01-06 18:27:36 -08:00
Kazu Hirata	9b228f107d	[llvm] Use BasicBlock::phis() (NFC)	2021-01-06 18:27:35 -08:00
Alina Sbirlea	63aeaf754a	[DominatorTree] Add support for mixed pre/post CFG views. Add support for mixed pre/post CFG views. Update usages of the MemorySSAUpdater to use the new DT API by requesting the DT updates to be done by the MSSAUpdater. Differential Revision: https://reviews.llvm.org/D93371	2021-01-06 14:53:09 -08:00
Sanjay Patel	4c022b5a41	[SLP] use reduction kind's opcode to create new instructions; NFC Similar to `5a1d31a28` - This should be no-functional-change because the reduction kind opcodes are 1-for-1 mappings to the instructions we are matching as reductions. But we want to remove the need for the `OperationData` opcode field because that does not work when we start matching intrinsics (eg, maxnum) as reduction candidates.	2021-01-06 14:37:44 -05:00
Sanjay Patel	5d24089a70	[SLP] reduce code for propagating flags on reductions; NFC If we add/change to match intrinsics, this might get more wordy, but there's no need to list each kind currently.	2021-01-06 14:37:44 -05:00
Arthur Eubanks	7fea561eb1	[CGSCC][Coroutine][NewPM] Properly support function splitting/outlining Previously when trying to support CoroSplit's function splitting, we added in a hack that simply added the new function's node into the original function's SCC (https://reviews.llvm.org/D87798). This is incorrect since it might be in its own SCC. Now, more similar to the previous design, we have callers explicitly notify the LazyCallGraph that a function has been split out from another one. In order to properly support CoroSplit, there are two ways functions can be split out. One is the normal expected "outlining" of one function into a new one. The new function may only contain references to other functions that the original did. The original function must reference the new function. The new function may reference the original function, which can result in the new function being in the same SCC as the original function. The weird case is when the original function indirectly references the new function, but the new function directly calls the original function, resulting in the new SCC being a parent of the original function's SCC. This form of function splitting works with CoroSplit's Switch ABI. The second way of splitting is more specific to CoroSplit. CoroSplit's Retcon and Async ABIs split the original function into multiple functions that all reference each other and are referenced by the original function. In order to keep the LazyCallGraph in a valid state, all new functions must be processed together, else some nodes won't be populated. To keep things simple, this only supports the case where all new edges are ref edges, and every new function references every other new function. There can be a reference back from any new function to the original function, putting all functions in the same RefSCC. This also adds asserts that all nodes in a (Ref)SCC can reach all other nodes to prevent future incorrect hacks. The original hacks in https://reviews.llvm.org/D87798 are no longer necessary since all new functions should have been registered before calling updateCGAndAnalysisManagerForPass. This fixes all coroutine tests when opt's -enable-new-pm is true by default. This also fixes PR48190, which was likely due to the previous hack breaking SCC invariants. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D93828	2021-01-06 11:19:15 -08:00
Francesco Petrogalli	dfd3384fee	[InstCombine] Update valueCoversEntireFragment to use TypeSize * Update valueCoversEntireFragment to use TypeSize. * Add a regression test. * Assertions have been added to protect untested codepaths. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91806	2021-01-06 17:14:59 +00:00
Florian Hahn	494db3816b	[LoopDeletion] Also consider loops with subloops for deletion. Currently, LoopDeletion does skip loops that have sub-loops, but this means we currently fail to remove some no-op loops. One example are inner loops with live-out values. Those cannot be removed by itself. But the containing loop may itself be a no-op and the whole loop-nest can be deleted. The legality checks do not seem to rely on analyzing inner-loops only for correctness. With LoopDeletion being a LoopPass, the change means that we now unfortunately need to do some extra work in parent loops, by checking some conditions we already checked. But there appears to be no noticeable compile time impact: http://llvm-compile-time-tracker.com/compare.php?from=02d11f3cda2ab5b8bf4fc02639fd1f4b8c45963e&to=843201e9cf3b6871e18c52aede5897a22994c36c&stat=instructions This changes patch leads to ~10 more loops being deleted on MultiSource, SPEC2000, SPEC2006 with -O3 & LTO This patch is also required (together with a few others) to eliminate a no-op loop in omnetpp as discussed on llvm-dev 'LoopDeletion / removal of empty loops.' (http://lists.llvm.org/pipermail/llvm-dev/2020-December/147462.html) This change becomes relevant after removing potentially infinite loops is made possible in 'must-progress' loops (D86844). Note that I added a function call with side-effects to an outer loop in `llvm/test/Transforms/LoopDeletion/update-scev.ll` to preserve the original spirit of the test. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D93716	2021-01-06 14:49:00 +00:00
Florian Hahn	816dba48af	[VPlan] Keep start value in VPWidenIntOrFpInductionRecipe (NFC). This patch updates VPWidenIntOrFpInductionRecipe to hold the start value for the induction variable. This makes the start value explicit and allows for adjusting the start value for a VPlan. The flexibility will be used in further patches. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D92129	2021-01-06 11:47:33 +00:00
Florian Hahn	0ce5f402e0	[VPlan] Add getLiveInIRValue accessor to VPValue. This patch adds a new getLiveInIRValue accessor to VPValue, which returns the underlying value, if the VPValue is defined outside of VPlan. This is required to handle scalars in VPTransformState, which requires dealing with scalars defined outside of VPlan. We can simply check VPValue::Def to determine if the value is defined inside a VPlan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D92281	2021-01-06 11:20:42 +00:00
Florian Hahn	f73c09caa2	[VPlan] Use public VPValue constructor in VPPRedInstPHIRecipe (NFC). VPPredInstPHIRecipe does not need access to VPValue via friendship. It can just use the public constructor, Discussed as part of D92281.	2021-01-06 10:47:09 +00:00
Juneyoung Lee	29f8628d1f	[Constant] Add containsPoisonElement This patch - Adds containsPoisonElement that checks existence of poison in constant vector elements, - Renames containsUndefElement to containsUndefOrPoisonElement to clarify its behavior & updates its uses properly With this patch, isGuaranteedNotToBeUndefOrPoison's tests w.r.t constant vectors are added because its analysis is improved. Thanks! Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94053	2021-01-06 12:10:33 +09:00
Juneyoung Lee	4a8e6ed2f7	[SLP,LV] Use poison constant vector for shufflevector/initial insertelement This patch makes SLP and LV emit operations with initial vectors set to poison constant instead of undef. This is a part of efforts for using poison vector instead of undef to represent "doesn't care" vector. The goal is to make nice shufflevector optimizations valid that is currently incorrect due to the tricky interaction between undef and poison (see https://bugs.llvm.org/show_bug.cgi?id=44185 ). Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94061	2021-01-06 11:22:50 +09:00
Roman Lebedev	a14945c1db	[SimplifyCFG] SimplifyEqualityComparisonWithOnlyPredecessor(): really don't delete DomTree edges multiple times	2021-01-06 01:52:39 +03:00
Roman Lebedev	2b437fcd47	[SimplifyCFG] SwitchToLookupTable(): switch to non-permissive DomTree updates ... which requires not deleting a DomTree edge that we just deleted.	2021-01-06 01:52:38 +03:00
Roman Lebedev	fa5447aa3f	[NFC][SimplifyCFG] SwitchToLookupTable(): pull out SI->getParent() into a variable	2021-01-06 01:52:38 +03:00
Roman Lebedev	d15d81ce15	[SimplifyCFG] FoldValueComparisonIntoPredecessors(): deal with each predecessor only once If the predecessor is a switch, and BB is not the default destination, multiple cases could have the same destination. and it doesn't make sense to re-process the predecessor, because we won't make any changes, once is enough. I'm not sure this can be really tested, other than via the assertion being added here, which fires without the fix.	2021-01-06 01:52:37 +03:00
Roman Lebedev	fc96cb2dad	[SimplifyCFG] FoldValueComparisonIntoPredecessors(): switch to non-permissive DomTree updates ... which requires not adding a DomTree edge that we just added.	2021-01-06 01:52:37 +03:00
Roman Lebedev	29ca7d5a1a	[SimplifyCFG] simplifyUnreachable(): fix handling of degenerate same-destination conditional branch One would hope that it would have been already canonicalized into an unconditional branch, but that isn't really guaranteed to happen with SimplifyCFG's visitation order.	2021-01-06 01:52:36 +03:00
Roman Lebedev	3460719f58	[NFC][SimplifyCFG] Add a test with same-destination condidional branch Reported by Mikael Holmén as post-commit feedback on https://reviews.llvm.org/rG2d07414ee5f74a09fb89723b4a9bb0818bdc2e18#968162	2021-01-06 01:52:36 +03:00
Roman Lebedev	f98535686e	[SimplifyCFG] simplifyUnreachable(): switch to non-permissive DomTree updates ... which requires not removing a DomTree edge if the switch's default still points at that destination, because it can't be removed; ... and not processing the same predecessor more than once.	2021-01-06 01:52:36 +03:00
Sanjay Patel	6a03f8ab62	[SLP] reduce code for finding reduction costs; NFC We can get both (vector/scalar) costs in a single switch instead of sequentially.	2021-01-05 17:35:54 -05:00
Arthur Eubanks	8cf1cc578d	[FuncAttrs] Infer noreturn A function is noreturn if all blocks terminating with a ReturnInst contain a call to a noreturn function. Skip looking at naked functions since there may be asm that returns. This can be further refined in the future by checking unreachable blocks and taking into account recursion. It looks like the attributor pass does this, but that is not yet enabled by default. This seems to help with code size under the new PM since PruneEH does not run under the new PM, missing opportunities to mark some functions noreturn, which in turn doesn't allow simplifycfg to clean up dead code. https://bugs.llvm.org/show_bug.cgi?id=46858. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D93946	2021-01-05 13:25:42 -08:00
Sanjay Patel	5a1d31a284	[SLP] use reduction kind's opcode for cost model queries; NFC This should be no-functional-change because the reduction kind opcodes are 1-for-1 mappings to the instructions we are matching as reductions. But we want to remove the need for the `OperationData` opcode field because that does not work when we start matching intrinsics (eg, maxnum) as reduction candidates.	2021-01-05 15:12:40 -05:00
Sanjay Patel	d4a999b453	[SLP] reduce code duplication; NFC	2021-01-05 15:12:40 -05:00
Atmn Patel	f88a797521	[LoopDeletion] Allows deletion of possibly infinite side-effect free loops From C11 and C++11 onwards, a forward-progress requirement has been introduced for both languages. In the case of C, loops with non-constant conditionals that do not have any observable side-effects (as defined by 6.8.5p6) can be assumed by the implementation to terminate, and in the case of C++, this assumption extends to all functions. The clang frontend will emit the `mustprogress` function attribute for C++ functions (D86233, D85393, D86841) and emit the loop metadata `llvm.loop.mustprogress` for every loop in C11 or later that has a non-constant conditional. This patch modifies LoopDeletion so that only loops with the `llvm.loop.mustprogress` metadata or loops contained in functions that are required to make progress (`mustprogress` or `willreturn`) are checked for observable side-effects. If these loops do not have an observable side-effect, then we delete them. Loops without observable side-effects that do not satisfy the above conditions will not be deleted. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86844	2021-01-05 09:56:16 -05:00
Sanjay Patel	3b8b2c7da2	[SLP] delete unused pairwise reduction option SLP tries to model 2 forms of vector reductions: pairwise and splitting. From the cost model code comments, those are defined using an example as: /// Pairwise: /// (v0, v1, v2, v3) /// ((v0+v1), (v2+v3), undef, undef) /// Split: /// (v0, v1, v2, v3) /// ((v0+v2), (v1+v3), undef, undef) I don't know the full history of this functionality, but it was partly added back in D29402. There are apparently no users at this point (no regression tests change). X86 might have managed to work-around the need for this through cost model and codegen improvements. Removing this code makes it easier to continue the work that was started in D87416 / D88193. The alternative -- if there is some target that is silently using this option -- is to move this logic into LoopUtils. We have related/duplicate functionality there via llvm::createTargetReduction(). Differential Revision: https://reviews.llvm.org/D93860	2021-01-05 13:23:07 -05:00
Florian Hahn	8a47e6252a	[VPlan] Re-add interleave group members to plan. Creating in-loop reductions relies on IR references to map IR values to VPValues after interleave group creation. Make sure we re-add the updated member to the plan, so the look-ups still work as expected This fixes a crash reported after D90562.	2021-01-05 15:06:47 +00:00
Simon Pilgrim	313d982df6	[IR] Add ConstantInt::getBool helpers to wrap getTrue/getFalse.	2021-01-05 11:01:10 +00:00
Florian Hahn	38c6933dcc	[LV] Simplify lambda in all_of to directly return hasVF() result. (NFC) The if in the lambda is not necessary. We can directly return the result of hasVF.	2021-01-05 10:34:06 +00:00
Simon Pilgrim	a000366d05	[SimplifyIndVar] createWideIV - make WideIVInfo arg a const ref. NFCI. The WideIVInfo arg is only ever used as a const. Fixes cppcheck warning.	2021-01-05 10:31:45 +00:00
Simon Pilgrim	7a97eeb197	[Coroutines] checkAsyncFuncPointer - use cast<> instead of dyn_cast<> for dereferenced pointer. NFCI. We're immediately dereferencing the casted pointer, so use cast<> which will assert instead of dyn_cast<> which can return null. Fixes static analyzer warning.	2021-01-05 10:31:45 +00:00
Jeremy Morse	914066fe38	[DebugInfo] Avoid LSR crash on large integer inputs Loop strength reduction tries to recover debug variable values by looking for simple offsets from PHI values. In really extreme conditions there may be an offset used that won't fit in an int64_t, hitting an APInt assertion. This patch adds a regression test and adjusts the equivalent value collecting code to filter out any values where the offset can't be represented by an int64_t. This means that for very large integers with very large offsets, the variable location will become undef, which is the same behaviour as before `2a6782bb9f` / D87494. Differential Revision: https://reviews.llvm.org/D94016	2021-01-05 10:25:37 +00:00
Simon Pilgrim	84d5768d97	MemProfiler::insertDynamicShadowAtFunctionEntry - use cast<> instead of dyn_cast<> for dereferenced pointer. NFCI. We're immediately dereferencing the casted pointer, so use cast<> which will assert instead of dyn_cast<> which can return null. Fixes static analyzer warning.	2021-01-05 09:34:01 +00:00
Arthur Eubanks	e30fbbe9a5	[JumpThreading][NewPM] Skip when target has divergent CF Matches the legacy pass. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D94028	2021-01-04 16:08:08 -08:00
Roman Lebedev	32c47ebef1	[SimplifyCFG] SimplifyCondBranchToTwoReturns(): switch to non-permissive DomTree updates ... which requires not deleting an edge that just got deleted, because we could be dealing with a block that didn't go through ConstantFoldTerminator() yet, and thus has a degenerate cond br with matching true/false destinations.	2021-01-05 01:26:37 +03:00
Roman Lebedev	110b3d7855	[SimplifyCFG] SimplifyEqualityComparisonWithOnlyPredecessor(): switch to non-permissive DomTree updates ... which requires not deleting an edge that just got deleted.	2021-01-05 01:26:37 +03:00
Roman Lebedev	a8604e3d5b	[SimplifyCFG] simplifyIndirectBr(): switch to non-permissive DomTree updates ... which requires not deleting an edge that just got deleted.	2021-01-05 01:26:36 +03:00
Roman Lebedev	ed9de61cc3	[SimplifyCFGPass] mergeEmptyReturnBlocks(): switch to non-permissive DomTree updates ... which requires not inserting an edge that already exists.	2021-01-05 01:26:36 +03:00
Roman Lebedev	3fb57222c4	[NFCI] SimplifyCFG: switch to non-permissive DomTree updates, where possible Notably, this doesn't switch every case, remaining cases don't actually pass sanity checks in non-permissve mode, and therefore require further analysis. Note that SimplifyCFG still defaults to not preserving DomTree by default, so this is effectively a NFC change.	2021-01-05 01:26:36 +03:00
Sanjay Patel	36263a7ccc	[LoopUtils] remove redundant opcode parameter; NFC While here, rename the inaccurate getRecurrenceBinOp() because that was also used to get CmpInst opcodes. The recurrence/reduction kind should always refer to the expected opcode for a reduction. SLP appears to be the only direct caller of createSimpleTargetReduction(), and that calling code ideally should not be carrying around both an opcode and a reduction kind. This should allow us to generalize reduction matching to use intrinsics instead of only binops.	2021-01-04 17:05:28 -05:00
Sanjay Patel	9766957524	[LoopUtils] reduce code for creatng reduction; NFC We can return from each case instead creating a temporary variable just to have a common return.	2021-01-04 16:05:03 -05:00
Sanjay Patel	58b6c5d932	[LoopUtils] reorder logic for creating reduction; NFC If we are using a shuffle reduction, we don't need to go through the switch on opcode - return early.	2021-01-04 16:05:02 -05:00
Whitney Tsang	de6d43f16c	Revert "[LoopNest] Allow empty basic blocks without loops" This reverts commit `9a17bff4f7`.	2021-01-04 20:42:21 +00:00
Whitney Tsang	9a17bff4f7	[LoopNest] Allow empty basic blocks without loops Allow loop nests with empty basic blocks without loops in different levels as perfect. Reviewers: Meinersbur Differential Revision: https://reviews.llvm.org/D93665	2021-01-04 19:59:50 +00:00
Philip Reames	7c63aac7bd	Revert "[LoopDeletion] Break backedge of loops when known not taken" This reverts commit `dd6bb367d1`. Multi-stage builders are showing an assertion failure w/LCSSA not being preserved on entry to IndVars. Reason isn't clear, reverting while investigating.	2021-01-04 09:50:47 -08:00
Philip Reames	dd6bb367d1	[LoopDeletion] Break backedge of loops when known not taken The basic idea is that if SCEV can prove the backedge isn't taken, we can go ahead and get rid of the backedge (and thus the loop) while leaving the rest of the control in place. This nicely handles cases with dispatch between multiple exits and internal side effects. Differential Revision: https://reviews.llvm.org/D93906	2021-01-04 09:19:29 -08:00
Florian Hahn	c367258b5c	[SimplifyCFG] Enabled hoisting late in LTO pipeline. `bb7d3af113` disabled hoisting in SimplifyCFG by default, but enabled it late in the pipeline. But it appears as if the LTO pipelines got missed. This patch adjusts the LTO pipelines to also enable hoisting in the later stages. Unfortunately there's no easy way to add a test for the change I think. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D93684	2021-01-04 16:26:58 +00:00
Florian Hahn	e0905553b4	[ArgPromotion] Delay dead GEP removal until doPromotion. Currently ArgPromotion removes dead GEPs as part of the legality check in isSafeToPromoteArgument. If no promotion happens, this means the pass claims no modifications happened, even though GEPs were removed. This patch fixes the issue by delaying removal of dead GEPs until doPromotion: isSafeToPromoteArgument can simply skips dead GEPs and the code in doPromotion dealing with GEPs is updated to account for dead GEPs. Once we committed to promotion, it should be safe to remove dead GEPs. Alternatively isSafeToPromoteArgument could return an additional boolean to indicate whether it made changes, but this is quite cumbersome and there should be no real benefit of weeding out some dead GEPs here if we do not perform promotion. I added a test for the case where dead GEPs need to be removed when promotion happens in `578c5a0c6e`. Fixes PR47477. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93991	2021-01-04 09:51:20 +00:00
Andrew Litteken	5c951623bc	[IROutliner] Refactoring errors in the cost model from past patches. There were was the reuse of a variable that should not have been occurred due to confusion during committing patches.	2021-01-04 00:11:18 -06:00
Andrew Litteken	05e6ac4eb8	[IROutliner] Removing a duplicate addition, causing overestimates in IROutliner. There was an extra addition left over from a previous commit for the cost model, this removes it.	2021-01-03 23:36:28 -06:00
Roman Lebedev	98cd1c33e3	[NFC][SimplifyCFG] Hoist 'original' DomTree verification from simplifyOnce() into run() This is NFC since SimplifyCFG still currently defaults to not preserving DomTree. SimplifyCFGOpt::simplifyOnce() is only be called from SimplifyCFGOpt::run(), and can not be called externally, since SimplifyCFGOpt is defined in .cpp This avoids some needless verifications, and is thus a bit faster without sacrificing precision.	2021-01-04 01:02:02 +03:00
Roman Lebedev	a7684940f0	[SimplifyCFG] SimplifyTerminatorOnSelect(): fix/tune DomTree updates We only need to remove non-TrueBB/non-FalseBB successors, and we only need to do that once. We don't need to insert any new edges, because no new successors will be added.	2021-01-04 01:02:02 +03:00
Roman Lebedev	70935b9595	[NFC][SimplifyCFG] SimplifyTerminatorOnSelect(): pull out OldTerm->getParent() into a variable	2021-01-04 01:02:02 +03:00
Kazu Hirata	ba82c0b315	[llvm] Call *(Set\|Map)::erase directly (NFC) We can erase an item in a set or map without checking its membership first.	2021-01-03 09:57:47 -08:00
Juneyoung Lee	1fc992bd86	[Scalarizer] Use poison as insertelement's placeholder This patch makes Scalarizer to use poison as insertelement's placeholder. It contains two changes in Scalarizer.cpp, and the both changes does not change the semantics of the optimized program. It is because the placeholder value (poison) is already completely hidden by following insertelement instructions. The first change at visitBitCastInst() creates poison vector of MidTy and consecutively inserts FanIn times, which is # of elems of MidTy. The second change at ScalarizerVisitor::finish() creates poison with Op->getType(), and it is filled with Count insertelements. The test diffs show that the poison value is never exposed after insertelements. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93989	2021-01-04 00:35:28 +09:00
Roman Lebedev	5fa241a657	[SimplifyCFG] FoldValueComparisonIntoPredecessors(): fine-tune/fix DomTree preservation, take 2	2021-01-03 01:45:48 +03:00
Roman Lebedev	6a3a8d17eb	[SimplifyCFG] FoldValueComparisonIntoPredecessors(): fine-tune/fix DomTree preservation	2021-01-03 01:45:48 +03:00
Roman Lebedev	7c8b8063b6	[SimplifyCFG][AMDGPU] AMDGPUUnifyDivergentExitNodes: SimplifyCFG isn't ready to preserve PostDomTree There is a number of transforms in SimplifyCFG that take DomTree out of DomTreeUpdater, and do updates manually. Until they are fixed, user passes are unable to claim that PDT is preserved. Note that the default for SimplifyCFG is still not to preserve DomTree, so this is still effectively NFC.	2021-01-03 01:45:46 +03:00
Kazu Hirata	530c5af6a4	[Transforms] Construct SmallVector with iterator ranges (NFC)	2021-01-02 09:24:17 -08:00
Florian Hahn	c50f9b2351	[LV] Clean up trailing whitespace (NFC). Clean up some stray whitespace that sneaked in recently.	2021-01-02 16:43:13 +00:00
Roman Lebedev	b9da488ad7	[SimplifyCFG] Don't actually take DomTreeUpdater unless we intend to maintain DomTree validity This guards against unintentional mistakes like the one i just fixed in previous commit.	2021-01-02 14:40:55 +03:00
Roman Lebedev	b4429f3cdd	[SimplifyCFG] Teach removeUndefIntroducingPredecessor to preserve DomTree	2021-01-02 01:01:20 +03:00
Roman Lebedev	657c1e09da	[SimplifyCFG] Teach eliminateDeadSwitchCases() to preserve DomTree, part 2	2021-01-02 01:01:18 +03:00
Roman Lebedev	f1ce696056	[SimplifyCFG] Teach tryWidenCondBranchToCondBranch() to preserve DomTree	2021-01-02 01:01:17 +03:00
Roman Lebedev	e08fea3b24	[SimplifyCFGPass] Ensure that DominatorTreeWrapperPass is init'd before SimplifyCFG It's probably better than hoping that it will happen to be already initialized.	2021-01-02 01:01:17 +03:00
Kazu Hirata	f43daf1b62	[SSAUpdater] Remove unused code InstrIsPHI (NFC) The last use of this function was removed on Jan 4, 2018 in commit commit `90ecac01e9`.	2021-01-01 12:44:52 -08:00
Sanjay Patel	c74e8539ff	[Analysis] flatten enums for recurrence types This is almost all mechanical search-and-replace and no-functional-change-intended (NFC). Having a single enum makes it easier to match/reason about the reduction cases. The goal is to remove `Opcode` from reduction matching code in the vectorizers because that makes it harder to adapt the code to handle intrinsics. The code in RecurrenceDescriptor::AddReductionVar() is the only place that required closer inspection. It uses a RecurrenceDescriptor and a second InstDesc to sometimes overwrite part of the struct. It seem like we should be able to simplify that logic, but it's not clear exactly which cmp+sel patterns that we are trying to handle/avoid.	2021-01-01 12:20:16 -05:00
Florian Hahn	d9f306aa52	[LV] Fix crash when generating remarks with multi-exit loops. If DoExtraAnalysis is true (e.g. because remarks are enabled), we continue with the analysis rather than exiting. Update code to conditionally check if the ExitBB has phis or not a single predecessor. Otherwise a nullptr is dereferenced with DoExtraAnalysis.	2021-01-01 13:54:41 +00:00
Roman Lebedev	831636b0e6	[SimplifyCFG] SUCCESS! Teach createUnreachableSwitchDefault() to preserve DomTree This pretty much concludes patch series for updating SimplifyCFG to preserve DomTree. All 318 dedicated `-simplifycfg` tests now pass with `-simplifycfg-require-and-preserve-domtree=1`. There are a few leftovers that apparently don't have good test coverage. I do not yet know what gaps in test coverage will the wider-scale testing reveal, but the default flip might be close.	2021-01-01 03:25:25 +03:00
Roman Lebedev	e1440d43bc	[SimplifyCFG] Teach tryToSimplifyUncondBranchWithICmpInIt() to preserve DomTree	2021-01-01 03:25:25 +03:00
Roman Lebedev	8866583953	[SimplifyCFG] Teach FoldValueComparisonIntoPredecessors() to preserve DomTree, part 2	2021-01-01 03:25:24 +03:00
Roman Lebedev	a815b6b2b2	[SimplifyCFG] Teach eliminateDeadSwitchCases() to preserve DomTree, part 1	2021-01-01 03:25:24 +03:00
Roman Lebedev	0d2f219d4d	[SimplifyCFG] Teach SimplifyEqualityComparisonWithOnlyPredecessor() to preserve DomTree, part 3	2021-01-01 03:25:23 +03:00
Roman Lebedev	9f17dab1f4	[SimplifyCFG] Teach simplifyIndirectBr() to preserve DomTree	2021-01-01 03:25:23 +03:00
Roman Lebedev	b7c463d7b8	[SimplifyCFG] Teach FoldBranchToCommonDest() to preserve DomTree, part 2	2021-01-01 03:25:23 +03:00
Roman Lebedev	c1b825d4b8	[SimplifyCFG] Teach FoldValueComparisonIntoPredecessors() to preserve DomTree, part 1	2021-01-01 03:25:22 +03:00
Andrew Litteken	1a9eb19af9	[IROutliner] Adding consistent function attribute merging When combining extracted functions, they may have different function attributes. We want to make sure that we do not make any assumptions, or lose any information. This attempts to make sure that we consolidate function attributes to their most general case. Tests: llvm/test/Transforms/IROutliner/outlining-compatible-and-attribute-transfer.ll llvm/test/Transforms/IROutliner/outlining-compatible-or-attribute-transfer.ll Reviewers: jdoefert, paquette Differential Revision: https://reviews.llvm.org/D87301	2020-12-31 12:30:23 -06:00
Fangrui Song	a90b42b0fe	[ThinLTO] Default -enable-import-metadata to false The default value is dependent on `-DLLVM_ENABLE_ASSERTIONS={off,on}` (D22167), which is error-prone. The few tests checking `!thinlto_src_module` can specify -enable-import-metadata explicitly. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D93959	2020-12-31 10:04:21 -08:00
Dávid Bolvanský	ae69fa9b9f	[InstCombine] Transform (A + B) - (A & B) to A \| B (PR48604) define i32 @src(i32 %x, i32 %y) { %0: %a = add i32 %x, %y %o = and i32 %x, %y %r = sub i32 %a, %o ret i32 %r } => define i32 @tgt(i32 %x, i32 %y) { %0: %b = or i32 %x, %y ret i32 %b } Transformation seems to be correct! https://alive2.llvm.org/ce/z/2fhW6r	2020-12-31 15:04:32 +01:00
Dávid Bolvanský	742ea77ca4	[InstCombine] Transform (A + B) - (A \| B) to A & B (PR48604) define i32 @src(i32 %x, i32 %y) { %0: %a = add i32 %x, %y %o = or i32 %x, %y %r = sub i32 %a, %o ret i32 %r } => define i32 @tgt(i32 %x, i32 %y) { %0: %b = and i32 %x, %y ret i32 %b } Transformation seems to be correct! https://alive2.llvm.org/ce/z/aQRh2j	2020-12-31 14:03:20 +01:00
Bogdan Graur	8bee4d4e8f	Revert "[LoopDeletion] Allows deletion of possibly infinite side-effect free loops" Test clang/test/Misc/loop-opt-setup.c fails when executed in Release. This reverts commit `6f1503d598`. Reviewed By: SureYeaah Differential Revision: https://reviews.llvm.org/D93956	2020-12-31 11:47:49 +00:00
Atmn Patel	6f1503d598	[LoopDeletion] Allows deletion of possibly infinite side-effect free loops From C11 and C++11 onwards, a forward-progress requirement has been introduced for both languages. In the case of C, loops with non-constant conditionals that do not have any observable side-effects (as defined by 6.8.5p6) can be assumed by the implementation to terminate, and in the case of C++, this assumption extends to all functions. The clang frontend will emit the `mustprogress` function attribute for C++ functions (D86233, D85393, D86841) and emit the loop metadata `llvm.loop.mustprogress` for every loop in C11 or later that has a non-constant conditional. This patch modifies LoopDeletion so that only loops with the `llvm.loop.mustprogress` metadata or loops contained in functions that are required to make progress (`mustprogress` or `willreturn`) are checked for observable side-effects. If these loops do not have an observable side-effect, then we delete them. Loops without observable side-effects that do not satisfy the above conditions will not be deleted. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86844	2020-12-30 21:43:01 -05:00
Kazu Hirata	95ea86587c	[PGO] Use isa instead of dyn_cast (NFC)	2020-12-30 17:45:38 -08:00
Roman Lebedev	51879a5256	[LoopIdiom] 'left-shift until bittest': don't forget to check that PHI node is in loop header Fixes an issue reported by Peter Collingbourne in https://reviews.llvm.org/D91726#2475301	2020-12-30 23:58:41 +03:00
Roman Lebedev	7f221c9196	[SimplifyCFG] Teach SwitchToLookupTable() to preserve DomTree	2020-12-30 23:58:41 +03:00
Roman Lebedev	a17025aa61	[SimplifyCFG] Teach switchToSelect() to preserve DomTree	2020-12-30 23:58:40 +03:00
Roman Lebedev	c45f765c0d	[SimplifyCFG] Teach SimplifyBranchOnICmpChain() to preserve DomTree	2020-12-30 23:58:40 +03:00
Sanjay Patel	8ca60db40b	[LoopUtils] reduce FMF and min/max complexity when forming reductions I don't know if there's some way this changes what the vectorizers may produce for reductions, but I have added test coverage with `3567908` and `5ced712` to show that both passes already have bugs in this area. Hopefully this does not make things worse before we can really fix it.	2020-12-30 15:22:26 -05:00
Yuanfang Chen	277ebe46c6	Fix `LLVM_ENABLE_MODULES=On` build for commit `480936e741`.	2020-12-30 10:54:04 -08:00
Andrew Litteken	fe431103b6	[IROutliner] Adding option to enable outlining from linkonceodr functions There are functions that the linker is able to automatically deduplicate, we do not outline from these functions by default. This allows for outlining from those functions. Tests: llvm/test/Transforms/IROutliner/outlining-odr.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87309	2020-12-30 12:08:04 -06:00
Sanjay Patel	e90ea76380	[IR] remove 'NoNan' param when creating FP reductions This is no-functional-change-intended (AFAIK, we can't isolate this difference in a regression test). That's because the callers should be setting the IRBuilder's FMF field when creating the reduction and/or setting those flags after creating. It doesn't make sense to override this one flag alone. This is part of a multi-step process to clean up the FMF setting/propagation. See PR35538 for an example.	2020-12-30 09:51:23 -05:00
Juneyoung Lee	420d046d6b	clang-format, address warnings	2020-12-30 23:05:07 +09:00
Juneyoung Lee	9b29610228	Use unary CreateShuffleVector if possible As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`. Let's update them. Actually, it would have been more natural if the patches were made in this order: (1) let them use unary CreateShuffleVector first (2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793) The order is swapped, but in terms of correctness it is still fine. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93923	2020-12-30 22:36:08 +09:00
Juneyoung Lee	bfedd5d2b6	[ConstraintElimination] Add support for select form of and/or This patch adds support for select form of and/or. Currently there is an ongoing effort for moving towards using `select a, b, false` instead of `and i1 a, b` and `select a, true, b` instead of `or i1 a, b` as well. D93065 has links to relevant changes. Alive2 proof: (undef input was disabled due to timeout :( ) - and: https://alive2.llvm.org/ce/z/AgvFbQ - or: https://alive2.llvm.org/ce/z/KjLJyb Differential Revision: https://reviews.llvm.org/D93935	2020-12-30 21:27:36 +09:00
Andrew Litteken	30feb93036	[IROutliner] Adding support for swift errors in the IROutliner Since some values can be swift errors, we need to make sure that we correctly propagate the parameter attributes. Tests found at: llvm/test/Transforms/IROutliner/outlining-swift-error.ll Reviewers: jroelofs, paquette Recommit of: `71867ed5e6` Differential Revision: https://reviews.llvm.org/D87742	2020-12-30 01:17:27 -06:00
Andrew Litteken	eeb99c2ac2	Revert "[IROutliner] Adding support for swift errors" This reverts commit `71867ed5e6`. Reverting for lack of commit messages.	2020-12-30 01:17:27 -06:00
Andrew Litteken	71867ed5e6	[IROutliner] Adding support for swift errors	2020-12-30 01:14:55 -06:00
Luo, Yuanke	981a0bd858	[X86] Add x86_amx type for intel AMX. The x86_amx is used for AMX intrisics. <256 x i32> is bitcast to x86_amx when it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it is used by load/store instruction. So amx intrinsics only operate on type x86_amx. It can help to separate amx intrinsics from llvm IR instructions (+-*/). Thank Craig for the idea. This patch depend on https://reviews.llvm.org/D87981. Differential Revision: https://reviews.llvm.org/D91927	2020-12-30 13:52:13 +08:00
Kazu Hirata	16d20e2554	[Transforms/Utils] Construct SmallVector with iterator ranges (NFC)	2020-12-29 19:23:23 -08:00
Andrew Litteken	df4a931c63	[IROutliner] Adding OptRemarks to the IROutliner Pass This prints OptRemarks at each location where a decision is made to not outline, or to outline a specific section for the IROutliner pass. Test: llvm/test/Transforms/IROutliner/opt-remarks.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87300	2020-12-29 15:52:08 -06:00
Roman Lebedev	39a56f7f17	[SimplifyCFG] Teach SimplifyTerminatorOnSelect() to preserve DomTree	2020-12-30 00:48:12 +03:00
Roman Lebedev	ec0b671a61	[SimplifyCFG] Teach SimplifyCondBranchToCondBranch() to preserve DomTree	2020-12-30 00:48:12 +03:00
Roman Lebedev	307156246f	[SimplifyCFG] Teach mergeConditionalStoreToAddress() to preserve DomTree	2020-12-30 00:48:11 +03:00
Roman Lebedev	d4c0abb4a3	[SimplifyCFG] Teach FoldCondBranchOnPHI() to preserve DomTree	2020-12-30 00:48:11 +03:00
Roman Lebedev	b8121b2e62	[SimplifyCFG] Teach SinkCommonCodeFromPredecessors() to preserve DomTree	2020-12-30 00:48:11 +03:00
Roman Lebedev	18c407bf4c	[SimplifyCFG] Teach HoistThenElseCodeToIf() to preserve DomTree	2020-12-30 00:48:10 +03:00
Roman Lebedev	fe9bdd9621	[SimplifyCFG] Teach SimplifyEqualityComparisonWithOnlyPredecessor() to preserve DomTree, part 2	2020-12-30 00:48:10 +03:00
Roman Lebedev	6027e05dbf	[SimplifyCFG] Teach SimplifyEqualityComparisonWithOnlyPredecessor() to preserve DomTree, part 1	2020-12-30 00:48:10 +03:00
Sanjay Patel	8d18bc8e6d	[Utils] reduce code in createTargetReduction(); NFC The switch duplicated the translation in getRecurrenceBinOp(). This code is still weird because it translates to the TTI ReductionFlags for min/max, but then createSimpleTargetReduction() converts that back to RecurrenceDescriptor::MinMaxRecurrenceKind.	2020-12-29 15:56:19 -05:00
Sanjay Patel	21a3a0225d	[SLP] replace local reduction enum with RecurrenceKind; NFCI I'm not sure if the SLP enum was created before the IVDescriptor RecurrenceDescriptor / RecurrenceKind existed, but the code in SLP is now redundant with that class, so it just makes things more complicated to have both. We eventually call LoopUtils createSimpleTargetReduction() to create reduction ops, so we might as well standardize on those enum names. There's still a question of whether we need to use TTI::ReductionFlags vs. MinMaxRecurrenceKind, but that can be another clean-up step. Another option would just be to flatten the enums in RecurrenceDescriptor into a single enum. There isn't much benefit (smaller switches?) to having a min/max subset.	2020-12-29 14:52:11 -05:00
Andrew Litteken	6df161a2fb	[IROutliner] Adding a cost model, and debug option to turn the model off. This adds a cost model that takes into account the total number of machine instructions to be removed from each region, the number of instructions added by adding a new function with a set of instructions, and the instructions added by handling arguments. Tests not adding flags: llvm/test/Transforms/IROutliner/outlining-cost-model.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87299	2020-12-29 12:43:41 -06:00
Roman Lebedev	374ef57f13	[InstCombine] 'hoist xor-by-constant from xor-by-value': completely give up on constant exprs As Mikael Holmén is noting in the post-commit review for the first fix https://reviews.llvm.org/rGd4ccef38d0bb#967466 not hoisting constantexprs is not enough, because if the xor originally was a constantexpr (i.e. X is a constantexpr). `SimplifyAssociativeOrCommutative()` in `visitXor()` will immediately undo this transform, thus again causing an infinite combine loop. This transform has resulted in a surprising number of constantexpr failures.	2020-12-29 16:28:18 +03:00
Arthur Eubanks	c2ef06d3dd	[NewPM] Port infer-address-spaces And add it to the AMDGPU opt pipeline. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D93880	2020-12-28 19:58:12 -08:00
Kazu Hirata	5d2529f28f	[Scalar] Construct SmallVector with iterator ranges (NFC)	2020-12-28 19:55:18 -08:00
Andrew Litteken	1e23802507	[IROutliner] Merging identical output blocks for extracted functions. Many of the sets of output stores will be the same. When a block is created, we check if there is an output block with the same set of store instructions. If there is, we map the output block of the region back to the block, so that the extra argument controlling the switch statement can be set to the appropriate block value. Tests: - llvm/test/Transforms/IROutliner/outlining-same-output-blocks.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87298	2020-12-28 21:01:48 -06:00
Andrew Litteken	e6ae623314	[IROutliner] Adding support for consolidating functions with different output arguments. Certain regions can have values introduced inside the region that are used outside of the region. These may not be the same for each similar region, so we must create one over arching set of arguments for the consolidated function. We do this by iterating over the outputs for each extracted function, and creating as many different arguments to encapsulate the different outputs sets. For each output set, we create a different block with the necessary stores from the value to the output register. There is then one switch statement, controlled by an argument to the function, to differentiate which block to use. Changed Tests for consistency: llvm/test/Transforms/IROutliner/extraction.ll llvm/test/Transforms/IROutliner/illegal-assumes.ll llvm/test/Transforms/IROutliner/illegal-memcpy.ll llvm/test/Transforms/IROutliner/illegal-memmove.ll llvm/test/Transforms/IROutliner/illegal-vaarg.ll Tests to test new functionality: llvm/test/Transforms/IROutliner/outlining-different-output-blocks.ll llvm/test/Transforms/IROutliner/outlining-remapped-outputs.ll llvm/test/Transforms/IROutliner/outlining-same-output-blocks.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87296	2020-12-28 16:17:07 -06:00
Nikita Popov	4a16c507cb	[InstCombine] Disable unsafe select transform behind a flag This disables the poison-unsafe select -> and/or transform behind a flag (we continue to perform the fold by default). This is intended to simplify evaluation and testing while we teach various passes to directly recognize the select pattern. This only disables the main select -> and/or transform. A number of related ones are instead changed to canonicalize to the a ? b : false and a ? true : b forms which represent and/or respectively. This requires a bit of care to avoid infinite loops, as we do not want !a ? b : false to be converted into a ? false : b. The basic idea here is the same as D93065, but keeps the change behind a flag for now. Differential Revision: https://reviews.llvm.org/D93840	2020-12-28 22:43:52 +01:00
Roman Lebedev	ef93f7a11c	[SimplifyCFG] FoldBranchToCommonDest: gracefully handle unreachable code () We might be dealing with an unreachable code, so the bonus instruction we clone might be self-referencing. There is a sanity check that all uses of bonus instructions that are not in the original block with said bonus instructions are PHI nodes, and that is obviously not the case for self-referencing instructions.. So if we find such an use, just rewrite it. Thanks to Mikael Holmén for the reproducer! Fixes https://bugs.llvm.org/show_bug.cgi?id=48450#c8	2020-12-28 23:31:19 +03:00
Philip Reames	4b33b23877	Reapply "[LV] Vectorize (some) early and multiple exit loops"" w/fix for builder This reverts commit `4ffcd4fe9a` thus restoring `e4df6a40da`. The only change from the original patch is to add "llvm::" before the call to empty(iterator_range). This is a speculative fix for the ambiguity reported on some builders.	2020-12-28 10:13:28 -08:00
Arthur Eubanks	4ffcd4fe9a	Revert "[LV] Vectorize (some) early and multiple exit loops" This reverts commit `e4df6a40da`. Breaks Windows bots, e.g. http://45.33.8.238/win/30472/step_4.txt and http://lab.llvm.org:8011/#/builders/83/builds/2078/steps/5/logs/stdio	2020-12-28 10:05:41 -08:00
Philip Reames	e4df6a40da	[LV] Vectorize (some) early and multiple exit loops This patch is a major step towards supporting multiple exit loops in the vectorizer. This patch on it's own extends the loop forms allowed in two ways: single exit loops which are not bottom tested multiple exit loops w/ a single exit block reached from all exits and no phis in the exit block (because of LCSSA this implies no values defined in the loop used later) The restrictions on multiple exit loop structures will be removed in follow up patches; disallowing cases for now makes the code changes smaller and more obvious. As before, we can only handle loops with entirely analyzable exits. Removing that restriction is much harder, and is not part of currently planned efforts. The basic idea here is that we can force the last iteration to run in the scalar epilogue loop (if we have one). From the definition of SCEV's backedge taken count, we know that no earlier iteration can exit the vector body. As such, we can leave the decision on which exit to be taken to the scalar code and generate a bottom tested vector loop which runs all but the last iteration. The existing code already had the notion of requiring one iteration in the scalar epilogue, this patch is mainly about generalizing that support slightly, making sure we don't try to use this mechanism when tail folding, and updating the code to reflect the difference between a single exit block and a unique exit block (very mechanical). Differential Revision: https://reviews.llvm.org/D93317	2020-12-28 09:40:42 -08:00
Roman Lebedev	d4ccef38d0	[InstCombine] 'hoist xor-by-constant from xor-by-value': ignore constantexprs As it is being reported (in post-commit review) in https://reviews.llvm.org/D93857 this fold (as i expected, but failed to come up with test coverage despite trying) has issues with constant expressions. Since we only care about true constants, which constantexprs are not, don't perform such hoisting for constant expressions.	2020-12-28 20:15:20 +03:00
Yevgeny Rouban	d76c1d2247	[RS4GC] Lazily set changed flag when folding single entry phis The function FoldSingleEntryPHINodes() is changed to return if it has changed IR or not. This return value is used by RS4GC to set the MadeChange flag respectively. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D93810	2020-12-28 10:54:21 +07:00
Juneyoung Lee	9d70dbdc2b	[InstCombine] use poison as placeholder for undemanded elems Currently undef is used as a don’t-care vector when constructing a vector using a series of insertelement. However, this is problematic because undef isn’t undefined enough. Especially, a sequence of insertelement can be optimized to shufflevector, but using undef as its placeholder makes shufflevector a poison-blocking instruction because undef cannot be optimized to poison. This makes a few straightforward optimizations incorrect, such as: ``` ; https://bugs.llvm.org/show_bug.cgi?id=44185 define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) { %xv = insertelement <4 x float> %q, float %x, i32 2 %r = shufflevector <4 x float> %y, <4 x float> %xv, <4 x i32> { 0, 6, 2, undef } ret <4 x float> %r ; %r[3] is undef } => define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) { %r = insertelement <4 x float> %y, float %x, i32 1 ret <4 x float> %r ; %r[3] = %y[3], incorrect if %y[3] = poison } Transformation doesn't verify! ERROR: Target is more poisonous than source ``` I’d like to suggest 1. Using poison as insertelement’s placeholder value (IRBuilder::CreateVectorSplat should be patched too) 2. Updating shufflevector’s semantics to return poison element if mask is undef Note that poison is currently lowered into UNDEF in SelDag, so codegen part is okay. m_Undef() matches PoisonValue as well, so existing optimizations will still fire. The only concern is hidden miscompilations that will go incorrect when poison constant is given. A conservative way is copying all tests having `insertelement undef` & replacing it with `insertelement poison` & run Alive2 on it, but it will create many tests and people won’t like it. :( Instead, I’ll simply locally maintain the tests and run Alive2. If there is any bug found, I’ll report it. Relevant links: https://bugs.llvm.org/show_bug.cgi?id=43958 , http://lists.llvm.org/pipermail/llvm-dev/2019-November/137242.html Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93586	2020-12-28 08:58:15 +09:00
Florian Hahn	4ad41902e8	[GVN] Correctly set modified status when doing PRE on indices. This patch updates GVN to correctly return the modified status, if PRE is performed on indices. It fixes a crash when building the test-suite with EXPENSIVE_CHECKS and LTO.	2020-12-27 21:58:31 +00:00
Juneyoung Lee	d3f1f7b6bc	[EarlyCSE] Use m_LogicalAnd/Or matchers to handle branch conditions EarlyCSE's handleBranchCondition says: ``` // If the condition is AND operation, we can propagate its operands into the // true branch. If it is OR operation, we can propagate them into the false // branch. ``` This holds for the corresponding select patterns as well. This is a part of an ongoing work for disabling buggy select->and/or transformations. See llvm.org/pr48353 and D93065 for more context Proof: and: https://alive2.llvm.org/ce/z/MQWodU or: https://alive2.llvm.org/ce/z/9GLbB_ Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93842	2020-12-28 05:36:26 +09:00
Juneyoung Lee	f1d648b973	[GVN] Use m_LogicalAnd/Or to propagate equality from branch conditions This patch makes GVN recognize `select c1, c2, false` as well as `select c1, true, c2` branch condition and propagate equality from these. See llvm.org/pr48353, D93065 Differential Revision: https://reviews.llvm.org/D93841	2020-12-28 05:28:38 +09:00
Florian Hahn	0ea3749b3c	[LV] Set up branch from middle block earlier. Previously the branch from the middle block to the scalar preheader & exit was being set-up at the end of skeleton creation in completeLoopSkeleton. Inserting SCEV or runtime checks may result in LCSSA phis being created, if they are required. Adjusting branches afterwards may break those PHIs. To avoid this, we can instead create the branch from the middle block to the exit after we created the middle block, so we have the final CFG before potentially adjusting/creating PHIs. This fixes a crash for the included test case. For the non-crashing case, this is almost a NFC with respect to the generated code. The only change is the order of the predecessors of the involved branch targets. Note an assertion was moved from LoopVersioning() to LoopVersioning::versionLoop. Adjusting the branches means loop-simplify form may be broken before constructing LoopVersioning. But LV only uses LoopVersioning to annotate the loop instructions with !noalias metadata, which does not require loop-simplify form. This is a fix for an existing issue uncovered by D93317.	2020-12-27 18:21:12 +00:00
Kazu Hirata	8299fb8f25	[Transforms] Use llvm::append_range (NFC)	2020-12-27 09:57:29 -08:00
Kazu Hirata	789d250613	[CodeGen, Transforms] Use *Map::lookup (NFC)	2020-12-27 09:57:27 -08:00
Sanjay Patel	badf0f20f3	[SLP] rename reduction variables for readability; NFC I am hoping to extend the reduction matching code, and it is hard to distinguish "ReductionData" from "ReducedValueData". So extend the tree/root metaphor to include leaves. Another problem is that the name "OperationData" does not provide insight into its purpose. I'm not sure if we can alter that underlying data structure to make the code clearer.	2020-12-26 11:20:25 -05:00
Sanjay Patel	c4ca108966	[SLP] use switch to improve readability; NFC This will get more complicated when we handle intrinsics like maxnum.	2020-12-26 10:59:45 -05:00
Kazu Hirata	46bea9b297	[Local] Remove unused function RemovePredecessorAndSimplify (NFC) The last use of the function was removed on Sep 29, 2010 in commit `99c985c37d`.	2020-12-25 09:35:20 -08:00
Roman Lebedev	25aebe2ccf	[LoopIdiom] 'left-shift-until-bittest': keep no-wrap flags on shift, fix edge-case miscompilation for %x.next While `%x.curr` is always safe to compute, because `LoopBackedgeTakenCount` will always be smaller than `bitwidth(X)`, i.e. we never get poison, rewriting `%x.next` is more complicated, however, because `X << LoopTripCount` will be poison iff `LoopTripCount == bitwidth(X)` (which will happen iff `BitPos` is `bitwidth(x) - 1` and `X` is `1`). So unless we know that isn't the case (as alive2 notes, we know it's safe to do iff shift had no-wrap flags, or bitpos does not indicate signbit, or we know that %x is never `1`), we'll need to emit an alternative, safe IR, by either just shifting the `%x.curr`, or conditionally selecting between the computed `%x.next` and `0`.. Former IR looks better so let's do that. While there, ensure that we don't drop no-wrap flags from said shift.	2020-12-24 21:20:52 +03:00
Roman Lebedev	d9ebaeeb46	[InstCombine] Hoist xor-by-constant from xor-by-value This is one of the deficiencies that can be observed in https://godbolt.org/z/YPczsG after D91038 patch set. This exposed two missing folds, one was fixed by the previous commit, another one is `(A ^ B) \| ~(A ^ B) --> -1` / `(A ^ B) & ~(A ^ B) --> 0`. `-early-cse` will catch it: https://godbolt.org/z/4n1T1v, but isn't meaningful to fix it in InstCombine, because we'd need to essentially do our own CSE, and we can't even rely on `Instruction::isIdenticalTo()`, because there are no guarantees that the order of operands matches. So let's just accept it as a loss.	2020-12-24 21:20:50 +03:00
Roman Lebedev	5b78303433	[InstCombine] Fold `a & ~(a ^ b)` to `x & y` ``` ---------------------------------------- define i32 @and_xor_not_common_op(i32 %a, i32 %b) { %0: %b2 = xor i32 %b, 4294967295 %t2 = xor i32 %a, %b2 %t4 = and i32 %t2, %a ret i32 %t4 } => define i32 @and_xor_not_common_op(i32 %a, i32 %b) { %0: %t4 = and i32 %a, %b ret i32 %t4 } Transformation seems to be correct! ```	2020-12-24 21:20:49 +03:00
Roman Lebedev	b3021a72a6	[IR][InstCombine] Add m_ImmConstant(), that matches on non-ConstantExpr constants, and use it A pattern to ignore ConstantExpr's is quite common, since they frequently lead into infinite combine loops, so let's make writing it easier.	2020-12-24 21:20:47 +03:00
Roman Lebedev	ff3749fc79	[NFC] SimplifyCFGOpt::simplifyUnreachable(): pacify unused variable warning Thanks to Luke Benes for pointing it out.	2020-12-24 21:20:46 +03:00
Kazu Hirata	df812115e3	[CodeGen, Transforms] Use llvm::any_of (NFC)	2020-12-24 09:08:36 -08:00
Simon Pilgrim	89abe1cf83	[InstCombine] foldICmpUsingKnownBits - use KnownBits signed/unsigned getMin/MaxValue helpers. NFCI. Replace the local compute*SignedMinMaxValuesFromKnownBits methods with the equivalent KnownBits helpers to determine the min/max value ranges.	2020-12-24 14:22:26 +00:00
Nikita Popov	ef2f843347	Revert "[InstCombine] Check inbounds in load/store of gep null transform (PR48577)" This reverts commit `899faa50f2`. Upon further consideration, this does not fix the right issue. Doing this fold for non-inbounds GEPs is legal, because the resulting pointer is still based-on null, which has no associated address range, and as such and access to it is UB. https://bugs.llvm.org/show_bug.cgi?id=48577#c3	2020-12-24 12:36:56 +01:00
Nikita Popov	90177912a4	Revert "[InstCombine] Fold gep inbounds of null to null" This reverts commit `eb79fd3c92`. This causes stage2 crashes, possibly due to StringMap being miscompiled. Reverting for now.	2020-12-24 10:20:31 +01:00
Roman Lebedev	f8079355c6	[InstCombine] canonicalizeAbsNabs(): don't propagate NSW flag for NABS patter As Nuno is noting in post-commit review in https://reviews.llvm.org/D87188#2467915 it is not correct to keep NSW for negated abs pattern, so don't do that.	2020-12-24 00:06:09 +03:00
Nikita Popov	759b8c11c3	[InstCombine] Handle different pointer types when folding gep of null The source pointer type is not necessarily the same as the result pointer type, so we can't simply return the original null pointer, it might be a different one.	2020-12-23 21:58:26 +01:00
Nikita Popov	eb79fd3c92	[InstCombine] Fold gep inbounds of null to null Effectively, this is what we were previously already doing when the GEP was used in conjunction with a load or store, but this fold can also be applied more generally: > The only in bounds address for a null pointer in the default > address-space is the null pointer itself.	2020-12-23 21:41:53 +01:00
Nikita Popov	899faa50f2	[InstCombine] Check inbounds in load/store of gep null transform (PR48577) If the GEP isn't inbounds, then accessing a GEP of null location is generally not UB. While this is a minimal fix, the GEP of null handling should probably be its own fold.	2020-12-23 21:03:22 +01:00
Craig Topper	897990e614	[IROutliner] Use isa instead of dyn_cast where the casted value isn't used. NFC Fixes unused variable warnings.	2020-12-23 11:40:15 -08:00
Roman Lebedev	2b61e7c68c	[LoopIdiom] 'left-shift until bittest' idiom: support rewriting loop as countable, allow extra cruft The current state of the transform is still not enough to support my motivational pattern, because it has one more "induction variable". I have delayed posting this patch, because originally even just rewriting the loop as countable wasn't enough to nicely transform my motivational pattern, because i expected that extra IV to be rewritten afterwards, but it wasn't happening until i fixed that in D91800. So, this patch allows the 'left-shift until bittest' loop idiom as long as the inserted ops are cheap, and lifts any and all extra use checks on the instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D92754	2020-12-23 22:28:10 +03:00
Roman Lebedev	a0ddc61c5b	[LoopIdiom] 'left-shift until bittest' idiom: support canonical sign bit mask If the bitmask is for sign bit, instcombine would have canonicalized the pattern into a proper sign bit check. Supporting that is still simple, but requires a bit of a roundtrip - we first have to use `decomposeBitTestICmp()`, and the rest again just works. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D91726	2020-12-23 22:28:09 +03:00
Roman Lebedev	cb2e5980ba	[LoopIdiom] 'left-shift until bittest' idiom: support constant bit mask The handing of the case where the mask is a constant is trivial, if said constant is a power of two, the bit in question is log2(mask), rest just works. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D91725	2020-12-23 22:28:09 +03:00
Roman Lebedev	e124844709	[LoopIdiom] Introduce 'left-shift until bittest' idiom The motivation here is the following inner loop in fp16/fp24 -> fp32 expander, that runs as part of the floating-point DNG decompression in RawSpeed library: `cd380bb9a2/src/librawspeed/decompressors/DeflateDecompressor.cpp (L112-L115)` ``` while (!(fp32_fraction & (1 << 23))) { fp32_exponent -= 1; fp32_fraction <<= 1; } ``` (https://godbolt.org/z/r13YMh) As one might notice, that loop is currently uncountable, and that whole code stays scalar. Yet, it is rather trivial to make that loop countable: https://godbolt.org/z/do8WMz and we can prove that via alive2: https://alive2.llvm.org/ce/z/7vQnji (ha nice, isn't it?) ... and that allow for the whole fp16->fp32 code to vectorize: https://godbolt.org/z/7hYr13 Now, while i'd love to get there, i feel like i should take it in steps. For now, this introduces support for the most basic case, where the bit position is known as a variable, and the loop will go away (has no live-outs other than the recurrence, no extra instructions in the loop). I have added sufficient (i believe) test coverage, and alive2 is happy with those transforms. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D91038	2020-12-23 22:28:09 +03:00
Andrew Litteken	b1191c8438	[IROutliner] Adding support for elevating constants that are not the same in each region to arguments When there are constants that have the same structural location, but not the same value, between different regions, we cannot simply outline the region. Instead, we find the constants that are not the same in each location, and promote them to arguments to be passed into the respective functions. At each call site, we pass the constant in as an argument regardless of type. Added/Edited Tests: llvm/test/Transforms/IROutliner/outlining-constants-vs-registers.ll llvm/test/Transforms/IROutliner/outlining-different-constants.ll llvm/test/Transforms/IROutliner/outlining-different-globals.ll Reviewers: paquette, jroelofs Differential Revision: https://reviews.llvm.org/D87294	2020-12-23 13:03:05 -06:00
Evgeniy Brevnov	9fb074e7bb	[BPI] Improve static heuristics for "cold" paths. Current approach doesn't work well in cases when multiple paths are predicted to be "cold". By "cold" paths I mean those containing "unreachable" instruction, call marked with 'cold' attribute and 'unwind' handler of 'invoke' instruction. The issue is that heuristics are applied one by one until the first match and essentially ignores relative hotness/coldness of other paths. New approach unifies processing of "cold" paths by assigning predefined absolute weight to each block estimated to be "cold". Then we propagate these weights up/down IR similarly to existing approach. And finally set up edge probabilities based on estimated block weights. One important difference is how we propagate weight up. Existing approach propagates the same weight to all blocks that are post-dominated by a block with some "known" weight. This is useless at least because it always gives 50\50 distribution which is assumed by default anyway. Worse, it causes the algorithm to skip further heuristics and can miss setting more accurate probability. New algorithm propagates the weight up only to the blocks that dominates and post-dominated by a block with some "known" weight. In other words, those blocks that are either always executed or not executed together. In addition new approach processes loops in an uniform way as well. Essentially loop exit edges are estimated as "cold" paths relative to back edges and should be considered uniformly with other coldness/hotness markers. Reviewed By: yrouban Differential Revision: https://reviews.llvm.org/D79485	2020-12-23 22:47:36 +07:00
Kazu Hirata	3c707d73f2	[NewGVN] Remove for_each_found (NFC) The last use of the function was removed on Sep 30, 2017 in commit `9b926e90d3`.	2020-12-22 20:13:27 -08:00
Sanjay Patel	0d15d4b6f4	[SLP] use operand index abstraction for number of operands I think this is NFC currently, but the bug would be exposed when we allow binary intrinsics (maxnum, etc) as candidates for reductions. The code in matchAssociativeReduction() is using OperationData::getNumberOfOperands() when comparing whether the "EdgeToVisit" iterator is in-bounds, so this code must use the same (potentially offset) operand value to set the "EdgeToVisit".	2020-12-22 16:05:39 -05:00
Arnold Schwaighofer	333108e8be	Add a llvm.coro.end.async intrinsic The llvm.coro.end.async intrinsic allows to specify a function that is to be called as the last action before returning. This function will be inlined after coroutine splitting. This function can contain a 'musttail' call to allow for guaranteed tail calling as the last action. Differential Revision: https://reviews.llvm.org/D93568	2020-12-22 10:52:28 -08:00
Florian Hahn	ef4dbb2b7a	[LV] Use ScalarEvolution::getURemExpr to reduce duplication. ScalarEvolution should be able to handle both constant and variable trip counts using getURemExpr, so we do not have to handle them separately. This is a small simplification of `a56280094e`. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D93677	2020-12-22 14:48:42 +00:00
Florian Hahn	c0c0ae16c3	[VPlan] Make VPInstruction a VPDef This patch turns updates VPInstruction to manage the value it defines using VPDef. The VPValue is used during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90565	2020-12-22 09:53:47 +00:00
Gil Rapaport	a56280094e	[LV] Avoid needless fold tail When the trip-count is provably divisible by the maximal/chosen VF, folding the loop's tail during vectorization is redundant. This commit extends the existing test for constant trip-counts to any trip-count known to be divisible by maximal/selected VF by SCEV. Differential Revision: https://reviews.llvm.org/D93615	2020-12-22 10:25:20 +02:00
Ta-Wei Tu	d7a6f3a105	[LoopNest] Extend `LPMUpdater` and adaptor to handle loop-nest passes This is a follow-up patch of D87045. The patch implements "loop-nest mode" for `LPMUpdater` and `FunctionToLoopPassAdaptor` in which only top-level loops are operated. `createFunctionToLoopPassAdaptor` decides whether the returned adaptor is in loop-nest mode or not based on the given pass. If the pass is a loop-nest pass or the pass is a `LoopPassManager` which contains only loop-nest passes, the loop-nest version of adaptor is returned; otherwise, the normal (loop) version of adaptor is returned. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D87531	2020-12-22 08:47:38 +08:00
Congzhe Cao	c60a58f8d4	[InstCombine] Add check of i1 types in select-to-zext/sext transformation When doing select-to-zext/sext transformations, we should not handle TrueVal and FalseVal of i1 type otherwise it would result in zext/sext i1 to i1. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93272	2020-12-21 18:46:24 -05:00
Michael Forster	d56982b6f5	Remove unused variables. Differential Revision: https://reviews.llvm.org/D93635	2020-12-21 16:24:43 +01:00
Simon Pilgrim	88c5b50060	[AggressiveInstCombine] Generalize foldGuardedRotateToFunnelShift to generic funnel shifts (REAPPLIED) The fold currently only handles rotation patterns, but with the maturation of backend funnel shift handling we can now realistically handle all funnel shift patterns. This should allow us to begin resolving PR46896 et al. Ensure we block poison in a funnel shift value - similar to rG0fe91ad463fea9d08cbcd640a62aa9ca2d8d05e0 Reapplied with fix for PR48068 - we weren't checking that the shift values could be hoisted from their basicblocks. Differential Revision: https://reviews.llvm.org/D90625	2020-12-21 15:22:27 +00:00
Florian Hahn	f250892373	[VPlan] Make VPRecipeBase inherit from VPDef. This patch makes VPRecipeBase a direct subclass of VPDef, moving the SubclassID to VPDef. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90564	2020-12-21 13:34:00 +00:00
Florian Hahn	cd608dc8d3	[VPlan] Use VPDef for VPInterleaveRecipe. This patch turns updates VPInterleaveRecipe to manage the values it defines using VPDef. The VPValue is used during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90562	2020-12-21 10:56:53 +00:00
David Sherwood	3bf7d47a97	[NFC][InstructionCost] Remove isValid() asserts in SLPVectorizer.cpp An earlier patch introduced asserts that the InstructionCost is valid because at that time the ReuseShuffleCost variable was an unsigned. However, now that the variable is an InstructionCost instance the asserts can be removed. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html See this patch for the introduction of the type: https://reviews.llvm.org/D91174	2020-12-21 09:12:28 +00:00
Kazu Hirata	5d24935f22	[PGO] Remove dead member variable InstrumentFuncEntry (NFC) This patch removes InstrumentFuncEntry as it is dead. The constructor of FuncPGOInstrumentation passes InstrumentFuncEntry to MST, but it doesn't make a local copy as a member variable.	2020-12-20 09:57:05 -08:00
Andrew Litteken	7c6f28a438	[IROutliner] Deduplicating functions that only require inputs. Extracted regions can have both inputs and outputs. In addition, the CodeExtractor removes inputs that are only used in llvm.assumes, and sunken allocas (values are used entirely in the extracted region as denoted by lifetime intrinsics). We also cannot combine sections that have different constants in the same structural location, and these constants will have to elevated to argument. This patch deduplicates extracted functions that only have inputs and non of the special cases. We test that correctly deduplicate in: test/Transforms/IROutliner/outlining-same-globals.ll test/Transforms/IROutliner/outlining-same-constants.ll test/Transforms/IROutliner/outlining-different-structure.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D86978	2020-12-19 17:34:34 -06:00
Andrew Litteken	b8a2b6af37	Revert "[IROutliner] Deduplicating functions that only require inputs." Missing reviewers and differential revision in commit message. This reverts commit `5cdc4f57e5`.	2020-12-19 17:33:49 -06:00
Andrew Litteken	5cdc4f57e5	[IROutliner] Deduplicating functions that only require inputs. Extracted regions can have both inputs and outputs. In addition, the CodeExtractor removes inputs that are only used in llvm.assumes, and sunken allocas (values are used entirely in the extracted region as denoted by lifetime intrinsics). We also cannot combine sections that have different constants in the same structural location, and these constants will have to elevated to argument. This patch deduplicates extracted functions that only have inputs and non of the special cases. We test that correctly deduplicate in: test/Transforms/IROutliner/outlining-same-globals.ll test/Transforms/IROutliner/outlining-same-constants.ll test/Transforms/IROutliner/outlining-different-structure.ll	2020-12-19 17:26:29 -06:00
Roman Lebedev	c043f5055e	[SimplifyCFG] Teach FoldBranchToCommonDest() to preserve DomTree, part 1 ... for conditional branch case	2020-12-20 00:18:36 +03:00
Roman Lebedev	262ff9c23e	[SimplifyCFG] Teach TryToMergeLandingPad() to preserve DomTree	2020-12-20 00:18:36 +03:00
Roman Lebedev	6a1617d67c	[SimplifyCFG] Teach SimplifyCondBranchToTwoReturns() to preserve DomTree, part 2 ... for the custom case returning void.	2020-12-20 00:18:36 +03:00
Roman Lebedev	b94520c9ee	[SimplifyCFG] Teach SimplifyCondBranchToTwoReturns() to preserve DomTree, part 1 ... for the general case of returning a value.	2020-12-20 00:18:35 +03:00
Roman Lebedev	4d87a6ad13	[NFCI][SimplifyCFG] SimplifyCondBranchToTwoReturns(): pull out BI->getParent() into a variable	2020-12-20 00:18:35 +03:00
Roman Lebedev	83659c7076	[SimplifyCFG] simplifySingleResume(): FoldReturnIntoUncondBranch() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater it into it. Apparently, there were no dedicated tests just for that functionality, so i'm adding one here.	2020-12-20 00:18:34 +03:00
Roman Lebedev	b7d00e29b7	[SimplifyCFG] Teach simplifySingleResume() to preserve DomTree	2020-12-20 00:18:34 +03:00
Roman Lebedev	c209b88dd4	[SimplifyCFG] Teach simplifyCommonResume() to preserve DomTree	2020-12-20 00:18:34 +03:00
Roman Lebedev	76e74d9395	[SimplifyCFG] Teach removeEmptyCleanup() to preserve DomTree	2020-12-20 00:18:33 +03:00
Roman Lebedev	4be8707e64	[SimplifyCFG] Teach FoldTwoEntryPHINode() to preserve DomTree Still boring, simply drop all edges to successors of DomBlock, and add an edge to to BB instead.	2020-12-20 00:18:33 +03:00
Roman Lebedev	b43b77ff9b	[NFCI][SimlifyCFG] simplifyOnce(): also perform DomTree validation And that exposes that a number of tests don't actually manage to maintain DomTree validity, which is inline with my observations. Once again, SimlifyCFG pass currently does not require/preserve DomTree by default, so this is effectively NFC.	2020-12-20 00:18:32 +03:00
Andrew Litteken	c52bcf3a9b	[IRSim][IROutliner] Limit to extracting regions that only require inputs. Extracted regions can have both inputs and outputs. In addition, the CodeExtractor removes inputs that are only used in llvm.assumes, and sunken allocas (values are used entirely in the extracted region as denoted by lifetime intrinsics). We also cannot combine sections that have different constants in the same structural location, and these constants will have to elevated to argument. This patch limits the extracted regions to those that only require inputs, and do not have any other special cases. We test that we do not outline the wrong constants in: test/Transforms/IROutliner/outliner-different-constants.ll test/Transforms/IROutliner/outliner-different-globals.ll test/Transforms/IROutliner/outliner-constant-vs-registers.ll We test that correctly outline in: test/Transforms/IROutliner/outlining-same-globals.ll test/Transforms/IROutliner/outlining-same-constants.ll test/Transforms/IROutliner/outlining-different-structure.ll Reviewers: paquette, plofti Differential Revision: https://reviews.llvm.org/D86977	2020-12-19 13:33:54 -06:00
Kazu Hirata	56edfcada9	[Target, Transforms] Use contains (NFC)	2020-12-19 10:43:19 -08:00
Aditya Kumar	1ab4db0f84	[HotColdSplit] Reflect full cost of parameters in split penalty Make the penalty for splitting a region more accurately reflect the cost of materializing all of the inputs/outputs to/from the region. This almost entirely eliminates code growth within functions which undergo splitting in key internal frameworks, and reduces the size of those frameworks between 2.6% to 3%. rdar://49167240 Patch by: Vedant Kumar(@vsk) Reviewers: hiraditya,rjf,t.p.northover Reviewed By: hiraditya,rjf Differential Revision: https://reviews.llvm.org/D59715	2020-12-18 17:06:17 -08:00
Akira Hatanaka	ffd982f7db	[ObjC][ARC] Fix a bug where the inline-asm retain/claim RV marker wasn't inserted when the original call had a 'returned' argument The code is testing whether the instruction BBI points to is the call that is paired up with the retainRV/claimRV call, but it doesn't work when the call has a 'returned' argument since GetArgRCIdentityRoot looks through 'returned' arguments. rdar://72485383	2020-12-18 16:59:06 -08:00
Sanjay Patel	37d0dda739	[SLP] fix typo; NFC	2020-12-18 16:55:52 -05:00
Nikita Popov	1f1145006b	[DSE] Use correct memory location for read clobber check MSSA DSE starts at a killing store, finds an earlier store and then checks that the earlier store is not read along any paths (without being killed first). However, it uses the memory location of the killing store for that, not the earlier store that we're attempting to eliminate. This has a number of problems: * Mismatches between what BasicAA considers aliasing and what DSE considers an overwrite (even though both are correct in isolation) can result in miscompiles. This is PR48279, which D92045 tries to fix in a different way. The problem is that we're using a location from a store that is potentially not executed and thus may be UB, in which case analysis results can be arbitrary. * Metadata on the killing store may be used to determine aliasing, but there is no guarantee that the metadata is valid, as the specific killing store may not be executed. Using the metadata on the earlier store is valid (it is the store we're removing, so on any execution where its removal may be observed, it must be executed). * The location is imprecise. For full overwrites the killing store will always have a location that is larger or equal than the earlier access location, so it's beneficial to use the earlier access location. This is not the case for partial overwrites, in which case either location might be smaller. There is some room for improvement here. Using the earlier access location means that we can no longer cache which accesses are read for a given killing store, as we may be querying different locations. However, it turns out that simply dropping the cache has no notable impact on compile-time. Differential Revision: https://reviews.llvm.org/D93523	2020-12-18 20:26:53 +01:00
Kazu Hirata	5ac37725df	[GVNHoist] Remove successorDominate (NFC) The function was introduced on Aug 25, 2016 in commit `5f0d0e60d1`. Its last use was removed on Sep 13, 2017 in commit `dfa8741c96`.	2020-12-18 10:29:52 -08:00
Roman Lebedev	897c985e1e	[InstCombine] Canonicalize SPF to abs intrinsic This patch enables canonicalization of SPF_ABS and SPF_ABS to the abs intrinsic. This is a recommit, the original try was `05d4c4ebc2`, but it was reverted due to an apparent miscompile, which since then has just been fixed by the previous commit. Differential Revision: https://reviews.llvm.org/D87188	2020-12-18 21:18:14 +03:00
Whitney Tsang	2a814cd9e1	Ensure SplitEdge to return the new block between the two given blocks This PR implements the function splitBasicBlockBefore to address an issue that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore. The issue occurs in SplitEdge when the Succ has a single predecessor and the edge between the BB and Succ is not critical. This produces the result ‘BB->Succ->New’. The new function splitBasicBlockBefore was added to splitBlockBefore to handle the issue and now produces the correct result ‘BB->New->Succ’. Below is an example of splitting the block bb1 at its first instruction. /// Original IR bb0: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlock bb0: br bb1 bb1: br bb1.split bb1.split: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore bb0: br bb1.split bb1.split br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: Differential Revision: https://reviews.llvm.org/D92200	2020-12-18 17:37:17 +00:00
Arnamoy Bhattacharyya	06d5b1c9ad	[SROA] Remove Dead Instructions while creating speculative instructions The SROA pass tries to be lazy for removing dead instructions that are collected during iterative run of the pass in the DeadInsts list. However it does not remove instructions from the dead list while running eraseFromParent() on those instructions. This causes (rare) null pointer dereferences. For example, in the speculatePHINodeLoads() instruction, in the following code snippet: ``` while (!PN.use_empty()) { LoadInst LI = cast<LoadInst>(PN.user_back()); LI->replaceAllUsesWith(NewPN); LI->eraseFromParent(); } ``` If the Load instruction LI belongs to the DeadInsts list, it should be removed when eraseFromParent() is called. However, the bug does not show up in most cases, because immediately in the same function, a new LoadInst is created in the following line: ``` LoadInst Load = PredBuilder.CreateAlignedLoad( LoadTy, InVal, Alignment, (PN.getName() + ".sroa.speculate.load." + Pred->getName())); ``` This new LoadInst object takes the same memory address of the just deleted LI using eraseFromParent(), therefore the bug does not materialize. In very rare cases, the addresses differ and therefore, a dangling pointer is created, causing a crash. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D92431	2020-12-18 11:47:02 -05:00
Sanjay Patel	47aaa99c0e	[VectorCombine] allow peeking through GEPs when creating a vector load This is an enhancement motivated by https://llvm.org/PR16739 (see D92858 for another). We can look through a GEP to find a base pointer that may be safe to use for a vector load. If so, then we shuffle (shift) the necessary vector element over to index 0. Alive2 proof based on 1 of the regression tests: https://alive2.llvm.org/ce/z/yPJLkh The vector translation is independent of endian (verify by changing to leading 'E' in the datalayout string). Differential Revision: https://reviews.llvm.org/D93229	2020-12-18 09:25:03 -05:00
Yevgeny Rouban	f0e3d1d6ca	[IndVars] Fix adding trunc instructions to unwind blocks Truncate instruction must not be inserted before landing pads. The insertion point is fixed.	2020-12-18 12:52:23 +07:00
Kazu Hirata	b621116716	[Transforms] Use llvm::erase_if (NFC)	2020-12-17 19:53:10 -08:00
Rong Xu	31c0b8700b	Fix clang-ppc64le-rhel buildbot build error ix buildbot build error due to commit 3733463d: [IR][PGO] Add hot func attribute and use hot/cold attribute in func section	2020-12-17 19:14:43 -08:00
Rong Xu	3733463dbb	[IR][PGO] Add hot func attribute and use hot/cold attribute in func section Clang FE currently has hot/cold function attribute. But we only have cold function attribute in LLVM IR. This patch adds support of hot function attribute to LLVM IR. This attribute will be used in setting function section prefix/suffix. Currently .hot and .unlikely suffix only are added in PGO (Sample PGO) compilation (through isFunctionHotInCallGraph and isFunctionColdInCallGraph). This patch changes the behavior. The new behavior is: (1) If the user annotates a function as hot or isFunctionHotInCallGraph is true, this function will be marked as hot. Otherwise, (2) If the user annotates a function as cold or isFunctionColdInCallGraph is true, this function will be marked as cold. The changes are: (1) user annotated function attribute will used in setting function section prefix/suffix. (2) hot attribute overwrites profile count based hotness. (3) profile count based hotness overwrite user annotated cold attribute. The intention for these changes is to provide the user a way to mark certain function as hot in cases where training input is hard to cover all the hot functions. Differential Revision: https://reviews.llvm.org/D92493	2020-12-17 18:41:12 -08:00
Andrew Litteken	cea807602a	[IRSim][IROutliner] Adding InstVisitor to disallow certain operations. This adds a custom InstVisitor to return false on instructions that should not be allowed to be outlined. These match the illegal instructions in the IRInstructionMapper with exception of the addition of the llvm.assume intrinsic. Tests all the tests marked: illegal-*-.ll with a test for each kind of instruction that has been marked as illegal. Reviewers: jroelofs, paquette Differential Revisions: https://reviews.llvm.org/D86976	2020-12-17 19:33:57 -06:00
Roman Lebedev	2d07414ee5	[SimplifyCFG] Teach simplifyUnreachable() to preserve DomTree Pretty boring, removeUnwindEdge() already known how to update DomTree, so if we are to call it, we must first flush our own pending updates; otherwise, we just stop predecessors from branching to us, and for certain predecessors, stop their predecessors from branching to them also.	2020-12-18 00:37:22 +03:00
Roman Lebedev	2ee724863e	[SimplifyCFG] ConstantFoldTerminator() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater it into it. Fixes DomTree preservation for a number of tests, all of which are marked as such so that they do not regress.	2020-12-18 00:37:22 +03:00
Roman Lebedev	164e0847a5	[SimplifyCFG] DeleteDeadBlock() already knows how to preserve DomTree ... so just ensure that we pass DomTreeUpdater it into it. Fixes DomTree preservation for a large number of tests, all of which are marked as such so that they do not regress.	2020-12-18 00:37:21 +03:00
Bangtian Liu	511cfe9441	Revert "Ensure SplitEdge to return the new block between the two given blocks" This reverts commit `d20e0c3444`.	2020-12-17 21:00:37 +00:00
Johannes Doerfert	994bb6eb7d	[OpenMP][NFC] Provide a new remark and documentation If a GPU function is externally reachable we give up trying to find the (unique) kernel it is called from. This can hinder optimizations. Emit a remark and explain mitigation strategies. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D93439	2020-12-17 14:38:26 -06:00
Andrew Litteken	dae34463e3	[IRSim][IROutliner] Adding the extraction basics for the IROutliner. Extracting the similar regions is the first step in the IROutliner. Using the IRSimilarityIdentifier, we collect the SimilarityGroups and sort them by how many instructions will be removed. Each IRSimilarityCandidate is used to define an OutlinableRegion. Each region is ordered by their occurrence in the Module and the regions that are not compatible with previously outlined regions are discarded. Each region is then extracted with the CodeExtractor into its own function. We test that correctly extract in: test/Transforms/IROutliner/extraction.ll test/Transforms/IROutliner/address-taken.ll test/Transforms/IROutliner/outlining-same-globals.ll test/Transforms/IROutliner/outlining-same-constants.ll test/Transforms/IROutliner/outlining-different-structure.ll Recommit of `bf899e8913` fixing memory leaks. Reviewers: paquette, jroelofs, yroux Differential Revision: https://reviews.llvm.org/D86975	2020-12-17 11:27:26 -06:00
Nabeel Omer	df2b9a3e02	[DebugInfo] Avoid re-ordering assignments in LCSSA The LCSSA pass makes use of a function insertDebugValuesForPHIs() to propogate dbg.value() intrinsics to newly inserted PHI instructions. Faulty behaviour occurs when the parent PHI of a newly inserted PHI is not the most recent assignment to a source variable. insertDebugValuesForPHIs ends up propagating a value that isn't the most recent assignemnt. This change removes the call to insertDebugValuesForPHIs() from LCSSA, preventing incorrect dbg.value intrinsics from being propagated. Propagating variable locations between blocks will occur later, during LiveDebugValues. Differential Revision: https://reviews.llvm.org/D92576	2020-12-17 16:17:32 +00:00
Bangtian Liu	d20e0c3444	Ensure SplitEdge to return the new block between the two given blocks This PR implements the function splitBasicBlockBefore to address an issue that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore. The issue occurs in SplitEdge when the Succ has a single predecessor and the edge between the BB and Succ is not critical. This produces the result ‘BB->Succ->New’. The new function splitBasicBlockBefore was added to splitBlockBefore to handle the issue and now produces the correct result ‘BB->New->Succ’. Below is an example of splitting the block bb1 at its first instruction. /// Original IR bb0: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlock bb0: br bb1 bb1: br bb1.split bb1.split: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore bb0: br bb1.split bb1.split br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: Differential Revision: https://reviews.llvm.org/D92200	2020-12-17 16:00:15 +00:00
Florian Hahn	01089c876b	[InstCombine] Preserve !annotation on newly created instructions. If the source instruction has !annotation metadata, all instructions created during combining should also have it. Tell the builder to add it. The !annotation system was discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks' (http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html) This patch is based on an earlier patch by Francis Visoiu Mistrih. Reviewed By: thegameg, lebedev.ri Differential Revision: https://reviews.llvm.org/D91444	2020-12-17 15:20:23 +00:00
Florian Hahn	75c04bfc61	[SimplifyCFG] Preserve !annotation in FoldBranchToCommonDest. When folding a branch to a common destination, preserve !annotation on the created instruction, if the terminator of the BB that is going to be removed has !annotation. This should ensure that !annotation is attached to the instructions that 'replace' the original terminator. Reviewed By: jdoerfert, lebedev.ri Differential Revision: https://reviews.llvm.org/D93410	2020-12-17 14:06:58 +00:00
Jun Ma	0138399903	[InstCombine] Remove scalable vector restriction in InstCombineCasts Differential Revision: https://reviews.llvm.org/D93389	2020-12-17 22:02:33 +08:00
Florian Hahn	29077ae860	[IRBuilder] Generalize debug loc handling for arbitrary metadata. This patch extends IRBuilder to allow adding/preserving arbitrary metadata on created instructions. Instead of using references to specific metadata nodes (like DebugLoc), IRbuilder now keeps a vector of (metadata kind, MDNode *) pairs, which are added to each created instruction. The patch itself is a NFC and only moves the existing debug location handling over to the new system. In a follow-up patch it will be used to preserve !annotation metadata besides !dbg. The current approach requires iterating over MetadataToCopy to avoid adding duplicates, but given that the number of metadata kinds to copy/preserve is going to be very small initially (0, 1 (for !dbg) or 2 (!dbg and !annotation)) that should not matter. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D93400	2020-12-17 13:27:43 +00:00
Cullen Rhodes	1fd3a04775	[LV] Disable epilogue vectorization for scalable VFs Epilogue vectorization doesn't support scalable vectorization factors yet, disable it for now. Reviewed By: sdesmalen, bmahjour Differential Revision: https://reviews.llvm.org/D93063	2020-12-17 12:14:03 +00:00
dfukalov	9ed8e0caab	[NFC] Reduce include files dependency and AA header cleanup (part 2). Continuing work started in https://reviews.llvm.org/D92489: Removed a bunch of includes from "AliasAnalysis.h" and "LoopPassManager.h". Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92852	2020-12-17 14:04:48 +03:00
Barry Revzin	92310454bf	Make LLVM build in C++20 mode Part of the <=> changes in C++20 make certain patterns of writing equality operators ambiguous with themselves (sorry!). This patch goes through and adjusts all the comparison operators such that they should work in both C++17 and C++20 modes. It also makes two other small C++20-specific changes (adding a constructor to a type that cases to be an aggregate, and adding casts from u8 literals which no longer have type const char*). There were four categories of errors that this review fixes. Here are canonical examples of them, ordered from most to least common: // 1) Missing const namespace missing_const { struct A { #ifndef FIXED bool operator==(A const&); #else bool operator==(A const&) const; #endif }; bool a = A{} == A{}; // error } // 2) Type mismatch on CRTP namespace crtp_mismatch { template <typename Derived> struct Base { #ifndef FIXED bool operator==(Derived const&) const; #else // in one case changed to taking Base const& friend bool operator==(Derived const&, Derived const&); #endif }; struct D : Base<D> { }; bool b = D{} == D{}; // error } // 3) iterator/const_iterator with only mixed comparison namespace iter_const_iter { template <bool Const> struct iterator { using const_iterator = iterator<true>; iterator(); template <bool B, std::enable_if_t<(Const && !B), int> = 0> iterator(iterator<B> const&); #ifndef FIXED bool operator==(const_iterator const&) const; #else friend bool operator==(iterator const&, iterator const&); #endif }; bool c = iterator<false>{} == iterator<false>{} // error \|\| iterator<false>{} == iterator<true>{} \|\| iterator<true>{} == iterator<false>{} \|\| iterator<true>{} == iterator<true>{}; } // 4) Same-type comparison but only have mixed-type operator namespace ambiguous_choice { enum Color { Red }; struct C { C(); C(Color); operator Color() const; bool operator==(Color) const; friend bool operator==(C, C); }; bool c = C{} == C{}; // error bool d = C{} == Red; } Differential revision: https://reviews.llvm.org/D78938	2020-12-17 10:44:10 +00:00
Florian Hahn	eba09a2db9	[InstCombine] Preserve !annotation for newly created instructions. When replacing an instruction with !annotation with a newly created replacement, add the !annotation metadata to the replacement. This mostly covers cases where the new instructions are created using the ::Create helpers. Instructions created by IRBuilder will be handled by D91444. Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D93399	2020-12-17 09:06:51 +00:00
Kazu Hirata	4ad5b634f6	[GCN] Remove unused function handleNewInstruction (NFC) The function was added without a user on Dec 22, 2016 in commit `7e274e02ae`. It seems to be unused since then.	2020-12-16 21:57:48 -08:00
Hongtao Yu	ac068e014b	[CSSPGO] Consume pseudo-probe-based AutoFDO profile This change enables pseudo-probe-based sample counts to be consumed by the sample profile loader under the regular `-fprofile-sample-use` switch with minimal adjustments to the existing sample file formats. After the counts are imported, a probe helper, aka, a `PseudoProbeManager` object, is automatically launched to verify the CFG checksum of every function in the current compilation against the corresponding checksum from the profile. Mismatched checksums will cause a function profile to be slipped. A `SampleProfileProber` pass is scheduled before any of the `SampleProfileLoader` instances so that the CFG checksums as well as probe mappings are available during the profile loading time. The `PseudoProbeManager` object is set up right after the profile reading is done. In the future a CFG-based fuzzy matching could be done in `PseudoProbeManager`. Samples will be applied only to pseudo probe instructions as well as probed callsites once the checksum verification goes through. Those instructions are processed in the same way that regular instructions would be processed in the line-number-based scenario. In other words, a function is processed in a regular way as if it was reduced to just containing pseudo probes (block probes and callsites). Adjustment to profile format A CFG checksum field is being added to the existing AutoFDO profile formats. So far only the text format and the extended binary format are supported. For the text format, a new line like ``` !CFGChecksum: 12345 ``` is added to the end of the body sample lines. For the extended binary profile format, we introduce a metadata section to store the checksum map from function names to their CFG checksums. Differential Revision: https://reviews.llvm.org/D92347	2020-12-16 15:57:18 -08:00

... 10 11 12 13 14 ...

27111 Commits