llvm-project

Commit Graph

Author	SHA1	Message	Date
Fangrui Song	3adb89bb9f	[ThinLTO] Make cloneUsedGlobalVariables deterministic Iterating on `SmallPtrSet<GlobalValue *, 8>` with more than 8 elements is not deterministic. Use a SmallVector instead because `Used` is guaranteed to contain unique elements. While here, decrease inline element counts from 8 to 4. The number of `llvm.used`/`llvm.compiler.used` elements is usually 0 or 1. For full LTO/hybrid LTO, the number may be large, so we need to be careful. According to tejohnson's analysis https://reviews.llvm.org/D97128#2582399 , 4 is good for a large project with WholeProgramDevirt, when available_externally vtables are placed in the llvm.compiler.used set. Differential Revision: https://reviews.llvm.org/D97128	2021-02-23 16:09:05 -08:00
Teresa Johnson	0a5949dcfa	[WPD] Fix handling of pure virtual base class The fix in `3c4c205060` caused an assert in the case of a pure virtual base class. In that case, the vTableFuncs list on the summary will be empty, so we were hitting the new assert that the linkage type was not available_externally. In the case of pure virtual, we do not want to assert, and additionally need to set VS so that we don't treat it conservatively and quit the analysis of the type id early. This exposed a pre-existing issue where we were not updating the vcall visibility on pure virtual functions when whole program visibility was specified. We were skipping updating the visibility on any global vars that didn't have any vTableFuncs, which meant all pure virtual were not updated, and the later analysis would block any devirtualization of calls that had a type id used on those pure virtual vtables (see the handling in the other code modified in this patch). Simply remove that check. It will mean that we may update the vcall visibility on global vars that aren't vtables, but that setting is ignored for any global vars that didn't have type metadata anyway. Added a new test case that asserted without removing the assert, and that requires the other fixes in this patch (updateVCallVisibilityInIndex and not skipping all vtables without virtual funcs) to get a successful devirtualization with index-only WPD. I added cases to test hybrid and regular LTO for completeness, although those already worked without the fixes here. With this final fix, a clang multistage bootstrap with WPD builds and runs all tests successfully. Differential Revision: https://reviews.llvm.org/D97126	2021-02-23 16:07:09 -08:00
Kristina Bessonova	e97aab8d15	[ThinLTO] Fix import of multiply defined global variables Currently, if there is a module that contains a strong definition of a global variable and a module that has both a weak definition for the same global and a reference to it, it may result in an undefined symbol error while linking with ThinLTO. It happens because: * the strong definition become internal because it is read-only and can be imported; * the weak definition gets replaced by a declaration because it's non-prevailing; * the strong definition failed to be imported because the destination module already contains another definition of the global yet this def is non-prevailing. The patch adds a check to computeImportForReferencedGlobals() that allows considering a global variable for being imported even if the module contains a definition of it in the case this def has an interposable linkage type. Note that currently the check is based only on the linkage type (and this seems to be enough at the moment), but it might be worth to account the information whether the def is prevailing or not. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D95943	2021-02-21 18:34:12 +02:00
Teresa Johnson	fde55a9c9b	[LTO] Fix cloning of llvm.used when splitting module Refines the fix in `3c4c205060` to only put globals whose defs were cloned into the split regular LTO module on the cloned llvm.used globals. This avoids an issue where one of the attached values was a local that was promoted in the original module after the module was cloned. We only need to have the values defined in the new module on those globals. Fixes PR49251. Differential Revision: https://reviews.llvm.org/D97013	2021-02-20 09:46:43 -08:00
Wei Mi	4ffad1fb48	[SampleFDO] Add PromotedInsns to prevent repeated ICP. In https://reviews.llvm.org/rG5fb65c02ca5e91e7e1a00e0efdb8edc899f3e4b9, We use 0 count value profile to memorize which target has been promoted and prevent repeated ICP for the same target, so we delete PromotedInsns. However, I found the implementation in the patch has some shortcomings to be fixed otherwise there will still be repeated ICP. So I add PromotedInsns back temorarily. Will remove it after I get a thorough fix.	2021-02-19 10:01:49 -08:00
Nikita Popov	71a8e4e7d6	[MemCopyOpt] Enable MemorySSA by default This enables use of MemorySSA instead of MemDep in MemCpyOpt. To allow this without significant compile-time impact, the MemCpyOpt pass is moved directly before DSE (in the cases where this was not already the case), which allows us to reuse the existing MemorySSA analysis. Unlike the MemDep-based implementation, the MemorySSA-based MemCpyOpt can also perform simple optimizations across basic blocks. Differential Revision: https://reviews.llvm.org/D94376	2021-02-19 18:06:25 +01:00
Nikita Popov	370addb996	[IR] Move willReturn() to Instruction This moves the willReturn() helper from CallBase to Instruction, so that it can be used in a more generic manner. This will make it easier to fix additional passes (ADCE and BDCE), and will give us one place to change if additional instructions should become non-willreturn (e.g. there has been talk about handling volatile operations this way). I have also included the IntrinsicInst workaround directly in here, so that it gets applied consistently. (As such this change is not entirely NFC -- FuncAttrs will now use this as well.) Differential Revision: https://reviews.llvm.org/D96992	2021-02-19 11:56:01 +01:00
Wei Mi	5fb65c02ca	[SampleFDO] Stop repeated indirect call promotion for the same target. Found a problem in indirect call promotion in sample loader pass. Currently if an indirect call is promoted for a target, and if the parent function is inlined into some other function, the indirect call can be promoted for the same target again. That is redundent which can harm performance and can cause excessive compile time in some extreme case. The patch fixes the issue. If a target is promoted for an indirect call, the patch will write ICP metadata with the target call count being set to 0. In the later ICP in sample profile loader, if it sees a target has 0 count for an indirect call, it knows the target has been promoted and won't do indirect call promotion for the indirect call. The fix brings 0.1~0.2% performance on our search benchmark. Differential Revision: https://reviews.llvm.org/D96806	2021-02-18 17:01:32 -08:00
Hongtao Yu	e87b1b1d4e	[CSSPGO] Use callsite sample counts to annotate indirect call sites. With CSSPGO all indirect call targets are counted torwards the original indirect call site in the profile, including both inlined and non-inlined targets. Therefore no need to look for callee entry counts. This also fixes the issue where callee entry count doesn't match callsite count due to the nature of CS sampling. I'm also cleaning up the orginal code that called `findIndirectCallFunctionSamples` just to compute the sum, the return value of which was disgarded. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D96990	2021-02-18 14:52:34 -08:00
Teresa Johnson	d55d46f43b	[WPD] Add an optional checking mode for debugging devirtualization This adds an internal option -wholeprogramdevirt-check which if enabled will guard each devirtualization with a runtime check against the expected target, and an invocation of a debug trap if the check fails. This is useful for debugging WPD failures involving undefined behavior (e.g. casting to another class type not in the inheritance chain). Differential Revision: https://reviews.llvm.org/D95969	2021-02-17 16:46:15 -08:00
Rong Xu	7397905ab0	[SampleFDO] Third Try: Refactor SampleProfile.cpp Apply the patch for the third time after fixing buildbot failures. Refactor SampleProfile.cpp to use the core code in CodeGen. The main changes are: (1) Move SampleProfileLoaderBaseImpl class to a header file. (2) Split SampleCoverageTracker to a head file and a cpp file. (3) Move the common codes (common options and callsiteIsHot()) to the common cpp file. (4) Add inline keyword to avoid duplicated symbols -- they will be removed later when the class is changed to a template. Differential Revision: https://reviews.llvm.org/D96455	2021-02-17 15:31:50 -08:00
Teresa Johnson	3c4c205060	[WPD][lld] Test handling of vtable definition from shared libraries Adds a lld test for a case that the handling added for dynamically exported symbols in `1487747e99` already fixes. Because isExportDynamic returns true when the symbol is SharedKind with default visibility, it will treat as dynamically exported and block devirtualization when the definition of a vtable comes from a shared library. This is desireable as it is dangerous to devirtualize in that case, since there could be hidden overrides in the shared library. Typically that happens when the shared library header contains available externally definitions, which applications can override. An example is std::error_category, which is overridden in LLVM and causing failures after a self build with WPD enabled, because libstdc++ contains hidden overrides of the virtual base class methods. The regular LTO case in the new test already worked, but there are 2 fixes in this patch needed for the index-only case and the hybrid LTO case. For the index-only case, WPD should not simply ignore available externally vtables. A follow on fix will be made to clang to emit type metadata for those vtables, which the new test is modeling. For the hybrid case, we need to ensure when the module is split that any llvm.*used globals are cloned to the regular LTO split module so available externally vtable definitions are not prematurely deleted. Another follow on fix will add the equivalent gold test, which requires a small fix to the plugin to treat symbols in dynamic libraries the same way lld already is. Differential Revision: https://reviews.llvm.org/D96721	2021-02-17 12:49:24 -08:00
Vedant Kumar	c28622fbf3	Revert "[SampleFDO] Reapply: Refactor SampleProfile.cpp" Revert "[SampleFDO] Add missing #includes to unbreak modules build after D96455" This reverts commit `c73cbf218a`. Revert "[SampleFDO] Fix MSVC "namespace uses itself" warning (NFC)" This reverts commit `a23e6b321c`. Revert "[SampleFDO] Reapply: Refactor SampleProfile.cpp" This reverts commit `6fd5ccff72`. Still seeing link failures when building llc (or other tools), due to the new SampleProfileLoaderBaseImpl.h containing definitions that get duplicated across multiple TU's. ``` duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::findEquivalenceClasses(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::buildEdges(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::computeDominanceAndLoopInfo(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::getFunctionLoc(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::getBlockWeight(llvm::BasicBlock const)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printBlockWeight(llvm::raw_ostream&, llvm::BasicBlock const) const' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printBlockEquivalence(llvm::raw_ostream&, llvm::BasicBlock const)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printEdgeWeight(llvm::raw_ostream&, std::__1::pair<llvm::BasicBlock const, llvm::BasicBlock const*>)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) ```	2021-02-17 10:22:24 -08:00
Rong Xu	6fd5ccff72	[SampleFDO] Reapply: Refactor SampleProfile.cpp Reapply patch after fixing buildbot failure. Refactor SampleProfile.cpp to use the core code in CodeGen. The main changes are: (1) Move SampleProfileLoaderBaseImpl class to a header file. (2) Split SampleCoverageTracker to a head file and a cpp file. (3) Move the common codes (common options and callsiteIsHot()) to the common cpp file. Differential Revision: https://reviews.llvm.org/D96455	2021-02-16 16:43:21 -08:00
Mehdi Amini	c761fe77bd	Revert "[SampleFDO][NFC] Refactor SampleProfile.cpp" This reverts commit `310b35304c`. The build is broken with -DBUILD_SHARED_LIBS=ON : lib/ProfileData/CMakeFiles/LLVMProfileData.dir/SampleProfileLoaderBaseUtil.cpp.o: In function `llvm::sampleprofutil::callsiteIsHot(llvm::sampleprof::FunctionSamples const, llvm::ProfileSummaryInfo, bool)': SampleProfileLoaderBaseUtil.cpp:(.text._ZN4llvm14sampleprofutil13callsiteIsHotEPKNS_10sampleprof15FunctionSamplesEPNS_18ProfileSummaryInfoEb+0x1a): undefined reference to `llvm::ProfileSummaryInfo::isColdCount(unsigned long) const' SampleProfileLoaderBaseUtil.cpp:(.text._ZN4llvm14sampleprofutil13callsiteIsHotEPKNS_10sampleprof15FunctionSamplesEPNS_18ProfileSummaryInfoEb+0x28): undefined reference to `llvm::ProfileSummaryInfo::isHotCount(unsigned long) const' ...	2021-02-16 22:11:42 +00:00
Rong Xu	310b35304c	[SampleFDO][NFC] Refactor SampleProfile.cpp Refactor SampleProfile.cpp to use the core code in CodeGen. The main changes are: (1) Move SampleProfileLoaderBaseImpl class to a header file. (2) Split SampleCoverageTracker to a head file and a cpp file. (3) Move the common codes (common options and callsiteIsHot()) to the common cpp file. Differential Revision: https://reviews.llvm.org/D96455	2021-02-16 11:18:21 -08:00
Duncan P. N. Exon Smith	22a52dfddc	TransformUtils: Fix metadata handling in CloneModule (and improve CloneFunctionInto) This commit fixes how metadata is handled in CloneModule to be sound, and improves how it's handled in CloneFunctionInto (although the latter is still awkward when called within a module). Ruiling Song pointed out in PR48841 that CloneModule was changed to unsoundly use the RF_ReuseAndMutateDistinctMDs flag (renamed in `fa35c1f80f` for clarity). This flag papered over a crash caused by other various changes made to CloneFunctionInto over the past few years that made it unsound to use cloning between different modules. (This commit partially addresses PR48841, fixing the repro from preprocessed source but not textual IR. MDNodeMapper::mapDistinctNode became unsound in `df763188c9` and this commit does not address that regression.) RF_ReuseAndMutateDistinctMDs is designed for the IRMover to use, avoiding unnecessary clones of all referenced metadata when linking between modules (with IRMover, the source module is discarded after linking). It never makes sense to use when you're not discarding the source. This commit drops its incorrect use in CloneModule. Sadly, the right thing to do with metadata when cloning a function is complicated, and this patch doesn't totally fix it. The first problem is that there are two different types of referenceable metadata and it's not obvious what to with one of them when remapping. - `!0 = !{!1}` is metadata's version of a constant. Programatically it's called "uniqued" (probably a better term would be "constant") because, like `ConstantArray`, it's stored in uniquing tables. Once it's constructed, it's illegal to change its arguments. - `!0 = distinct !{!1}` is a bit closer to a global variable. It's legal to change the operands after construction. What should be done with distinct metadata when cloning functions within the same module? - Should new, cloned nodes be created? - Should all references point to the same, old nodes? The answer depends on whether that metadata is effectively owned by a function. And that's the second problem. Referenceable metadata's ownership model is not clear or explicit. Technically, it's all stored on an LLVMContext. However, any metadata that is `distinct`, that transitively references a `distinct` node, or that transitively references a GlobalValue is specific to a Module and is effectively owned by it. More specifically, some metadata is effectively owned by a specific Function within a module. Effectively function-local metadata was introduced somewhere around `c10d0e5ccd`, which made it illegal for two functions to share a DISubprogram attachment. When cloning a function within a module, you need to clone the function-local debug info and suppress cloning of global debug info (the status quo suppresses cloning some global debug info but not all). When cloning a function to a new/different module, you need to clone all of the debug info. Here's what I think we should do (eventually? soon? not this patch though): - Distinguish explicitly (somehow) between pure constant metadata owned by the LLVMContext, global metadata owned by the Module, and local metadata owned by a GlobalValue (such as a function). - Update CloneFunctionInto to trigger cloning of all "local" metadata (only), perhaps by adding a bit to RemapFlag. Alternatively, split out a separate function CloneFunctionMetadataInto to prime the metadata map that callers are updated to call ahead of time as appropriate. Here's the somewhat more isolated fix in this patch: - Converted the `ModuleLevelChanges` parameter to `CloneFunctionInto` to an enum called `CloneFunctionChangeType` that is one of LocalChangesOnly, GlobalChanges, DifferentModule, and ClonedModule. - The code maintaining the "functions uniquely own subprograms" invariant is now only active in the first two cases, where a function is being cloned within a single module. That's necessary because this code inhibits cloning of (some) "global" metadata that's effectively owned by the module. - The code maintaining the "all compile units must be explicitly referenced by !llvm.dbg.cu" invariant is now only active in the DifferentModule case, where a function is being cloned into a new module in isolation. - CoroSplit.cpp's call to CloneFunctionInto in CoroCloner::create uses LocalChangeOnly, since `fa635d730f` only set `ModuleLevelChanges` to trigger cloning of local metadata. - CloneModule drops its unsound use of RF_ReuseAndMutateDistinctMDs and special handling of !llvm.dbg.cu. - Fixed some outdated header docs and left a couple of FIXMEs. Differential Revision: https://reviews.llvm.org/D96531	2021-02-15 11:56:00 -08:00
Kazu Hirata	910e2d1e57	[llvm] Use llvm::is_contained (NFC)	2021-02-14 08:36:20 -08:00
Teresa Johnson	a80232bd5f	[LTT] Address post-review comments (NFC) Implement some post-review cleanup suggestions for D96083.	2021-02-13 15:52:59 -08:00
Hongtao Yu	de40f6d623	[CSSPGO] Process functions in a top-down order on a dynamic call graph. Functions are currently processed by the sample profiler loader in a top-down order defined by the static call graph. The order is being adjusted to be a top-down order based on the input context-sensitive profile. One benefit is that the processing order of caller and callee in one SCC would follow the context order in the profile to favor more inlining. Another benefit is that the processing order of caller and callee through an indirect call (which is not on the static call graph) can be honored which in turn allows for more inlining. The profile top-down order for SCC is also extended to support non-CS profiles. Two switches `-mllvm -use-profile-indirect-call-edges` and `-mllvm -use-profile-top-down-order` are being introduced. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D95988	2021-02-11 12:36:59 -08:00
Benjamin Kramer	8fb4a4f7bb	[SampleFDO] Silence -Wnon-virtual-dtor warning There's no polymorphic deletion happening here.	2021-02-10 23:37:15 +01:00
Rong Xu	db0d7d0ba9	[SampleFDO][NFC] Refactor SampleProfileLoader to reuse in CodeGen Break SampleProfileLoader into to a base and a derived class. Base class (SampleProfileLoaderBaseImpl) includes the common code for IR and MachineIR (CodeGen) sample loader. It will be templatelized in the later patch. Inline and Probe related code will remain in the derived class of SampleProfileLoader and stays in SampleProfile.cpp. We need to refactor some functions: (1) getInstWeight() to enable the code sharing -- put the core into getInstWeightImpl(). (2) emitAnnotation() and propagateWeights() to carve out the code specific to SampleProfileLoader. (3) make getInstWeight() and findFunctionSamples() virtual and override in SampleProfileLoader as they need to access the fields in the derived class. Differential Revision: https://reviews.llvm.org/D95832	2021-02-10 13:29:15 -08:00
Hongtao Yu	1cb47a063e	[CSSPGO] Unblock optimizations with pseudo probe instrumentation. The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking are intentional (such as blocking code merge) for better counts quality while the others are accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include: 1. IR InstCombine, sinking load operation to shorten lifetimes. 2. MIR LiveRangeShrink, similar to #1 3. MIR TwoAddressInstructionPass, i.e, opeq transform 4. MIR function argument copy elision 5. IR stack protection. (though not perf-critical but nice to have). Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D95982	2021-02-10 12:43:17 -08:00
Johannes Doerfert	81429abd83	[Attributor][FIX] Do not create UB by introducing a `noundef undef` This was reported as PR49104. The reproducer uses varargs but the issue is the same, we know an argument is dead but can't change the signature for some reason. The PR49104 situation was: We are in an CG-SCC traversal and we remove all the uses of an argument and proof it thereby dead. However, if we do not remove the argument, via signature rewrite, we need to ensure that the `undef` we introduce at the call site doesn't clash with a `noundef` attribute.	2021-02-09 13:02:38 -06:00
Kazu Hirata	302313a264	[Transforms] Use range-based for loops (NFC)	2021-02-08 22:33:53 -08:00
Teresa Johnson	3a27933ec2	[LTT] Don't attempt to lower type tests used only by assumes Type tests used only by assumes were original for devirtualization, but are meant to be kept through the first invocation of LTT so that they can be used for additional optimization. In the regular LTO case where the IR is analyzed we may find a resolution for the type test and end up rewriting the associated vtable global, which can have implications on section splitting. Simply ignore these type tests. Fixes PR48245. Differential Revision: https://reviews.llvm.org/D96083	2021-02-06 09:02:10 -08:00
Simon Pilgrim	f7d07dbb29	IROutliner.cpp - fix Wdocumentation warning. NFCI. Remove duplicate param	2021-02-05 11:38:09 +00:00
Simon Pilgrim	476b912e7c	SampleProfile.cpp - fix Wdocumentation warning. NFCI. Remove duplicate param	2021-02-05 11:31:17 +00:00
Simon Pilgrim	89edda7084	IROutliner.cpp - fix Wdocumentation warnings. NFCI.	2021-02-05 11:21:00 +00:00
Kazu Hirata	be37475897	[Transforms/IPO] Use range-based for loops (NFC)	2021-02-03 20:41:20 -08:00
Rong Xu	b8f13db5b7	[SampleFDO][NFC] Detach SampleProfileLoader from SampleCoverageTracker This patch detaches SampleProfileLoader from class SampleCoverageTracker. We plan to move SampleProfileLoader to a template class. This would remain SampleCoverageTracker as a class. Also make callsiteIsHot() as a file static function. Differential Revision: https://reviews.llvm.org/D95823	2021-02-03 11:38:04 -08:00
Hongtao Yu	3d89b3cbec	[CSSPGO] Introducing distribution factor for pseudo probe. Sample re-annotation is required in LTO time to achieve a reasonable post-inline profile quality. However, we have seen that such LTO-time re-annotation degrades profile quality. This is mainly caused by preLTO code duplication that is done by passes such as loop unrolling, jump threading, indirect call promotion etc, where samples corresponding to a source location are aggregated multiple times due to the duplicates. In this change we are introducing a concept of distribution factor for pseudo probes so that samples can be distributed for duplicated probes scaled by a factor. We hope that optimizations duplicating code well-maintain the branch frequency information (BFI) based on which probe distribution factors are calculated. Distribution factors are updated at the end of preLTO pipeline to reflect an estimated portion of the real execution count. This change also introduces a pseudo probe verifier that can be run after each IR passes to detect duplicated pseudo probes. A saturated distribution factor stands for 1.0. A pesudo probe will carry a factor with the value ranged from 0.0 to 1.0. A 64-bit integral distribution factor field that represents [0.0, 1.0] is associated to each block probe. Unfortunately this cannot be done for callsite probes due to the size limitation of a 32-bit Dwarf discriminator. A 7-bit distribution factor is used instead. Changes are also needed to the sample profile inliner to deal with prorated callsite counts. Call sites duplicated by PreLTO passes, when later on inlined in LTO time, should have the callees’s probe prorated based on the Prelink-computed distribution factors. The distribution factors should also be taken into account when computing hotness for inline candidates. Also, Indirect call promotion results in multiple callisites. The original samples should be distributed across them. This is fixed by adjusting the callisites' distribution factors. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D93264	2021-02-02 11:55:01 -08:00
Wenlei He	1645f465be	[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path. This is resubmit of D95024, with build break and overtighten assertion fixed. Test Plan:	2021-02-02 07:55:08 -08:00
Adrian Kuegel	48ca6da9d2	Revert "[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline" This reverts commit `9a03058d63`.	2021-02-02 11:51:04 +01:00
Adrian Kuegel	3a65ec4bf9	Revert "Fix build break from D95024" This reverts commit `09cd849fde`.	2021-02-02 11:51:04 +01:00
Wenlei He	09cd849fde	Fix build break from D95024	2021-02-02 01:01:12 -08:00
Wenlei He	9a03058d63	[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path. Test Plan: Differential Revision: https://reviews.llvm.org/D95024	2021-02-02 00:34:06 -08:00
Wenlei He	6bae5973c4	[CSSPGO] Call site prioritized inlining for sample PGO This change implemented call site prioritized BFS profile guided inlining for sample profile loader. The new inlining strategy maximize the benefit of context-sensitive profile as mentioned in the follow up discussion of CSSPGO RFC. The change will not affect today's AutoFDO as it's opt-in. CSSPGO now defaults to the new FDO inliner, but can fall back to today's replay inliner using a switch (`-sample-profile-prioritized-inline=0`). Motivation With baseline AutoFDO, the inliner in sample profile loader only replays previous inlining, and the use of profile is only for pruning previous inlining that turned out to be cold. Due to the nature of replay, the FDO inliner is simple with hotness being the only decision factor. It has the following limitations that we're improving now for CSSPGO. - It doesn't take inline candidate size into account. Since it's doing replay, the size growth is bounded by previous CGSCC inlining. With context-sensitive profile, FDO inliner is no longer limited by previous inlining, so we need to take size into account to avoid significant size bloat. - The way it looks at hotness is not accurate. It uses total samples in an inlinee as proxy for hotness, while what really matters for an inline decision is the call site count. This is an unfortunate fall back because call site count and callee entry count are not reliable due to dwarf based correlation, especially for inlinees. Now paired with pseudo-probe, we have accurate call site count and callee's entry count, so we can use that to gauge hotness more accurately. - It treats all call sites from a block as hot as long as there's one call site considered hot. This is normally true, but since total samples is used as hotness proxy, this transitiveness within block magnifies the inacurate hotness heuristic. With pseduo-probe and the change above, this is no longer an issue for CSSPGO. New FDO Inliner Putting all the requirement for CSSPGO together, we need a top-down call site prioritized BFS inliner. Here're reasons why each component is needed. - Top-down: We need a top-down inliner to better leverage context-sensitive profile, so inlining is driven by accurate context profile, and post-inline is also accurate. This is already implemented in https://reviews.llvm.org/D70655. - Size Cap: For top-down inliner, taking function size into account for inline decision alone isn't sufficient to control size growth. We also need to explicitly cap size growth because with top-down inlining, we can grow inliner size significantly with large number of smaller inlinees even if each individually passes the cost/size check. - Prioritize call sites: With size cap, inlining order also becomes important, because if we stop inlining due to size budget limit, we'd want to use budget towards the most beneficial call sites. - BFS inline: Same as call site prioritization, if we stop inlining due to size budget limit, we want a balanced inline tree, rather than going deep on one call path. Note that the new inliner avoids repeatedly evaluating same set of call site, so it should help with compile time too. For this reason, we could transition today's FDO inliner to use a queue with equal priority to avoid wasted reevaluation of same call site (TODO). Speculative indirect call promotion and inlining is also supported now with CSSPGO just like baseline AutoFDO. Tunings and knobs I created tuning knobs for size growth/cap control, and for hot threshold separate from CGSCC inliner. The default values are selected based on initial tuning with CSSPGO. Results Evaluated with an internal LLVM fork couple months ago, plus another change to adjust hot-threshold cutoff for context profile (will send up after this one), the new inliner show ~1% geomean perf win on spec2006 with CSSPGO, while reducing code size too. The measurement was done using train-train setup, MonoLTO w/ new pass manager and pseudo-probe. Note that this is just a starting point - we hope that the new inliner will open up more opportunity with CSSPGO, but it will certainly take more time and effort to make it fully calibrated and ready for bigger workloads (we're working on it). Differential Revision: https://reviews.llvm.org/D94001	2021-02-01 23:46:34 -08:00
Hongtao Yu	224fee8219	[CSSPGO] Tweaking inlining with pseudo probes. Fixing up a couple places where `getCallSiteIdentifier` is needed to support pseudo-probe-based callsites. Also fixing an issue in the extbinary profile reader where the metadata section is not fully scanned based on the number of profiles loaded only for the current module. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D95791	2021-02-01 13:56:40 -08:00
Kazu Hirata	3d1200b9f6	[llvm] Drop unnecessary const from return types (NFC) Identified with const-return-type.	2021-01-31 10:23:43 -08:00
Kazu Hirata	8ed1636184	[llvm] Use isa instead of dyn_cast (NFC)	2021-01-29 23:23:37 -08:00
Hongtao Yu	7e99bddfea	[CSSPGO] Support of CS profiles in extended binary format. This change brings up support of context-sensitive profiles in the format of extended binary. Existing sample profile reader/writer/merger code is being tweaked to reflect the fact of bracketed input contexts, like (`[...]`). The paired brackets are also needed in extbinary profiles because we don't yet have an otherwise good way to tell calling contexts apart from regular function names since the context delimiter `@` can somehow serve as a part of the C++ mangled names. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D95547	2021-01-27 21:29:46 -08:00
Teresa Johnson	1487747e99	[LTO] Prevent devirtualization for symbols dynamically exported Identify dynamically exported symbols (--export-dynamic[-symbol=], --dynamic-list=, or definitions needed to preempt shared objects) and prevent their LTO visibility from being upgraded. This helps avoid use of whole program devirtualization when there may be overrides in dynamic libraries. Differential Revision: https://reviews.llvm.org/D91583	2021-01-27 15:54:13 -08:00
Fangrui Song	54fb3ca96e	[ThinLTO] Add Visibility bits to GlobalValueSummary::GVFlags Imported functions and variable get the visibility from the module supplying the definition. However, non-imported definitions do not get the visibility from (ELF) the most constraining visibility among all modules (Mach-O) the visibility of the prevailing definition. This patch * adds visibility bits to GlobalValueSummary::GVFlags * computes the result visibility and propagates it to all definitions Protected/hidden can imply dso_local which can enable some optimizations (this is stronger than GVFlags::DSOLocal because the implied dso_local can be leveraged for ELF -shared while default visibility dso_local has to be cleared for ELF -shared). Note: we don't have summaries for declarations, so for ELF if a declaration has the most constraining visibility, the result visibility may not be that one. Differential Revision: https://reviews.llvm.org/D92900	2021-01-27 10:43:51 -08:00
Bjorn Pettersson	a9bd3d37bd	[NewPM] Add ExtraVectorizerPasses support As it looks like NewPM generally is using SimpleLoopUnswitch instead of LoopUnswitch, this patch also use SimpleLoopUnswitch in the ExtraVectorizerPasses sequence (compared with LegacyPM which use the LoopUnswitch pass). Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D95457	2021-01-26 22:59:10 +01:00
Florian Hahn	35b3989a30	[Passes] Run peeling as part of simple/full loop unrolling. Loop peeling removes conditions from loop bodies that become invariant after a small number of iterations. When triggered, this leads to fewer compares and possibly PHIs in loop bodies, enabling further optimizations. The current cost-model of loop peeling should be quite conservative/safe, i.e. only peel if a condition in the loop becomes known after peeling. For example, see PR47671, where loop peeling enables vectorization by removing a PHI the vectorizer does not understand. Granted, the loop-vectorizer could also be taught about constant PHIs, but loop peeling is likely to enable other optimizations as well. This has an impact on quite a few benchmarks from MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto, for example Same hash: 186 (filtered out) Remaining: 51 Metric: loop-vectorize.LoopsVectorized Program base patch diff test-suite...ve-susan/automotive-susan.test 8.00 9.00 12.5% test-suite...nal/skidmarks10/skidmarks.test 35.00 31.00 -11.4% test-suite...lications/sqlite3/sqlite3.test 41.00 43.00 4.9% test-suite...s/ASC_Sequoia/AMGmk/AMGmk.test 25.00 26.00 4.0% test-suite...006/450.soplex/450.soplex.test 88.00 89.00 1.1% test-suite...TimberWolfMC/timberwolfmc.test 120.00 119.00 -0.8% test-suite.../CINT2006/403.gcc/403.gcc.test 215.00 216.00 0.5% test-suite...006/447.dealII/447.dealII.test 957.00 958.00 0.1% test-suite...ternal/HMMER/hmmcalibrate.test 75.00 75.00 0.0% Same hash: 186 (filtered out) Remaining: 51 Metric: loop-vectorize.LoopsAnalyzed Program base patch diff test-suite...ks/Prolangs-C/agrep/agrep.test 440.00 434.00 -1.4% test-suite...nal/skidmarks10/skidmarks.test 312.00 308.00 -1.3% test-suite...marks/7zip/7zip-benchmark.test 6399.00 6323.00 -1.2% test-suite...lications/minisat/minisat.test 134.00 135.00 0.7% test-suite...rks/FreeBench/pifft/pifft.test 295.00 297.00 0.7% test-suite...TimberWolfMC/timberwolfmc.test 1879.00 1869.00 -0.5% test-suite...pplications/treecc/treecc.test 689.00 691.00 0.3% test-suite...T2000/300.twolf/300.twolf.test 1593.00 1597.00 0.3% test-suite.../Benchmarks/Bullet/bullet.test 1394.00 1392.00 -0.1% test-suite...ications/JM/ldecod/ldecod.test 1431.00 1429.00 -0.1% test-suite...6/464.h264ref/464.h264ref.test 2229.00 2230.00 0.0% test-suite...lications/sqlite3/sqlite3.test 2590.00 2589.00 -0.0% test-suite...ications/JM/lencod/lencod.test 2732.00 2733.00 0.0% test-suite...006/453.povray/453.povray.test 3395.00 3394.00 -0.0% Note the -11% regression in number of loops vectorized for skidmarks. I suspect this corresponds to the fact that those loops are gone now (see the reduction in number of loops analyzed by LV). Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D88471	2021-01-26 13:52:30 +00:00
modimo	ce7f9cdb50	[InlineAdvisor] Allow replay of inline decisions for the CGSCC inliner from optimization remarks This change leverages the work done in D83743 to replay in the SampleProfile inliner to also be used in the CGSCC inliner. NOTE: currently restricted to non-ML advisors only. The added switch `-cgscc-inline-replay=<remarks file>` will replay the inlining decisions in that file where the remarks file is generated via `-Rpass=inline`. The aim here is to make it easier to analyze changes that would modify inlining heuristics to be separated from this behavior. Doing so allows easier examination of assembly and runtime behavior compared to the baseline rather than trying to dig through the large churn caused by inlining. In LTO compilation, since inlining is done twice you can separately specify replay by passing the flag to the FE (`-cgscc-inline-replay=`) and to the linker (`-Wl,cgscc-inline-replay=`) with the remarks generated from their respective places. Testing on mysqld by comparing the inline decisions between base (generates remarks.txt) and diff (replay using identical input/tools with remarks.txt) and examining the inlining sites with `diff` shows 14,000 mismatches out of 247,341 for a ~94% replay accuracy. I believe this gap can be narrowed further though for the general case we may never achieve full accuracy. For my personal use, this is close enough to be representative: I set the baseline as the one generated by the replay on identical input/toolset and compare that to my modified input/toolset using the same replay. Testing: ninja check-llvm newly added test correctly replays CGSCC inlining decisions Reviewed By: mtrofin, wenlei Differential Revision: https://reviews.llvm.org/D94334	2021-01-25 15:38:57 -08:00
Richard Smith	925ae8c790	Revert "[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV" This reverts commit `53176c1680`, which introduceed a layering violation. LLVM's IR library can't include headers from Analysis.	2021-01-25 13:53:38 -08:00
Akira Hatanaka	53176c1680	[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV or claimRV calls in the IR Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end annotates calls with attribute "clang.arc.rv"="retain" or "clang.arc.rv"="claim", which indicates the call is implicitly followed by a marker instruction and a retainRV/claimRV call that consumes the call result. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the annotated calls in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the annotated calls. It doesn't remove the attribute on the call since the backend needs it to emit the marker instruction. The retainRV/claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes the autoreleaseRV call in the callee that returns the result if nothing in the callee prevents it from being paired up with the calls annotated with "clang.arc.rv"="retain/claim" in the caller. If the call is annotated with "claim", a release call is inserted since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the attributes to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV call returning the callee result, which makes it impossible to pair it up with the retainRV or claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the call is annotated with "retain" and does nothing if it's annotated with "claim". - This patch teaches dead argument elimination pass not to change the return type of a function if any of the calls to the function are annotated with attribute "clang.arc.rv". This is necessary since the pass can incorrectly determine nothing in the IR uses the function return, which can happen since the front-end no longer explicitly emits retainRV/claimRV calls in the IR, and change its return type to 'void'. Future work: - Use the attribute on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the attributes. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-01-25 11:57:08 -08:00
Wei Mi	c9cd9a0066	[SampleFDO] Report error when reading a bad/incompatible profile instead of turning off SampleFDO silently. Currently sample loader pass turns off SampleFDO optimization silently when it sees error in reading the profile. This behavior will defeat the tests which could have caught those bad/incompatible profile problems. This patch change the behavior to report error. Differential Revision: https://reviews.llvm.org/D95269	2021-01-25 10:28:23 -08:00
Kazu Hirata	1238378f18	[llvm] Use pop_back_val (NFC)	2021-01-23 10:56:33 -08:00
Xun Li	bd3ca6666d	[Inlining] Delete redundant optnone/alwaysinline check The same check is done in InlineCost: `8b0bd54d0e/llvm/lib/Analysis/InlineCost.cpp (L2537-L2552)` Also, doing a check on the callee here is confusing, because anything that deals with callee should be done in the inner loop where we proecss all calls from the same caller. Differential Revision: https://reviews.llvm.org/D95186	2021-01-21 18:38:10 -08:00
Nikita Popov	65fd034b95	[FunctionAttrs] Infer willreturn for functions without loops If a function doesn't contain loops and does not call non-willreturn functions, then it is willreturn. Loops are detected by checking for backedges in the function. We don't attempt to handle finite loops at this point. Differential Revision: https://reviews.llvm.org/D94633	2021-01-21 20:29:33 +01:00
Kazu Hirata	e53472de68	[Transforms] Use llvm::append_range (NFC)	2021-01-20 21:35:54 -08:00
Mircea Trofin	ccec2cf1d9	Reland "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor" This reverts commit `d97f776be5`. The original problem was due to build failures in shared lib builds. D95079 moved ImportedFunctionsInliningStatistics under Analysis, unblocking this.	2021-01-20 13:33:43 -08:00
Mircea Trofin	95ce32c787	[NFC] Move ImportedFunctionsInliningStatistics to Analysis This is related to D94982. We want to call these APIs from the Analysis component, so we can't leave them under Transforms. Differential Revision: https://reviews.llvm.org/D95079	2021-01-20 13:18:03 -08:00
Mircea Trofin	d97f776be5	Revert "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor" This reverts commit `e8aec763a5`.	2021-01-20 11:19:34 -08:00
Mircea Trofin	e8aec763a5	[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor When using 2 InlinePass instances in the same CGSCC - one for other mandatory inlinings, the other for the heuristic-driven ones - the order in which the ImportedFunctionStats would be output-ed would depend on the destruction order of the inline passes, which is not deterministic. This patch moves the ImportedFunctionStats responsibility to the InlineAdvisor to address this problem. Differential Revision: https://reviews.llvm.org/D94982	2021-01-20 11:07:36 -08:00
David Sherwood	255a507716	[NFC][InstructionCost] Use InstructionCost in lib/Transforms/IPO/IROutliner.cpp In places where we call a TTI.getXXCost() function I have changed the code to use InstructionCost instead of unsigned. This is in preparation for later on when we will change the TTI interfaces to return InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D94427	2021-01-20 08:33:59 +00:00
Juneyoung Lee	4479c0c2c0	Allow nonnull/align attribute to accept poison Currently LLVM is relying on ValueTracking's `isKnownNonZero` to attach `nonnull`, which can return true when the value is poison. To make the semantics of `nonnull` consistent with the behavior of `isKnownNonZero`, this makes the semantics of `nonnull` to accept poison, and return poison if the input pointer isn't null. This makes many transformations like below legal: ``` %p = gep inbounds %x, 1 ; % p is non-null pointer or poison call void @f(%p) ; instcombine converts this to call void @f(nonnull %p) ``` Instead, this semantics makes propagation of `nonnull` to caller illegal. The reason is that, passing poison to `nonnull` does not immediately raise UB anymore, so such program is still well defined, if the callee does not use the argument. Having `noundef` attribute there re-allows this. ``` define void @f(i8* %p) { ; functionattr cannot mark %p nonnull here anymore call void @g(i8* nonnull %p) ; .. because @g never raises UB if it never uses %p. ret void } ``` Another attribute that needs to be updated is `align`. This patch updates the semantics of align to accept poison as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D90529	2021-01-20 11:31:23 +09:00
Wei Mi	21b1ad0340	[SampleFDO] Add the support to split the function profiles with context into separate sections. For ThinLTO, all the function profiles without context has been annotated to outline functions if possible in prelink phase. In postlink phase, profile annotation in postlink phase is only meaningful for function profile with context. If the profile is large, it is better to split the profile into two parts, one with context and one without, so the profile reading in postlink phase only has to read the part with context. To have the profile splitting, we extend the ExtBinary format to support different section arrangement. It will be flexible to add other section layout in the future without the need to create new class inheriting from ExtBinary class. Differential Revision: https://reviews.llvm.org/D94435	2021-01-19 15:16:19 -08:00
Florian Hahn	83daa49758	[LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands. D84108 exposed a bad interaction between inlining and loop-rotation during regular LTO, which is causing notable regressions in at least CINT2006/473.astar. The problem boils down to: we now rotate a loop just before the vectorizer which requires duplicating a function call in the preheader when compiling the individual files ('prepare for LTO'). But this then prevents further inlining of the function during LTO. This patch tries to resolve this issue by making LoopRotate more conservative with respect to rotating loops that have inline-able calls during the 'prepare for LTO' stage. I think this change intuitively improves the current situation in general. Loop-rotate tries hard to avoid creating headers that are 'too big'. At the moment, it assumes all inlining already happened and the cost of duplicating a call is equal to just doing the call. But with LTO, inlining also happens during full LTO and it is possible that a previously duplicated call is actually a huge function which gets inlined during LTO. From the perspective of LV, not much should change overall. Most loops calling user-provided functions won't get vectorized to start with (unless we can infer that the function does not touch memory, has no other side effects). If we do not inline the 'inline-able' call during the LTO stage, we merely delayed loop-rotation & vectorization. If we inline during LTO, chances should be very high that the inlined code is itself vectorizable or the user call was not vectorizable to start with. There could of course be scenarios where we inline a sufficiently large function with code not profitable to vectorize, which would have be vectorized earlier (by scalarzing the call). But even in that case, there probably is no big performance impact, because it should be mostly down to the cost-model to reject vectorization in that case. And then the version with scalarized calls should also not be beneficial. In a way, LV should have strictly more information after inlining and make more accurate decisions (barring cost-model issues). There is of course plenty of room for things to go wrong unexpectedly, so we need to keep a close look at actual performance and address any follow-up issues. I took a look at the impact on statistics for MultiSource/SPEC2000/SPEC2006. There are a few benchmarks with fewer loops rotated, but no change to the number of loops vectorized. Reviewed By: sanwou01 Differential Revision: https://reviews.llvm.org/D94232	2021-01-19 10:15:29 +00:00
Kazu Hirata	23b0ab2acb	[llvm] Use the default value of drop_begin (NFC)	2021-01-18 10:16:36 -08:00
Kazu Hirata	19aacdb715	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-01-16 09:40:53 -08:00
Mircea Trofin	e8049dc3c8	[NewPM][Inliner] Move the 'always inliner' case in the same CGSCC pass as 'regular' inliner Expanding from D94808 - we ensure the same InlineAdvisor is used by both InlinerPass instances. The notion of mandatory inlining is moved into the core InlineAdvisor: advisors anyway have to handle that case, so this change also factors out that a bit better. Differential Revision: https://reviews.llvm.org/D94825	2021-01-15 17:59:38 -08:00
Kazu Hirata	7dc3575ef2	[llvm] Remove redundant return and continue statements (NFC) Identified with readability-redundant-control-flow.	2021-01-14 20:30:34 -08:00
Kazu Hirata	2efcbe24a7	[llvm] Use llvm::drop_begin (NFC)	2021-01-14 20:30:33 -08:00
Kazu Hirata	125ea20d55	[llvm] Use llvm::stable_sort (NFC)	2021-01-13 19:14:43 -08:00
Kazu Hirata	5c1c39e8d8	[llvm] Use *Set::contains (NFC)	2021-01-13 19:14:41 -08:00
Wei Mi	86341247c4	[NFC] Rename ThinLTOPhase to ThinOrFullLTOPhase and move it from PassBuilder.h to Pass.h. In some compiler passes like SampleProfileLoaderPass, we want to know which LTO/ThinLTO phase the pass is in. Currently the phase is represented in enum class PassBuilder::ThinLTOPhase, so it is only available in PassBuilder and it also cannot represent phase in full LTO. The patch extends it to include full LTO phases and move it from PassBuilder.h to Pass.h, then it is much easier for PassBuilder to communiate with each pass about current LTO phase. Differential Revision: https://reviews.llvm.org/D94613	2021-01-13 15:55:40 -08:00
Andrew Litteken	05b1a15f70	[IROutliner] Adapting to hoisted bitcasts in CodeExtractor In commit `700d2417d8` the CodeExtractor was updated so that bitcasts that have lifetime markers that beginning outside of the region are deduplicated outside the region and are not used as an output. This caused a discrepancy in the IROutliner, where in these cases there were arguments added to the aggregate function that were not needed causing assertion errors. The IROutliner queries the CodeExtractor twice to determine the inputs and outputs, before and after `findAllocas` is called with the same ValueSet for the outputs causing the duplication. This has been fixed with a dummy ValueSet for the first call. However, the additional bitcasts prevent us from using the same similarity relationships that were previously defined by the IR Similarity Analysis Pass. In these cases, we check whether the initial version of the region being analyzed for outlining is still the same as it was previously. If it is not, i.e. because of the additional bitcast instructions from the CodeExtractor, we discard the region. Reviewers: yroux Differential Revision: https://reviews.llvm.org/D94303	2021-01-13 11:10:37 -06:00
Kazu Hirata	12fc9ca3a4	[llvm] Remove redundant string initialization (NFC) Identified with readability-redundant-string-init.	2021-01-12 21:43:46 -08:00
modimo	2a49b7c64a	[Inliner] Change inline remark format and update ReplayInlineAdvisor to use it This change modifies the source location formatting from: LineNumber.Discriminator to: LineNumber:ColumnNumber.Discriminator The motivation here is to enhance location information for inline replay that currently exists for the SampleProfile inliner. This will be leveraged further in inline replay for the CGSCC inliner in the related diff. The ReplayInlineAdvisor is also modified to read the new format and now takes into account the callee for greater accuracy. Testing: ninja check-llvm Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D94333	2021-01-12 13:43:48 -08:00
Florian Hahn	6cd44b204c	[FunctionAttrs] Derive willreturn for fns with readonly` & `mustprogress`. Similar to D94125, derive `willreturn` for functions that are `readonly` and `mustprogress` in FunctionAttrs. To quote the reasoning from D94125: Since D86233 we have `mustprogress` which, in combination with `readonly`, implies `willreturn`. The idea is that every side-effect has to be modeled as a "write". Consequently, `readonly` means there is no side-effect, and `mustprogress` guarantees that we cannot "loop" forever without side-effect. Reviewed By: jdoerfert, nikic Differential Revision: https://reviews.llvm.org/D94502	2021-01-12 20:02:34 +00:00
Giorgis Georgakoudis	9751705512	[OpenMPOpt][WIP] Expand parallel region merging The existing implementation of parallel region merging applies only to consecutive parallel regions that have speculatable sequential instructions in-between. This patch lifts this limitation to expand merging with any sequential instructions in-between, except calls to unmergable OpenMP runtime functions. In-between sequential instructions in the merged region are sequentialized in a "master" region and any output values are broadcasted to the following parallel regions and the sequential region continuation of the merged region. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D90909	2021-01-11 08:06:23 -08:00
Florian Hahn	c701f85c45	[STLExtras] Use return type from operator* of the wrapped iter. Currently make_early_inc_range cannot be used with iterators with operator* implementations that do not return a reference. Most notably in the LLVM codebase, this means the User iterator ranges cannot be used with make_early_inc_range, which slightly simplifies iterating over ranges while elements are removed. Instead of directly using BaseT::reference as return type of operator, this patch uses decltype to get the actual return type of the operator implementation in WrappedIteratorT. This patch also updates a few places to use make use of make_early_inc_range. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D93992	2021-01-10 14:41:13 +00:00
Kazu Hirata	33bf1cad75	[llvm] Use *Set::contains (NFC)	2021-01-07 20:29:34 -08:00
Arthur Eubanks	7fea561eb1	[CGSCC][Coroutine][NewPM] Properly support function splitting/outlining Previously when trying to support CoroSplit's function splitting, we added in a hack that simply added the new function's node into the original function's SCC (https://reviews.llvm.org/D87798). This is incorrect since it might be in its own SCC. Now, more similar to the previous design, we have callers explicitly notify the LazyCallGraph that a function has been split out from another one. In order to properly support CoroSplit, there are two ways functions can be split out. One is the normal expected "outlining" of one function into a new one. The new function may only contain references to other functions that the original did. The original function must reference the new function. The new function may reference the original function, which can result in the new function being in the same SCC as the original function. The weird case is when the original function indirectly references the new function, but the new function directly calls the original function, resulting in the new SCC being a parent of the original function's SCC. This form of function splitting works with CoroSplit's Switch ABI. The second way of splitting is more specific to CoroSplit. CoroSplit's Retcon and Async ABIs split the original function into multiple functions that all reference each other and are referenced by the original function. In order to keep the LazyCallGraph in a valid state, all new functions must be processed together, else some nodes won't be populated. To keep things simple, this only supports the case where all new edges are ref edges, and every new function references every other new function. There can be a reference back from any new function to the original function, putting all functions in the same RefSCC. This also adds asserts that all nodes in a (Ref)SCC can reach all other nodes to prevent future incorrect hacks. The original hacks in https://reviews.llvm.org/D87798 are no longer necessary since all new functions should have been registered before calling updateCGAndAnalysisManagerForPass. This fixes all coroutine tests when opt's -enable-new-pm is true by default. This also fixes PR48190, which was likely due to the previous hack breaking SCC invariants. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D93828	2021-01-06 11:19:15 -08:00
Arthur Eubanks	8cf1cc578d	[FuncAttrs] Infer noreturn A function is noreturn if all blocks terminating with a ReturnInst contain a call to a noreturn function. Skip looking at naked functions since there may be asm that returns. This can be further refined in the future by checking unreachable blocks and taking into account recursion. It looks like the attributor pass does this, but that is not yet enabled by default. This seems to help with code size under the new PM since PruneEH does not run under the new PM, missing opportunities to mark some functions noreturn, which in turn doesn't allow simplifycfg to clean up dead code. https://bugs.llvm.org/show_bug.cgi?id=46858. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D93946	2021-01-05 13:25:42 -08:00
Florian Hahn	c367258b5c	[SimplifyCFG] Enabled hoisting late in LTO pipeline. `bb7d3af113` disabled hoisting in SimplifyCFG by default, but enabled it late in the pipeline. But it appears as if the LTO pipelines got missed. This patch adjusts the LTO pipelines to also enable hoisting in the later stages. Unfortunately there's no easy way to add a test for the change I think. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D93684	2021-01-04 16:26:58 +00:00
Florian Hahn	e0905553b4	[ArgPromotion] Delay dead GEP removal until doPromotion. Currently ArgPromotion removes dead GEPs as part of the legality check in isSafeToPromoteArgument. If no promotion happens, this means the pass claims no modifications happened, even though GEPs were removed. This patch fixes the issue by delaying removal of dead GEPs until doPromotion: isSafeToPromoteArgument can simply skips dead GEPs and the code in doPromotion dealing with GEPs is updated to account for dead GEPs. Once we committed to promotion, it should be safe to remove dead GEPs. Alternatively isSafeToPromoteArgument could return an additional boolean to indicate whether it made changes, but this is quite cumbersome and there should be no real benefit of weeding out some dead GEPs here if we do not perform promotion. I added a test for the case where dead GEPs need to be removed when promotion happens in `578c5a0c6e`. Fixes PR47477. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93991	2021-01-04 09:51:20 +00:00
Andrew Litteken	5c951623bc	[IROutliner] Refactoring errors in the cost model from past patches. There were was the reuse of a variable that should not have been occurred due to confusion during committing patches.	2021-01-04 00:11:18 -06:00
Andrew Litteken	05e6ac4eb8	[IROutliner] Removing a duplicate addition, causing overestimates in IROutliner. There was an extra addition left over from a previous commit for the cost model, this removes it.	2021-01-03 23:36:28 -06:00
Kazu Hirata	ba82c0b315	[llvm] Call *(Set\|Map)::erase directly (NFC) We can erase an item in a set or map without checking its membership first.	2021-01-03 09:57:47 -08:00
Kazu Hirata	530c5af6a4	[Transforms] Construct SmallVector with iterator ranges (NFC)	2021-01-02 09:24:17 -08:00
Andrew Litteken	1a9eb19af9	[IROutliner] Adding consistent function attribute merging When combining extracted functions, they may have different function attributes. We want to make sure that we do not make any assumptions, or lose any information. This attempts to make sure that we consolidate function attributes to their most general case. Tests: llvm/test/Transforms/IROutliner/outlining-compatible-and-attribute-transfer.ll llvm/test/Transforms/IROutliner/outlining-compatible-or-attribute-transfer.ll Reviewers: jdoefert, paquette Differential Revision: https://reviews.llvm.org/D87301	2020-12-31 12:30:23 -06:00
Fangrui Song	a90b42b0fe	[ThinLTO] Default -enable-import-metadata to false The default value is dependent on `-DLLVM_ENABLE_ASSERTIONS={off,on}` (D22167), which is error-prone. The few tests checking `!thinlto_src_module` can specify -enable-import-metadata explicitly. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D93959	2020-12-31 10:04:21 -08:00
Yuanfang Chen	277ebe46c6	Fix `LLVM_ENABLE_MODULES=On` build for commit `480936e741`.	2020-12-30 10:54:04 -08:00
Andrew Litteken	fe431103b6	[IROutliner] Adding option to enable outlining from linkonceodr functions There are functions that the linker is able to automatically deduplicate, we do not outline from these functions by default. This allows for outlining from those functions. Tests: llvm/test/Transforms/IROutliner/outlining-odr.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87309	2020-12-30 12:08:04 -06:00
Andrew Litteken	30feb93036	[IROutliner] Adding support for swift errors in the IROutliner Since some values can be swift errors, we need to make sure that we correctly propagate the parameter attributes. Tests found at: llvm/test/Transforms/IROutliner/outlining-swift-error.ll Reviewers: jroelofs, paquette Recommit of: `71867ed5e6` Differential Revision: https://reviews.llvm.org/D87742	2020-12-30 01:17:27 -06:00
Andrew Litteken	eeb99c2ac2	Revert "[IROutliner] Adding support for swift errors" This reverts commit `71867ed5e6`. Reverting for lack of commit messages.	2020-12-30 01:17:27 -06:00
Andrew Litteken	71867ed5e6	[IROutliner] Adding support for swift errors	2020-12-30 01:14:55 -06:00
Andrew Litteken	df4a931c63	[IROutliner] Adding OptRemarks to the IROutliner Pass This prints OptRemarks at each location where a decision is made to not outline, or to outline a specific section for the IROutliner pass. Test: llvm/test/Transforms/IROutliner/opt-remarks.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87300	2020-12-29 15:52:08 -06:00
Andrew Litteken	6df161a2fb	[IROutliner] Adding a cost model, and debug option to turn the model off. This adds a cost model that takes into account the total number of machine instructions to be removed from each region, the number of instructions added by adding a new function with a set of instructions, and the instructions added by handling arguments. Tests not adding flags: llvm/test/Transforms/IROutliner/outlining-cost-model.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87299	2020-12-29 12:43:41 -06:00
Andrew Litteken	1e23802507	[IROutliner] Merging identical output blocks for extracted functions. Many of the sets of output stores will be the same. When a block is created, we check if there is an output block with the same set of store instructions. If there is, we map the output block of the region back to the block, so that the extra argument controlling the switch statement can be set to the appropriate block value. Tests: - llvm/test/Transforms/IROutliner/outlining-same-output-blocks.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87298	2020-12-28 21:01:48 -06:00
Andrew Litteken	e6ae623314	[IROutliner] Adding support for consolidating functions with different output arguments. Certain regions can have values introduced inside the region that are used outside of the region. These may not be the same for each similar region, so we must create one over arching set of arguments for the consolidated function. We do this by iterating over the outputs for each extracted function, and creating as many different arguments to encapsulate the different outputs sets. For each output set, we create a different block with the necessary stores from the value to the output register. There is then one switch statement, controlled by an argument to the function, to differentiate which block to use. Changed Tests for consistency: llvm/test/Transforms/IROutliner/extraction.ll llvm/test/Transforms/IROutliner/illegal-assumes.ll llvm/test/Transforms/IROutliner/illegal-memcpy.ll llvm/test/Transforms/IROutliner/illegal-memmove.ll llvm/test/Transforms/IROutliner/illegal-vaarg.ll Tests to test new functionality: llvm/test/Transforms/IROutliner/outlining-different-output-blocks.ll llvm/test/Transforms/IROutliner/outlining-remapped-outputs.ll llvm/test/Transforms/IROutliner/outlining-same-output-blocks.ll Reviewers: jroelofs, paquette Differential Revision: https://reviews.llvm.org/D87296	2020-12-28 16:17:07 -06:00
Kazu Hirata	8299fb8f25	[Transforms] Use llvm::append_range (NFC)	2020-12-27 09:57:29 -08:00
Craig Topper	897990e614	[IROutliner] Use isa instead of dyn_cast where the casted value isn't used. NFC Fixes unused variable warnings.	2020-12-23 11:40:15 -08:00
Andrew Litteken	b1191c8438	[IROutliner] Adding support for elevating constants that are not the same in each region to arguments When there are constants that have the same structural location, but not the same value, between different regions, we cannot simply outline the region. Instead, we find the constants that are not the same in each location, and promote them to arguments to be passed into the respective functions. At each call site, we pass the constant in as an argument regardless of type. Added/Edited Tests: llvm/test/Transforms/IROutliner/outlining-constants-vs-registers.ll llvm/test/Transforms/IROutliner/outlining-different-constants.ll llvm/test/Transforms/IROutliner/outlining-different-globals.ll Reviewers: paquette, jroelofs Differential Revision: https://reviews.llvm.org/D87294	2020-12-23 13:03:05 -06:00
Michael Forster	d56982b6f5	Remove unused variables. Differential Revision: https://reviews.llvm.org/D93635	2020-12-21 16:24:43 +01:00

1 2 3 4 5 ...

4506 Commits