Commit Graph

304 Commits

Author SHA1 Message Date
Arthur Eubanks c384b20b55 [opt] Remove temporary legacy pass name translations
And update corresponding tests.
2022-10-07 11:09:46 -07:00
Paul Kirth 3155e3070c [llvm][misexpect] Re-enable MisExpect for SampleProfiling
MisExpect was occasionally crashing under SampleProfiling due to a division by zero.
We worked around that in D124302 by changing the assert to an early return.
This patch adds a test case for the crashing scenario and re-enables MisExpect
for SampleProfiling.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D124481
2022-08-26 20:24:10 +00:00
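A hedged sketch of how the re-enabled diagnostic might be exercised with the sample loader; the `-pgo-warn-misexpect` spelling and the input files are assumptions, not taken from this commit:

```
# Run the sample loader with MisExpect warnings enabled (flag spelling assumed).
opt < misexpect.ll -passes=sample-profile \
    -sample-profile-file=misexpect.prof \
    -pgo-warn-misexpect -S -o /dev/null
```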
Fangrui Song 0271ae65a6 [test] Change test/SampleProfile to use opaque pointers 2022-07-17 17:38:35 -07:00
Fangrui Song 5250e7a0d8 [test] Change -sample-profile tests to -passes=
so that we can remove SampleProfileLoaderLegacyPass.
2022-07-17 12:00:41 -07:00
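A minimal before/after sketch of the RUN-line conversion described above; file names are placeholders:

```
# Legacy pass manager spelling (to be removed along with SampleProfileLoaderLegacyPass):
opt < foo.ll -sample-profile -sample-profile-file=foo.prof -S
# New pass manager spelling:
opt < foo.ll -passes=sample-profile -sample-profile-file=foo.prof -S
```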
Fangrui Song 6f32e71b54 [test] Remove duplicate -sample-profile tests
When -passes=sample-profile is tested, -sample-profile is redundant.
2022-07-17 00:52:30 -07:00
Mingming Liu bc856eb3fc [SampleProfile][Inline] Annotate sample profile inline remarks with link phase (prelink/postlink) information.
Differential Revision: https://reviews.llvm.org/D126833
2022-06-22 17:00:53 -07:00
Dávid Bolvanský f02a0a69af [NFCI] Fixed missing colon in CHECK directives 2022-04-03 11:52:38 +02:00
Hongtao Yu 7a316c0a1f [CSSPGO] Turn on profi and ext-tsp when using probe-based profile.
Probe-based profiles lead to better performance when combined with profi and the ext-tsp block layout, so I'm turning both on by default.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D122442
2022-03-25 09:09:21 -07:00
Johannes Doerfert a81fff8afd Reapply "[Intrinsics] Add `nocallback` to the default intrinsic attributes"
This reverts commit c5f789050d and
reapplies 7aea3ea8c3 with additional test
changes.
2022-03-25 09:36:50 -05:00
minglotus-6 e2074de6a8 [ProfSampleLoader] When disable-sample-loader-inlining is true, merge profiles of inlined instances into outlined versions.
When --disable-sample-loader-inlining is true, skip the inline transformation but merge profiles of inlined instances into their outlined versions.

Differential Revision: https://reviews.llvm.org/D121862
2022-03-23 13:02:48 -07:00
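A hedged usage sketch of the switch described above; file names are placeholders:

```
# Skip sample-loader inlining but still merge inlinee profiles into the outlined versions.
opt < foo.ll -passes=sample-profile \
    -sample-profile-file=foo.prof \
    -disable-sample-loader-inlining=true -S
```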
Arthur Eubanks d051c566cd [test] Remove the last couple uses of -analyze in llvm/test 2022-03-23 11:31:12 -07:00
spupyrev f2ade65fb2 [CSSPGO] Even flow distribution
Differential Revision: https://reviews.llvm.org/D118640
2022-03-02 13:12:05 -08:00
Hongtao Yu 07846e3387 [CSSPGO][PriorityInliner] Do not use block weight to drive callsite inlining.
The priority-based inliner currently uses block count combined with callee entry count to drive callsite inlining. This doesn't work well with LTO, where postlink inlining is driven by prelink-annotated block counts that could be based on the merge of all context profiles. I'm fixing it by using the callee profile entry count only, which should be context-sensitive.

I'm seeing a 0.2% perf improvement for one of our internal large benchmarks with a probe-based non-CS profile.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D120784
2022-03-01 18:43:19 -08:00
minglotus-6 142cedc283 [SampleProf][Inliner] Add an option to turn off inliner in sample-profile pass.
The use case is offline evaluation (of inliner effectiveness) or debugging.

Differential Revision: https://reviews.llvm.org/D120344
2022-02-23 14:21:33 -08:00
Arthur Eubanks f72b76cde5 [test] Replace/remove some 'opt -analyze' RUN lines 2022-02-09 15:49:53 -08:00
Chris Bieneman 91337e9091 Handle whitespace in symbol list
Trimming whitespace or carriage returns from symbols allows this code
to work on Windows and makes it match other places where symbol lists are
handled.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D117570
2022-01-18 14:34:40 -06:00
spupyrev 13d1364a34 A better profi rebalancer
This is an extension of the **profi** post-processing step that rebalances counts
in CFGs that have basic blocks without probes (aka "unknown" blocks). Specifically,
the new version finds many more "unknown" subgraphs and marks more "unknown"
basic blocks as hot (which prevents unwanted optimizations).

I see up to a 0.5% perf improvement on some (large) binaries, e.g., clang-10 and gcc-8.

The algorithm is still linear and yields no build time overhead.
2022-01-18 12:14:24 -08:00
Hongtao Yu 5740bb801a [CSSPGO] Use nested context-sensitive profile.
CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that are either inlined or not inlined. This is a benefit over the nested profile format used by non-CS AutoFDO. A downside of this is the longer build time due to parsing and indexing the full CS contexts.

For a CS flat profile, though only the context profiles relevant to a module are loaded when that module is compiled, the cost of figuring out which profiles are relevant is noticeably high when there are many contexts, since the sample reader needs to scan all context strings anyway. By contrast, a nested function profile has its related inline subcontexts isolated from other unrelated contexts. Therefore, when compiling a set of functions, unrelated contexts never need to be scanned.

In this change we are exploring using a nested profile format for CSSPGO. This is expected to work based on the assumption that with a preinliner-computed profile, all contexts are precomputed and expected to be inlined by the compiler. Contexts not expected to be inlined will be cut off and returned to the corresponding base profiles (for top-level outlined functions). This naturally forms a nested profile where all nested contexts are expected to be inlined. The compiler is less likely to optimize on derived contexts that are not precomputed.

A CS nested profile will look exactly the same as a regular nested profile, except that each nested profile can come with attributes. With pseudo probes, a nested profile as shown below can also have a CFG checksum.

```

main:1968679:12
 2: 24
 3: 28 _Z5funcAi:18
 3.1: 28 _Z5funcBi:30
 3: _Z5funcAi:1467398
  0: 10
  1: 10 _Z8funcLeafi:11
  3: 24
  1: _Z8funcLeafi:1467299
   0: 6
   1: 6
   3: 287884
   4: 287864 _Z3fibi:315608
   15: 23
   !CFGChecksum: 138828622701
   !Attributes: 2
  !CFGChecksum: 281479271677951
  !Attributes: 2
```

Specific work included in this change:
- A recursive profile converter to convert CS flat profile to nested profile.
- Extend function checksum and attribute metadata to be stored in a nested way for text and extbinary profiles.
- Unify the sample loader inliner path for CS and preinlined nested profiles.
- Changes in the sample loader to support probe-based nested profiles.

I've seen promising results regarding build time. A nested profile can result in a 20% shorter build time than a CS flat profile while keeping on-par performance. This is with -duplicate-contexts-into-base=1.

Test Plan:

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D115205
2021-12-14 14:40:25 -08:00
Hongtao Yu 4e24ca1cdc [CSSPGO] Turn on Profi by default
As titled.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D115011
2021-12-02 17:56:38 -08:00
spupyrev 93a2c2919f profi - a flow-based profile inference algorithm: Part III (out of 3)
This is a continuation of D109860 and D109903.

An important challenge for profile inference is caused by the fact that the
sample profile is collected on a fully optimized binary, while the block and
edge frequencies are consumed at an early stage of the compilation that operates
on non-optimized IR. As a result, some of the basic blocks may not have
associated sample counts, and it is up to the algorithm to deduce missing
frequencies. The problem is illustrated in the figure, where three basic
blocks are not present in the optimized binary and hence receive no samples
during profiling.

We found that it is beneficial to treat all such blocks equally. Otherwise the
compiler may decide that some blocks are “cold” and apply undesirable
optimizations (e.g., hot-cold splitting), regressing the performance. Therefore,
we want to distribute the counts evenly among the blocks with missing samples.
This is achieved by a post-processing step that identifies "dangling" subgraphs
consisting of basic blocks with no sampled counts; once the subgraphs are
found, we rebalance the flow so that every branch probability is 50:50 within the
subgraphs.

Our experiments indicate up to a 1% performance win from using the optimization on
some binaries and a significant improvement in the quality of profile counts
(when compared to ground-truth instrumentation-based counts).

{F19093045}

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D109980
2021-12-02 12:01:30 -08:00
spupyrev 98dd2f9ed3 profi - a flow-based profile inference algorithm: Part II (out of 3)
This is a continuation of D109860.

Traditional flow-based algorithms cannot guarantee that the resulting edge
frequencies correspond to a *connected* flow in the control-flow graph. For
example, for an instance in the attached figure, a flow-based (or any other)
inference algorithm may produce an output in which the hot loop is disconnected
from the entry block (refer to the rightmost graph in the figure). Furthermore,
creating a connected minimum-cost maximum flow is a computationally NP-hard
problem. Hence, we apply post-processing adjustments to the computed flow
by connecting all isolated flow components ("islands").

This feature helps to keep all blocks with sample counts connected and results
in significant performance wins for some binaries.
{F19077343}

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D109903
2021-12-02 11:04:21 -08:00
spupyrev 7cc2493daa profi - a flow-based profile inference algorithm: Part I (out of 3)
The benefits of sampling-based PGO crucially depend on the quality of profile
data. This diff implements a flow-based algorithm, called profi, that helps to
overcome the inaccuracies in a profile after it is collected.

Profi is an extended and significantly re-engineered classic MCMF (min-cost
max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing
missing and inaccurate profiling using a minimum cost circulation algorithm]. It
models profile inference as an optimization problem on a control-flow graph with
the objectives and constraints capturing the desired properties of profile data.
Three important challenges addressed by profi:
- "fixing" errors in profiles caused by sampling;
- converting basic block counts to edge frequencies (branch probabilities);
- dealing with "dangling" blocks having no samples in the profile.

The main implementation (and required docs) are in SampleProfileInference.cpp.
The worst-case time complexity is quadratic in the number of blocks in a function,
O(|V|^2). However, careful engineering and extensive evaluation show that
the running time is (slightly) super-linear. In particular, instances with
1000 blocks are solved within 0.1 seconds.

The algorithm has been extensively tested internally on prod workloads,
significantly improving the quality of generated profile data and providing
speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it
generally improves the performance (with a few outliers) but extra work in
the compiler might be needed to re-tune existing optimization passes relying on
profile counts.

UPD Dec 1st 2021:
- synced the declaration and definition of the option `SampleProfileUseProfi` to use type `cl::opt<bool>`;
- added `inline` for `SampleProfileInference<BT>::findUnlikelyJumps` and `SampleProfileInference<BT>::isExit` to avoid linking problems on Windows.

Reviewed By: wenlei, hoy

Differential Revision: https://reviews.llvm.org/D109860
2021-12-01 15:30:38 -08:00
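A hedged sketch of turning profi on explicitly; the `-sample-profile-use-profi` spelling is assumed from the `SampleProfileUseProfi` option named above, and the file names are placeholders:

```
# Enable profi-based count inference in the sample loader (flag spelling assumed).
opt < foo.ll -passes=sample-profile \
    -sample-profile-file=foo.prof \
    -sample-profile-use-profi=true -S
```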
Hongtao Yu bf317f6698 [CSSPGO] Sorting nodes in a cycle of profiled call graph.
For nodes that are in a cycle of a profiled call graph, the current order the underlying scc_iter computes depends purely on how those nodes are reached from outside the SCC and inside the SCC, based on the Tarjan algorithm. This does not honor profile edge hotness, and thus does not guarantee that hot callsites are inlined before cold callsites. To mitigate that, I'm adding an extra sorter on top of scc_iter to sort SCC functions in the order of callsite hotness, instead of changing the internals of scc_iter.

Sorting on callsite hotness could optimally be based on detecting cycles in a directed call graph, i.e., removing the coldest edge until a cycle is broken. However, detecting cycles isn't cheap. I'm using an MST-based approach which is faster and appears to deliver some performance wins.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D114204
2021-11-30 09:01:08 -08:00
Mehdi Amini 1392b654ff Revert "profi - a flow-based profile inference algorithm: Part I (out of 3)"
This reverts commit 884b6dd311.
The Windows build is broken with a linker error.
2021-11-23 20:10:36 +00:00
spupyrev 884b6dd311 profi - a flow-based profile inference algorithm: Part I (out of 3)
The benefits of sampling-based PGO crucially depend on the quality of profile
data. This diff implements a flow-based algorithm, called profi, that helps to
overcome the inaccuracies in a profile after it is collected.

Profi is an extended and significantly re-engineered classic MCMF (min-cost
max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing
missing and inaccurate profiling using a minimum cost circulation algorithm]. It
models profile inference as an optimization problem on a control-flow graph with
the objectives and constraints capturing the desired properties of profile data.
Three important challenges addressed by profi:
- "fixing" errors in profiles caused by sampling;
- converting basic block counts to edge frequencies (branch probabilities);
- dealing with "dangling" blocks having no samples in the profile.

The main implementation (and required docs) are in SampleProfileInference.cpp.
The worst-case time complexity is quadratic in the number of blocks in a function,
O(|V|^2). However, careful engineering and extensive evaluation show that
the running time is (slightly) super-linear. In particular, instances with
1000 blocks are solved within 0.1 seconds.

The algorithm has been extensively tested internally on prod workloads,
significantly improving the quality of generated profile data and providing
speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it
generally improves the performance (with a few outliers) but extra work in
the compiler might be needed to re-tune existing optimization passes relying on
profile counts.

Reviewed By: wenlei, hoy

Differential Revision: https://reviews.llvm.org/D109860
2021-11-23 11:02:40 -08:00
Philip Reames 065f777d27 Revert "profi - a flow-based profile inference algorithm: Part I (out of 3)"
This reverts commit b00fc19822. This change fails to build (link) on Ubuntu x86.
2021-11-23 09:18:28 -08:00
spupyrev b00fc19822 profi - a flow-based profile inference algorithm: Part I (out of 3)
The benefits of sampling-based PGO crucially depend on the quality of profile
data. This diff implements a flow-based algorithm, called profi, that helps to
overcome the inaccuracies in a profile after it is collected.

Profi is an extended and significantly re-engineered classic MCMF (min-cost
max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing
missing and inaccurate profiling using a minimum cost circulation algorithm]. It
models profile inference as an optimization problem on a control-flow graph with
the objectives and constraints capturing the desired properties of profile data.
Three important challenges addressed by profi:
- "fixing" errors in profiles caused by sampling;
- converting basic block counts to edge frequencies (branch probabilities);
- dealing with "dangling" blocks having no samples in the profile.

The main implementation (and required docs) are in SampleProfileInference.cpp.
The worst-case time complexity is quadratic in the number of blocks in a function,
O(|V|^2). However, careful engineering and extensive evaluation show that
the running time is (slightly) super-linear. In particular, instances with
1000 blocks are solved within 0.1 seconds.

The algorithm has been extensively tested internally on prod workloads,
significantly improving the quality of generated profile data and providing
speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it
generally improves the performance (with a few outliers) but extra work in
the compiler might be needed to re-tune existing optimization passes relying on
profile counts.

Reviewed By: wenlei, hoy

Differential Revision: https://reviews.llvm.org/D109860
2021-11-23 09:08:30 -08:00
Simon Pilgrim d391e4fe84 [X86] Update RET/LRET instruction to use the same naming convention as IRET (PR36876). NFC
Be more consistent in the naming convention for the various RET instructions by specifying them in terms of bitwidth.

Helps prevent future scheduler model mismatches like those that were only addressed in D44687.

Differential Revision: https://reviews.llvm.org/D113302
2021-11-07 15:06:54 +00:00
Hongtao Yu d137854412 [SamplePGO] Fix callsite sample lookup to use dwarf names when dwarf linkage name isn't available.
When a linkage name isn't available in DWARF (usually the case for C code), looking up callee samples should be based on the DWARF name instead of an empty string.

Also fixing a test issue where using an empty string to look up callee samples accidentally returns the correct samples because it is treated as an indirect call.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D112948
2021-11-01 21:24:33 -07:00
modimo 5caad9b5d3 [InlineAdvisor] Add fallback/format switches and negative remark processing to Replay Inliner
Adds the following switches:

1. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback: controls what the replay advisor does for inline sites that are not present in the replay. Options are:

 1. Original: defers to original advisor
 2. AlwaysInline: inline all sites not in replay
 3. NeverInline: inline no sites not in replay

2. --sample-profile-inline-replay-format/--cgscc-inline-replay-format: controls what format should be generated to match against the replay remarks. Options are:

  1. Line
  2. LineColumn
  3. LineDiscriminator
  4. LineColumnDiscriminator

Adds support for negative inlining decisions. These are denoted by "will not be inlined into" as compared to the positive "inlined into" in the remarks.

All of these together with the previous `--sample-profile-inline-replay-scope/--cgscc-inline-replay-scope` allow tweaking how replay is applied. In my testing, I'm using:
1. --sample-profile-inline-replay-scope/--cgscc-inline-replay-scope = Function to only replay on a function
2. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback = NeverInline since I'm feeding in only positive remarks to the replay system
3. --sample-profile-inline-replay-format/--cgscc-inline-replay-format = Line since I'm generating the remarks from DWARF information from GCC which can conflict quite heavily in column number compared to Clang

An alternative configuration could be to do Function, AlwaysInline, Line fallback with negative remarks, which more closely matches the final call sites. Note that this can lead to unbounded inlining if a negative remark doesn't match/exist for one reason or another.

Updated various tests to cover the new switches and negative remarks.

Testing:
ninja check-all

Reviewed By: wenlei, mtrofin

Differential Revision: https://reviews.llvm.org/D112040
2021-10-29 12:32:03 -07:00
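A hedged sketch combining the switches above into one replay configuration; the `-sample-profile-inline-replay=<remarks file>` spelling and the file names are assumptions:

```
# Replay positive remarks per function, never inlining sites absent from the replay,
# and matching remarks by line only (flag values taken from the commit above).
opt < foo.ll -passes=sample-profile \
    -sample-profile-file=foo.prof \
    -sample-profile-inline-replay=inline_remarks.txt \
    -sample-profile-inline-replay-scope=Function \
    -sample-profile-inline-replay-fallback=NeverInline \
    -sample-profile-inline-replay-format=Line -S
```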
modimo 51ce567b38 [SampleProfile] Add all callsites to AllCandidates if InlineReplay is in effect
Replay in sample profiling needs to be consulted for candidates that may not have counts or may be below the threshold. If replay is in effect for a function, make sure these are captured and also imported during ThinLTO.

Testing:
ninja check-all

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D112033
2021-10-29 12:04:52 -07:00
Steven Wan 57cb84f5a2 Point replay file to non-existent dummy
Operating systems such as AIX allow open and read on directories, so passing in a directory as the replay file triggers `Invalid remark format` instead of `Could not open remarks file: Is a directory`. This patch substitutes the directory with a non-existent filename. The current FileCheck should still work as-is.

Reviewed By: modimo, Whitney

Differential Revision: https://reviews.llvm.org/D112745
2021-10-29 11:58:40 -04:00
Bjorn Pettersson a413663d8f [NewPM][test] Avoid using -enable-new-pm=1 since -passes implies new PM 2021-10-20 15:16:17 +02:00
modimo 41f814589f [InlineAdvisor][NFC] Fix tests added in D110658 V2
On Windows there's an *.exe suffix on opt that isn't present on Linux.
Remove the check for opt in the string.
2021-10-18 15:27:33 -07:00
modimo 2786dc1096 [InlineAdvisor][NFC] Fix tests added in D110658 on
Windows, which outputs "is a directory" rather than "Is a directory" on error, compared to Linux.
2021-10-18 14:21:01 -07:00
modimo 313c657fce [InlineAdvisor] Add -inline-replay-scope=<Function|Module> to control replay scope
The goal is to allow grafting an inline tree from Clang or GCC into a new compilation without affecting other functions. For GCC, we're doing this by extracting the inline tree from DWARF information and generating the equivalent remarks.

This allows easier side-by-side asm analysis and gives a way to trial whether a particular inlining setup provides benefits by itself.

Testing:
ninja check-all

Reviewed By: wenlei, mtrofin

Differential Revision: https://reviews.llvm.org/D110658
2021-10-18 13:08:39 -07:00
Artur Pilipenko 3f96f7b30c Fix getInlineCost with ComputeFullInlineCost enabled
Fix a bug where getInlineCost incorrectly returns a
cost/threshold pair instead of an explicit never-inline result.

Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D111687
2021-10-14 17:41:41 -07:00
Hongtao Yu 098a0d8fbc [CSSPGO] Unblock optimizations with pseudo probe instrumentation part 3.
This patch continues unblocking optimizations that are blocked by pseudo probe instrumentation.

Not exactly like DbgIntrinsics, the PseudoProbe intrinsic has other attributes (such as mayread, maywrite, mayhaveSideEffect) that can block optimizations. The issues fixed are:
- Flipped default param of getFirstNonPHIOrDbg API to skip pseudo probes
- Unblocked CSE by preventing pseudo probes from clobbering memory SSA
- Unblocked induction variable simplification
- Allow empty loop deletion by treating the probe intrinsic as isDroppable
- Some refactoring.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D110847
2021-10-12 09:44:12 -07:00
Hongtao Yu d9b511d8e8 [CSSPGO] Set PseudoProbeInserter as a default pass.
Currently PseudoProbeInserter is a pass conditioned on a target switch. It works well with a single clang invocation. It doesn't work so well when the backend is called separately (i.e., through the linker or llc), where the user always has to pass -pseudo-probe-for-profiling explicitly. I'm making the pass a default pass that requires no command-line arg to trigger, but it will actually run only when the CU comes with `llvm.pseudo_probe_desc` metadata.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D110209
2021-09-22 09:09:48 -07:00
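A hedged sketch of the workflow change described above; file names are placeholders:

```
# Before: a standalone backend run had to request probe insertion explicitly.
llc -pseudo-probe-for-profiling foo.ll -o foo.s
# After: the pass is registered by default and runs only when the module
# carries llvm.pseudo_probe_desc metadata, so the switch can be omitted.
llc foo.ll -o foo.s
```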
Hongtao Yu c5fafc1e73 [CSSPGO] Tweaks to lower pseudo probe runtime overhead
A couple of tweaks to

1. allow more ThinLTO importing by excluding probe intrinsics from IR size in the module summary

2. allow general default attributes (nofree nosync nounwind) for the pseudo probe intrinsic. Without those attributes, pseudo probes will basically be treated as unknown calls, which will in turn block their containing functions from being annotated with those attributes.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D109976
2021-09-17 12:28:09 -07:00
Arthur Eubanks 37e6a27da7 [test] Fixup tests with -analyze in llvm/test/Transforms 2021-09-04 16:45:51 -07:00
Wenlei He 054487c5b2 [CSSPGO] Honor preinliner decision for ThinLTO importing
When the pre-inliner decision is used for CSSPGO, we should take that into account for ThinLTO importing as well, so the post-link sample loader inliner can favor that decision. This is handled by a small tweak in this patch. It also includes a change to transfer the preinliner decision when merging contexts.

Differential Revision: https://reviews.llvm.org/D109088
2021-09-02 17:29:26 -07:00
Kevin Athey 04ed6e7afc Revert "[CSSPGO] Honor preinliner decision for ThinLTO importing"
This reverts commit a2768b4732.

Breaks sanitizer-x86_64-linux-fast buildbot:
https://lab.llvm.org/buildbot/#/builders/5/builds/11334

Log snippet:
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80
FAIL: LLVM :: Transforms/SampleProfile/early-inline.ll (65549 of 78729)
******************** TEST 'LLVM :: Transforms/SampleProfile/early-inline.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll -instcombine -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/einline.prof -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll
--
Exit Code: 2
Command Output (stderr):
--
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53: runtime error: member call on null pointer of type 'llvm::sampleprof::FunctionSamples'
    #0 0x5a730f8 in shouldInlineCandidate /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53
    #1 0x5a730f8 in (anonymous namespace)::SampleProfileLoader::tryInlineCandidate((anonymous namespace)::InlineCandidate&, llvm::SmallVector<llvm::CallBase*, 8u>*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1178:21
    #2 0x5a6cda6 in inlineHotFunctions /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1105:13
    #3 0x5a6cda6 in (anonymous namespace)::SampleProfileLoader::emitAnnotations(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1633:16
    #4 0x5a5fcbe in runOnFunction /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2008:12
    #5 0x5a5fcbe in (anonymous namespace)::SampleProfileLoader::runOnModule(llvm::Module&, llvm::AnalysisManager<llvm::Module>*, llvm::ProfileSummaryInfo*, llvm::CallGraph*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1922:15
    #6 0x5a5de55 in llvm::SampleProfileLoaderPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2038:21
    #7 0x6552a01 in llvm::detail::PassModel<llvm::Module, llvm::SampleProfileLoaderPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #8 0x57f807c in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManager.h:526:21
    #9 0x37c8522 in llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::StringRef>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/NewPMDriver.cpp:489:7
    #10 0x37e7c11 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:830:12
    #11 0x7fbf4de4009a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
    #12 0x379e519 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt+0x379e519)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53 in
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll
--
********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80
FAIL: LLVM :: Transforms/SampleProfile/inline-cold.ll (65643 of 78729)
******************** TEST 'LLVM :: Transforms/SampleProfile/inline-cold.ll' FAILED ********************
Script:
--
: 'RUN: at line 4';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 5';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 8';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 11';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -sample-profile-cold-inline-threshold=9999999 -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 14';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -sample-profile-cold-inline-threshold=-500 -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
--
Exit Code: 2
Command Output (stderr):
--
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53: runtime error: member call on null pointer of type 'llvm::sampleprof::FunctionSamples'
    #0 0x5a730f8 in shouldInlineCandidate /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53
    #1 0x5a730f8 in (anonymous namespace)::SampleProfileLoader::tryInlineCandidate((anonymous namespace)::InlineCandidate&, llvm::SmallVector<llvm::CallBase*, 8u>*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1178:21
    #2 0x5a6cda6 in inlineHotFunctions /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1105:13
    #3 0x5a6cda6 in (anonymous namespace)::SampleProfileLoader::emitAnnotations(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1633:16
    #4 0x5a5fcbe in runOnFunction /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2008:12
    #5 0x5a5fcbe in (anonymous namespace)::SampleProfileLoader::runOnModule(llvm::Module&, llvm::AnalysisManager<llvm::Module>*, llvm::ProfileSummaryInfo*, llvm::CallGraph*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1922:15
    #6 0x5a5de55 in llvm::SampleProfileLoaderPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2038:21
    #7 0x6552a01 in llvm::detail::PassModel<llvm::Module, llvm::SampleProfileLoaderPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #8 0x57f807c in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManager.h:526:21
    #9 0x37c8522 in llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::StringRef>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/NewPMDriver.cpp:489:7
    #10 0x37e7c11 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:830:12
    #11 0x7fcd534a209a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
    #12 0x379e519 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt+0x379e519)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53 in
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
--
********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
********************
Failed Tests (2):
  LLVM :: Transforms/SampleProfile/early-inline.ll
  LLVM :: Transforms/SampleProfile/inline-cold.ll
2021-09-02 14:48:31 -07:00
Wenlei He a2768b4732 [CSSPGO] Honor preinliner decision for ThinLTO importing
When the pre-inliner decision is used for CSSPGO, we should take that into account for ThinLTO importing as well, so the post-link sample loader inliner can favor that decision. This is handled by a small tweak in this patch. It also includes a change to transfer the preinliner decision when merging contexts.

Differential Revision: https://reviews.llvm.org/D109088
2021-09-02 08:24:06 -07:00
Wenlei He c000b8bd5c [CSSPGO] Use preinliner decision by default when available
For CSSPGO, turn on `sample-profile-use-preinliner` by default. This simplifies the use of the llvm-profgen preinliner, as it's now simply driven by the ContextShouldBeInlined flag for each context profile without needing an extra compiler switch.

Note that llvm-profgen's preinliner is still off by default, under switch `csspgo-preinliner`.

Differential Revision: https://reviews.llvm.org/D109111
2021-09-01 23:45:38 -07:00
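A hedged sketch of driving the preinliner from llvm-profgen; apart from `csspgo-preinliner`, the flag names and file names are assumptions for illustration:

```
# Generate a CS profile with per-context preinliner decisions recorded
# (ContextShouldBeInlined), which the sample loader now honors by default.
llvm-profgen --binary=app --perfscript=perf.script \
    --output=app.prof --csspgo-preinliner=true
```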
Hongtao Yu f4711e0d00 [CSSPGO] Sort function offset table to speed up profile loading.
With the context split work, the context-based (an array of strings) sorting performed at profile load time is way more expensive than single-string-based sorting. This is likely due to auxiliary operations done on each array element, such as indirect references and std::min operations, and also likely cache misses. In this change I'm presorting profiles at profile generation time to avoid sorting at compile time.

Compared to the previous context-split work, this effectively cuts down compile time by 20% for one of our large services and brings us closer to the non-CS build, with a small remaining gap in build time.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D109036
2021-09-01 12:17:48 -07:00
Hongtao Yu 7ca8030030 [CSSPGO] Enable loading MD5 CS profile.
Adding compiler support for MD5 CS profiles based on the previous context split work D107299. An MD5 CS profile is about 40% smaller than the string-based extbinary profile. As a result, the compilation is 15% faster.

There are a few conversions from real names to MD5 names made on the sample loader and context tracker side to get it to work.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D108342
2021-09-01 09:19:47 -07:00
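A hedged sketch of producing an MD5 extbinary sample profile with llvm-profdata; the flag spellings and file names are assumptions, not taken from this commit:

```
# Convert a string-based extbinary CS profile into an MD5 extbinary one.
llvm-profdata merge --sample --extbinary --use-md5 \
    app.prof -o app.md5.extbinary.prof
```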
Hongtao Yu b9db70369b [CSSPGO] Split context string to deduplicate function name used in the context.
Currently context strings contain a lot of duplicated function names, and that significantly increases the profile size. This change splits the context into a series of {name, offset, discriminator} tuples so function names used in the context can be replaced by an index into the name table, which significantly reduces the size consumed by the context.

A follow-up improvement made in the compiler and profiling tools is to avoid reconstructing full context strings, which is time- and memory-consuming. Instead, a context vector of `StringRef` is adopted to represent the full context in all scenarios. As a result, the previously prevalent profile map, which was keyed by `StringRef`, is now engineered as an unordered map keyed by `SampleContext`. `SampleContext` is reshaped to use an `ArrayRef` to represent a full context for CS profiles. For non-CS profiles, it falls back to using `StringRef` to represent a contextless function name. Both the `ArrayRef` and `StringRef` objects are underpinned by real array and string objects stored in producer buffers. For the compiler, they are maintained by the sample reader. For llvm-profgen, they are maintained in `ProfiledBinary` and `ProfileGenerator`. Full context strings are generated only for debugging and printing.

When it comes to the profile format, nothing has changed for the text format, though internally the CS context is implemented as a vector. The extbinary format is only changed for CS profiles, with an additional `SecCSNameTable` section which stores all full contexts logically in the form of `vector<int>`, with each element being an offset pointing into `SecNameTable`. All occurrences of contexts elsewhere are redirected to use the offset into `SecCSNameTable`.

Testing
This is a no-diff change in terms of code quality and profile content (for the text profile).

For our internal large service (aka ads), profile generation is cut in half, with a 20x smaller string-based extbinary profile generated.

The compile time of ads drops by 25%.

Differential Revision: https://reviews.llvm.org/D107299
2021-08-30 20:09:29 -07:00
Wenlei He a45d72e024 [CSSPGO] Add switch for sample loader to honor global pre-inliner decision from llvm-profgen
The change adds a switch to allow the sample loader to use the global pre-inliner's decisions instead. The pre-inliner in llvm-profgen makes inline decisions globally based on the whole-program profile, using function byte size as a cost proxy.

Since the pre-inliner also adjusts/merges context profiles based on its inline decisions, honoring those decisions in the sample loader leads to better post-inline profile quality, especially for ThinLTO, where cross-module profile merging isn't possible without the pre-inliner.

A minor fix in the profile reader is also included. When the pre-inliner is in use, we now also turn off the default merging and trimming logic unless it's explicitly requested.

Differential Revision: https://reviews.llvm.org/D108677
2021-08-25 17:20:15 -07:00
Hongtao Yu ccb5b9bbfb [CSSPGO] Allow the use of debug-info-for-profiling and pseudo-probe-for-profiling together
Previously debug-info-for-profiling and pseudo-probe-for-profiling were mutually exclusive because they compete for the DWARF discriminator for callsites on the IR. This change allows the two switches to be used together. The side effect is that callsite discriminators will be taken by pseudo probes, while discriminators for other instructions are still available for AutoFDO use. This is less than ideal; however, it still gives us a chance to smoothly transition from AutoFDO to CSSPGO by collecting both profiles from a CSSPGO binary.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D107876
2021-08-12 08:52:49 -07:00
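A hedged sketch of using the two switches together when building a CSSPGO binary; the clang driver spellings are assumptions derived from the switch names above:

```
# Emit pseudo probes for CSSPGO while keeping AutoFDO-usable discriminators
# in the debug info for non-callsite instructions.
clang -O2 -g -fpseudo-probe-for-profiling \
    -fdebug-info-for-profiling -o app app.c
```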