llvm-project

Commit Graph

Author	SHA1	Message	Date
Fangrui Song	a273c40820	llvm/tools: Convert tests to opaque pointers	2022-11-27 20:20:04 -08:00
Zequan Wu	a7fa5febaa	[Test] Fix CHECK typo. Differential Revision: https://reviews.llvm.org/D137287	2022-11-04 10:18:04 -07:00
Hongtao Yu	d5a963ab8b	[PseudoProbe] Replace relocation with offset for entry probe. Currently pseudo probe encoding for a function is like: - For the first probe, a relocation from it to its physical position in the code body - For subsequent probes, an incremental offset from the current probe to the previous probe The relocation could potentially cause relocation overflow during link time. I'm now replacing it with an offset from the first probe to the function start address. A source function could be lowered into multiple binary functions due to outlining (e.g, coro-split). Since those binary function have independent link-time layout, to really avoid relocations from .pseudo_probe sections to .text sections, the offset to replace with should really be the offset from the probe's enclosing binary function, rather than from the entry of the source function. This requires some changes to previous section-based emission scheme which now switches to be function-based. The assembly form of pseudo probe directive is also changed correspondingly, i.e, reflecting the binary function name. Most of the source functions end up with only one binary function. For those don't, a sentinel probe is emitted for each of the binary functions with a different name from the source. The sentinel probe indicates the binary function name to differentiate subsequent probes from the ones from a different binary function. For examples, given source function ``` Foo() { … Probe 1 … Probe 2 } ``` If it is transformed into two binary functions: ``` Foo: … Foo.outlined: … ``` The encoding for the two binary functions will be separate: ``` GUID of Foo Probe 1 GUID of Foo Sentinel probe of Foo.outlined Probe 2 ``` Then probe1 will be decoded against binary `Foo`'s address, and Probe 2 will be decoded against `Foo.outlined`. The sentinel probe of `Foo.outlined` makes sure there's not accidental relocation from `Foo.outlined`'s probes to `Foo`'s entry address. On the BOLT side, to be minimal intrusive, the pseudo probe re-encoding sticks with the old encoding format. This is fine since unlike linker, Bolt processes the pseudo probe section as a whole and it is free from relocation overflow issues. The change is downwards compatible as long as there's no mixed use of the old encoding and the new encoding. Reviewed By: wenlei, maksfb Differential Revision: https://reviews.llvm.org/D135912 Differential Revision: https://reviews.llvm.org/D135914 Differential Revision: https://reviews.llvm.org/D136394	2022-10-27 13:28:22 -07:00
wlei	467652486f	[llvm-profgen] Fix inconsistent loading address issues This is to fix two issues related with loading address: 1) When multiple MMAPs occur and their loading address are different, before it only used the first MMap as base address, all perf address after it used the wrong base address. 2) For pseudo probe profile, the address is always based on preferred loading address. If the base address is not equal to the preferred loading address, the pseudo probe address query will be wrong. Solution: Instead of converting the address to offset lazily, right now all the address after parsing are converted on the fly based on preferred loading address in the parsing time. There is no "offset" used in profile generator any more. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D126827	2022-10-13 23:19:30 -07:00
wlei	1ea43e601a	remove the internal signatures from perf binaries	2022-09-19 22:32:31 -07:00
Hongtao Yu	acfd0a3456	[llvm-profgen] Update callsite body samples by summing up all call target samples. Current profile generation caculcates callsite body samples and call target samples separately. The former is done based on LBR range samples while the latter is done based on branch samples. Note that there's a subtle difference. LBR ranges is formed from two consecutive branch samples. Therefore the last entry in a LBR record will not be counted towards body samples while there's still a chance for it to be counted towards call targets if it is a function call. I'm making sense of the call body samples by updating it to the aggregation of call targets. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D122609	2022-05-16 09:13:37 -07:00
Hongtao Yu	1662cfa4be	[CSSPGO][CSProfileConverter] Remove call target samples when including callee samples into caller. When a flat CS profile is converted to a nested profile, the call target samples for inlined callee contexts are left over in the callsite target map. This could cause indirect call promotion to function improperly. One issue is that the inlined callsites are treated with double amount of samples. The other is the inlined callsites are reconsidered for subsequent PGO ICP. I'm fixing this by excluding call targets from the callsite for inlined targets. While fixing this I found that callsite target sum and the number of body samples for that callsite could be mismatched. {D122609} has an explanation and a fix for that on llvm-profgen side. For now I'm tolerating it in this change. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D125266	2022-05-13 09:19:32 -07:00
Hongtao Yu	9f732af583	[llvm-profgen] Filter out oversized LBR ranges. As a follow up to {D123271}, LBR ranges that are too big should also be considered as invalid. For example, the last two pairs in the following trace form a range [0x0d7b02b0, 0x368ba706] that covers a ton of functions in the binary. Such oversized range should also be ignored. 0x0c74505f/0x368b99a0 0x368ba706/0x0c745040 0x0d7b1c3f/0x0d7b02b0 Add a defensive check to filter out those ranges based that the valid range should not cross the unconditional branch(Call, return, unconditional jmp). Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D125448	2022-05-12 10:58:50 -07:00
Hongtao Yu	23191a4ffe	[CSSPGO][llvm-profgen] Do not duplicate context profiles into base profile when converting CS flat profile to nested. Recent experiments with our two large internal services showed that duplicating context profiles into base profile caused code size inflation and didn't deliver good performance compared to no such duplication. It was a trick we made to catch up with the CS flat profile and I'm now turning it off by default. The code size inflation mainly comes from the enriched based profiles. A base profile for a function represents the uninlined (or outlined) portion of the whole function running time. Such portion could be very small if a function is inlined into most of its hot callsites. Duplicating context profiles of the function into its base profiles could cause the outlined body to be hot enough and in turn get many of its callees inlined, thus increases the code size. The size inflation could further cause perf regression. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D124796	2022-05-12 09:29:25 -07:00
Hongtao Yu	e36786d15f	[CSSPGO] Rename ProfileIsCSNested and ProfileIsCSFlat To be more clear and definitive, I'm renaming `ProfileIsCSFlat` back to `ProfileIsCS` which stands for full context-sensitive flat profiles. `ProfileIsCSNested` is now renamed to `ProfileIsPreInlined` and is extended to be applicable for CS flat profiles too. More specifically, `ProfileIsPreInlined` is for any kind of profiles (flat or nested) that contain 'ShouldBeInlined' contexts. The flag is encoded in the profile summary section for extbinary profiles and is computed on-the-fly for text profiles. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D122602	2022-04-29 17:03:52 -07:00
wlei	bfcb2c1119	[llvm-profgen] Decouple artificial branch from LBR parser and fix external address related issues This patch is fixing two issues for both CS and non-CS. 1) For external-call-internal, the head samples of the the internal function should be recorded. 2) avoid ignoring LBR after meeting the interrupt branch for CS profile LBR parser is shared between CS and non-CS, we found it's error-prone while dealing with artificial branch inside LBR parser. Since artificial branch is mainly used for CS profile unwinding, this patch tries to simplify LBR parser by decoupling artificial branch code from it, the concept of artificial branch is removed and split into two transitional branches(internal-to-external, external-to-internal). Then we leave all the processing of external branch to unwinder. Specifically for unwinder, remembering that we introduce external frame in https://reviews.llvm.org/D115550. We can just take external address as a regular address and reuse current unwind function(unwindCall, unwindReturn). For a normal case, the external frame will match an external LBR, and it will be filtered out by `unwindLinear` without losing any context. The data also shows that the interrupt or standalone LBR pattern(unpaired case) does exist, we choose to handle it by clearing the call stack and keeping unwinding. Here we leverage checking in `unwindLinear`, because a standalone LBR, no matter its type, since it doesn’t have other part to pair, it will eventually cause a wrong linear range, like [external, internal], [internal, external]. Then set the state to invalid there. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D118177	2022-04-28 16:07:28 -07:00
Arthur Eubanks	61d418f971	[test] Remove references to -fexperimental-new-pass-manager in tests This has been the default for a while and we're in the process of removing the legacy PM optimization pipeline.	2022-04-11 13:29:08 -07:00
Hongtao Yu	8a0406dcc8	[llvm-profgen] Filter out invalid LBR ranges. The profiler can sometimes give us a LBR trace that implicates bogus code ranges. For example, 0xc5acb56/0xc66c6c0 0xc628195/0xf31fbb0 0xc611261/0xc628130 0xc5c1a21/0xc6111c0 0x1f7edfd3/0xc5c3a50 0xc5c154f/0x1f7edec0 0xe8eed07/0xc5c11e0 , note that the first two pairs are supposed to form a linear execution range, in this case, it is [0xf31fbb0, 0xc5acb56] , which doesn't make sense. Such bogus ranges should be ruled out to avoid generating a bad profile. I'm fixing this for both CS and non-CS cases. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D123271	2022-04-07 21:42:01 -07:00
Hongtao Yu	937924eb49	[llvm-profgen] Read sample profiles for post-processing. Sometimes we would like to run post-processing repeatedly on the original sample profile for tuning. In order to avoid regenerating the original profile from scratch every time, this change adds the support of reading in the original profile (called symbolized profile) and running the post-processor on it. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D121655	2022-03-30 13:51:16 -07:00
Hongtao Yu	3f97016857	[llvm-profgen] Decoding pseudo probe for profiled function only. Complete pseudo probes decoding can result in large memory usage. In practice only a small porting of the decoded probes are used in profile generation. I'm changing the full decoding mode to be decoding for profiled functions only, though we still do a full scan of the .pseudoprobe section due to a missing table-of-content but we don't have to build the in-memory data structure for functions not sampled. To build the in-memory data structure for profiled functions only, I'm rewriting the previous non-recursive probe decoding logic to be recursive. This is easy to read and maintain. I also have to change the previous representation of unsymbolized context from probe-based stack to address-based stack since the profiled functions are unknown yet by the time of virtual unwinding. The address-based stack will be converted to probe-based stack after virtual unwinding and on-demand probe decoding. I'm seeing 20GB memory is saved for one of our internal large service. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D121643	2022-03-23 14:15:11 -07:00
Hongtao Yu	bc380c0930	[llvm-profgen] Turn on CS nested profile generation by default for CSSPGO. CS nested profile has a benefit over the CS flat profile that is to speed up the build while achieve an on-par performance. I'm turning it on by default for CSSPGO. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D121142	2022-03-08 09:05:27 -08:00
Hongtao Yu	23391febd8	[llvm-profgen] Generating probe-based non-CS profile. I'm bring up the support of pseudo-probe-based non-CS profile generation. The approach is quite similar to generating dwarf-based non-CS profile. The main difference is for a given linear instruction range, instead of each disassembled instruction, pseudo probes that are covered by the range are processed. The pseudo probe extraction code is shared with CS probe profile generation. I'm seeing 0.7% performance win for one of our internal large benchmark compared to using non-CS dwarf-based profile, and 0.5% win for another large benchmark when combined with profi. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D120335	2022-03-01 18:49:08 -08:00
wlei	b3a778fb5e	[llvm-profgen] Support symbol loading for debug fission Support to load debug info from dwarf split file, like .dwo, .dwp files. Leverage the `getNonSkeletonUnitDIE(false)` API to achieve this. Add test cause to make sure all the ranges is well retrieved by the loader. Reviewed By: ayermolo, hoy, wenlei Differential Revision: https://reviews.llvm.org/D115973	2022-02-23 09:40:46 -08:00
Hongtao Yu	f0f70ae674	[CSSPGO] Do not recount callee samples when computing profile summary for nested CS profile. When generating nested CS profile with all calling contexts of a function duplicated into a base profile under `--generate-merged-base-profiles`, do not recount callee samples when computing profile summary. This fixes the profile summary mismatch between flat cs profile and nested cs profile, for both extbinary and text format. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D119494	2022-02-11 09:05:51 -08:00
wlei	6693c562f9	[llvm-profgen] Support to load debug info from a second binary For reducing binary size purpose, the binary's debug info and executable segment can be separated(like using objcopy --only-keep-debug). Here add support in llvm-profgen to use two binaries as input. The original one is executable binary and added for debug info only binary. Adding a flag `--debug-binary=file-path`, with this, the binary will load debug info from debug binary. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D115948	2022-01-24 17:14:05 -08:00
Wenlei He	f6f0409f6f	[llvm-profgen] Turn on preinliner by default preinliner has been tuned on large server workloads and it's not ready to be turned on by default. this change also updates the thresholds based on tuning. Differential Revision: https://reviews.llvm.org/D115770	2021-12-14 17:46:57 -08:00
wlei	0f53df864e	[CSSPGO][llvm-profgen] Fix external address issues of perf reader (return to external addr part) Before we have an issue with artificial LBR whose source is a return, recalling that "an internal code(A) can return to external address, then from the external address call a new internal code(B), making an artificial branch that looks like a return from A to B can confuse the unwinder". We just ignore the LBRs after this artificial LBR which can miss some samples. This change aims at fixing this by correctly unwinding them instead of ignoring them. List some typical scenarios covered by this change. 1) multiple sequential call back happen in external address, e.g. ``` [ext, call, foo] [foo, return, ext] [ext, call, bar] ``` Unwinder should avoid having foo return from bar. Wrong call stack is like [foo, bar] 2) the call stack before and after external call should be correctly unwinded. ``` {call stack1} {call stack2} [foo, call, ext] [ext, call, bar] [bar, return, ext] [ext, return, foo ] ``` call stack 1 should be the same to call stack2. Both shouldn't be truncated 3) call stack should be truncated after call into external code since we can't do inlining with external code. ``` [foo, call, ext] [ext, call, bar] [bar, call, baz] [baz, return, bar ] [bar, return, ext] ``` the call stack of code in baz should not include foo. ### Implementation: We leverage artificial frame to fix #2 and #3: when we got a return artificial LBR, push an extra artificial frame to the stack. when we pop frame, check if the parent is an artificial frame to pop(fix #2). Therefore, call/ return artificial LBR is just the same as regular LBR which can keep the call stack. While recording context on the trie, artificial frame is used as a tag indicating that we should truncate the call stack(fix #3). To differentiate #1 and #2, we leverage `getCallAddrFromFrameAddr`. Normally the target of the return should be the next inst of a call inst and `getCallAddrFromFrameAddr` will return the address of call inst. Otherwise, getCallAddrFromFrameAddr will return to 0 which is the case of #1. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D115550	2021-12-14 16:40:54 -08:00
wlei	3dcb60db9a	[CSSPGO][llvm-profgen] Fix external address issues of perf reader (leading external LBR part) We can have the sampling just hit into the external addresses, in that case, both the top stack frame and the latest LBR target are external addresses. For example: ``` ffffffff 0x4006c8/0xffffffff/P/-/-/0 0x40069b/0x400670/M/-/-/0 ffffffff 40067e 0xffffffff/0xffffffff/P/-/-/0 0x4006c8/0xffffffff/P/-/-/0 0x40069b/0x400670/M/-/-/0 ``` Before we will ignore the entire samples. However, we found there exists some internal LBRs in the remaining part of sample, the range between them is still a valid range, we will lose some valid LBRs. Those LBRs will be unwinded based on a empty(context-less) call stack. This change tries to fix it, instead of ignoring the entire sample, we only ignore the leading external addresses. Note that the first outgoing LBR is useful since there is a valid range between it's source and next LBR's target. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D115538	2021-12-14 16:40:53 -08:00
wlei	3220571793	[llvm-profgen] Skip disassembling for PLT section Skip disassembling .plt section, then .plt section code will be treated as external code. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D115699	2021-12-14 16:40:53 -08:00
Hongtao Yu	5740bb801a	[CSSPGO] Use nested context-sensitive profile. CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that is either inlined or not inlined. This is a benefit over the nested profile format used by non-CS AutoFDO. A downside of this is the longer build time due to parsing the indexing the full CS contexts. For a CS flat profile, though only the context profiles relevant to a module are loaded when that module is compiled, the cost to figure out what profiles are relevant is noticeably high when there're many contexts, since the sample reader will need to scan all context strings anyway. On the contrary, a nested function profile has its related inline subcontexts isolated from other unrelated contexts. Therefore when compiling a set of functions, unrelated contexts will never need to be scanned. In this change we are exploring using nested profile format for CSSPGO. This is expected to work based on an assumption that with a preinliner-computed profile all contexts are precomputed and expected to be inlined by the compiler. Contexts not expected to be inlined will be cut off and returned to corresponding base profiles (for top-level outlined functions). This naturally forms a nested profile where all nested contexts are expected to be inlined. The compiler will less likely optimize on derived contexts that are not precomputed. A CS-nested profile will look exactly the same with regular nested profile except that each nested profile can come with an attributes. With pseudo probes, a nested profile shown as below can also have a CFG checksum. ``` main:1968679:12 2: 24 3: 28 _Z5funcAi:18 3.1: 28 _Z5funcBi:30 3: _Z5funcAi:1467398 0: 10 1: 10 _Z8funcLeafi:11 3: 24 1: _Z8funcLeafi:1467299 0: 6 1: 6 3: 287884 4: 287864 _Z3fibi:315608 15: 23 !CFGChecksum: 138828622701 !Attributes: 2 !CFGChecksum: 281479271677951 !Attributes: 2 ``` Specific work included in this change: - A recursive profile converter to convert CS flat profile to nested profile. - Extend function checksum and attribute metadata to be stored in nested way for text profile and extbinary profile. - Unifiy sample loader inliner path for CS and preinlined nested profile. - Changes in the sample loader to support probe-based nested profile. I've seen promising results regarding build time. A nested profile can result in a 20% shorter build time than a CS flat profile while keep an on-par performance. This is with -duplicate-contexts-into-base=1. Test Plan: Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D115205	2021-12-14 14:40:25 -08:00
wlei	7d62b68abc	[llvm-profgen] remove check Attributes to fix build failure	2021-12-08 14:13:14 -08:00
wlei	057b0430af	[llvm-profgen] fix build failure in cs-extbinary.test	2021-12-08 13:18:26 -08:00
wlei	5d66113afc	[llvm-profgen] fix to use profile-summary-hot-count instead of profile-summary-cold-count for CS profile	2021-12-08 12:54:07 -08:00
wlei	484a569eea	[llvm-profgen] Fix total samples related issues Since total sample and body sample are used to compute hotness threshold in compiler, we found in some services changing the total samples computation will cause noticeable regression. Hence, here we will revert the changes and just keep all total samples number identical to the old tool. Three changes in this diff: 1. Revert previous diff(https://reviews.llvm.org/D112672: [llvm-profgen] Update total samples by accumulating all its body samples) and put it under a switch. 2. Keep the negative line number. Although compiler doesn't consume the count but it will be used to compute hot threshold. 3. Change to accumulate total samples per byte instead of per instruction. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D115013	2021-12-08 12:33:41 -08:00
wlei	27cb3707db	[llvm-profgen] Trim cold function profiles for non-CS AutoFDO This change allows to trim the profile if it's considered to be cold for baseline AutoFDO. We reuse the cold threshold from `ProfileSummaryBuilder::getColdCountThreshold(..)` which can be set by percent(--profile-summary-cutoff-cold) or by value(--profile-summary-cold-count). Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D113785	2021-12-08 12:20:50 -08:00
wlei	41a681ce09	[FS-AFDO][llvm-profgen] Generate profile with FS-AFDO discriminator In order to support generating profile with FS discriminator, three kind of changes are done in llvm-profgen: 1) Dissassemble .rodata section to check if FS discriminator var ('"__llvm_fs_discriminator__"') exists and set the corresponding flag in the binary. 2) Change the discriminator decoding in `getBaseDiscriminator` and `getDuplicationFactor`. 3) set true for `FunctionSamples::ProfileIsFS` to enable FS functionality in ProfileData. Reviewed By: xur, hoy, wenlei Differential Revision: https://reviews.llvm.org/D113296	2021-11-30 15:57:59 -08:00
wlei	c2e08aba1a	[llvm-profgen] Compute and show profile density AutoFDO performance is sensitive to profile density, i.e., the amount of samples in the profile relative to the program size, because profiles with insufficient samples could be inaccurate due to statistical noise and thus hurt AutoFDO performance. A previous investigation showed that AutoFDO performed better on MySQL with increased amount of samples. Therefore, we implement a profile-density computation feature to give hints about profile density to users and the compiler. We define the density of a profile Prof as follows: - For each function A in the profile, density(A) = total_samples(A) / sizeof(A). - density(Prof) = min(density(A)) for all functions A that are warm (defined below). A function is considered warm if its total-samples is within top N percent of the profile. For implementation, we reuse the `ProfileSummaryBuilder::getHotCountThreshold(..)` as threshold which can be set by percent(`--profile-summary-cutoff-hot`) or by value(`--profile-summary-hot-count`). We also introduce `--hot-function-density-threshold` to set hot function density threshold and will give suggestion if profile density is below it which implies we should increase samples. This also applies for CS profile with all profiles merged into base. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D113781	2021-11-29 23:54:31 -08:00
Wenlei He	f7976edc1e	[llvm-profgen] Add switch to allow use of first loadable segment for calculating offset Adding `-use-loadable-segment-as-base` to allow use of first loadable segment for calculating offset. By default first executable segment is used for calculating offset. The switch helps compatibility with unsymbolized profile generated from older tools. Differential Revision: https://reviews.llvm.org/D113727	2021-11-15 19:00:27 -08:00
wlei	5bf191a381	[llvm-profgen] Fix index out of bounds error while using ip.advance Previously we assume there're some non-executing sections at the bottom of the text section so that we won't hit the array's bound. But on BOLTed binary, it turned out .bolt section is at the bottom of text section which can be profiled, then it crash llvm-profgen. This change try to fix it. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D113238	2021-11-05 18:38:40 -07:00
wlei	138202a8c3	[llvm-profgen] Warn on invalid range and show warning summary Two things in this diff: 1) Warn on the invalid range, currently three types of checking, see the detailed message in the code. 2) In some situation, llvm-profgen gives lots of warnings on the truncated stacks which is noisy. This change provides a switch to `--show-detailed-warning` to skip the warnings. Alternatively, we use a summary for those warning and show the percentage of cases with those issues. Example of warning summary. ``` warning: 0.05%(1120/2428958) cases with issue: Profile context truncated due to missing probe for call instruction. warning: 0.00%(2/178637) cases with issue: Range does not belong to any functions, likely from external function. ``` Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D111902	2021-11-02 19:55:55 -07:00
wlei	3f3103c6a9	[llvm-profgen] Fill zero count for all function ranges Allow filling zero count for all the function ranges even there is no samples hitting that function. Add a switch for this. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D112858	2021-11-01 09:57:05 -07:00
wlei	f5537643b8	[llvm-profgen] Update total samples by accumulating all its body samples Like probe-based profile, the total samples is the sum of all its body samples. This patch fix it by a post-processing update for the line-number based profile. Tested it on our internal services, results showed no performance change. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D112672	2021-10-29 10:36:57 -07:00
wlei	2f8196db92	[llvm-profgen] Fix bug of populating profile symbol list Previous implementation of populating profile symbol list is wrong, it only included the profiled symbols. Actually it should use all symbols, here this switches to use the symbols from debug info. Also turned the flag off by default. Reviewed By: wenlei, hoy Differential Revision: https://reviews.llvm.org/D111824	2021-10-29 09:59:12 -07:00
Hongtao Yu	259e4c5658	[CSSPGO] Trim cold base profiles for the CS preinliner. Adding support to the CS preinliner to trim cold base profiles. This makes trimming consistent with the inline decision made by the preinliner. Also disable the existing profile merger when preinliner is on unless explicitly specified. Reviewed By: wenlei, wlei Differential Revision: https://reviews.llvm.org/D112489	2021-10-27 22:50:27 -07:00
wlei	a5f411b7f8	[llvm-profgen] Allow unsymbolized profile as perf input This change allows the unsymbolized profile as input. The unsymbolized profile is created by `llvm-profgen` with `--skip-symbolization` and it's after the sample aggregation but before symbolization , so it has much small file size. It can be used for sample merging and trimming, also is useful for debugging or adding test cases. A switch `--unsymbolized-profile=file-patch` is added for this. Format of unsymbolized profile: ``` [context stack1] # If it's a CS profile number of entries in RangeCounter from_1-to_1:count_1 from_2-to_2:count_2 ...... from_n-to_n:count_n number of entries in BranchCounter src_1->dst_1:count_1 src_2->dst_2:count_2 ...... src_n->dst_n:count_n [context stack2] ...... ``` Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D111750	2021-10-25 23:58:08 -07:00
Wenlei He	e8c245dcd3	[llvm-profgen] Skip duplication factor outside of body sample computation We incorrectly use duplication factor for total samples even though we already accumulate samples instead of taking MAX. It causes profile to have bloated total samples for functions with loop unrolled or vectorized. The change fix the issue for total sample, head sample and call target samples. Differential Revision: https://reviews.llvm.org/D112042	2021-10-19 23:10:45 -07:00
Wenlei He	a316343e19	[llvm-profgen] Allow generating AutoFDO profile from CSSPGO binary Add `-use-dwarf-correlation` switch to allow llvm-profgen to generate AutoFDO profile for binaries built with CSSPGO (pseudo-probe). Differential Revision: https://reviews.llvm.org/D111776	2021-10-14 09:11:56 -07:00
wlei	30ca33eab0	[llvm-profgen] Ignore the whole trace with the leading external branch The first LBR entry can be an external branch, we should ignore the whole trace. ``` 7f7448e889e4 0x7f7448e889e4/0x7f7448e88826/P/-/-/1 0x7f7448e8899f/0x7f7448e889d8/P/-/-/4 ... ``` Reviewed By: wenlei, hoy Differential Revision: https://reviews.llvm.org/D111749	2021-10-13 16:52:29 -07:00
wlei	b1a45c62f0	[llvm-profgen] Ignore branch count against outline function For some transformations like hot-cold split or coro split, it can outline its part of function ranges. Since sample loader is the early stage of backend and no split happens at that time, compiler can't recognize those function, so in llvm-profgen we should attribute the sample to the original function. This is already done for the body range samples since we use the symbols from dwarf which is created before the split. But for branch samples, the call from master function to its outlined function is actually not a call to the original function, we shouldn't add head/callsie samples for it. So instead of dwarf symbol, we use the symbols from symbol table and ignore those functions with special suffixes(like `.cold` ,`.resume`) for accumulating the callsite/head samples. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D110864	2021-10-07 14:03:34 -07:00
wlei	16516f8925	[llvm-profgen] Support symbol list for accurate profile Differential Revision: https://reviews.llvm.org/D110859	2021-10-06 11:41:39 -07:00
wlei	31a5cb3292	[llvm-profgen] Filter out invalid debug line Differential Revision: https://reviews.llvm.org/D110081	2021-10-04 19:09:06 -07:00
wlei	46cf7d75d9	[llvm-profgen] Add duplication factor for line-number based profile This change adds duplication factor multiplier while accumulating body samples for line-number based profile. The body sample count will be `duplication-factor * count`. Base discriminator and duplication factor is decoded from the raw discriminator, this requires some refactor works. Differential Revision: https://reviews.llvm.org/D109934	2021-10-04 19:08:55 -07:00
wlei	a03cf331e1	[llvm-profgen] Strip context to support non-CS profile generation for hybrid sample Differential Revision: https://reviews.llvm.org/D109769	2021-09-28 12:20:23 -07:00
wlei	1422fa5fab	[llvm-profgen] Unify output format of different unsymbolized profiles Differential Revision: https://reviews.llvm.org/D110080	2021-09-24 14:18:00 -07:00
wlei	28277e9b48	[AutoFDO][llvm-profgen] Report zero count for unexecuted part of function code In order to be consistent with compiler that interprets zero count as unexecuted(cold), this change reports zero-value count for unexecuted part of function code. For the implementation, it leverages the range counter, initializes all the executed function range with the zero-value. After all ranges are merged and converted into disjoint ranges, the remaining zero count will indicates the unexecuted(cold) part of the function. This change also extends the current `findDisjointRanges` method which now can support adding zero-value range. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D109713	2021-09-24 14:15:05 -07:00

1 2

100 Commits