Commit Graph

10 Commits

Author SHA1 Message Date
Hongtao Yu bc380c0930 [llvm-profgen] Turn on CS nested profile generation by default for CSSPGO.
CS nested profile has a benefit over the CS flat profile that is to speed up the build while achieve an on-par performance. I'm turning it on by default for CSSPGO.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D121142
2022-03-08 09:05:27 -08:00
Wenlei He f6f0409f6f [llvm-profgen] Turn on preinliner by default
preinliner has been tuned on large server workloads and it's not ready to be turned on by default. this change also updates the thresholds based on tuning.

Differential Revision: https://reviews.llvm.org/D115770
2021-12-14 17:46:57 -08:00
wlei 484a569eea [llvm-profgen] Fix total samples related issues
Since total sample and body sample are used to compute hotness threshold in compiler, we found in some services changing the total samples computation will cause noticeable regression. Hence, here we will revert the changes and just keep all total samples number identical to the old tool.

Three changes in this diff:

1. Revert previous diff(https://reviews.llvm.org/D112672: [llvm-profgen] Update total samples by accumulating all its body samples) and put it under a switch.

2. Keep the negative line number. Although compiler doesn't consume the count but it will be used to compute hot threshold.

3. Change to accumulate total samples per byte instead of per instruction.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D115013
2021-12-08 12:33:41 -08:00
wlei f5537643b8 [llvm-profgen] Update total samples by accumulating all its body samples
Like probe-based profile, the total samples is the sum of all its body samples. This patch fix it by a post-processing update for the line-number based profile. Tested it on our internal services, results showed no performance change.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D112672
2021-10-29 10:36:57 -07:00
wlei 31a5cb3292 [llvm-profgen] Filter out invalid debug line
Differential Revision: https://reviews.llvm.org/D110081
2021-10-04 19:09:06 -07:00
Hongtao Yu bd52495518 [CSSPGO] Undoing the concept of dangling pseudo probe
As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the danling probe related code in both the compiler and llvm-profgen.

I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Ciner. Certain benchmark such as 602.gcc has a 20% size win. No obvious difference seen on build time for SPEC2017 and Cinder.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D104477
2021-06-18 15:14:11 -07:00
Hongtao Yu cef9b96b01 [CSSPGO] Report zero-count probe in profile instead of dangling probes.
Previously dangling samples were represented by INT64_MAX in sample profile while probes never executed were not reported. This was based on an observation that dangling probes were only at a smaller portion than zero-count probes. However, with compiler optimizations, dangling probes end up becoming at large portion of all probes in general and reporting them does not make sense from profile size point of view. This change flips sample reporting by reporting zero-count probes instead. This enabled dangling probe to be represented by none (missing entry in profile). This has a couple benefits:

1. Reducing sample profile size in optimize mode, even when the number of non-executed probes outperform the number of dangling probes, since INT64_MAX takes more space over 0 to encode.

2. Binary size savings. No need to encode dangling probe anymore, since missing probes are treated as dangling in the profile reader.

3. Reducing compiler work to track dangling probes. However, for probes that are real dead and removed, we still need the compiler to identify them so that they can be reported as zero-count, instead of mistreated as dangling probes.

4. Improving counts quality by respecting the counts already collected on the non-dangling copy of a probe. A probe, when duplicated, gets two copies at runtime. If one of them is dangling while the other is not, merging the two probes at profile generation time will cause the real samples collected on the non-dangling one to be discarded. Not reporting the dangling counterpart will keep the real samples.

5. Better readability.

6. Be consistent with non-CS dwarf line number based profile. Zero counts are trusted by the compiler counts inferencer while missing counts will be inferred by the compiler.

Note that the current patch does include any work for #3. There will be follow-up changes.

For #1, I've seen for a large Facebook service, the text profile is reduced by 7%. For extbinary profile, the size of  LBRProfileSection is reduced by 35%.

For #4, I have seen general counts quality for SPEC2017 is improved by 10%.

Reviewed By: wenlei, wlei, wmi

Differential Revision: https://reviews.llvm.org/D104129
2021-06-16 11:45:29 -07:00
Wenlei He aaa826fac1 [CSSPGO][llvm-profgen] Make extended binary the default output format
Make extended binary the default output format for CSSPGO. This avoids having to pass flag every time when generating profile. It also matches llvm-profdata where binary profile is the default (should we switch to extbinary as default for llvm-profdata?).

We plan to compress name table for context profile, which depends on the built-in compression of extbinary.

Differential Revision: https://reviews.llvm.org/D103650
2021-06-03 17:58:16 -07:00
Wenlei He fa14fd30ce [CSSPGO][llvm-profgen] Change default cold threshold for context merging
llvm-profgen uses profile summary based cold threshold to merge and trim cold context profile. This is to strike a good balance between profile size and performance.

We've been using 99.9% as the cutoff to save profile size without affecting performance. This change switch to use 99.9% instead of 99.9999% as default cold threshold cutoff for llvm-profgen.

Redundant switch csprof-cold-thres is also removed and tests cleaned up.

Differential Revision: https://reviews.llvm.org/D103071
2021-05-25 10:41:10 -07:00
wlei dddd590fd0 [CSSPGO][llvm-profgen] Fix getCanonicalFnName usage in llvm-profgen
Previously we didn't support to keep the unique linkage name(-funique-internal-linkage-name) in llvm-profgen. As discussed in https://reviews.llvm.org/D96932, we choose to do canonicalization for it.

Now since "selected" is set as the default parameter of getCanonicalFnName in `D96932`, we don't need to add any attribute here for the previous usage and only fix the missing usage in the pseudo probe decoding.

Differential Revision: https://reviews.llvm.org/D98226
2021-03-15 21:00:42 -07:00