Commit Graph

7 Commits

Author SHA1 Message Date
Hongtao Yu bc380c0930 [llvm-profgen] Turn on CS nested profile generation by default for CSSPGO.
CS nested profile has a benefit over the CS flat profile that is to speed up the build while achieve an on-par performance. I'm turning it on by default for CSSPGO.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D121142
2022-03-08 09:05:27 -08:00
Wenlei He f6f0409f6f [llvm-profgen] Turn on preinliner by default
preinliner has been tuned on large server workloads and it's not ready to be turned on by default. this change also updates the thresholds based on tuning.

Differential Revision: https://reviews.llvm.org/D115770
2021-12-14 17:46:57 -08:00
Hongtao Yu 5740bb801a [CSSPGO] Use nested context-sensitive profile.
CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that is either inlined or not inlined. This is a benefit over the nested profile format used by non-CS AutoFDO. A downside of this is the longer build time due to parsing the indexing the full CS contexts.

For a CS flat profile, though only the context profiles relevant to a module are loaded when that module is compiled, the cost to figure out what profiles are relevant is noticeably high when there're many contexts,  since the sample reader will need to scan all context strings anyway. On the contrary, a nested function profile has its related inline subcontexts isolated from other unrelated contexts. Therefore when compiling a set of functions, unrelated contexts will never need to be scanned.

In this change we are exploring using nested profile format for CSSPGO. This is expected to work based on an assumption that with a preinliner-computed profile all contexts are precomputed and expected to be inlined by the compiler. Contexts not expected to be inlined will be cut off and returned to corresponding base profiles (for top-level outlined functions). This naturally forms a nested profile where all nested contexts are expected to be inlined. The compiler will less likely optimize on derived contexts that are not precomputed.

A CS-nested profile will look exactly the same with regular nested profile except that each nested profile can come with an attributes. With pseudo probes,  a nested profile shown as below can also have a CFG checksum.

```

main:1968679:12
 2: 24
 3: 28 _Z5funcAi:18
 3.1: 28 _Z5funcBi:30
 3: _Z5funcAi:1467398
  0: 10
  1: 10 _Z8funcLeafi:11
  3: 24
  1: _Z8funcLeafi:1467299
   0: 6
   1: 6
   3: 287884
   4: 287864 _Z3fibi:315608
   15: 23
   !CFGChecksum: 138828622701
   !Attributes: 2
  !CFGChecksum: 281479271677951
  !Attributes: 2
```

Specific work included in this change:
- A recursive profile converter to convert CS flat profile to nested profile.
- Extend function checksum and attribute metadata to be stored in nested way for text profile and extbinary profile.
- Unifiy sample loader inliner path for CS and preinlined nested profile.
 - Changes in the sample loader to support probe-based nested profile.

I've seen promising results regarding build time. A nested profile can result in a 20% shorter build time than a CS flat profile while keep an on-par performance. This is with -duplicate-contexts-into-base=1.

Test Plan:

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D115205
2021-12-14 14:40:25 -08:00
Hongtao Yu d9b511d8e8 [CSSPGO] Set PseudoProbeInserter as a default pass.
Currenlty PseudoProbeInserter is a pass conditioned on a target switch. It works well with a single clang invocation. It doesn't work so well when the backend is called separately (i.e, through the linker or llc), where user has always to pass -pseudo-probe-for-profiling explictly. I'm making the pass a default pass that requires no command line arg to trigger, but will be actually run depending on whether the CU comes with `llvm.pseudo_probe_desc` metadata.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D110209
2021-09-22 09:09:48 -07:00
Hongtao Yu cef9b96b01 [CSSPGO] Report zero-count probe in profile instead of dangling probes.
Previously dangling samples were represented by INT64_MAX in sample profile while probes never executed were not reported. This was based on an observation that dangling probes were only at a smaller portion than zero-count probes. However, with compiler optimizations, dangling probes end up becoming at large portion of all probes in general and reporting them does not make sense from profile size point of view. This change flips sample reporting by reporting zero-count probes instead. This enabled dangling probe to be represented by none (missing entry in profile). This has a couple benefits:

1. Reducing sample profile size in optimize mode, even when the number of non-executed probes outperform the number of dangling probes, since INT64_MAX takes more space over 0 to encode.

2. Binary size savings. No need to encode dangling probe anymore, since missing probes are treated as dangling in the profile reader.

3. Reducing compiler work to track dangling probes. However, for probes that are real dead and removed, we still need the compiler to identify them so that they can be reported as zero-count, instead of mistreated as dangling probes.

4. Improving counts quality by respecting the counts already collected on the non-dangling copy of a probe. A probe, when duplicated, gets two copies at runtime. If one of them is dangling while the other is not, merging the two probes at profile generation time will cause the real samples collected on the non-dangling one to be discarded. Not reporting the dangling counterpart will keep the real samples.

5. Better readability.

6. Be consistent with non-CS dwarf line number based profile. Zero counts are trusted by the compiler counts inferencer while missing counts will be inferred by the compiler.

Note that the current patch does include any work for #3. There will be follow-up changes.

For #1, I've seen for a large Facebook service, the text profile is reduced by 7%. For extbinary profile, the size of  LBRProfileSection is reduced by 35%.

For #4, I have seen general counts quality for SPEC2017 is improved by 10%.

Reviewed By: wenlei, wlei, wmi

Differential Revision: https://reviews.llvm.org/D104129
2021-06-16 11:45:29 -07:00
Wenlei He aaa826fac1 [CSSPGO][llvm-profgen] Make extended binary the default output format
Make extended binary the default output format for CSSPGO. This avoids having to pass flag every time when generating profile. It also matches llvm-profdata where binary profile is the default (should we switch to extbinary as default for llvm-profdata?).

We plan to compress name table for context profile, which depends on the built-in compression of extbinary.

Differential Revision: https://reviews.llvm.org/D103650
2021-06-03 17:58:16 -07:00
Hongtao Yu 3b51b51877 [CSSPGO][llvm-profgen] Report samples for untrackable frames.
Fixing an issue where samples collected for an untrackable frame is not reported. An untrackable frame refers to a frame whose caller is untrackable due to missing debug info or pseudo probe. Though the frame is connected to its parent frame through the frame pointer chain at runtime, the compiler cannot build the connection without debug info or pseudo probe. In such case we just need to report the untrackable frame as the base frame and all of its child frames.

With more samples reported I'm seeing this improves the performance of an internal benchmark by 2.5%.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D102961
2021-05-24 12:39:12 -07:00