This patch teaches llvm-profdata to output the sample profile in the
JSON format. The new option is intended to be used for research and
development purposes. For example, one can write a Python script to
take a JSON file and analyze how similar different inline instances of
a given function are to each other.
I've chosen JSON because Python can parse it reasonably fast, and it
just takes a couple of lines to read the whole data:
import json
with open ('profile.json') as f:
profile = json.load(f)
Differential Revision: https://reviews.llvm.org/D130944
Add a check to detect that the profiled binary was build with position
independent code. Add a test with a pie binary to which can be reused
later when support is added. Also clean up the error messages with
trailing colons.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D128564
Until we have symbolization for position independent code lets update
this documentation since clang now defaults to position independent
code.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D127808
This change prints out the segment information in the raw profile in
YAML format for testing. Since we don't capture build ids yet, we print
out <None> for now.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D126840
Update the YAML format print out of the profile to include a summary
instead of displaying the headers in the raw file buffer. This allows us
to release the raw buffer early saving memory.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D126834
Extend the Frame struct to hold the symbol name if requested
when a RawMemProfReader object is constructed. This change updates the
tests and removes the need to pass --debug to obtain the mapping from
GUID to symbol names.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D126344
When a flat CS profile is converted to a nested profile, the call target samples for inlined callee contexts are left over in the callsite target map. This could cause indirect call promotion to function improperly. One issue is that the inlined callsites are treated with double amount of samples. The other is the inlined callsites are reconsidered for subsequent PGO ICP.
I'm fixing this by excluding call targets from the callsite for inlined targets. While fixing this I found that callsite target sum and the number of body samples for that callsite could be mismatched. {D122609} has an explanation and a fix for that on llvm-profgen side. For now I'm tolerating it in this change.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D125266
To be more clear and definitive, I'm renaming `ProfileIsCSFlat` back to `ProfileIsCS` which stands for full context-sensitive flat profiles. `ProfileIsCSNested` is now renamed to `ProfileIsPreInlined` and is extended to be applicable for CS flat profiles too. More specifically, `ProfileIsPreInlined` is for any kind of profiles (flat or nested) that contain 'ShouldBeInlined' contexts. The flag is encoded in the profile summary section for extbinary profiles and is computed on-the-fly for text profiles.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D122602
Use `ProfileSummaryBuilder::DefaultCutoffs` for llvm-profdata detailed summary printing for Instr profile.
Differential Revision: https://reviews.llvm.org/D122210
To ease profile annotation, each of the callsites in a function can be
annotated with profile data - "IR metadata format for MemProf" [1]. This
patch extends the on-disk serialized record format to store the debug
information for allocation callsites incl inline frames. This change is
incompatible with the existing format i.e. indexed profiles must be
regenerated, raw profiles are unaffected.
[1] https://groups.google.com/g/llvm-dev/c/aWHsdMxKAfE/m/WtEmRqyhAgAJ
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D121179
We add to ensure that we are observing the correct callstack order in memprof
during symbolization. There was some confusion whether the order of
DIFrame objects were reversed but in reality the leaf function is at
index 0 so no code changes are required.
Differential Revision: https://reviews.llvm.org/D121759
This patch filters out callstack frames which can't be symbolized or if
the frames belong to the runtime. Symbolization may not be possible if
debug information is unavailable or if the addresses are from a shared
library. For now we only support optimization of the main binary which
is statically linked to the compiler runtime.
Differential Revision: https://reviews.llvm.org/D120860
This patch adds support for optional memory profile information to be
included with and indexed profile. The indexed profile header adds a new
field which points to the offset of the memory profile section (if
present) in the indexed profile. For users who do not utilize this
feature the only overhead is a 64-bit offset in the header.
The memory profile section contains (1) profile metadata describing the
information recorded for each entry (2) an on-disk hashtable containing
the profile records indexed via llvm::md5(function_name). We chose to
introduce a separate hash table instead of the existing one since the
indexing for the instrumented fdo hash table is based on a CFG hash
which itself is perturbed by memprof instrumentation.
This commit also includes the changes reviewed separately in D120093.
Differential Revision: https://reviews.llvm.org/D120103
This reverts commit 85355a560a.
This patch adds support for optional memory profile information to be
included with and indexed profile. The indexed profile header adds a new
field which points to the offset of the memory profile section (if
present) in the indexed profile. For users who do not utilize this
feature the only overhead is a 64-bit offset in the header.
The memory profile section contains (1) profile metadata describing the
information recorded for each entry (2) an on-disk hashtable containing
the profile records indexed via llvm::md5(function_name). We chose to
introduce a separate hash table instead of the existing one since the
indexing for the instrumented fdo hash table is based on a CFG hash
which itself is perturbed by memprof instrumentation.
Differential Revision: https://reviews.llvm.org/D118653
This reverts commit e6999040f5.
Update test to fix signed int comparison warning, fix whitespace in
compiler-rt MIBEntryDef.inc file.
Differential Revision: https://reviews.llvm.org/D117256
This reverts commit 0f73fb18ca.
Use llvm/Profile/MIBEntryDef.inc instead of relative path.
Generated the raw profile data with `-mllvm
-enable-name-compression=false` so that builbots where the reader is
built without zlib do not fail.
Also updated the test build instructions.
This patch adds support for optional memory profile information to be
included with and indexed profile. The indexed profile header adds a new
field which points to the offset of the memory profile section (if
present) in the indexed profile. For users who do not utilize this
feature the only overhead is a 64-bit offset in the header.
The memory profile section contains (1) profile metadata describing the
information recorded for each entry (2) an on-disk hashtable containing
the profile records indexed via llvm::md5(function_name). We chose to
introduce a separate hash table instead of the existing one since the
indexing for the instrumented fdo hash table is based on a CFG hash
which itself is perturbed by memprof instrumentation.
Differential Revision: https://reviews.llvm.org/D118653
Use the macro based format to add a wrapper around the MemInfoBlock
when stored in the MemProfRecord. This wrapped block can then be
serialized/deserialized based on a schema specified by a list of enums.
Differential Revision: https://reviews.llvm.org/D117256
When generating nested CS profile with all calling contexts of a function duplicated into a base profile under `--generate-merged-base-profiles`, do not recount callee samples when computing profile summary. This fixes the profile summary mismatch between flat cs profile and nested cs profile, for both extbinary and text format.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D119494
This change extends the RawMemProfReader to read all the sections of the
raw profile and symbolize the virtual addresses recorded as part of the
callstack for each allocation. For now the symbolization is used to
display the contents of the profile with llvm-profdata.
Differential Revision: https://reviews.llvm.org/D116784
Print out the profile summary in YAML format to make it easier to for
tools and tests to read in the contents of the raw profile.
Differential Revision: https://reviews.llvm.org/D116783
Use the llvm flag `-pgo-function-entry-coverage` to create single byte "counters" to track functions coverage. This mode has significantly less size overhead in both code and data because
* We mark a function as "covered" with a store instead of an increment which generally requires fewer assembly instructions
* We use a single byte per function rather than 8 bytes per block
The trade off of course is that this mode only tells you if a function has been covered. This is useful, for example, to detect dead code.
When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%).
[0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D116180
Use the `llvm-profdata show` command to verify debug info for profile correlation using the `--debug-info` option.
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D118181
Existing code tended to assume that counters had type `uint64_t` and
computed size from the number of counters. Fix this code to directly
compute the counters size in number of bytes where possible. When the
number of counters is needed, use `__llvm_profile_counter_entry_size()`
or `getCounterTypeSize()`. In a later diff these functions will depend
on the profile mode.
Change the meaning of `DataSize` and `CountersSize` to make them more clear.
* `DataSize` (`CountersSize`) - the size of the data (counter) section in bytes.
* `NumData` (`NumCounters`) - the number of data (counter) entries.
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D116179
CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that is either inlined or not inlined. This is a benefit over the nested profile format used by non-CS AutoFDO. A downside of this is the longer build time due to parsing the indexing the full CS contexts.
For a CS flat profile, though only the context profiles relevant to a module are loaded when that module is compiled, the cost to figure out what profiles are relevant is noticeably high when there're many contexts, since the sample reader will need to scan all context strings anyway. On the contrary, a nested function profile has its related inline subcontexts isolated from other unrelated contexts. Therefore when compiling a set of functions, unrelated contexts will never need to be scanned.
In this change we are exploring using nested profile format for CSSPGO. This is expected to work based on an assumption that with a preinliner-computed profile all contexts are precomputed and expected to be inlined by the compiler. Contexts not expected to be inlined will be cut off and returned to corresponding base profiles (for top-level outlined functions). This naturally forms a nested profile where all nested contexts are expected to be inlined. The compiler will less likely optimize on derived contexts that are not precomputed.
A CS-nested profile will look exactly the same with regular nested profile except that each nested profile can come with an attributes. With pseudo probes, a nested profile shown as below can also have a CFG checksum.
```
main:1968679:12
2: 24
3: 28 _Z5funcAi:18
3.1: 28 _Z5funcBi:30
3: _Z5funcAi:1467398
0: 10
1: 10 _Z8funcLeafi:11
3: 24
1: _Z8funcLeafi:1467299
0: 6
1: 6
3: 287884
4: 287864 _Z3fibi:315608
15: 23
!CFGChecksum: 138828622701
!Attributes: 2
!CFGChecksum: 281479271677951
!Attributes: 2
```
Specific work included in this change:
- A recursive profile converter to convert CS flat profile to nested profile.
- Extend function checksum and attribute metadata to be stored in nested way for text profile and extbinary profile.
- Unifiy sample loader inliner path for CS and preinlined nested profile.
- Changes in the sample loader to support probe-based nested profile.
I've seen promising results regarding build time. A nested profile can result in a 20% shorter build time than a CS flat profile while keep an on-par performance. This is with -duplicate-contexts-into-base=1.
Test Plan:
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D115205
The first 8b of each raw profile section need to be aligned to 8b since
the first item in each section is a u64 count of the number of items in
the section.
Summary of changes:
* Assert alignment when reading counts.
* Update test to check alignment, relax some size checks to allow padding.
* Update raw binary inputs for llvm-profdata tests.
Differential Revision: https://reviews.llvm.org/D114826
The memprof profile reader tests rely on binary data which is generated
from and meant to be interpreted on little endian architectures. Add a
REQUIRES: x86_64-linux clause to both tests to ensure they don't fail on big
endian targets such as ppc.
This commit adds initial support to llvm-profdata to read and print
summaries of raw memprof profiles.
Summary of changes:
* Refactor shared defs to MemProfData.inc
* Extend show_main to display memprof profile summaries.
* Add a simple raw memprof profile reader.
* Add a couple of tests to tools/llvm-profdata.
Differential Revision: https://reviews.llvm.org/D114286
This patch fixes some issues introduced in
https://reviews.llvm.org/D108942:
1) Remove the default label to fix the bots that use
-Werror,-Wcovered-switch-default
2) Modify the malformed test to fix the bots that are
built without zlib support
3) Modify some error messages in malformed profiles
If profile data is malformed for any kind of reason, we generate
an error that only reports "malformed instrumentation profile data"
without any further information. This patch extends InstrProfError
class to receive an optional error message argument, so that we can
do better error reporting.
Differential Revision: https://reviews.llvm.org/D108942
This is to account for the change that made CountersPtr in __profd_
relative which landed in a1532ed275.
That change hasn't updated the raw profile version, and while the
profile layout stayed the same, profiles generated by tip-of-tree
LLVM are incompatible with 13.x tooling.
Differential Revision: https://reviews.llvm.org/D111123
Some tests with binary IDs would fail with error: no profile can be merged.
This is because raw profiles could have unaligned headers when emitting binary
IDs. This means padding should be emitted after binary IDs are emitted to
ensure everything else is aligned. This patch adds padding after each binary ID
to ensure the next binary ID size is 8-byte aligned. This also adds extra
checks to ensure we aren't reading corrupted data when printing binary IDs.
Differential Revision: https://reviews.llvm.org/D110365
format.
Currently when we add a new section in the profile format and generate a profile
containing the new section, older compiler which reads the new profile will
issue an error. The forward incompatibility can cause unnecessary churn when
extending the profile. This patch removes the incompatibility when adding a new
section for extbinary format.
Differential Revision: https://reviews.llvm.org/D109398
On some platform (eg: AIX), diff will complain about newline.
diff: Missing newline at the end of file
.../llvm/test/tools/llvm-profdata/Inputs/cs-sample.proftext.
Currently context strings contain a lot of duplicated function names and that significantly increase the profile size. This change split the context into a series of {name, offset, discriminator} tuples so function names used in the context can be replaced by the index into the name table and that significantly reduce the size consumed by context.
A follow-up improvement made in the compiler and profiling tools is to avoid reconstructing full context strings which is time- and memory- consuming. Instead a context vector of `StringRef` is adopted to represent the full context in all scenarios. As a result, the previous prevalent profile map which was implemented as a `StringRef` is now engineered as an unordered map keyed by `SampleContext`. `SampleContext` is reshaped to using an `ArrayRef` to represent a full context for CS profile. For non-CS profile, it falls back to use `StringRef` to represent a contextless function name. Both the `ArrayRef` and `StringRef` objects are underpinned by real array and string objects that are stored in producer buffers. For compiler, they are maintained by the sample reader. For llvm-profgen, they are maintained in `ProfiledBinary` and `ProfileGenerator`. Full context strings can be generated only in those cases of debugging and printing.
When it comes to profile format, nothing has changed to the text format, though internally CS context is implemented as a vector. Extbinary format is only changed for CS profile, with an additional `SecCSNameTable` section which stores all full contexts logically in the form of `vector<int>`, which each element as an offset points to `SecNameTable`. All occurrences of contexts elsewhere are redirected to using the offset of `SecCSNameTable`.
Testing
This is no-diff change in terms of code quality and profile content (for text profile).
For our internal large service (aka ads), the profile generation is cut to half, with a 20x smaller string-based extbinary format generated.
The compile time of ads is dropped by 25%.
Differential Revision: https://reviews.llvm.org/D107299
We have "-profile-isfs" internal option for text, binary, and
compactbinary format (mostly for debug and test purpose). We
need to set the related flag in FunctionSamples so that ProfileIsFS
is written to the header in extbinary format.
Differential Revision: https://reviews.llvm.org/D108707