llvm-project

Commit Graph

Author	SHA1	Message	Date
Rong Xu	077baefc99	[llvm-profdata] Use flattening sample profile in profile supplementation We need to flatten the SampleFDO profile in profile supplementation because the InstrFDO profile does not have inlined callsite counters. Without flattening the profile, FDO optimizations are not stable: we will not supplement the second generation profile when the modified functions are all inlined. This patch fixes this issue: we will flatten the profile for functions that appear in the FDO profile. Note that we only need to find the hot/warm functions in the SampleFDO profile, so we will not perform a full flatten. We will use a DFS traversal to compute the accumulated entry count and max body count. This is much cheaper than full flattening. Differential Revision: https://reviews.llvm.org/D138893	2022-11-29 22:23:47 -08:00
Ellis Hoag	ea607d033a	[llvm-profdata] Rename show flag to --show-format In https://reviews.llvm.org/D135127 we created the show flag `--output-format` which was confusing because it behaved differently than the same flag in the merge command. So, rename the flag to `--show-format`. This also allows us to add the `text` option to mean "normal text output" rather than "text-encoded profiles" like it does for the merge command. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D135467	2022-10-07 11:35:07 -07:00
Ellis Hoag	901f555eca	[llvm-profdata] Add --output-format option Add `--output-format` option for the `llvm-profdata show` command to select the type of output. The existing `--text` flag is used to emit text encoded profiles. To avoid confusion, `--output-format=text-encoding` indicates that the output will be profiles encoded in the text format, and `--output-format=text` indicates the default text output that doesn't necessarily represent a profile. `--output-format=json` is an alias for `--json` and `--output-format=yaml` will be used in D134770. Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D135127	2022-10-07 09:47:23 -07:00
Rong Xu	d7ef0c3970	[llvm-profdata] Improve profile supplementation The current implementation promotes a non-cold function in the SampleFDO profile into a hot function in the FDO profile. This is too aggressive. This patch promotes a hot function in the SampleFDO profile into a hot function, and a warm function in SampleFDO into a warm function in FDO. Differential Revision: https://reviews.llvm.org/D132601	2022-08-29 16:50:42 -07:00
Rong Xu	db18f26567	[llvm-profdata] Handle internal linkage functions in profile supplementation This patch has the following changes: (1) Handling of internal linkage functions (static functions) Static functions in FDO have a prefix of source file name, while they do not have one in SampleFDO. The current implementation does not handle this and we are not updating the profile for static functions. This patch fixes this. (2) Handling of -funique-internal-linkage-names Again this is for the internal linkage functions. Option -funique-internal-linkage-names can now be applied to both FDO and SampleFDO compilation. When it is used, it demangles internal linkage function names and adds a hash value as the postfix. When both SampleFDO and FDO profiles use this option, or neither uses it, changes in (1) should handle this. Here we also handle the case where the SampleFDO profile uses this option while the FDO profile does not, or vice versa. There is one case where this patch won't work: if one of the profiles uses mangled names and the other does not. For example, if the SampleFDO profile uses the clang C compiler without -funique-internal-linkage-names, while the FDO profile uses -funique-internal-linkage-names. The SampleFDO profile contains unmangled names while the FDO profile contains mangled names. If both profiles use the C++ compiler, this won't happen. We think this use case is rare and does not justify the effort to fix. Differential Revision: https://reviews.llvm.org/D132600	2022-08-29 16:15:12 -07:00
Rong Xu	d22c5d0f55	[llvm-profdata] Adjust profile supplementation heuristics (1) We now use the count size in FDO as the main factor to deal with pre-inliner. Currently we use the number of sample records in the SampleFDO profile. But that only counts the top-level body sample records (not including the nested call-sites). We are seeing some big functions not being updated because of this. I think using the count size in the FDO profile is more reasonable to judge if the function is likely to be inlined to the callers in pre-inliner. (2) We use getMaxCount in SampleFDO rather than HeadSample to determine if the function is hot in SampleFDO. This is in sync with the logic in the compiler (also HeadSample can be 0). Differential Revision: https://reviews.llvm.org/D132602	2022-08-29 14:17:27 -07:00
Eli Friedman	8f826fe723	Fix reverse-iteration buildbot. A couple of instances of iterating over maps snuck in while the bot was down; fix them to use maps with deterministic iteration.	2022-08-19 14:21:05 -07:00
Kazu Hirata	a044d0491e	[llvm-profdata] Support JSON as an output-only format This patch teaches llvm-profdata to output the sample profile in the JSON format. The new option is intended to be used for research and development purposes. For example, one can write a Python script to take a JSON file and analyze how similar different inline instances of a given function are to each other. I've chosen JSON because Python can parse it reasonably fast, and it just takes a couple of lines to read the whole data: import json with open ('profile.json') as f: profile = json.load(f) Differential Revision: https://reviews.llvm.org/D130944	2022-08-09 16:24:53 -07:00
Snehasish Kumar	3a1a404ae2	[memprof] Return an error for unsupported symbolization. Add a check to detect that the profiled binary was built with position independent code. Add a test with a pie binary which can be reused later when support is added. Also clean up the error messages with trailing colons. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D128564	2022-06-27 09:43:26 -07:00
Fangrui Song	103b28902f	[llvm-profdata][test] Change -Wl,-no-pie to -no-pie after D127808 The driver option -no-pie is preferred: Clang selects different crt*.o files, though the PIC one usually can replace the non-PIC one.	2022-06-15 10:46:37 -07:00
Snehasish Kumar	b0c51f00ae	[memprof] Update the test comments to include -Wl,-no-pie Until we have symbolization for position independent code, let's update this documentation since clang now defaults to position independent code. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D127808	2022-06-15 16:49:19 +00:00
Snehasish Kumar	8a87f42fc6	[memprof] Print out the segment information in YAML format. This change prints out the segment information in the raw profile in YAML format for testing. Since we don't capture build ids yet, we print out <None> for now. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D126840	2022-06-02 02:26:39 +00:00
Snehasish Kumar	962db7de84	[memprof] Update summary output. Update the YAML format print out of the profile to include a summary instead of displaying the headers in the raw file buffer. This allows us to release the raw buffer early saving memory. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D126834	2022-06-02 02:15:42 +00:00
Snehasish Kumar	ec51971eae	[memprof] Keep and display symbol names in the RawMemProfReader. Extend the Frame struct to hold the symbol name if requested when a RawMemProfReader object is constructed. This change updates the tests and removes the need to pass --debug to obtain the mapping from GUID to symbol names. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D126344	2022-05-25 21:17:44 +00:00
Hongtao Yu	1662cfa4be	[CSSPGO][CSProfileConverter] Remove call target samples when including callee samples into caller. When a flat CS profile is converted to a nested profile, the call target samples for inlined callee contexts are left over in the callsite target map. This could cause indirect call promotion to function improperly. One issue is that the inlined callsites are treated with double amount of samples. The other is the inlined callsites are reconsidered for subsequent PGO ICP. I'm fixing this by excluding call targets from the callsite for inlined targets. While fixing this I found that callsite target sum and the number of body samples for that callsite could be mismatched. {D122609} has an explanation and a fix for that on llvm-profgen side. For now I'm tolerating it in this change. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D125266	2022-05-13 09:19:32 -07:00
Hongtao Yu	e36786d15f	[CSSPGO] Rename ProfileIsCSNested and ProfileIsCSFlat To be more clear and definitive, I'm renaming `ProfileIsCSFlat` back to `ProfileIsCS` which stands for full context-sensitive flat profiles. `ProfileIsCSNested` is now renamed to `ProfileIsPreInlined` and is extended to be applicable for CS flat profiles too. More specifically, `ProfileIsPreInlined` is for any kind of profiles (flat or nested) that contain 'ShouldBeInlined' contexts. The flag is encoded in the profile summary section for extbinary profiles and is computed on-the-fly for text profiles. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D122602	2022-04-29 17:03:52 -07:00
Wenlei He	0ca8ff4da1	[llvm-profdata] Unify default cutoffs for detailed summary printing Use `ProfileSummaryBuilder::DefaultCutoffs` for llvm-profdata detailed summary printing for Instr profile. Differential Revision: https://reviews.llvm.org/D122210	2022-03-23 14:38:53 -07:00
Snehasish Kumar	27a4f2545f	Reland "[memprof] Store callsite metadata with memprof records." This reverts commit `f4b794427e`. Reland with underlying msan issue fixed in D122260.	2022-03-22 14:40:02 -07:00
Mitch Phillips	f4b794427e	Revert "[memprof] Store callsite metadata with memprof records." This reverts commit `0d362c90d3`. Reason: Causes the MSan buildbot to fail (see comments on https://reviews.llvm.org/D121179 for more information).	2022-03-21 15:59:13 -07:00
Snehasish Kumar	0d362c90d3	[memprof] Store callsite metadata with memprof records. To ease profile annotation, each of the callsites in a function can be annotated with profile data - "IR metadata format for MemProf" [1]. This patch extends the on-disk serialized record format to store the debug information for allocation callsites incl inline frames. This change is incompatible with the existing format i.e. indexed profiles must be regenerated, raw profiles are unaffected. [1] https://groups.google.com/g/llvm-dev/c/aWHsdMxKAfE/m/WtEmRqyhAgAJ Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D121179	2022-03-21 13:58:29 -07:00
Snehasish Kumar	49c048add4	[memprof] Add a test to verify callstack order. We add a test to ensure that we are observing the correct callstack order in memprof during symbolization. There was some confusion about whether the order of DIFrame objects was reversed, but in reality the leaf function is at index 0 so no code changes are required. Differential Revision: https://reviews.llvm.org/D121759	2022-03-16 10:10:57 -07:00
Snehasish Kumar	11314f4059	[memprof] Filter out callstack frames which cannot be symbolized. This patch filters out callstack frames which can't be symbolized or if the frames belong to the runtime. Symbolization may not be possible if debug information is unavailable or if the addresses are from a shared library. For now we only support optimization of the main binary which is statically linked to the compiler runtime. Differential Revision: https://reviews.llvm.org/D120860	2022-03-04 11:10:08 -08:00
Snehasish Kumar	0a4184909a	Reland "[memprof] Extend the index prof format to include memory profiles." This patch adds support for optional memory profile information to be included with an indexed profile. The indexed profile header adds a new field which points to the offset of the memory profile section (if present) in the indexed profile. For users who do not utilize this feature the only overhead is a 64-bit offset in the header. The memory profile section contains (1) profile metadata describing the information recorded for each entry (2) an on-disk hashtable containing the profile records indexed via llvm::md5(function_name). We chose to introduce a separate hash table instead of the existing one since the indexing for the instrumented fdo hash table is based on a CFG hash which itself is perturbed by memprof instrumentation. This commit also includes the changes reviewed separately in D120093. Differential Revision: https://reviews.llvm.org/D120103	2022-02-17 22:09:52 -08:00
Snehasish Kumar	19bdf44d85	Revert "Reland "[memprof] Extend the index prof format to include memory profiles."" This reverts commit `807ba7aace`.	2022-02-17 15:51:04 -08:00
Snehasish Kumar	807ba7aace	Reland "[memprof] Extend the index prof format to include memory profiles." This reverts commit `85355a560a`. This patch adds support for optional memory profile information to be included with an indexed profile. The indexed profile header adds a new field which points to the offset of the memory profile section (if present) in the indexed profile. For users who do not utilize this feature the only overhead is a 64-bit offset in the header. The memory profile section contains (1) profile metadata describing the information recorded for each entry (2) an on-disk hashtable containing the profile records indexed via llvm::md5(function_name). We chose to introduce a separate hash table instead of the existing one since the indexing for the instrumented fdo hash table is based on a CFG hash which itself is perturbed by memprof instrumentation. Differential Revision: https://reviews.llvm.org/D118653	2022-02-17 13:14:17 -08:00
Snehasish Kumar	50713461d4	Reland "[memprof] Introduce a wrapper around MemInfoBlock." This reverts commit `e6999040f5`. Update test to fix signed int comparison warning, fix whitespace in compiler-rt MIBEntryDef.inc file. Differential Revision: https://reviews.llvm.org/D117256	2022-02-14 19:04:36 -08:00
Snehasish Kumar	e6999040f5	Revert "[memprof] Introduce a wrapper around MemInfoBlock." This reverts commit `9b67165285`. [3/4]	2022-02-14 11:42:58 -08:00
Snehasish Kumar	85355a560a	Revert "Reland "[memprof] Extend the index prof format to include memory profiles."" This reverts commit `de54e4ab78` [1/4]	2022-02-14 11:42:58 -08:00
Snehasish Kumar	de54e4ab78	Reland "[memprof] Extend the index prof format to include memory profiles." This reverts commit `0f73fb18ca`. Use llvm/Profile/MIBEntryDef.inc instead of relative path. Generated the raw profile data with `-mllvm -enable-name-compression=false` so that buildbots where the reader is built without zlib do not fail. Also updated the test build instructions.	2022-02-14 10:52:13 -08:00
Snehasish Kumar	0f73fb18ca	Revert "[memprof] Extend the index prof format to include memory profiles." This reverts commit `43c2348c5b`. Buildbots are failing with an error on reading memprof testdata. "Inputs/basic.profraw: profile uses zlib compression but the profile reader was built without zlib support" https://lab.llvm.org/buildbot/#/builders/16/builds/24490	2022-02-14 10:25:01 -08:00
Snehasish Kumar	43c2348c5b	[memprof] Extend the index prof format to include memory profiles. This patch adds support for optional memory profile information to be included with an indexed profile. The indexed profile header adds a new field which points to the offset of the memory profile section (if present) in the indexed profile. For users who do not utilize this feature the only overhead is a 64-bit offset in the header. The memory profile section contains (1) profile metadata describing the information recorded for each entry (2) an on-disk hashtable containing the profile records indexed via llvm::md5(function_name). We chose to introduce a separate hash table instead of the existing one since the indexing for the instrumented fdo hash table is based on a CFG hash which itself is perturbed by memprof instrumentation. Differential Revision: https://reviews.llvm.org/D118653	2022-02-14 09:53:45 -08:00
Snehasish Kumar	9b67165285	[memprof] Introduce a wrapper around MemInfoBlock. Use the macro based format to add a wrapper around the MemInfoBlock when stored in the MemProfRecord. This wrapped block can then be serialized/deserialized based on a schema specified by a list of enums. Differential Revision: https://reviews.llvm.org/D117256	2022-02-14 09:53:45 -08:00
Hongtao Yu	f0f70ae674	[CSSPGO] Do not recount callee samples when computing profile summary for nested CS profile. When generating nested CS profile with all calling contexts of a function duplicated into a base profile under `--generate-merged-base-profiles`, do not recount callee samples when computing profile summary. This fixes the profile summary mismatch between flat cs profile and nested cs profile, for both extbinary and text format. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D119494	2022-02-11 09:05:51 -08:00
Snehasish Kumar	216575e581	Revert "Revert "[ProfileData] Read and symbolize raw memprof profiles."" This reverts commit `dbf47d227d`. Reapply https://reviews.llvm.org/D116784 now that https://reviews.llvm.org/D118413 has landed with a couple of fixes: * fix raw profile reader unaligned access identified by ubsan * fix windows build by using MOCK_CONST_METHOD3 instead of MOCK_METHOD.	2022-02-08 13:37:27 -08:00
Snehasish Kumar	dbf47d227d	Revert "[ProfileData] Read and symbolize raw memprof profiles." This reverts commit `26f978d4c5`. This patch added a transitive dependency on libcurl via symbolize. See discussion https://reviews.llvm.org/D116784#inline-1137928 https://reviews.llvm.org/D113717#3295350	2022-02-03 16:14:05 -08:00
Snehasish Kumar	26f978d4c5	[ProfileData] Read and symbolize raw memprof profiles. This change extends the RawMemProfReader to read all the sections of the raw profile and symbolize the virtual addresses recorded as part of the callstack for each allocation. For now the symbolization is used to display the contents of the profile with llvm-profdata. Differential Revision: https://reviews.llvm.org/D116784	2022-02-03 14:33:50 -08:00
Snehasish Kumar	14f4f63af5	[memprof] Print out the summary in YAML format. Print out the profile summary in YAML format to make it easier for tools and tests to read in the contents of the raw profile. Differential Revision: https://reviews.llvm.org/D116783	2022-02-03 14:33:50 -08:00
Ellis Hoag	11d3074267	[InstrProf] Add single byte coverage mode Use the llvm flag `-pgo-function-entry-coverage` to create single byte "counters" to track function coverage. This mode has significantly less size overhead in both code and data because * We mark a function as "covered" with a store instead of an increment, which generally requires fewer assembly instructions * We use a single byte per function rather than 8 bytes per block The trade-off of course is that this mode only tells you if a function has been covered. This is useful, for example, to detect dead code. When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%). [0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4 Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D116180	2022-01-27 17:38:55 -08:00
Ellis Hoag	c9baa5608b	[InstrProf][Correlate] Verify debug info with llvm-profdata show Use the `llvm-profdata show` command to verify debug info for profile correlation using the `--debug-info` option. Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D118181	2022-01-27 10:11:04 -08:00
Hongtao Yu	ff0b634d97	[CSSPGO] Print "context-nested" instead of "preinlined" for ProfileSummarySection. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D117141	2022-01-18 18:10:42 -08:00
Ellis Hoag	f21473752b	[InstrProf][NFC] Do not assume size of counter type Existing code tended to assume that counters had type `uint64_t` and computed size from the number of counters. Fix this code to directly compute the counters size in number of bytes where possible. When the number of counters is needed, use `__llvm_profile_counter_entry_size()` or `getCounterTypeSize()`. In a later diff these functions will depend on the profile mode. Change the meaning of `DataSize` and `CountersSize` to make them more clear. * `DataSize` (`CountersSize`) - the size of the data (counter) section in bytes. * `NumData` (`NumCounters`) - the number of data (counter) entries. Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D116179	2022-01-14 11:29:11 -08:00
Hongtao Yu	5740bb801a	[CSSPGO] Use nested context-sensitive profile. CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that are either inlined or not inlined. This is a benefit over the nested profile format used by non-CS AutoFDO. A downside of this is the longer build time due to parsing and indexing the full CS contexts. For a CS flat profile, though only the context profiles relevant to a module are loaded when that module is compiled, the cost to figure out what profiles are relevant is noticeably high when there are many contexts, since the sample reader will need to scan all context strings anyway. On the contrary, a nested function profile has its related inline subcontexts isolated from other unrelated contexts. Therefore when compiling a set of functions, unrelated contexts will never need to be scanned. In this change we are exploring using nested profile format for CSSPGO. This is expected to work based on an assumption that with a preinliner-computed profile all contexts are precomputed and expected to be inlined by the compiler. Contexts not expected to be inlined will be cut off and returned to corresponding base profiles (for top-level outlined functions). This naturally forms a nested profile where all nested contexts are expected to be inlined. The compiler will be less likely to optimize on derived contexts that are not precomputed. A CS-nested profile will look exactly the same as a regular nested profile except that each nested profile can come with attributes. With pseudo probes, a nested profile shown as below can also have a CFG checksum.
```
main:1968679:12
 2: 24
 3: 28 _Z5funcAi:18
 3.1: 28 _Z5funcBi:30
 3: _Z5funcAi:1467398
  0: 10
  1: 10 _Z8funcLeafi:11
  3: 24
  1: _Z8funcLeafi:1467299
   0: 6
   1: 6
   3: 287884
   4: 287864 _Z3fibi:315608
   15: 23
   !CFGChecksum: 138828622701
   !Attributes: 2
  !CFGChecksum: 281479271677951
  !Attributes: 2
```
Specific work included in this change:
- A recursive profile converter to convert CS flat profile to nested profile.
- Extend function checksum and attribute metadata to be stored in nested way for text profile and extbinary profile.
- Unify sample loader inliner path for CS and preinlined nested profile.
- Changes in the sample loader to support probe-based nested profile.
I've seen promising results regarding build time. A nested profile can result in a 20% shorter build time than a CS flat profile while keeping on-par performance. This is with -duplicate-contexts-into-base=1. Test Plan: Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D115205	2021-12-14 14:40:25 -08:00
Snehasish Kumar	3a4d373ec2	[memprof] Align each rawprofile section to 8b. The first 8b of each raw profile section need to be aligned to 8b since the first item in each section is a u64 count of the number of items in the section. Summary of changes: * Assert alignment when reading counts. * Update test to check alignment, relax some size checks to allow padding. * Update raw binary inputs for llvm-profdata tests. Differential Revision: https://reviews.llvm.org/D114826	2021-11-30 20:12:43 -08:00
Snehasish Kumar	86d5dc9afc	[memprof] Disallow memprof profile reader tests on non-x86 archs. The memprof profile reader tests rely on binary data which is generated from and meant to be interpreted on little endian architectures. Add a REQUIRES: x86_64-linux clause to both tests to ensure they don't fail on big endian targets such as ppc.	2021-11-30 12:27:06 -08:00
Snehasish Kumar	7cca33b40f	[memprof] Extend llvm-profdata to display MemProf profile summaries. This commit adds initial support to llvm-profdata to read and print summaries of raw memprof profiles. Summary of changes: * Refactor shared defs to MemProfData.inc * Extend show_main to display memprof profile summaries. * Add a simple raw memprof profile reader. * Add a couple of tests to tools/llvm-profdata. Differential Revision: https://reviews.llvm.org/D114286	2021-11-30 10:45:26 -08:00
Gulfem Savrun Yeniceri	126e7611c7	[compiler-rt] Fix diagnostic in InstrProfError This patch fixes some issues introduced in https://reviews.llvm.org/D108942: 1) Remove the default label to fix the bots that use -Werror,-Wcovered-switch-default 2) Modify the malformed test to fix the bots that are built without zlib support 3) Modify some error messages in malformed profiles	2021-11-09 20:30:03 +00:00
Gulfem Savrun Yeniceri	ee88b8d63e	[compiler-rt] Add more diagnostic to InstrProfError If profile data is malformed for any kind of reason, we generate an error that only reports "malformed instrumentation profile data" without any further information. This patch extends InstrProfError class to receive an optional error message argument, so that we can do better error reporting. Differential Revision: https://reviews.llvm.org/D108942	2021-11-09 18:04:12 +00:00
Hongtao Yu	d0eb472f33	[llvm-profdata] Print out section flags for FunctionMetadata section As titled. Reviewed By: wenlei, wlei Differential Revision: https://reviews.llvm.org/D113064	2021-11-02 17:59:22 -07:00
Petr Hosek	24c615fa6b	[InstrProfData] Bump the raw profile version to 8 This is to account for the change that made CountersPtr in __profd_ relative which landed in `a1532ed275`. That change hasn't updated the raw profile version, and while the profile layout stayed the same, profiles generated by tip-of-tree LLVM are incompatible with 13.x tooling. Differential Revision: https://reviews.llvm.org/D111123	2021-10-05 09:57:56 -07:00
Leonard Chan	b9f547e8e5	[llvm][profile] Add padding after binary IDs Some tests with binary IDs would fail with error: no profile can be merged. This is because raw profiles could have unaligned headers when emitting binary IDs. This means padding should be emitted after binary IDs are emitted to ensure everything else is aligned. This patch adds padding after each binary ID to ensure the next binary ID size is 8-byte aligned. This also adds extra checks to ensure we aren't reading corrupted data when printing binary IDs. Differential Revision: https://reviews.llvm.org/D110365	2021-09-28 11:50:50 -07:00
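
The padding rule described in b9f547e8e5 (each binary ID is followed by enough padding that the next size field starts on an 8-byte boundary) can be illustrated with a small sketch. This is not the LLVM implementation: the function names `padding_needed`, `encode_binary_ids`, and `decode_binary_ids` are hypothetical, and the layout used here (an 8-byte little-endian length followed by the ID bytes and zero padding) is an assumption made only for illustration. The decoder also shows the kind of bounds checks the commit message alludes to when it mentions guarding against corrupted data.

```python
import struct
from typing import List

ALIGNMENT = 8  # bytes; the commit message states 8-byte alignment


def padding_needed(size: int, alignment: int = ALIGNMENT) -> int:
    """Return how many bytes bring `size` up to a multiple of `alignment`."""
    return (alignment - size % alignment) % alignment


def encode_binary_ids(ids: List[bytes]) -> bytes:
    """Serialize IDs as (u64 length, data, zero padding) records.

    The record layout is an assumption for this sketch, not the
    documented raw profile format.
    """
    out = bytearray()
    for binary_id in ids:
        out += struct.pack("<Q", len(binary_id))  # 8-byte length field
        out += binary_id
        out += b"\x00" * padding_needed(len(binary_id))
    return bytes(out)


def decode_binary_ids(buf: bytes) -> List[bytes]:
    """Parse records written by encode_binary_ids, rejecting truncated input."""
    ids = []
    offset = 0
    while offset < len(buf):
        if offset + 8 > len(buf):
            raise ValueError("truncated length field")
        (length,) = struct.unpack_from("<Q", buf, offset)
        offset += 8
        end = offset + length + padding_needed(length)
        if end > len(buf):
            raise ValueError("binary ID extends past end of buffer")
        ids.append(buf[offset:offset + length])
        offset = end  # skip the padding so the next length is aligned
    return ids
```

For example, a 20-byte ID needs 4 bytes of padding, so the record occupies 8 + 24 bytes and the next length field again starts on an 8-byte boundary.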


251 Commits