llvm-project

Commit Graph

Author	SHA1	Message	Date
Fangrui Song	89fab98e88	[DebugInfo] llvm::Optional => std::optional https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-05 00:09:22 +00:00
Kazu Hirata	286223edc6	[llvm-profgen] Use std::optional in ProfiledBinary.cpp (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-11-26 19:01:24 -08:00
Hongtao Yu	d5a963ab8b	[PseudoProbe] Replace relocation with offset for entry probe. Currently pseudo probe encoding for a function is like: - For the first probe, a relocation from it to its physical position in the code body - For subsequent probes, an incremental offset from the current probe to the previous probe The relocation could potentially cause relocation overflow during link time. I'm now replacing it with an offset from the first probe to the function start address. A source function could be lowered into multiple binary functions due to outlining (e.g, coro-split). Since those binary function have independent link-time layout, to really avoid relocations from .pseudo_probe sections to .text sections, the offset to replace with should really be the offset from the probe's enclosing binary function, rather than from the entry of the source function. This requires some changes to previous section-based emission scheme which now switches to be function-based. The assembly form of pseudo probe directive is also changed correspondingly, i.e, reflecting the binary function name. Most of the source functions end up with only one binary function. For those don't, a sentinel probe is emitted for each of the binary functions with a different name from the source. The sentinel probe indicates the binary function name to differentiate subsequent probes from the ones from a different binary function. For examples, given source function ``` Foo() { … Probe 1 … Probe 2 } ``` If it is transformed into two binary functions: ``` Foo: … Foo.outlined: … ``` The encoding for the two binary functions will be separate: ``` GUID of Foo Probe 1 GUID of Foo Sentinel probe of Foo.outlined Probe 2 ``` Then probe1 will be decoded against binary `Foo`'s address, and Probe 2 will be decoded against `Foo.outlined`. The sentinel probe of `Foo.outlined` makes sure there's not accidental relocation from `Foo.outlined`'s probes to `Foo`'s entry address. On the BOLT side, to be minimal intrusive, the pseudo probe re-encoding sticks with the old encoding format. This is fine since unlike linker, Bolt processes the pseudo probe section as a whole and it is free from relocation overflow issues. The change is downwards compatible as long as there's no mixed use of the old encoding and the new encoding. Reviewed By: wenlei, maksfb Differential Revision: https://reviews.llvm.org/D135912 Differential Revision: https://reviews.llvm.org/D135914 Differential Revision: https://reviews.llvm.org/D136394	2022-10-27 13:28:22 -07:00
wlei	91cc53d5a4	[llvm-profgen] Do not cache the frame location stack during computing inlined context size In `computeInlinedContextSizeForRange`, the offset of range is only used one time, there is no need to cache the frame location stack. Measured on one internal service binary, this can save 2GB memory usage and reduce a small run time (avoid one hash search). Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D128859	2022-10-25 21:08:36 -07:00
wlei	467652486f	[llvm-profgen] Fix inconsistent loading address issues This is to fix two issues related with loading address: 1) When multiple MMAPs occur and their loading address are different, before it only used the first MMap as base address, all perf address after it used the wrong base address. 2) For pseudo probe profile, the address is always based on preferred loading address. If the base address is not equal to the preferred loading address, the pseudo probe address query will be wrong. Solution: Instead of converting the address to offset lazily, right now all the address after parsing are converted on the fly based on preferred loading address in the parsing time. There is no "offset" used in profile generator any more. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D126827	2022-10-13 23:19:30 -07:00
Kazu Hirata	611ffcf4e4	[llvm] Use value instead of getValue (NFC)	2022-07-13 23:11:56 -07:00
wlei	7e86b13c63	[CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie This is the followup patch to https://reviews.llvm.org/D125246 for the `SampleContextTracker` part. Before the promotion and merging of the context is based on the SampleContext(the array of frame), this causes a lot of cost to the memory. This patch detaches the tracker from using the array ref instead to use the context trie itself. This can save a lot of memory usage and benefit both the compiler's CS inliner and llvm-profgen's pre-inliner. One structure needs to be specially treated is the `FuncToCtxtProfiles`, this is used to get all the functionSamples for one function to do the merging and promoting. Before it search each functions' context and traverse the trie to get the node of the context. Now we don't have the context inside the profile, instead we directly use an auxiliary map `ProfileToNodeMap` for profile , it initialize to create the FunctionSamples to TrieNode relations and keep updating it during promoting and merging the node. Moreover, I was expecting the results before and after remain the same, but I found that the order of FuncToCtxtProfiles matter and affect the results. This can happen on recursive context case, but the difference should be small. Now we don't have the context, so I just used a vector for the order, the result is still deterministic. Measured on one huge size(12GB) profile from one of our internal service. The profile similarity difference is 99.999%, and the running time is improved by 3X(debug mode) and the memory is reduced from 170GB to 90GB. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D127031	2022-06-27 23:22:21 -07:00
Kazu Hirata	a7938c74f1	[llvm] Don't use Optional::hasValue (NFC) This patch replaces Optional::hasValue with the implicit cast to bool in conditionals only.	2022-06-25 21:42:52 -07:00
Kazu Hirata	3b7c3a654c	Revert "Don't use Optional::hasValue (NFC)" This reverts commit `aa8feeefd3`.	2022-06-25 11:56:50 -07:00
Kazu Hirata	aa8feeefd3	Don't use Optional::hasValue (NFC)	2022-06-25 11:55:57 -07:00
Fangrui Song	d86a206f06	Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options	2022-06-05 00:31:44 -07:00
Fangrui Song	557efc9a8b	[llvm] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC Some cl::ZeroOrMore were added to avoid the `may only occur zero or one times!` error. More were added due to cargo cult. Since the error has been removed, cl::ZeroOrMore is unneeded. Also remove cl::init(false) while touching the lines.	2022-06-03 21:59:05 -07:00
Hongtao Yu	9f732af583	[llvm-profgen] Filter out oversized LBR ranges. As a follow up to {D123271}, LBR ranges that are too big should also be considered as invalid. For example, the last two pairs in the following trace form a range [0x0d7b02b0, 0x368ba706] that covers a ton of functions in the binary. Such oversized range should also be ignored. 0x0c74505f/0x368b99a0 0x368ba706/0x0c745040 0x0d7b1c3f/0x0d7b02b0 Add a defensive check to filter out those ranges based that the valid range should not cross the unconditional branch(Call, return, unconditional jmp). Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D125448	2022-05-12 10:58:50 -07:00
wlei	bfcb2c1119	[llvm-profgen] Decouple artificial branch from LBR parser and fix external address related issues This patch is fixing two issues for both CS and non-CS. 1) For external-call-internal, the head samples of the the internal function should be recorded. 2) avoid ignoring LBR after meeting the interrupt branch for CS profile LBR parser is shared between CS and non-CS, we found it's error-prone while dealing with artificial branch inside LBR parser. Since artificial branch is mainly used for CS profile unwinding, this patch tries to simplify LBR parser by decoupling artificial branch code from it, the concept of artificial branch is removed and split into two transitional branches(internal-to-external, external-to-internal). Then we leave all the processing of external branch to unwinder. Specifically for unwinder, remembering that we introduce external frame in https://reviews.llvm.org/D115550. We can just take external address as a regular address and reuse current unwind function(unwindCall, unwindReturn). For a normal case, the external frame will match an external LBR, and it will be filtered out by `unwindLinear` without losing any context. The data also shows that the interrupt or standalone LBR pattern(unpaired case) does exist, we choose to handle it by clearing the call stack and keeping unwinding. Here we leverage checking in `unwindLinear`, because a standalone LBR, no matter its type, since it doesn’t have other part to pair, it will eventually cause a wrong linear range, like [external, internal], [internal, external]. Then set the state to invalid there. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D118177	2022-04-28 16:07:28 -07:00
Hongtao Yu	3f97016857	[llvm-profgen] Decoding pseudo probe for profiled function only. Complete pseudo probes decoding can result in large memory usage. In practice only a small porting of the decoded probes are used in profile generation. I'm changing the full decoding mode to be decoding for profiled functions only, though we still do a full scan of the .pseudoprobe section due to a missing table-of-content but we don't have to build the in-memory data structure for functions not sampled. To build the in-memory data structure for profiled functions only, I'm rewriting the previous non-recursive probe decoding logic to be recursive. This is easy to read and maintain. I also have to change the previous representation of unsymbolized context from probe-based stack to address-based stack since the profiled functions are unknown yet by the time of virtual unwinding. The address-based stack will be converted to probe-based stack after virtual unwinding and on-demand probe decoding. I'm seeing 20GB memory is saved for one of our internal large service. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D121643	2022-03-23 14:15:11 -07:00
serge-sans-paille	db29f4374d	Cleanup include: DebugInfo/Symbolize Estimation of the impact on preprocessor output after: 1067349756 before:1067487786 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120433	2022-02-24 13:25:11 +01:00
wlei	b3a778fb5e	[llvm-profgen] Support symbol loading for debug fission Support to load debug info from dwarf split file, like .dwo, .dwp files. Leverage the `getNonSkeletonUnitDIE(false)` API to achieve this. Add test cause to make sure all the ranges is well retrieved by the loader. Reviewed By: ayermolo, hoy, wenlei Differential Revision: https://reviews.llvm.org/D115973	2022-02-23 09:40:46 -08:00
Hongtao Yu	34e131b0f2	[llvm-profgen] On-demand track optimized-away inlinees for preinliner. Tracking optimized-away inlinees based on all probes in a binary is expansive in terms of memory usage I'm making the tracking on-demand based on profiled functions only. This saves about 10% memory overall for a medium-sized benchmark. Before: note: After parsePerfTraces note: Thu Jan 27 18:42:09 2022 note: VM: 8.68 GB RSS: 8.39 GB note: After computeSizeForProfiledFunctions note: Thu Jan 27 18:42:41 2022 note: VM: 10.63 GB RSS: 10.20 GB note: After generateProbeBasedProfile note: Thu Jan 27 18:45:49 2022 note: VM: 25.00 GB RSS: 24.95 GB note: After postProcessProfiles note: Thu Jan 27 18:49:29 2022 note: VM: 26.34 GB RSS: 26.27 GB After: note: After parsePerfTraces note: Fri Jan 28 12:04:49 2022 note: VM: 8.68 GB RSS: 7.65 GB note: After computeSizeForProfiledFunctions note: Fri Jan 28 12:05:26 2022 note: VM: 8.68 GB RSS: 8.42 GB note: After generateProbeBasedProfile note: Fri Jan 28 12:08:03 2022 note: VM: 22.93 GB RSS: 22.89 GB note: After postProcessProfiles note: Fri Jan 28 12:11:30 2022 note: VM: 24.27 GB RSS: 24.22 GB This should be a no-diff change in terms of profile quality. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D118515	2022-02-08 08:33:23 -08:00
Simon Pilgrim	c56a85fde0	[llvm-profgen] Use cast<> instead of dyn_cast<> to avoid dereference of nullptr The pointers are dereferenced immediately, so assert the cast is correct instead of returning nullptr	2022-02-02 14:12:10 +00:00
wlei	6693c562f9	[llvm-profgen] Support to load debug info from a second binary For reducing binary size purpose, the binary's debug info and executable segment can be separated(like using objcopy --only-keep-debug). Here add support in llvm-profgen to use two binaries as input. The original one is executable binary and added for debug info only binary. Adding a flag `--debug-binary=file-path`, with this, the binary will load debug info from debug binary. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D115948	2022-01-24 17:14:05 -08:00
Simon Pilgrim	f4aa2a42ed	[llvm-profgen] ProfiledBinary::load - use cast<> instead of dyn_cast<> to avoid dereference of nullptr The pointer is always dereferenced immediately, so assert the cast is correct instead of returning nullptr	2022-01-14 15:51:21 +00:00
wlei	3220571793	[llvm-profgen] Skip disassembling for PLT section Skip disassembling .plt section, then .plt section code will be treated as external code. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D115699	2021-12-14 16:40:53 -08:00
wlei	484a569eea	[llvm-profgen] Fix total samples related issues Since total sample and body sample are used to compute hotness threshold in compiler, we found in some services changing the total samples computation will cause noticeable regression. Hence, here we will revert the changes and just keep all total samples number identical to the old tool. Three changes in this diff: 1. Revert previous diff(https://reviews.llvm.org/D112672: [llvm-profgen] Update total samples by accumulating all its body samples) and put it under a switch. 2. Keep the negative line number. Although compiler doesn't consume the count but it will be used to compute hot threshold. 3. Change to accumulate total samples per byte instead of per instruction. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D115013	2021-12-08 12:33:41 -08:00
wlei	41a681ce09	[FS-AFDO][llvm-profgen] Generate profile with FS-AFDO discriminator In order to support generating profile with FS discriminator, three kind of changes are done in llvm-profgen: 1) Dissassemble .rodata section to check if FS discriminator var ('"__llvm_fs_discriminator__"') exists and set the corresponding flag in the binary. 2) Change the discriminator decoding in `getBaseDiscriminator` and `getDuplicationFactor`. 3) set true for `FunctionSamples::ProfileIsFS` to enable FS functionality in ProfileData. Reviewed By: xur, hoy, wenlei Differential Revision: https://reviews.llvm.org/D113296	2021-11-30 15:57:59 -08:00
Wenlei He	f7976edc1e	[llvm-profgen] Add switch to allow use of first loadable segment for calculating offset Adding `-use-loadable-segment-as-base` to allow use of first loadable segment for calculating offset. By default first executable segment is used for calculating offset. The switch helps compatibility with unsymbolized profile generated from older tools. Differential Revision: https://reviews.llvm.org/D113727	2021-11-15 19:00:27 -08:00
wlei	aab1810006	[llvm-profgen] Fix bug of setting function entry Previously we set `isFuncEntry` flag to true when the funcName from DWARF is equal to the name in symbol table and we use this flag to ignore reporting callsite sample that's from an intra func branch. However, in HHVM, it appears that the symbol table name is inconsistent with the dwarf info func name, it's likely due to `OptimizeGlobalAliases`. This change is a workaround in llvm-profgen side to mark the only one range as the function entry and add warnings for the remaining inconsistence. This also fixed a missing `getCanonicalFnName` for symbol name which caused the mismatching as well. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D113492	2021-11-12 12:18:43 -08:00
wlei	5bf191a381	[llvm-profgen] Fix index out of bounds error while using ip.advance Previously we assume there're some non-executing sections at the bottom of the text section so that we won't hit the array's bound. But on BOLTed binary, it turned out .bolt section is at the bottom of text section which can be profiled, then it crash llvm-profgen. This change try to fix it. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D113238	2021-11-05 18:38:40 -07:00
wlei	138202a8c3	[llvm-profgen] Warn on invalid range and show warning summary Two things in this diff: 1) Warn on the invalid range, currently three types of checking, see the detailed message in the code. 2) In some situation, llvm-profgen gives lots of warnings on the truncated stacks which is noisy. This change provides a switch to `--show-detailed-warning` to skip the warnings. Alternatively, we use a summary for those warning and show the percentage of cases with those issues. Example of warning summary. ``` warning: 0.05%(1120/2428958) cases with issue: Profile context truncated due to missing probe for call instruction. warning: 0.00%(2/178637) cases with issue: Range does not belong to any functions, likely from external function. ``` Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D111902	2021-11-02 19:55:55 -07:00
wlei	f5537643b8	[llvm-profgen] Update total samples by accumulating all its body samples Like probe-based profile, the total samples is the sum of all its body samples. This patch fix it by a post-processing update for the line-number based profile. Tested it on our internal services, results showed no performance change. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D112672	2021-10-29 10:36:57 -07:00
Kazu Hirata	3b285ff517	[llvm-profgen] Fix a set-but-unused warning This patch fixes: llvm/tools/llvm-profgen/ProfiledBinary.cpp:357:12: error: variable 'EndOffset' set but not used [-Werror,-Wunused-but-set-variable] The last use of the variable was removed on Oct 26 in commit `40ca411251`.	2021-10-29 10:19:44 -07:00
wlei	2f8196db92	[llvm-profgen] Fix bug of populating profile symbol list Previous implementation of populating profile symbol list is wrong, it only included the profiled symbols. Actually it should use all symbols, here this switches to use the symbols from debug info. Also turned the flag off by default. Reviewed By: wenlei, hoy Differential Revision: https://reviews.llvm.org/D111824	2021-10-29 09:59:12 -07:00
wlei	40ca411251	[llvm-profgen] Switch to DWARF-based symbol and ranges It happened a bug that some callsite name in the profile is not a real function, it turned out that there're some non-function symbol from the ELF text section, e.g. the global accessible branch label and also recalled that we can have one function being split into multiple ranges. We shouldn't count samples for those are not the entry of the real function. So this change tried to fix this issue by switching to use the name or ranges from DWARF-based debug info, the range of which assure it's the real function start. For the split functions, we assume that the real entry function's DWARF name should always match the symbol table name. The switching is also consistent with the body samples' symbol which is from DWARF. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D112282	2021-10-29 09:59:12 -07:00
Wenlei He	a316343e19	[llvm-profgen] Allow generating AutoFDO profile from CSSPGO binary Add `-use-dwarf-correlation` switch to allow llvm-profgen to generate AutoFDO profile for binaries built with CSSPGO (pseudo-probe). Differential Revision: https://reviews.llvm.org/D111776	2021-10-14 09:11:56 -07:00
Reid Kleckner	89b57061f7	Move TargetRegistry.(h\|cpp) from Support to MC This moves the registry higher in the LLVM library dependency stack. Every client of the target registry needs to link against MC anyway to actually use the target, so we might as well move this out of Support. This allows us to ensure that Support doesn't have includes from MC/*. Differential Revision: https://reviews.llvm.org/D111454	2021-10-08 14:51:48 -07:00
wlei	31a5cb3292	[llvm-profgen] Filter out invalid debug line Differential Revision: https://reviews.llvm.org/D110081	2021-10-04 19:09:06 -07:00
wlei	46cf7d75d9	[llvm-profgen] Add duplication factor for line-number based profile This change adds duplication factor multiplier while accumulating body samples for line-number based profile. The body sample count will be `duplication-factor * count`. Base discriminator and duplication factor is decoded from the raw discriminator, this requires some refactor works. Differential Revision: https://reviews.llvm.org/D109934	2021-10-04 19:08:55 -07:00
wlei	fb29d812e4	[CSSPGO] Rename the field of SampleContextFrame Differential Revision: https://reviews.llvm.org/D110980	2021-10-04 19:06:59 -07:00
Wenlei He	47d66355ef	[llvm-profgen] Fix alignment in preferred based calculation We used the segment alignment in elf header to assume the loader alignment. However this is incorrect because loader alignment is always the same as page size. If segment needs to be aligned at load time, linker will set aligned address as virtual address in elf header. Differential Revision: https://reviews.llvm.org/D110795	2021-09-29 23:01:10 -07:00
wlei	ce40843a3f	[llvm-profgen][CSSPGO] On-demand function size computation for preinliner Similar to https://reviews.llvm.org/D110465, we can compute function size on-demand for the functions that's hit by samples. Here we leverage the raw range samples' address to compute a set of sample hit function. Then `BinarySizeContextTracker` just works on those function range for the size. Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D110466	2021-09-28 09:09:38 -07:00
wlei	091c16f76b	[llvm-profgen] On-demand symbolization Previously we do symbolization for all the functions and actually we only need the symbols that's hit by the samples. This can significantly speed up the time for large size binary. Optimization for per-inliner will come along with next patch. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D110465	2021-09-28 09:09:25 -07:00
wlei	28277e9b48	[AutoFDO][llvm-profgen] Report zero count for unexecuted part of function code In order to be consistent with compiler that interprets zero count as unexecuted(cold), this change reports zero-value count for unexecuted part of function code. For the implementation, it leverages the range counter, initializes all the executed function range with the zero-value. After all ranges are merged and converted into disjoint ranges, the remaining zero count will indicates the unexecuted(cold) part of the function. This change also extends the current `findDisjointRanges` method which now can support adding zero-value range. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D109713	2021-09-24 14:15:05 -07:00
wlei	c2be2d3284	[llvm-profgen] Fix a bug of assertion The assertion should work on the entire context. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D110268	2021-09-22 18:33:27 -07:00
Hongtao Yu	734f4d832c	[llvm-profgen] An option to dump disasm of specified symbols For large app, dumping disasm of the whole program can be slow and result in gianant output. Adding a switch to dump specific symbols only. Reviewed By: wlei Differential Revision: https://reviews.llvm.org/D110079	2021-09-22 10:32:59 -07:00
wlei	964053d56f	[llvm-profgen] Support LBR only perf script This change aims at supporting LBR only sample perf script which is used for regular(Non-CS) profile generation. A LBR perf script includes a batch of LBR sample which starts with a frame pointer and a group of 32 LBR entries is followed. The FROM/TO LBR pair and the range between two consecutive entries (the former entry's TO and the latter entry's FROM) will be used to infer function profile info. An example of LBR perf script(created by `perf script -F ip,brstack -i perf.data`) ``` 40062f 0x40062f/0x4005b0/P/-/-/9 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 ... 4005d7 0x4005d7/0x4005e5/P/-/-/8 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 ... ... ``` For implementation: - Extended a new child class `LBRPerfReader` for the sample parsing, reused all the functionalities in `extractLBRStack` except for an extension to parsing leading instruction pointer. - `HybridSample` is reused(just leave the call stack empty) and the parsed samples is still aggregated in `AggregatedSamples`. After that, range samples, branch sample, address samples are computed and recorded. - Reused `ContextSampleCounterMap` to store the raw profile, since it's no need to aggregation by context, here it just registered one sample counter with a fake context key. - Unified to use `show-raw-profile` instead of `show-unwinder-output` to dump the intermediate raw profile, see the comments of the format of the raw profile. For CS profile, it remains to output the unwinder output. Profile generation part will come soon. Differential Revision: https://reviews.llvm.org/D108153	2021-08-31 13:28:17 -07:00
Hongtao Yu	b9db70369b	[CSSPGO] Split context string to deduplicate function name used in the context. Currently context strings contain a lot of duplicated function names and that significantly increase the profile size. This change split the context into a series of {name, offset, discriminator} tuples so function names used in the context can be replaced by the index into the name table and that significantly reduce the size consumed by context. A follow-up improvement made in the compiler and profiling tools is to avoid reconstructing full context strings which is time- and memory- consuming. Instead a context vector of `StringRef` is adopted to represent the full context in all scenarios. As a result, the previous prevalent profile map which was implemented as a `StringRef` is now engineered as an unordered map keyed by `SampleContext`. `SampleContext` is reshaped to using an `ArrayRef` to represent a full context for CS profile. For non-CS profile, it falls back to use `StringRef` to represent a contextless function name. Both the `ArrayRef` and `StringRef` objects are underpinned by real array and string objects that are stored in producer buffers. For compiler, they are maintained by the sample reader. For llvm-profgen, they are maintained in `ProfiledBinary` and `ProfileGenerator`. Full context strings can be generated only in those cases of debugging and printing. When it comes to profile format, nothing has changed to the text format, though internally CS context is implemented as a vector. Extbinary format is only changed for CS profile, with an additional `SecCSNameTable` section which stores all full contexts logically in the form of `vector<int>`, which each element as an offset points to `SecNameTable`. All occurrences of contexts elsewhere are redirected to using the offset of `SecCSNameTable`. Testing This is no-diff change in terms of code quality and profile content (for text profile). For our internal large service (aka ads), the profile generation is cut to half, with a 20x smaller string-based extbinary format generated. The compile time of ads is dropped by 25%. Differential Revision: https://reviews.llvm.org/D107299	2021-08-30 20:09:29 -07:00
Wenlei He	a6f15e9a49	[CSSPGO] Use probe inline tree to track zero size fully optimized context for pre-inliner This is a follow up diff for BinarySizeContextTracker to track zero size for fully optimized inlinee. When an inlinee is fully optimized away, we won't be able to get its size through symbolizing instructions, hence we will treat the corresponding context size as unknown. However by traversing the inlined probe forest, we know what're original inlinees regardless of optimization. If a context show up in inlined probes, but not during symbolization, we know that it's fully optimized away hence its size is zero instead of unknown. It should provide more accurate size cost estimation for pre-inliner to make better inline decisions in llvm-profgen. Differential Revision: https://reviews.llvm.org/D108350	2021-08-25 09:01:11 -07:00
Wenlei He	eca03d2768	[CSSPGO] Track and use context-sensitive post-optimization function size to drive global pre-inliner in llvm-profgen This change enables llvm-profgen to use accurate context-sensitive post-optimization function byte size as a cost proxy to drive global preinline decisions. To do this, BinarySizeContextTracker is introduced to track function byte size under different inline context during disassembling. In preinliner, we can not query context byte size under switch `context-cost-for-preinliner`. The tracker uses a reverse trie to keep size of functions under different context (callee as parent, caller as child), and it can give best/longest possible matching context size for given input context. The new size cost is off by default. There're a few TODOs that needs to addressed: 1) avoid dangling string from `Offset2LocStackMap`, which will be addressed in split context work; 2) using inlinee's entry probe to make sure we have correct zero size for inlinee that's completely optimized away after inlining. Some tuning is also needed. Differential Revision: https://reviews.llvm.org/D108180	2021-08-18 22:50:57 -07:00
wlei	856a6a5041	[CSSPGO][llvm-profgen] Trim and merge context beforehand to reduce memory usage Currently we use a centralized string map(StringMap<FunctionSamples> ProfileMap) to store the profile while populating the sample, which might cause the memory usage bottleneck. I saw in an extreme case, there are thousands of samples whose context stack depth is >= 100. The memory consumption can be greater than 100GB. As here the context is used for inlining, we can assume we won't have so many of inlinees keeping inlined at the same root function, so this change tried to cap the context stack and merge the samples for peak memory reduction and this is done after recursion compression. The default value is -1 meaning no depth limit, in the future we can tune to a smaller one. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D107800	2021-08-11 16:02:35 -07:00
jamesluox	ee7d20e846	[CSSPGO] Migrate and refactor the decoder of Pseudo Probe Migrate pseudo probe decoding logic in llvm-profgen to MC, so other LLVM-base program could reuse existing codes. Redesign object layout of encoded and decoded pseudo probes. Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D106861	2021-08-04 09:21:34 -07:00
Hongtao Yu	0712038458	[CSSPGO][llvm-profgen] Allow multiple executable load segments. The linker or post-link optimizer can create an ELF image with multiple executable segments each of which will be loaded separately at run time. This breaks the assumption of llvm-profgen that currently only supports one base load address. What it ends up with is that the subsequent mmap events will be treated as an overwrite of the first mmap event which will in turn screw up address mapping. While it is non-trivial to support multiple separate load addresses and given that on x64 those segments will always be loaded at consecutive addresses (though via separate mmap sys calls), I'm adding an error checking logic to bail out if that's violated and keep using a single load address which is the address of the first executable segment. Also changing the disassembly output from printing section offset to printing the virtual address instead, which matches the behavior of objdump. Differential Revision: https://reviews.llvm.org/D103178	2021-07-13 18:22:24 -07:00

1 2

73 Commits