Metadata operands tend to require special conditions, especially on dbg
intrinsics. We also don't have a zero value for metadata.
Replacing callee operands is a little weird, since calling undef/null
doesn't make sense. It also causes tons of invalid reductions when
reducing calls to intrinsics since only arguments to intrinsics can be
of the metadata type.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D113532
Add a new "operands-skip" pass whose goal is to remove instructions in the middle of dependency chains. For instance:
```
%baseptr = alloca i32
%arrayidx = getelementptr i32, i32* %baseptr, i32 %idxprom
store i32 42, i32* %arrayidx
```
might be reducible to
```
%baseptr = alloca i32
%arrayidx = getelementptr ... ; now dead, together with the computation of %idxprom
store i32 42, i32* %baseptr
```
Other passes would either replace `%baseptr` with undef (operands, instructions) or move it to become a function argument (operands-to-args), both of which might fail the interestingness check.
In principle the implementation allows operand replacement with any value or instruction in the function that passes the filter constraints (same type, dominance, "more reduced"), but is limited in this patch to values that are directly or indirectly used to compute the current operand value, motivated by the example above. Additionally, function arguments are added to the candidate set which helps reducing the number of relevant arguments mitigating a concern of too many arguments mentioned in https://reviews.llvm.org/D110274#3025013.
Possible future extensions:
* Instead of requiring the same type, bitcast/trunc/zext could be automatically inserted for some more flexibility.
* If undef is added to the candidate set, "operands-skip"is able to produce any reduction that "operands" can do. Additional candidates might be zero and one, where the "reductive power" classification can prefer one over the other. If undefined behaviour should not be introduced, undef can be removed from the candidate set.
Recommit after resolving conflict with D112651 and reusing
shouldReduceOperand from D113532.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D111818
Add a new "operands-skip" pass whose goal is to remove instructions in the middle of dependency chains. For instance:
```
%baseptr = alloca i32
%arrayidx = getelementptr i32, i32* %baseptr, i32 %idxprom
store i32 42, i32* %arrayidx
```
might be reducible to
```
%baseptr = alloca i32
%arrayidx = getelementptr ... ; now dead, together with the computation of %idxprom
store i32 42, i32* %baseptr
```
Other passes would either replace `%baseptr` with undef (operands, instructions) or move it to become a function argument (operands-to-args), both of which might fail the interestingness check.
In principle the implementation allows operand replacement with any value or instruction in the function that passes the filter constraints (same type, dominance, "more reduced"), but is limited in this patch to values that are directly or indirectly used to compute the current operand value, motivated by the example above. Additionally, function arguments are added to the candidate set which helps reducing the number of relevant arguments mitigating a concern of too many arguments mentioned in https://reviews.llvm.org/D110274#3025013.
Possible future extensions:
* Instead of requiring the same type, bitcast/trunc/zext could be automatically inserted for some more flexibility.
* If undef is added to the candidate set, "operands-skip"is able to produce any reduction that "operands" can do. Additional candidates might be zero and one, where the "reductive power" classification can prefer one over the other. If undefined behaviour should not be introduced, undef can be removed from the candidate set.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D111818
When reducing functions with very large basic blocks (~ almost 1 million
BBs), the majority of time is spent maintaining the order in the std::set
for the basic blocks to keep.
In those cases, DenseSet<> is much more efficient. Use it instead.
This patch adds a fuzzing helper tool for D demangler by feeding the demangler API with
pseudo-random null terminated strings with the help of libfuzzer heuristics.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D111432
Previously, if the basic-blocks delta pass tried to remove a basic block
that was the last basic block in a function that did not have external
or weak linkage, the resulting IR would become invalid. Since removing
the last basic block in a function is effectively identical to removing
the function body itself, we check explicitly for this case and if we
detect it, we run the same logic as in ReduceFunctionBodies.cpp
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D113486
Sometimes if llvm-reduce is interrupted in the middle of a delta pass on
a large file, it can take quite some time for the tool to start actually
doing new work if it is restarted again on the partially-reduced file. A
lot of time ends up being spent testing large chunks when these large
chunks are very unlikely to actually pass the interestingness test. In
cases like this, the tool will complete faster if the starting
granularity is reduced to a finer amount. Thus, we introduce a command
line flag that automatically divides the chunks into smaller subsets a
fixed, user-specified number of times prior to beginning the core loop.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D112651
A new tool that compares TargetLibraryInfo's opinion of the availability
of library function calls against the functions actually exported by a
specified set of libraries. Can be helpful in verifying the correctness
of TLI for a given target, and avoid mishaps such as had to be addressed
in D107509 and 94b4598d.
The tool currently supports ELF object files only, although it's unlikely
to be hard to add support for other formats.
Re-commits 62dd488 with changes to use pre-generated objects, as not all
bots have ld.lld available.
Differential Revision: https://reviews.llvm.org/D111358
A new tool that compares TargetLibraryInfo's opinion of the availability
of library function calls against the functions actually exported by a
specified set of libraries. Can be helpful in verifying the correctness
of TLI for a given target, and avoid mishaps such as had to be addressed
in D107509 and 94b4598d.
The tool currently supports ELF object files only, although it's unlikely
to be hard to add support for other formats.
Differential Revision: https://reviews.llvm.org/D111358
I am planning to upstream MachOObjectFile code to support Darwin
chained fixups. In order to test the new parser features we need a way
to produce correct (and incorrect) chained fixups. Right now the only
tool that can produce them is the Darwin linker. To avoid having to
check in binary files, this patch allows obj2yaml to print a hexdump
of the raw LINKEDIT and DATA segment, which both allows to
bootstrap the parser and enables us to easily create malformed inputs
to test error handling in the parser.
This patch adds two new options to obj2yaml:
-raw-data-segment
-raw-linkedit-segment
Differential Revision: https://reviews.llvm.org/D113234
The existing code has unnecessary logic to indirectly pass
errors through function calls. This diff gets rid of the fluff.
Test Plan: Existing unit tests
Reviewed By: jhenderson, drodriguez, alexander-shaposhnikov
Differential Revision: https://reviews.llvm.org/D113301
Replace the description and file names for this argument. As far as I understand
this is a positional argument and I don't believe this changes breaks any existing
interfaces.
Reviewed By: hctim, MaskRay
Differential Revision: https://reviews.llvm.org/D113316
Previously we assume there're some non-executing sections at the bottom of the text section so that we won't hit the array's bound. But on BOLTed binary, it turned out .bolt section is at the bottom of text section which can be profiled, then it crash llvm-profgen. This change try to fix it.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D113238
[NFC] This patch fixes URLs containing "master". Old URLs were either broken or
redirecting to the new URL.
Reviewed By: #libc, ldionne, mehdi_amini
Differential Revision: https://reviews.llvm.org/D113186
We almost always want to use the default AA pipeline. It's very easy for
users of PassBuilder to forget to customize the AAManager to use the
default AA pipeline (for example, the NewPM C API forgets to do this).
If somebody wants a custom AA pipeline, similar to what is being done
now with the default AA pipeline registration, they can
FAM.registerPass([&] { return std::move(MyAA); });
before calling
PB.registerFunctionAnalyses(FAM);
For example, LTOBackend.cpp and NewPMDriver.cpp do this.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D113210
If a tool wants to introduce new indirections via stubs at link-time in
ORC, it can cause fidelity issues around the address of the function if
some references to the function do not have relocations. This is known
to happen inside the body of the function itself on x86_64 for example,
where a PC-relative address is formed, but without a relocation.
```
_foo:
leaq -7(%rip), %rax ## form pointer to '_foo' without relocation
_bar:
leaq (%rip), %rax ## uses X86_64_RELOC_SIGNED to '_foo'
```
The consequence of introducing a stub for such a function at link time
is that if it forms a pointer to itself without relocation, it will not
have the same value as a pointer from outside the function. If the
function pointer is used as a key, this can cause problems.
This utility provides best-effort support for adding such missing
relocations using MCDisassembler and MCInstrAnalysis to identify the
problematic instructions. Currently it is only implemented for x86_64.
Note: the related issue with call/jump instructions is not handled
here, only forming function pointers.
rdar://83514317
Differential revision: https://reviews.llvm.org/D113038
This diff makes several amendments to the local file caching mechanism
which was migrated from ThinLTO to Support in
rGe678c51177102845c93529d457b020f969125373 in response to follow-up
discussion on that commit.
Patch By: noajshu
Differential Revision: https://reviews.llvm.org/D113080
The only binary-format-related field in the BBAddrMap structure is the function address (`Addr`), which will use uint64_t in 64B format and uint32_t in 32B format. This patch changes it to use uint64_t in both formats.
This allows non-templated use of the struct, at the expense of a marginal additional size overhead for the 32-bit format. The size of the BB address map section does not change.
Differential Revision: https://reviews.llvm.org/D112679
As seen in https://bugs.llvm.org/show_bug.cgi?id=52213 llvm-objdump
asserts if either the --debug-vars or the --dwarf options are provided
with invalid values. As suggested, this fix adds use of a default value
to these options and errors when given bad input.
Differential Revision: https://reviews.llvm.org/D112183
By default `llvm::seq` would happily iterate over enums, which may be unsafe if the enum values are not continuous. This patch disable enum iteration with `llvm::seq` and `llvm::seq_inclusive` and adds two new functions: `enum_seq` and `enum_seq_inclusive`.
To make sure enum iteration is safe, we require users to declare their enum types as iterable by specializing `enum_iteration_traits<SomeEnum>`. Because it's not always possible to add these traits next to enum definition (e.g., for enums defined in external libraries), we provide an escape hatch to allow iteration on per-callsite basis by passing `force_iteration_on_noniterable_enum`.
The main benefit of this approach is that these global declarations via traits can appear just next to enum definitions, making easy to spot when enums are miss-labeled, e.g., after introducing new enum values, whereas `force_iteration_on_noniterable_enum` should stand out and be easy to grep for.
This emerged from a discussion with gchatelet@ about reusing llvm's `Sequence.h` in lieu of https://github.com/GPUOpen-Drivers/llpc/blob/dev/lgc/interface/lgc/EnumIterator.h.
Reviewed By: dblaikie, gchatelet, aaron.ballman
Differential Revision: https://reviews.llvm.org/D107378
Two things in this diff:
1) Warn on the invalid range, currently three types of checking, see the detailed message in the code.
2) In some situation, llvm-profgen gives lots of warnings on the truncated stacks which is noisy. This change provides a switch to `--show-detailed-warning` to skip the warnings. Alternatively, we use a summary for those warning and show the percentage of cases with those issues.
Example of warning summary.
```
warning: 0.05%(1120/2428958) cases with issue: Profile context truncated due to missing probe for call instruction.
warning: 0.00%(2/178637) cases with issue: Range does not belong to any functions, likely from external function.
```
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D111902
These were added to prevent functions from being removed by WPO.
But that doesn't make sense, correct WPO will not remove functions we actually use.
I noticed these because compiling cc1_main.cpp was pulling in random LLVM pass headers.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D112971
This is a new draft of D28234. I previously did the unorthodox thing of
pushing to it when I wasn't the original author, but since this version
- Uses `GNUInstallDirs`, rather than mimics it, as the original author
was hesitant to do but others requested.
- Is much broader, effecting many more projects than LLVM itself.
I figured it was time to make a new revision.
I am using this patch (and many back-ports) as the basis of
https://github.com/NixOS/nixpkgs/pull/111487 for my distro (NixOS). It
looked like people were generally on board in D28234, but I make note of
this here in case extra motivation is useful.
---
As pointed out in the original issue, a central tension is that LLVM
already has some partial support for these sorts of things. For example
`LLVM_LIBDIR_SUFFIX`, or `COMPILER_RT_INSTALL_PATH`. Because it's not
quite clear yet what to do about those, we are holding off on changing
libdirs and `compiler-rt`. for this initial PR.
---
On the advice of @lebedev.ri, I am splitting this up a bit per
subproject, starting with LLVM. To allow it to be more easily reviewed. This and the subsequent patch must be landed together, as this will not build alone. But the rest can be landed on their own.
Reviewed By: compnerd
Differential Revision: https://reviews.llvm.org/D100810
Notes generated in OpenBSD core files provide additional information
about the kernel state and CPU registers. These notes are described
in core.5, which can be viewed here: https://man.openbsd.org/core.5
Differential Revision: https://reviews.llvm.org/D111966
(Second try. Need to link against CodeGen and MC libs.)
The llvm-reduce tool has been extended to operate on MIR (import, clone and
export). Current limitation is that only a single machine function is
supported. A single reducer pass that operates on machine instructions (while
on SSA-form) has been added. Additional MIR specific reducer passes can be
added later as needed.
Differential Revision: https://reviews.llvm.org/D110527
The llvm-reduce tool has been extended to operate on MIR (import, clone and
export). Current limitation is that only a single machine function is
supported. A single reducer pass that operates on machine instructions (while
on SSA-form) has been added. Additional MIR specific reducer passes can be
added later as needed.
Differential Revision: https://reviews.llvm.org/D110527
Allow filling zero count for all the function ranges even there is no samples hitting that function. Add a switch for this.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D112858
Use the new sys::path::is_style_posix() and is_style_windows() in a few
places that need to detect the system's native path style.
In llvm/lib/Support/Path.cpp, this patch removes most uses of the
private `real_style()`, where is_style_posix() and is_style_windows()
are just a little tidier.
Elsewhere, this removes `_WIN32` macro checks. Added a FIXME to a
FileManagerTest that seemed fishy, but maintained the existing
behaviour.
Differential Revision: https://reviews.llvm.org/D112289
Like probe-based profile, the total samples is the sum of all its body samples. This patch fix it by a post-processing update for the line-number based profile. Tested it on our internal services, results showed no performance change.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D112672
This patch fixes:
llvm/tools/llvm-profgen/ProfiledBinary.cpp:357:12: error: variable
'EndOffset' set but not used [-Werror,-Wunused-but-set-variable]
The last use of the variable was removed on Oct 26 in commit
40ca411251.
The extractBasicBlocksFromModule, extractInstrFromModule, and other
similar functions previously performed very poorly when the number of
such elements in the program to reduce was very high. Previously, we
were creating the set which caches elements to keep by looping through
all elements in the module and adding them to the set. However, since
std::set is an ordered set, this introduces a massive amount of
rebalancing if the order of elements in the program and the order of
their pointers in memory are not the same.
The solution is straightforward: first put all the elements to be kept
in a vector, then use the constructor for std::set which takes a pair of
iterators over a collection. This constructor is optimized to avoid
doing unnecessary work when initializing large sets.
Also in this change, we pass BBsToKeep set to functions
replaceBranchTerminator and removeUninterestingBBsFromSwitch as a const
reference rather than passing it by value. This ought to prevent the
need to copy the collection each time these functions are called, which
is expensive if the collection is large.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D112757
Previous implementation of populating profile symbol list is wrong, it only included the profiled symbols. Actually it should use all symbols, here this switches to use the symbols from debug info. Also turned the flag off by default.
Reviewed By: wenlei, hoy
Differential Revision: https://reviews.llvm.org/D111824
It happened a bug that some callsite name in the profile is not a real function, it turned out that there're some non-function symbol from the ELF text section, e.g. the global accessible branch label and also recalled that we can have one function being split into multiple ranges. We shouldn't count samples for those are not the entry of the real function.
So this change tried to fix this issue by switching to use the name or ranges from DWARF-based debug info, the range of which assure it's the real function start. For the split functions, we assume that the real entry function's DWARF name should always match the symbol table name.
The switching is also consistent with the body samples' symbol which is from DWARF.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D112282
This was checked while counting but not actually when doing the reduction, resulting in crashes.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D112766
It seems that llvm-objcopy stores data temporarily misaligned with the
requirements of the underlaying struct from libBinaryFormat, and UBSan
generates a runtime error.
Instead of trying to reinterpret the memory as the struct itself, simply
access the `char *` pointer that we are interested in, and that do not
have alignment restrictions.
This problem was pointed out in a comment of D111164.
Differential Revision: https://reviews.llvm.org/D112744
Adding support to the CS preinliner to trim cold base profiles. This makes trimming consistent with the inline decision made by the preinliner. Also disable the existing profile merger when preinliner is on unless explicitly specified.
Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D112489
**Context:**
This is a second attempt at introducing signature regeneration to llvm-objcopy. In this diff: https://reviews.llvm.org/D109840, a script was introduced to test
the validity of a code signature. In this diff: https://reviews.llvm.org/D109803 (now reverted), an effort was made to extract the signature generation behavior out of LLD into a common location for use in llvm-objcopy. In this diff: https://reviews.llvm.org/D109972 it was decided that there was no appropriate common location and that a small amount of duplication to bring signature generation to llvm-objcopy would be better. This diff introduces this duplication.
**Summary**
Prior to this change, if a LC_CODE_SIGNATURE load command
was included in the binary passed to llvm-objcopy, the command and
associated section were simply copied and included verbatim in the
new binary. If rest of the binary was modified at all, this results
in an invalid Mach-O file. This change regenerates the signature
rather than copying it.
The code_signature_lc.test test was modified to include the yaml
representation of a small signed MachO executable in order to
effectively test the signature generation.
Reviewed By: alexander-shaposhnikov, #lld-macho
Differential Revision: https://reviews.llvm.org/D111164
This change allows the unsymbolized profile as input. The unsymbolized profile is created by `llvm-profgen` with `--skip-symbolization` and it's after the sample aggregation but before symbolization , so it has much small file size. It can be used for sample merging and trimming, also is useful for debugging or adding test cases. A switch `--unsymbolized-profile=file-patch` is added for this.
Format of unsymbolized profile:
```
[context stack1] # If it's a CS profile
number of entries in RangeCounter
from_1-to_1:count_1
from_2-to_2:count_2
......
from_n-to_n:count_n
number of entries in BranchCounter
src_1->dst_1:count_1
src_2->dst_2:count_2
......
src_n->dst_n:count_n
[context stack2]
......
```
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D111750
IRBuilder has been updated to support preserving metdata in a more
general manner. This patch adds `LLVMAddMetadataToInst` and
deprecates `LLVMSetInstDebugLocation` in favor of the more
general function.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D93454
We incorrectly use duplication factor for total samples even though we already accumulate samples instead of taking MAX. It causes profile to have bloated total samples for functions with loop unrolled or vectorized. The change fix the issue for total sample, head sample and call target samples.
Differential Revision: https://reviews.llvm.org/D112042
Having non-undef constants in a final llvm-reduce output is nicer than
having undefs.
This splits the existing reduce-operands pass into three, one which does
the same as the current pass of reducing to undef, and two more to
reduce to the constant 1 and the constant 0. Do not reduce to undef if
the operand is a ConstantData, and do not reduce 0s to 1s.
Reducing GEP operands very frequently causes invalid IR (since types may
not match up if we index differently into a struct), so don't touch GEPs.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D111765
When printing names in lldb on windows these names contain the full type information while on linux only the name is contained.
This change introduces a flag in the Microsoft demangler to control if the type information should be included.
With the flag enabled demangled name contains only the qualified name, e.g:
without flag -> with flag
int (*array2d)[10] -> array2d
int (*abc::array2d)[10] -> abc::array2d
const int *x -> x
For globals there is a second inconsistency which is not yet addressed by this change. On linux globals (in global namespace) are prefixed with :: while on windows they are not.
Reviewed By: teemperor, rnk
Differential Revision: https://reviews.llvm.org/D111715
While build llvm-project as a sub-project on windows, met a build error:
libllvm-c.exports /llvm/bin\llvm-nm.exe: error: ...builds/rel64ninja/./lib/LLVMDemangle.lib: no such file or directory
The libllvm-c.exports, libllvm-c.args, and lib/*.lib should under LLVM_BINARY_DIR, using CMAKE_BINARY_DIR will cause 'no such file' error while llvm-project built as a sub-project.
By default, such a non-template variable of non-volatile const-qualified type
having namespace-scope has internal linkage ([basic.link]), so no need for `static`.
We would like to move ThinLTO’s battle-tested file caching mechanism to
the LLVM Support library so that we can use it elsewhere in LLVM.
Patch By: noajshu
Differential Revision: https://reviews.llvm.org/D111371
Right now when we see -O# we add the corresponding 'default<O#>' into
the list of passes to run when translating legacy -pass-name. This has
the side effect of not using the default AA pipeline.
Instead, treat -O# as -passes='default<O#>', but don't allow any other
-passes or -pass-name. I think we can keep `opt -O#` as shorthand for
`opt -passes='default<O#>` but disallow anything more than just -O#.
Tests need to be updated to not use `opt -O# -pass-name`.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D112036
We would like to move ThinLTO’s battle-tested file caching mechanism to
the LLVM Support library so that we can use it elsewhere in LLVM.
Patch By: noajshu
Differential Revision: https://reviews.llvm.org/D111371
Currently -W and --wide are treated as two options as they are only
included for gnu readelf compatibility and ignored. This change makes -W
an alias of --wide to be consistent with other option aliases.
Differential Revision: https://reviews.llvm.org/D111731
It can be a bit confusing to stop with no explanation so we should indicate
when further output was prevented by the cycle limit.
Differential Revision: https://reviews.llvm.org/D111753
Add `-use-dwarf-correlation` switch to allow llvm-profgen to generate AutoFDO profile for binaries built with CSSPGO (pseudo-probe).
Differential Revision: https://reviews.llvm.org/D111776
The first LBR entry can be an external branch, we should ignore the whole trace.
```
7f7448e889e4 0x7f7448e889e4/0x7f7448e88826/P/-/-/1 0x7f7448e8899f/0x7f7448e889d8/P/-/-/4 ...
```
Reviewed By: wenlei, hoy
Differential Revision: https://reviews.llvm.org/D111749
With `ignore-stack-samples`, We can ignore the call stack before the samples aggregation which could reduce some redundant computations.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D111577
SimpleRemoteEPC notionally allowed subclasses to override the
createMemoryManager and createMemoryAccess methods to use custom objects, but
could not actually be subclassed in practice (The construction process in
SimpleRemoteEPC::Create could not be re-used).
Instead of subclassing, this commit adds a SimpleRemoteEPC::Setup class that
can be used by clients to set up the memory manager and memory access members.
A default-constructed Setup object results in no change from previous behavior
(EPCGeneric* memory manager and memory access objects used by default).
Instead of setting operands to undef as the "operands" pass does,
convert the operands to a function argument. This avoids having to
introduce undef values into the IR which have some unpredictability
during optimizations.
For instance,
define void @func() {
entry:
%val = add i32 32, 21
store i32 %val, i32* null
ret void
}
is reduced to
define void @func(i32 %val) {
entry:
%val1 = add i32 32, 21
store i32 %val, i32* null
ret void
}
(note that the instruction %val is renamed to %val1 when printing
the IR to avoid ambiguity; ideally %val1 would be removed by dce or the
instruction reduction pass)
Any call to @func is replaced with a call to the function with the
new signature and filled with undef. This is not ideal for IPA passes,
but those out-of-scope for now.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D111503
Adds explicit narrowing casts to JITLinkMemoryManager.cpp.
Honors -slab-address option in llvm-jitlink.cpp, which was accidentally
dropped in the refactor.
This effectively reverts commit 6641d29b70.
This commit substantially refactors the JITLinkMemoryManager API to: (1) add
asynchronous versions of key operations, (2) give memory manager implementations
full control over link graph address layout, (3) enable more efficient tracking
of allocated memory, and (4) support "allocation actions" and finalize-lifetime
memory.
Together these changes provide a more usable API, and enable more powerful and
efficient memory manager implementations.
To support these changes the JITLinkMemoryManager::Allocation inner class has
been split into two new classes: InFlightAllocation, and FinalizedAllocation.
The allocate method returns an InFlightAllocation that tracks memory (both
working and executor memory) prior to finalization. The finalize method returns
a FinalizedAllocation object, and the InFlightAllocation is discarded. Breaking
Allocation into InFlightAllocation and FinalizedAllocation allows
InFlightAllocation subclassses to be written more naturally, and FinalizedAlloc
to be implemented and used efficiently (see (3) below).
In addition to the memory manager changes this commit also introduces a new
MemProt type to represent memory protections (MemProt replaces use of
sys::Memory::ProtectionFlags in JITLink), and a new MemDeallocPolicy type that
can be used to indicate when a section should be deallocated (see (4) below).
Plugin/pass writers who were using sys::Memory::ProtectionFlags will have to
switch to MemProt -- this should be straightworward. Clients with out-of-tree
memory managers will need to update their implementations. Clients using
in-tree memory managers should mostly be able to ignore it.
Major features:
(1) More asynchrony:
The allocate and deallocate methods are now asynchronous by default, with
synchronous convenience wrappers supplied. The asynchronous versions allow
clients (including JITLink) to request and deallocate memory without blocking.
(2) Improved control over graph address layout:
Instead of a SegmentRequestMap, JITLinkMemoryManager::allocate now takes a
reference to the LinkGraph to be allocated. The memory manager is responsible
for calculating the memory requirements for the graph, and laying out the graph
(setting working and executor memory addresses) within the allocated memory.
This gives memory managers full control over JIT'd memory layout. For clients
that don't need or want this degree of control the new "BasicLayout" utility can
be used to get a segment-based view of the graph, similar to the one provided by
SegmentRequestMap. Once segment addresses are assigned the BasicLayout::apply
method can be used to automatically lay out the graph.
(3) Efficient tracking of allocated memory.
The FinalizedAlloc type is a wrapper for an ExecutorAddr and requires only
64-bits to store in the controller. The meaning of the address held by the
FinalizedAlloc is left up to the memory manager implementation, but the
FinalizedAlloc type enforces a requirement that deallocate be called on any
non-default values prior to destruction. The deallocate method takes a
vector<FinalizedAlloc>, allowing for bulk deallocation of many allocations in a
single call.
Memory manager implementations will typically store the address of some
allocation metadata in the executor in the FinalizedAlloc, as holding this
metadata in the executor is often cheaper and may allow for clean deallocation
even in failure cases where the connection with the controller is lost.
(4) Support for "allocation actions" and finalize-lifetime memory.
Allocation actions are pairs (finalize_act, deallocate_act) of JITTargetAddress
triples (fn, arg_buffer_addr, arg_buffer_size), that can be attached to a
finalize request. At finalization time, after memory protections have been
applied, each of the "finalize_act" elements will be called in order (skipping
any elements whose fn value is zero) as
((char*(*)(const char *, size_t))fn)((const char *)arg_buffer_addr,
(size_t)arg_buffer_size);
At deallocation time the deallocate elements will be run in reverse order (again
skipping any elements where fn is zero).
The returned char * should be null to indicate success, or a non-null
heap-allocated string error message to indicate failure.
These actions allow finalization and deallocation to be extended to include
operations like registering and deregistering eh-frames, TLS sections,
initializer and deinitializers, and language metadata sections. Previously these
operations required separate callWrapper invocations. Compared to callWrapper
invocations, actions require no extra IPC/RPC, reducing costs and eliminating
a potential source of errors.
Finalize lifetime memory can be used to support finalize actions: Sections with
finalize lifetime should be destroyed by memory managers immediately after
finalization actions have been run. Finalize memory can be used to support
finalize actions (e.g. with extra-metadata, or synthesized finalize actions)
without incurring permanent memory overhead.
Summary: This patch improves the error message context of the
XCOFF interfaces by providing more details.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D110320
ExecutorProcessControl objects will now have a TaskDispatcher member which
should be used to dispatch work (in particular, handling incoming packets in
the implementation of remote EPC implementations like SimpleRemoteEPC).
The GenericNamedTask template can be used to wrap function objects that are
callable as 'void()' (along with an optional name to describe the task).
The makeGenericNamedTask functions can be used to create GenericNamedTask
instances without having to name the function object type.
In a future patch ExecutionSession will be updated to use the
ExecutorProcessControl's dispatcher, instead of its DispatchTaskFunction.
Use Module& wherever possible.
Since every reduction immediately turns Chunks into an Oracle, directly pass Oracle instead.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D111122
Allow overlap/similarity comparison to use custom hot threshold cutoff, instead of using hard coded 990000 as hot cutoff.
Differential Revision: https://reviews.llvm.org/D111385
When parsing mmap to retrieve PID, deduplicate them before passing PID list to perf script. Perf script would error out when there's duplicated PID in the input, however raw perf data may main duplicated PID for large binary where more than one mmap is needed to load executable segment.
Differential Revision: https://reviews.llvm.org/D111384
This adds the `--dump-blockinfo` flag to `llvm-bcanalyzer`, allowing a sufficiently motivated user to dump (parts of) the `BLOCKINFO_BLOCK` block. The default behavior is unchanged, and `--dump-blockinfo` only takes effect in the same context as other flags that control dump behavior (i.e., requires that `--dump` is also passed).
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D107536
We have a string litteral (via CPP) used to construct `StringRef`, which
is used to construct a `SmallString`. Just construct the latter
directly.
Differential Revision: https://reviews.llvm.org/D111322
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.
This allows us to ensure that Support doesn't have includes from MC/*.
Differential Revision: https://reviews.llvm.org/D111454
This reverts commit dfd74db981.
SimpleRemoteEPC should share dispatch with the ExecutionSession, rather than
having two different dispatch systems on the controller side.
SimpleRemoteEPCServer::Dispatch doesn't need to be shared.
Renames SimpleRemoteEPCServer::Dispatcher to SimpleRemoteEPCDispatcher and
moves it into OrcShared. SimpleRemoteEPCServer::ThreadDispatcher is similarly
moved and renamed to DynamicThreadPoolSimpleRemoteEPCDispatcher.
This will allow these classes to be reused by SimpleRemoteEPC on the controller
side of the connection.
A [[ https://reviews.llvm.org/rGf6fa95b77f33c3690e4201e505cb8dce1433abd9 | recent commit ]] removed `<string>` from `ErrorHandling.h`. The removal caused `<string>` to be no longer included for `llvm/tools/llvm-cxxdump/Error.cpp` which uses the string type.
This patch adds `<string>` to `llvm/tools/llvm-cxxdump/Error.cpp`.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D111354
For some transformations like hot-cold split or coro split, it can outline its part of function ranges. Since sample loader is the early stage of backend and no split happens at that time, compiler can't recognize those function, so in llvm-profgen we should attribute the sample to the original function. This is already done for the body range samples since we use the symbols from dwarf which is created before the split.
But for branch samples, the call from master function to its outlined function is actually not a call to the original function, we shouldn't add head/callsie samples for it. So instead of dwarf symbol, we use the symbols from symbol table and ignore those functions with special suffixes(like `.cold` ,`.resume`) for accumulating the callsite/head samples.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D110864
This change is to keep the help text and command guide of llvm-readelf
in tandem.
- In the help text mention that --section-data, --section-relocations,
--section-symbols and --stack-sizes have no effect on GNU style
output; give the accepted values for --elf-output-style and update
the description of --gnu-hash-table to use the command guide
description.
- In the command guide add the missing options -a,
--dependant-libraries,--no-demangle, --wide and -W. Also update the
description of --symbols so it matches the help text.
Differential Revision: https://reviews.llvm.org/D111240
This change is to add some missing details, clarifies some options and
brings the help text and command guide of objdump closer together.
- Added to the help that --all-headers also outputs symbols and
relocations to match the command guide.
- Added to the help that --debug-vars accepts an optional
ascii/unicode format to match the command guide.
- Changed the help descriptions for --disassemble,
--disassemble-all, --dwarf=<value>, --fault-map-section,
--line-numbers, --no-leading-addr and --source descriptions to
match the command guide.
- Added to the help that --start-address and --stop-address also
effect relocation entries and the symbol table output to match
the command guide.
- Added a note to the command guide that --unwind-info and -u
are not available for the elf format.
Differential Revision: https://reviews.llvm.org/D110633
In the command guide --prefix and --prefix-strip is used in the form
--prefix=<prefix> however currently it is used in the form --prefix
<prefix>. This change fixes these options to match the command guide.
Differential Revision: https://reviews.llvm.org/D110551
To better reflect the meaning of the now-disambiguated {GlobalValue,
GlobalAlias}::getBaseObject after breaking off GlobalIFunc::getResolverFunction
(D109792), the function is renamed to getAliaseeObject.
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.
This removes `WasmTagType`. `WasmTagType` contained an attribute and a
signature index:
```
struct WasmTagType {
uint8_t Attribute;
uint32_t SigIndex;
};
```
Currently the attribute field is not used and reserved for future use,
and always 0. And that this class contains `SigIndex` as its property is
a little weird in the place, because the tag type's signature index is
not an inherent property of a tag but rather a reference to another
section that changes after linking. This makes tag handling in the
linker also weird that tag-related methods are taking both `WasmTagType`
and `WasmSignature` even though `WasmTagType` contains a signature
index. This is because the signature index changes in linking so it
doesn't have any info at this point. This instead moves `SigIndex` to
`struct WasmTag` itself, as we did for `struct WasmFunction` in D111104.
In this CL, in lib/MC and lib/Object, this now treats tag types in the
same way as function types. Also in YAML, this removes `struct Tag`,
because now it only contains the tag index. Also tags set `SigIndex` in
`WasmImport` union, as functions do.
I think this makes things simpler and makes tag handling more in line
with function handling. These two shares similar properties in that both
of them have signatures, but they are kind of nominal so having the same
signature doesn't mean they are the same element.
Also a drive-by fix: the reserved 'attirubute' part's encoding changed
from uleb32 to uint8 a while ago. This was fixed in lib/MC and
lib/Object but not in YAML. This doesn't change object files because the
field's value is always 0 and its encoding is the same for the both
encoding.
This is effectively NFC; I didn't mark it as such just because it
changed YAML test results.
Reviewed By: sbc100, tlively
Differential Revision: https://reviews.llvm.org/D111086
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.
We can use the raw_string_ostream::str() method to perform the implicit flush() and return a reference to the std::string container that we can then wrap inside Twine().
https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html
Excessive use of the <string> header has a massive impact on compile time; its most commonly included via the ErrorHandling.h header, which has to be included in many key headers, impacting many source files that have no need for std::string.
As an initial step toward removing the <string> include from ErrorHandling.h, this patch proposes to update the fatal_error_handler_t handler to just take a raw const char* instead.
The next step will be to remove the report_fatal_error std::string variant, which will involve a lot of cleanup and better use of Twine/StringRef.
Differential Revision: https://reviews.llvm.org/D111049
This change adds duplication factor multiplier while accumulating body samples for line-number based profile. The body sample count will be `duplication-factor * count`. Base discriminator and duplication factor is decoded from the raw discriminator, this requires some refactor works.
Differential Revision: https://reviews.llvm.org/D109934
This simplifies the code in a number of ways and avoids
having to track functions and their types separately.
Differential Revision: https://reviews.llvm.org/D111104
D104366 introduced a new llvm-cxxfilt test with non-ASCII characters,
which caused a failure on llvm-clang-x86_64-expensive-checks-win
builder, with a stack trace suggesting issue in a call to isalnum.
The argument to isalnum should be either EOF or a value that is
representable in the type unsigned char. The llvm-cxxfilt does not
perform a cast from char to unsigned char before the call, so the
value might be out of valid range.
Replace the call to isalnum with isAlnum from StringExtras, which takes
a char as the argument. This also makes the check independent of the
current locale.
Differential Revision: https://reviews.llvm.org/D110986
With the removal of OrcRPCExecutorProcessControl and OrcRPCTPCServer in
6aeed7b19c the ORC RPC library no longer has any in-tree users.
Clients needing serialization for ORC should move to Simple Packed
Serialization (usually by adopting SimpleRemoteEPC for remote JITing).
We expose the fact that we rely on unsigned wrapping to iterate through
all indexes. This can be confusing. Rather, keeping it as an
implementation detail through an iterator is less confusing and is less
code.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D110885
Summary:
for xcoff :
implement the getSymbolFlag and getSymbolType() for option --syms.
llvm-objdump --sym , if the symbol is label, print the containing section for the symbol too.
when using llvm-objdump --sym --symbol--description, print the symbol index and qualname for symbol.
for example:
--symbol-description
00000000000000c0 l .text (csect: (idx: 2) .foov[PR]) (idx: 3) .foov
and without --symbol-description
00000000000000c0 l .text (csect: .foov) .foov
Reviewers: James Henderson,Esme Yi
Differential Revision: https://reviews.llvm.org/D109452
LLVM (llvmorg-14-init) under Debian sid using latest gcc (Debian
10.3.0-9) 10.3.0 fails due to ambiguous overload on operators == and !=:
/root/src/llvm/src/llvm/tools/obj2yaml/elf2yaml.cpp:212:22:
error: ambiguous overload for 'operator!='
(operand types are 'llvm::ELFYAML::ELF_SHF' and 'int')
/root/src/llvm/src/llvm/tools/obj2yaml/elf2yaml.cpp:204:32:
error: ambiguous overload for 'operator!='
(operand types are 'const llvm::yaml::Hex64' and 'int')
/root/src/llvm/src/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp:629:35:
error: ambiguous overload for 'operator=='
(operand types are 'const uint64_t' {aka 'const long unsigned int'} and
'llvm::Register')
Reviewed by: StephenTozer, jmorse, Higuoxing
Differential Revision: https://reviews.llvm.org/D109534
When replacing function calls, skip call instructions where the old
function is not the called function, but e.g. the old function is passed
as an argument.
This fixes a crash due to trying to construct invalid IR for the test
case.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D109759
We used the segment alignment in elf header to assume the loader alignment. However this is incorrect because loader alignment is always the same as page size. If segment needs to be aligned at load time, linker will set aligned address as virtual address in elf header.
Differential Revision: https://reviews.llvm.org/D110795
This change enables llvm-profgen to take raw perf data as alternative input format. Sometimes we need to retrieve evenets for processes with matching binary. Using perf data as input allows us to retrieve process Ids from mmap events for matching binary, then filter by process id during perf script generation.
Differential Revision: https://reviews.llvm.org/D110793
This change contains diagnostics improvments, refactoring and preparation for consuming perf data directly.
Diagnostics:
- We now have more detailed diagnostics when no mmap is found.
- We also print warning for abnormal transition to external code.
Refactoring:
- Simplify input perf trace processing to only allow a single input file. This is because 1) using multiple input perf trace (perf script) is error prone because we may miss key mmap events. 2) the functionality is not really being used anyways.
- Make more functions private for Readers, move non-trivial definitions out of header. Cleanup some inconsistency.
- Prepare for consuming perf data as input directly.
Differential Revision: https://reviews.llvm.org/D110729
The ReduceMetadata pass before this patch removed metadata on a per-MDNode (or NamedMDNode) basis. Either all references to an MDNode are kept, or all of them are removed. However, MDNodes are uniqued, meaning that references to MDNodes with the same data become references to the same MDNodes. As a consequence, e.g. tbaa references to the same type will all have the same MDNode reference and hence make it impossible to reduce only keeping metadata on those memory access for which they are interesting.
Moreover, MDNodes can also be referenced by some intrinsics or other MDNodes. These references were not considered for removal leading to the possibility that MDNodes are not actually removed even if selected to be removed by the oracle.
This patch changes ReduceMetadata to reduces based on removable metadata references instead. MDNodes without references implicitly dropped anyway. References by intrinsic calls should be removed by ReduceOperands or ReduceInstructions. References in other MDNodes cannot be removed as it would violate the immutability of MDNodes.
Additionally, ReduceMetadata pass before this patch used `setMetadata(I, NULL)` to remove references, where `I` is the index in the array returned by `getAllMetadata`. However, `setMetadata` expects a MDKind (such as `MD_tbaa`) as first argument. `getAllMetadata` does not return those in consecutive order (otherwise it would not need to be a `std::pair` with `first` representing the MDKind).
Reviewed By: aeubanks, swamulism
Differential Revision: https://reviews.llvm.org/D110534
As for now, llvm-objcopy renames only sections that are specified
explicitly in --rename-section, while GNU objcopy keeps names of
relocation sections in sync with their targets. For example:
> readelf -S test.o
...
[ 1] .foo PROGBITS
[ 2] .rela.foo RELA
> objcopy --rename-section .foo=.bar test.o gnu.o
> readelf -S gnu.o
...
[ 1] .bar PROGBITS
[ 2] .rela.bar RELA
> llvm-objcopy --rename-section .foo=.bar test.o llvm.o
> readelf -S llvm.o
...
[ 1] .bar PROGBITS
[ 2] .rela.foo RELA
This patch makes llvm-objcopy to match the behavior of GNU objcopy better.
Differential Revision: https://reviews.llvm.org/D110352
The slab allocator is frequently used in -noexec tests where we want a
consistent memory layout. In this context we also want to set the effective
page size, rather than using the page size of the host process, since not all
systems use the same page size. The -slab-page-size option allows us to set
the page size for such tests.
The -slab-page-size option will also be honored in exec mode when using the
slab allocator, but will trigger an error if the requested size is not a
multiple of the actual process page size.
This option was motivated by test failures on a ppc64 bot that was returning
zero from sys::Process::getPageSize(), so it also contains a check for errors
and zero results from that function if the -slab-page-size option is absent.
Existing slab allocator tests will be updated to use this option in a follow-up
commit so that we can point the failing bot at this commit and observe errors
associated with sys::Process::getPageSize().
* Add a newline before `DYNAMIC RELOCATION RECORDS` (see D101796)
* Add the missing `OFFSET TYPE VALUE` line
* Align columns
Note: llvm-readobj/ELFDumper.cpp `loadDynamicTable` has sophisticated PT_DYNAMIC
code which is unavailable in llvm-objdump.
Reviewed By: jhenderson, Higuoxing
Differential Revision: https://reviews.llvm.org/D110595
Similar to https://reviews.llvm.org/D110465, we can compute function size on-demand for the functions that's hit by samples.
Here we leverage the raw range samples' address to compute a set of sample hit function. Then `BinarySizeContextTracker` just works on those function range for the size.
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D110466
Previously we do symbolization for all the functions and actually we only need the symbols that's hit by the samples.
This can significantly speed up the time for large size binary.
Optimization for per-inliner will come along with next patch.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D110465
The MSP430 ABI supports build attributes for specifying
the ISA, code model, data model and enum size in ELF object files.
Differential Revision: https://reviews.llvm.org/D107969
This change is to add some missing details to the help text and command
guide:
- Added a note to the command guide that --debug-macro also dumps
.debug_macinfo.
- Added a note to the command guide that --debug-frame and --eh_frame
are aliases, and in cases where both sections are present one command
outputs both.
- Changed the wording in the help output for --ignore-case and --regex to
closer match the command guide.