Unlike its legacy SSE XMM XORPS version, which measures as being 1-cycle,
this one is certainly a zero-cycle instruction, in addition to both of them
being dependency breaking.
As confirmed by exegesis measurements, and ref docs.
While both the SOG and Agner insist that it is zero-cycle,
I cannot confirm that claim. While it clearly breaks the dependency,
I cannot come up with a snippet, or measurement approach,
that ends up with an IPC bigger than 4, which, to me, means that it actually
consumes the execution resource of an FP unit for a cycle.
Much like other LLVM binary utilities, `llvm-cov` has a symlink compatibility feature where it runs in `gcov` compatibility mode if the binary name ends in `gcov`. This is identical to invoking `llvm-cov gcov ...`.
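A minimal sketch of this kind of name-based dispatch, written in Python for brevity. The helper name `pick_mode` and the `.exe` handling are illustrative assumptions, not the actual llvm-cov code:

```python
import posixpath

def pick_mode(argv):
    """Return (mode, remaining_args) for a multi-call binary (illustrative only)."""
    # Drop the directory and a trailing ".exe" (assumed handling).
    name = posixpath.basename(argv[0])
    if name.lower().endswith(".exe"):
        name = name[:-4]
    # A binary (or symlink) whose name ends in "gcov" implies gcov mode.
    if name.endswith("gcov"):
        return "gcov", argv[1:]
    # Otherwise the first argument names the subcommand explicitly.
    if len(argv) > 1:
        return argv[1], argv[2:]
    return None, []
```

With this sketch, `pick_mode(["llvm-gcov", "a.gcda"])` and `pick_mode(["llvm-cov", "gcov", "a.gcda"])` select the same mode with the same remaining arguments.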
Differential Revision: https://reviews.llvm.org/D102299
`__mh_(execute|dylib|dylinker|bundle|preload|object)_header` are special symbols whose values hold the VMA of the Mach header to support introspection. They are attached to the first section in `__TEXT`, even though their addresses are outside `__TEXT`, and they do not refer to code.
This is normally harmless, but when the first section of `__TEXT` has no other symbols, `__mh_*_header` is considered by the disassembler when determining function boundaries. Since `__mh_*_header` refers to an address outside `__TEXT`, the boundary determination fails and disassembly quits.
Since `__TEXT,__text` normally has symbols, this bug is obscured. Experiments placing `__stubs` and `__stub_helper` first exposed the bug, since neither has symbols.
Differential Revision: https://reviews.llvm.org/D101786
When making compilation relocatable, for example in distributed
compilation scenarios, we want to set compilation dir to a relative
value like `.` but this presents a problem when generating reports
because if the file path is relative as well, for example `..`, you
may end up writing files outside of the output directory.
This change introduces a flag that allows overriding the compilation
directory that's stored inside the profile with a different value that
is absolute.
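To illustrate the problem, here is a rough sketch in Python. The `report_path` helper and the mirrored output layout are assumptions made for illustration, not the actual report-writing logic:

```python
import posixpath

def report_path(output_dir, source_path, compilation_dir="."):
    """Compute where a per-file report would be written (hypothetical layout)."""
    # Resolve the recorded source path against the compilation directory.
    full = posixpath.normpath(posixpath.join(compilation_dir, source_path))
    # Mirror the source tree below the output directory.
    return posixpath.normpath(
        posixpath.join(output_dir, full.lstrip("/") + ".html"))

# Relative compilation dir plus a relative file path escapes the output dir.
escaped = report_path("out", "../lib/a.c")
# Overriding the compilation dir with an absolute value keeps it inside.
contained = report_path("out", "../lib/a.c", "/src/proj/build")
```

Here `escaped` no longer starts with `out/`, while `contained` stays below the output directory.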
Differential Revision: https://reviews.llvm.org/D100232
Originally landed in: 6400905a61
Reverted in: 668dccc396
Fix branch coverage merging in FunctionCoverageSummary::get() for instantiation
groups.
This change corrects the implementation for the branch coverage summary to do
the same thing for branches that is done for lines and regions. That is,
across function instantiations in an instantiation group, the maximum branch
coverage found in any of those instantiations is returned, with the total
number of branches being the same across instantiations.
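A small sketch of the merging rule described above, in Python. The `BranchCoverage` type and `merge_group` helper are hypothetical stand-ins, not the real FunctionCoverageSummary code:

```python
from dataclasses import dataclass

@dataclass
class BranchCoverage:
    covered: int  # branches executed at least once
    total: int    # branches in the function

def merge_group(instantiations):
    """Summarize an instantiation group: best coverage seen, same total."""
    total = instantiations[0].total
    # The total is expected to be identical across instantiations.
    assert all(i.total == total for i in instantiations)
    covered = max(i.covered for i in instantiations)
    return BranchCoverage(covered, total)
```

For example, two instantiations covering 1 of 4 and 3 of 4 branches are summarized as 3 of 4, rather than being summed.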
Differential Revision: https://reviews.llvm.org/D102193
This patch adds JSON output style to llvm-symbolizer to better support CLI automation by providing a machine readable output.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D96883
There are cases where a concrete DIE with DW_TAG_subprogram can have
an abstract_origin attribute, so we handle that situation as well.
Differential Revision: https://reviews.llvm.org/D101025
As confirmed by exegesis measurements, and ref docs.
It does actually execute.
While there, bump latency for MULX32rr, that seems to match measurements.
Printing pass manager invocations is fairly verbose and not super
useful.
This allows us to remove DebugLogging from pass managers and PassBuilder
since all logging (aside from analysis managers) goes through
instrumentation now.
This has the downside of never being able to print the top level pass
manager via instrumentation, but that seems like a minor downside.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D101797
Sometimes the disassembler picks _REV variants of instructions
over the plain ones, which in this case exposed an issue
that the _REV variants aren't being modelled as optimizable moves.
I've verified this with llvm-exegesis.
This is not limited to zero registers.
Refs:
AMD SOG 19h, 2.9.4 Zero Cycle Move
The processor is able to execute certain register to register
mov operations with zero cycle delay.
Agner,
22.13 Instructions with no latency
Register-to-register move instructions are resolved at
the register rename stage without using any execution units.
These instructions have zero latency. It is possible to do six such
register renamings per clock cycle, and it is even possible to
rename the same register multiple times in one clock cycle.
The dwarfdump command guide shows the short options used as aliases but
these are not found in the help text unless --show-hidden is used.
Looking at other tools, some follow this pattern, while others, like
llvm-objdump, show aliases with --help. This change fixes the help output
to be consistent with the command guide. This includes updating alias
descriptions in the help output to use "--".
As part of this change I updated cmdline.test, including some options
that were missing testing.
Differential Revision: https://reviews.llvm.org/D101646
PR50160: we currently ignore non-PT_PHDR segments with no sections, not
accounting for their p_offset and p_filesz. This can cause an out-of-bounds write
in `writeSegmentData` if p_offset+p_filesz is larger than the total file
size.
This can be fixed by setting p_offset=p_filesz=0. The logic nicely unifies with
the logic added in D90897.
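The condition and the fix can be sketched as follows, in Python. The `Segment` type and helper names are hypothetical and only model the fields mentioned above:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    p_offset: int
    p_filesz: int
    is_phdr: bool = False       # PT_PHDR segments are handled separately
    has_sections: bool = False

def fits_in_file(seg, file_size):
    """True if writing the segment's data stays within the output buffer."""
    return seg.p_offset + seg.p_filesz <= file_size

def normalize_empty_segment(seg):
    """Zero the file range of a non-PT_PHDR segment that has no sections."""
    if not seg.is_phdr and not seg.has_sections:
        seg.p_offset = 0
        seg.p_filesz = 0
    return seg
```

A stale segment with `p_offset=0x1000` and `p_filesz=0x200` does not fit in a 0x400-byte file, but it does after normalization.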
Reviewed By: jhenderson, rupprecht
Differential Revision: https://reviews.llvm.org/D101560
The internal `cl::opt` option --x86-asm-syntax sets the AsmParser and AsmWriter
dialect. The option is used by llc and llvm-mc tests to set the AsmWriter dialect.
This patch adds -M {att,intel} as GNU objdump compatible aliases (PR43413).
Note: the dialect is initialized when the MCAsmInfo is constructed.
`MCInstPrinter::applyTargetSpecificCLOption` is called too late and its MCAsmInfo
reference is const, so changing the `cl::opt` in
`MCInstPrinter::applyTargetSpecificCLOption` is not an option, at least without
a large amount of refactoring.
Reviewed By: hoy, jhenderson, thakis
Differential Revision: https://reviews.llvm.org/D101695
Fix PR45416: the diagnostic when '=' is missing is misleading.
`FileOutputBuffer::create` returns successfully when the filename is empty
(the temporary file is `.tmp%%%%%%%`), but `FileOutputBuffer::commit` will error when
renaming `.tmp%%%%%%%` to the empty name.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D101697
Unwind info generated by MSVC tends to have relocations pointing at
static "label" symbols like "$LN4" instead of regular ones based on
the actual function's name. Try to resolve such symbols to a non-label
symbol if possible (ideally to an external symbol), to improve
the readability.
Differential Revision: https://reviews.llvm.org/D101567
When dumping multiple pieces of information (e.g. --all-headers),
there is sometimes no separator between two pieces.
This patch uses the "\nheader:\n" style, which generally improves
compatibility with GNU objdump.
Note: objdump -t/-T does not add a newline before "SYMBOL TABLE:" and "DYNAMIC SYMBOL TABLE:".
We add a newline to be consistent with other information.
`objdump -d` prints two empty lines before the first 'Disassembly of section'.
We print just one with this patch.
Differential Revision: https://reviews.llvm.org/D101796
Reapply 7368624 after revert and fix
Looking at other tools using tablegen for help output, general options
like --help are not separated from other options. This change removes
the "Generic Options" option group so the options are listed together.
The Mach-O specific option group is left unaffected.
The test help.test was modified to reflect this change.
Differential Revision: https://reviews.llvm.org/D101652
Introduce a basic schedule model for AMD Zen 3 CPUs, a.k.a. `znver3`.
This is fully built from scratch, from llvm-mca measurements
and documented reference materials.
Nothing was copied from `znver2`/`znver1`.
I believe this is in a reasonable state of completion for inclusion,
probably better than D52779 `bdver2` was :)
Namely:
* uops are pretty spot-on (at least what llvm-mca can measure)
{F16422596}
* latency is also pretty spot-on (at least what llvm-mca can measure)
{F16422601}
* throughput is within reason
{F16422607}
I haven't run many benchmarks with this,
however RawSpeed benchmarks say this is beneficial:
{F16603978}
{F16604029}
I'll call out the obvious problems there:
* I didn't really bother with X87 instructions
* I didn't really bother with obviously-microcoded/system instructions
* There is a large discrepancy in throughput for `mr` and `rm` instructions.
I'm not really sure if it's a modelling defect that needs to be fixed,
or a defect of the measurements.
* Pipe distributions are probably bad :)
I can't do much here until AMD allows that to be fixed
by documenting the appropriate counters and updating libpfm
That being said, as @RKSimon notes:
>>! In D94395#2647381, @RKSimon wrote:
> I'll mention again that all the znver* models appear to be very inaccurate wrt SIMD/FPU instructions <...>
so how much worse could this possibly be?!
Things that aren't there:
* Various tunings: zero idioms, etc. Those are follow-ups.
Differential Revision: https://reviews.llvm.org/D94395
The right symbol flag mask is ~0x7, not ~0xf.
Also emit string names for the other flags (we were missing some).
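A sketch of why the mask matters, in Python. It assumes the flags in question are the Mach-O `n_desc` bits from `<mach-o/nlist.h>`, where the low three bits hold the reference type; the `flag_names` helper is hypothetical:

```python
# Assumed context: Mach-O n_desc flag bits (low 3 bits are the reference type).
FLAG_NAMES = {
    0x0008: "N_ARM_THUMB_DEF",
    0x0010: "REFERENCED_DYNAMICALLY",
    0x0020: "N_NO_DEAD_STRIP",
    0x0040: "N_WEAK_REF",
    0x0080: "N_WEAK_DEF",
    0x0100: "N_SYMBOL_RESOLVER",
    0x0200: "N_ALT_ENTRY",
}

def flag_names(n_desc, mask=~0x7):
    """Return the names of the flag bits that survive the mask."""
    bits = n_desc & mask & 0xFFFF
    return [name for bit, name in sorted(FLAG_NAMES.items()) if bits & bit]
```

With the `~0xf` mask, bit `0x8` is silently dropped: `flag_names(0x0088, mask=~0xf)` only reports `N_WEAK_DEF`, while the `~0x7` mask reports both flags.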
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D101548
This introduces a flag that aborts if we ever reduce to IR that fails
the verifier.
Reviewed By: swamulism, arichardson
Differential Revision: https://reviews.llvm.org/D101279
Early exit from method DispatchStage::isAvailable() if the dispatch group is
already full. Not all instructions declare at least one uOP.
Fixes PR50174.
When looking up data referenced from pdata/xdata structures, the
referenced data can be found in two different ways:
- For an unrelocated object file, it's located via a relocation
- For a relocated, linked image, the data is referenced with an
(image relative) absolute address
For the latter case, the absolute address can optionally be
described with a symbol.
For the case of an object file, there are two offsets involved: one
immediate offset encoded in the data location that is modified by
the relocation, and a section offset in the symbol.
Previously, for the ExceptionRecord field, we printed the offset
from the symbol (only) but used the immediate offset ignoring
the symbol's address (using only the symbol's section) for printing
the exception data.
Add a helper method for doing the lookup and address calculation,
for simplifying the calling code and making all the cases consistent.
This addresses an existing FIXME comment, fixing printing of the
exception data for cases where relocations point at individual
symbols in the xdata section (which is what MSVC generates) instead of
all relocations pointing at the start of the xdata section (which is
what LLVM generates).
This also fixes printing of the function name for packed entries in
linked images.
Relanded with a format string fix in the formatSymbol function; one
can't use %X as the format string for a uint64_t. That bug has been
present since this code was added in e6971cab30.
Differential Revision: https://reviews.llvm.org/D100305
This reverts commit 3778924088.
The added test fails on at least one buildbot, by printing a reversed
combination, printing "func3_xdata +0x18 (0x8)" while it's supposed to
be "func3_xdata +0x8 (0x18)", see e.g.
https://lab.llvm.org/buildbot/#/builders/107/builds/7269. Currently
no idea how that could happen, but reverting until it can be figured
out.
Add support for LC_THREAD/LC_UNIXTHREAD
(these load commands can be copied over without any modifications).
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D101384
Add a flag to change dsymutil's behavior and force a static variable to
keep its enclosing function. The test shows a situation where that could
be useful. I'm not convinced this behavior makes sense as a default,
which is why it's behind a flag.
rdar://74918374
Differential revision: https://reviews.llvm.org/D101337
Previously printing R_386_RELATIVE relocations would trigger
`error: can't read an entry at 0x40: it goes past the end of the section (0x40)`
I found this while writing a test case for LLD (D100490).
This also includes some minor cleanup in the elf-dynamic-relcos.test
llvm-objdump test based on the newly added test.
Reviewed By: jhenderson, MaskRay
Differential Revision: https://reviews.llvm.org/D100489
This has been rather useful in our downstream CHERI target where we want
to run tests both with addrspace(0) and addrspace(200) pointers.
With this patch we can prefix the opt command with
`sed -e 's/addrspace(200)/addrspace(0)/g' -e 's/-A200-P200-G200//g'` to
test both cases using the same IR input.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95137
Previously llvm-dwp only handled DW_FORM_string and DW_FORM_GNU_str_index; with this patch it also handles DW_FORM_strx and DW_FORM_strx[1-4].
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D75485
This primarily parses a different set of options and invokes the same
resource compiler as llvm-rc normally. Additionally, it can convert
directly to an object file (which in MSVC style setups is done with the
separate cvtres tool, or by the linker).
(GNU windres also supports other conversions; from coff object file back
to .res, and from .res or object file back to .rc form; that's not yet
implemented.)
The other bigger complication lies in being able to imply or pass the
intended target triple, to let clang find the corresponding mingw sysroot
for finding include files, and for specifying the default output object
machine format.
It can be implied from the tool triple prefix, like
`<triple>-[llvm-]windres` or picked up from the windres option e.g.
`-F pe-x86-64`. In GNU windres, that option takes BFD style format names
such as pe-i386 or pe-x86-64. As libbfd in binutils doesn't support
Windows on ARM, there's no such canonical name for the ARM targets.
Therefore, as an LLVM specific extension, this option is extended to
allow passing full triples, too.
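The two ways of determining the target can be sketched like this, in Python. The helper names, the BFD-name-to-triple mapping, and the rule that an explicit option wins over the tool name are all assumptions for illustration:

```python
import posixpath

# Assumed mapping from BFD-style names to triples, for illustration only.
BFD_TO_TRIPLE = {
    "pe-i386": "i686-w64-mingw32",
    "pe-x86-64": "x86_64-w64-mingw32",
}

def triple_from_tool_name(argv0):
    """Extract <triple> from a name shaped like <triple>-[llvm-]windres."""
    name = posixpath.basename(argv0)
    if name.lower().endswith(".exe"):
        name = name[:-4]
    if not name.endswith("windres"):
        return None
    prefix = name[: -len("windres")]
    if prefix.endswith("llvm-"):
        prefix = prefix[: -len("llvm-")]
    prefix = prefix.rstrip("-")
    return prefix or None

def target_triple(argv0, format_option=None):
    """Prefer an explicit -F value; unknown names are treated as full triples."""
    if format_option is not None:
        return BFD_TO_TRIPLE.get(format_option, format_option)
    return triple_from_tool_name(argv0)
```

For example, `target_triple("llvm-windres", "armv7-w64-mingw32")` passes the full triple through unchanged, which is the extension described above.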
Differential Revision: https://reviews.llvm.org/D100756
1. Add an accessor function to MCSymbolizer to retrieve addresses
referenced by a symbolizable operand, but not resolved to a symbol.
That way, the caller can synthesize labels at those addresses and
then retry disassembling the section.
2. Implement that in AMDGPU -- a failed symbol lookup results in the
address being added to a vector returned by the new function.
3. Use that in llvm-objdump when using MCSymbolizer (which only happens
on AMDGPU) and SymbolizeOperands is on.
Differential Revision: https://reviews.llvm.org/D101145
Change-Id: I19087c3bbfece64bad5a56ee88bcc9110d83989e
The initial patch (D96045) didn't handle split DWARF cases,
so this fixes that bug.
In addition, D96045 introduced a slowdown;
this patch fixes that as well.
Differential Revision: https://reviews.llvm.org/D100951
This change adds support for trimming and merging cold contexts when merging CSSPGO profiles using llvm-profdata. This is similar to the context profile trimming in llvm-profgen; however, the flexibility to trim cold contexts after the profile is generated can be useful.
Differential Revision: https://reviews.llvm.org/D100528
Report dangling probes for frames that have real samples collected. Dangling probes are the probes associated with an empty block. When reported, the sample count on a dangling probe will not be trusted by the compiler and we will rely on the counts inference algorithm to get the probe a reasonable count. This actually fixes a bug where previously only those dangling probes with samples collected were reported.
This patch also fixes two existing issues. Pseudo probes are stored in `Address2ProbesMap` and their pointers are used in `PseudoProbeInlineTree`. Previously `std::vector` was used to store probes, and the pointers to probes could be invalidated as the vector grows. I'm changing `std::vector` to `std::list` instead.
The other issue is that all outlined functions shared the same inline frame previously due to the unchanged `Index` value as the dummy inlineSite identifier.
Good results seen for SPEC2017 in general regarding profile quality.
Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D100235
Allow opting out from preprocessing with a command line argument.
Update tests to pass -no-preprocess to make it not try to use clang
(which isn't a build level dependency of llvm-rc), but add a test that
does preprocessing under clang/test/Preprocessor.
Update a few options to allow them both joined (as -DFOO) and separate
(-D BR), as rc.exe allows both forms of them.
With the verbose flag set, this prints the preprocessing command
used (which differs from what rc.exe does).
Tests under llvm/test/tools/llvm-rc only test constructing the
preprocessor commands, while tests under clang/test/Preprocessor test
actually running the preprocessor.
Differential Revision: https://reviews.llvm.org/D100755
Instructions on the transcendental unit are executed in parallel to the
normal VALU, so add this as an extra resource.
This doesn't seem to have any effect, but it should be more correct.
Differential Revision: https://reviews.llvm.org/D100123
This implements an LLVM tool that's flag- and output-compatible
with macOS's `otool` -- except for bugs, but from testing with both
`otool` and `xcrun otool-classic`, llvm-otool matches vanilla
otool's behavior very well already. It's not 100% perfect, but
it's a very solid start.
This uses the same approach as llvm-objcopy: llvm-objdump uses
a different OptTable when it's invoked as llvm-otool. This
is possible thanks to D100433.
Differential Revision: https://reviews.llvm.org/D100583
Used to model structural hazards on FP issue, where some
instructions take up two issue slots and others one, as well
as similar structural hazards on load issue, where some
instructions take up two load lanes and others one.
Differential Revision: https://reviews.llvm.org/D98977
The `e_flags` field contains a mixture of bitfields and regular flags; ensure all of them can be serialized and deserialized.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D100250
This is similar to D83530, but for llvm-objdump.
The motivation is the desire to add an `llvm-otool` symlink to
llvm-objdump that behaves like macOS's `otool`, using the same
technique that llvm-objcopy uses to behave like `strip` (etc).
This change for the most part preserves behavior. In some cases,
it increases compatibility with GNU objdump a bit. For example,
the long options now require two dashes, and the long options
taking arguments for the most part now require a `=` in front
of the value. Exceptions are flags where tests passed the
value separately, for these the separate form is kept as
an alias to the = form.
The one-letter short form args are now joined or separate
and no longer accept a =, which also matches GNU objdump.
cl::opt<>s in libraries now have to be explicitly plumbed
through. This patch does that for --x86-asm-syntax=, but
there's hope that we can remove that again.
Differential Revision: https://reviews.llvm.org/D100433
This patch fixes the following issues alongside some refactoring:
1. Fix bugs where the StringRef for a context string outlives the underlying std::string. We now keep a string table in the profile generator to hold the std::strings. We also do the same for bracketed context strings in the profile writer.
2. Make sure profile output strictly follows (total sample, name) order. Previously, there was an inconsistency between ProfileMap's key and FunctionSamples's name, leading to inconsistent ordering. This is now fixed by introducing context profile canonicalization. Assertions are also added to make sure ProfileMap's key and FunctionSamples's name are always consistent.
3. Enhance error handling for profile writing to make sure we bubble up errors properly for both llvm-profgen and llvm-profdata when the string table is not populated correctly for an extended binary profile.
4. Keep all internal context representations bracket free. This avoids creating new strings for context trimming, merging and preinline. The getNameWithContext API is now simplified accordingly.
5. Factor out the code for context trimming and merging into SampleContextTrimmer in SampleProf.cpp. This enables llvm-profdata to use the trimmer when merging profiles. Changes in llvm-profgen will be in a separate patch.
Differential Revision: https://reviews.llvm.org/D100090
The tests compare IPC statistics that MCA provides with IPC values
measured on Cortex-A55 hardware. For hardware tests, each snippet is
run in a loop unrolled by 1000, and IPC is measured by linux-perf.
Several tests do not match the hardware: the skewed ALU is not
supported, and LDR seems to be missing a forwarding path.
Differential Revision: https://reviews.llvm.org/D98174
Clang spends a decent amount of time in the LineOffsetMapping::get(...)
function. This function used to be vectorized (through SSE2) then the
optimization got dropped because the sequential version was on-par performance
wise.
This provides an optimization of the sequential version that works on a word at
a time, using (documented) bithacks to provide a portable vectorization.
When preprocessing the sqlite amalgamation, this yields a sweet 3% speedup.
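The word-at-a-time idea can be sketched in Python as follows. This is an illustration of the well-known "has zero byte" bithack applied to newline detection; it only handles `\n` and is not the actual Clang implementation:

```python
MASK64 = (1 << 64) - 1
ONES = 0x0101010101010101
HIGHS = 0x8080808080808080

def has_byte(word, value):
    """True if any of the 8 bytes in the 64-bit word equals value."""
    # XOR turns matching bytes into zero, then apply the zero-byte test.
    x = word ^ (ONES * value)
    return ((x - ONES) & ~x & HIGHS & MASK64) != 0

def line_offsets(buf):
    """Return the offset of the first byte of every line in buf."""
    offsets = [0]
    i, n = 0, len(buf)
    while i + 8 <= n:
        word = int.from_bytes(buf[i:i + 8], "little")
        # Only fall back to a byte-wise scan when the word holds a newline.
        if has_byte(word, 0x0A):
            for j in range(i, i + 8):
                if buf[j] == 0x0A:
                    offsets.append(j + 1)
        i += 8
    # Handle the tail that does not fill a whole word.
    for j in range(i, n):
        if buf[j] == 0x0A:
            offsets.append(j + 1)
    return offsets
```

Most words in typical source text contain no newline, so the common path is one load and a handful of arithmetic operations per eight bytes.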
Differential Revision: https://reviews.llvm.org/D99409
Consider the .debug_pubnames and .debug_pubtypes their own kind of
accelerator and stop emitting them together with the Apple-style
accelerator tables. The only reason we were still emitting both was for
(byte-for-byte) compatibility with dsymutil-classic.
- This patch adds a new accelerator table kind "Pub" which can be
specified with --accelerator=Pub.
- This patch removes the ability to emit both pubnames/types and apple
style accelerator tables. I don't think anyone is relying on that but
it's worth pointing out.
- This patch removes the --minimize option and makes this behavior the
default. Specifying the flag will result in a warning but won't abort
the program.
Differential revision: https://reviews.llvm.org/D99907
This way, once there's an error in the snippet file (like in the test),
llvm-exegesis won't crash with an assertion failure,
but print a nice diagnostic about the problem.
Define -fatal-warnings to make warnings fatal, and accept /WX as an ML.EXE compatible alias for it.
Also make sure that if Warning() returns true, we always treat it as an error.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D92504
Make variables and text-macro references case-insensitive, to match ml.exe.
Also improve error handling for text-macro expansion.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D92503
Encountered a crash while running a debug build, where this code path would be taken due to a mismatch in profile coverage data versions. Without consuming the error, an assert would be triggered inside the destructor of Error.
Differential Revision: https://reviews.llvm.org/D99457
dsymutil is not relocating the DW_AT_low_pc for a DW_TAG_label. This
patch fixes that and adds a test.
Differential revision: https://reviews.llvm.org/D99534
This change sets up a framework in llvm-profgen to estimate inline decisions and adjust the context-sensitive profile based on that. We call it a global pre-inliner in llvm-profgen.
It will serve two purposes:
1) Since context profile for not inlined context will be merged into base profile, if we estimate a context will not be inlined, we can merge the context profile in the output to save profile size.
2) For thinLTO, when a context involving functions from different modules is not inlined, we can't merge function profiles across modules, leading to suboptimal post-inline count quality. By estimating some inline decisions, we would be able to adjust/merge context profiles beforehand as a mitigation.
Compiler inline heuristic uses inline cost which is not available in llvm-profgen. But since inline cost is closely related to size, we could get an estimate through function size from debug info. Because the size we have in llvm-profgen is the final size, it could also be more accurate than the inline cost estimation in the compiler.
This change only has the framework, with a few TODOs left for follow up patches for a complete implementation:
1) We need to retrieve the size for each function/inlinee from debug info for inlining estimation. Currently we use the number of samples in a profile as a placeholder for size estimation.
2) Currently the thresholds are using the values used by sample loader inliner. But they need to be tuned since the size here is fully optimized machine code size, instead of inline cost based on not yet fully optimized IR.
Differential Revision: https://reviews.llvm.org/D99146
Instructions that have more uops than the processor's IssueWidth are
issued in multiple cycles.
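As a rough model of the rule above, in Python. The `issue_cycles` helper is hypothetical, and treating a zero-uop instruction as taking one cycle is an assumption of this sketch:

```python
def issue_cycles(num_uops, issue_width):
    """Number of cycles needed to issue an instruction's uops."""
    if issue_width <= 0:
        raise ValueError("issue_width must be positive")
    # Ceiling division; an instruction is assumed to occupy at least one cycle.
    return max(1, -(-num_uops // issue_width))
```

For example, a 5-uop instruction on a processor with an IssueWidth of 2 would be issued over 3 cycles.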
The patch fixes PR49712.
Differential Revision: https://reviews.llvm.org/D99339
The option `--prefix-strip` is only used when `--prefix` is not empty.
It removes N initial directories from absolute paths before adding the
prefix.
This matches GNU's objdump behavior.
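A sketch of the intended path rewriting, in Python. The `apply_prefix` helper is hypothetical, and edge cases (such as stripping more components than the path has) are not modelled after the real tool:

```python
def apply_prefix(path, prefix, strip=0):
    """Strip `strip` leading directories from an absolute path, then add prefix."""
    # Without a prefix, or for relative paths, nothing changes.
    if not prefix or not path.startswith("/"):
        return path
    components = [c for c in path.split("/") if c]
    remaining = components[strip:]
    return prefix.rstrip("/") + "/" + "/".join(remaining)
```

For example, `apply_prefix("/a/b/c.c", "/p", 1)` yields `/p/b/c.c`, while the same call with an empty prefix leaves the path untouched.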
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D96679
This is a follow-up for:
D98604 [MCA] Ensure that writes occur in-order
When instructions are aligned by the order of writes, they retire
in-order naturally. There is no need for an RCU, so it is disabled.
Differential Revision: https://reviews.llvm.org/D98628
This patch renames the "Initial" member of WasmLimits to the name used
in the spec, "Minimum".
In the core WebAssembly specification, the Limits data type has one
required "min" member and one optional "max" member, indicating the
minimum required size of the corresponding table or memory, and the
maximum size, if any.
Although the WebAssembly spec does instantiate locally-defined tables
and memories with the initial size being equal to the minimum size, it
can't impose such a requirement for imports. It doesn't make sense to
require an initial size for a memory import, for example. The compiler
can only sensibly express the minimum and maximum sizes.
See
https://github.com/WebAssembly/js-types/blob/master/proposals/js-types/Overview.md#naming-of-size-limits
for a related discussion that agrees that the right name of "initial" is
"minimum" when querying the type of a table or memory from JavaScript.
(Of course it still makes sense for JS to speak in terms of an initial
size when it explicitly instantiates memories and tables.)
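For reference, the limits structure and its binary encoding from the core spec can be sketched in Python. The `Limits` class and helper names are hypothetical, and only the basic two-flag form (no shared or 64-bit variants) is modelled:

```python
from dataclasses import dataclass
from typing import Optional

def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

@dataclass
class Limits:
    minimum: int                   # required
    maximum: Optional[int] = None  # optional

def encode_limits(limits):
    """Flag 0x00 means only a minimum; flag 0x01 means minimum and maximum."""
    if limits.maximum is None:
        return b"\x00" + uleb128(limits.minimum)
    return b"\x01" + uleb128(limits.minimum) + uleb128(limits.maximum)
```

Nothing in this encoding refers to an initial size, which is why "Minimum" is the more accurate field name.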
Differential Revision: https://reviews.llvm.org/D99186
Copy SchedRW from pseudos to real instructions so that llvm-mca has
access to it. This is NFC for normal compiler codegen, which schedules
pseudos not real instructions.
Add an llvm-mca test for some high latency double-precision instructions
as a smoke test.
Differential Revision: https://reviews.llvm.org/D99187
Before this patch, register writes were always invalidated by the
RegisterFile at instruction commit stage. So,
the RegisterFile was often losing the knowledge about the `execute
cycle` of writes already committed. While this was not problematic
for non-delayed reads, this was sometimes leading to inaccurate read
latency computations in the presence of negative read-advance cycles.
This patch fixes the issue by changing how the RegisterFile component
internally keeps track of the `execute cycle` information of each
write. On every instruction executed, the RegisterFile gets notified
by the RetireStage, so that it can internally record the execute
cycle of each executed write.
The `execute cycle` information is stored within WriteRef itself, and
it is not invalidated when the write is committed.
Exclude AArch64 mapping symbols ($x and $d) for symtab symbolization as
it was done for ARM since D95916 to bring bots back to green state.
This is implemented by setting SF_FormatSpecific such that
llvm-symbolizer will ignore them, and using this flag to re-implement
the llvm-nm --special-syms option, which makes it work for both targets.
Differential Revision: https://reviews.llvm.org/D98803
This patch adds a fallthrough bit to basic block metadata, indicating whether the basic block can fall through without taking any branches. The bit will help us avoid an Intel LBR bug which results in occasional duplicate entries at the beginning of the LBR stack.
This patch uses `MachineBasicBlock::canFallThrough()` to set the bit. This is not a const method because it eventually calls `TargetInstrInfo::analyzeBranch`, but it calls this function with the default `AllowModify=false`. So we can either make the argument to `getBBAddrMapMetadata` non-const, or we can use `const_cast` when calling `canFallThrough`. I decided to go with the latter since this is purely due to legacy code, and in general we should not allow the BasicBlock to be mutable during `getBBAddrMapMetadata`.
Reviewed By: tmsriram
Differential Revision: https://reviews.llvm.org/D96918
Fix spurious warnings for missing symbols with ThinLTO. ThinLTO
appends a unique suffix to exported private symbols to avoid
collisions, resulting in dsymutil complaining that it couldn't find the
symbol in the object file.
rdar://75434058
Differential revision: https://reviews.llvm.org/D99125
Switch to use cold threshold from profile summary for cold context merging and trimming, instead of relying on hard coded values. Minor refactoring included for switch names, etc.
Differential Revision: https://reviews.llvm.org/D98921
This is a similarity visualization tool that accepts a Module and
passes it to the IRSimilarityIdentifier. The resulting SimilarityGroups
are output in a JSON file.
Tests are found in test/tools/llvm-sim and check for the file not found,
a bad module, and that the JSON is created correctly.
Reviewers: paquette, jroelofs, MaskRay
Recommit of: 15645d044b to fix linking
errors.
Differential Revision: https://reviews.llvm.org/D86974
This change adds an attribute field for the metadata of a context profile. Currently we have an inline attribute that indicates whether the leaf frame corresponding to a context profile was inlined in the previous build.
This will be used to help estimate inlining and will be taken into account when trimming contexts. Changes for that in llvm-profgen will follow. It will also help tuning.
Differential Revision: https://reviews.llvm.org/D98823
Previously we didn't support keeping the unique linkage name (-funique-internal-linkage-name) in llvm-profgen. As discussed in https://reviews.llvm.org/D96932, we chose to canonicalize it.
Now that "selected" is the default parameter of getCanonicalFnName as of `D96932`, we don't need to add any attribute here for the previous usage; we only fix the missing usage in the pseudo probe decoding.
Differential Revision: https://reviews.llvm.org/D98226
This allows checking for various globals (metadata/attributes/...) and
also resolves problems with globals (metadata/attributes/...) being
reused across different prefixes.
Reviewed By: sstefan1
Differential Revision: https://reviews.llvm.org/D94741
The "Inputs" subdirectory is used for all files read by the test, not
only those used as input to the execution - so even though this file is
used as a golden reference for the output of the test, it's still an
input to the test execution (it is read in the process of executing the
test).
This diff introduces --keep-undefined in llvm-objcopy/llvm-strip for Mach-O
which makes the tools preserve undefined symbols.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D97040
Context-sensitive AutoFDO profiles use a different naming scheme where full calling contexts are encoded as function names. When processing a CS profile, llvm-profdata should use full contexts instead of leaf function names.
Reviewed By: wmi, wenlei, wlei
Differential Revision: https://reviews.llvm.org/D97998
This patch uses the errno python library to print out the correct error messages instead of hardcoding the error message per platform.
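A minimal sketch of the idea, assuming the test configuration only needs the host's message text for a few errno values. The helper name `expected_message` and the chosen set of error codes are illustrative, not taken from the patch.

```python
import errno
import os


def expected_message(code: int) -> str:
    """Return the host platform's message for an errno value."""
    return os.strerror(code)


# Hypothetical table a test harness could match tool output against,
# instead of hardcoding one string per platform.
messages = {
    name: expected_message(getattr(errno, name))
    for name in ("ENOENT", "EACCES", "EINVAL", "EISDIR")
}
```

Because the strings come from the platform itself, the same table should stay correct on hosts whose wording differs.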
Reviewed By: jhenderson, ASDenysPetrov
Differential Revision: https://reviews.llvm.org/D97472
The code was using the standard isalnum function, which doesn't handle
values outside the ASCII range. Switching to llvm::isAlnum
instead ensures we don't provoke undefined behaviour, which can in some
cases result in crashes.
Reviewed by: MaskRay
Differential Revision: https://reviews.llvm.org/D97663
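The difference can be sketched as a classifier that is defined for every integer input and only accepts the ASCII digit and letter ranges. This is a Python illustration of the idea, not the LLVM implementation; `is_alnum_ascii` is a hypothetical name.

```python
def is_alnum_ascii(byte: int) -> bool:
    """ASCII-only alphanumeric test that is well defined for any integer."""
    return (
        ord("0") <= byte <= ord("9")
        or ord("A") <= byte <= ord("Z")
        or ord("a") <= byte <= ord("z")
    )


# A high byte (or the negative value a signed char would give for it)
# is simply rejected instead of being passed to a locale-dependent table.
checks = [is_alnum_ascii(v) for v in (ord("a"), ord("7"), ord("_"), 0xE9, -23)]
```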
The test was showing that when --strip-unneeded is specified for an
executable, all the symbols are stripped. However, the set of symbols
used in the test would be stripped by --strip-unneeded for an ET_REL
object too. Fix this by adding additional symbols that aren't normally
stripped by --strip-unneeded.
Reviewed by: MaskRay
Differential Revision: https://reviews.llvm.org/D97664
This change adds '-use-interfacestub' option to allow llvm-ifs
to use InterfaceStub lib when generating ELF binary.
Differential Revision: https://reviews.llvm.org/D94461
This patch adds a number of new test cases that cover various
llvm-objcopy and llvm-strip features that had missing test coverage of
various descriptions:
* --add-section - checked the shdr properties, not just the content.
* Dedicated test case for --add-symbol when there are many sections.
* Show that --change-start accepts negative values without overflow.
This was previously present but got lost between review versions.
* --dump-section - show that multiple sections can be dumped
simultaneously to different files, and that an error is reported when
a section cannot be found.
* --globalize-symbol(s) - show that symbols that are not mentioned are
not globalized, if they would otherwise be, and that missing symbols
from the list do not cause problems.
* --keep-global-symbol - show that the --regex option can be used in
conjunction with this option.
* --keep-symbol - show that the --regex option can be used in
conjunction with this option.
* --localize-symbol(s) - show that symbols that are not mentioned are
not localized, if they would otherwise be, and that missing symbols
from the list do not cause problems.
* --prefix-alloc-sections - show the behaviour of an empty string
argument and multiple arguments.
* --prefix-symbols - show the behaviour of an empty string argument and
multiple arguments. Also show the option applies to undefined symbols.
* --redefine-symbol - show that symbols with no name can be renamed,
that it is not an error if a symbol is not specified, and that the
option doesn't chain (i.e. --redefine-sym a=b --redefine-sym b=c does
not redefine a as c).
* --rename-section - show that all section flags are preserved if none
are specified. Also show that the option does not chain.
* --set-section-alignment - show that only specified sections have
their alignments changed.
* --set-section-flags - show which section flags are preserved when this
option is used. Also show that unspecified sections are not affected.
* --preserve-dates - show that -p is an alias of --preserve-dates.
* --strip-symbol - show that --regex works with this option for
llvm-objcopy as well as llvm-strip.
* --strip-unneeded-symbol(s) - show more clearly that needed symbols are
not stripped even if requested by this option.
* --allow-broken-links - show the sh_link of a symbol table is set to 0
when its string table has been removed when this option is specified.
* --weaken-symbol(s) - show that symbols that are not mentioned are not
weakened, if they would otherwise be, and that missing symbols from
the list do not cause problems.
* --wildcard - show the wildcard behaviour for several options that were
previously unchecked.
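The non-chaining behaviour described for --redefine-symbol above can be sketched as a single lookup keyed by each symbol's original name. This is an illustrative model only; `redefine_symbols` is a hypothetical helper, not llvm-objcopy code.

```python
def redefine_symbols(names, renames):
    """Rename each symbol at most once, keyed by its original name."""
    return [renames.get(name, name) for name in names]


symbols = ["a", "b", "other"]
# Models --redefine-sym a=b --redefine-sym b=c, plus a rename for a
# symbol that does not exist, which is not an error.
renames = {"a": "b", "b": "c", "missing": "x"}
result = redefine_symbols(symbols, renames)
```

Here `a` becomes `b` and the original `b` becomes `c`; `a` is never redefined as `c`.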
Reviewed by: alexshap
Differential Revision: https://reviews.llvm.org/D97666
llvm-objdump only uses one MCInstrAnalysis object, so if ARM and Thumb
code is mixed in one object, or if an object is disassembled without
explicitly setting the triple to match the ISA used, then branch and
call targets will be printed incorrectly.
This could be fixed by creating two MCInstrAnalysis objects in
llvm-objdump, like we currently do for SubtargetInfo. However, I don't
think there's any reason we need two separate sub-classes of
MCInstrAnalysis, so instead these can be merged into one, and the ISA
determined by checking the opcode of the instruction.
Differential revision: https://reviews.llvm.org/D97766
This patch adds a pipeline to support in-order CPUs such as ARM
Cortex-A55.
The in-order pipeline implements a simplified version of the Dispatch,
Scheduler and Execute stages as a single stage. The Entry and Retire
stages are common to both in-order and out-of-order pipelines.
Differential Revision: https://reviews.llvm.org/D94928
The check for whether an extended symbol index table was required
dropped the first SHN_LORESERVE sections from the sections array before
checking whether the remaining sections had symbols. Unfortunately, the
null section header is not present in this list, so the check was
skipping the first section that might be important. If that section
contained a symbol, and no subsequent ones did, the .symtab_shndx
section would not be emitted, leading to a corrupt object.
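The off-by-one can be sketched as follows, assuming the list holds one entry per section with the null section omitted, so that list position `k` corresponds to section index `k + 1`. The function names and the boolean-list model are assumptions made for illustration.

```python
SHN_LORESERVE = 0xFF00


def needs_symtab_shndx_buggy(has_symbols):
    """Drops SHN_LORESERVE entries, which skips section index SHN_LORESERVE."""
    return any(has_symbols[SHN_LORESERVE:])


def needs_symtab_shndx(has_symbols):
    """Position k holds section index k + 1, so drop one entry fewer."""
    return any(has_symbols[SHN_LORESERVE - 1:])


# Only the section whose index is exactly SHN_LORESERVE contains a symbol.
flags = [False] * SHN_LORESERVE
flags[SHN_LORESERVE - 1] = True
buggy, fixed = needs_symtab_shndx_buggy(flags), needs_symtab_shndx(flags)
```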
Also consolidate and expand test coverage in the area to cover this bug
and other aspects of the SYMTAB_SHNDX section.
Reviewed by: alexshap, MaskRay
Differential Revision: https://reviews.llvm.org/D97661
Additionally do some test tidy-ups and improve coverage of symbol
section indexes where the logical section index >= SHN_LORESERVE.
The symbol and section names in the many-section input object were
mostly shared. This patch changes them to be distinct, enabling
different operations such as --add-symbol, to be more targeted, when
using the object. It also makes the test less confusing and removes some
oddness in the symbol table order, presumably caused by the duplicate
names.
The input object was built from assembly that was of the form:
.section s1
sym1:
.section s2
sym2:
...
with a total of 65536 such occurrences. llvm-objcopy was then used to
remove the empty .text section automatically generated by MC, and
incidentally to move .strtab to the end of the object. This ensured that
the section/symbol indexes matched their name (i.e. section index 1 was
s1, section index 2 was s2 etc, and sym1 was in s1, sym2 in s2 etc).
Reviewed by: MaskRay
Differential Revision: https://reviews.llvm.org/D97660
Dangling probes are the probes associated with an empty block. This usually happens when all real instructions are optimized away from the block. There is a problem with dangling probes during the offline counts processing. The way the sample profiler works is that samples collected on the first physical instruction following a probe will be counted towards the probe. This is logically equivalent to treating the instruction next to a probe as if it is from the same block as the probe. In the dangling probe case, the real instruction following a dangling probe actually starts a new block, and samples collected on the new block may cause issues when counted towards the empty block.
To mitigate this issue, we first try to move a dangling probe around inside its owning block. If there are still native instructions preceding the probe in the same block, we can then use them as a placeholder to collect samples for the probe. A pass is added to walk each block backwards looking for probes not followed by any real instruction and moving them before the first real instruction. This is done right before the object emission.
If we are unlucky and find no such in-block preceding instructions for a probe, the solution we are taking is to tag the probe as dangling so that the samples reported for it will not be trusted by the compiler. We leave it up to the counts inference algorithm to give such probes a reasonable count. The number `UINT64_MAX` is used to mark the sample count as collected for a dangling probe.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D95962
IR symbol table does not parse inline asm. A symbol only referenced by inline
asm is not in the IR symbol table, so LTO does not know that the definition (in
another translation unit) is referenced and may internalize it, even if that
definition has `__attribute__((used))` (which lowers to `llvm.compiler.used` on
ELF targets since D97446).
```
// cabac.c
__attribute__((used)) const uint8_t ff_h264_cabac_tables[...] = {...};
// h264_cabac.c
asm("lea ff_h264_cabac_tables(%rip), %0" : ...);
```
`__attribute__((used))` is the recommended way to tell the compiler there may
be inline asm references, so the usage is perfectly fine. This patch
conservatively sets the `FB_used` bit on `llvm.compiler.used` symbols to work
around the IR symbol table limitation. Note: before D97446, Clang never emitted
symbols in the `llvm.compiler.used` list, so this change does not punish any
Clang emitted global object.
Without the patch, `ff_h264_cabac_tables` may be assigned to a non-external
partition and get internalized. Then we will get a linker error because the
`cabac.c` definition is not exposed.
Differential Revision: https://reviews.llvm.org/D97755
See original comment in 560ce2c70f
Basically the default seed value results in fewer collisions, but changes the
iteration order, which matters for a few test cases.
Differential Revision: https://reviews.llvm.org/D97396
This makes the behavior similar to cp
```
chmod u+s,g+s,o+x a
sudo llvm-strip a -o b
// With this patch, b drops set-user-ID and set-group-ID bits.
// sudo cp a b => b does not have set-user-ID or set-group-ID bits.
```
This also changes the behavior for the following case:
```
chmod u+s,g+s,o+x a
llvm-strip a
// a preserves set-user-ID and set-group-ID bits.
// This matches binutils<2.36 and probably >=2.37. 2.36 and 2.36.1 have some compatibility issues.
```
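The two cases above can be sketched as a mask applied to the permission bits. This is a simplified model with hypothetical function names; how the tool treats other special bits is not covered by the description and is left unchanged here.

```python
import stat


def mode_for_new_output(input_mode: int) -> int:
    """Writing to a separate output file: drop set-user-ID and set-group-ID."""
    return stat.S_IMODE(input_mode) & ~(stat.S_ISUID | stat.S_ISGID)


def mode_for_in_place(input_mode: int) -> int:
    """Modifying the file in place: keep the permission bits as they are."""
    return stat.S_IMODE(input_mode)


# 0o6645 models a 0o644 file after `chmod u+s,g+s,o+x`.
new_mode = mode_for_new_output(0o6645)
same_mode = mode_for_in_place(0o6645)
```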
Differential Revision: https://reviews.llvm.org/D97253
Under certain (currently unknown) conditions, llvm-profdata is outputting
profiles that have two consecutive entries in the MemOPSize section for the
value 0. This causes the PGOMemOPSizeOpt pass to output an invalid switch
instruction with two cases for 0. As mentioned, we’re not quite sure what’s
causing this to happen, but this patch prevents llvm-profdata from outputting a
profile that has this problem and gives an error with a request for a
reproducer.
Differential Revision: https://reviews.llvm.org/D92074
As discussed in D95511, this allows us to encode invalid BBAddrMap
sections to be used in more rigorous testing.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D96831
The current getFoldedSizeOf() implementation uses naive recursion, which
could be really slow when the input structure type is too complex.
This issue was first brought up in
http://llvm.org/bugs/show_bug.cgi?id=8281; this change fixes it by
adding memoization.
Differential Revision: https://reviews.llvm.org/D6594
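To illustrate why memoization helps, here is a small sketch that sums the sizes of a nested structure type whose element types are shared. It ignores padding and alignment, and the classes and `folded_size` are hypothetical stand-ins rather than the LLVM types.

```python
class IntType:
    def __init__(self, size):
        self.size = size


class StructType:
    def __init__(self, fields):
        self.fields = fields


def folded_size(ty, cache=None):
    """Sum field sizes, visiting each distinct type object only once."""
    if cache is None:
        cache = {}
    key = id(ty)
    if key in cache:
        return cache[key]
    if isinstance(ty, IntType):
        size = ty.size
    else:
        size = sum(folded_size(field, cache) for field in ty.fields)
    cache[key] = size
    return size


# Each level reuses the previous type twice. Naive recursion would make
# about 2**60 calls; with the cache there is one computation per level.
ty = IntType(1)
for _ in range(60):
    ty = StructType([ty, ty])
total = folded_size(ty)
```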
This patch fixes a bug that occurred when elfabi was supplied with a tbe
file that contains no non-local symbols. Before this patch, it wrote 0 to
sh_info of the .dynsym section, making the ELF stub file invalid.
Differential Revision: https://reviews.llvm.org/D96930
The presence or absence of an inline variable (as well as formal
parameter) with only an abstract_origin ref (without DW_AT_location)
should not change the location coverage.
It means, for both:
DW_TAG_inlined_subroutine
DW_AT_abstract_origin (0x0000004e "f")
DW_AT_low_pc (0x0000000000000010)
DW_AT_high_pc (0x0000000000000013)
DW_TAG_formal_parameter
DW_AT_abstract_origin (0x0000005a "b")
and,
DW_TAG_inlined_subroutine
DW_AT_abstract_origin (0x0000004e "f")
DW_AT_low_pc (0x0000000000000010)
DW_AT_high_pc (0x0000000000000013)
we should report 0% location coverage. If we add DW_AT_location,
for both cases the coverage should be improved.
Differential Revision: https://reviews.llvm.org/D96045
In both ADCE and BDCE (via DemandedBits) we should not remove
instructions that are not guaranteed to return. This issue was
pointed out by fhahn in the recent llvm-dev thread.
Differential Revision: https://reviews.llvm.org/D96993
Some instructions defined in TableGen files set the usesCustomInserter
bit, which means they have to be lowered by target code and aren't actually
valid instructions at the MC level. So we should treat them like pseudo
instructions.
Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D94898
We currently always store absolute filenames in coverage mapping. This
is problematic for several reasons. It poses a problem for distributed
compilation as source location might vary across machines. We are also
duplicating the path prefix potentially wasting space.
This change modifies how we store filenames in coverage mapping. Rather
than absolute paths, it stores the compilation directory and file paths
as given to the compiler, either relative or absolute. Later when
reading the coverage mapping information, we recombine relative paths
with the working directory. This approach is similar to the handling
of DW_AT_comp_dir in DWARF.
Finally, we also provide a new option, -fprofile-compilation-dir akin
to -fdebug-compilation-dir which can be used to manually override the
compilation directory which is useful in distributed compilation cases.
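The recombination step can be sketched as follows, assuming POSIX-style paths and that the reader has the stored compilation directory available. `resolve_coverage_path` is a hypothetical helper, not the coverage reader's API.

```python
import posixpath


def resolve_coverage_path(compilation_dir: str, filename: str) -> str:
    """Absolute paths are kept; relative ones are joined to the directory."""
    if posixpath.isabs(filename):
        return filename
    return posixpath.join(compilation_dir, filename)


relative = resolve_coverage_path("/build/src", "lib/a.c")
absolute = resolve_coverage_path("/build/src", "/usr/include/x.h")
```

Storing `lib/a.c` once alongside a shared directory avoids repeating the prefix for every file, and lets a different machine substitute its own directory.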
Differential Revision: https://reviews.llvm.org/D95753
ST_Data is used to model BFD `BFD_OBJECT`.
An STT_TLS symbol does not have the `BFD_OBJECT` flag in BFD.
This makes sense because an STT_TLS symbol effectively lives in a different
address space, so normal data/object properties do not apply to it.
With this change, an STT_TLS symbol will not be displayed as 'O'.
This new behavior matches objdump.
Differential Revision: https://reviews.llvm.org/D96735
lld already marks shared library defs as ExportDynamic, which prevents
potentially unsafe devirtualization of symbols defined in shared
libraries. Match that behavior in the gold plugin, and add the same
test.
Depends on D96721.
Differential Revision: https://reviews.llvm.org/D96722
1. Emit warnings for files without symbols.
2. Add -no_warning_for_no_symbols.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D95843
When the `DWOPath` is absolute, we want to use `DWOPath` as is, without prepending any other
components to the path. The `sys::path::append` does not join, but rather unconditionally appends
the paths, so something like `sys::path::append("/tmp", "/tmp/banana")` will result in
`/tmp/tmp/banana` rather than the desired `/tmp/banana`.
This then causes `llvm-dwp` to fail in the following situation:
```
$ clang -gsplit-dwarf /tmp/banana/test.c -c -o /tmp/outdir/foo.o
$ clang outdir/foo.o -o outdir/hm
$ llvm-dwarfdump outdir/hm | grep -C2 foo.dwo
DW_AT_comp_dir ("/tmp")
DW_AT_GNU_pubnames (true)
DW_AT_GNU_dwo_name ("/tmp/outdir/foo.dwo")
DW_AT_GNU_dwo_id (0xde4d396f3bf0e257)
DW_AT_low_pc (0x0000000000401100)
$ strace -o trace llvm-dwp -e outdir/hm -o outdir/hm.dwp
error: No such file or directory
$ cat trace | grep foo.dwo
openat(AT_FDCWD, "/tmp/tmp/outdir/foo.dwo", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
```
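The path handling described above can be sketched in a few lines, assuming POSIX-style paths. `naive_append` only mimics the unconditional concatenation described for `sys::path::append`, and `resolve_dwo_path` is a hypothetical name for the corrected lookup.

```python
import posixpath


def naive_append(base: str, component: str) -> str:
    """Concatenate unconditionally, even if `component` is absolute."""
    return base.rstrip("/") + "/" + component.lstrip("/")


def resolve_dwo_path(comp_dir: str, dwo_path: str) -> str:
    """Use an absolute DWO path as is; only prefix relative ones."""
    if posixpath.isabs(dwo_path):
        return dwo_path
    return naive_append(comp_dir, dwo_path)


broken = naive_append("/tmp", "/tmp/outdir/foo.dwo")
fixed = resolve_dwo_path("/tmp", "/tmp/outdir/foo.dwo")
```

`broken` reproduces the doubled `/tmp/tmp/...` path seen in the strace output, while `fixed` leaves the absolute path untouched.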
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D96678
These directives force the associated address to be interpreted as a
function or data respectively. CODE is the default when not specified.
Differential Revision: https://reviews.llvm.org/D96712
Reviewed by: MaskRay
The few options are niche. They solved a problem which was traditionally solved
with more shell commands (`llvm-readelf -n` fetches the Build ID. Then
`ln` is used to hard link the file to a directory derived from the Build ID.)
Due to their limitations, they are no longer used by Fuchsia and they don't appear to
be used elsewhere (checked with Google Search and Debian Code Search). So delete
them without a transition period.
Announcement: https://lists.llvm.org/pipermail/llvm-dev/2021-February/148446.html
Differential Revision: https://reviews.llvm.org/D96310
Some of these options have a degree of incidental coverage, or are for
Mach-O only. This patch adds dedicated ELF (where applicable) coverage.
Differential Revision: https://reviews.llvm.org/D96602
Reviewed by: rupprecht, Higuoxing
This adds colons to separate the file name from the message, removes a
duplicate space, and removes a trailing full stop from some messages.
These help bring the error messages into line with other tools, as well
as making all llvm-nm messages more self-consistent.
Differential Revision: https://reviews.llvm.org/D96601
Reviewed by: Higuoxing, rupprecht, MaskRay
This version of the patch includes a fix for the cfi failures.
(undoes the revert commit 7db390cc77)
It also undoes reverts of follow-up patches that also needed reverting
originally:
* [LTO] Add option enable NewPM with LTOCodeGenerator.
(undoes revert commit 0a17664b47)
* [LTOCodeGenerator] Use lto::Config for options (NFC).
(undoes revert commit b0a8e41cff)
The new tests added by 1487747e99 for lld
and gold plugin were largely equivalent, but the gold one was missing
one of the cases added to lld. Add that test to the gold plugin version.
It appears some instructions don't have debug location info, so the symbolizer returns an empty call stack for them, which causes a crash later in profile unwinding. We do not actually record sample info for them, so this change just filters out those instructions.
As those instructions may appear at the beginning and end of the instruction list, once they are filtered out we need to add boundary checks for IP `advance` and `backward`.
Also, for pseudo probe based profiles, we don't actually need the symbolized location info, so this change uses an empty stack for them. This could save half of the binary loading time.
Differential Revision: https://reviews.llvm.org/D96434
This includes some changes to PerfReader's input checks and command line:
1) It appears there might be thousands of leading MMAP-Event lines in the perfscript for large workloads. In this case, the 4k threshold is not sufficient to determine whether it's a hybrid sample. This change reworks `isHybridPerfScript` to go through the script without a threshold limit, checking whether there is a non-empty call stack immediately followed by an LBR sample. It stops once it finds a valid one.
2) Added several input validations for the command line switches in PerfReader.
3) Renamed the command line switch `show-disassembly` to `show-disassembly-only`; it prints to stdout and exits early, which leaves an empty output profile.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D96387
The gold LTO plugin uses a set of hooks to implement emit-llvm and capture intermediate files generated during LTO. The hooks are called by each LTO backend thread with a taskID as an argument to differentiate between threads and tasks. Currently, all threads overwrite the same file, so only the intermediate output of the last backend thread is preserved. This diff encodes the taskID into the filename.
Reviewed By: tejohnson, wenlei
Differential Revision: https://reviews.llvm.org/D96173
To align with https://reviews.llvm.org/D95547, we need to add brackets for context id before initializing the `SampleContext`.
Also added test cases for extended binary format from llvm-profgen side.
Differential Revision: https://reviews.llvm.org/D95929
Merging directories and files may produce different results on different
platforms.
Merging "./Inputs" and "source-interleave-x86_64.c" will use different
separators in POSIX and Windows.
Dedicated tests are needed for dealing with removing trailing separators
for POSIX (consider only '/') and Windows (consider '/' and '\').
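The platform difference can be sketched as a helper that strips trailing separators according to the host's separator set. This is an illustrative model; `strip_trailing_separators` is a hypothetical name and not the tool's code.

```python
def strip_trailing_separators(path: str, windows: bool) -> str:
    """Remove trailing separators: '/' on POSIX, '/' and '\\' on Windows."""
    separators = "/\\" if windows else "/"
    stripped = path.rstrip(separators)
    # Keep a lone root separator rather than returning an empty string.
    return stripped if stripped else path[:1]


posix_result = strip_trailing_separators("./Inputs//", windows=False)
posix_backslash = strip_trailing_separators("Inputs\\", windows=False)
windows_result = strip_trailing_separators("Inputs\\/", windows=True)
```

On POSIX a trailing backslash is an ordinary character and is kept, which is why the two platforms need separate tests.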
Fixes D85024.
Fixes PR46368.
Reviewed By: jhenderson, MaskRay
Differential revision: https://reviews.llvm.org/D95513
The current support only printed coredump notes, but most binaries also
contain notes. This change adds names for four FreeBSD-specific notes and
pretty-prints three of them:
NT_FREEBSD_ABI_TAG:
This note holds a 32-bit (decimal) integer containing the value of the
__FreeBSD_version macro, which is defined in crt1.o and will hold a value
such as 1300076 for a binary built on a FreeBSD 13 system.
NT_FREEBSD_ARCH_TAG:
A string containing the value of the build-time MACHINE_ARCH
NT_FREEBSD_FEATURE_CTL: A 32-bit flag that indicates to the kernel that
the binary wants certain behaviour. Examples include setting
NT_FREEBSD_FCTL_ASLR_DISABLE which tells the kernel to disable ASLR.
After this change llvm-readobj also no longer decodes coredump-only
FreeBSD notes in non-coredump files. I've also converted the
note-freebsd.s test to use yaml2obj instead of llvm-mc.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D74393
Currently, if the note name is known but the value isn't, we don't print
the contents.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D74367
When we skip a call stack starting with an external address, we should also skip the bottom LBR entry; otherwise it will cause a truncated context issue.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D95480
This change allows merging and trimming cold context profiles in llvm-profgen to solve the profile size bloat problem. Currently, when a profile's total sample count is below a threshold (configurable via a switch), it is considered cold and merged into a base context-less profile, which keeps the profile quality at least as good as the baseline (non-CS).
For example, two input profiles:
[main @ foo @ bar]:60
[main @ bar]:50
With threshold = 100, the two profiles will be merged into one with the base context, giving the result:
[bar]:110
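The merge in this example can be sketched as below, assuming each profile is just a context string mapped to a total sample count and that the base context is the leaf frame. `merge_cold_contexts` is a hypothetical helper, not llvm-profgen code.

```python
def merge_cold_contexts(profiles, cold_threshold=100):
    """Fold contexts whose total is below the threshold into the leaf's base profile."""
    merged = {}
    for context, samples in profiles.items():
        if samples < cold_threshold:
            key = context.split(" @ ")[-1]  # leaf frame, i.e. the base context
        else:
            key = context
        merged[key] = merged.get(key, 0) + samples
    return merged


result = merge_cold_contexts({"main @ foo @ bar": 60, "main @ bar": 50}, 100)
```

Both inputs are below the threshold, so they collapse into a single `bar` entry with 110 samples.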
Added two switches:
`--csprof-cold-thres=<value>`: Specifies the total samples threshold for a context profile to be considered cold, with 100 being the default. Any cold context profiles will be merged into the context-less base profile by default.
`--csprof-keep-cold`: Force profile generation to keep cold context profiles instead of dropping them. By default, any cold context will not be written to output profile.
Results:
Though we have not yet evaluated it with the latest CSSPGO, our internal branch shows neutral performance with a significantly reduced profile size. Detailed evaluation of llvm-profgen with CSSPGO will come later.
Differential Revision: https://reviews.llvm.org/D94111
Warnings have been added for three cases (PR41905): (1) missing debug info, (2)
the source file cannot be found, (3) the debug info points at a line beyond the
end of the file.
(1) is probably less useful. This was brought up once on
http://lists.llvm.org/pipermail/llvm-dev/2020-April/141264.html and two
internal users mentioned to me that it was annoying. (I personally
find the warning confusing, too.)
Users specify --source to get additional information if sources happen to be
available. If sources are not available, it should be obvious as the output
will have no interleaved source lines. The warning can be especially annoying
when using llvm-objdump -S on a bunch of files.
This patch drops the warning when there is no debug info.
(If LLVMSymbolizer::symbolizeCode returns an `Error`, there will still be
an error. There is currently no test for an `Error` return value.
The only code path is probably a broken symbol table, but we probably already emit a warning
in that case)
`source-interleave-prefix.test` has an inappropriate "malformed" test - the test simply has no
.debug_* because new llc does not produce debug info when the filename is empty (invalid).
I have tried tampering with the header of .debug_info/.debug_line but llvm-symbolizer does not warn.
This patch does not intend to add the missing test coverage.
Differential Revision: https://reviews.llvm.org/D88715
This change compresses the context string by removing cycles due to recursive functions for CS profile generation. Removing recursion cycles is a way to normalize the calling context, which is better for sample aggregation and also makes context promotion deterministic.
In the implementation, we recognize adjacent repeated frames as cycles and deduplicate them through multiple rounds of iteration.
For example:
Consider an input context string stack:
[“a”, “a”, “b”, “c”, “a”, “b”, “c”, “b”, “c”, “d”]
The first iteration removes all adjacent repeated frames of size 1:
[“a”, “b”, “c”, “a”, “b”, “c”, “b”, “c”, “d”]
The second iteration removes all adjacent repeated frames of size 2:
[“a”, “b”, “c”, “a”, “b”, “c”, “d”]
After a third iteration, which removes adjacent repeated frames of size 3, we get the compressed output:
[“a”, “b”, “c”, “d”]
Compression is called in two places: once for the sample's context key right after unwinding, and once for the eventual context string id in the ProfileGenerator.
Added a switch `compress-recursion` to control the size of duplicated frames; the default of -1 means no size limit.
Added unit tests and regression test for this.
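The iteration described above can be sketched as follows. This is an illustrative reimplementation over a plain list of frame names, not the llvm-profgen code; `compress_recursion` and its `max_size` parameter are named here only to mirror the switch.

```python
def compress_recursion(frames, max_size=-1):
    """Remove adjacent repeated frame sequences of size 1, 2, 3, ... in turn."""
    result = list(frames)
    size = 1
    while size <= len(result) // 2 and (max_size < 0 or size <= max_size):
        out = []
        for frame in result:
            out.append(frame)
            # Drop the last `size` frames if they repeat the `size` before them.
            if len(out) >= 2 * size and out[-size:] == out[-2 * size:-size]:
                del out[-size:]
        result = out
        size += 1
    return result


stack = ["a", "a", "b", "c", "a", "b", "c", "b", "c", "d"]
compressed = compress_recursion(stack)
size_one_only = compress_recursion(stack, max_size=1)
```

With no limit the example stack compresses to `a, b, c, d`; limiting the size to 1 stops after the first round.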
Differential Revision: https://reviews.llvm.org/D93556
This change implements profile generation infra for pseudo probe in llvm-profgen. During virtual unwinding, the raw profile is extracted into range counter and branch counter and aggregated to sample counter map indexed by the call stack context. This change introduces the last step and produces the eventual profile. Specifically, the body of function sample is recorded by going through each probe among the range and callsite target sample is recorded by extracting the callsite probe from branch's source.
Please refer to https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s and https://reviews.llvm.org/D89707 for more context about CSSPGO and llvm-profgen.
**Implementation**
- Extended `PseudoProbeProfileGenerator` for pseudo probe based profile generation.
- `populateBodySamplesWithProbes` reading range counter is responsible for recording function body samples and inferring caller's body samples.
- `populateBoundarySamplesWithProbes` reading branch counter is responsible for recording call site target samples.
- Each sample is recorded with its calling context (named `ContextId`). Recall that the probe based context key doesn't include the leaf frame probe info, so the `ContextId` string is created from two parts: one from the concatenation of the probe stack strings and the other from the leaf frame probe.
- Added regression test
Test Plan:
ninja & ninja check-llvm
Differential Revision: https://reviews.llvm.org/D92998
On z/OS, other error messages are not matched correctly in lit tests.
```
EDC5121I Invalid argument.
EDC5111I Permission denied.
```
This patch adds a lit substitution to fix it.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D95808
In binutils, the flag is defined for ELFOSABI_GNU and ELFOSABI_FREEBSD.
It can be used to mark a section as a GC root.
In practice, the flag has generic semantics and can be applied to many
EI_OSABI values, so we consider it generic.
Differential Revision: https://reviews.llvm.org/D95728
For x86-64 the REX.w prefix takes precedence over any other size
override (i.e. 0x66). Therefore, for x86-64 when REX.w is present set
'hasOpSize' to false to ensure that any size override is ignored.
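The precedence rule can be sketched as a small standalone function. This is only an illustration under stated assumptions: `effectiveOperandBits` is a hypothetical helper, not the disassembler's code, and it assumes a 32-bit default operand size outside of REX.W.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: picks the operand size from the prefixes seen.
// Rex is the REX byte (0x40..0x4F), or 0 when no REX prefix is present.
unsigned effectiveOperandBits(bool Is64BitMode, bool Has66Prefix, uint8_t Rex) {
  assert((Rex == 0 || (Rex & 0xF0) == 0x40) && "REX bytes are 0x40..0x4F");
  // REX only exists in 64-bit mode; bit 3 of the REX byte is the W bit.
  bool RexW = Is64BitMode && (Rex & 0x08) != 0;
  if (RexW)
    return 64; // REX.W wins; a 0x66 prefix is ignored for sizing.
  if (Has66Prefix)
    return 16; // 0x66 selects the 16-bit operand size.
  return 32;   // Assumed default operand size.
}
```

With both prefixes present, `effectiveOperandBits(true, true, 0x48)` yields 64, which mirrors clearing 'hasOpSize' when REX.w is set.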
Fixes PR48901.
Differential Revision: https://reviews.llvm.org/D95682
The switch controls both unused prefix warnings, and warnings about
functions which differ under different runs for a prefix, and, thus, end
up not having asserts for that prefix.
(If the latter case spans to all functions, then the former case kicks
in)
The switch is on by default, and can be disabled.
Differential Revision: https://reviews.llvm.org/D95829
This patch lets the YAML encoding use Hex64 values for NumBlocks, BB AddressOffset, BB Size, and BB Metadata.
Additionally, it changes the decoded values in elf2yaml to uint64_t to match DataExtractor::getULEB128 return type.
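For context on why a 64-bit type is the natural fit, here is a minimal standalone ULEB128 decoder. It is a sketch for illustration, not `DataExtractor`'s implementation; `decodeULEB128` and its parameters are names chosen here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes one ULEB128 value starting at Offset and advances Offset past it.
// Each byte contributes its low 7 bits; the high bit marks continuation.
uint64_t decodeULEB128(const std::vector<uint8_t> &Bytes, std::size_t &Offset) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  while (true) {
    assert(Offset < Bytes.size() && "truncated ULEB128");
    assert(Shift < 64 && "ULEB128 value does not fit in 64 bits");
    uint8_t Byte = Bytes[Offset++];
    Value |= static_cast<uint64_t>(Byte & 0x7f) << Shift;
    if ((Byte & 0x80) == 0)
      return Value;
    Shift += 7;
  }
}
```

A decoded value can use up to 64 bits, so storing it in a narrower type would silently truncate large offsets or sizes.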
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D95767
This is consistent with BFD objcopy.
Previously, llvm-objcopy would allocate space for SHT_NOBITS sections,
often resulting in enormous binary files.
New test case (binary-paddr.test %t6).
Reviewed By: jhenderson, MaskRay
Differential Revision: https://reviews.llvm.org/D95569
Part of the gold test added in 1487747e99
relies on more recent fixes to gold that fix the plugin behavior with
--export-dynamic-symbol and --dynamic-list. Extract those parts of the
new test into a v1.16 test.
The current dsymutil implementation of hasLiveMemoryLocation()/hasLiveAddressRange()
and applyValidRelocs() assumes that calls are made in a certain order
(from first DIEs to last). A multi-threaded implementation might call these methods
in a different order (it might process compilation units in an order other than their physical
layout), so we remove the restriction that relocations must be searched
in ascending order. This change does not introduce noticeable performance degradation.
The testing results for clang binary:
golden-dsymutil/dsymutil 23787992
clang MD5: 5efa8fd9355ebf81b65f24db5375caa2
elapsed time=91sec
build-Release/bin/dsymutil 23855616
clang MD5: 5efa8fd9355ebf81b65f24db5375caa2
elapsed time=91sec
Differential Revision: https://reviews.llvm.org/D93106
Fixes https://bugs.llvm.org/show_bug.cgi?id=48882.
If the input file does not exist (or has a reading error), the
following code will crash if there are two or more input addresses.
```
auto ResOrErr = Symbolizer.symbolizeInlinedCode(
    ModuleName, {Offset, object::SectionedAddress::UndefSection});
Printer << (error(ResOrErr) ? DILineInfo() : ResOrErr.get().getFrame(0));
```
For the first address, `symbolizeInlinedCode` returns an error.
For the second address, `symbolizeInlinedCode` returns an empty result
(not an error) and `.getFrame(0)` will crash.
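The failure mode, and one way to guard against it, can be sketched with stand-in types. `FrameInfo`, `InlinedFrames` and `firstFrameOrDefault` are hypothetical names used only for this illustration; the actual fix in the patch may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for a line-info record; "??" models an unknown function.
struct FrameInfo {
  std::string FunctionName = "??";
  unsigned Line = 0;
};

// Stand-in for an inlining result that may legitimately hold zero frames.
struct InlinedFrames {
  std::vector<FrameInfo> Frames;
  const FrameInfo &getFrame(std::size_t I) const {
    assert(I < Frames.size() && "frame index out of range");
    return Frames[I];
  }
};

// Returns the first frame, or a default record when the result is empty.
FrameInfo firstFrameOrDefault(const InlinedFrames &Result) {
  if (Result.Frames.empty())
    return FrameInfo();
  return Result.getFrame(0);
}
```

Checking only for an error is not enough here, because a successful but empty result still has no frame 0 to read.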
Differential revision: https://reviews.llvm.org/D95609
This patch updates LTOCodeGenerator to use the utilities provided by
LTOBackend to run middle-end optimizations and backend code generation.
This is a first step towards unifying the code used by libLTO's C API
and the newer, C++ interface (see PR41541).
The immediate motivation is to allow using the new pass manager when
doing LTO using libLTO's C API, which is used on Darwin, among others.
With the changes, there are no codegen/stats differences when building
MultiSource/SPEC2000/SPEC2006 on Darwin X86 with LTO, compared
to without the patch.
Reviewed By: steven_wu
Differential Revision: https://reviews.llvm.org/D94487
Compact unwind entries have 8 bits for the encoding-table offset:
* offsets 0..126 reference the global common-encodings table, while
* offsets 127..255 reference a per-second-level-page table.
This diff teaches `llvm-objdump` to print this per-page encodings table.
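The split described above can be sketched as follows. This is a simplified illustration that takes the 0..126 / 127..255 boundary from this message at face value; `resolveEncodingOffset` and the types around it are hypothetical and not part of `llvm-objdump`.

```cpp
#include <cassert>
#include <cstdint>

enum class EncodingTable { Common, PerPage };

struct EncodingRef {
  EncodingTable Table; // Which table the offset refers to.
  unsigned Index;      // Index within that table.
};

// Number of slots reserved for the global table, per the ranges above.
constexpr unsigned CommonTableSlots = 127;

// Maps an 8-bit encoding-table offset to a (table, index) pair.
EncodingRef resolveEncodingOffset(uint8_t Offset) {
  if (Offset < CommonTableSlots)
    return {EncodingTable::Common, Offset};
  unsigned PageIndex = Offset - CommonTableSlots;
  assert(PageIndex <= 128 && "8-bit offsets leave at most 129 per-page slots");
  return {EncodingTable::PerPage, PageIndex};
}
```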
Differential Revision: https://reviews.llvm.org/D93265
On z/OS, the following error message is not matched correctly in lit tests.
```
EDC5129I No such file or directory.
```
This patch uses a lit config substitution to check for platform specific error messages.
Reviewed By: muiez, jhenderson
Differential Revision: https://reviews.llvm.org/D95246
Fixes https://bugs.llvm.org/show_bug.cgi?id=43543
Currently we report "The file was not recognized as a valid object file" for BC files.
Also, we terminate dumping.
Instead we could report a better warning and try to continue dumping other files.
This is what this patch implements.
Differential revision: https://reviews.llvm.org/D95605
This patch adds the ability to evaluate the state machine for CIE and FDE unwind objects and produce a UnwindTable with all UnwindRow objects needed to unwind registers. It will also dump the UnwindTable for each CIE and FDE when dumping DWARF .debug_frame or .eh_frame sections in llvm-dwarfdump or llvm-objdump. This allows users to see what the unwind rows actually look like for a given CIE or FDE instead of just seeing a list of opcodes.
This patch adds new classes: UnwindLocation, RegisterLocations, UnwindRow, and UnwindTable.
UnwindLocation is a class that describes how to unwind a register or Call Frame Address (CFA).
RegisterLocations is a class that tracks registers and their UnwindLocations. It gets populated when parsing the DWARF call frame instruction opcodes for an unwind row. The registers are mapped from their register numbers to the UnwindLocation in a map.
UnwindRow contains the result of evaluating a row of DWARF call frame instructions for the CIE, or a row from a FDE. The CIE can produce a set of initial instructions that each FDE that points to that CIE will use as the seed for the state machine when parsing FDE opcodes. A UnwindRow for a CIE will not have a valid address, while a UnwindRow for a FDE will have a valid address.
The UnwindTable is a class that contains a sorted (by address) vector of UnwindRow objects and is the result of parsing all opcodes in a CIE, or FDE. Parsing a CIE should produce a UnwindTable with a single row. Parsing a FDE will produce a UnwindTable with one or more UnwindRow objects where all UnwindRow objects have valid addresses. The rows in the UnwindTable will be sorted from lowest Address to highest after parsing the state machine, or an error will be returned if the table isn't sorted. To parse a UnwindTable clients can use the following methods:
static Expected<UnwindTable> UnwindTable::create(const CIE *Cie);
static Expected<UnwindTable> UnwindTable::create(const FDE *Fde);
A valid table will be returned if the DWARF call frame instruction opcodes have no encoding errors. There are a few things that can go wrong during the evaluation of the state machine and these create functions will catch and return them.
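One reason a table sorted by address is convenient is that finding the row covering a given PC becomes a binary search. The sketch below uses stand-in types (`Row`, `rowsAreSorted`, `findRow` are hypothetical names), not the classes added by this patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for an unwind row: a start address plus some unwind state.
struct Row {
  uint64_t Address;
  int CfaOffset;
};

// Checks the invariant the table is expected to satisfy after parsing.
bool rowsAreSorted(const std::vector<Row> &Rows) {
  return std::is_sorted(
      Rows.begin(), Rows.end(),
      [](const Row &A, const Row &B) { return A.Address < B.Address; });
}

// Returns the last row whose Address is <= PC, or nullptr if none applies.
const Row *findRow(const std::vector<Row> &Rows, uint64_t PC) {
  assert(rowsAreSorted(Rows) && "lookup requires rows sorted by address");
  auto It = std::upper_bound(
      Rows.begin(), Rows.end(), PC,
      [](uint64_t Addr, const Row &R) { return Addr < R.Address; });
  if (It == Rows.begin())
    return nullptr;
  return &*(It - 1);
}
```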
Differential Revision: https://reviews.llvm.org/D89845
A simple refactoring patch that lets us use `DataExtractor::getSLEB128` rather than a lambda function.
Differential Revision: https://reviews.llvm.org/D95158
Currently we don't allow the following definition:
```
Sections:
  - Type: SectionHeaderTable
  - Name: .foo
    Type: SHT_PROGBITS
```
We report an error: "SectionHeaderTable can't be empty. Use 'NoHeaders' key to drop the section header table".
It was implemented this way earlier, when `SectionHeaderTable`
was a dedicated key outside of the `Sections` list and we did not
allow selecting where the table is written.
It now makes sense to allow it, because a user might
want to place the default section header table at an arbitrary position,
e.g. before other sections. In this case it is inconvenient and error-prone
to require specifying all sections:
```
Sections:
  - Type: SectionHeaderTable
    Sections:
      - Name: .foo
      - Name: .strtab
      - Name: .shstrtab
  - Name: .foo
    Type: SHT_PROGBITS
```
This patch allows empty SectionHeaderTable definitions.
Differential revision: https://reviews.llvm.org/D95341
This change adds support for context-sensitive profiles in the extended binary format. The existing sample profile reader/writer/merger code is tweaked to handle bracketed input contexts (`[...]`). The paired brackets are also needed in extbinary profiles because we don't yet have another good way to tell calling contexts apart from regular function names, since the context delimiter `@` can appear in C++ mangled names.
Reviewed By: wmi, wenlei
Differential Revision: https://reviews.llvm.org/D95547
Identify dynamically exported symbols (--export-dynamic[-symbol=],
--dynamic-list=, or definitions needed to preempt shared objects) and
prevent their LTO visibility from being upgraded.
This helps avoid use of whole program devirtualization when there may
be overrides in dynamic libraries.
Differential Revision: https://reviews.llvm.org/D91583
Imported functions and variables get their visibility from the module supplying the
definition. However, non-imported definitions do not get the result visibility, which is
(ELF) the most constraining visibility among all modules or (Mach-O) the visibility
of the prevailing definition.
This patch
* adds visibility bits to GlobalValueSummary::GVFlags
* computes the result visibility and propagates it to all definitions
Protected/hidden can imply dso_local which can enable some optimizations (this
is stronger than GVFlags::DSOLocal because the implied dso_local can be
leveraged for ELF -shared while default visibility dso_local has to be cleared
for ELF -shared).
Note: we don't have summaries for declarations, so for ELF if a declaration has
the most constraining visibility, the result visibility may not be that one.
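The two rules for the result visibility can be sketched as below. `Vis` is a hypothetical enum ordered from least to most constraining for this illustration; it is not the GVFlags encoding, and the functions are not the patch's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Ordered so that a larger value is more constraining.
enum class Vis { Default = 0, Protected = 1, Hidden = 2 };

// ELF rule: the most constraining visibility seen across all modules.
Vis elfResultVisibility(const std::vector<Vis> &PerModule) {
  Vis Result = Vis::Default;
  for (Vis V : PerModule)
    Result = std::max(Result, V);
  return Result;
}

// Mach-O rule: the visibility of the prevailing definition only.
Vis machOResultVisibility(const std::vector<Vis> &PerModule,
                          std::size_t PrevailingIndex) {
  assert(PrevailingIndex < PerModule.size() && "no such module");
  return PerModule[PrevailingIndex];
}
```

Because declarations have no summaries, an input list like `PerModule` would only cover modules that contain a definition, which is the ELF caveat noted above.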
Differential Revision: https://reviews.llvm.org/D92900
Before this change, when reading an ELF file, elfabi determined the number of
entries in .dynsym by reading the .gnu.hash section. This change makes
elfabi read the section headers directly first, which allows elfabi
to work on ELF files that do not have a .gnu.hash section.
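A minimal sketch of counting `.dynsym` entries from section headers, assuming the count is derived as size divided by entry size. The `SectionHeader` struct and `dynsymEntryCount` are hypothetical stand-ins, not elfabi's code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for the section header fields this sketch needs.
struct SectionHeader {
  uint32_t Type;
  uint64_t Size;
  uint64_t EntSize;
};

constexpr uint32_t ShtDynsym = 11; // Value of SHT_DYNSYM in the ELF spec.

static uint64_t entryCount(uint64_t Size, uint64_t EntSize) {
  assert(EntSize != 0 && "caller must reject zero-sized entries");
  return Size / EntSize;
}

// Finds the first SHT_DYNSYM header and stores its entry count in Count.
// Returns false when there is no such header or its sizes are malformed.
bool dynsymEntryCount(const std::vector<SectionHeader> &Headers,
                      uint64_t &Count) {
  for (const SectionHeader &S : Headers) {
    if (S.Type != ShtDynsym)
      continue;
    if (S.EntSize == 0 || S.Size % S.EntSize != 0)
      return false;
    Count = entryCount(S.Size, S.EntSize);
    return true;
  }
  return false;
}
```

When this returns false, a reader could still fall back to another source such as `.gnu.hash`.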
Differential Revision: https://reviews.llvm.org/D93362
There are two use cases.
Assembler
We have accrued some code gated on MCAsmInfo::useIntegratedAssembler(). Some
features are supported by the latest GNU as, but we have to gate them on
MCAsmInfo::useIntegratedAssembler() because the newer versions have not been widely
adopted (e.g. SHF_LINK_ORDER 'o' and 'unique' linkage in 2.35, --compress-debug-sections= in 2.26).
Linker
We want to use features supported only by LLD or very new GNU ld, or don't want
to work around older GNU ld. We currently can't represent that "we don't care
about old GNU ld". You can find such workarounds in a few other places, e.g.
Mips/MipsAsmPrinter.cpp PowerPC/PPCTOCRegDeps.cpp X86/X86MCInstrLower.cpp
AArch64 TLS workaround for R_AARCH64_TLSLD_MOVW_DTPREL_* (PR ld/18276),
R_AARCH64_TLSLE_LDST8_TPREL_LO12 (https://bugs.llvm.org/show_bug.cgi?id=36727 https://sourceware.org/bugzilla/show_bug.cgi?id=22969)
Mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER components (supported by LLD in D84001;
GNU ld feature request https://sourceware.org/bugzilla/show_bug.cgi?id=16833 may take a while before it is available).
This feature allows garbage collecting some unused sections (e.g. fragmented .gcc_except_table).
This patch adds `-fbinutils-version=` to clang and `-binutils-version` to llc.
It changes one codegen place in SHF_MERGE to demonstrate its usage.
`-fbinutils-version=2.35` means the produced object file does not care about GNU
ld<2.35 compatibility. When `-fno-integrated-as` is specified, the produced
assembly can be consumed by GNU as>=2.35, but older versions may not work.
`-fbinutils-version=none` means that we can use all ELF features, regardless of
GNU as/ld support.
Both clang and llc need `parseBinutilsVersion`. Such command line parsing is
usually implemented in `llvm/lib/CodeGen/CommandFlags.cpp` (LLVMCodeGen),
however, ClangCodeGen does not depend on LLVMCodeGen. So I add
`parseBinutilsVersion` to `llvm/lib/Target/TargetMachine.cpp` (LLVMTarget).
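To illustrate the intended semantics, here is a standalone sketch of parsing such a version string and comparing it against a required minimum. `parseVersionSketch` and `versionIsAtLeast` are hypothetical names using only the standard library; they are not the `parseBinutilsVersion` added by this patch, and representing `none` as the maximum version is an assumption made here.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>
#include <utility>

// Reads a run of decimal digits at Pos; returns false if there are none.
static bool consumeInt(const std::string &S, std::size_t &Pos, int &Out) {
  std::size_t Start = Pos;
  int Value = 0;
  while (Pos < S.size() && S[Pos] >= '0' && S[Pos] <= '9') {
    Value = Value * 10 + (S[Pos] - '0');
    ++Pos;
  }
  if (Pos == Start)
    return false;
  Out = Value;
  return true;
}

// Parses "MAJOR.MINOR"; "none" maps to the largest possible version so that
// every "at least" query succeeds (assumption of this sketch).
std::pair<int, int> parseVersionSketch(const std::string &S) {
  if (S == "none")
    return {INT_MAX, INT_MAX};
  std::pair<int, int> V(0, 0);
  std::size_t Pos = 0;
  if (consumeInt(S, Pos, V.first) && Pos < S.size() && S[Pos] == '.') {
    ++Pos;
    consumeInt(S, Pos, V.second);
  }
  return V;
}

// True when Have is the same as or newer than Major.Minor.
bool versionIsAtLeast(std::pair<int, int> Have, int Major, int Minor) {
  assert(Major >= 0 && Minor >= 0 && "version components are non-negative");
  return Have >= std::make_pair(Major, Minor);
}
```

A codegen site that needs a 2.35 feature would then check something like `versionIsAtLeast(V, 2, 35)` before emitting it.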
Differential Revision: https://reviews.llvm.org/D85474
We already set the `sh_entsize` field in a single place
for all non-implicit sections.
This patch reorders the logic slightly so that
there is finally only one place where `sh_entsize` is set.
obj2yaml will no longer dump the `EntSize` key for `SHT_DYNSYM/SHT_SYMTAB` sections
when the value of `sh_entsize` is equal to `sizeof(Elf_Sym)`.
Note that this also seems to have revealed an issue in llvm-objcopy:
Previously yaml2obj set the `sh_entsize` for the `.symtab` section to 0x18;
now it sets it for `SHT_SYMTAB` sections, i.e. by type.
But the `llvm-objcopy/ELF/only-keep-debug.test` has a `.symtab` section of type `SHT_STRTAB`,
and now yaml2obj sets the `sh_entsize` to 0 for it.
I had to update the corresponding check lines for `ES`, but I think the behavior of
`llvm-objcopy` should be fixed instead.
I've added a TODO and a comment.
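The `EntSize` rule for symbol tables can be sketched as follows for the 64-bit case, where the symbol entry is 0x18 bytes. `Elf64Sym`, `isSymTabType` and `shouldDumpEntSize` are stand-ins written for this illustration, not the obj2yaml code.

```cpp
#include <cassert>
#include <cstdint>

// Field layout of a 64-bit ELF symbol table entry.
struct Elf64Sym {
  uint32_t st_name;
  uint8_t st_info;
  uint8_t st_other;
  uint16_t st_shndx;
  uint64_t st_value;
  uint64_t st_size;
};
static_assert(sizeof(Elf64Sym) == 0x18, "64-bit symbol entries are 24 bytes");

constexpr uint32_t ShtSymtab = 2;  // SHT_SYMTAB
constexpr uint32_t ShtStrtab = 3;  // SHT_STRTAB
constexpr uint32_t ShtDynsym = 11; // SHT_DYNSYM

// The default is chosen by section type, not by section name.
bool isSymTabType(uint32_t Type) {
  return Type == ShtSymtab || Type == ShtDynsym;
}

// For symbol table sections, EntSize is only worth dumping when it differs
// from the size of a symbol entry.
bool shouldDumpEntSize(uint32_t Type, uint64_t EntSize) {
  assert(isSymTabType(Type) && "sketch only models symbol table sections");
  return EntSize != sizeof(Elf64Sym);
}
```

Keying on the type explains the test change: a section named `.symtab` but typed `SHT_STRTAB` fails `isSymTabType` and so no longer gets the 0x18 default.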
Differential revision: https://reviews.llvm.org/D95364
A default version (@@) is only available for defined symbols.
Currently we use "@@" for undefined symbols too.
This patch fixes the issue and improves our test case.
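The rule can be sketched as a small formatting helper. `versionedName` and its parameters are hypothetical and only illustrate the "@@ for defined default versions, @ otherwise" distinction described above.

```cpp
#include <cassert>
#include <string>

// Builds "name@version" or "name@@version". The default-version marker "@@"
// is only meaningful for defined symbols, so undefined ones always get "@".
std::string versionedName(const std::string &Name, const std::string &Version,
                          bool IsDefined, bool IsDefaultVersion) {
  assert(!Name.empty() && "symbol name expected");
  if (Version.empty())
    return Name;
  bool UseDefaultMarker = IsDefined && IsDefaultVersion;
  return Name + (UseDefaultMarker ? "@@" : "@") + Version;
}
```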
Differential revision: https://reviews.llvm.org/D95219
The llvm-dwp tool hard-codes the target triple to x86. Instead, deduce the
target triple from the object files being read.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D93749
This was discussed in D93678 thread.
Currently we have one special chunk - Fill.
This patch reimplements the "SectionHeaderTable" key to become a special chunk too.
With that we are able to place the section header table at any location,
just like we place sections.
Differential revision: https://reviews.llvm.org/D95140
In c042aff886, unused FileCheck prefixes became an error, which exposed some testing bugs in four exegesis tests. I've tried my best to either fix the testing bugs, or expand the testing to cover more scenarios.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D95287
On z/OS, the following error message is not matched correctly in lit tests. This patch updates the CHECK expression to match the end period successfully.
```
EDC5129I No such file or directory.
```
Differential Revision: https://reviews.llvm.org/D94239
This makes the following improvements.
For `SHT_GNU_versym`:
* yaml2obj: set `sh_link` to index of `.dynsym` section automatically.
For `SHT_GNU_verdef`:
* yaml2obj: set `sh_link` to index of `.dynstr` section automatically.
* yaml2obj: set `sh_info` field automatically.
* obj2yaml: don't dump the `Info` field when its value matches the number of version definitions.
For `SHT_GNU_verneed`:
* yaml2obj: set `sh_link` to index of `.dynstr` section automatically.
* yaml2obj: set `sh_info` field automatically.
* obj2yaml: don't dump the `Info` field when its value matches the number of version dependencies.
Also simplifies a few test cases.
Differential revision: https://reviews.llvm.org/D94956
On z/OS, the error message "EDC5111I Permission denied." is not matched correctly in lit tests. This patch updates the check expression to match successfully.
Differential Revision: https://reviews.llvm.org/D94432
This patch handles cases where we have to save/restore the link register
on the stack and load/store instructions that use the stack are
part of the outlined region. It checks that the new offset will not
introduce an overflow and fixes up these instructions accordingly.
Differential Revision: https://reviews.llvm.org/D92934
On z/OS, the following error message is not matched correctly in lit tests. This patch updates the CHECK expression to match successfully.
```
EDC5129I No such file or directory.
```
Reviewed By: muiez
Differential Revision: https://reviews.llvm.org/D94239