This change adds tests specifically for --parent-recurse-depth, --quiet
and -o. The test for -o found a typo in an error message which is also
fixed in this change.
Differential Revision: https://reviews.llvm.org/D103250
Summary: Under the option --section-headers, we can only
print the section types of TEXT, DATA, and BSS for now.
This patch adds the DEBUG type.
Reviewed By: jhenderson, Higuoxing
Differential Revision: https://reviews.llvm.org/D102603
In objdump, many targets support `-M no-aliases`. Instead of having a
`-*-no-aliases` for each target when LLVM adds the support, it makes more sense
to introduce objdump style `-M`.
-riscv-arch-reg-names is removed. -riscv-no-aliases has too many uses and thus is retained for now.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D103004
Conservatively use the instruction latency to compute the last write-back cycle.
Before this patch, the last write cycle computation was incorrect for store
instructions that didn't declare any register writes.
Match what's documented in the Intel AOM (+Agner) - PSHUFB xmm is really slow, and mmx/xmm vector shifts are half rate.
Noticed while working to get the cost tables to more closely match llvm-mca analysis, in this case for shifts and truncations.
Match what's documented in the Intel AOM - the non-immediate variants of the PSLL*/PSRA*/PSRL* shift instructions require BOTH ports - this was being incorrectly modelled as EITHER port.
Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
llvm-profgen uses profile summary based cold threshold to merge and trim cold context profile. This is to strike a good balance between profile size and performance.
We've been using 99.9% as the cutoff to save profile size without affecting performance. This change switches the default cold threshold cutoff for llvm-profgen from 99.9999% to 99.9%.
Redundant switch csprof-cold-thres is also removed and tests cleaned up.
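As a rough illustration of how a summary-based cutoff turns into a count threshold, here is a minimal Python sketch. It is not the llvm-profgen implementation: the function names, the parts-per-million encoding, and the sample data are assumptions made for the example.

```python
PPM = 1_000_000  # cutoffs expressed in parts per million, e.g. 999000 == 99.9%


def cold_count_threshold(counts, cutoff_ppm):
    """Return the smallest count among the hottest entries whose
    cumulative samples reach cutoff_ppm of the total."""
    total = sum(counts)
    running = 0
    for count in sorted(counts, reverse=True):
        running += count
        # Integer arithmetic avoids floating-point rounding at the boundary.
        if running * PPM >= cutoff_ppm * total:
            return count
    return 0


def trim_cold(profiles, cutoff_ppm):
    """Keep only contexts whose sample count reaches the threshold."""
    threshold = cold_count_threshold(list(profiles.values()), cutoff_ppm)
    return {name: count for name, count in profiles.items() if count >= threshold}


# Hypothetical context profile: name -> total samples.
profiles = {"main": 900, "foo": 90, "bar": 9, "baz": 1}
print(sorted(trim_cold(profiles, 999000)))  # 99.9% cutoff
print(sorted(trim_cold(profiles, 999999)))  # 99.9999% cutoff
```

With this toy data the 99.9% cutoff drops the coldest context, while the 99.9999% cutoff keeps everything, which is the size/coverage trade-off described above.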
Differential Revision: https://reviews.llvm.org/D103071
The parseInputFile function returns an empty unique_ptr to signal an
error, like when the input file doesn't exist, or is malformed. In this
case, the tool should exit immediately rather than segfault by
dereferencing the unique_ptr later.
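The same "check the empty result before using it" pattern, sketched in Python with `None` standing in for the empty unique_ptr. `parse_input_file` and `run` here are hypothetical stand-ins, not the tool's real functions.

```python
def parse_input_file(path):
    """Hypothetical stand-in: return the file contents, or None on any error."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read()
    except OSError:
        return None


def run(path):
    """Return a process-style exit code."""
    module = parse_input_file(path)
    if module is None:
        # Bail out right away instead of using the empty result later.
        return 1
    print(len(module))
    return 0
```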
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D102891
We are using TOCEntry symbols like `LC..0` in TOC loads.
This is hard to read, requiring at least an additional step to figure
out the loaded symbols.
We should print out the name in comments.
Reviewed By: #powerpc, shchenz
Differential Revision: https://reviews.llvm.org/D102949
Match what's documented in the Intel AOM - the XMM variant of PSHUFB requires BOTH ports - this was being incorrectly modelled as EITHER port.
Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
Fixing an issue where samples collected for an untrackable frame are not reported. An untrackable frame refers to a frame whose caller is untrackable due to missing debug info or pseudo probe. Though the frame is connected to its parent frame through the frame pointer chain at runtime, the compiler cannot build the connection without debug info or pseudo probe. In such a case we just need to report the untrackable frame as the base frame and all of its child frames.
With more samples reported I'm seeing this improves the performance of an internal benchmark by 2.5%.
Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D102961
True is a bad default: the useful symbol names and `@GOTPCREL` are scrubbed.
Change the default and add global variable tests to x86-basic.ll
(renamed from x86_function_name.ll since we now also test variables).
I updated some tests to show the differences.
Updated LCPI regex to include Darwin style `LCPI_[0-9]+_[0-9]+` (no
leading dot).
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D102588
[Debugify][Original DI] Test dbg var loc preservation
This is an improvement of [0]. This adds checking of
original llvm.dbg.values()/declares() instructions in
optimizations.
We have picked a real issue that has been found with
this (actually, picked one variable location missing
from [1] and resolved the issue), and the result is
the fix for that -- D100844.
Before applying D100844, using the options from [0]
(but with this patch applied) on the compilation of GDB 7.11,
the final HTML report for the debug-info issues can be found
at [1] (please scroll down, and look for
"Summary of Variable Location Bugs"). After applying
D100844, the numbers have improved a bit -- please take
a look at [2].
[0] https://llvm.org/docs/HowToUpdateDebugInfo.html#test-original-debug-info-preservation-in-optimizations
[1] https://djolertrk.github.io/di-check-before-adce-fix/
[2] https://djolertrk.github.io/di-check-after-adce-fix/
Differential Revision: https://reviews.llvm.org/D100845
The unit test was failing because the pass that the test uses to
modify the IR didn't return 'true' from its runOnFunction(),
so the expensive-check configuration triggered an assertion.
Match what's documented in the Intel AOM - fadd/fcmp use Port1 and fmul uses Port1, but in many cases BOTH ports are required - this was being incorrectly modelled as EITHER port.
Discovered while investigating the correct fptoui costs to fix the regressions in D101555.
Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
This allows llvm-strip to be used with file names that begin with dashes.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D102825
In order to create the code regions for llvm-mca to analyze, llvm-mca creates an
AsmCodeRegionGenerator and calls AsmCodeRegionGenerator::parseCodeRegions().
Within this function, both an MCAsmParser and MCTargetAsmParser are created so
that MCAsmParser::Run() can be used to create the code regions for us.
These parser classes were created for llvm-mc so they are designed to emit code
with an MCStreamer and MCTargetStreamer that are expected to be set up and passed
into the MCAsmParser constructor. Because llvm-mca doesn’t want to emit any
code, an MCStreamerWrapper class gets created instead and passed into the
MCAsmParser constructor. This wrapper inherits from MCStreamer and overrides
many of the emit methods to just do nothing. The exception is the
emitInstruction() method which calls Regions.addInstruction(Inst).
This works well and allows llvm-mca to utilize llvm-mc’s MCAsmParser to build
our code regions, however there are a few directives which rely on the
MCTargetStreamer. llvm-mc assumes that the MCStreamer that gets passed into the
MCAsmParser’s constructor has a valid pointer to an MCTargetStreamer. Because
llvm-mca doesn’t set up an MCTargetStreamer, when the parser encounters one of
those directives, a segfault will occur.
In x86, each one of these 7 directives will cause this segfault if they exist in
the input assembly to llvm-mca:
.cv_fpo_proc
.cv_fpo_setframe
.cv_fpo_pushreg
.cv_fpo_stackalloc
.cv_fpo_stackalign
.cv_fpo_endprologue
.cv_fpo_endproc
I haven’t looked at other targets, but I wouldn’t be surprised if some of the
other ones also have certain directives which could result in this same
segfault.
My proposed solution is to simply initialize an MCTargetStreamer after we
initialize the MCStreamerWrapper. The MCTargetStreamer requires an ostream
object, but we don’t actually want any of these directives to be emitted
anywhere, so I use an ostream created with the nulls() function. Since this
needs to happen after the MCStreamerWrapper has been initialized, it needs to
happen within the AsmCodeRegionGenerator::parseCodeRegions() function. The
MCTargetStreamer also needs an MCInstPrinter which is easiest to initialize
within the main() function of llvm-mca. So this MCInstPrinter gets constructed
within main() then passed into the parseCodeRegions() function as a parameter.
(If you feel like it would be appropriate and possible to create the
MCInstPrinter within the parseCodeRegions() function, then feel free to modify
my solution. That would stop us from having to pass it into the function and
would limit its scope / lifetime.)
My solution stops the segfault from happening and still passes all of the
current (expected) llvm-mca tests. I also added a new test for x86 that checks
for this segfault on an input that includes one of the .cv_fpo directives (this
test fails without my solution, but passes with it).
As far as I can tell, all of the functions that I modified are only called from
within llvm-mca so there shouldn’t be any worries about breaking other tools.
Differential Revision: https://reviews.llvm.org/D102709
Match what's documented in the Intel AOM (and Agner/instlatx64 agree) - these are all Port0 only.
Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
This allows llvm-objcopy to be used with file names that begin with dashes.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D102665
In many cases it is helpful to know at what address the resolved function starts.
This patch adds a new StartAddress member to the DILineInfo structure.
Reviewed By: jhenderson, dblaikie
Differential Revision: https://reviews.llvm.org/D102316
Match what's documented in the Intel AOM (and Agner/instlatx64 agree) - vector integer multiplies are pipelined - all Port0, throughput = 2 @ 128bits, 1 @ 64bits.
Noticed while checking reduction costs - now that we can use in-order models in llvm-mca, the atom model is the "worst case scenario" we have in x86.
While the btver2 model states that this pattern is a zero-cycle zero-idiom
on Jaguar, that does not appear to be the case on Znver3:
here it measures as not being recognized as a dep-breaking zero-idiom,
let alone a zero-cycle one.
Unlike its legacy SSE XMM XORPS version, which measures as being 1-cycle,
this one is certainly a zero-cycle instruction, in addition to both of them
being dependency breaking.
As confirmed by exegesis measurements, and ref docs.
While both the SOG and Agner insist that it is zero-cycle,
I cannot confirm that claim. While it clearly breaks the dependency,
I cannot come up with a snippet, or measurement approach,
to end up with IPC bigger than 4, which, to me, means that it actually
consumes execution resource of an FP unit for a cycle.
Much like other LLVM binary utilities, `llvm-cov` has a symlink compatibility feature where it runs in `gcov` compatibility mode if the binary name ends in `gcov`. This is identical to invoking `llvm-cov gcov ...`.
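A minimal Python sketch of this kind of name-based dispatch. The `effective_args` helper is hypothetical, and the real tool's handling of suffixes such as `.exe` is not covered here.

```python
import os


def effective_args(argv):
    """Rewrite argv so that a binary name ending in 'gcov' behaves like
    'llvm-cov gcov ...'; any other name is passed through unchanged."""
    program = os.path.basename(argv[0])
    if program.endswith("gcov"):
        return ["llvm-cov", "gcov", *argv[1:]]
    return list(argv)


print(effective_args(["/usr/bin/gcov", "a.gcda"]))
print(effective_args(["llvm-cov", "show", "a.out"]))
```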
Differential Revision: https://reviews.llvm.org/D102299
`__mh_(execute|dylib|dylinker|bundle|preload|object)_header` are special symbols whose values hold the VMA of the Mach header to support introspection. They are attached to the first section in `__TEXT`, even though their addresses are outside `__TEXT`, and they do not refer to code.
It is normally harmless, but when the first section of `__TEXT` has no other symbols, `__mh_*_header` is considered by the disassembler when determining function boundaries. Since `__mh_*_header` refers to an address outside `__TEXT`, the boundary determination fails and disassembly quits.
Since `__TEXT,__text` normally has symbols, this bug is obscured. Experiments placing `__stubs` and `__stub_helper` first exposed the bug, since neither has symbols.
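One plausible way to keep these symbols out of the boundary computation is to filter them by name first. The Python sketch below reuses the pattern quoted above; the helper name and the idea of filtering a plain list of names are assumptions, not a description of the actual llvm-objdump change.

```python
import re

# Pattern taken from the description above.
MH_HEADER = re.compile(r"__mh_(execute|dylib|dylinker|bundle|preload|object)_header")


def symbols_for_boundaries(names):
    """Drop the special Mach header symbols from a list of symbol names."""
    return [name for name in names if MH_HEADER.fullmatch(name) is None]


print(symbols_for_boundaries(["__mh_execute_header", "_main", "__mh_dylib_header"]))
```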
Differential Revision: https://reviews.llvm.org/D101786
When making compilation relocatable, for example in distributed
compilation scenarios, we want to set compilation dir to a relative
value like `.` but this presents a problem when generating reports
because if the file path is relative as well, for example `..`, you
may end up writing files outside of the output directory.
This change introduces a flag that allows overriding the compilation
directory that's stored inside the profile with a different value that
is absolute.
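To make the problem concrete, here is a small Python sketch of how a relative compilation directory can push a report file outside the output directory. `report_path` is a hypothetical helper and the directory names are made up; it only mimics the idea of mirroring source paths below an output directory.

```python
import posixpath


def report_path(output_dir, compilation_dir, filename):
    """Resolve filename against the compilation dir, then mirror the
    result below output_dir."""
    source = posixpath.normpath(posixpath.join(compilation_dir, filename))
    return posixpath.normpath(posixpath.join(output_dir, source.lstrip("/")))


# Relative compilation dir: the '..' survives and escapes the output directory.
print(report_path("/out/report", ".", "../lib/a.c"))
# Absolute override: the '..' is resolved before mirroring.
print(report_path("/out/report", "/src/build", "../lib/a.c"))
```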
Differential Revision: https://reviews.llvm.org/D100232
Originally landed in: 6400905a61
Reverted in: 668dccc396
Fix branch coverage merging in FunctionCoverageSummary::get() for instantiation
groups.
This change corrects the implementation for the branch coverage summary to do
the same thing for branches that is done for lines and regions. That is,
across function instantiations in an instantiation group, the maximum branch
coverage found in any of those instantiations is returned, with the total
number of branches being the same across instantiations.
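The merging rule can be sketched in a few lines of Python. `BranchCoverage` and `merge_instantiations` are illustrative names, not the types used in FunctionCoverageSummary.

```python
from dataclasses import dataclass


@dataclass
class BranchCoverage:
    covered: int
    total: int


def merge_instantiations(group):
    """Summarize an instantiation group: the total is shared by all
    instantiations, and the covered count is the maximum seen in any of them."""
    total = group[0].total
    assert all(entry.total == total for entry in group)
    return BranchCoverage(max(entry.covered for entry in group), total)


print(merge_instantiations([BranchCoverage(1, 4), BranchCoverage(3, 4)]))
```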
Differential Revision: https://reviews.llvm.org/D102193
This patch adds JSON output style to llvm-symbolizer to better support CLI automation by providing a machine readable output.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D96883
There are cases where a concrete DIE with DW_TAG_subprogram can have
abstract_origin attribute, so we handle that situation as well.
Differential Revision: https://reviews.llvm.org/D101025
As confirmed by exegesis measurements, and ref docs.
It does actually execute.
While there, bump latency for MULX32rr, that seems to match measurements.
Printing pass manager invocations is fairly verbose and not super
useful.
This allows us to remove DebugLogging from pass managers and PassBuilder
since all logging (aside from analysis managers) goes through
instrumentation now.
This has the downside of never being able to print the top level pass
manager via instrumentation, but that seems like a minor downside.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D101797
Sometimes disassembler picks _REV variants of instructions
over the plain ones, which in this case exposed an issue
that the _REV variants aren't being modelled as optimizable moves.
I've verified this with llvm-exegesis.
This is not limited to zero registers.
Refs:
AMD SOG 19h, 2.9.4 Zero Cycle Move
The processor is able to execute certain register to register
mov operations with zero cycle delay.
Agner,
22.13 Instructions with no latency
Register-to-register move instructions are resolved at
the register rename stage without using any execution units.
These instructions have zero latency. It is possible to do six such
register renamings per clock cycle, and it is even possible to
rename the same register multiple times in one clock cycle.
The dwarfdump command guide shows the short options used as aliases but
these are not found in the help text unless --show-hidden is used.
Investigating other tools, some follow this pattern, while others like
llvm-objdump show aliases with --help. This change fixes the help output
to be consistent with the command guide. This includes updating alias
descriptions in the help output to use "--".
As part of this change I updated cmdline.test, including some options
that were missing testing.
Differential Revision: https://reviews.llvm.org/D101646
PR50160: we currently ignore non-PT_PHDR segments with no sections, not
accounting for their p_offset and p_filesz: this can cause an out-of-bounds write
in `writeSegmentData` if the p_offset+p_filesz is larger than the total file
size.
This can be fixed by setting p_offset=p_filesz=0. The logic nicely unifies with
the logic added in D90897.
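A minimal Python sketch of the fix as described: zero out `p_offset` and `p_filesz` for non-PT_PHDR segments with no sections so that a later write cannot run past the end of the file. The `Segment` class and helper names are hypothetical; only the `PT_PHDR` value comes from the ELF specification.

```python
from dataclasses import dataclass, field

PT_LOAD = 1
PT_PHDR = 6  # ELF program header type for the program header table itself


@dataclass
class Segment:
    p_type: int
    p_offset: int
    p_filesz: int
    sections: list = field(default_factory=list)


def normalize(segment):
    """Reset the file range of a sectionless, non-PT_PHDR segment."""
    if segment.p_type != PT_PHDR and not segment.sections:
        segment.p_offset = 0
        segment.p_filesz = 0
    return segment


def fits_in_file(segment, file_size):
    """True if writing the segment's data stays within the output file."""
    return segment.p_offset + segment.p_filesz <= file_size


stale = Segment(PT_LOAD, p_offset=0x200, p_filesz=0x10)
print(fits_in_file(stale, 0x100))             # range lies past the end of the file
print(fits_in_file(normalize(stale), 0x100))  # range is now empty
```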
Reviewed By: jhenderson, rupprecht
Differential Revision: https://reviews.llvm.org/D101560
The internal `cl::opt` option --x86-asm-syntax sets the AsmParser and AsmWriter
dialect. The option is used by llc and llvm-mc tests to set the AsmWriter dialect.
This patch adds -M {att,intel} as GNU objdump compatible aliases (PR43413).
Note: the dialect is initialized when the MCAsmInfo is constructed.
`MCInstPrinter::applyTargetSpecificCLOption` is called too late and its MCAsmInfo
reference is const, so changing the `cl::opt` in
`MCInstPrinter::applyTargetSpecificCLOption` is not an option, at least without
a large amount of refactoring.
Reviewed By: hoy, jhenderson, thakis
Differential Revision: https://reviews.llvm.org/D101695
Fix PR45416: the diagnostic when '=' is missing is misleading.
`FileOutputBuffer::create` returns successfully when the filename is empty
(the temporary file is `.tmp%%%%%%%`), but `FileOutputBuffer::commit` will error when
renaming `.tmp%%%%%%%` to the empty name.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D101697
Unwind info generated by MSVC tends to have relocations pointing at
static "label" symbols like "$LN4" instead of regular ones based on
the actual function's name. Try to resolve such symbols to a non-label
symbol if possible (ideally to an external symbol), to improve
the readability.
Differential Revision: https://reviews.llvm.org/D101567
When dumping multiple pieces of information (e.g. --all-headers),
there is sometimes no separator between two pieces.
This patch uses the "\nheader:\n" style, which generally improves
compatibility with GNU objdump.
Note: objdump -t/-T does not add a newline before "SYMBOL TABLE:" and "DYNAMIC SYMBOL TABLE:".
We add a newline to be consistent with other information.
`objdump -d` prints two empty lines before the first 'Disassembly of section'.
We print just one with this patch.
Differential Revision: https://reviews.llvm.org/D101796
Reapply 7368624 after revert and fix
Looking at other tools using tablegen for help output, general options
like --help are not separated from other options. This change removes
the "Generic Options" option group so the options are listed together.
The Mach-O specific option group is left unaffected.
The test help.test was modified to reflect this change.
Differential Revision: https://reviews.llvm.org/D101652
Introduce a basic schedule model for AMD Zen 3 CPUs, a.k.a. `znver3`.
This is fully built from scratch, from llvm-mca measurements
and documented reference materials.
Nothing was copied from `znver2`/`znver1`.
I believe this is in a reasonable state of completion for inclusion,
probably better than D52779 `bdver2` was :)
Namely:
* uops are pretty spot-on (at least what llvm-mca can measure)
{F16422596}
* latency is also pretty spot-on (at least what llvm-mca can measure)
{F16422601}
* throughput is within reason
{F16422607}
I haven't run many benchmarks with this,
however RawSpeed benchmarks say this is beneficial:
{F16603978}
{F16604029}
I'll call out the obvious problems there:
* i didn't really bother with X87 instructions
* i didn't really bother with obviously-microcoded/system instructions
* There is a large discrepancy in throughput for `mr` and `rm` instructions.
I'm not really sure if it's a modelling defect that needs to be fixed,
or a defect of the measurements.
* Pipe distributions are probably bad :)
I can't do much here until AMD allows that to be fixed
by documenting the appropriate counters and updating libpfm
That being said, as @RKSimon notes:
>>! In D94395#2647381, @RKSimon wrote:
> I'll mention again that all the znver* models appear to be very inaccurate wrt SIMD/FPU instructions <...>
so how much worse could this possibly be?!
Things that aren't there:
* Various tunings: zero idioms, etc. Those are follow-ups.
Differential Revision: https://reviews.llvm.org/D94395
The right symbol flag mask is ~0x7, not ~0xf.
Also emit string names for the other flags (we were missing some).
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D101548
This introduces a flag that aborts if we ever reduce to IR that fails
the verifier.
Reviewed By: swamulism, arichardson
Differential Revision: https://reviews.llvm.org/D101279
Early exit from method DispatchStage::isAvailable() if the dispatch group is
already full. Not all instructions declare at least one uOP.
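The reason for the early exit is easier to see in a small model. The Python sketch below assumes the availability check compares the required micro-ops against the remaining dispatch entries; it is a simplification for illustration, not a copy of DispatchStage::isAvailable().

```python
def is_available(num_micro_ops, available_entries, dispatch_width):
    """Can an instruction with num_micro_ops be dispatched this cycle?"""
    # A full dispatch group accepts nothing, not even a zero-uOp instruction.
    # Without this check, 0 required uOps would compare as fitting into 0 entries.
    if available_entries == 0:
        return False
    required = min(num_micro_ops, dispatch_width)
    return required <= available_entries


print(is_available(0, 0, 4))  # group already full
print(is_available(0, 2, 4))  # room left in the group
```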
Fixes PR50174.
When looking up data referenced from pdata/xdata structures, the
referenced data can be found in two different ways:
- For an unrelocated object file, it's located via a relocation
- For a relocated, linked image, the data is referenced with an
(image relative) absolute address
For the latter case, the absolute address can optionally be
described with a symbol.
For the case of an object file, there are two offsets involved; one
immediate offset encoded in the data location that is modified by
the relocation, and a section offset in the symbol.
Previously, for the ExceptionRecord field, we printed the offset
from the symbol (only) but used the immediate offset ignoring
the symbol's address (using only the symbol's section) for printing
the exception data.
Add a helper method for doing the lookup and address calculation,
for simplifying the calling code and making all the cases consistent.
This addresses an existing FIXME comment, fixing printing of the
exception data for cases where relocations point at individual
symbols in the xdata section (which is what MSVC generates) instead of
all relocations pointing at the start of the xdata section (which is
what LLVM generates).
This also fixes printing of the function name for packed entries in
linked images.
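As a sketch of the address calculation for the object file case, the snippet below combines the symbol's section offset with the immediate offset and prints both. The helper name is hypothetical, the output layout is modeled on the `func3_xdata +0x8 (0x18)` style of output, and the symbol offset of 0x10 is an assumed value chosen for the example.

```python
def format_exception_record(symbol_name, symbol_offset, immediate):
    """Show the symbol plus immediate offset, and the resulting offset
    within the section (symbol offset + immediate)."""
    address = symbol_offset + immediate
    return f"{symbol_name} +0x{immediate:X} (0x{address:X})"


# Assumed: the symbol sits 0x10 bytes into the xdata section.
print(format_exception_record("func3_xdata", 0x10, 0x8))
```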
Relanded with a format string fix in the formatSymbol function; one
can't use %X as a format string for a uint64_t. That bug has been
present since this code was added in e6971cab30.
Differential Revision: https://reviews.llvm.org/D100305
This reverts commit 3778924088.
The added test fails on at least one buildbot, by printing a reversed
combination, printing "func3_xdata +0x18 (0x8)" while it's supposed to
be "func3_xdata +0x8 (0x18)", see e.g.
https://lab.llvm.org/buildbot/#/builders/107/builds/7269. Currently
no idea how that could happen, but reverting until it can be figured
out.
Add support for LC_THREAD/LC_UNIXTHREAD
(these load commands can be copied over without any modifications).
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D101384
Add a flag to change dsymutil's behavior and force a static variable to
keep its enclosing function. The test shows a situation where that could
be useful. I'm not convinced this behavior makes sense as a default,
which is why it's behind a flag.
rdar://74918374
Differential revision: https://reviews.llvm.org/D101337
Previously printing R_386_RELATIVE relocations would trigger
`error: can't read an entry at 0x40: it goes past the end of the section (0x40)`
I found this while writing a test case for LLD (D100490).
This also includes some minor cleanup in the elf-dynamic-relcos.test
llvm-objdump test based on the newly added test.
Reviewed By: jhenderson, MaskRay
Differential Revision: https://reviews.llvm.org/D100489
This has been rather useful in our downstream CHERI target where we want
to run tests both with addrspace(0) and addrspace(200) pointers.
With this patch we can prefix the opt command with
`sed -e 's/addrspace(200)/addrspace(0)/g' -e 's/-A200-P200-G200//g'` to
test both cases using the same IR input.
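For readers less familiar with sed, the two substitutions amount to plain literal replacements. A Python equivalent, with a made-up IR snippet as input, might look like this:

```python
def to_default_address_space(ir_text):
    """Apply the same two literal substitutions as the sed command above."""
    return (
        ir_text.replace("addrspace(200)", "addrspace(0)")
        .replace("-A200-P200-G200", "")
    )


# Hypothetical input, only meant to show the effect of the rewrite.
sample = 'target datalayout = "e-m:e-A200-P200-G200"\ndefine i8 addrspace(200)* @f()'
print(to_default_address_space(sample))
```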
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95137
Previously llvm-dwp only handled DW_FORM_string and DW_FORM_GNU_str_index; with this patch it also handles DW_FORM_strx and DW_FORM_strx[1-4].
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D75485
This primarily parses a different set of options and invokes the same
resource compiler as llvm-rc normally. Additionally, it can convert
directly to an object file (which in MSVC style setups is done with the
separate cvtres tool, or by the linker).
(GNU windres also supports other conversions; from coff object file back
to .res, and from .res or object file back to .rc form; that's not yet
implemented.)
The other bigger complication lies in being able to imply or pass the
intended target triple, to let clang find the corresponding mingw sysroot
for finding include files, and for specifying the default output object
machine format.
It can be implied from the tool triple prefix, like
`<triple>-[llvm-]windres` or picked up from the windres option e.g.
`-F pe-x86-64`. In GNU windres, that option takes BFD style format names
such as pe-i386 or pe-x86-64. As libbfd in binutils doesn't support
Windows on ARM, there's no such canonical name for the ARM targets.
Therefore, as an LLVM specific extension, this option is extended to
allow passing full triples, too.
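Inferring the triple from a `<triple>-[llvm-]windres` tool name can be sketched as follows. The helper is hypothetical, and stripping a `.exe` suffix is an assumption added for the example; the mapping from `-F` format names to triples is not shown.

```python
import os


def triple_from_tool_name(argv0):
    """Return the triple prefix of a '<triple>-[llvm-]windres' name, or None."""
    name = os.path.basename(argv0)
    if name.endswith(".exe"):  # assumption: ignore a Windows executable suffix
        name = name[: -len(".exe")]
    if not name.endswith("windres"):
        return None
    prefix = name[: -len("windres")]
    if prefix.endswith("llvm-"):
        prefix = prefix[: -len("llvm-")]
    return prefix.rstrip("-") or None


print(triple_from_tool_name("x86_64-w64-mingw32-windres"))
print(triple_from_tool_name("aarch64-w64-mingw32-llvm-windres"))
print(triple_from_tool_name("llvm-windres"))
```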
Differential Revision: https://reviews.llvm.org/D100756
1. Add an accessor function to MCSymbolizer to retrieve addresses
referenced by a symbolizable operand, but not resolved to a symbol.
That way, the caller can synthesize labels at those addresses and
then retry disassembling the section.
2. Implement that in AMDGPU -- a failed symbol lookup results in the
address being added to a vector returned by the new function.
3. Use that in llvm-objdump when using MCSymbolizer (which only happens
on AMDGPU) and SymbolizeOperands is on.
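The three steps above can be sketched as a two-pass loop. This Python model uses a toy instruction list of (address, branch target) pairs and made-up label names; it only illustrates the "collect unresolved addresses, synthesize labels, retry" flow and is not the llvm-objdump or AMDGPU code.

```python
def disassemble_once(instructions, labels):
    """Return (lines, unresolved): printed lines plus branch targets that
    could not be resolved to a label."""
    lines, unresolved = [], []
    for address, target in instructions:
        if target is None:
            lines.append(f"{address:#x}: nop")
        elif target in labels:
            lines.append(f"{address:#x}: branch {labels[target]}")
        else:
            unresolved.append(target)
            lines.append(f"{address:#x}: branch {target:#x}")
    return lines, unresolved


def disassemble_with_synthesized_labels(instructions, labels):
    """Disassemble, synthesize labels for unresolved targets, then retry."""
    labels = dict(labels)
    lines, unresolved = disassemble_once(instructions, labels)
    if unresolved:
        for index, target in enumerate(sorted(set(unresolved))):
            labels[target] = f"L{index}"  # hypothetical label naming scheme
        lines, _ = disassemble_once(instructions, labels)
    return lines


program = [(0x0, 0x8), (0x4, None), (0x8, 0x0)]
print(disassemble_with_synthesized_labels(program, {0x0: "entry"}))
```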
Differential Revision: https://reviews.llvm.org/D101145
Change-Id: I19087c3bbfece64bad5a56ee88bcc9110d83989e
Initial (D96045) patch didn't handle split dwarf cases,
so this fixes that bug.
In addition, before applying this patch, we had a slowdown
that happened after the D96045. With this patch,
the slowdown will be fixed as well.
Differential Revision: https://reviews.llvm.org/D100951
The change adds support for trimming and merging cold context when merging CSSPGO profiles using llvm-profdata. This is similar to the context profile trimming in llvm-profgen, however the flexibility to trim cold context after the profile is generated can be useful.
Differential Revision: https://reviews.llvm.org/D100528
Report dangling probes for frames that have real samples collected. Dangling probes are the probes associated to an empty block. When reported, sample count on a dangling probe will not be trusted by the compiler and we will rely on the counts inference algorithm to get the probe a reasonable count. This actually fixes a bug where previously only those dangling probes with samples collected were reported.
This patch also fixes two existing issues. Pseudo probes are stored in `Address2ProbesMap` and their pointers are used in `PseudoProbeInlineTree`. Previously `std::vector` was used to store probes and the pointers to probes could be invalidated as the vector grows. I'm changing `std::vector` to `std::list` instead.
The other issue is that all outlined functions shared the same inline frame previously due to the unchanged `Index` value as the dummy inlineSite identifier.
Good results seen for SPEC2017 in general regarding profile quality.
Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D100235
Allow opting out from preprocessing with a command line argument.
Update tests to pass -no-preprocess to make it not try to use clang
(which isn't a build level dependency of llvm-rc), but add a test that
does preprocessing under clang/test/Preprocessor.
Update a few options to allow them both joined (as -DFOO) and separate
(-D BR), as rc.exe allows both forms of them.
With the verbose flag set, this prints the preprocessing command
used (which differs from what rc.exe does).
Tests under llvm/test/tools/llvm-rc only test constructing the
preprocessor commands, while tests under clang/test/Preprocessor test
actually running the preprocessor.
Differential Revision: https://reviews.llvm.org/D100755
Instructions on the transcendental unit are executed in parallel to the
normal VALU, so add this as an extra resource.
This doesn't seem to have any effect, but it should be more correct.
Differential Revision: https://reviews.llvm.org/D100123
This implements an LLVM tool that's flag- and output-compatible
with macOS's `otool` -- except for bugs, but from testing with both
`otool` and `xcrun otool-classic`, llvm-otool matches vanilla
otool's behavior very well already. It's not 100% perfect, but
it's a very solid start.
This uses the same approach as llvm-objcopy: llvm-objdump uses
a different OptTable when it's invoked as llvm-otool. This
is possible thanks to D100433.
Differential Revision: https://reviews.llvm.org/D100583
Used to model structural hazards on FP issue, where some
instructions take up 2 issue slots and others one as well
as similar structural hazards on load issue, where some
instructions take up two load lanes and others one.
Differential Revision: https://reviews.llvm.org/D98977
The `e_flags` contains a mixture of bitfields and regular ones, ensure all of them can be serialized and deserialized.
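To illustrate why a mix of multi-bit fields and single-bit flags needs care when round-tripping, here is a small Python sketch. The layout, masks and names are entirely hypothetical and do not correspond to any real ELF target.

```python
# Hypothetical layout, for illustration only (not a real ELF target).
MACH_MASK = 0x0FF                                      # multi-bit field: machine id
MACH_NAMES = {0x01: "MACH_A", 0x02: "MACH_B"}
FLAG_NAMES = {0x100: "FEATURE_X", 0x200: "FEATURE_Y"}  # independent single-bit flags


def decode(e_flags):
    """Turn an e_flags value into a list of names."""
    names = []
    # The field is compared as a whole value under its mask, not bit by bit.
    mach = e_flags & MACH_MASK
    if mach in MACH_NAMES:
        names.append(MACH_NAMES[mach])
    for bit, name in FLAG_NAMES.items():
        if e_flags & bit:
            names.append(name)
    return names


def encode(names):
    """Turn a list of names back into an e_flags value."""
    values = {name: value for value, name in {**MACH_NAMES, **FLAG_NAMES}.items()}
    e_flags = 0
    for name in names:
        e_flags |= values[name]
    return e_flags


print(decode(0x302))
print(hex(encode(decode(0x302))))
```

Note that a field value of 0x03 is not "MACH_A plus MACH_B" in this model; testing the field bit by bit would get that wrong.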
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D100250
This is similar to D83530, but for llvm-objdump.
The motivation is the desire to add an `llvm-otool` symlink to
llvm-objdump that behaves like macOS's `otool`, using the same
technique that llvm-objcopy uses to behave like `strip` (etc).
This change for the most part preserves behavior. In some cases,
it increases compatibility with GNU objdump a bit. For example,
the long options now require two dashes, and the long options
taking arguments for the most part now require a `=` in front
of the value. Exceptions are flags where tests passed the
value separately, for these the separate form is kept as
an alias to the = form.
The one-letter short form args are now joined or separate
and no longer accept a =, which also matches GNU objdump.
cl::opt<>s in libraries now have to be explicitly plumbed
through. This patch does that for --x86-asm-syntax=, but
there's hope that we can remove that again.
Differential Revision: https://reviews.llvm.org/D100433
This patch fixes the following issues, along with some refactoring:
1. Fix bugs where the StringRef for a context string outlives the underlying std::string. We now keep a string table in the profile generator to hold the std::strings. We also do the same for bracketed context strings in the profile writer.
2. Make sure profile output strictly follow (total sample, name) order. Previously, there's inconsistency between ProfileMap's key and FunctionSamples's name, leading to inconsistent ordering. This is now fixed by introducing context profile canonicalization. Assertions are also added to make sure ProfileMap's key and FunctionSamples's name are always consistent.
3. Enhanced error handling for profile writing to make sure we bubble up errors properly for both llvm-profgen and llvm-profdata when string table is not populated correctly for extended binary profile.
4. Keep all internal context representation bracket free. This avoids creating new strings for context trimming, merging and preinline. The getNameWithContext API is now simplified accordingly.
5. Factor out the code for context trimming and merging into SampleContextTrimmer in SampleProf.cpp. This enables llvm-profdata to use the trimmer when merging profiles. Changes in llvm-profgen will be in separate patch.
Differential Revision: https://reviews.llvm.org/D100090
The tests compare IPC statistics that MCA provides with IPC values
measured on Cortex-A55 hardware. For hardware tests, each snippet is
run in a loop unrolled by 1000, and IPC is measured by linux-perf.
Several tests do not match the hardware: the skewed ALU is not
supported, and LDR seems to be missing a forwarding path.
Differential Revision: https://reviews.llvm.org/D98174
Clang spends a decent amount of time in the LineOffsetMapping::get(...)
function. This function used to be vectorized (through SSE2) then the
optimization got dropped because the sequential version was on-par performance
wise.
This provides an optimization of the sequential version that works on a word at
a time, using (documented) bithacks to provide a portable vectorization.
When preprocessing the sqlite amalgamation, this yields a sweet 3% speedup.
Differential Revision: https://reviews.llvm.org/D99409
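The word-at-a-time idea can be sketched as below. This is a simplified Python illustration, not the code from the patch: it only looks for '\n' (the real function also has to deal with '\r'), and the helper names are hypothetical. It uses the classic "has zero byte" bithack on a 64-bit word to skip words that contain no newline.

```python
ONES = 0x0101010101010101
HIGHS = 0x8080808080808080


def word_has_byte(word, byte):
    """Return True if any of the 8 bytes in `word` equals `byte`."""
    x = word ^ (ONES * byte)  # bytes equal to `byte` become zero
    # Classic haszero test; the final mask keeps the result within 64 bits.
    return ((x - ONES) & ~x & HIGHS) != 0


def line_offsets(buf):
    """Return the offset of the start of every line in `buf` (bytes)."""
    offsets = [0]
    i, n = 0, len(buf)
    while i + 8 <= n:
        word = int.from_bytes(buf[i:i + 8], "little")
        if word_has_byte(word, 0x0A):
            # Only words that contain a newline are scanned byte by byte.
            for j in range(i, i + 8):
                if buf[j] == 0x0A:
                    offsets.append(j + 1)
        i += 8
    # Tail shorter than one word.
    for j in range(i, n):
        if buf[j] == 0x0A:
            offsets.append(j + 1)
    return offsets
```

Most words in typical source text contain no newline, so the common path is one load and a handful of arithmetic operations per 8 bytes.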
Consider the .debug_pubnames and .debug_pubtypes their own kind of
accelerator and stop emitting them together with the Apple-style
accelerator tables. The only reason we were still emitting both was for
(byte-for-byte) compatibility with dsymutil-classic.
- This patch adds a new accelerator table kind "Pub" which can be
specified with --accelerator=Pub.
- This patch removes the ability to emit both pubnames/types and apple
style accelerator tables. I don't think anyone is relying on that but
it's worth pointing out.
- This patch removes the --minimize option and makes this behavior the
default. Specifying the flag will result in a warning but won't abort
the program.
Differential revision: https://reviews.llvm.org/D99907
This way, once there's an error in the snippet file (like in the test),
llvm-exegesis won't crash with an assertion failure,
but print a nice diagnostic about the problem.
Define -fatal-warnings to make warnings fatal, and accept /WX as an ML.EXE compatible alias for it.
Also make sure that if Warning() returns true, we always treat it as an error.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D92504
Make variables and text-macro references case-insensitive, to match ml.exe.
Also improve error handling for text-macro expansion.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D92503
Encountered a crash while running a debug build, where this code path would be taken due to a mismatch in profile coverage data versions. Without consuming the error, an assert would be triggered inside the destructor of Error.
Differential Revision: https://reviews.llvm.org/D99457
dsymutil is not relocating the DW_AT_low_pc for a DW_TAG_label. This
patch fixes that and adds a test.
Differential revision: https://reviews.llvm.org/D99534
This change sets up a framework in llvm-profgen to estimate inline decision and adjust context-sensitive profile based on that. We call it a global pre-inliner in llvm-profgen.
It will serve two purposes:
1) Since context profile for not inlined context will be merged into base profile, if we estimate a context will not be inlined, we can merge the context profile in the output to save profile size.
2) For thinLTO, when a context involving functions from different modules is not inlined, we can't merge function profiles across modules, leading to suboptimal post-inline count quality. By estimating some inline decisions, we would be able to adjust/merge context profiles beforehand as a mitigation.
Compiler inline heuristic uses inline cost which is not available in llvm-profgen. But since inline cost is closely related to size, we could get an estimate through function size from debug info. Because the size we have in llvm-profgen is the final size, it could also be more accurate than the inline cost estimation in the compiler.
This change only has the framework, with a few TODOs left for follow up patches for a complete implementation:
1) We need to retrieve the size of a function/inlinee from debug info for inlining estimation. Currently we use the number of samples in a profile as a placeholder for size estimation.
2) Currently the thresholds are using the values used by sample loader inliner. But they need to be tuned since the size here is fully optimized machine code size, instead of inline cost based on not yet fully optimized IR.
Differential Revision: https://reviews.llvm.org/D99146
Instructions that have more uops than the processor's IssueWidth are
issued in multiple cycles.
The patch fixes PR49712.
Differential Revision: https://reviews.llvm.org/D99339
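As a rough illustration of the arithmetic, assuming every cycle can issue up to IssueWidth uops of the same instruction, the number of issue cycles is a ceiling division. The function name below is hypothetical and this is not the llvm-mca implementation.

```python
def issue_cycles(num_uops, issue_width):
    """Cycles needed to issue num_uops when at most issue_width issue per cycle."""
    if issue_width <= 0:
        raise ValueError("issue_width must be positive")
    # Ceiling division; an instruction always takes at least one cycle.
    return max(1, -(-num_uops // issue_width))
```

Under that assumption, an instruction with 5 uops on a machine with an issue width of 2 takes 3 cycles to issue.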
The option `--prefix-strip` is only used when `--prefix` is not empty.
It removes N initial directories from absolute paths before adding the
prefix.
This matches GNU's objdump behavior.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D96679
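A minimal sketch of the described behavior, in Python. The function name is hypothetical, and leaving relative paths untouched is an assumption of this sketch rather than something stated above.

```python
def apply_prefix(path, prefix, strip=0):
    """Strip `strip` leading directories from an absolute path, then add `prefix`."""
    # Without a prefix, the strip level has no effect.
    if not prefix:
        return path
    # Assumption: only absolute paths are rewritten.
    if not path.startswith("/"):
        return path
    components = [c for c in path.split("/") if c]
    return prefix + "/" + "/".join(components[strip:])
```

For example, apply_prefix("/a/b/c.c", "/root", 1) yields "/root/b/c.c", while the same call with an empty prefix returns the path unchanged.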
This is a follow-up for:
D98604 [MCA] Ensure that writes occur in-order
When instructions are aligned by the order of writes, they retire
in-order naturally. There is no need for an RCU, so it is disabled.
Differential Revision: https://reviews.llvm.org/D98628
This patch renames the "Initial" member of WasmLimits to the name used
in the spec, "Minimum".
In the core WebAssembly specification, the Limits data type has one
required "min" member and one optional "max" member, indicating the
minimum required size of the corresponding table or memory, and the
maximum size, if any.
Although the WebAssembly spec does instantiate locally-defined tables
and memories with the initial size being equal to the minimum size, it
can't impose such a requirement for imports. It doesn't make sense to
require an initial size for a memory import, for example. The compiler
can only sensibly express the minimum and maximum sizes.
See
https://github.com/WebAssembly/js-types/blob/master/proposals/js-types/Overview.md#naming-of-size-limits
for a related discussion that agrees that the right name of "initial" is
"minimum" when querying the type of a table or memory from JavaScript.
(Of course it still makes sense for JS to speak in terms of an initial
size when it explicitly instantiates memories and tables.)
Differential Revision: https://reviews.llvm.org/D99186
Copy SchedRW from pseudos to real instructions so that llvm-mca has
access to it. This is NFC for normal compiler codegen, which schedules
pseudos not real instructions.
Add an llvm-mca test for some high latency double-precision instructions
as a smoke test.
Differential Revision: https://reviews.llvm.org/D99187
Before this patch, register writes were always invalidated by the
RegisterFile at instruction commit stage. So,
the RegisterFile was often losing the knowledge about the `execute
cycle` of writes already committed. While this was not problematic
for non-delayed reads, this was sometimes leading to inaccurate read
latency computations in the presence of negative read-advance cycles.
This patch fixes the issue by changing how the RegisterFile component
internally keeps track of the `execute cycle` information of each
write. On every instruction executed, the RegisterFile gets notified
by the RetireStage, so that it can internally record the execute
cycle of each executed write.
The `execute cycle` information is stored within WriteRef itself, and
it is not invalidated when the write is committed.
Exclude AArch64 mapping symbols ($x and $d) from symtab symbolization, as
was done for ARM in D95916, to bring bots back to a green state.
This is implemented by setting SF_FormatSpecific so that
llvm-symbolizer will ignore them, and by using this flag to re-implement
the llvm-nm --special-syms option, which makes it work for both targets.
Differential Revision: https://reviews.llvm.org/D98803
This patch adds a fallthrough bit to basic block metadata, indicating whether the basic block can fall through without taking any branches. The bit will help us avoid an Intel LBR bug which results in occasional duplicate entries at the beginning of the LBR stack.
This patch uses `MachineBasicBlock::canFallThrough()` to set the bit. This is not a const method because it eventually calls `TargetInstrInfo::analyzeBranch`, but it calls this function with the default `AllowModify=false`. So we can either make the argument to `getBBAddrMapMetadata` non-const, or we can use `const_cast` when calling `canFallThrough`. I decided to go with the latter since this is purely due to legacy code, and in general we should not allow the BasicBlock to be mutable during `getBBAddrMapMetadata`.
Reviewed By: tmsriram
Differential Revision: https://reviews.llvm.org/D96918
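Packing such per-block properties into a single metadata value can be sketched as below. This is a Python illustration only; the flag names and bit positions are assumptions made for the example and are not taken from the patch.

```python
# Hypothetical bit positions, chosen only for this illustration.
HAS_RETURN = 1 << 0
HAS_TAIL_CALL = 1 << 1
IS_EH_PAD = 1 << 2
CAN_FALL_THROUGH = 1 << 3


def encode_metadata(has_return, has_tail_call, is_eh_pad, can_fall_through):
    """Pack four boolean block properties into one integer."""
    value = 0
    if has_return:
        value |= HAS_RETURN
    if has_tail_call:
        value |= HAS_TAIL_CALL
    if is_eh_pad:
        value |= IS_EH_PAD
    if can_fall_through:
        value |= CAN_FALL_THROUGH
    return value


def can_fall_through(metadata):
    """Read the fallthrough bit back from a packed metadata value."""
    return bool(metadata & CAN_FALL_THROUGH)
```

A consumer that only cares about fallthrough can test the one bit and ignore the rest of the value.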
Fix spurious warnings for missing symbols with thinLTO. The latter
appends a unique suffix to avoid collisions for exported private
symbols, resulting in dsymutil complaining it couldn't find the symbol
in the object file.
rdar://75434058
Differential revision: https://reviews.llvm.org/D99125
Switch to use cold threshold from profile summary for cold context merging and trimming, instead of relying on hard coded values. Minor refactoring included for switch names, etc.
Differential Revision: https://reviews.llvm.org/D98921
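One way to derive a count threshold from a summary cutoff is sketched below in Python. This is an assumption-laden illustration, not the ProfileSummary implementation: the function name is hypothetical, and the cutoff is expressed in parts per million to avoid floating-point comparisons.

```python
def count_threshold(counts, cutoff_ppm):
    """Smallest count among the hottest entries that together cover the cutoff.

    cutoff_ppm is the fraction of total samples to cover, in parts per
    million (e.g. 999000 for 99.9%).
    """
    total = sum(counts)
    if total == 0:
        return 0
    running = 0
    for c in sorted(counts, reverse=True):
        running += c
        # Integer comparison equivalent to running / total >= cutoff.
        if running * 1_000_000 >= cutoff_ppm * total:
            return c
    return 0
```

Under this sketch, a context whose total sample count falls below the returned threshold would be treated as cold and become a candidate for merging or trimming.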
This is a similarity visualization tool that accepts a Module and
passes it to the IRSimilarityIdentifier. The resulting SimilarityGroups
are output in a JSON file.
Tests are found in test/tools/llvm-sim and check for the file not found,
a bad module, and that the JSON is created correctly.
Reviewers: paquette, jroelofs, MaskRay
Recommit of: 15645d044b to fix linking
errors.
Differential Revision: https://reviews.llvm.org/D86974
This change adds an attribute field for metadata of context profile. Currently we have an inline attribute that indicates whether the leaf frame corresponding to a context profile was inlined in the previous build.
This will be used to help estimating inlining and be taken into account when trimming context. Changes for that in llvm-profgen will follow. It will also help tuning.
Differential Revision: https://reviews.llvm.org/D98823
Previously, llvm-profgen did not support keeping the unique linkage name (-funique-internal-linkage-names). As discussed in https://reviews.llvm.org/D96932, we choose to canonicalize it.
Now since "selected" is set as the default parameter of getCanonicalFnName in `D96932`, we don't need to add any attribute here for the previous usage and only fix the missing usage in the pseudo probe decoding.
Differential Revision: https://reviews.llvm.org/D98226
This allows checking for various globals (metadata/attributes/...) and
also resolves problems with globals (metadata/attributes/...) being
reused across different prefixes.
Reviewed By: sstefan1
Differential Revision: https://reviews.llvm.org/D94741
The "Inputs" subdirectory is used for all files read by the test, not
only those used as input to the execution - so even though this file is
used as a golden reference for the output of the test, it's still an
input to the test execution (it is read in the process of executing the
test).
This diff introduces --keep-undefined in llvm-objcopy/llvm-strip for Mach-O
which makes the tools preserve undefined symbols.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D97040
Context-sensitive AutoFDO profile has a different name scheme where full calling contexts are encoded as function names. When processing a CS profile, llvm-profdata should use full contexts instead of leaf function names.
Reviewed By: wmi, wenlei, wlei
Differential Revision: https://reviews.llvm.org/D97998
This patch uses the errno python library to print out the correct error messages instead of hardcoding the error message per platform.
Reviewed By: jhenderson, ASDenysPetrov
Differential Revision: https://reviews.llvm.org/D97472
The code was using the standard isalnum function, which doesn't handle
values outside the ASCII range. Switching to using llvm::isAlnum
instead ensures we don't provoke undefined behaviour, which can in some
cases result in crashes.
Reviewed by: MaskRay
Differential Revision: https://reviews.llvm.org/D97663
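An ASCII-only classification that is well defined for every integer input can be sketched as below. This is a Python illustration of the idea, with a hypothetical function name; it is not the llvm::isAlnum source.

```python
def is_alnum_ascii(ch):
    """Return True only for the ASCII digits and letters; `ch` is an integer."""
    return (
        0x30 <= ch <= 0x39      # '0'..'9'
        or 0x41 <= ch <= 0x5A   # 'A'..'Z'
        or 0x61 <= ch <= 0x7A   # 'a'..'z'
    )
```

Because it is a pure range check, negative values and bytes above 0x7F simply return False instead of indexing into a locale table.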
The test was showing that when --strip-unneeded is specified for an
executable, all the symbols are stripped. However, the set of symbols
used in the test would be stripped by --strip-unneeded for an ET_REL
object too. Fix this by adding additional symbols that aren't normally
stripped by --strip-unneeded.
Reviewed by: MaskRay
Differential Revision: https://reviews.llvm.org/D97664
This change adds '-use-interfacestub' option to allow llvm-ifs
to use InterfaceStub lib when generating ELF binary.
Differential Revision: https://reviews.llvm.org/D94461
This patch adds a number of new test cases that cover various
llvm-objcopy and llvm-strip features that had missing test coverage of
various descriptions:
* --add-section - checked the shdr properties, not just the content.
* Dedicated test case for --add-symbol when there are many sections.
* Show that --change-start accepts negative values without overflow.
This was previously present but got lost between review versions.
* --dump-section - show that multiple sections can be dumped
simultaneously to different files, and that an error is reported when
a section cannot be found.
* --globalize-symbol(s) - show that symbols that are not mentioned are
not globalized, if they would otherwise be, and that missing symbols
from the list do not cause problems.
* --keep-global-symbol - show that the --regex option can be used in
conjunction with this option.
* --keep-symbol - show that the --regex option can be used in
conjunction with this option.
* --localize-symbol(s) - show that symbols that are not mentioned are
not localized, if they would otherwise be, and that missing symbols
from the list do not cause problems.
* --prefix-alloc-sections - show the behaviour of an empty string
argument and multiple arguments.
* --prefix-symbols - show the behaviour of an empty string argument and
multiple arguments. Also show the option applies to undefined symbols.
* --redefine-symbol - show that symbols with no name can be renamed,
that it is not an error if a symbol is not specified, and that the
option doesn't chain (i.e. --redefine-sym a=b --redefine-sym b=c does
not redefine a as c).
* --rename-section - show that all section flags are preserved if none
are specified. Also show that the option does not chain.
* --set-section-alignment - show that only specified sections have
their alignments changed.
* --set-section-flags - show which section flags are preserved when this
option is used. Also show that unspecified sections are not affected.
* --preserve-dates - show that -p is an alias of --preserve-dates.
* --strip-symbol - show that --regex works with this option for
llvm-objcopy as well as llvm-strip.
* --strip-unneeded-symbol(s) - show more clearly that needed symbols are
not stripped even if requested by this option.
* --allow-broken-links - show the sh_link of a symbol table is set to 0
when its string table has been removed when this option is specified.
* --weaken-symbol(s) - show that symbols that are not mentioned are not
weakened, if they would otherwise be, and that missing symbols from
the list do not cause problems.
* --wildcard - show the wildcard behaviour for several options that were
previously unchecked.
Reviewed by: alexshap
Differential Revision: https://reviews.llvm.org/D97666
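The non-chaining behaviour of --redefine-symbol noted in the list above can be illustrated with a small Python sketch. The function name is hypothetical and this models only the renaming rule, not llvm-objcopy itself: each original name is looked up once, so a=b followed by b=c does not turn a into c.

```python
def redefine_symbols(names, renames):
    """Rename each symbol at most once, based on its original name."""
    # A single lookup per symbol means renames never chain.
    return [renames.get(name, name) for name in names]
```

For example, with the renames a=b and b=c, the symbols a, b, x become b, c, x.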
llvm-objdump only uses one MCInstrAnalysis object, so if ARM and Thumb
code is mixed in one object, or if an object is disassembled without
explicitly setting the triple to match the ISA used, then branch and
call targets will be printed incorrectly.
This could be fixed by creating two MCInstrAnalysis objects in
llvm-objdump, like we currently do for SubtargetInfo. However, I don't
think there's any reason we need two separate sub-classes of
MCInstrAnalysis, so instead these can be merged into one, and the ISA
determined by checking the opcode of the instruction.
Differential revision: https://reviews.llvm.org/D97766
This patch adds a pipeline to support in-order CPUs such as ARM
Cortex-A55.
In-order pipeline implements a simplified version of Dispatch,
Scheduler and Execute stages as a single stage. Entry and Retire
stages are common for both in-order and out-of-order pipelines.
Differential Revision: https://reviews.llvm.org/D94928
The check for whether an extended symbol index table was required
dropped the first SHN_LORESERVE sections from the sections array before
checking whether the remaining sections had symbols. Unfortunately, the
null section header is not present in this list, so the check was
skipping the first section that might be important. If that section
contained a symbol, and no subsequent ones did, the .symtab_shndx
section would not be emitted, leading to a corrupt object.
Also consolidate and expand test coverage in the area to cover this bug
and other aspects of the SYMTAB_SHNDX section.
Reviewed by: alexshap, MaskRay
Differential Revision: https://reviews.llvm.org/D97661
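The off-by-one can be made concrete with a small Python sketch. The function names and the boolean-list representation are hypothetical; the only fixed fact is the ELF constant SHN_LORESERVE (0xff00). Because the null section header is not in the list, the entry at list position i has section header index i + 1.

```python
SHN_LORESERVE = 0xFF00


def needs_symtab_shndx(section_has_symbols):
    """section_has_symbols[i] describes the section with header index i + 1."""
    # Header index SHN_LORESERVE lives at list position SHN_LORESERVE - 1.
    first = SHN_LORESERVE - 1
    return any(section_has_symbols[first:])


def needs_symtab_shndx_buggy(section_has_symbols):
    """Old behaviour: skips one section too many."""
    return any(section_has_symbols[SHN_LORESERVE:])
```

If the only section with symbols is the one whose header index is exactly SHN_LORESERVE, the fixed check reports that an extended index table is needed while the old check does not.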