Adds a fallback to use the debuginfod client library (386655) in `findDebugBinary`.
Fixed a cast of Erorr::success() to Expected<> in debuginfod library.
Added Debuginfod to Symbolize deps in gn.
Adds new symbolizer symbols to `global_symbols.txt`.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D113717
This change adds options to llvm-ifs to allow it to generate multiple
types of stub files at a single invocation.
Differential Revision: https://reviews.llvm.org/D115024
Since total sample and body sample are used to compute hotness threshold in compiler, we found in some services changing the total samples computation will cause noticeable regression. Hence, here we will revert the changes and just keep all total samples number identical to the old tool.
Three changes in this diff:
1. Revert previous diff(https://reviews.llvm.org/D112672: [llvm-profgen] Update total samples by accumulating all its body samples) and put it under a switch.
2. Keep the negative line number. Although compiler doesn't consume the count but it will be used to compute hot threshold.
3. Change to accumulate total samples per byte instead of per instruction.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D115013
This change allows to trim the profile if it's considered to be cold for baseline AutoFDO. We reuse the cold threshold from `ProfileSummaryBuilder::getColdCountThreshold(..)` which can be set by percent(--profile-summary-cutoff-cold) or by value(--profile-summary-cold-count).
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D113785
This reverts commit 02cc8d698c because it
caused buildbot failures. The issue appears to be simply that we need to
only enable debuginfod when the HTTPClient has been initialized by the
running tool, since InitLLVM does not do the initialization step anymore.
Adds a fallback to use the debuginfod client library (386655) in `findDebugBinary`.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D113717
This patch implements PAC return address signing for armv8-m. This patch roughly
accomplishes the following things:
- PAC and AUT instructions are generated.
- They're part of the stack frame setup, so that shrink-wrapping can move them
inwards to cover only part of a function
- The auth code generated by PAC is saved across subroutine calls so that AUT
can find it again to check
- PAC is emitted before stacking registers (so that the SP it signs is the one
on function entry).
- The new pseudo-register ra_auth_code is mentioned in the DWARF frame data
- With CMSE also in use: PAC is emitted before stacking FPCXTNS, and AUT
validates the corresponding value of SP
- Emit correct unwind information when PAC is replaced by PACBTI
- Handle tail calls correctly
Some notes:
We make the assembler accept the `.save {ra_auth_code}` directive that is
emitted by the compiler when it saves a register that contains a
return address authentication code.
For EHABI we need to have the `FrameSetup` flag on the instruction and
handle the `t2PACBTI` opcode (identically to `t2PAC`), so we can emit
`.save {ra_auth_code}`, instead of `.save {r12}`.
For PACBTI-M, the instruction which computes return address PAC should use SP
value before adjustment for the argument registers save are (used for variadic
functions and when a parameter is is split between stack and register), but at
the same it should be after the instruction that saves FPCXT when compiling a
CMSE entry function.
This patch moves the varargs SP adjustment after the FPCXT save (they are never
enabled at the same time), so in a following patch handling of the `PAC`
instruction can be placed between them.
Epilogue emission code adjusted in a similar manner.
PACBTI-M code generation should not emit any instructions for architectures
v6-m, v8-m.base, and for A- and R-class cores. Diagnostic message for such cases
is handled separately by a future ticket.
note on tail calls:
If the called function has four arguments that occupy registers `r0`-`r3`, the
only option for holding the function pointer itself is `r12`, but this register
is used to keep the PAC during function/prologue epilogue and clobbers the
function pointer.
When we do the tail call we need the five registers (`r0`-`r3` and `r12`) to
keep six values - the four function arguments, the function pointer and the PAC,
which is obviously impossible.
One option would be to authenticate the return address before all callee-saved
registers are restored, so we have a scratch register to temporarily keep the
value of `r12`. The issue with this approach is that it violates a fundamental
invariant that PAC is computed using CFA as a modifier. It would also mean using
separate instructions to pop `lr` and the rest of the callee-saved registers,
which would offset the advantages of doing a tail call.
Instead, this patch disables indirect tail calls when the called function take
four or more arguments and the return address sign and authentication is enabled
for the caller function, conservatively assuming the caller function would spill
LR.
This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension
The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:
https://developer.arm.com/documentation/ddi0553/latest
The following people contributed to this patch:
- Momchil Velikov
- Ties Stuij
Reviewed By: danielkiss
Differential Revision: https://reviews.llvm.org/D112429
This reverts commit 8cd61aac00.
I had missed one place in a test that needed updating; it passed on my
dirty build tree but not on a clean one.
Original commit message:
D114478 identified testing gaps; this patch fills them.
Differential Revision: https://reviews.llvm.org/D114913
Ignore undefined symbols; other minor code cleanup.
Replace test objects and their asm source with a yaml equivalent.
Differential Revision: https://reviews.llvm.org/D114478
The first 8b of each raw profile section need to be aligned to 8b since
the first item in each section is a u64 count of the number of items in
the section.
Summary of changes:
* Assert alignment when reading counts.
* Update test to check alignment, relax some size checks to allow padding.
* Update raw binary inputs for llvm-profdata tests.
Differential Revision: https://reviews.llvm.org/D114826
In order to support generating profile with FS discriminator, three kind of changes are done in llvm-profgen:
1) Dissassemble .rodata section to check if FS discriminator var ('"__llvm_fs_discriminator__"') exists and set the corresponding flag in the binary.
2) Change the discriminator decoding in `getBaseDiscriminator` and `getDuplicationFactor`.
3) set true for `FunctionSamples::ProfileIsFS` to enable FS functionality in ProfileData.
Reviewed By: xur, hoy, wenlei
Differential Revision: https://reviews.llvm.org/D113296
The memprof profile reader tests rely on binary data which is generated
from and meant to be interpreted on little endian architectures. Add a
REQUIRES: x86_64-linux clause to both tests to ensure they don't fail on big
endian targets such as ppc.
This commit adds initial support to llvm-profdata to read and print
summaries of raw memprof profiles.
Summary of changes:
* Refactor shared defs to MemProfData.inc
* Extend show_main to display memprof profile summaries.
* Add a simple raw memprof profile reader.
* Add a couple of tests to tools/llvm-profdata.
Differential Revision: https://reviews.llvm.org/D114286
AutoFDO performance is sensitive to profile density, i.e., the amount of samples in the profile relative to the program size, because profiles with insufficient samples could be inaccurate due to statistical noise and thus hurt AutoFDO performance. A previous investigation showed that AutoFDO performed better on MySQL with increased amount of samples. Therefore, we implement a profile-density computation feature to give hints about profile density to users and the compiler.
We define the density of a profile Prof as follows:
- For each function A in the profile, density(A) = total_samples(A) / sizeof(A).
- density(Prof) = min(density(A)) for all functions A that are warm (defined below).
A function is considered warm if its total-samples is within top N percent of the profile. For implementation, we reuse the `ProfileSummaryBuilder::getHotCountThreshold(..)` as threshold which can be set by percent(`--profile-summary-cutoff-hot`) or by value(`--profile-summary-hot-count`).
We also introduce `--hot-function-density-threshold` to set hot function density threshold and will give suggestion if profile density is below it which implies we should increase samples.
This also applies for CS profile with all profiles merged into base.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D113781
These are some final test changes for using instruction referencing on X86:
* Most of these tests just have the flag switched so that they run with
instr-ref, and just work: these tests were fixed by earlier patches.
* There are some spurious differences in textual outputs,
* A few have different temporary labels in the output because more
MCSymbols are printed to the output.
Differential Revision: https://reviews.llvm.org/D114588
The UpdateTestChecks tool itself does not care about which pass
manager that is used in the opt invocation. So the lit tests that
are verifying the behavior of the UpdateTestChecks tool is updated
to use the new-PM syntax (-passes=) when specifying the pass pipeline
in the test cases that are used for verifying the UpdateTestChecks
tool.
Differential Revision: https://reviews.llvm.org/D114517
Renamed the option for llvm-cov and changed variable names to use more
inclusive terms. Also changed the binary for the test.
Reviewed By: alanphipps
Differential Revision: https://reviews.llvm.org/D112816
Introduced/discussed in https://reviews.llvm.org/D38719
The header validation logic was also explicitly building the DWARFUnits
to validate. But then other calls, like "Units.getUnitForOffset" creates
the DWARFUnits again in the DWARFContext proper - so, let's avoid
creating the DWARFUnits twice by walking the DWARFContext's units rather
than building a new list explicitly.
This does reduce some verifier power - it means that any unit with a
header parsing failure won't get further validation, whereas the
verifier-created units were getting some further validation despite
invalid headers. I don't think this is a great loss/seems "right" in
some ways to me that if the header's invalid we should stop there.
Exposing the raw DWARFUnitVectors from DWARFContext feels a bit
sub-optimal, but gave simple access to the getUnitForOffset to keep the
rest of the code fairly similar.
With link-time optimizations enabled, resulting DWARF mayend up containing
cross CU references (through the DW_AT_abstract_origin attribute).
Consider the following example:
// sum.c
__attribute__((always_inline)) int sum(int a, int b)
{
return a + b;
}
// main.c
extern int sum(int, int);
int main()
{
int a = 5, b = 10, c = sum(a, b);
return 0;
}
Compiled as follows:
$ clang -g -flto -fuse-ld=lld main.c sum.c -o main
Results in the following DWARF:
-- sum.c CU: abstract instance tree
...
0x000000b0: DW_TAG_subprogram
DW_AT_name ("sum")
DW_AT_decl_file ("sum.c")
DW_AT_decl_line (1)
DW_AT_prototyped (true)
DW_AT_type (0x000000d3 "int")
DW_AT_external (true)
DW_AT_inline (DW_INL_inlined)
0x000000bc: DW_TAG_formal_parameter
DW_AT_name ("a")
DW_AT_decl_file ("sum.c")
DW_AT_decl_line (1)
DW_AT_type (0x000000d3 "int")
0x000000c7: DW_TAG_formal_parameter
DW_AT_name ("b")
DW_AT_decl_file ("sum.c")
DW_AT_decl_line (1)
DW_AT_type (0x000000d3 "int")
...
-- main.c CU: concrete inlined instance tree
...
0x0000006d: DW_TAG_inlined_subroutine
DW_AT_abstract_origin (0x00000000000000b0 "sum")
DW_AT_low_pc (0x00000000002016ef)
DW_AT_high_pc (0x00000000002016f1)
DW_AT_call_file ("main.c")
DW_AT_call_line (5)
DW_AT_call_column (0x19)
0x00000081: DW_TAG_formal_parameter
DW_AT_location (DW_OP_reg0 RAX)
DW_AT_abstract_origin (0x00000000000000bc "a")
0x00000088: DW_TAG_formal_parameter
DW_AT_location (DW_OP_reg2 RCX)
DW_AT_abstract_origin (0x00000000000000c7 "b")
...
Note that each entry within the concrete inlined instance tree in
the main.c CU has a DW_AT_abstract_origin attribute which
refers to a corresponding entry within the abstract instance
tree in the sum.c CU.
llvm-dwarfdump --statistics did not properly report
DW_TAG_formal_parameters/DW_TAG_variables from concrete inlined
instance trees which had 0% location coverage and which
referred to a different CU, mainly because information about abstract
instance trees and their parameters/variables was stored
locally - just for the currently processed CU,
rather than globally - for all CUs.
In particular, if the concrete inlined instance tree from
the example above was to look like this
(i.e. parameter b has 0% location coverage, hence why it's missing):
0x0000006d: DW_TAG_inlined_subroutine
DW_AT_abstract_origin (0x00000000000000b0 "sum")
DW_AT_low_pc (0x00000000002016ef)
DW_AT_high_pc (0x00000000002016f1)
DW_AT_call_file ("main.c")
DW_AT_call_line (5)
DW_AT_call_column (0x19)
0x00000081: DW_TAG_formal_parameter
DW_AT_location (DW_OP_reg0 RAX)
DW_AT_abstract_origin (0x00000000000000bc "a")
llvm-dwarfdump --statistics would have not reported b as such.
Patch by Dimitrije Milosevic.
Differential revision: https://reviews.llvm.org/D113465
This patch adds parallel processing of chunks. When reducing very large
inputs, e.g. functions with 500k basic blocks, processing chunks in
parallel can significantly speed up the reduction.
To allow modifying clones of the original module in parallel, each clone
needs their own LLVMContext object. To achieve this, each job parses the
input module with their own LLVMContext. In case a job successfully
reduced the input, it serializes the result module as bitcode into a
result array.
To ensure parallel reduction produces the same results as serial
reduction, only the first successfully reduced result is used, and
results of other successful jobs are dropped. Processing resumes after
the chunk that was successfully reduced.
The number of threads to use can be configured using the -j option.
It defaults to 1, which means serial processing.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D113857
Add an alias of `addi [x], zero, imm` to generate pseudo
instruction li, which makes assembly mush more readable.
For existed tests, users can update them by running script
`llvm/utils/update_llc_test_checks.py`.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D112692
There are various tests that need to be adjusted to test the right
thing with instruction referencing -- usually because the internal
representation of variables is different, sometimes that location lists
change. This patch makes a bunch of tests explicitly not use
instruction referencing, so that a check-llvm test with instruction
referencing on for x86_64 doesn't fail. I'll then convert the tests
to have instr-ref CHECK lines, and similar.
Differential Revision: https://reviews.llvm.org/D113194
This is another attempt at D59351 which attempted to add --update-section, but
with some heuristics for adjusting segment/section offsets/sizes in the event
the data copied into the section is larger than the original size of the section.
We are opting to not support this case. GNU's objcopy was able to do this because
the linker and objcopy are tightly coupled enough that segment reformatting was
simpler. This is not the case with llvm-objcopy and lld where they like to be separated.
This will attempt to copy data into the section without changing any other
properties of the parent segment (if the section is part of one).
Differential Revision: https://reviews.llvm.org/D112116
Textual LLVM IR files are much bigger and take longer to write to disk.
To avoid the extra cost incurred by serializing to text, this patch adds
an option to save temporary files as bitcode instead.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D113858
Adding `-use-loadable-segment-as-base` to allow use of first loadable segment for calculating offset. By default first executable segment is used for calculating offset. The switch helps compatibility with unsymbolized profile generated from older tools.
Differential Revision: https://reviews.llvm.org/D113727
The file seems to have been put in the wrong place in its original
commit. This had the effect of marking all llvm-nm tests as unsupported,
unless X86 was enabled, even for tests that weren't X86 specific.
Fixes https://bugs.llvm.org/show_bug.cgi?id=52506.
Reviewed by: mstorsjo
Differential Revision: https://reviews.llvm.org/D113882
Add support for demangling Rust v0 symbols to llvm-nm by reusing
nonMicrosoftDemangle which supports both Itanium and Rust mangling.
Reviewed By: dblaikie, jhenderson
Differential Revision: https://reviews.llvm.org/D111937
Metadata operands tend to require special conditions, especially on dbg
intrinsics. We also don't have a zero value for metadata.
Replacing callee operands is a little weird, since calling undef/null
doesn't make sense. It also causes tons of invalid reductions when
reducing calls to intrinsics since only arguments to intrinsics can be
of the metadata type.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D113532
Add a new "operands-skip" pass whose goal is to remove instructions in the middle of dependency chains. For instance:
```
%baseptr = alloca i32
%arrayidx = getelementptr i32, i32* %baseptr, i32 %idxprom
store i32 42, i32* %arrayidx
```
might be reducible to
```
%baseptr = alloca i32
%arrayidx = getelementptr ... ; now dead, together with the computation of %idxprom
store i32 42, i32* %baseptr
```
Other passes would either replace `%baseptr` with undef (operands, instructions) or move it to become a function argument (operands-to-args), both of which might fail the interestingness check.
In principle the implementation allows operand replacement with any value or instruction in the function that passes the filter constraints (same type, dominance, "more reduced"), but is limited in this patch to values that are directly or indirectly used to compute the current operand value, motivated by the example above. Additionally, function arguments are added to the candidate set which helps reducing the number of relevant arguments mitigating a concern of too many arguments mentioned in https://reviews.llvm.org/D110274#3025013.
Possible future extensions:
* Instead of requiring the same type, bitcast/trunc/zext could be automatically inserted for some more flexibility.
* If undef is added to the candidate set, "operands-skip"is able to produce any reduction that "operands" can do. Additional candidates might be zero and one, where the "reductive power" classification can prefer one over the other. If undefined behaviour should not be introduced, undef can be removed from the candidate set.
Recommit after resolving conflict with D112651 and reusing
shouldReduceOperand from D113532.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D111818
Add a new "operands-skip" pass whose goal is to remove instructions in the middle of dependency chains. For instance:
```
%baseptr = alloca i32
%arrayidx = getelementptr i32, i32* %baseptr, i32 %idxprom
store i32 42, i32* %arrayidx
```
might be reducible to
```
%baseptr = alloca i32
%arrayidx = getelementptr ... ; now dead, together with the computation of %idxprom
store i32 42, i32* %baseptr
```
Other passes would either replace `%baseptr` with undef (operands, instructions) or move it to become a function argument (operands-to-args), both of which might fail the interestingness check.
In principle the implementation allows operand replacement with any value or instruction in the function that passes the filter constraints (same type, dominance, "more reduced"), but is limited in this patch to values that are directly or indirectly used to compute the current operand value, motivated by the example above. Additionally, function arguments are added to the candidate set which helps reducing the number of relevant arguments mitigating a concern of too many arguments mentioned in https://reviews.llvm.org/D110274#3025013.
Possible future extensions:
* Instead of requiring the same type, bitcast/trunc/zext could be automatically inserted for some more flexibility.
* If undef is added to the candidate set, "operands-skip"is able to produce any reduction that "operands" can do. Additional candidates might be zero and one, where the "reductive power" classification can prefer one over the other. If undefined behaviour should not be introduced, undef can be removed from the candidate set.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D111818
`llvm-otool -tV foo.o` and `llvm-objdump --macho -d foo.o` would
previously fail on object files containing @TLVPPAGE or @TLVPPAGEOFF relocs.
Move llvm-objdump-specific test from
llvm/test/MC/AArch64/arm64-tls-modifiers-darwin.s to new
llvm/test/tools/llvm-objdump/MachO/disassemble-arm64-tlv-modifers.test
and put test for this fix to that new file.
Fixes PR52356.
Differential Revision: https://reviews.llvm.org/D112843
Summary: Fix the build failure on MSVC by making the `T` and `U` of the function
'T llvm::Optional<T>::getValueOr<llvm::yaml::Hex32>(U &&) const &' the same.
Differential Revision: https://reviews.llvm.org/D111487
This patch fixes some issues introduced in
https://reviews.llvm.org/D108942:
1) Remove the default label to fix the bots that use
-Werror,-Wcovered-switch-default
2) Modify the malformed test to fix the bots that are
built without zlib support
3) Modify some error messages in malformed profiles
Previously, if the basic-blocks delta pass tried to remove a basic block
that was the last basic block in a function that did not have external
or weak linkage, the resulting IR would become invalid. Since removing
the last basic block in a function is effectively identical to removing
the function body itself, we check explicitly for this case and if we
detect it, we run the same logic as in ReduceFunctionBodies.cpp
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D113486
Sometimes if llvm-reduce is interrupted in the middle of a delta pass on
a large file, it can take quite some time for the tool to start actually
doing new work if it is restarted again on the partially-reduced file. A
lot of time ends up being spent testing large chunks when these large
chunks are very unlikely to actually pass the interestingness test. In
cases like this, the tool will complete faster if the starting
granularity is reduced to a finer amount. Thus, we introduce a command
line flag that automatically divides the chunks into smaller subsets a
fixed, user-specified number of times prior to beginning the core loop.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D112651
If profile data is malformed for any kind of reason, we generate
an error that only reports "malformed instrumentation profile data"
without any further information. This patch extends InstrProfError
class to receive an optional error message argument, so that we can
do better error reporting.
Differential Revision: https://reviews.llvm.org/D108942
It is often useful to know which die is the parent of the current die.
This patch adds information about parent offset into the dump:
0x0000000b: DW_TAG_compile_unit
DW_AT_producer ("by_hand")
0x00000014: DW_TAG_base_type (0x0000000b) <<<<<<<<<<<<<<
DW_AT_name ("int")
Now it is easy to see which die is the parent of the current die.
This patch makes that behaviour to be default.
We can make it to be opt-in if neccessary.
This functionality differs from already existed "--show-parents"
in that sence that parent information is shown for all dies and
only link to the immediate parent is shown.
Differential Revision: https://reviews.llvm.org/D113406
Summary:
This patch adds yaml2obj supporting for the auxiliary
file header of XCOFF.
Reviewed By: DiggerLin, jhenderson
Differential Revision: https://reviews.llvm.org/D111487
A new tool that compares TargetLibraryInfo's opinion of the availability
of library function calls against the functions actually exported by a
specified set of libraries. Can be helpful in verifying the correctness
of TLI for a given target, and avoid mishaps such as had to be addressed
in D107509 and 94b4598d.
The tool currently supports ELF object files only, although it's unlikely
to be hard to add support for other formats.
Re-commits 62dd488 with changes to use pre-generated objects, as not all
bots have ld.lld available.
Differential Revision: https://reviews.llvm.org/D111358
A new tool that compares TargetLibraryInfo's opinion of the availability
of library function calls against the functions actually exported by a
specified set of libraries. Can be helpful in verifying the correctness
of TLI for a given target, and avoid mishaps such as had to be addressed
in D107509 and 94b4598d.
The tool currently supports ELF object files only, although it's unlikely
to be hard to add support for other formats.
Differential Revision: https://reviews.llvm.org/D111358
I am planning to upstream MachOObjectFile code to support Darwin
chained fixups. In order to test the new parser features we need a way
to produce correct (and incorrect) chained fixups. Right now the only
tool that can produce them is the Darwin linker. To avoid having to
check in binary files, this patch allows obj2yaml to print a hexdump
of the raw LINKEDIT and DATA segment, which both allows to
bootstrap the parser and enables us to easily create malformed inputs
to test error handling in the parser.
This patch adds two new options to obj2yaml:
-raw-data-segment
-raw-linkedit-segment
Differential Revision: https://reviews.llvm.org/D113234
Replace the description and file names for this argument. As far as I understand
this is a positional argument and I don't believe this changes breaks any existing
interfaces.
Reviewed By: hctim, MaskRay
Differential Revision: https://reviews.llvm.org/D113316
Matching a recent clang change I've made, now 'int[3]' is formatted
without the space between the type and array bound. This commit updates
libDebugInfoDWARF/llvm-dwarfdump to match that formatting.
Previously we assume there're some non-executing sections at the bottom of the text section so that we won't hit the array's bound. But on BOLTed binary, it turned out .bolt section is at the bottom of text section which can be profiled, then it crash llvm-profgen. This change try to fix it.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D113238
Actually we can, for now, remove the explicit "operator" handling
entirely - since clang currently won't try to flag any of these as
rebuildable. That seems like a reasonable state for now, but it could be
narrowed down to only apply to conversion operators, most likely - but
would need more nuance for op> and op>> since they would be incorrectly
flagged as already having their template arguments (due to the trailing
'>').
As seen in https://bugs.llvm.org/show_bug.cgi?id=52213 llvm-objdump
asserts if either the --debug-vars or the --dwarf options are provided
with invalid values. As suggested, this fix adds use of a default value
to these options and errors when given bad input.
Differential Revision: https://reviews.llvm.org/D112183
Two things in this diff:
1) Warn on the invalid range, currently three types of checking, see the detailed message in the code.
2) In some situation, llvm-profgen gives lots of warnings on the truncated stacks which is noisy. This change provides a switch to `--show-detailed-warning` to skip the warnings. Alternatively, we use a summary for those warning and show the percentage of cases with those issues.
Example of warning summary.
```
warning: 0.05%(1120/2428958) cases with issue: Profile context truncated due to missing probe for call instruction.
warning: 0.00%(2/178637) cases with issue: Range does not belong to any functions, likely from external function.
```
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D111902
Notes generated in OpenBSD core files provide additional information
about the kernel state and CPU registers. These notes are described
in core.5, which can be viewed here: https://man.openbsd.org/core.5
Differential Revision: https://reviews.llvm.org/D111966
(Second try. Need to link against CodeGen and MC libs.)
The llvm-reduce tool has been extended to operate on MIR (import, clone and
export). Current limitation is that only a single machine function is
supported. A single reducer pass that operates on machine instructions (while
on SSA-form) has been added. Additional MIR specific reducer passes can be
added later as needed.
Differential Revision: https://reviews.llvm.org/D110527
The llvm-reduce tool has been extended to operate on MIR (import, clone and
export). Current limitation is that only a single machine function is
supported. A single reducer pass that operates on machine instructions (while
on SSA-form) has been added. Additional MIR specific reducer passes can be
added later as needed.
Differential Revision: https://reviews.llvm.org/D110527
Allow filling zero count for all the function ranges even there is no samples hitting that function. Add a switch for this.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D112858
with tests generated by yaml2obj.
Summary: Because yaml2obj supports basic transforming for XCOFF,
some of the binary inputs used in the tests of llvm-readobj
can be replaced with yaml files.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D111699
Like probe-based profile, the total samples is the sum of all its body samples. This patch fix it by a post-processing update for the line-number based profile. Tested it on our internal services, results showed no performance change.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D112672
Previous implementation of populating profile symbol list is wrong, it only included the profiled symbols. Actually it should use all symbols, here this switches to use the symbols from debug info. Also turned the flag off by default.
Reviewed By: wenlei, hoy
Differential Revision: https://reviews.llvm.org/D111824
This was checked while counting but not actually when doing the reduction, resulting in crashes.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D112766
Functions in different sections (common in object files - inline
functions, -ffunction-sections, etc) can't overlap, so factor in the
section when diagnosing overlapping address ranges.
This removes a major false-positive when running llvm-dwarfdump on
unlinked code.
Adding support to the CS preinliner to trim cold base profiles. This makes trimming consistent with the inline decision made by the preinliner. Also disable the existing profile merger when preinliner is on unless explicitly specified.
Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D112489
GNU sed offers the `,+4d` to delete the line a next four lines, but BSD
sed doesn't seem to support it (at least in macOS 10.15, but seems to do
in my 11.6 version).
Replace the usage of the extension with the equivalent syntax that works
both in BSD and GNU sed. I don't have a macOS 10.15 to check, but this
works in both my macOS 11.6 and Linux machines.
Differential Revision: https://reviews.llvm.org/D112583
**Context:**
This is a second attempt at introducing signature regeneration to llvm-objcopy. In this diff: https://reviews.llvm.org/D109840, a script was introduced to test
the validity of a code signature. In this diff: https://reviews.llvm.org/D109803 (now reverted), an effort was made to extract the signature generation behavior out of LLD into a common location for use in llvm-objcopy. In this diff: https://reviews.llvm.org/D109972 it was decided that there was no appropriate common location and that a small amount of duplication to bring signature generation to llvm-objcopy would be better. This diff introduces this duplication.
**Summary**
Prior to this change, if a LC_CODE_SIGNATURE load command
was included in the binary passed to llvm-objcopy, the command and
associated section were simply copied and included verbatim in the
new binary. If rest of the binary was modified at all, this results
in an invalid Mach-O file. This change regenerates the signature
rather than copying it.
The code_signature_lc.test test was modified to include the yaml
representation of a small signed MachO executable in order to
effectively test the signature generation.
Reviewed By: alexander-shaposhnikov, #lld-macho
Differential Revision: https://reviews.llvm.org/D111164
This change allows the unsymbolized profile as input. The unsymbolized profile is created by `llvm-profgen` with `--skip-symbolization` and it's after the sample aggregation but before symbolization , so it has much small file size. It can be used for sample merging and trimming, also is useful for debugging or adding test cases. A switch `--unsymbolized-profile=file-patch` is added for this.
Format of unsymbolized profile:
```
[context stack1] # If it's a CS profile
number of entries in RangeCounter
from_1-to_1:count_1
from_2-to_2:count_2
......
from_n-to_n:count_n
number of entries in BranchCounter
src_1->dst_1:count_1
src_2->dst_2:count_2
......
src_n->dst_n:count_n
[context stack2]
......
```
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D111750
Some dwarf loaders in LLVM are hard-coded to only accept 4-byte and 8-byte address sizes. This patch generalizes acceptance into `DWARFContext::isAddressSizeSupported` and provides a common way to generate rejection errors.
The MSP430 target has been given new tests to cover dwarf loading cases that previously failed due to 2-byte addresses.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D111953
We incorrectly use duplication factor for total samples even though we already accumulate samples instead of taking MAX. It causes profile to have bloated total samples for functions with loop unrolled or vectorized. The change fix the issue for total sample, head sample and call target samples.
Differential Revision: https://reviews.llvm.org/D112042
Having non-undef constants in a final llvm-reduce output is nicer than
having undefs.
This splits the existing reduce-operands pass into three, one which does
the same as the current pass of reducing to undef, and two more to
reduce to the constant 1 and the constant 0. Do not reduce to undef if
the operand is a ConstantData, and do not reduce 0s to 1s.
Reducing GEP operands very frequently causes invalid IR (since types may
not match up if we index differently into a struct), so don't touch GEPs.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D111765
Both ports are required for BitTest ops. Update the uops counts + port usage based off the most recent llvm-exegesis captures and what Intel AoM / Agner reports as well.
CMOVGE reads SF and OF. CMOVNS only reads SF. This matches with
other recent changes to use a single flag where possible. It also
matches gcc codegen.
I believe this technically changes whether the conditioanl move happens
on INT_MIN, but for INT_MIN both registers are the same so it doesn't
matter.
Differential Revision: https://reviews.llvm.org/D111826
POSIX does not define the exact output from od tool.
While most implementations use lower case characters in hex output,
the z/OS USS implementation uses upper case characters.
To avoid LIT failures, the FileCheck option to ignore the case must
be used when checking hex bytes.
Reviewed By: abhina.sreeskantharajan
Differential Revision: https://reviews.llvm.org/D111427
Add `-use-dwarf-correlation` switch to allow llvm-profgen to generate AutoFDO profile for binaries built with CSSPGO (pseudo-probe).
Differential Revision: https://reviews.llvm.org/D111776
The first LBR entry can be an external branch, we should ignore the whole trace.
```
7f7448e889e4 0x7f7448e889e4/0x7f7448e88826/P/-/-/1 0x7f7448e8899f/0x7f7448e889d8/P/-/-/4 ...
```
Reviewed By: wenlei, hoy
Differential Revision: https://reviews.llvm.org/D111749
Instead of setting operands to undef as the "operands" pass does,
convert the operands to a function argument. This avoids having to
introduce undef values into the IR which have some unpredictability
during optimizations.
For instance,
define void @func() {
entry:
%val = add i32 32, 21
store i32 %val, i32* null
ret void
}
is reduced to
define void @func(i32 %val) {
entry:
%val1 = add i32 32, 21
store i32 %val, i32* null
ret void
}
(note that the instruction %val is renamed to %val1 when printing
the IR to avoid ambiguity; ideally %val1 would be removed by dce or the
instruction reduction pass)
Any call to @func is replaced with a call to the function with the
new signature and filled with undef. This is not ideal for IPA passes,
but those out-of-scope for now.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D111503
Summary: This patch improves the error message context of the
XCOFF interfaces by providing more details.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D110320
We would like to start pushing -mcpu=generic towards enabling the set of
features that improves performance for some CPUs, without hurting any
others. A blend of the performance options hopefully beneficial to all
CPUs. The largest part of that is enabling in-order scheduling using the
Cortex-A55 schedule model. This is similar to the Arm backend change
from eecb353d0e which made -mcpu=generic perform in-order scheduling
using the cortex-a8 schedule model.
The idea is that in-order cpu's require the most help in instruction
scheduling, whereas out-of-order cpus can for the most part out-of-order
schedule around different codegen. Our benchmarking suggests that
hypothesis holds. When running on an in-order core this improved
performance by 3.8% geomean on a set of DSP workloads, 2% geomean on
some other embedded benchmark and between 1% and 1.8% on a set of
singlecore and multicore workloads, all running on a Cortex-A55 cluster.
On an out-of-order cpu the results are a lot more noisy but show flat
performance or an improvement. On the set of DSP and embedded
benchmarks, run on a Cortex-A78 there was a very noisy 1% speed
improvement. Using the most detailed results I could find, SPEC2006 runs
on a Neoverse N1 show a small increase in instruction count (+0.127%),
but a decrease in cycle counts (-0.155%, on average). The instruction
count is very low noise, the cycle count is more noisy with a 0.15%
decrease not being significant. SPEC2k17 shows a small decrease (-0.2%)
in instruction count leading to a -0.296% decrease in cycle count. These
results are within noise margins but tend to show a small improvement in
general.
When specifying an Apple target, clang will set "-target-cpu apple-a7"
on the command line, so should not be affected by this change when
running from clang. This also doesn't enable more runtime unrolling like
-mcpu=cortex-a55 does, only changing the schedule used.
A lot of existing tests have updated. This is a summary of the important
differences:
- Most changes are the same instructions in a different order.
- Sometimes this leads to very minor inefficiencies, such as requiring
an extra mov to move variables into r0/v0 for the return value of a test
function.
- misched-fusion.ll was no longer fusing the pairs of instructions it
should, as per D110561. I've changed the schedule used in the test
for now.
- neon-mla-mls.ll now uses "mul; sub" as opposed to "neg; mla" due to
the different latencies. This seems fine to me.
- Some SVE tests do not always remove movprfx where they did before due
to different register allocation giving different destructive forms.
- The tests argument-blocks-array-of-struct.ll and arm64-windows-calls.ll
produce two LDR where they previously produced an LDP due to
store-pair-suppress kicking in.
- arm64-ldp.ll and arm64-neon-copy.ll are missing pre/postinc on LPD.
- Some tests such as arm64-neon-mul-div.ll and
ragreedy-local-interval-cost.ll have more, less or just different
spilling.
- In aarch64_generated_funcs.ll.generated.expected one part of the
function is no longer outlined. Interestingly if I switch this to use
any other scheduled even less is outlined.
Some of these are expected to happen, such as differences in outlining
or register spilling. There will be places where these result in worse
codegen, places where they are better, with the SPEC instruction counts
suggesting it is not a decrease overall, on average.
Differential Revision: https://reviews.llvm.org/D110830
This patch modifies the testcase to use error substitution so it will pass on all platforms.
Reviewed By: fanbo-meng, muiez
Differential Revision: https://reviews.llvm.org/D111320
For some transformations like hot-cold split or coro split, it can outline its part of function ranges. Since sample loader is the early stage of backend and no split happens at that time, compiler can't recognize those function, so in llvm-profgen we should attribute the sample to the original function. This is already done for the body range samples since we use the symbols from dwarf which is created before the split.
But for branch samples, the call from master function to its outlined function is actually not a call to the original function, we shouldn't add head/callsie samples for it. So instead of dwarf symbol, we use the symbols from symbol table and ignore those functions with special suffixes(like `.cold` ,`.resume`) for accumulating the callsite/head samples.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D110864
In the command guide --prefix and --prefix-strip is used in the form
--prefix=<prefix> however currently it is used in the form --prefix
<prefix>. This change fixes these options to match the command guide.
Differential Revision: https://reviews.llvm.org/D110551
This is to account for the change that made CountersPtr in __profd_
relative which landed in a1532ed275.
That change hasn't updated the raw profile version, and while the
profile layout stayed the same, profiles generated by tip-of-tree
LLVM are incompatible with 13.x tooling.
Differential Revision: https://reviews.llvm.org/D111123
The test llvm\test\tools\llvm-cxxfilt\delimiters.test started failling when run
from cmd.exe on Windows after D110986 which added a unicode character (⦙) to it.
Piping the unicode character in cmd.exe causes it to be converted to a '?'.
That causes the test to fail because the llvm-cxxfilt output becomes Foo?Bar
rather than the expected Foo⦙Bar.
Redirect the echo output to and from a temporary file to get around this
problem.
It's not entirely clear what the root cause is, but two separate downstream
builders are tripping up on this, so we are landing the work around for the
time being.
Differential Revision: https://reviews.llvm.org/D111072
This change adds duplication factor multiplier while accumulating body samples for line-number based profile. The body sample count will be `duplication-factor * count`. Base discriminator and duplication factor is decoded from the raw discriminator, this requires some refactor works.
Differential Revision: https://reviews.llvm.org/D109934
Both ports are required for BitScan ops. Update the uops counts + port usage based off the most recent llvm-exegesis captures (PR36895) and what Intel AoM / Agner reports as well.
D104366 introduced a new llvm-cxxfilt test with non-ASCII characters,
which caused a failure on llvm-clang-x86_64-expensive-checks-win
builder, with a stack trace suggesting issue in a call to isalnum.
The argument to isalnum should be either EOF or a value that is
representable in the type unsigned char. The llvm-cxxfilt does not
perform a cast from char to unsigned char before the call, so the
value might be out of valid range.
Replace the call to isalnum with isAlnum from StringExtras, which takes
a char as the argument. This also makes the check independent of the
current locale.
Differential Revision: https://reviews.llvm.org/D110986
Summary:
for xcoff :
implement the getSymbolFlag and getSymbolType() for option --syms.
llvm-objdump --sym , if the symbol is label, print the containing section for the symbol too.
when using llvm-objdump --sym --symbol--description, print the symbol index and qualname for symbol.
for example:
--symbol-description
00000000000000c0 l .text (csect: (idx: 2) .foov[PR]) (idx: 3) .foov
and without --symbol-description
00000000000000c0 l .text (csect: .foov) .foov
Reviewers: James Henderson,Esme Yi
Differential Revision: https://reviews.llvm.org/D109452
When replacing function calls, skip call instructions where the old
function is not the called function, but e.g. the old function is passed
as an argument.
This fixes a crash due to trying to construct invalid IR for the test
case.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D109759
The ReduceMetadata pass before this patch removed metadata on a per-MDNode (or NamedMDNode) basis. Either all references to an MDNode are kept, or all of them are removed. However, MDNodes are uniqued, meaning that references to MDNodes with the same data become references to the same MDNodes. As a consequence, e.g. tbaa references to the same type will all have the same MDNode reference and hence make it impossible to reduce only keeping metadata on those memory access for which they are interesting.
Moreover, MDNodes can also be referenced by some intrinsics or other MDNodes. These references were not considered for removal leading to the possibility that MDNodes are not actually removed even if selected to be removed by the oracle.
This patch changes ReduceMetadata to reduces based on removable metadata references instead. MDNodes without references implicitly dropped anyway. References by intrinsic calls should be removed by ReduceOperands or ReduceInstructions. References in other MDNodes cannot be removed as it would violate the immutability of MDNodes.
Additionally, ReduceMetadata pass before this patch used `setMetadata(I, NULL)` to remove references, where `I` is the index in the array returned by `getAllMetadata`. However, `setMetadata` expects a MDKind (such as `MD_tbaa`) as first argument. `getAllMetadata` does not return those in consecutive order (otherwise it would not need to be a `std::pair` with `first` representing the MDKind).
Reviewed By: aeubanks, swamulism
Differential Revision: https://reviews.llvm.org/D110534
Cortex-A55 has 2 64bit NEON vector units, meaning a 128bit instruction
requires taking both units (and can only be issued as the first
instruction in a dual issue pair). This patch models that by splitting
the WriteV SchedWrite into two - the WriteVd that reads/writes only
64bit operands, and the WriteVq that read/writes 128bit registers. The
A55 schedule then uses this distinction to model the WriteVq as taking
both resource units, and starting a Schedule Group and WriteVd as taking
one as before.
I believe this is more correct, even if it does not lead to much better
performance.
Differential Revision: https://reviews.llvm.org/D108766
As for now, llvm-objcopy renames only sections that are specified
explicitly in --rename-section, while GNU objcopy keeps names of
relocation sections in sync with their targets. For example:
> readelf -S test.o
...
[ 1] .foo PROGBITS
[ 2] .rela.foo RELA
> objcopy --rename-section .foo=.bar test.o gnu.o
> readelf -S gnu.o
...
[ 1] .bar PROGBITS
[ 2] .rela.bar RELA
> llvm-objcopy --rename-section .foo=.bar test.o llvm.o
> readelf -S llvm.o
...
[ 1] .bar PROGBITS
[ 2] .rela.foo RELA
This patch makes llvm-objcopy to match the behavior of GNU objcopy better.
Differential Revision: https://reviews.llvm.org/D110352
Some tests with binary IDs would fail with error: no profile can be merged.
This is because raw profiles could have unaligned headers when emitting binary
IDs. This means padding should be emitted after binary IDs are emitted to
ensure everything else is aligned. This patch adds padding after each binary ID
to ensure the next binary ID size is 8-byte aligned. This also adds extra
checks to ensure we aren't reading corrupted data when printing binary IDs.
Differential Revision: https://reviews.llvm.org/D110365
* Add a newline before `DYNAMIC RELOCATION RECORDS` (see D101796)
* Add the missing `OFFSET TYPE VALUE` line
* Align columns
Note: llvm-readobj/ELFDumper.cpp `loadDynamicTable` has sophisticated PT_DYNAMIC
code which is unavailable in llvm-objdump.
Reviewed By: jhenderson, Higuoxing
Differential Revision: https://reviews.llvm.org/D110595
On MIPS, functions with exception handling code emits an additional
temporary label at the start of the function (due to UseAssignmentForEHBegin):
_Z8do_catchv: # @_Z8do_catchv
.Ltmp3:
.set .Lfunc_begin0, .Ltmp3
.cfi_startproc
.cfi_personality 128, DW.ref.__gxx_personality_v0
.cfi_lsda 0, .Lexception0
.frame $c11,48,$c17
.mask 0x00000000,0
.fmask 0x00000000,0
.set noreorder
.set nomacro
.set noat
# %bb.0: # %entry
The `[^:]*` regex was terminating the search after .Ltmp<N>: and therefore
not detecting functions with exception handling.
Reviewed By: atanasyan, MaskRay
Differential Revision: https://reviews.llvm.org/D100027
The MSP430 ABI supports build attributes for specifying
the ISA, code model, data model and enum size in ELF object files.
Differential Revision: https://reviews.llvm.org/D107969
Clang will encode names that should be able to be simplified as
"_STNname|<template, args>" (eg: "_STNt1|<int>") - this verification
mode will detect these names, decode them, create the original name
("t1<int>") and the simple name ("t1") - letting the simple name run
through the usual rebuilding logic - then compare the two sources of the
full name - the rebuilt and the _STN encoding.
This helps ensure that -gsimple-template-names is lossless.
When determining the incompleteness of a DIE based on its children, make
sure we propagate it across union types. See test case for an example.
Without this patch we never emit the definition of Container_ivars.
Differential revision: https://reviews.llvm.org/D110443
In order to be consistent with compiler that interprets zero count as unexecuted(cold), this change reports zero-value count for unexecuted part of function code. For the implementation, it leverages the range counter, initializes all the executed function range with the zero-value. After all ranges are merged and converted into disjoint ranges, the remaining zero count will indicates the unexecuted(cold) part of the function.
This change also extends the current `findDisjointRanges` method which now can support adding zero-value range.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D109713
This patch introduces non-CS AutoFDO profile generation into LLVM. The profile is supposed to be well consumed by compiler using `-fprofile-sample-use=[profile]`.
After range and branch counters are extracted from the LBR sample, here we go through each addresses for symbolization, create FunctionSamples and populate its sub fields like TotalSamples, BodySamples and HeadSamples etc. For inlined code, as we need to map back to original code, so we always add body samples to the leaf frame's function sample.
Reviewed By: wenlei, hoy
Differential Revision: https://reviews.llvm.org/D109551
Similar to https://reviews.llvm.org/D109637, there is a whole invalid line of message in perfscript.
```
warning: Invalid address in LBR record at line 14118674: Processed 14138923 events and lost 1 chunks!
warning: Invalid address in LBR record at line 14118676: Check IO/CPU overload!
```
This only happened for LBR only perfscript, hybridperfscript have a check of " 0x" to make sure it's the LBR perf line.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D110424
Testing on a SLM box suggests these can run on either port, but the throughput is 4cy on either (inc MMX versions). Confirmed with Intel AoM / Agner / InstLatX64.
Without preinliner, we need to tune down the cold count cutoff to merge/trim more context to limit profile size for large components. However it doesn't make sense for cold threshold to be higher than hot threshold, so we now change to use hot threshold as merging/trimming cut off instead.
Differential Revision: https://reviews.llvm.org/D110212
For large app, dumping disasm of the whole program can be slow and result in gianant output. Adding a switch to dump specific symbols only.
Reviewed By: wlei
Differential Revision: https://reviews.llvm.org/D110079
Currenlty PseudoProbeInserter is a pass conditioned on a target switch. It works well with a single clang invocation. It doesn't work so well when the backend is called separately (i.e, through the linker or llc), where user has always to pass -pseudo-probe-for-profiling explictly. I'm making the pass a default pass that requires no command line arg to trigger, but will be actually run depending on whether the CU comes with `llvm.pseudo_probe_desc` metadata.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D110209
Make the update_llc_test_checks script test independant of llc behavior
by using cat with static files to simulate llc output.
This allows changing llc without breaking the script test case.
The update script is executed in a temporary directory, so the
llc-generated assembly files are copied there. %T is deprecated, but it
allows copying a file with a predictable filename.
Differential Revision: https://reviews.llvm.org/D110143
Previously the script emitted output using plain CHECK directives. This
can result in a test passing even if there are some instructions between
CHECK directives that should have been removed. It also makes debugging
tests that have the output in a different order more difficult since
FileCheck can match with a later line and then complain about the "wrong"
directive not being found.
This will cause quite large diffs when updating existing tests, but I'm not sure we need an opt-in flag here.
Depends on D109765 (pre-commit tests)
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D109767
This should probably be rendered as "std::nullptr_t" but for now clang
uses the unqualified name (which is ambiguous with possible user defined
name in the global namespace), so match that here.
Both ports are required in most cases. Update the uops counts + port usage based off the most recent llvm-exegesis captures (PR36895) and what Intel AoM / Agner / InstLatX64 reports as well.
Noticed while trying to improve fp costs for vectorization via the D103695 helper script.
Move most type tests to a pre-generated assembly file to make it easier
to add more weird cases without having to hand craft more DWARF.
Move the novel array types that aren't reachable via clang-generated
DWARF to a separate file for easy maintenance.
Both ports are required, for reg and mem variants - we can also use the WriteFComX class directly and remove the unnecessary InstRW overrides. Matches what Intel AoM / Agner / InstLatX64 report as well.
The MMX pack/unpck shuffles don't need an override - they have the same behaviour as other shuffles (Port0 only).
The SSE pslldq/psrldq shuffles don't need an override - they have the same behaviour as other shuffles (Port0 only).
The SSE pshufb shuffles use 4uops (+1 load).
Noticed the pslldq/psrldq issue while trying to improve reduction costs via the D103695 helper script, and fixed the others while reviewing. Confirmed with Intel AoM / Agner / InstLatX64.
Turn on `use-context-cost-for-preinliner` to use context-sensitive byte size cost for preinliner decisions by default.
This is a more accurate proxy of inline cost than profile size. We tested on our large workload that it delivers measureable CPU improvement.
Differential Revision: https://reviews.llvm.org/D109893
This check should ensure we don't reproduce the problem fixed by
02df443d28
More accurately, it checks every llvm::Any::TypeId symbol in libLLVM-x.so and
make sure they have weak linkage and are not local to the library, which would
lead to duplicate definition if another weak version of the symbol is defined in
another linked library.
Differential Revision: https://reviews.llvm.org/D109252
Invalid frame addresses exist in call stack samples due to bad unwinding. This could happen to frame-pointer-based unwinding and the callee functions that do not have the frame pointer chain set up. It isn't common when the program is built with the frame pointer omission disabled, but can still happen with third-party static libs built with frame pointer omitted.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D109638
This syncs parts from the x86 implementation to the ARMWinEH
implementation.
Currently, neither of the compilers targeting COFF/arm64 (MSVC, LLVM)
produce such relocations, but LLVM might after a later patch.
Differential Revision: https://reviews.llvm.org/D109650
This is the same as we do on arm64 already for the MSVC style label
symbols, but also handle the way GCC produces it - with all relocations
pointing at the .text section symbol, with various offsets.
Differential Revision: https://reviews.llvm.org/D109649
Summary: Add the SectionIndex field for symbol.
1: a symbol can reference a section by SectionName or SectionIndex.
2: a symbol can reference a section by both SectionName and SectionIndex.
3: if both Section and SectionIndex are specified, but the two values refer
to different sections, an error will be reported.
4: an invalid SectionIndex is allowed.
5: if a symbol references a non-existent section by SectionName, an error will be reported.
Reviewed By: jhenderson, Higuoxing
Differential Revision: https://reviews.llvm.org/D109566
Summary: The patch adds support for yaml2obj customizing the string table.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D107421
The packed variants of the instructions had been modelled as the same as the scalar variants.
Reported during a run of llvm-exegesis on a cheap SLM box and matches what Agner / InstLatX64 report as well.
The re-use of this struct across iterations of the loop was causing
fields (specifically Name) to be incorrectly shared between multiple
sections.
Differential Revision: https://reviews.llvm.org/D108984
D78776 removed is{Call,Branch,UnconditionalBranch} guards in objdump
before calling MCInstrAnalysis::evaluateBranch. This is fine for other
architectures as they gracefully handle evaluateBranch being called on
non-branches. However, the Lanai MCInstrAnalysis implementation didn't
and that change caused it to crash.
This inserts the same guards back into Lanai's evaluateBranch
implementation and adds a smoke test that exercises `llc | objdump` so
this kind of regression is hopefully caught next time.
Reviewed By: jpienaar, MaskRay
Differential Revision: https://reviews.llvm.org/D107593
If the number of directories was 6 (equal to the DEBUG_DIRECTORY
index), patchDebugDirectory() was run even though the debug directory
is actually the 7th entry. Use <= in the comparison to fix that.
This fixes https://llvm.org/PR51243
Differential Revision: https://reviews.llvm.org/D106940
Reviewed by: jhenderson
Allow variable number of directories, as allowed by the
specification. NumberOfRvaAndSize will default to 16 if not specified,
as in the past.
Reviewed by: jhenderson
Differential Revision: https://reviews.llvm.org/D108825
format.
Currently when we add a new section in the profile format and generate a profile
containing the new section, older compiler which reads the new profile will
issue an error. The forward incompatibility can cause unnecessary churn when
extending the profile. This patch removes the incompatibility when adding a new
section for extbinary format.
Differential Revision: https://reviews.llvm.org/D109398
Print relocations interleaved with disassembled instructions for
executables with relocatable sections, e.g. those built with "-Wl,-q".
Differential Revision: https://reviews.llvm.org/D109016
Currently native clusterization simply groups all benchmarks
by the opcode of key instruction, but that is suboptimal in certain cases,
e.g. where we can already tell that the particular instructions
already resolve into different sched classes.
Move inverse_throughput, latency and uops to sub-directories (like we already do for lbr), which require libpfm, so we can relax the lit limits for analysis tests in the x86 root directory.
Differential Revision: https://reviews.llvm.org/D109353
The current IRSimilarityIdentifier does not try to find similarity across blocks, this patch provides a mechanism to compare two branches against one another, to find similarity across basic blocks, rather than just within them.
This adds a step in the similarity identification process that labels all of the basic blocks so that we can identify the relative branching locations. Within an IRSimilarityCandidate we use these relative locations to determine whether if the branching to other relative locations in the same region is the same between branches. If they are, we consider them similar.
We do not consider the relative location of the branch if the target branch is outside of the region. In this case, both branches must exit to a location outside the region, but the exact relative location does not matter.
Reviewers: paquette, yroux
Differential Revision: https://reviews.llvm.org/D106989
The xmm variant have half the throughput (and +1cy latency) of the mmx variants, but are still 1uop.
I still need to do more thorough testing of SLM on test-suite before fixing the obvious bad numbers for WritePMULLD.
But this helps the D103695 helper script get to more accurate numbers for vXi32 multiplies of extended operands (i.e. we can use PMADDWD, PMULLW/PMULHW etc). Matches what Intel AoM / Agner / llvm-exegesis reports.
The xmm variant have half the throughput (and +1cy latency) of the mmx variants, but are still 1uop.
I still need to do more thorough testing of SLM on test-suite before fixing the obvious bad numbers for WritePMULLD.
But this helps the D103695 helper script get to more accurate numbers for vXi32 multiplies of extended operands (i.e. we can use PMADDWD, PMULLW/PMULHW etc). Matches what Intel AoM / Agner / llvm-exegesis reports.
For RMW instructions, the load and store hold the MEC for an extra cycle, but within the same single uop. This is alluded to in the Intel AOM:
"The MEC also owns the MEC RSV, which is responsible for scheduling of all loads and stores. Load and
store instructions go through addresses generation phase in program order to avoid on-the-fly memory
ordering later in the pipeline. Therefore, an unknown address will stall younger memory instructions."
Noticed while trying to get a cheap SLM test box up and running with llvm-exegesis - RMW arithmetic is always 1uop - and matches what Agner / InstLatX64 report as well.
These were all set to the same best case mul i32 values (which seems to be the only version of MUL that SLM actually performs well with).
Noticed while trying to improve multiplication costs for vectorization via the D103695 helper script. Confirmed with Intel AoM / Agner / InstLatX64.