This change adds debug information about whether PGO is being used or
not.
Microsoft performance tooling (e.g. xperf, WPA) uses this information to
show whether functions are optimized with PGO or not, as well as whether
PGO information is invalid.
This information is useful for validating whether training scenarios are
providing good coverage of real world scenarios, showing if profile data
is out of date, etc.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D99994
CommandLine.h is indirectly included in ~50% of TUs when building
clang, and VirtualFileSystem.h is large.
(Already remarked by jhenderson on D70769.)
No behavior change.
Differential Revision: https://reviews.llvm.org/D100957
We saw some big compiling time impact after enabling the debug entry value
feature for X86 platform(D73534). Compiling time goes from 900s->1600s with
our testcase. It is caused by allocating/freeing the memory busily.
'using FwdRegWorklist = MapVector<unsigned, SmallVector<FwdRegParamInfo, 2>>;'
The value for this map is vector, and we miss the reference when access the
element. The same happens for `auto CalleesMap = MF->getCallSitesInfo();` which is a DenseMap.
Reviewed by: djtodoro, flychen50
Differential Revision: https://reviews.llvm.org/D100162
Summary: Set the default DwarfInlinedStrings as inlined strings for DBX, due to DBX does not support .dwstr section for now.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D99933
Negative numbers are represented using DW_OP_consts along with signed representation
of the number as the argument.
Test case IR is generated using Fortran front-end.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D99273
Currently needsStackRealignment returns false if canRealignStack returns false.
This means that the behavior of needsStackRealignment does not correspond to
it's name and description; a function might need stack realignment, but if it
is not possible then this function returns false. Furthermore,
needsStackRealignment is not virtual and therefore some backends have made use
of canRealignStack to indicate whether a function needs stack realignment.
This patch attempts to clarify the situation by separating them and introducing
new names:
- shouldRealignStack - true if there is any reason the stack should be
realigned
- canRealignStack - true if we are still able to realign the stack (e.g. we
can still reserve/have reserved a frame pointer)
- hasStackRealignment = shouldRealignStack && canRealignStack (not target
customisable)
Targets can now override shouldRealignStack to indicate that stack realignment
is required.
This change will make it easier in a future change to handle the case where we
need to realign the stack but can't do so (for example when the register
allocator creates an aligned spill after the frame pointer has been
eliminated).
Differential Revision: https://reviews.llvm.org/D98716
Change-Id: Ib9a4d21728bf9d08a545b4365418d3ffe1af4d87
This is needed for Fortran assumed shape arrays whose dimensions are
defined as,
- 'count' is taken from array descriptor passed as parameter by
caller, access from descriptor is defined by type DIExpression.
- 'lowerBound' is defined by callee.
The current alternate way represents using upperBound in place of
count, where upperBound is calculated in callee in a temp variable
using lowerBound and count
Representation with count (DIExpression) is not only clearer as
compared to upperBound (DIVariable) but it has another advantage that
variable count is accessed by being parameter has better chance of
survival at higher optimization level than upperBound being local
variable.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D99335
Empty functions (functions with no real code) are irrelevant for propeller optimizations and their addresses sometimes conflict with other functions which obfuscates the analysis.
This simple change skips the BB address map emission for such functions.
Reviewed By: tmsriram
Differential Revision: https://reviews.llvm.org/D99395
This patch adds a fallthrough bit to basic block metadata, indicating whether the basic block can fallthrough without taking any branches. The bit will help us avoid an intel LBR bug which results in occasional duplicate entries at the beginning of the LBR stack.
This patch uses `MachineBasicBlock::canFallThrough()` to set the bit. This is not a const method because it eventually calls `TargetInstrInfo::analyzeBranch`, but it calls this function with the default `AllowModify=false`. So we can either make the argument to the `getBBAddrMapMetadata` non-const, or we can use `const_cast` when calling `canFallThrough`. I decide to go with the latter since this is purely due to legacy code, and in general we should not allow the BasicBlock to be mutable during `getBBAddrMapMetadata`.
Reviewed By: tmsriram
Differential Revision: https://reviews.llvm.org/D96918
Prefer (self-documenting) return values to output parameters (which are
liable to be used).
While here, rename Noop to Nop which is more widely used and improves
consistency with hasEmitNops/setEmitNops/emitNop/etc.
For locally scoped lambdas like this there's no particular benefit to
explicitly listing captures - or avoiding capturing this. Switch to [&]
and make it all easier to maintain.
(& driveby change std::function to llvm::function_ref)
This patch allows DBG_VALUE_LIST instructions to be emitted to DWARF with valid
DW_AT_locations. This change mainly affects DbgEntityHistoryCalculator, which
now tracks multiple registers per value, and DwarfDebug+DwarfExpression, which
can now emit multiple machine locations as part of a DWARF expression.
Differential Revision: https://reviews.llvm.org/D83495
A symbol being redefined as a label is something that can happen as a result of
ordinary input, so it shouldn't cause a fatal error. Also adjust the error
message to match the one you get when a symbol is redefined as a variable.
Differential Revision: https://reviews.llvm.org/D98181
Rewrites test to use correct architecture triple; fixes incorrect
reference in SourceLevelDebugging doc; simplifies `spillReg` behaviour
so as to not be dependent on changes elsewhere in the patch stack.
This reverts commit d2000b45d0.
This patch adds a new instruction that can represent variadic debug values,
DBG_VALUE_VAR. This patch alone covers the addition of the instruction and a set
of basic code changes in MachineInstr and a few adjacent areas, but does not
correctly handle variadic debug values outside of these areas, nor does it
generate them at any point.
The new instruction is similar to the existing DBG_VALUE instruction, with the
following differences: the operands are in a different order, any number of
values may be used in the instruction following the Variable and Expression
operands (these are referred to in code as “debug operands”) and are indexed
from 0 so that getDebugOperand(X) == getOperand(X+2), and the Expression in a
DBG_VALUE_VAR must use the DW_OP_LLVM_arg operator to pass arguments into the
expression.
The new DW_OP_LLVM_arg operator is only valid in expressions appearing in a
DBG_VALUE_VAR; it takes a single argument and pushes the debug operand at the
index given by the argument onto the Expression stack. For example the
sub-expression `DW_OP_LLVM_arg, 0` has the meaning “Push the debug operand at
index 0 onto the expression stack.”
Differential Revision: https://reviews.llvm.org/D82363
The situation with inline asm/MC error reporting is kind of messy at the
moment. The errors from MC layout are not reliably propagated and users
have to specify an inlineasm handler separately to get inlineasm
diagnose. The latter issue is not a correctness issue but could be improved.
* Kill LLVMContext inlineasm diagnose handler and migrate it to use
DiagnoseInfo/DiagnoseHandler.
* Introduce `DiagnoseInfoSrcMgr` to diagnose SourceMgr backed errors. This
covers use cases like inlineasm, MC, and any clients using SourceMgr.
* Move AsmPrinter::SrcMgrDiagInfo and its instance to MCContext. The next step
is to combine MCContext::SrcMgr and MCContext::InlineSrcMgr because in all
use cases, only one of them is used.
* If LLVMContext is available, let MCContext uses LLVMContext's diagnose
handler; if LLVMContext is not available, MCContext uses its own default
diagnose handler which just prints SMDiagnostic.
* Change a few clients(Clang, llc, lldb) to use the new way of reporting.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D97449
remove `Hi` `Lo` argument from `emitDwarfUnitLength`, so we
can make caller of emitDwarfUnitLength easier.
Reviewed By: MaskRay, dblaikie, ikudrin
Differential Revision: https://reviews.llvm.org/D96409
We may need to do some customization for DWARF unit length in DWARF
section headers for some targets for some code generation path.
For example, for XCOFF in assembly path, AIX assembler does not require
the debug section containing its debug unit length in the header.
Move emitDwarfUnitLength to MCStreamer class so that we can do
customization in different Streamers
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D95932
D94835 added support for WinEH to export public symbols pointing to
basic blocks which are catchret targets for use with Windows CET.
Wasm currently doesn't support public symbols to non-function code
addresses (they get treated like new functions in asm but then don't
lower to object files correctly).
It created them unconditionally for all catchret targets.
This change disables those symbols unless the exceptionHandlingType
is WinEH (since they aren't used with ExceptionHandling::Wasm)
Differential Revision: https://reviews.llvm.org/D96824
This allows the option to affect the LTO output. Module::Max helps to
generate debug info for all modules in the same format.
Differential Revision: https://reviews.llvm.org/D96597
Basic block sections enables function sections implicitly, this is not needed
and is inefficient with "=list" option.
We had basic block sections enable function sections implicitly in clang. This
is particularly inefficient with "=list" option as it places functions that do
not have any basic block sections in separate sections. This causes unnecessary
object file overhead for large applications.
This patch disables this implicit behavior. It only creates function sections
for those functions that require basic block sections.
Further, there was an inconistent behavior with llc as llc was not turning on
function sections by default. This patch makes llc and clang consistent and
tests are added to check the new behavior.
This is the first of two patches and this adds functionality in LLVM to
create a new section for the entry block if function sections is not
enabled.
Differential Revision: https://reviews.llvm.org/D93876
This change introduces support for zero flag ELF section groups to LLVM.
LLVM already supports COMDAT sections, which in ELF are a special type
of ELF section groups. These are generally useful to enable linker GC
where you want a group of sections to always travel together, that is to
be either retained or discarded as a whole, but without the COMDAT
semantics. Other ELF assemblers already support zero flag ELF section
groups and this change helps us reach feature parity.
Differential Revision: https://reviews.llvm.org/D95851
This patch enables AsmPrinter support for complex expression with
entry values. It shouldn't AsmPrinter's call whether these are safe or
not but the pass who introduces the DW_OP_LLVM_entry_value. This patch
on its own has no effect on clang.
Differential Revision: https://reviews.llvm.org/D96559
In the future Windows will enable Control-flow Enforcement Technology (CET aka shadow stacks). To protect the path where the context is updated during exception handling, the binary is required to enumerate valid unwind entrypoints in a dedicated section which is validated when the context is being set during exception handling.
This change allows llvm to generate the section that contains the appropriate symbol references in the form expected by the msvc linker.
This feature is enabled through a new module flag, ehcontguard, which was modelled on the cfguard flag.
The change includes a test that when the module flag is enabled the section is correctly generated.
The set of exception continuation information includes returns from exceptional control flow (catchret in llvm).
In order to collect catchret we:
1) Includes an additional flag on machine basic blocks to indicate that the given block is the target of a catchret operation,
2) Introduces a new machine function pass to insert and collect symbols at the start of each block, and
3) Combines these targets with the other EHCont targets that were already being collected.
Change originally authored by Daniel Frampton <dframpto@microsoft.com>
For more details, see MSVC documentation for `/guard:ehcont`
https://docs.microsoft.com/en-us/cpp/build/reference/guard-enable-eh-continuation-metadata
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D94835
This patch hides the logic for setting the location kind of an entry
value inside the begin/finalize/cancel functions. This way we get rid
the strange workaround that is currently in setLocation().
In the future, this will allow us to set the location kind of the
entry value independently from the location kind of the main
expression.
Differential Revision: https://reviews.llvm.org/D96554
Originally landed in ddc2f1e3fb and reverted in d32deaab4d because of
a Generic test objecting. That was fixed up in 013613964f. Original
landing commit message follows:
[DWARF] Location-less inlined variables should not have DW_TAG_variable
Discussed in this thread:
https://lists.llvm.org/pipermail/llvm-dev/2021-January/148139.html
DwarfDebug::collectEntityInfo accidentally distinguishes between variable
locations that never have a location specified, and variable locations that
have an empty location specified. The latter leads to the creation of an
empty variable referring to the abstract origin.
Fix this by seeking a non-empty location before producing a concrete
entity, to guarantee a DW_AT_location will be produced. Other loops in
collectEntityInfo and endFunctionImpl take care of examining the
retainedNodes collection and ensuring optimised-out variables are created.
Differential Revision: https://reviews.llvm.org/D95617
Maskray has reported a fault with .debug_gnu_pubnames in the comments on
D94976, caused by this patch, reverting to investigate.
This reverts commit 8998f58435.
Backing out this workaround to focus on fixing whatever's wrong with
.debug_gnu_pubnames, I'll revert the cause, (8998f584) in the next commit.
This reverts commit 56fa34ae35.
GNU ld>=2.36 supports mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER sections in an
output section, so we can set SHF_LINK_ORDER if -fbinutils-version=2.36 or above.
If -fno-function-sections or older binutils, drop unique ID for -fno-unique-section-names.
The users can just specify -fbinutils-version=2.36 or above to allow GC with both GNU ld and LLD.
(LLD does not support garbage collection of non-group non-SHF_LINK_ORDER .gcc_except_table sections.)
This matches GCC behavior when the configure-time binutils is new. GNU ld<2.36
did not support mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER sections in an
output section, so we conservatively disable SHF_LINK_ORDER for <2.36.
Previously the code split the string at the first '<', which
incorrectly truncated names like `operator<`.
Differential Revision: https://reviews.llvm.org/D95893
`-flto -gsplit-dwarf -g -O[123]` may create .debug_gnu_pubnames with 0 DIE
offset entries. llvm-dwarfdump -debug-gnu-pubnames/ld.lld --gdb-index errors for that.
```
.section .debug_gnu_pubnames,"",@progbits
.long .LpubNames_end2-.LpubNames_begin2 # Length of Public Names Info
.LpubNames_begin2:
.short 2 # DWARF Version
.long .Lcu_begin2 # Offset of Compilation Unit Info
.long 57 # Compilation Unit Length
.long 0 # DIE offset
.byte 16 # Attributes: TYPE, EXTERNAL
.asciz "absl" # External Name
.long 0 # DIE offset
.byte 16 # Attributes: TYPE, EXTERNAL
.asciz "absl::base_internal" # External Name
.long 0 # End Mark
```
This modified patch avoids redirecting the unit in which a subprogram is
created if type units are enabled -- DIEs were getting children allocated
from different units memory pools. Original commit message:
[DWARF] Create subprogram's DIE in DISubprogram's unit
This is a fix for PR48790. Over in D70350, subprogram DIEs were permitted
to be shared between CUs. However, the creation of a subprogram DIE can be
triggered early, from other CUs. The subprogram definition is then created
in one CU, and when the function is actually emitted children are attached
to the subprogram that expect to be in another CU. This breaks internal CU
references in the children.
Fix this by redirecting the creation of subprogram DIEs in
getOrCreateContextDIE to the CU specified by it's DISubprogram definition.
This ensures that the subprogram DIE is always created in the correct CU.
Differential Revision: https://reviews.llvm.org/D94976
Discussed in this thread:
https://lists.llvm.org/pipermail/llvm-dev/2021-January/148139.html
DwarfDebug::collectEntityInfo accidentally distinguishes between variable
locations that never have a location specified, and variable locations that
have an empty location specified. The latter leads to the creation of an
empty variable referring to the abstract origin.
Fix this by seeking a non-empty location before producing a concrete
entity, to guarantee a DW_AT_location will be produced. Other loops in
collectEntityInfo and endFunctionImpl take care of examining the
retainedNodes collection and ensuring optimised-out variables are created.
Differential Revision: https://reviews.llvm.org/D95617
This reverts commit ef0dcb5063.
This change is causing a lot of compiler crashes inside, sorry I don't have a
small repro/stacktrace with symbols to share right now.
Differential Revision: https://reviews.llvm.org/D95622
Experimental, using non-existent DWARF support to use an expr for the
location involving an addr_index (to compute address + offset so
addresses can be reused in more places).
The global variable debug info had to be deferred until the end of the
module (so bss variables would all be emitted first - so their labels
would have the relevant section). Non-bss variables seemed to not have
their label assigned to a section even at the end of the module, so I
didn't know what to do there.
Also, the hashing code is broken - doesn't know how to hash these
expressions (& isn't hashing anything inside subprograms, which seems
problematic), so for test purposes this change just skips the hash
computation. (GCC's actually overly sensitive in its hash function, it
seems - I'm forgetting the specific case right now - anyway, we might
want to just use the frontend-known file hash and give up on optimistic
.dwo/.dwp reuse)
This is a fix for PR48790. Over in D70350, subprogram DIEs were permitted
to be shared between CUs. However, the creation of a subprogram DIE can be
triggered early, from other CUs. The subprogram definition is then created
in one CU, and when the function is actually emitted children are attached
to the subprogram that expect to be in another CU. This breaks internal CU
references in the children.
Fix this by redirecting the creation of subprogram DIEs in
getOrCreateContextDIE to the CU specified by it's DISubprogram definition.
This ensures that the subprogram DIE is always created in the correct CU.
Differential Revision: https://reviews.llvm.org/D94976
static_cast for uint64_t to unsigned gives a MS VC build warning
for Windows:
warning C4309: 'static_cast': truncation of constant value
Use an explicit cast instead.
Change-Id: I692d335b4913070686a102780c1fb05b893a2f69
Differential Revision: https://reviews.llvm.org/D94592
The size of spill/reload may be unknown for scalable vector types.
When the size is unknown, print it as "Unknown-size" instead of a very
large number.
Differential Revision: https://reviews.llvm.org/D94299
This removes `exnref` type and `br_on_exn` instruction. This is
effectively NFC because most uses of these were already removed in the
previous CLs.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D94041
A struct in C passed by value did not get debug information. Such values are currently
lowered to a Wasm local even in -O0 (not to an alloca like on other archs), which becomes
a Target Index operand (TI_LOCAL). The DWARF writing code was not emitting locations
in for TI's specifically if the location is a single range (not a list).
In addition, the ExplicitLocals pass which removes the ARGUMENT pseudo instructions did
not update the associated DBG_VALUEs, and couldn't even find these values since the code
assumed such instructions are adjacent, which is not the case here.
Also fixed asm printing of TIs needed by a test.
Differential Revision: https://reviews.llvm.org/D94140
When using dbg.declare, the debug-info is generated from a list of
locals rather than through DBG_VALUE instructions in the MIR.
This patch is different from D90020 because it emits the DWARF
location expressions from that list of locals directly.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D90044
Given the ability provided by DWARFv5 rnglists to reuse addresses in the
address pool, it can be advantageous to object file size to use range
encodings even when the range could be described by a direct low/high
pc.
Add a flag to allow enabling this in DWARFv5 for the purpose of
experimentation/data gathering.
It might be that it makes sense to enable this functionality by default
for DWARFv5 + Split DWARF at least, where the tradeoff/desire to
optimize for .o file size is more explicit and .o bytes are higher
priority than .dwo bytes.
Current implementation assumes that, each MachineConstantPoolValue takes
up sizeof(MachineConstantPoolValue::Ty) bytes. For PowerPC, we want to
lump all the constants with the same type as one MachineConstantPoolValue
to save the cost that calculate the TOC entry for each const. So, we need
to extend the MachineConstantPoolValue that break this assumption.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D89108
The idea is that the CC1 default for ELF should set dso_local on default
visibility external linkage definitions in the default -mrelocation-model pic
mode (-fpic/-fPIC) to match COFF/Mach-O and make output IR similar.
The refactoring is made available by 2820a2ca3a.
Currently only x86 supports local aliases. We move the decision to the driver.
There are three CC1 states:
* -fsemantic-interposition: make some linkages interposable and make default visibility external linkage definitions dso_preemptable.
* (default): selected if the target supports .Lfoo$local: make default visibility external linkage definitions dso_local
* -fhalf-no-semantic-interposition: if neither option is set or the target does not support .Lfoo$local: like -fno-semantic-interposition but local aliases are not used. So references can be interposed if not optimized out.
Add -fhalf-no-semantic-interposition to a few tests using the half-based semantic interposition behavior.
Currently using DW_OP_implicit_value in fragments produces invalid DWARF
expressions. (Such a case can occur in complex floats, for example.)
This problem manifests itself as a missing DW_OP_piece operation after
the last fragment. This happens because the function for printing
constant float value skips printing the accompanying DWARF expression,
as that would also print DW_OP_stack_value (which is not desirable in
this case). However, this also results in DW_OP_piece being skipped.
The reason that DW_OP_piece is missing only for the last piece is that
the act of printing the next fragment corrects this. However, it does
that for the wrong reason -- the code emitting this DW_OP_piece thinks
that the previous fragment was missing, and so it thinks that it needs
to skip over it in order to be able to print itself.
In a simple scenario this works out, but it's likely that in a more
complex setup (where some pieces are in fact missing), this logic would
go badly wrong. In a simple setup gdb also seems to not mind the fact
that the DW_OP_piece is missing, but it would also likely not handle
more complex use cases.
For this reason, this patch disables the usage of DW_OP_implicit_value
in the frament scenario (we will use DW_OP_const*** instead), until we
figure out the right way to deal with this. This guarantees that we
produce valid expressions, and gdb can handle both kinds of inputs
anyway.
Differential Revision: https://reviews.llvm.org/D92013
The main change is to add a 'IsDecl' field to DIModule so
that when IsDecl is set to true, the debug info entry generated
for the module would be marked as a declaration. That way, the debugger
would look up the definition of the module in the gloabl scope.
Please see the comments in llvm/test/DebugInfo/X86/dimodule.ll
for what the debug info entries would look like.
Differential Revision: https://reviews.llvm.org/D93462
SUMMARY:
In order for the runtime on AIX to find the compact unwind section(EHInfo table),
we would need to set the following on the traceback table:
The 6th byte's longtbtable field to true to signal there is an Extended TB Table Flag.
The Extended TB Table Flag to be 0x08 to signal there is an exception handling info presents.
Emit the offset between ehinfo TC entry and TOC base after all other optional portions of traceback table.
The patch is authored by Jason Liu.
Reviewers: David Tenty, Digger Lin
Differential Revision: https://reviews.llvm.org/D92766
This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s
Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead.
**ELF object emission**
The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission.
Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.
The format of `.pseudo_probe_desc` section looks like:
```
.section .pseudo_probe_desc,"",@progbits
.quad 6309742469962978389 // Func GUID
.quad 4294967295 // Func Hash
.byte 9 // Length of func name
.ascii "_Z5funcAi" // Func name
.quad 7102633082150537521
.quad 138828622701
.byte 12
.ascii "_Z8funcLeafi"
.quad 446061515086924981
.quad 4294967295
.byte 9
.ascii "_Z5funcBi"
.quad -2016976694713209516
.quad 72617220756
.byte 7
.ascii "_Z3fibi"
```
For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :
```
FUNCTION BODY (one for each outlined function present in the text section)
GUID (uint64)
GUID of the function
NPROBES (ULEB128)
Number of probes originating from this function.
NUM_INLINED_FUNCTIONS (ULEB128)
Number of callees inlined into this function, aka number of
first-level inlinees
PROBE RECORDS
A list of NPROBES entries. Each entry contains:
INDEX (ULEB128)
TYPE (uint4)
0 - block probe, 1 - indirect call, 2 - direct call
ATTRIBUTE (uint3)
reserved
ADDRESS_TYPE (uint1)
0 - code address, 1 - address delta
CODE_ADDRESS (uint64 or ULEB128)
code address or address delta, depending on ADDRESS_TYPE
INLINED FUNCTION RECORDS
A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
callees. Each record contains:
INLINE SITE
GUID of the inlinee (uint64)
ID of the callsite probe (ULEB128)
FUNCTION BODY
A FUNCTION BODY entry describing the inlined function.
```
To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index.
**Assembling**
Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis.
A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.
A example assembly looks like:
```
foo2: # @foo2
# %bb.0: # %bb0
pushq %rax
testl %edi, %edi
.pseudoprobe 837061429793323041 1 0 0
je .LBB1_1
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 6 2 0
callq foo
.pseudoprobe 837061429793323041 3 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
.LBB1_1: # %bb1
.pseudoprobe 837061429793323041 5 1 0
callq *%rsi
.pseudoprobe 837061429793323041 2 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
# -- End function
.section .pseudo_probe_desc,"",@progbits
.quad 6699318081062747564
.quad 72617220756
.byte 3
.ascii "foo"
.quad 837061429793323041
.quad 281547593931412
.byte 4
.ascii "foo2"
```
With inlining turned on, the assembly may look different around %bb2 with an inlined probe:
```
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 3 0
.pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6
.pseudoprobe 837061429793323041 4 0
popq %rax
retq
```
**Disassembling**
We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file.
An example disassembly looks like:
```
00000000002011a0 <foo2>:
2011a0: 50 push rax
2011a1: 85 ff test edi,edi
[Probe]: FUNC: foo2 Index: 1 Type: Block
2011a3: 74 02 je 2011a7 <foo2+0x7>
[Probe]: FUNC: foo2 Index: 3 Type: Block
[Probe]: FUNC: foo2 Index: 4 Type: Block
[Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6
2011a5: 58 pop rax
2011a6: c3 ret
[Probe]: FUNC: foo2 Index: 2 Type: Block
2011a7: bf 01 00 00 00 mov edi,0x1
[Probe]: FUNC: foo2 Index: 5 Type: IndirectCall
2011ac: ff d6 call rsi
[Probe]: FUNC: foo2 Index: 4 Type: Block
2011ae: 58 pop rax
2011af: c3 ret
```
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D91878
This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s
Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead.
**ELF object emission**
The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission.
Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.
The format of `.pseudo_probe_desc` section looks like:
```
.section .pseudo_probe_desc,"",@progbits
.quad 6309742469962978389 // Func GUID
.quad 4294967295 // Func Hash
.byte 9 // Length of func name
.ascii "_Z5funcAi" // Func name
.quad 7102633082150537521
.quad 138828622701
.byte 12
.ascii "_Z8funcLeafi"
.quad 446061515086924981
.quad 4294967295
.byte 9
.ascii "_Z5funcBi"
.quad -2016976694713209516
.quad 72617220756
.byte 7
.ascii "_Z3fibi"
```
For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :
```
FUNCTION BODY (one for each outlined function present in the text section)
GUID (uint64)
GUID of the function
NPROBES (ULEB128)
Number of probes originating from this function.
NUM_INLINED_FUNCTIONS (ULEB128)
Number of callees inlined into this function, aka number of
first-level inlinees
PROBE RECORDS
A list of NPROBES entries. Each entry contains:
INDEX (ULEB128)
TYPE (uint4)
0 - block probe, 1 - indirect call, 2 - direct call
ATTRIBUTE (uint3)
reserved
ADDRESS_TYPE (uint1)
0 - code address, 1 - address delta
CODE_ADDRESS (uint64 or ULEB128)
code address or address delta, depending on ADDRESS_TYPE
INLINED FUNCTION RECORDS
A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
callees. Each record contains:
INLINE SITE
GUID of the inlinee (uint64)
ID of the callsite probe (ULEB128)
FUNCTION BODY
A FUNCTION BODY entry describing the inlined function.
```
To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index.
**Assembling**
Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis.
A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.
A example assembly looks like:
```
foo2: # @foo2
# %bb.0: # %bb0
pushq %rax
testl %edi, %edi
.pseudoprobe 837061429793323041 1 0 0
je .LBB1_1
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 6 2 0
callq foo
.pseudoprobe 837061429793323041 3 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
.LBB1_1: # %bb1
.pseudoprobe 837061429793323041 5 1 0
callq *%rsi
.pseudoprobe 837061429793323041 2 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
# -- End function
.section .pseudo_probe_desc,"",@progbits
.quad 6699318081062747564
.quad 72617220756
.byte 3
.ascii "foo"
.quad 837061429793323041
.quad 281547593931412
.byte 4
.ascii "foo2"
```
With inlining turned on, the assembly may look different around %bb2 with an inlined probe:
```
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 3 0
.pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6
.pseudoprobe 837061429793323041 4 0
popq %rax
retq
```
**Disassembling**
We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file.
An example disassembly looks like:
```
00000000002011a0 <foo2>:
2011a0: 50 push rax
2011a1: 85 ff test edi,edi
[Probe]: FUNC: foo2 Index: 1 Type: Block
2011a3: 74 02 je 2011a7 <foo2+0x7>
[Probe]: FUNC: foo2 Index: 3 Type: Block
[Probe]: FUNC: foo2 Index: 4 Type: Block
[Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6
2011a5: 58 pop rax
2011a6: c3 ret
[Probe]: FUNC: foo2 Index: 2 Type: Block
2011a7: bf 01 00 00 00 mov edi,0x1
[Probe]: FUNC: foo2 Index: 5 Type: IndirectCall
2011ac: ff d6 call rsi
[Probe]: FUNC: foo2 Index: 4 Type: Block
2011ae: 58 pop rax
2011af: c3 ret
```
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D91878
If a function parameter is marked as "undef", prevent creation
of CallSiteInfo for that parameter.
Without this patch, the parameter's call_site_value would be incorrect.
The incorrect call_value case reported in PR39716,
addressed in D85111.
Patch by Nikola Tesic
Differential revision: https://reviews.llvm.org/D92471
This patch makes DWARF writer emit DW_AT_string_length using
the stringLengthExp operand of DIStringType.
This is part of the effort to add debug info support for
Fortran deferred length strings.
Also updated the tests to exercise the change.
Differential Revision: https://reviews.llvm.org/D92412
Notes about a few declarations:
* LiveVariables::RegisterDefIsDead: deleted by r47927
* createForwardControlFlowIntegrityPass, createJumpInstrTablesPass: deleted by r230780
* RegScavenger::setLiveInsUsed: deleted by r292543
* ScheduleDAGInstrs::{toggleKillFlag,startBlockForKills}: deleted by r304055
* Localizer::shouldLocalize: remnant of D75207
* DwarfDebug::addSectionLabel: deleted by r373273
Summary:
Not all system assembler supports `.uleb128 label2 - label1` form.
When the target do not support this form, we have to take
alternative manual calculation to get the offsets from them.
Reviewed By: hubert.reinterpretcast
Diffierential Revision: https://reviews.llvm.org/D92058
Summary:
AIX uses the existing EH infrastructure in clang and llvm.
The major differences would be
1. AIX do not have CFI instructions.
2. AIX uses a new personality routine, named __xlcxx_personality_v1.
It doesn't use the GCC personality rountine, because the
interoperability is not there yet on AIX.
3. AIX do not use eh_frame sections. Instead, it would use a eh_info
section (compat unwind section) to store the information about
personality routine and LSDA data address.
Reviewed By: daltenty, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D91455
In https://reviews.llvm.org/D89072 I added static const data members
to the debug subsection for globals. It skipped emitting an S_CONSTANT if it
didn't have a value, which meant the subsection could be empty.
This patch fixes the empty subsection issue.
Differential Revision: https://reviews.llvm.org/D92049
This reapplies 36c64af9d7 in updated
form.
Emit the xdata for each function at .seh_endproc. This keeps the
exact same output header order for most code generated by the LLVM
CodeGen layer. (Sections still change order for code built from
assembly where functions lack an explicit .seh_handlerdata
directive, and functions with chained unwind info.)
The practical effect should be that assembly output lacks
superfluous ".seh_handlerdata; .text" pairs at the end of functions
that don't handle exceptions, which allows such functions to use
the AArch64 packed unwind format again.
Differential Revision: https://reviews.llvm.org/D87448
This patch moves the selection of the style used to emit the numbers
(DW_OP_implicit_value vs. DW_OP_const+DW_OP_stack_value) into
DwarfExpression::addUnsignedConstant. This logic is not FP-specific, and
it will be needed for large integers too.
The refactor also makes DW_OP_implicit_value (DW_OP_stack_value worked
already) be used for floating point constants other than float and
double, so I've added a _Float16 test for it.
Split off from D90916.
Differential Revision: https://reviews.llvm.org/D91058
All these potential null pointer dereferences are reported by my static analyzer for null smart pointer dereferences, which has a different implementation from `alpha.cplusplus.SmartPtr`.
The checked pointers in this patch are initialized by Target::createXXX functions. When the creator function pointer is not correctly set, a null pointer will be returned, or the creator function may originally return a null pointer.
Some of them may not make sense as they may be checked before entering the function, but I fixed them all in this patch. I submit this fix because 1) similar checks are found in some other places in the LLVM codebase for the same return value of the function; and, 2) some of the pointers are dereferenced before they are checked, which may definitely trigger a null pointer dereference if the return value is nullptr.
Reviewed By: tejohnson, MaskRay, jpienaar
Differential Revision: https://reviews.llvm.org/D91410
The `dso_local_equivalent` constant is a wrapper for functions that represents a
value which is functionally equivalent to the global passed to this. That is, if
this accepts a function, calling this constant should have the same effects as
calling the function directly. This could be a direct reference to the function,
the `@plt` modifier on X86/AArch64, a thunk, or anything that's equivalent to the
resolved function as a call target.
When lowered, the returned address must have a constant offset at link time from
some other symbol defined within the same binary. The address of this value is
also insignificant. The name is leveraged from `dso_local` where use of a function
or variable is resolved to a symbol in the same linkage unit.
In this patch:
- Addition of `dso_local_equivalent` and handling it
- Update Constant::needsRelocation() to strip constant inbound GEPs and take
advantage of `dso_local_equivalent` for relative references
This is useful for the [Relative VTables C++ ABI](https://reviews.llvm.org/D72959)
which makes vtables readonly. This works by replacing the dynamic relocations for
function pointers in them with static relocations that represent the offset between
the vtable and virtual functions. If a function is externally defined,
`dso_local_equivalent` can be used as a generic wrapper for the function to still
allow for this static offset calculation to be done.
See [RFC](http://lists.llvm.org/pipermail/llvm-dev/2020-August/144469.html) for more details.
Differential Revision: https://reviews.llvm.org/D77248
This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table.
Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D87544
This patch uses the new `getMnemonic` helper from D90039
to display mnemonics instead of the internal opcodes.
The main motivation behind using the mnemonics is that they
are more user-friendly and more directly related to the assembly
the users will be presented.
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D90040
The test fails on Mac, see comment on the code review.
> This option was in a rather convoluted place, causing global parameters
> to be set in awkward and undesirable ways to try to account for it
> indirectly. Add tests for the -disable-debug-info option and ensure we
> don't print unintended markers from unintended places.
>
> Reviewed By: dstenb
>
> Differential Revision: https://reviews.llvm.org/D91083
This reverts commit 9606ef03f0.
No longer rely on an external tool to build the llvm component layout.
Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.
These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.
Differential Revision: https://reviews.llvm.org/D90848
This option was in a rather convoluted place, causing global parameters
to be set in awkward and undesirable ways to try to account for it
indirectly. Add tests for the -disable-debug-info option and ensure we
don't print unintended markers from unintended places.
Reviewed By: dstenb
Differential Revision: https://reviews.llvm.org/D91083
This broke both Firefox and Chromium (PR47905) due to what seems like dllimport
function not being handled correctly.
> This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table.
> Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets.
>
> Reviewed By: rnk
>
> Differential Revision: https://reviews.llvm.org/D87544
This reverts commit cfd8481da1.