Similar to D97585.
D25456 used `S_ATTR_LIVE_SUPPORT` to ensure the data variable will be retained
or discarded as a unit with the counter variable, so llvm.compiler.used is
sufficient. It allows ld to dead strip unneeded profc and profd variables.
Reviewed By: vsk
Differential Revision: https://reviews.llvm.org/D105445
The implementation uses the int_asan_check_memaccess intrinsic to instrument the code. The intrinsic is replaced by a call to a function which performs the access check. The generated function names encode the input register name as a number using Reg - X86::NoRegister formula.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D107850
The NS==0 condition used by D103717 missed a corner case: if the current copy
does not have a hash suffix (e.g. weak_odr), a copy with value profiling (with a
different CFG) may exist. This is super rare, but is possible with pre-inlining
PGO instrumentation (which can make a weak_odr function inlines its callees
differently, sometimes with value profiling while sometimes without).
If the current copy with private profd is prevailing, the non-prevailing copy
may get an undefined symbol if a caller inlining the non-prevailing function
references its profd. If the other copy with non-private profd is prevailing,
the current copy may cause a "relocation to discarded section" linker error.
The fix is straightforward: just keep non-private profd in such a `DataReferencedByCode` case.
With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
clang is 0.08% larger (172431496/172286720-1).
`stat -c %s **/*.o | awk '{s+=$1}END{print s}' is 0.026% larger.
The majority of D103717's benefits remains.
Reviewed By: xur
Differential Revision: https://reviews.llvm.org/D108432
This reverts commit f653beea88.
It broke Windows coverage-inline.cpp because link.exe has a limitation
that external symbols in IMAGE_COMDAT_SELECT_ASSOCIATIVE don't work.
It essentially dropped the previous size optimization for coverage
because coverage doesn't rename comdat by default.
Needs more investigation what we should do.
The NS==0 condition used by D103717 missed a corner case: if the current copy
does not have a hash suffix (e.g. weak_odr), a copy with value profiling (with a
different CFG) may exist. This is super rare, but is possible with pre-inlining
PGO instrumentation (which can make a weak_odr function inlines its callees
differently, sometimes with value profiling while sometimes without).
If the current copy with private profd is prevailing, the non-prevailing copy
may get an undefined symbol if a caller inlining the non-prevailing function
references its profd. If the other copy with non-private profd is prevailing,
the current copy may cause a "relocation to discarded section" linker error.
The fix is straightforward: just keep non-private profd in this case.
With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
clang is 0.08% larger (172431496/172286720-1).
`stat -c %s **/*.o | awk '{s+=$1}END{print s}' is 0.026% larger.
The majority of D103717's benefits remains.
Reviewed By: xur
Differential Revision: https://reviews.llvm.org/D108432
The implementation uses the int_asan_check_memaccess intrinsic to instrument the code. The intrinsic is replaced by a call to a function which performs the access check. The generated function names encode the input register name as a number using Reg - X86::NoRegister formula.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D107850
The IRPGOFlag symbol (__llvm_profile_raw_version) is dropped when
identified as non-prevailing for either regular or thin LTO during
the mixed-LTO mode compilation. This happens in the module where
IRPGOFlag is marked as non-prevailing. This variable
is emitted in the final object from the prevailing module.
This is still problematic because we currently query this symbol
to coordinate some actions between PGOInstrumentation pass
and InstrProfiling lowering pass, like whether to do value
profiling, whether to do comdat renaming.
This problem is bought up by YolandaCY in
https://reviews.llvm.org/D107034
YolandCY reported unresolved symbol linker errors in
CSPGO instrumentation build for chromium.
This patch let LTO retain IRPGOFlag decl by adding it to
CompilerUsed list and relax the check in isIRPGOFlagSet() when
doing the InstrProfiling lowering.
The test case in the patch is from D107034
<https://reviews.llvm.org/D107034>.
Differential Revision: https://reviews.llvm.org/D108581
The COFF specific `DataReferencedByCode` complexity (D103372 D103717) is due to
a link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE is
not really dropped, so it can cause duplicate definition error.
Refactored implementation of AddressSanitizerPass and
HWAddressSanitizerPass to use pass options similar to passes like
MemorySanitizerPass. This makes sure that there is a single mapping
from class name to pass name (needed by D108298), and options like
-debug-only and -print-after makes a bit more sense when (despite
that it is the unparameterized pass name that should be used in those
options).
A result of the above is that some pass names are removed in favor
of the parameterized versions:
- "khwasan" is now "hwasan<kernel;recover>"
- "kasan" is now "asan<kernel>"
- "kmsan" is now "msan<kernel>"
Differential Revision: https://reviews.llvm.org/D105007
This very occasionally causes to an assertion failure in the compiler.
Turning off until we can get to the bottom of this.
Reviewed By: hctim
Differential Revision: https://reviews.llvm.org/D108282
AttributeList::hasAttribute() is confusing. In an attempt to change the
name to something that suggests using other methods, fix up some
existing uses.
When enable CSPGO for ThinLTO, there are profile cfg mismatch warnings that will cause lld-link errors (with /WX)
due to source changes (e.g. `#if` code runs for profile generation but not for profile use)
To disable it we have to use an internal "/mllvm:-no-pgo-warn-mismatch" option.
In contrast clang uses option ”-Wno-backend-plugin“ to avoid such warnings and gcc has an explicit "-Wno-coverage-mismatch" option.
Add "lto-pgo-warn-mismatch" option to lld COFF/ELF to help turn on/off the profile mismatch warnings explicitly when build with ThinLTO and CSPGO.
Differential Revision: https://reviews.llvm.org/D104431
When enable CSPGO for ThinLTO, there are profile cfg mismatch warnings that will cause lld-link errors (with /WX).
To disable it we have to use an internal "/mllvm:-no-pgo-warn-mismatch" option.
In contrast clang uses option ”-Wno-backend-plugin“ to avoid such warnings and gcc has an explicit "-Wno-coverage-mismatch" option.
Add this "lto-pgo-warn-mismatch" option to lld to help turn on/off the profile mismatch warnings explicitly when build with ThinLTO and CSPGO.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D104431
When none of the translation units in the binary have been instrumented
we shouldn't need to link the profile runtime. However, because we pass
-u__llvm_profile_runtime on Linux and Fuchsia, the runtime would still
be pulled in and incur some overhead. On Fuchsia which uses runtime
counter relocation, it also means that we cannot reference the bias
variable unconditionally.
This change modifies the InstrProfiling pass to pull in the profile
runtime only when needed by declaring the __llvm_profile_runtime symbol
in the translation unit only when needed. For now we restrict this only
for Fuchsia, but this can be later expanded to other platforms. This
approach was already used prior to 9a041a7522, but we changed it
to always generate the __llvm_profile_runtime due to a TAPI limitation,
but that limitation may no longer apply, and it certainly doesn't apply
on platforms like Fuchsia.
Differential Revision: https://reviews.llvm.org/D98061
Some files still contained the old University of Illinois Open Source
Licence header. This patch replaces that with the Apache 2 with LLVM
Exception licence.
Differential Revision: https://reviews.llvm.org/D107528
For a very large module, __llvm_gcov_reset can become very large.
__llvm_gcov_reset previously emitted stores to a bunch of globals in one
huge basic block. MemCpyOpt would turn many of these stores into
memsets, and updating MemorySSA would be extremely slow.
Verified that this makes the compile time of certain files go down
drastically (20min -> 5min).
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D107538
Rather than emitting the bias variable lazily as needed, emit it
eagerly. This allows profile runtime to refer to this variable
unconditionally without having to use the weak reference. The bias
variable is in a COMDAT so there'll never be more than one instance,
and if it's not needed, linker should be able to GC it, so the overhead
should be minimal.
Differential Revision: https://reviews.llvm.org/D107377
Change `CountersPtr` in `__profd_` to a label difference, which is a link-time
constant. On ELF, when linking a shared object, this requires that `__profc_` is
either private or linkonce/linkonce_odr hidden. On COFF, we need D104564 so that
`.quad a-b` (64-bit label difference) can lower to a 32-bit PC-relative relocation.
```
# ELF: R_X86_64_PC64 (PC-relative)
.quad .L__profc_foo-.L__profd_foo
# Mach-O: a pair of 8-byte X86_64_RELOC_UNSIGNED and X86_64_RELOC_SUBTRACTOR
.quad l___profc_foo-l___profd_foo
# COFF: we actually use IMAGE_REL_AMD64_REL32/IMAGE_REL_ARM64_REL32 so
# the high 32-bit value is zero even if .L__profc_foo < .L__profd_foo
# As compensation, we truncate CountersDelta in the header so that
# __llvm_profile_merge_from_buffer and llvm-profdata reader keep working.
.quad .L__profc_foo-.L__profd_foo
```
(Note: link.exe sorts `.lprfc` before `.lprfd` even if the object writer
has `.lprfd` before `.lprfc`, so we cannot work around by reordering
`.lprfc` and `.lprfd`.)
With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
`ld -pie` linked clang is 1.74% smaller due to fewer R_X86_64_RELATIVE relocations.
```
% readelf -r pie | awk '$3~/R.*/{s[$3]++} END {for (k in s) print k, s[k]}'
R_X86_64_JUMP_SLO 331
R_X86_64_TPOFF64 2
R_X86_64_RELATIVE 476059 # was: 607712
R_X86_64_64 2616
R_X86_64_GLOB_DAT 31
```
The absolute function address (used by llvm-profdata to collect indirect call
targets) can be converted to relative as well, but is not done in this patch.
Differential Revision: https://reviews.llvm.org/D104556
`StackAlignment` has only one use: `StackAlignment = std::max(StackAlignment, AI.getAlignment());` So it is redundant.
Reviewed By: vitalybuka, MTC
Differential Revision: https://reviews.llvm.org/D106741
This removes an abuse of ELF linker behaviors while keeping Mach-O/COFF linker
behaviors unchanged.
ELF: when module_ctor is in a comdat, this patch removes reliance on a linker
abuse (an SHT_INIT_ARRAY in a section group retains the whole group) by using
SHF_GNU_RETAIN. No linker behavior difference when module_ctor is not in a comdat.
Mach-O: module_ctor gets `N_NO_DEAD_STRIP`. No linker behavior difference
because module_ctor is already referenced by a `S_MOD_INIT_FUNC_POINTERS`
section (GC root).
PE/COFF: no-op. SanitizerCoverage already appends module_ctor to `llvm.used`.
Other sanitizers: llvm.used for local linkage is not implemented in
`TargetLoweringObjectFileCOFF::emitLinkerDirectives` (once implemented or
switched to a non-local linkage, COFF can use module_ctor in comdat (i.e.
generalize ELF-specific rL301586)).
There is no object file size difference.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D106246
In the textual format, `noduplicates` means no COMDAT/section group
deduplication is performed. Therefore, if both sets of sections are retained, and
they happen to define strong external symbols with the same names,
there will be a duplicate definition linker error.
In PE/COFF, the selection kind lowers to `IMAGE_COMDAT_SELECT_NODUPLICATES`.
The name describes the corollary instead of the immediate semantics. The name
can cause confusion to other binary formats (ELF, wasm) which have implemented/
want to implement the "no deduplication" selection kind. Rename it to be clearer.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D106319
We need the compiler generated variable to override the weak symbol of
the same name inside the profile runtime, but using LinkOnceODRLinkage
results in weak symbol being emitted in which case the symbol selected
by the linker is going to depend on the order of inputs which can be
fragile.
This change replaces the use of weak definition inside the runtime with
a weak alias. We place the compiler generated symbol inside a COMDAT
group so dead definition can be garbage collected by the linker.
We also disable the use of runtime counter relocation on Darwin since
Mach-O doesn't support weak external references, but Darwin already uses
a different continous mode that relies on overmapping so runtime counter
relocation isn't needed there.
Differential Revision: https://reviews.llvm.org/D105176
SystemZ ABI requires zero-extending function parameters to 64-bit. The
compiler is free to optimize the code around this assumption, e.g.
failing to zero-extend __tsan_atomic32_load()'s morder may cause
crashes in to_mo() switch table lookup.
Fix by adding zeroext attributes to TSan's FunctionCallees, similar to
how it was done in commit 3bc439bdff ("[MSan] Add instrumentation for
SystemZ"). This is a no-op on arches that don't need it.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D105629
This patch fixes code that incorrectly handled dbg.values with duplicate
location operands, i.e. !DIArgList(i32 %a, i32 %a). The errors in
question were caused by either applying an update to dbg.value multiple
times when the update is only valid once, or by updating the
DIExpression for only the first instance of a value that appears
multiple times.
Differential Revision: https://reviews.llvm.org/D105831
This patch fixes an issue which occurred in CodeGenPrepare and
HWAddressSanitizer, which both at some point create a map of Old->New
instructions and update dbg.value uses of these. They did this by
iterating over the dbg.value's location operands, and if an instance of
the old instruction was found, replaceVariableLocationOp would be
called on that dbg.value. This would cause an error if the same operand
appeared multiple times as a location operand, as the first call to
replaceVariableLocationOp would update all uses of the old instruction,
invalidating the old iterator and eventually hitting an assertion.
This has been fixed by no longer iterating over the dbg.value's location
operands directly, but by first collecting them into a set and then
iterating over that, ensuring that we never attempt to replace a
duplicated operand multiple times.
Differential Revision: https://reviews.llvm.org/D105129
Same as other CreateLoad-style APIs, these need an explicit type
argument to support opaque pointers.
Differential Revision: https://reviews.llvm.org/D105395
We need the compiler generated variable to override the weak symbol of
the same name inside the profile runtime, but using LinkOnceODRLinkage
results in weak symbol being emitted which leads to an issue where the
linker might choose either of the weak symbols potentially disabling the
runtime counter relocation.
This change replaces the use of weak definition inside the runtime with
an external weak reference to address the issue. We also place the
compiler generated symbol inside a COMDAT group so dead definition can
be garbage collected by the linker.
Differential Revision: https://reviews.llvm.org/D105176
This allows application code checks if origin tracking is on before
printing out traces.
-dfsan-track-origins can be 0,1,2.
The current code only distinguishes 1 and 2 in compile time, but not at runtime.
Made runtime distinguish 1 and 2 too.
Reviewed By: browneee
Differential Revision: https://reviews.llvm.org/D105128
The code was previously relying on the fact that an incorrectly
typed global would result in the insertion of a BitCast constant
expression. With opaque pointers, this is no longer the case, so
we should check the type explicitly.
This enable no_sanitize C++ attribute to exclude globals from hwasan
testing, and automatically excludes other sanitizers' globals (such as
ubsan location descriptors).
Differential Revision: https://reviews.llvm.org/D104825
On ELF, the D1003372 optimization can apply to more cases. There are two
prerequisites for making `__profd_` private:
* `__profc_` keeps `__profd_` live under compiler/linker GC
* `__profd_` is not referenced by code
The first is satisfied because all counters/data are in a section group (either
`comdat any` or `comdat noduplicates`). The second requires that the function
does not use value profiling.
Regarding the second point: `__profd_` may be referenced by other text sections
due to inlining. There will be a linker error if a prevailing text section
references the non-prevailing local symbol.
With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
clang is 4.2% smaller (1-169620032/177066968).
`stat -c %s **/*.o | awk '{s+=$1}END{print s}' is 2.5% smaller.
Reviewed By: davidxl, rnk
Differential Revision: https://reviews.llvm.org/D103717
These other platforms are unsupported and untested.
They could be re-added later based on MSan code.
Reviewed By: gbalats, stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D104481
This reverts commit 76d0747e08.
If a group has `__llvm_prf_vals` due to static value profiler counters
(`NS!=0`), we cannot make `__llvm_prf_data` private, because a prevailing text
section may reference `__llvm_prf_data` and will cause a `relocation refers to a
discarded section` linker error.
Note: while a `__profc_` group is non-prevailing, it may be referenced by a
prevailing text section due to inlining.
```
group section [ 66] `.group' [__profc__ZN5clang20EmitClangDeclContextERN4llvm12RecordKeeperERNS0_11raw_ostreamE] contains 4 sections:
[Index] Name
[ 67] __llvm_prf_cnts
[ 68] __llvm_prf_vals
[ 69] __llvm_prf_data
[ 70] .rela__llvm_prf_data
```
The current naming scheme adds the `dfs$` prefix to all
DFSan-instrumented functions. This breaks mangling and prevents stack
trace printers and other tools from automatically demangling function
names.
This new naming scheme is mangling-compatible, with the `.dfsan`
suffix being a vendor-specific suffix:
https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling-structure
With this fix, demangling utils would work out-of-the-box.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D104494
For ELF, since all counters/data are in a section group (either `comdat any` or
`comdat noduplicates`), and the signature for `comdat any` is `__profc_`, the
D1003372 optimization prerequisite (linker GC cannot discard data variables
while the text section is retained) is always satisified, we can make __profd_
unconditionally private.
Reviewed By: davidxl, rnk
Differential Revision: https://reviews.llvm.org/D103717
These other platforms are unsupported and untested.
They could be re-added later based on MSan code.
Reviewed By: gbalats, stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D104481
This patch is to address https://bugs.llvm.org/show_bug.cgi?id=50610.
In computed goto pattern, there are usually a list of basic blocks that are all targets of indirectbr instruction, and each basic block also has address taken and stored in a variable.
CHR pass could potentially clone these basic blocks, which would generate a cloned version of the indirectbr and clonved version of all basic blocks in the list.
However these basic blocks will not have their addresses taken and stored anywhere. So latter SimplifyCFG pass will simply remove all tehse cloned basic blocks, resulting in incorrect code.
To fix this, when searching for scopes, we skip scopes that contains BBs with addresses taken.
Added a few test cases.
Reviewed By: aeubanks, wenlei, hoy
Differential Revision: https://reviews.llvm.org/D103867
Also:
- add driver test (fsanitize-use-after-return.c)
- add basic IR test (asan-use-after-return.cpp)
- (NFC) cleaned up logic for generating table of __asan_stack_malloc
depending on flag.
for issue: https://github.com/google/sanitizers/issues/1394
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D104076
Adds the basic instrumentation needed for stack tagging.
Currently does not support stack short granules or TLS stack histories,
since a different code path is followed for the callback instrumentation
we use.
We may simply wait to support these two features until we switch to
a custom calling convention.
Patch By: xiangzhangllvm, morehouse
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D102901
This allows for using the frame record feature (which uses __hwasan_tls)
independently from however the user wants to access the shadow base, which
prior was only usable if shadow wasn't accessed through the TLS variable or ifuncs.
Frame recording can be explicitly set according to ShadowMapping::WithFrameRecord
in ShadowMapping::init. Currently, it is only enabled on Fuchsia and if TLS is
used, so this should mimic the old behavior.
Added an extra case to prologue.ll that covers this new case.
Differential Revision: https://reviews.llvm.org/D103841
Complete support for fast8:
- amend shadow size and mapping in runtime
- remove fast16 mode and -dfsan-fast-16-labels flag
- remove legacy mode and make fast8 mode the default
- remove dfsan-fast-8-labels flag
- remove functions in dfsan interface only applicable to legacy
- remove legacy-related instrumentation code and tests
- update documentation.
Reviewed By: stephan.yichao.zhao, browneee
Differential Revision: https://reviews.llvm.org/D103745
Needs to be discussed more.
This reverts commit 255a5c1baa6020c009934b4fa342f9f6dbbcc46
This reverts commit df2056ff3730316f376f29d9986c9913b95ceb1
This reverts commit faff79b7ca144e505da6bc74aa2b2f7cffbbf23
This reverts commit d2a9020785c6e02afebc876aa2778fa64c5cafd
`__profd_*` variables are referenced by code only when value profiling is
enabled. If disabled (e.g. default -fprofile-instr-generate), the symbols just
waste space on ELF/Mach-O. We change the comdat symbol from `__profd_*` to
`__profc_*` because an internal symbol does not provide deduplication features
on COFF. The choice doesn't matter on ELF.
(In -DLLVM_BUILD_INSTRUMENTED_COVERAGE=on build, there is now no `__profd_*` symbols.)
On Windows this enables further optimization. We are no longer affected by the
link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE can
cause duplicate definition error.
https://lists.llvm.org/pipermail/llvm-dev/2021-May/150758.html
We can thus use llvm.compiler.used instead of llvm.used like ELF (D97585).
This avoids many `/INCLUDE:` directives in `.drectve`.
Here is rnk's measurement for Chrome:
```
This reduced object file size of base_unittests.exe, compiled with coverage, optimizations, and gmlt debug info by 10%:
#BEFORE
$ find . -iname '*.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}'
1047758867
$ du -cksh base_unittests.exe
82M base_unittests.exe
82M total
# AFTER
$ find . -iname '*.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}'
937886499
$ du -cksh base_unittests.exe
78M base_unittests.exe
78M total
```
The change is NFC for Mach-O.
Reviewed By: davidxl, rnk
Differential Revision: https://reviews.llvm.org/D103372
`__profd_*` variables are referenced by code only when value profiling is
enabled. If disabled (e.g. default -fprofile-instr-generate), the symbols just
waste space on ELF/Mach-O. We change the comdat symbol from `__profd_*` to
`__profc_*` because an internal symbol does not provide deduplication features
on COFF. The choice doesn't matter on ELF.
(In -DLLVM_BUILD_INSTRUMENTED_COVERAGE=on build, there is now no `__profd_*` symbols.)
On Windows this enables further optimization. We are no longer affected by the
link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE can
cause duplicate definition error.
https://lists.llvm.org/pipermail/llvm-dev/2021-May/150758.html
We can thus use llvm.compiler.used instead of llvm.used like ELF (D97585).
This avoids many `/INCLUDE:` directives in `.drectve`.
Here is rnk's measurement for Chrome:
```
This reduced object file size of base_unittests.exe, compiled with coverage, optimizations, and gmlt debug info by 10%:
#BEFORE
$ find . -iname '*.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}'
1047758867
$ du -cksh base_unittests.exe
82M base_unittests.exe
82M total
# AFTER
$ find . -iname '*.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}'
937886499
$ du -cksh base_unittests.exe
78M base_unittests.exe
78M total
```
Reviewed By: davidxl, rnk
Differential Revision: https://reviews.llvm.org/D103372
Calls must properly match argument ABI attributes with the callee.
Found via D103412.
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D103414
The linkage/visibility of `__profn_*` variables are derived
from the profiled functions.
extern_weak => linkonce
available_externally => linkonce_odr
internal => private
extern => private
_ => unchanged
The linkage/visibility of `__profc_*`/`__profd_*` variables are derived from
`__profn_*` with linkage/visibility wrestling for Windows.
The changes can be folded to the following without changing semantics.
```
if (TT.isOSBinFormatCOFF() && !NeedComdat) {
Linkage = GlobalValue::InternalLinkage;
Visibility = GlobalValue::DefaultVisibility;
}
```
That said, I think we can just delete the code block.
An extern/internal function will now use private `__profc_*`/`__profd_*`
variables, instead of internal ones. This saves some symbol table entries.
A non-comdat {linkonce,weak}_odr function will now use hidden external
`__profc_*`/`__profd_*` variables instead of internal ones. There is potential
object file size increase because such symbols need `/INCLUDE:` directives.
However such non-comdat functions are rare (note that non-comdat weak
definitions don't prevent duplicate definition error).
The behavior changes match ELF.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D103355
Without this change, a callsite like:
[[clang::musttail]] return func_call(x);
will cause an error like:
fatal error: error in backend: failed to perform tail call elimination
on a call site marked musttail
due to DFSan inserting instrumentation between the musttail call and
the return.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D103542
DFSan has flags to control flows between pointers and objects referred
by pointers. For example,
a = *p;
L(a) = L(*p) when -dfsan-combine-pointer-labels-on-load = false
L(a) = L(*p) + L(p) when -dfsan-combine-pointer-labels-on-load = true
*p = b;
L(*p) = L(b) when -dfsan-combine-pointer-labels-on-store = false
L(*p) = L(b) + L(p) when -dfsan-combine-pointer-labels-on-store = true
The question is what to do with p += c.
In practice we found many confusing flows if we propagate labels from c
to p. So a new flag works like this
p += c;
L(p) = L(p) when -dfsan-propagate-via-pointer-arithmetic = false
L(p) = L(p) + L(c) when -dfsan-propagate-via-pointer-arithmetic = true
Reviewed-by: gbalats
Differential Revision: https://reviews.llvm.org/D103176
Arguments need to have the proper ABI parameter attributes set.
Followup to D101806.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D103288
We really ought to support no_sanitize("coverage") in line with other
sanitizers. This came up again in discussions on the Linux-kernel
mailing lists, because we currently do workarounds using objtool to
remove coverage instrumentation. Since that support is only on x86, to
continue support coverage instrumentation on other architectures, we
must support selectively disabling coverage instrumentation via function
attributes.
Unfortunately, for SanitizeCoverage, it has not been implemented as a
sanitizer via fsanitize= and associated options in Sanitizers.def, but
rolls its own option fsanitize-coverage. This meant that we never got
"automatic" no_sanitize attribute support.
Implement no_sanitize attribute support by special-casing the string
"coverage" in the NoSanitizeAttr implementation. To keep the feature as
unintrusive to existing IR generation as possible, define a new negative
function attribute NoSanitizeCoverage to propagate the information
through to the instrumentation pass.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=49035
Reviewed By: vitalybuka, morehouse
Differential Revision: https://reviews.llvm.org/D102772
This reduces the size of chrome.dll.pdb built with optimizations,
coverage, and line table info from 4,690,210,816 to 2,181,128,192, which
makes it possible to fit under the 4GB limit.
This change can greatly reduce binary size in coverage builds, which do
not need value profiling. IR PGO builds are unaffected. There is a minor
behavior change for frontend PGO.
PGO and coverage both use InstrProfiling to create profile data with
counters. PGO records the address of each function in the __profd_
global. It is used later to map runtime function pointer values back to
source-level function names. Coverage does not appear to use this
information.
Recording the address of every function with code coverage drastically
increases code size. Consider this program:
void foo();
void bar();
inline void inlineMe(int x) {
if (x > 0)
foo();
else
bar();
}
int getVal();
int main() { inlineMe(getVal()); }
With code coverage, the InstrProfiling pass runs before inlining, and it
captures the address of inlineMe in the __profd_ global. This greatly
increases code size, because now the compiler can no longer delete
trivial code.
One downside to this approach is that users of frontend PGO must apply
the -mllvm -enable-value-profiling flag globally in TUs that enable PGO.
Otherwise, some inline virtual method addresses may not be recorded and
will not be able to be promoted. My assumption is that this mllvm flag
is not popular, and most frontend PGO users don't enable it.
Differential Revision: https://reviews.llvm.org/D102818
In LAM model X86_64 will use bits 57-62 (of 0-63) as HWASAN tag.
So here we make sure the tag shift position and tag mask is correct for x86-64.
Differential Revision: https://reviews.llvm.org/D102472
Currently 1 byte global object has a ridiculous 63 bytes redzone.
This patch reduces the redzone size to be less than 32 if the size of global object is less than or equal to half of 32 (the minimal size of redzone).
A 12 bytes object has a 20 bytes redzone, a 20 bytes object has a 44 bytes redzone.
Reviewed By: MaskRay, #sanitizers, vitalybuka
Differential Revision: https://reviews.llvm.org/D102469
Currently all AA analyses marked as preserved are stateless, not taking
into account their dependent analyses. So there's no need to mark them
as preserved, they won't be invalidated unless their analyses are.
SCEVAAResults was the one exception to this, it was treated like a
typical analysis result. Make it like the others and don't invalidate
unless SCEV is invalidated.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D102032
Add address sanitizer instrumentation support for accesses to global
and constant address spaces in AMDGPU. It strictly avoids instrumenting
the stack and assumes x86 as the host.
Reviewed by: vitalybuka
Differential Revision: https://reviews.llvm.org/D99071
The problem is the following. With fast8, we broke an important
invariant when loading shadows. A wide shadow of 64 bits used to
correspond to 4 application bytes with fast16; so, generating a single
load was okay since those 4 application bytes would share a single
origin. Now, using fast8, a wide shadow of 64 bits corresponds to 8
application bytes that should be backed by 2 origins (but we kept
generating just one).
Let’s say our wide shadow is 64-bit and consists of the following:
0xABCDEFGH. To check if we need the second origin value, we could do
the following (on the 64-bit wide shadow) case:
- bitwise shift the wide shadow left by 32 bits (yielding 0xEFGH0000)
- push the result along with the first origin load to the shadow/origin vectors
- load the second 32-bit origin of the 64-bit wide shadow
- push the wide shadow along with the second origin to the shadow/origin vectors.
The combineOrigins would then select the second origin if the wide
shadow is of the form 0xABCDE0000. The tests illustrate how this
change affects the generated bitcode.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D101584
This patch makes sure that globals in supported address spaces
will be replaced by globals with red zones in the same address
space by copying the address space.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D101362
This applies the D100251 mechanism to the gcov instrumentation pass.
With this patch, `-fno-omit-frame-pointer` in
`clang -fprofile-arcs -O1 -fno-omit-frame-pointer` will be respected for synthesized
`__llvm_gcov_writeout,__llvm_gcov_reset,__llvm_gcov_init` functions: the frame pointer
will be kept (note: on many targets -O1 eliminates the frame pointer by default).
`clang -fno-exceptions -fno-asynchronous-unwind-tables -g -fprofile-arcs` will
produce .debug_frame instead of .eh_frame.
Fix: https://github.com/ClangBuiltLinux/linux/issues/955
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D101129
This patch is supposed to solve: https://bugs.llvm.org/show_bug.cgi?id=50075
The function `__dfsan_mem_transfer_callback` takes a `Len` argument of type `i64`; however, when processing a `MemTransferInst` such as `llvm.memcpy.p0i8.p0i8.i32`, the `len` argument has type `i32`. In order to make the type of `len` compatible with the one of the callback argument, this change zero-extends it when necessary.
Reviewed By: stephan.yichao.zhao, gbalats
Differential Revision: https://reviews.llvm.org/D101048
The first version of origin tracking tracks only memory stores. Although
this is sufficient for understanding correct flows, it is hard to figure
out where an undefined value is read from. To find reading undefined values,
we still have to do a reverse binary search from the last store in the chain
with printing and logging at possible code paths. This is
quite inefficient.
Tracking memory load instructions can help this case. The main issues of
tracking loads are performance and code size overheads.
With tracking only stores, the code size overhead is 38%,
memory overhead is 1x, and cpu overhead is 3x. In practice #load is much
larger than #store, so both code size and cpu overhead increases. The
first blocker is code size overhead: link fails if we inline tracking
loads. The workaround is using external function calls to propagate
metadata. This is also the workaround ASan uses. The cpu overhead
is ~10x. This is a trade off between debuggability and performance,
and will be used only when debugging cases that tracking only stores
is not enough.
Reviewed By: gbalats
Differential Revision: https://reviews.llvm.org/D100967
On ELF targets, if a function has uwtable or personality, or does not have
nounwind (`needsUnwindTableEntry`), it marks that `.eh_frame` is needed in the module.
Then, a function gets `.eh_frame` if `needsUnwindTableEntry` or `-g[123]` is specified.
(i.e. If -g[123], every function gets `.eh_frame`.
This behavior is strange but that is the status quo on GCC and Clang.)
Let's take asan as an example. Other sanitizers are similar.
`asan.module_[cd]tor` has no attribute. `needsUnwindTableEntry` returns true,
so every function gets `.eh_frame` if `-g[123]` is specified.
This is the root cause that
`-fno-exceptions -fno-asynchronous-unwind-tables -g` produces .debug_frame
while
`-fno-exceptions -fno-asynchronous-unwind-tables -g -fsanitize=address` produces .eh_frame.
This patch
* sets the nounwind attribute on sanitizer module ctor/dtor.
* let Clang emit a module flag metadata "uwtable" for -fasynchronous-unwind-tables. If "uwtable" is set, sanitizer module ctor/dtor additionally get the uwtable attribute.
The "uwtable" mechanism is generic: synthesized functions not cloned/specialized
from existing ones should consider `Function::createWithDefaultAttr` instead of
`Function::create` if they want to get some default attributes which
have more of module semantics.
Other candidates: "frame-pointer" (https://github.com/ClangBuiltLinux/linux/issues/955https://github.com/ClangBuiltLinux/linux/issues/1238), dso_local, etc.
Differential Revision: https://reviews.llvm.org/D100251
It used to be that all of our intrinsics were call instructions, but over time, we've added more and more invokable intrinsics. According to the verifier, we're up to 8 right now. As IntrinsicInst is a sub-class of CallInst, this puts us in an awkward spot where the idiomatic means to check for intrinsic has a false negative if the intrinsic is invoked.
This change switches IntrinsicInst from being a sub-class of CallInst to being a subclass of CallBase. This allows invoked intrinsics to be instances of IntrinsicInst, at the cost of requiring a few more casts to CallInst in places where the intrinsic really is known to be a call, not an invoke.
After this lands and has baked for a couple days, planned cleanups:
Make GCStatepointInst a IntrinsicInst subclass.
Merge intrinsic handling in InstCombine and use idiomatic visitIntrinsicInst entry point for InstVisitor.
Do the same in SelectionDAG.
Do the same in FastISEL.
Differential Revision: https://reviews.llvm.org/D99976
Such attributes can either be unset, or set to "true" or "false" (as string).
throughout the codebase, this led to inelegant checks ranging from
if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")
to
if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")
Introduce a getValueAsBool that normalize the check, with the following
behavior:
no attributes or attribute set to "false" => return false
attribute set to "true" => return true
Differential Revision: https://reviews.llvm.org/D99299
Instruction::getDebugLoc can return an invalid DebugLoc. For such cases
where metadata was accidentally removed from the libcall insertion
point, simply insert a DILocation with line 0 scoped to the caller. When
we can inline the libcall, such as during LTO, then we won't fail a
Verifier check that all calls to functions with debug metadata
themselves must have debug metadata.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D100158
Fixes the ASan RISC-V memory mapping (originally introduced by D87580 and
D87581). This should be an improvement both in terms of first principles
soundness and observed test failures --- test failures would occur
non-deterministically depending on the ASLR random offset.
On RISC-V Linux (64-bit), `TASK_UNMAPPED_BASE` is currently defined as
`PAGE_ALIGN(TASK_SIZE / 3)`. The non-power-of-two divisor makes the result
be the not very round number 0x1555556000. That address had to be further
rounded to ensure page alignment after the shadow scale shifting is applied.
Still, that value explains why the mapping table may look less regular than
expected.
Further cleanups:
- Moved the mapping table comment, to ensure that the two Linux/AArch64
tables stayed together;
- Removed mention of Sv48. Neither the original mapping nor this one are
compatible with an actual Linux Sv48 address space (mainline Linux still
operates Sv48 in Sv39 mode). A future patch can improve this;
- Removed the additional comments, for consistency.
Differential Revision: https://reviews.llvm.org/D97646
Userspace page aliasing allows us to use middle pointer bits for tags
without untagging them before syscalls or accesses. This should enable
easier experimentation with HWASan on x86_64 platforms.
Currently stack, global, and secondary heap tagging are unsupported.
Only primary heap allocations get tagged.
Note that aliasing mode will not work properly in the presence of
fork(), since heap memory will be shared between the parent and child
processes. This mode is non-ideal; we expect Intel LAM to enable full
HWASan support on x86_64 in the future.
Reviewed By: vitalybuka, eugenis
Differential Revision: https://reviews.llvm.org/D98875
Userspace page aliasing allows us to use middle pointer bits for tags
without untagging them before syscalls or accesses. This should enable
easier experimentation with HWASan on x86_64 platforms.
Currently stack, global, and secondary heap tagging are unsupported.
Only primary heap allocations get tagged.
Note that aliasing mode will not work properly in the presence of
fork(), since heap memory will be shared between the parent and child
processes. This mode is non-ideal; we expect Intel LAM to enable full
HWASan support on x86_64 in the future.
Reviewed By: vitalybuka, eugenis
Differential Revision: https://reviews.llvm.org/D98875
Subsequent patches will implement page-aliasing mode for x86_64, which
will initially only work for the primary heap allocator. We force
callback instrumentation to simplify the initial aliasing
implementation.
Reviewed By: vitalybuka, eugenis
Differential Revision: https://reviews.llvm.org/D98069
On ELF, we place the metadata sections (`__sancov_guards`, `__sancov_cntrs`,
`__sancov_bools`, `__sancov_pcs` in section groups (either `comdat any` or
`comdat noduplicates`).
With `--gc-sections`, LLD since D96753 and GNU ld `-z start-stop-gc` may garbage
collect such sections. If all `__sancov_bools` are discarded, LLD will error
`error: undefined hidden symbol: __start___sancov_cntrs` (other sections are similar).
```
% cat a.c
void discarded() {}
% clang -fsanitize-coverage=func,trace-pc-guard -fpic -fvisibility=hidden a.c -shared -fuse-ld=lld -Wl,--gc-sections
...
ld.lld: error: undefined hidden symbol: __start___sancov_guards
>>> referenced by a.c
>>> /tmp/a-456662.o:(sancov.module_ctor_trace_pc_guard)
```
Use the `extern_weak` linkage (lowered to undefined weak symbols) to avoid the
undefined error.
Differential Revision: https://reviews.llvm.org/D98903
This is only adding support to the dfsan instrumentation pass but not
to the runtime.
Added more RUN lines for testing: for each instrumentation test that
had a -dfsan-fast-16-labels invocation, a new invocation was added
using fast8.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D98734
Fixed section of code that iterated through a SmallDenseMap and added
instructions in each iteration, causing non-deterministic code; replaced
SmallDenseMap with MapVector to prevent non-determinism.
This reverts commit 01ac6d1587.
This caused non-deterministic compiler output; see comment on the
code review.
> This patch updates the various IR passes to correctly handle dbg.values with a
> DIArgList location. This patch does not actually allow DIArgLists to be produced
> by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
> Other than that, it should cover every IR pass.
>
> Most of the changes simply extend code that operated on a single debug value to
> operate on the list of debug values in the style of any_of, all_of, for_each,
> etc. Instances of setOperand(0, ...) have been replaced with with
> replaceVariableLocationOp, which takes the value that is being replaced as an
> additional argument. In places where this value isn't readily available, we have
> to track the old value through to the point where it gets replaced.
>
> Differential Revision: https://reviews.llvm.org/D88232
This reverts commit df69c69427.
".llvm." suffix".
The recommit fixed a bug that symbols with "." at the beginning is not
properly handled in the last commit.
Original commit message:
Currently IndirectCallPromotion simply strip everything after the first "."
in LTO mode, in order to match the symbol name and the name with ".llvm."
suffix in the value profile. However, if -funique-internal-linkage-names
and thinlto are both enabled, the name may have both ".__uniq." suffix and
".llvm." suffix, and the current mechanism will strip them both, which is
unexpected. The patch fixes the problem.
Differential Revision: https://reviews.llvm.org/D98389
This broke the check-profile tests on Mac, see comment on the code
review.
> This is no longer needed, we can add __llvm_profile_runtime directly
> to llvm.compiler.used or llvm.used to achieve the same effect.
>
> Differential Revision: https://reviews.llvm.org/D98325
This reverts commit c7712087cb.
Also reverting the dependent follow-up commit:
Revert "[InstrProfiling] Generate runtime hook for ELF platforms"
> When using -fprofile-list to selectively apply instrumentation only
> to certain files or functions, we may end up with a binary that doesn't
> have any counters in the case where no files were selected. However,
> because on Linux and Fuchsia, we pass -u__llvm_profile_runtime, the
> runtime would still be pulled in and incur some non-trivial overhead,
> especially in the case when the continuous or runtime counter relocation
> mode is being used. A better way would be to pull in the profile runtime
> only when needed by declaring the __llvm_profile_runtime symbol in the
> translation unit only when needed.
>
> This approach was already used prior to 9a041a7522, but we changed it
> to always generate the __llvm_profile_runtime due to a TAPI limitation.
> Since TAPI is only used on Mach-O platforms, we could use the early
> emission of __llvm_profile_runtime there, and on other platforms we
> could change back to the earlier approach where the symbol is generated
> later only when needed. We can stop passing -u__llvm_profile_runtime to
> the linker on Linux and Fuchsia since the generated undefined symbol in
> each translation unit that needed it serves the same purpose.
>
> Differential Revision: https://reviews.llvm.org/D98061
This reverts commit 87fd09b25f.
When using -fprofile-list to selectively apply instrumentation only
to certain files or functions, we may end up with a binary that doesn't
have any counters in the case where no files were selected. However,
because on Linux and Fuchsia, we pass -u__llvm_profile_runtime, the
runtime would still be pulled in and incur some non-trivial overhead,
especially in the case when the continuous or runtime counter relocation
mode is being used. A better way would be to pull in the profile runtime
only when needed by declaring the __llvm_profile_runtime symbol in the
translation unit only when needed.
This approach was already used prior to 9a041a7522, but we changed it
to always generate the __llvm_profile_runtime due to a TAPI limitation.
Since TAPI is only used on Mach-O platforms, we could use the early
emission of __llvm_profile_runtime there, and on other platforms we
could change back to the earlier approach where the symbol is generated
later only when needed. We can stop passing -u__llvm_profile_runtime to
the linker on Linux and Fuchsia since the generated undefined symbol in
each translation unit that needed it serves the same purpose.
Differential Revision: https://reviews.llvm.org/D98061
1. PGOMemOPSizeOpt grabs only the first, up to five (by default) entries from
the value profile metadata and preserves the remaining entries for the fallback
memop call site. If there are more than five entries, the rest of the entries
would get dropped. This is fine for PGOMemOPSizeOpt itself as it only promotes
up to 3 (by default) values, but potentially not for other downstream passes
that may use the value profile metadata.
2. PGOMemOPSizeOpt originally assumed that only values 0 through 8 are kept
track of. When the range buckets were introduced, it was changed to skip the
range buckets, but since it does not grab all entries (only five), if some range
buckets exist in the first five entries, it could potentially cause fewer
promotion opportunities (eg. if 4 out of 5 were range buckets, it may be able to
promote up to one non-range bucket, as opposed to 3.) Also, combined with 1, it
means that wrong entries may be preserved, as it didn't correctly keep track of
which were entries were skipped.
To fix this, PGOMemOPSizeOpt now grabs all the entries (up to the maximum number
of value profile buckets), keeps track of which entries were skipped, and
preserves all the remaining entries.
Differential Revision: https://reviews.llvm.org/D97592
This is no longer needed, we can add __llvm_profile_runtime directly
to llvm.compiler.used or llvm.used to achieve the same effect.
Differential Revision: https://reviews.llvm.org/D98325
This patch updates the various IR passes to correctly handle dbg.values with a
DIArgList location. This patch does not actually allow DIArgLists to be produced
by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
Other than that, it should cover every IR pass.
Most of the changes simply extend code that operated on a single debug value to
operate on the list of debug values in the style of any_of, all_of, for_each,
etc. Instances of setOperand(0, ...) have been replaced with with
replaceVariableLocationOp, which takes the value that is being replaced as an
additional argument. In places where this value isn't readily available, we have
to track the old value through to the point where it gets replaced.
Differential Revision: https://reviews.llvm.org/D88232
This patch updates DbgVariableIntrinsics to support use of a DIArgList for the
location operand, resulting in a significant change to its interface. This patch
does not update all IR passes to support multiple location operands in a
dbg.value; the only change is to update the DbgVariableIntrinsic interface and
its uses. All code outside of the intrinsic classes assumes that an intrinsic
will always have exactly one location operand; they will still support
DIArgLists, but only if they contain exactly one Value.
Among other changes, the setOperand and setArgOperand functions in
DbgVariableIntrinsic have been made private. This is to prevent code from
setting the operands of these intrinsics directly, which could easily result in
incorrect/invalid operands being set. This does not prevent these functions from
being called on a debug intrinsic at all, as they can still be called on any
CallInst pointer; it is assumed that any code directly setting the operands on a
generic call instruction is doing so safely. The intention for making these
functions private is to prevent DIArgLists from being overwritten by code that's
naively trying to replace one of the Values it points to, and also to fail fast
if a DbgVariableIntrinsic is updated to use a DIArgList without a valid
corresponding DIExpression.
`__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not
referenced via relocation in the translation unit.
With `-z start-stop-gc` (LLD 13 (D96914); GNU ld 2.37 https://sourceware.org/bugzilla/show_bug.cgi?id=27451),
the linker does not let `__start_/__stop_` references retain their sections.
Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make
them retained by the linker.
This patch changes most existing `UsedVars` cases to `CompilerUsedVars`
to reflect the ideal state - if the binary format properly supports
section based GC (dead stripping), `llvm.compiler.used` should be sufficient.
`__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars`
since we want them to be unconditionally retained by both compiler and linker.
Behaviors on COFF/Mach-O are not affected.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D97649
This is a part of https://reviews.llvm.org/D95835.
One issue is about origin load optimization: see the
comments of useCallbackLoadLabelAndOrigin
@gbalats This change may have some conflicts with your 8bit change. PTAL the change at visitLoad.
Reviewed By: morehouse, gbalats
Differential Revision: https://reviews.llvm.org/D97570
This addresses ~50 clang-tidy warnings on dfsan instrumentation pass.
It also contains some refactoring (all non-functional changes) to eliminate some variables and simplify code.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D97714
`__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not
referenced via relocation in the translation unit.
With `-z start-stop-gc` (D96914 https://sourceware.org/bugzilla/show_bug.cgi?id=27451),
the linker no longer lets `__start_/__stop_` references retain them.
Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make
them retained by the linker.
This patch changes most existing `UsedVars` cases to `CompilerUsedVars`
to reflect the ideal state - if the binary format properly supports
section based GC (dead stripping), `llvm.compiler.used` should be sufficient.
`__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars`
since we want them to be unconditionally retained by both compiler and linker.
Behaviors on other COFF/Mach-O are not affected.
Differential Revision: https://reviews.llvm.org/D97649
Many optimizers (e.g. GlobalOpt/ConstantMerge) do not respect linker semantics
for comdat and may not discard the sections as a unit.
The interconnected `__llvm_prf_{cnts,data}` sections (in comdat for ELF)
are similar to D97432: `__profd_` is not directly referenced, so
`__profd_` may be discarded while `__profc_` is retained, breaking the
interconnection. We currently conservatively add all such sections to
`llvm.used` and let the linker do GC for ELF.
In D97448, we will change GlobalObject's in the llvm.used list to use SHF_GNU_RETAIN,
causing the metadata sections to be unnecessarily retained (some `check-profile` tests check for GC).
Use `llvm.compiler.used` to retain the current GC behavior.
Differential Revision: https://reviews.llvm.org/D97585
This will allow identifying exactly how many shadow bytes were used
during compilation, for when fast8 mode is introduced.
Also, it will provide a consistent matching point for instrumentation
tests so that the exact llvm type used (i8 or i16) for the shadow can
be replaced by a pattern substitution. This is handy for tests with
multiple prefixes.
Reviewed by: stephan.yichao.zhao, morehouse
Differential Revision: https://reviews.llvm.org/D97409
This is a part of https://reviews.llvm.org/D95835.
Each customized function has two wrappers. The
first one dfsw is for the normal shadow propagation. The second one dfso is used
when origin tracking is on. It calls the first one, and does additional
origin propagation. Which one to use can be decided at instrumentation
time. This is to ensure minimal additional overhead when origin tracking
is off.
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D97483
`__sancov_pcs` parallels the other metadata section(s). While some optimizers
(e.g. GlobalDCE) respect linker semantics for comdat and retain or discard the
sections as a unit, some (e.g. GlobalOpt/ConstantMerge) do not. So we have to
conservatively retain all unconditionally in the compiler.
When a comdat is used, the COFF/ELF linkers' GC semantics ensure the
associated parallel array elements are retained or discarded together,
so `llvm.compiler.used` is sufficient.
Otherwise (MachO (see rL311955/rL311959), COFF special case where comdat is not
used), we have to use `llvm.used` to conservatively make all sections retain by
the linker. This will fix the Windows problem once internal linkage
GlobalObject's in `llvm.used` are retained via `/INCLUDE:`.
Reviewed By: morehouse, vitalybuka
Differential Revision: https://reviews.llvm.org/D97432
DFSan at store does store shadow data; store app data; and at load does
load shadow data; load app data.
When an application data is atomic, one overtainting case is
thread A: load shadow
thread B: store shadow
thread B: store app
thread A: load app
If the application address had been used by other flows, thread A reads
previous shadow, causing overtainting.
The change is similar to MSan's solution.
1) enforce ordering of app load/store
2) load shadow after load app; store shadow before shadow app
3) do not track atomic store by reseting its shadow to be 0.
The last one is to address a case like this.
Thread A: load app
Thread B: store shadow
Thread A: load shadow
Thread B: store app
This approach eliminates overtainting as a trade-off between undertainting
flows via shadow data race.
Note that this change addresses only native atomic instructions, but
does not support builtin libcalls yet.
https://llvm.org/docs/Atomics.html#libcalls-atomic
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D97310
And then push those change throughout LLVM.
Keep the old signature in Clang's CGBuilder for now -- that will be
updated in a follow-on patch (D97224).
The MLIR LLVM-IR dialect is not updated to support the new alignment
attribute, but preserves its existing behavior.
Differential Revision: https://reviews.llvm.org/D97223
In SanitizerCoverage, the metadata sections (`__sancov_guards`,
`__sancov_cntrs`, `__sancov_bools`) are referenced by functions. After
inlining, such a `__sancov_*` section can be referenced by more than one
functions, but its sh_link still refers to the original function's section.
(Note: a SHF_LINK_ORDER section referenced by a section other than its linked-to
section violates the invariant.)
If the original function's section is discarded (e.g. LTO internalization +
`ld.lld --gc-sections`), ld.lld may report a `sh_link points to discarded section` error.
This above reasoning means that `!associated` is not appropriate to be called by
an inlinable function. Non-interposable functions are inline candidates, so we
have to drop `!associated`. A `__sancov_pcs` is not referenced by other sections
but is expected to parallel a metadata section, so we have to make sure the two
sections are retained or discarded at the same time. A section group does the
trick. (Note: we have a module ctor, so `getUniqueModuleId` guarantees to
return a non-empty string, and `GetOrCreateFunctionComdat` guarantees to return
non-null.)
For interposable functions, we could keep using `!associated`, but
LTO can change the linkage to `internal` and allow such functions to be inlinable,
so we have to drop `!associated`, too. To not interfere with section
group resolution, we need to use the `noduplicates` variant (section group flag 0).
(This allows us to get rid of the ModuleID parameter.)
In -fno-pie and -fpie code (mostly dso_local), instrumented interposable
functions have WeakAny/LinkOnceAny linkages, which are rare. So the
section group header overload should be low.
This patch does not change the object file output for COFF (where `!associated` is ignored).
Reviewed By: morehouse, rnk, vitalybuka
Differential Revision: https://reviews.llvm.org/D97430
Putting globals in a comdat for dead-stripping changes the semantic and
can potentially cause false negative odr violations at link time.
If odr indicators are used, we keep the comdat sections, as link time
odr violations will be dectected for the odr indicator symbols.
This fixes PR 47925
Previously there was no way to control how module destructors were emitted
by `ModuleAddressSanitizerPass`. However, we want language frontends (e.g. Clang)
to be able to decide how to emit these destructors (if at all).
This patch introduces the `AsanDtorKind` enum that represents the different ways
destructors can be emitted. There are currently only two valid ways to emit destructors.
* `Global` - Use `llvm.global_dtors`. This was the previous behavior and is the default.
* `None` - Do not emit module destructors.
The `ModuleAddressSanitizerPass` and the various wrappers around it have been updated
to take the `AsanDtorKind` as an argument.
The `-asan-destructor-kind=` command line argument has been introduced to make this
easy to test from `opt`. If this argument is specified it overrides the value passed
to the `ModuleAddressSanitizerPass` constructor.
Note that `AsanDtorKind` is not `bool` because we will introduce a new way to
emit destructors in a subsequent patch.
Note that `AsanDtorKind` is given its own header file because if it is declared
in `Transforms/Instrumentation/AddressSanitizer.h` it leads to compile error
(Module is ambiguous) when trying to use it in
`clang/Basic/CodeGenOptions.def`.
rdar://71609176
Differential Revision: https://reviews.llvm.org/D96571
Under certain (currently unknown) conditions, llvm-profdata is outputting
profiles that have two consecutive entries in the MemOPSize section for the
value 0. This causes the PGOMemOPSizeOpt pass to output an invalid switch
instruction with two cases for 0. As mentioned, we’re not quite sure what’s
causing this to happen, but this patch prevents llvm-profdata from outputting a
profile that has this problem and gives an error with a request for a
reproducible.
Differential Revision: https://reviews.llvm.org/D92074
__start_/__stop_ references retain C identifier name sections such as
__llvm_prf_*. Putting these into a section group disables this logic.
The ELF section group semantics ensures that group members are retained
or discarded as a unit. When a function symbol is discarded, this allows
allows linker to discard counters, data and values associated with that
function symbol as well.
Note that `noduplicates` COMDAT is lowered to zero-flag section group in
ELF. We only set this for functions that aren't already in a COMDAT and
for those that don't have available_externally linkage since we already
use regular COMDAT groups for those.
Differential Revision: https://reviews.llvm.org/D96757
__start_/__stop_ references retain C identifier name sections such as
__llvm_prf_*. Putting these into a section group disables this logic.
The ELF section group semantics ensures that group members are retained
or discarded as a unit. When a function symbol is discarded, this allows
allows linker to discard counters, data and values associated with that
function symbol as well.
Note that `noduplicates` COMDAT is lowered to zero-flag section group in
ELF. We only set this for functions that aren't already in a COMDAT and
for those that don't have available_externally linkage since we already
use regular COMDAT groups for those.
Differential Revision: https://reviews.llvm.org/D96757
C identifier name input sections such as __llvm_prf_* are GC roots so
they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the
C identifier name semantics.
The !associated metadata may be attached to a global object declaration
with a single argument that references another global object, and it
gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded
by the linker, setting up !associated metadata allows linker to discard
counters, data and values associated with that function symbol.
Note that !associated metadata is only supported by ELF, it does not have
any effect on non-ELF targets.
Differential Revision: https://reviews.llvm.org/D76802
C identifier name input sections such as __llvm_prf_* are GC roots so
they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the
C identifier name semantics.
The !associated metadata may be attached to a global object declaration
with a single argument that references another global object, and it
gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded
by the linker, setting up !associated metadata allows linker to discard
counters, data and values associated with that function symbol.
Note that !associated metadata is only supported by ELF, it does not have
any effect on non-ELF targets.
Differential Revision: https://reviews.llvm.org/D76802
This patch emits "instr_prof_hash_mismatch" function annotation metadata if
there is a hash mismatch while applying instrumented profiles.
During the PGO optimized build using instrumented profiles, if the CFG of
the function has changed since generating the profile, a hash mismatch is
encountered. This patch emits this information as annotation metadata. We
plan to use this with Propeller which is done at the machine IR level.
Propeller is usually applied on top of PGO and a hash mismatch during
PGO could be used to detect source drift.
Differential Revision: https://reviews.llvm.org/D95495
This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.
Differential Revision: https://reviews.llvm.org/D94820
This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.
Differential Revision: https://reviews.llvm.org/D94820
This is not nice, but it's the best transient solution possible,
and is better than just duplicating the whole function.
The problem is, this function is widely used,
and it is not at all obvious that all the users
could be painlessly switched to operate on DomTreeUpdater,
and somehow i don't feel like porting all those users first.
This function is one of last three that not operate on DomTreeUpdater.
We're immediately dereferencing the casted pointer, so use cast<> which will assert instead of dyn_cast<> which can return null.
Fixes static analyzer warning.
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.
Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)
The order is swapped, but in terms of correctness it is still fine.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D93923
This patch removes InstrumentFuncEntry as it is dead.
The constructor of FuncPGOInstrumentation passes InstrumentFuncEntry
to MST, but it doesn't make a local copy as a member variable.
Clang FE currently has hot/cold function attribute. But we only have
cold function attribute in LLVM IR.
This patch adds support of hot function attribute to LLVM IR. This
attribute will be used in setting function section prefix/suffix.
Currently .hot and .unlikely suffix only are added in PGO (Sample PGO)
compilation (through isFunctionHotInCallGraph and
isFunctionColdInCallGraph).
This patch changes the behavior. The new behavior is:
(1) If the user annotates a function as hot or isFunctionHotInCallGraph
is true, this function will be marked as hot. Otherwise,
(2) If the user annotates a function as cold or
isFunctionColdInCallGraph is true, this function will be marked as
cold.
The changes are:
(1) user annotated function attribute will used in setting function
section prefix/suffix.
(2) hot attribute overwrites profile count based hotness.
(3) profile count based hotness overwrite user annotated cold attribute.
The intention for these changes is to provide the user a way to mark
certain function as hot in cases where training input is hard to cover
all the hot functions.
Differential Revision: https://reviews.llvm.org/D92493
Raw profile count values for each BB are not kept after profile
annotation. We record function entry count and branch weights
and use them to compute the count when needed. This mechanism
works well in a perfect world, but often breaks in real programs,
because of number prevision, inconsistent profile, or bugs in
BFI). This patch uses sum of profile count values to fix
function entry count to make the BFI count close to real profile
counts.
Differential Revision: https://reviews.llvm.org/D61540
This patch adds the functionality to compare BFI counts with real
profile
counts right after reading the profile. It will print remarks under
-Rpass-analysis=pgo, or the internal option -pass-remarks-analysis=pgo.
Differential Revision: https://reviews.llvm.org/D91813
This migrates all LLVM (except Kaleidoscope and
CodeGen/StackProtector.cpp) DebugLoc::get to DILocation::get.
The CodeGen/StackProtector.cpp usage may have a nullptr Scope
and can trigger an assertion failure, so I don't migrate it.
Reviewed By: #debug-info, dblaikie
Differential Revision: https://reviews.llvm.org/D93087
*************
* The problem
*************
See motivation examples in compiler-rt/test/dfsan/pair.cpp. The current
DFSan always uses a 16bit shadow value for a variable with any type by
combining all shadow values of all bytes of the variable. So it cannot
distinguish two fields of a struct: each field's shadow value equals the
combined shadow value of all fields. This introduces an overtaint issue.
Consider a parsing function
std::pair<char*, int> get_token(char* p);
where p points to a buffer to parse, the returned pair includes the next
token and the pointer to the position in the buffer after the token.
If the token is tainted, then both the returned pointer and int ar
tainted. If the parser keeps on using get_token for the rest parsing,
all the following outputs are tainted because of the tainted pointer.
The CL is the first change to address the issue.
**************************
* The proposed improvement
**************************
Eventually all fields and indices have their own shadow values in
variables and memory.
For example, variables with type {i1, i3}, [2 x i1], {[2 x i4], i8},
[2 x {i1, i1}] have shadow values with type {i16, i16}, [2 x i16],
{[2 x i16], i16}, [2 x {i16, i16}] correspondingly; variables with
primary type still have shadow values i16.
***************************
* An potential implementation plan
***************************
The idea is to adopt the change incrementially.
1) This CL
Support field-level accuracy at variables/args/ret in TLS mode,
load/store/alloca still use combined shadow values.
After the alloca promotion and SSA construction phases (>=-O1), we
assume alloca and memory operations are reduced. So if struct
variables do not relate to memory, their tracking is accurate at
field level.
2) Support field-level accuracy at alloca
3) Support field-level accuracy at load/store
These two should make O0 and real memory access work.
4) Support vector if necessary.
5) Support Args mode if necessary.
6) Support passing more accurate shadow values via custom functions if
necessary.
***************
* About this CL.
***************
The CL did the following
1) extended TLS arg/ret to work with aggregate types. This is similar
to what MSan does.
2) implemented how to map between an original type/value/zero-const to
its shadow type/value/zero-const.
3) extended (insert|extract)value to use field/index-level progagation.
4) for other instructions, propagation rules are combining inputs by or.
The CL converts between aggragate and primary shadow values at the
cases.
5) Custom function interfaces also need such a conversion because
all existing custom functions use i16. It is unclear whether custome
functions need more accurate shadow propagation yet.
6) Added test cases for aggregate type related cases.
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D92261
This guards against cases where the symbol was dead code eliminated in
the binary by ThinLTO, and we have a sample profile collected for one
binary but used to optimize another.
Most of the benefit from ICP comes from inlining the target, which we
can't do with only a declaration anyway. If this is in the pre-ThinLTO
link step (e.g. for instrumentation based PGO), we will attempt the
promotion again in the ThinLTO backend after importing anyway, and we
don't need the early promotion to facilitate that.
Differential Revision: https://reviews.llvm.org/D92804
The x86-64 backend currently has a bug which uses a wrong register when for the GOTPCREL reference.
The program will crash without the dso_local specifier.
This is a child diff of D92261.
This diff adds APIs that return shadow type/value/zero from origin
objects. For the time being these APIs simply returns primitive
shadow type/value/zero. The following diff will be implementing the
conversion.
As D92261 explains, some cases still use primitive shadow during
the incremential changes. The cases include
1) alloca/load/store
2) custom function IO
3) vectors
At the cases this diff does not use the new APIs, but uses primitive
shadow objects explicitly.
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D92629
This is a child diff of D92261.
It extended TLS arg/ret to work with aggregate types.
For a function
t foo(t1 a1, t2 a2, ... tn an)
Its arguments shadow are saved in TLS args like
a1_s, a2_s, ..., an_s
TLS ret simply includes r_s. By calculating the type size of each shadow
value, we can get their offset.
This is similar to what MSan does. See __msan_retval_tls and __msan_param_tls
from llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp.
Note that this change does not add test cases for overflowed TLS
arg/ret because this is hard to test w/o supporting aggregate shdow
types. We will be adding them after supporting that.
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D92440
1. Removed #include "...AliasAnalysis.h" in other headers and modules.
2. Cleaned up includes in AliasAnalysis.h.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D92489
This is a child diff of D92261.
After supporting field/index-level shadow, the existing shadow with type
i16 works for only primitive types.
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D92459
At D92261, this type will be used to cache both combined shadow and
converted shadow values.
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D92458
The function was introduced on Jan 23, 2019 in commit
73078ecd38.
Its definition was removed on Oct 27, 2020 in commit
0930763b4b, leaving the declaration
unused.
Unused since https://reviews.llvm.org/D91762 and triggering
-Wunused-private-field
```
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:365:13: error: private field 'GetArgTLS' is not used [-Werror,-Wunused-private-field]
Constant *GetArgTLS;
^
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp:366:13: error: private field 'GetRetvalTLS' is not used [-Werror,-Wunused-private-field]
Constant *GetRetvalTLS;
```
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D91820
See discussion in https://bugs.llvm.org/show_bug.cgi?id=45073 / https://reviews.llvm.org/D66324#2334485
the implementation is known-broken for certain inputs,
the bugreport was up for a significant amount of timer,
and there has been no activity to address it.
Therefore, just completely rip out all of misexpect handling.
I suspect, fixing it requires redesigning the internals of MD_misexpect.
Should anyone commit to fixing the implementation problem,
starting from clean slate may be better anyways.
This reverts commit 7bdad08429,
and some of it's follow-ups, that don't stand on their own.
No longer rely on an external tool to build the llvm component layout.
Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.
These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.
Differential Revision: https://reviews.llvm.org/D90848
Before the change, DFSan always does the propagation. W/o
origin tracking, it is harder to understand such flows. After
the change, the flag is off by default.
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D91234
There is already an API in BasicBlock that checks and returns the musttail call if it precedes the return instruction.
Use it instead of manually checking in each place.
Differential Revision: https://reviews.llvm.org/D90693
Similar to -fprofile-generate=, add -fmemory-profile= which takes a
directory path. This is passed down to LLVM via a new module flag
metadata. LLVM in turn provides this name to the runtime via the new
__memprof_profile_filename variable.
Additionally, always pass a default filename (in $cwd if a directory
name is not specified vi the = form of the option). This is also
consistent with the behavior of the PGO instrumentation. Since the
memory profiles will generally be fairly large, it doesn't make sense to
dump them to stderr. Also, importantly, the memory profiles will
eventually be dumped in a compact binary format, which is another reason
why it does not make sense to send these to stderr by default.
Change the existing memprof tests to specify log_path=stderr when that
was being relied on.
Depends on D89086.
Differential Revision: https://reviews.llvm.org/D89087
Add support for match-all tags and GOT-free runtime calls, which
are both required for the kernel to be able to support outlined
checks. This requires extending the access info to let the backend
know when to enable these features. To make the code easier to maintain
introduce an enum with the bit field positions for the access info.
Allow outlined checks to be enabled with -mllvm
-hwasan-inline-all-checks=0. Kernels that contain runtime support for
outlined checks may pass this flag. Kernels lacking runtime support
will continue to link because they do not pass the flag. Old versions
of LLVM will ignore the flag and continue to use inline checks.
With a separate kernel patch [1] I measured the code size of defconfig
+ tag-based KASAN, as well as boot time (i.e. time to init launch)
on a DragonBoard 845c with an Android arm64 GKI kernel. The results
are below:
code size boot time
before 92824064 6.18s
after 38822400 6.65s
[1] https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76
Depends on D90425
Differential Revision: https://reviews.llvm.org/D90426
This is a workaround for poor heuristics in the backend where we can
end up materializing the constant multiple times. This is particularly
bad when using outlined checks because we materialize it for every call
(because the backend considers it trivial to materialize).
As a result the field containing the shadow base value will always
be set so simplify the code taking that into account.
Differential Revision: https://reviews.llvm.org/D90425
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
Following up D81682 and D83903, remove the code for the old value profiling
buckets, which have been replaced with the new, extended buckets and disabled by
default.
Also syncing InstrProfData.inc between compiler-rt and llvm.
Differential Revision: https://reviews.llvm.org/D88838
... by using MapVector. The issue was caused by 63182c2ac0.
Also use stable_partition instead of partition to get stable results
across different STL implementations.
Do not instrument user-defined ELF sections (whose names resemble valid
C identifiers). They may have special use semantics and modifying them
may break programs. This is e.g. the case with NetBSD __link_set API
that expects these sections to store consecutive array elements.
Differential Revision: https://reviews.llvm.org/D76665
When ASan and e.g. Dead Virtual Function Elimination are enabled, the
latter will rely on type metadata to determine if certain virtual calls can be
removed. However, ASan currently does not copy type metadata, which can cause
virtual function calls to be incorrectly removed.
Differential Revision: https://reviews.llvm.org/D88368
When address sanitizing a function, stack unpinsoning code is inserted before each ret instruction. However if the ret instruciton is preceded by a musttail call, such transformation broke the musttail call contract and generates invalid IR.
This patch fixes the issue by moving the insertion point prior to the musttail call if there is one.
Differential Revision: https://reviews.llvm.org/D87777
This is consistent with the clang option added in
7ed8124d46, and the comments on the
runtime patch in D87120.
Differential Revision: https://reviews.llvm.org/D87622
Turns out this was use-after-move of function_ref, which is trivially
copyable and movable, so the move did nothing and use after move was
safe.
But since this function_ref is being copied into a std::function, change
the function_ref to be std::function to avoid extra layers of type
erasure indirection - and then it's a real use after move, and fix that
by referring to the moved-to member variable rather than the moved-from
parameter.
gcov is an "Edge Profiling with Edge Counters" application according to
Optimally Profiling and Tracing Programs (1994).
The minimum number of counters necessary is |E|-(|V|-1). The unmeasured edges
form a spanning tree. Both GCC --coverage and clang -fprofile-generate leverage
this optimization. This patch implements the optimization for clang --coverage.
The produced .gcda files are much smaller now.
i.e. change the work flow from
* .gcno for function A
* .gcno for function B
* .gcno for function C
* .gcda for function A
* .gcda for function B
* .gcda for function C
to
* .gcno for function A
* .gcda for function A
* .gcno for function B
* .gcda for function B
* .gcno for function C
* .gcda for function C
Currently there is duplicate logic in .gcno & .gcda processing: how functions
are filtered, which edges are instrumented, etc. This refactor enables simplification.
Since we always process .gcno, in -fprofile-arcs -fno-test-coverage mode,
__llvm_internal_gcov_emit_function_args.0 will have non-zero checksums.
PGOInstrumentation runs `SplitIndirectBrCriticalEdges` but some IndirectBrInst
critical edge cannot be split. `getInstrBB` will crash when calling `SplitCriticalEdge`, e.g.
int foo(char *p) {
void *targets[2];
targets[0] = &&indirect;
targets[1] = &&end;
for (;; p++)
if (*p == 7) {
indirect:
goto *targets[p[1]]; // the self loop is critical in -O
}
end:
return 0;
}
Skip such critical edges to prevent a crash.
Reviewed By: davidxl, lebedev.ri
Differential Revision: https://reviews.llvm.org/D87435
The entry block is split at the first instruction where `shouldKeepInEntry`
returns false. The created basic block has a br jumping to the original entry
block. The new basic block causes the function label line and the other entry
block lines to be covered by different basic blocks, which can affect line
counts with special control flows (fork/exec in the entry block requires
heuristics in llvm-cov gcov to get consistent line counts).
int main() { // BB0
return 0; // BB2 (due to entry block splitting)
}
// BB1 is the exit block (since gcov 4.8)
This patch adds a synthetic entry block (like PGOInstrumentation and GCC) and
inserts an edge from the synthetic entry block to the original entry block. We
can thus remove the tricky `shouldKeepInEntry` and entry block splitting. The
number of basic blocks does not change, but the emitted .gcno files will be
smaller because we can save one GCOV_TAG_LINES tag.
// BB0 is the synthetic entry block with a single edge to BB2
int main() { // BB2
return 0; // BB2
}
// BB1 is the exit block (since gcov 4.8)
This patch adds isGuaranteedNotToBePoison and programUndefinedIfUndefOrPoison.
isGuaranteedNotToBePoison will be used at D75808. The latter function is used at isGuaranteedNotToBePoison.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D84242
See RFC for background:
http://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html
Note that the runtime changes will be sent separately (hopefully this
week, need to add some tests).
This patch includes the LLVM pass to instrument memory accesses with
either inline sequences to increment the access count in the shadow
location, or alternatively to call into the runtime. It also changes
calls to memset/memcpy/memmove to the equivalent runtime version.
The pass is modeled on the address sanitizer pass.
The clang changes add the driver option to invoke the new pass, and to
link with the upcoming heap profiling runtime libraries.
Currently there is no attempt to optimize the instrumentation, e.g. to
aggregate updates to the same memory allocation. That will be
implemented as follow on work.
Differential Revision: https://reviews.llvm.org/D85948
This patch helps getGuaranteedNonPoisonOp find multiple non-poison operands.
Instead of special-casing llvm.assume, I think it is also a viable option to
add noundef to Intrinsics.td. If it makes sense, I'll make a patch for that.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86477
This would be a problem if the entire instrumented function was a call
to
e.g. memcpy
Use FnPrologueEnd Instruction* instead of ActualFnStart BB*
Differential Revision: https://reviews.llvm.org/D86001
This allows us to add addtional instrumentation before the function start,
without splitting the first BB.
Differential Revision: https://reviews.llvm.org/D85985
Have the front-end use the `nounwind` attribute on atomic libcalls.
This prevents us from seeing `invoke __atomic_load` in MSAN, which
is problematic as it has no successor for instrumentation to be added.
This lets us support the scenario where a binary is linked from a mix
of object files with both instrumented and non-instrumented globals.
This is likely to occur on Android where the decision of whether to use
instrumented globals is based on the API level, which is user-facing.
Previously, in this scenario, it was possible for the comdat from
one of the object files with non-instrumented globals to be selected,
and since this comdat did not contain the note it would mean that the
note would be missing in the linked binary and the globals' shadow
memory would be left uninitialized, leading to a tag mismatch failure
at runtime when accessing one of the instrumented globals.
It is harmless to include the note when targeting a runtime that does
not support instrumenting globals because it will just be ignored.
Differential Revision: https://reviews.llvm.org/D85871
Commit 9385aaa848 ("[sancov] Fix PR33732") added zeroext to
__sanitizer_cov_trace(_const)?_cmp[1248] parameters for x86_64 only,
however, it is useful on other targets, in particular, on SystemZ: it
fixes swap-cmp.test.
Therefore, use it on all targets. This is safe: if target ABI does not
require zero extension for a particular parameter, zeroext is simply
ignored. A similar change has been implemeted as part of commit
3bc439bdff ("[MSan] Add instrumentation for SystemZ"), and there were
no problems with it.
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D85689
Adds the binary format goff and the operating system zos to the triple
class. goff is selected as default binary format if zos is choosen as
operating system. No further functionality is added.
Reviewers: efriedma, tahonermann, hubert.reinterpertcast, MaskRay
Reviewed By: efriedma, tahonermann, hubert.reinterpertcast
Differential Revision: https://reviews.llvm.org/D82081
A GlobalAlias is an address-taken user of its aliased function.
canRenameComdatFunc has excluded such cases.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D85597
MSan removes readnone/readonly and similar attributes from callees,
because after MSan instrumentation those attributes no longer apply.
This change removes the attributes from call sites, as well.
Failing to do this may cause DSE of paramTLS stores before calls to
readonly/readnone functions.
Differential Revision: https://reviews.llvm.org/D85259
If a section is supposed to hold elements of type T, then the
corresponding CreateSecStartEnd()'s Ty parameter represents T*.
Forwarding it to GlobalVariable constructor causes the resulting
GlobalVariable's type to be T*, and its SSA value type to be T**, which
is one indirection too many. This issue is mostly masked by pointer
casts, however, the global variable still gets an incorrect alignment,
which causes SystemZ to choose wrong instructions to access the
section.
D68041 placed `__profc_`, `__profd_` and (if exists) `__profvp_` in different comdat groups.
There are some issues:
* Cost: one or two additional section headers (`.group` section(s)): 64 or 128 bytes on ELF64.
* `__profc_`, `__profd_` and (if exists) `__profvp_` should be retained or
discarded. Placing them into separate comdat groups is conceptually inferior.
* If the prevailing group does not include `__profvp_` (value profiling not
used) but a non-prevailing group from another translation unit has `__profvp_`
(the function is inlined into another and triggers value profiling), there
will be a stray `__profvp_` if --gc-sections is not enabled.
This has been fixed by 3d6f53018f.
Actually, we can reuse an existing symbol (we choose `__profd_`) as the group
signature to avoid a string in the string table (the sole reason that D68041
could improve code size is that `__profv_` was an otherwise unused symbol which
wasted string table space). This saves one or two section headers.
For a -DCMAKE_BUILD_TYPE=Release -DLLVM_BUILD_INSTRUMENTED=IR build, `ninja
clang lld`, the patch has saved 10.5MiB (2.2%) for the total .o size.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D84723
Extend the memop value profile buckets to be more flexible (could accommodate a
mix of individual values and ranges) and to cover more value ranges (from 11 to
22 buckets).
Disabled behind a flag (to be enabled separately) and the existing code to be
removed later.
Differential Revision: https://reviews.llvm.org/D81682
Freeze always returns a defined value. This also prevents msan from
checking the input shadow, which happened because freeze wasn't
explicitly visited.
Differential Revision: https://reviews.llvm.org/D85040
findAllocaForValue uses AllocaForValue to cache resolved values.
The function is used only to resolve arguments of lifetime
intrinsic which usually are not fare for allocas. So result reuse
is likely unnoticeable.
In followup patches I'd like to replace the function with
GetUnderlyingObjects.
Depends on D84616.
Differential Revision: https://reviews.llvm.org/D84617
Adds the -fast-16-labels flag, which enables efficient instrumentation
for DFSan when the user needs <=16 labels. The instrumentation
eliminates most branches and most calls to __dfsan_union or
__dfsan_union_load.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D84371