Currently adding attribute no_sanitize("bounds") isn't disabling
-fsanitize=local-bounds (also enabled in -fsanitize=bounds). The Clang
frontend handles fsanitize=array-bounds which can already be disabled by
no_sanitize("bounds"). However, instrumentation added by the
BoundsChecking pass in the middle-end cannot be disabled by the
attribute.
The fix is very similar to D102772 that added the ability to selectively
disable sanitizer pass on certain functions.
In this patch, if no_sanitize("bounds") is provided, an additional
function attribute (NoSanitizeBounds) is attached to IR to let the
BoundsChecking pass know we want to disable local-bounds checking. In
order to support this feature, the IR is extended (similar to D102772)
to make Clang able to preserve the information and let BoundsChecking
pass know bounds checking is disabled for certain function.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D119816
For ASan this will effectively serve as a synonym for
__attribute__((no_sanitize("address"))).
Adding the disable_sanitizer_instrumentation to functions will drop the
sanitize_XXX attributes on the IR level.
This is the third reland of https://reviews.llvm.org/D114421.
Now that TSan test is fixed (https://reviews.llvm.org/D120050) there
should be no deadlocks.
Differential Revision: https://reviews.llvm.org/D120055
To make uses of the deprecated constructor easier to spot, and to
ensure that no new uses are introduced, rename it to
Address::deprecated().
While doing the rename, I've filled in element types in cases
where it was relatively obvious, but we're still left with 135
calls to the deprecated constructor.
The "-fzero-call-used-regs" option tells the compiler to zero out
certain registers before the function returns. It's also available as a
function attribute: zero_call_used_regs.
The two upper categories are:
- "used": Zero out used registers.
- "all": Zero out all registers, whether used or not.
The individual options are:
- "skip": Don't zero out any registers. This is the default.
- "used": Zero out all used registers.
- "used-arg": Zero out used registers that are used for arguments.
- "used-gpr": Zero out used registers that are GPRs.
- "used-gpr-arg": Zero out used GPRs that are used as arguments.
- "all": Zero out all registers.
- "all-arg": Zero out all registers used for arguments.
- "all-gpr": Zero out all GPRs.
- "all-gpr-arg": Zero out all GPRs used for arguments.
This is used to help mitigate Return-Oriented Programming exploits.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D110869
This patch adds support for the MSVC /HOTPATCH flag: https://docs.microsoft.com/sv-se/cpp/build/reference/hotpatch-create-hotpatchable-image?view=msvc-170&viewFallbackFrom=vs-2019
The flag is translated to a new -fms-hotpatch flag, which in turn adds a 'patchable-function' attribute for each function in the TU. This is then picked up by the PatchableFunction pass which would generate a TargetOpcode::PATCHABLE_OP of minsize = 2 (which means the target instruction must resolve to at least two bytes). TargetOpcode::PATCHABLE_OP is only implemented for x86/x64. When targetting ARM/ARM64, /HOTPATCH isn't required (instructions are always 2/4 bytes and suitable for hotpatching).
Additionally, when using /Z7, we generate a 'hot patchable' flag in the CodeView debug stream, in the S_COMPILE3 record. This flag is then picked up by LLD (or link.exe) and is used in conjunction with the linker /FUNCTIONPADMIN flag to generate extra space before each function, to accommodate for live patching long jumps. Please see: d703b92296/lld/COFF/Writer.cpp (L1298)
The outcome is that we can finally use Live++ or Recode along with clang-cl.
NOTE: It seems that MSVC cl.exe always enables /HOTPATCH on x64 by default, although if we did the same I thought we might generate sub-optimal code (if this flag was active by default). Additionally, MSVC always generates a .debug$S section and a S_COMPILE3 record, which Clang doesn't do without /Z7. Therefore, the following MSVC command-line "cl /c file.cpp" would have to be written with Clang such as "clang-cl /c file.cpp /HOTPATCH /Z7" in order to obtain the same result.
Depends on D43002, D80833 and D81301 for the full feature.
Differential Revision: https://reviews.llvm.org/D116511
The code generation for the UBSan VLA size check was qualified by a con-
dition that the parameter must be a signed integer, however the C spec
does not make any distinction that only signed integer parameters can be
used to declare a VLA, only qualifying that it must be greater than zero
if it is not a constant.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D116048
This reverts commit cc56c66f27.
Fixed a bad assertion, the target of a UsingShadowDecl must not have
*local* qualifiers, but it can be a typedef whose underlying type is qualified.
Currently there's no way to find the UsingDecl that a typeloc found its
underlying type through. Compare to DeclRefExpr::getFoundDecl().
Design decisions:
- a sugar type, as there are many contexts this type of use may appear in
- UsingType is a leaf like TypedefType, the underlying type has no TypeLoc
- not unified with UnresolvedUsingType: a single name is appealing,
but being sometimes-sugar is often fiddly.
- not unified with TypedefType: the UsingShadowDecl is not a TypedefNameDecl or
even a TypeDecl, and users think of these differently.
- does not cover other rarer aliases like objc @compatibility_alias,
in order to be have a concrete API that's easy to understand.
- implicitly desugared by the hasDeclaration ASTMatcher, to avoid
breaking existing patterns and following the precedent of ElaboratedType.
Scope:
- This does not cover types associated with template names introduced by
using declarations. A future patch should introduce a sugar TemplateName
variant for this. (CTAD deduced types fall under this)
- There are enough AST matchers to fix the in-tree clang-tidy tests and
probably any other matchers, though more may be useful later.
Caveats:
- This changes a fairly common pattern in the AST people may depend on matching.
Previously, typeLoc(loc(recordType())) matched whether a struct was
referred to by its original scope or introduced via using-decl.
Now, the using-decl case is not matched, and needs a separate matcher.
This is similar to the case of typedefs but nevertheless both adds
complexity and breaks existing code.
Differential Revision: https://reviews.llvm.org/D114251
Fix a mistake in 9bf917394eba3ba4df77cc17690c6d04f4e9d57f: sret
arguments use ConvertType, not ConvertTypeForMem, see the handling
in CodeGenTypes::GetFunctionType().
This fixes fp-matrix-pragma.c on s390x.
This reverts commit 2b554920f1.
This change causes tsan test timeout on x86_64-linux-autoconf.
The timeout can be reproduced by:
git clone https://github.com/llvm/llvm-zorg.git
BUILDBOT_CLOBBER= BUILDBOT_REVISION=eef8f3f85679c5b1ae725bade1c23ab7bb6b924f llvm-zorg/zorg/buildbot/builders/sanitizers/buildbot_standard.sh
WG14 adopted the _ExtInt feature from Clang for C23, but renamed the
type to be _BitInt. This patch does the vast majority of the work to
rename _ExtInt to _BitInt, which accounts for most of its size. The new
type is exposed in older C modes and all C++ modes as a conforming
extension. However, there are functional changes worth calling out:
* Deprecates _ExtInt with a fix-it to help users migrate to _BitInt.
* Updates the mangling for the type.
* Updates the documentation and adds a release note to warn users what
is going on.
* Adds new diagnostics for use of _BitInt to call out when it's used as
a Clang extension or as a pre-C23 compatibility concern.
* Adds new tests for the new diagnostic behaviors.
I want to call out the ABI break specifically. We do not believe that
this break will cause a significant imposition for early adopters of
the feature, and so this is being done as a full break. If it turns out
there are critical uses where recompilation is not an option for some
reason, we can consider using ABI tags to ease the transition.
at the start of the entry block, which in turn would aid better code transformation/optimization.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D110257
Basically, inline builtin definition are shadowed by externally visible
redefinition. This matches GCC behavior.
The implementation has to workaround the fact that:
1. inline builtin are renamed at callsite during codegen, but
2. they may be shadowed by a later external definition
As a consequence, during codegen, we need to walk redecls and eventually rewrite
some call sites, which is totally inelegant.
Differential Revision: https://reviews.llvm.org/D112059
CodeGenFunction::InitTempAlloca() inits the static alloca within the
entry block which may *not* necessarily be correct always.
For example, the current instruction insertion point (pointed by the
instruction builder) could be a program point which is hit multiple
times during the program execution, and it is expected that the static
alloca is initialized every time the program point is hit.
Hence remove CodeGenFunction::InitTempAlloca(), and initialize the
static alloca where the instruction insertion point is at the moment.
This patch, as a starting attempt, removes the calls to
CodeGenFunction::InitTempAlloca() which do not have any side effect on
the lit tests.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D111293
Per the GCC info page:
If the function is declared 'extern', then this definition of the
function is used only for inlining. In no case is the function
compiled as a standalone function, not even if you take its address
explicitly. Such an address becomes an external reference, as if
you had only declared the function, and had not defined it.
Respect that behavior for inline builtins: keep the original definition, and
generate a copy of the declaration suffixed by '.inline' that's only referenced
in direct call.
This fixes holes in c3717b6858.
Differential Revision: https://reviews.llvm.org/D111009
(This is a recommit of 3d6f49a569 that should no longer break validation since
bd379915de).
It is a common practice in glibc header to provide an inline redefinition of an
existing function. It is especially the case for fortified function.
Clang currently has an imperfect approach to the problem, using a combination of
trivially recursive function detection and noinline attribute.
Simplify the logic by suffixing these functions by `.inline` during codegen, so
that they are not recognized as builtin by llvm.
After that patch, clang passes all tests from https://github.com/serge-sans-paille/fortify-test-suite
Differential Revision: https://reviews.llvm.org/D109967
It is a common practice in glibc header to provide an inline redefinition of an
existing function. It is especially the case for fortified function.
Clang currently has an imperfect approach to the problem, using a combination of
trivially recursive function detection and noinline attribute.
Simplify the logic by suffixing these functions by `.inline` during codegen, so
that they are not recognized as builtin by llvm.
After that patch, clang passes all tests from https://github.com/serge-sans-paille/fortify-test-suite
Differential Revision: https://reviews.llvm.org/D109967
Discovered in SYCL, the field annotations were always cast to an i8*,
which is an invalid bitcast for a pointer type with an address space.
This patch makes sure that we create an intrinsic that takes a pointer
to the correct address-space and properly do our casts.
Differential Revision: https://reviews.llvm.org/D109003
8ace121305 introduced a regression for code that explicitly ignores the
-Wframe-larger-than= warning. Make sure we don't generate the
warn-stack-size attribute for that case.
Differential Revision: https://reviews.llvm.org/D108686
The purpose of __attribute__((disable_sanitizer_instrumentation)) is to
prevent all kinds of sanitizer instrumentation applied to a certain
function, Objective-C method, or global variable.
The no_sanitize(...) attribute drops instrumentation checks, but may
still insert code preventing false positive reports. In some cases
though (e.g. when building Linux kernel with -fsanitize=kernel-memory
or -fsanitize=thread) the users may want to avoid any kind of
instrumentation.
Differential Revision: https://reviews.llvm.org/D108029
Removed AArch64 usage of the getMaxVScale interface, replacing it with
the vscale_range(min, max) IR Attribute.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D106277
@kpn pointed out that the global variable initialization functions didn't
have the "strictfp" metadata set correctly, and @rjmccall said that there
was buggy code in SetFPModel and StartFunction, this patch is to solve
those problems. When Sema creates a FunctionDecl, it sets the
FunctionDeclBits.UsesFPIntrin to "true" if the lexical FP settings
(i.e. a combination of command line options and #pragma float_control
settings) correspond to ConstrainedFP mode. That bit is used when CodeGen
starts codegen for a llvm function, and it translates into the
"strictfp" function attribute. See bugs.llvm.org/show_bug.cgi?id=44571
Reviewed By: Aaron Ballman
Differential Revision: https://reviews.llvm.org/D102343
In preparation for dropping support for it. I've replaced it with
a proper type where the correct type was obvious and left an
explicit getPointerElementType() where it wasn't.
Fix suggested by Yuanfang Chen:
Non-distinct debuginfo is attached to the function due to the undecorated declaration. Later, when seeing the function definition and `nodebug` attribute, the non-distinct debuginfo should be cleared.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D104777
We really ought to support no_sanitize("coverage") in line with other
sanitizers. This came up again in discussions on the Linux-kernel
mailing lists, because we currently do workarounds using objtool to
remove coverage instrumentation. Since that support is only on x86, to
continue support coverage instrumentation on other architectures, we
must support selectively disabling coverage instrumentation via function
attributes.
Unfortunately, for SanitizeCoverage, it has not been implemented as a
sanitizer via fsanitize= and associated options in Sanitizers.def, but
rolls its own option fsanitize-coverage. This meant that we never got
"automatic" no_sanitize attribute support.
Implement no_sanitize attribute support by special-casing the string
"coverage" in the NoSanitizeAttr implementation. To keep the feature as
unintrusive to existing IR generation as possible, define a new negative
function attribute NoSanitizeCoverage to propagate the information
through to the instrumentation pass.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=49035
Reviewed By: vitalybuka, morehouse
Differential Revision: https://reviews.llvm.org/D102772
This patch is the Part-1 (FE Clang) implementation of HW Exception handling.
This new feature adds the support of Hardware Exception for Microsoft Windows
SEH (Structured Exception Handling).
This is the first step of this project; only X86_64 target is enabled in this patch.
Compiler options:
For clang-cl.exe, the option is -EHa, the same as MSVC.
For clang.exe, the extra option is -fasync-exceptions,
plus -triple x86_64-windows -fexceptions and -fcxx-exceptions as usual.
NOTE:: Without the -EHa or -fasync-exceptions, this patch is a NO-DIFF change.
The rules for C code:
For C-code, one way (MSVC approach) to achieve SEH -EHa semantic is to follow
three rules:
* First, no exception can move in or out of _try region., i.e., no "potential
faulty instruction can be moved across _try boundary.
* Second, the order of exceptions for instructions 'directly' under a _try
must be preserved (not applied to those in callees).
* Finally, global states (local/global/heap variables) that can be read
outside of _try region must be updated in memory (not just in register)
before the subsequent exception occurs.
The impact to C++ code:
Although SEH is a feature for C code, -EHa does have a profound effect on C++
side. When a C++ function (in the same compilation unit with option -EHa ) is
called by a SEH C function, a hardware exception occurs in C++ code can also
be handled properly by an upstream SEH _try-handler or a C++ catch(...).
As such, when that happens in the middle of an object's life scope, the dtor
must be invoked the same way as C++ Synchronous Exception during unwinding
process.
Design:
A natural way to achieve the rules above in LLVM today is to allow an EH edge
added on memory/computation instruction (previous iload/istore idea) so that
exception path is modeled in Flow graph preciously. However, tracking every
single memory instruction and potential faulty instruction can create many
Invokes, complicate flow graph and possibly result in negative performance
impact for downstream optimization and code generation. Making all
optimizations be aware of the new semantic is also substantial.
This design does not intend to model exception path at instruction level.
Instead, the proposed design tracks and reports EH state at BLOCK-level to
reduce the complexity of flow graph and minimize the performance-impact on CPP
code under -EHa option.
One key element of this design is the ability to compute State number at
block-level. Our algorithm is based on the following rationales:
A _try scope is always a SEME (Single Entry Multiple Exits) region as jumping
into a _try is not allowed. The single entry must start with a seh_try_begin()
invoke with a correct State number that is the initial state of the SEME.
Through control-flow, state number is propagated into all blocks. Side exits
marked by seh_try_end() will unwind to parent state based on existing
SEHUnwindMap[].
Note side exits can ONLY jump into parent scopes (lower state number).
Thus, when a block succeeds various states from its predecessors, the lowest
State triumphs others. If some exits flow to unreachable, propagation on those
paths terminate, not affecting remaining blocks.
For CPP code, object lifetime region is usually a SEME as SEH _try.
However there is one rare exception: jumping into a lifetime that has Dtor but
has no Ctor is warned, but allowed:
Warning: jump bypasses variable with a non-trivial destructor
In that case, the region is actually a MEME (multiple entry multiple exits).
Our solution is to inject a eha_scope_begin() invoke in the side entry block to
ensure a correct State.
Implementation:
Part-1: Clang implementation described below.
Two intrinsic are created to track CPP object scopes; eha_scope_begin() and eha_scope_end().
_scope_begin() is immediately added after ctor() is called and EHStack is pushed.
So it must be an invoke, not a call. With that it's also guaranteed an
EH-cleanup-pad is created regardless whether there exists a call in this scope.
_scope_end is added before dtor(). These two intrinsics make the computation of
Block-State possible in downstream code gen pass, even in the presence of
ctor/dtor inlining.
Two intrinsic, seh_try_begin() and seh_try_end(), are added for C-code to mark
_try boundary and to prevent from exceptions being moved across _try boundary.
All memory instructions inside a _try are considered as 'volatile' to assure
2nd and 3rd rules for C-code above. This is a little sub-optimized. But it's
acceptable as the amount of code directly under _try is very small.
Part-2 (will be in Part-2 patch): LLVM implementation described below.
For both C++ & C-code, the state of each block is computed at the same place in
BE (WinEHPreparing pass) where all other EH tables/maps are calculated.
In addition to _scope_begin & _scope_end, the computation of block state also
rely on the existing State tracking code (UnwindMap and InvokeStateMap).
For both C++ & C-code, the state of each block with potential trap instruction
is marked and reported in DAG Instruction Selection pass, the same place where
the state for -EHsc (synchronous exceptions) is done.
If the first instruction in a reported block scope can trap, a Nop is injected
before this instruction. This nop is needed to accommodate LLVM Windows EH
implementation, in which the address in IPToState table is offset by +1.
(note the purpose of that is to ensure the return address of a call is in the
same scope as the call address.
The handler for catch(...) for -EHa must handle HW exception. So it is
'adjective' flag is reset (it cannot be IsStdDotDot (0x40) that only catches
C++ exceptions).
Suppress push/popTerminate() scope (from noexcept/noTHrow) so that HW
exceptions can be passed through.
Original llvm-dev [RFC] discussions can be found in these two threads below:
https://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.htmlhttps://lists.llvm.org/pipermail/llvm-dev/2020-April/141338.html
Differential Revision: https://reviews.llvm.org/D80344/new/
Currently Clang does not add mustprogress to inifinite loops with a
known constant condition, matching C11 behavior. The forward progress
guarantee in C++11 and later should allow us to add mustprogress to any
loop (http://eel.is/c++draft/intro.progress#1).
This allows us to simplify the code dealing with adding mustprogress a
bit.
Reviewed By: aaron.ballman, lebedev.ri
Differential Revision: https://reviews.llvm.org/D96418
Such attributes can either be unset, or set to "true" or "false" (as string).
throughout the codebase, this led to inelegant checks ranging from
if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")
to
if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")
Introduce a getValueAsBool that normalize the check, with the following
behavior:
no attributes or attribute set to "false" => return false
attribute set to "true" => return true
Differential Revision: https://reviews.llvm.org/D99299
The first one is the real parameters of the coroutine function, the
other one just for copying parameters to the coroutine frame.
Considering the following c++ code:
```
struct coro {
...
};
coro foo(struct test & t) {
...
co_await suspend_always();
...
co_await suspend_always();
...
co_await suspend_always();
}
int main(int argc, char *argv[]) {
auto c = foo(...);
c.handle.resume();
...
}
```
Function foo is the standard coroutine function, and it has only
one parameter named t (ignoring this at first),
when we use the llvm code to compile this function, we can get the
following ir:
```
!2921 = distinct !DISubprogram(name: "foo", linkageName:
"_ZN6Object3fooE4test", scope: !2211, file: !45, li\
ne: 48, type: !2329, scopeLine: 48, flags: DIFlagPrototyped |
DIFlagAllCallsDescribed, spFlags: DISPFlagDefi\
nition | DISPFlagOptimized, unit: !44, declaration: !2328,
retainedNodes: !2922)
!2924 = !DILocalVariable(name: "t", arg: 2, scope: !2921, file: !45,
line: 48, type: !838)
...
!2926 = !DILocalVariable(name: "t", scope: !2921, type: !838, flags:
DIFlagArtificial)
```
We can find there are two `the same` DIVariable named t in the same
dwarf scope for foo.resume.
And when we try to use llvm-dwarfdump to dump the dwarf info of this
elf, we get the following output:
```
0x00006684: DW_TAG_subprogram
DW_AT_low_pc (0x00000000004013a0)
DW_AT_high_pc (0x00000000004013a8)
DW_AT_frame_base (DW_OP_reg7 RSP)
DW_AT_object_pointer (0x0000669c)
DW_AT_GNU_all_call_sites (true)
DW_AT_specification (0x00005b5c "_ZN6Object3fooE4test")
0x000066a5: DW_TAG_formal_parameter
DW_AT_name ("t")
DW_AT_decl_file ("/disk1/yifeng.dongyifeng/my_code/llvm/build/bin/coro-debug-1.cpp")
DW_AT_decl_line (48)
DW_AT_type (0x00004146 "test")
0x000066ba: DW_TAG_variable
DW_AT_name ("t")
DW_AT_type (0x00004146 "test")
DW_AT_artificial (true)
```
The elf also has two 't' in the same scope.
But unluckily, it might let the debugger
confused. And failed to print parameters for O0 or above.
This patch will make coroutine parameters and move
parameters use the same DIVar and try to fix the problems
that I mentioned before.
Test Plan: check-clang
Reviewed By: aprantl, jmorse
Differential Revision: https://reviews.llvm.org/D97533
tl;dr Correct implementation of Corouintes requires having lifetime intrinsics available.
Coroutine functions are functions that can be suspended and resumed latter. To do so, data that need to stay alive after suspension must be put on the heap (i.e. the coroutine frame).
The optimizer is responsible for analyzing each AllocaInst and figure out whether it should be put on the stack or the frame.
In most cases, for data that we are unable to accurately analyze lifetime, we can just conservatively put them on the heap.
Unfortunately, there exists a few cases where certain data MUST be put on the stack, not on the heap. Without lifetime intrinsics, we are unable to correctly analyze those data's lifetime.
To dig into more details, there exists cases where at certain code points, the current coroutine frame may have already been destroyed. Hence no frame access would be allowed beyond that point.
The following is a common code pattern called "Symmetric Transfer" in coroutine:
```
auto tmp = await_suspend();
__builtin_coro_resume(tmp.address());
return;
```
In the above code example, `await_suspend()` returns a new coroutine handle, which we will obtain the address and then resume that coroutine. This essentially "transfered" from the current coroutine to a different coroutine.
During the call to `await_suspend()`, the current coroutine may be destroyed, which should be fine because we are not accessing any data afterwards.
However when LLVM is emitting IR for the above code, it needs to emit an AllocaInst for `tmp`. It will then call the `address` function on tmp. `address` function is a member function of coroutine, and there is no way for the LLVM optimizer to know that it does not capture the `tmp` pointer. So when the optimizer looks at it, it has to conservatively assume that `tmp` may escape and hence put it on the heap. Furthermore, in some cases `address` call would be inlined, which will generate a bunch of store/load instructions that move the `tmp` pointer around. Those stores will also make the compiler to think that `tmp` might escape.
To summarize, it's really difficult for the mid-end to figure out that the `tmp` data is short-lived.
I made some attempt in D98638, but it appears to be way too complex and is basically doing the same thing as inserting lifetime intrinsics in coroutines.
Also, for reference, we already force emitting lifetime intrinsics in O0 for AlwaysInliner: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Passes/PassBuilder.cpp#L1893
Differential Revision: https://reviews.llvm.org/D99227
This attribute represents the minimum and maximum values vscale can
take. For now this attribute is not hooked up to anything during
codegen, this will be added in the future when such codegen is
considered stable.
Additionally hook up the -msve-vector-bits=<x> clang option to emit this
attribute.
Differential Revision: https://reviews.llvm.org/D98030