* svdupq builtins that duplicate scalars to every quadword of a vector
are defined using builtins for svld1rq (load and replicate quadword).
* svdupq builtins that duplicate boolean values to fill a predicate vector
are defined using `svcmpne`.
Reviewers: SjoerdMeijer, efriedma, ctetreau
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78750
test cases
Add support for #pragma float_control
Reviewers: rjmccall, erichkeane, sepavloff
Differential Revision: https://reviews.llvm.org/D72841
This reverts commit 85dc033cac, and makes
corrections to the test cases that failed on buildbots.
Summary:
Removes usage of VectorType::getNumElements identified by test located
at CodeGen/aarch64-sve-intrinsics/acle_sve_abs.c. Since the type is an
SVE predicate vector, it makes sense to specialize the code for scalable
vectors only.
Reviewers: rengolin, efriedma
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78958
When passing a value of a struct/union type from secure to non-secure
state (that is returning from a CMSE entry function or passing an
argument to CMSE-non-secure call), there is a potential sensitive
information leak via the padding bits in the structure. It is not
possible in the general case to ensure those bits are cleared by using
Standard C/C++.
This patch makes the compiler emit code to clear such padding
bits. Since type information is lost in LLVM IR, the code generation
is done by Clang.
For each interesting record type, we build a bitmask, in which all the
bits, corresponding to user declared members, are set. Values of
record types are returned by coercing them to an integer. After the
coercion, the coerced value is masked (with bitwise AND) and then
returned by the function. In a similar manner, values of record types
are passed as arguments by coercing them to an array of integers, and
the coerced values themselves are masked.
For union types, we effectively clear only bits, which aren't part of
any member, since we don't know which is the currently active one.
The compiler will issue a warning, whenever a union is passed to
non-secure state.
Values of half-precision floating-point types are passed in the least
significant bits of a 32-bit register (GPR or FPR) with the most
significant bits unspecified. Since this is also a potential leak of
sensitive information, this patch also clears those unspecified bits.
Differential Revision: https://reviews.llvm.org/D76369
This patch also adds the enum `sv_prfop` for the prefetch operation specifier
and checks to ensure the passed enum values are valid.
Reviewers: SjoerdMeijer, efriedma, ctetreau
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78674
This also adds the IsOverloadWhileRW flag which tells CGBuiltin to use
the result predicate type and the first pointer type as the
overloaded types for the LLVM IR intrinsic.
Reviewers: SjoerdMeijer, efriedma
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78238
Add the IsOverloadNone flag to tell CGBuiltin that it does not have
an overloaded type. This is used for e.g. svpfalse which does
not take any arguments and always returns a svbool_t.
This patch also adds builtins for svcntb_pat, svcnth_pat, svcntw_pat
and svcntd_pat, as those don't require custom codegen.
Reviewers: SjoerdMeijer, efriedma, rovka
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77596
The ACLE has builtins that take a scalar value that is to be expanded
into a vector by the operation. While the ISA may have an instruction
that takes an immediate or a scalar to represent this, the LLVM IR
intrinsic may not, so Clang will have to splat the scalar value.
This patch also adds the _n forms for svabd, svadd, svdiv, svdivr,
svmax, svmin, svmul, svmulh, svub and svsubr.
Reviewers: SjoerdMeijer, efriedma, rovka
Reviewed By: SjoerdMeijer
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77594
Summary:
This patch adds a mechanism to easily add range checks for a builtin's
immediate operands. This patch is tested with the qdech intrinsic, which takes
both an enum for the predicate pattern, as well as an immediate for the
multiplier.
Reviewers: efriedma, SjoerdMeijer, rovka
Reviewed By: efriedma, SjoerdMeijer
Subscribers: mgorny, tschuett, mgrang, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76678
This adds builtins for all contiguous loads/stores, including
non-temporal, first-faulting and non-faulting.
Reviewers: efriedma, SjoerdMeijer
Reviewed By: SjoerdMeijer
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76238
Summary:
Range checks were not properly performed in the lane arguments of Neon
intrinsics implemented based on splat operations. Calls to those
intrinsics where translated to `__builtin__shufflevector` calls directly
by the pre-processor through the arm_neon.h macros, missing the chance
for the proper range checks.
This patch enables the range check by introducing an auxiliary splat
instruction in arm_neon.td, delaying the translation to shufflevector
calls to CGBuiltin.cpp in clang after the checks were performed.
Reviewers: jmolloy, t.p.northover, rsmith, olista01, ostannard
Reviewed By: ostannard
Subscribers: ostannard, dnsampaio, danielkiss, kristof.beyls, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D74619
Summary:
Some of the `*_laneq` intrinsics defined in arm_neon.td were missing the
setting of the `isLaneQ` attribute. This patch sets the attribute on the
related definitions, as they will be required to properly perform range
checks on their lane arguments.
Reviewers: jmolloy, t.p.northover, rsmith, olista01, dnsampaio
Reviewed By: dnsampaio
Subscribers: dnsampaio, kristof.beyls, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D74616
Reworked the patch to avoid sharing a header (SVETypeFlags.h) between
include/clang/Basic and utils/TableGen/SveEmitter.cpp. Now the patch
generates the enum/flags which is included in TargetBuiltins.h.
Also renamed one of the SveEmitter options to be in line with MVE.
Summary:
This is a first patch in a series for the SveEmitter to generate the arm_sve.h
header file and builtins.
I've tried my best to strip down this patch as best as I could, but there
are still a few changes that are not necessarily exercised by the load intrinsics
in this patch, mostly around the SVEType class which has some common logic to
represent types from a type and prototype string. I thought it didn't make
much sense to remove that from this patch and split it up.
This is a first patch in a series for the SveEmitter to generate the arm_sve.h
header file and builtins.
I've tried my best to strip down this patch as best as I could, but there
are still a few changes that are not necessarily exercised by the load intrinsics
in this patch, mostly around the SVEType class which has some common logic to
represent types from a type and prototype string. I thought it didn't make
much sense to remove that from this patch and split it up.
Reviewers: efriedma, rovka, SjoerdMeijer, rsandifo-arm, rengolin
Reviewed By: SjoerdMeijer
Tags: #clang
Differential Revision: https://reviews.llvm.org/D75470
This patch adds 'q' to mean 'scalable vector' in the builtin
type string, and for SVE will return the matching builtin
type as defined in the C/C++ language extensions for SVE.
This patch also adds some scaffolding to generate the arm_sve.h
header file, and some builtin definitions (+CodeGen) to be able
to implement some simple masked load intrinsics that use the
ACLE types, such as:
svint8_t test_svld1_s8(svbool_t pg, const int8_t *base) {
return svld1_s8(pg, base);
}
Reviewers: efriedma, rjmccall, rovka, rsandifo-arm, rengolin
Reviewed By: efriedma
Tags: #clang
Differential Revision: https://reviews.llvm.org/D75298
Summary:
This patch generalizes the existing code to support CDE intrinsics
which will share some properties with existing MVE intrinsics
(some of the intrinsics will be polymorphic and accept/return values
of MVE vector types).
Specifically the patch:
* Adds new tablegen backends -gen-arm-cde-builtin-def,
-gen-arm-cde-builtin-codegen, -gen-arm-cde-builtin-sema,
-gen-arm-cde-builtin-aliases, -gen-arm-cde-builtin-header based on
existing MVE backends.
* Renames the '__clang_arm_mve_alias' attribute into
'__clang_arm_builtin_alias' (it will be used with CDE intrinsics as
well as MVE intrinsics)
* Implements semantic checks for the coprocessor argument of the CDE
intrinsics as well as the existing coprocessor intrinsics.
* Adds one CDE intrinsic __arm_cx1 to test the above changes
Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen
Reviewed By: simon_tatham
Subscribers: sdesmalen, mgorny, kristof.beyls, danielkiss, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75850
This patch introduces a new helper class `OMPBuilderCBHelpers`,
which will contain all reusable C/C++ language specific function-
alities required by the `OMPIRBuilder`.
Initially, this helper class contains the body and finalization
codegen functionalities implemented using callbacks which were
moved here for reusability among the different directives
implemented in the `OMPIRBuilder`, along with RAIIs for preserving
state prior to emitting outlined and/or inlined OpenMP regions.
In the future this helper class will also contain all the different
call backs required by OpenMP clauses/variable privatization.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D74562
Summary:
This patch resumes the work of D16586.
According to the AAPCS, volatile bit-fields should
be accessed using containers of the widht of their
declarative type. In such case:
```
struct S1 {
short a : 1;
}
```
should be accessed using load and stores of the width
(sizeof(short)), where now the compiler does only load
the minimum required width (char in this case).
However, as discussed in D16586,
that could overwrite non-volatile bit-fields, which
conflicted with C and C++ object models by creating
data race conditions that are not part of the bit-field,
e.g.
```
struct S2 {
short a;
int b : 16;
}
```
Accessing `S2.b` would also access `S2.a`.
The AAPCS Release 2019Q1.1
(https://static.docs.arm.com/ihi0042/g/aapcs32.pdf)
section 8.1 Data Types, page 35, "Volatile bit-fields -
preserving number and width of container accesses" has been
updated to avoid conflict with the C++ Memory Model.
Now it reads in the note:
```
This ABI does not place any restrictions on the access widths
of bit-fields where the container overlaps with a non-bit-field member.
This is because the C/C++ memory model defines these as being separate
memory locations, which can be accessed by two threads
simultaneously. For this reason, compilers must be permitted to use a
narrower memory access width (including splitting the access
into multiple instructions) to avoid writing to a different memory location.
```
I've updated the patch D16586 to follow such behavior by verifying that we
only change volatile bit-field access when:
- it won't overlap with any other non-bit-field member
- we only access memory inside the bounds of the record
Regarding the number of memory accesses, that should be preserved, that will
be implemented by D67399.
Reviewers: rsmith, rjmccall, eli.friedman, ostannard
Subscribers: ostannard, kristof.beyls, cfe-commits, carwil, olista01
Tags: #clang
Differential Revision: https://reviews.llvm.org/D72932
Summary:
This change implements the expansion in two parts:
- Add a utility function emitAMDGPUPrintfCall() in LLVM.
- Invoke the above function from Clang CodeGen, when processing a HIP
program for the AMDGPU target.
The printf expansion has undefined behaviour if the format string is
not a compile-time constant. As a sufficient condition, the HIP
ToolChain now emits -Werror=format-nonliteral.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D71365
This change introduces three new builtins (which work on both pointers
and integers) that can be used instead of common bitwise arithmetic:
__builtin_align_up(x, alignment), __builtin_align_down(x, alignment) and
__builtin_is_aligned(x, alignment).
I originally added these builtins to the CHERI fork of LLVM a few years ago
to handle the slightly different C semantics that we use for CHERI [1].
Until recently these builtins (or sequences of other builtins) were
required to generate correct code. I have since made changes to the default
C semantics so that they are no longer strictly necessary (but using them
does generate slightly more efficient code). However, based on our experience
using them in various projects over the past few years, I believe that adding
these builtins to clang would be useful.
These builtins have the following benefit over bit-manipulation and casts
via uintptr_t:
- The named builtins clearly convey the semantics of the operation. While
checking alignment using __builtin_is_aligned(x, 16) versus
((x & 15) == 0) is probably not a huge win in readably, I personally find
__builtin_align_up(x, N) a lot easier to read than (x+(N-1))&~(N-1).
- They preserve the type of the argument (including const qualifiers). When
using casts via uintptr_t, it is easy to cast to the wrong type or strip
qualifiers such as const.
- If the alignment argument is a constant value, clang can check that it is
a power-of-two and within the range of the type. Since the semantics of
these builtins is well defined compared to arbitrary bit-manipulation,
it is possible to add a UBSAN checker that the run-time value is a valid
power-of-two. I intend to add this as a follow-up to this change.
- The builtins avoids int-to-pointer casts both in C and LLVM IR.
In the future (i.e. once most optimizations handle it), we could use the new
llvm.ptrmask intrinsic to avoid the ptrtoint instruction that would normally
be generated.
- They can be used to round up/down to the next aligned value for both
integers and pointers without requiring two separate macros.
- In many projects the alignment operations are already wrapped in macros (e.g.
roundup2 and rounddown2 in FreeBSD), so by replacing the macro implementation
with a builtin call, we get improved diagnostics for many call-sites while
only having to change a few lines.
- Finally, the builtins also emit assume_aligned metadata when used on pointers.
This can improve code generation compared to the uintptr_t casts.
[1] In our CHERI compiler we have compilation mode where all pointers are
implemented as capabilities (essentially unforgeable 128-bit fat pointers).
In our original model, casts from uintptr_t (which is a 128-bit capability)
to an integer value returned the "offset" of the capability (i.e. the
difference between the virtual address and the base of the allocation).
This causes problems for cases such as checking the alignment: for example, the
expression `if ((uintptr_t)ptr & 63) == 0` is generally used to check if the
pointer is aligned to a multiple of 64 bytes. The problem with offsets is that
any pointer to the beginning of an allocation will have an offset of zero, so
this check always succeeds in that case (even if the address is not correctly
aligned). The same issues also exist when aligning up or down. Using the
alignment builtins ensures that the address is used instead of the offset. While
I have since changed the default C semantics to return the address instead of
the offset when casting, this offset compilation mode can still be used by
passing a command-line flag.
Reviewers: rsmith, aaron.ballman, theraven, fhahn, lebedev.ri, nlopes, aqjune
Reviewed By: aaron.ballman, lebedev.ri
Differential Revision: https://reviews.llvm.org/D71499
Summary:
The new OpenMPConstants.h is a location for all OpenMP related constants
(and helpers) to live.
This patch moves the directives there (the enum OpenMPDirectiveKind) and
rewires Clang to use the new location.
Initially part of D69785.
Reviewers: kiranchandramohan, ABataev, RaviNarayanaswamy, gtbercea, grokos, sdmitriev, JonChesterfield, hfinkel, fghanim
Subscribers: jholewinski, ppenzin, penzn, llvm-commits, cfe-commits, jfb, guansong, bollu, hiraditya, mgorny
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69853
Patch was reverted because https://bugs.llvm.org/show_bug.cgi?id=44048
The original patch is modified to set the strictfp IR attribute
explicitly in CodeGen instead of as a side effect of IRBuilder.
In the 2nd attempt to reapply there was a windows lit test fail, the
tests were fixed to use wildcard matching.
Differential Revision: https://reviews.llvm.org/D62731
According to OpenMP 5.0, if clause can be used in simd directive. If
condition in the if clause if false, the non-vectorized version of the
loop must be executed.
and a follow-up NFC rearrangement as it's causing a crash on valid. Testcase is on the original review thread.
This reverts commits af57dbf12e and e6584b2b7b
Add options to control floating point behavior: trapping and
exception behavior, rounding, and control of optimizations that affect
floating point calculations. More details in UsersManual.rst.
Reviewers: rjmccall
Differential Revision: https://reviews.llvm.org/D62731
This commit sets up the infrastructure for auto-generating <arm_mve.h>
and doing clang-side code generation for the builtins it relies on,
and demonstrates that it works by implementing a representative sample
of the ACLE intrinsics, more or less matching the ones introduced in
LLVM IR by D67158,D68699,D68700.
Like NEON, that header file will provide a set of vector types like
uint16x8_t and C functions with names like vaddq_u32(). Unlike NEON,
the ACLE spec for <arm_mve.h> includes a polymorphism system, so that
you can write plain vaddq() and disambiguate by the vector types you
pass to it.
Unlike the corresponding NEON code, I've arranged to make every user-
facing ACLE intrinsic into a clang builtin, and implement all the code
generation inside clang. So <arm_mve.h> itself contains nothing but
typedefs and function declarations, with the latter all using the new
`__attribute__((__clang_builtin))` system to arrange that the user-
facing function names correspond to the right internal BuiltinIDs.
So the new MveEmitter tablegen system specifies the full sequence of
IRBuilder operations that each user-facing ACLE intrinsic should
translate into. Where possible, the ACLE intrinsics map to standard IR
operations such as vector-typed `add` and `fadd`; where no standard
representation exists, I call down to the sample IR intrinsics
introduced in an earlier commit.
Doing it like this means that you get the polymorphism for free just
by using __attribute__((overloadable)): the clang overload resolution
decides which function declaration is the relevant one, and _then_ its
BuiltinID is looked up, so by the time we're doing code generation,
that's all been resolved by the standard system. It also means that
you get really nice error messages if the user passes the wrong
combination of types: clang will show the declarations from the header
file and explain why each one doesn't match.
(The obvious alternative approach would be to have wrapper functions
in <arm_mve.h> which pass their arguments to the underlying builtins.
But that doesn't work in the case where one of the arguments has to be
a constant integer: the wrapper function can't pass the constantness
through. So you'd have to do that case using a macro instead, and then
use C11 `_Generic` to handle the polymorphism. Then you have to add
horrible workarounds because `_Generic` requires even the untaken
branches to type-check successfully, and //then// if the user gets the
types wrong, the error message is totally unreadable!)
Reviewers: dmgreen, miyuki, ostannard
Subscribers: mgorny, javed.absar, kristof.beyls, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D67161
Added parsing/sema/codegen support for 'parallel master taskloop'
constructs. Some of the clauses, like 'grainsize', 'num_tasks', 'final'
and 'priority' are not supported in full, only constant expressions can
be used currently in these clauses.
llvm-svn: 374791
The behavior from the original patch has changed, since we're no longer
allowing LLVM to just ignore the alignment. Instead, we're just
assuming the maximum possible alignment.
Differential Revision: https://reviews.llvm.org/D68824
llvm-svn: 374562
The test fails on Windows, with
error: 'warning' diagnostics expected but not seen:
File builtin-assume-aligned.c Line 62: requested alignment
must be 268435456 bytes or smaller; assumption ignored
error: 'warning' diagnostics seen but not expected:
File builtin-assume-aligned.c Line 62: requested alignment
must be 8192 bytes or smaller; assumption ignored
llvm-svn: 374456
Code to handle __builtin_assume_aligned was allowing larger values, but
would convert this to unsigned along the way. This patch removes the
EmitAssumeAligned overloads that take unsigned to do away with this
problem.
Additionally, it adds a warning that values greater than 1 <<29 are
ignored by LLVM.
Differential Revision: https://reviews.llvm.org/D68824
llvm-svn: 374450
Reason: this commit causes crashes in the clang compiler when building
LLVM Support with libc++, see https://bugs.llvm.org/show_bug.cgi?id=42665
for details.
llvm-svn: 366429
Summary:
This patch does mainly three things:
1. It fixes a false positive error detection in Sema that is similar to
D62156. The error happens when explicitly calling an overloaded
destructor for different address spaces.
2. It selects the correct destructor when multiple overloads for
address spaces are available.
3. It inserts the expected address space cast when invoking a
destructor, if needed, and therefore fixes a crash due to the unmet
assertion in llvm::CastInst::Create.
The following is a reproducer of the three issues:
struct MyType {
~MyType() {}
~MyType() __constant {}
};
__constant MyType myGlobal{};
kernel void foo() {
myGlobal.~MyType(); // 1 and 2.
// 1. error: cannot initialize object parameter of type
// '__generic MyType' with an expression of type '__constant MyType'
// 2. error: no matching member function for call to '~MyType'
}
kernel void bar() {
// 3. The implicit call to the destructor crashes due to:
// Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
// in llvm::CastInst::Create.
MyType myLocal;
}
The added test depends on D62413 and covers a few more things than the
above reproducer.
Subscribers: yaxunl, Anastasia, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D64569
llvm-svn: 366422
The original commit is r366076. It is temporarily reverted (r366155)
due to test failure. This resubmit makes test more robust by accepting
regex instead of hardcoded names/references in several places.
This is a followup patch for https://reviews.llvm.org/D61809.
Handle unnamed bitfield properly and add more test cases.
Fixed the unnamed bitfield issue. The unnamed bitfield is ignored
by debug info, so we need to ignore such a struct/union member
when we try to get the member index in the debug info.
D61809 contains two test cases but not enough as it does
not checking generated IRs in the fine grain level, and also
it does not have semantics checking tests.
This patch added unit tests for both code gen and semantics checking for
the new intrinsic.
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 366231
i.e., recent 5745eccef54ddd3caca278d1d292a88b2281528b:
* Bump the function_type_mismatch handler version, as its signature has changed.
* The function_type_mismatch handler can return successfully now, so
SanitizerKind::Function must be AlwaysRecoverable (like for
SanitizerKind::Vptr).
* But the minimal runtime would still unconditionally treat a call to the
function_type_mismatch handler as failure, so disallow -fsanitize=function in
combination with -fsanitize-minimal-runtime (like it was already done for
-fsanitize=vptr).
* Add tests.
Differential Revision: https://reviews.llvm.org/D61479
llvm-svn: 366186
This is a followup patch for https://reviews.llvm.org/D61809.
Handle unnamed bitfield properly and add more test cases.
Fixed the unnamed bitfield issue. The unnamed bitfield is ignored
by debug info, so we need to ignore such a struct/union member
when we try to get the member index in the debug info.
D61809 contains two test cases but not enough as it does
not checking generated IRs in the fine grain level, and also
it does not have semantics checking tests.
This patch added unit tests for both code gen and semantics checking for
the new intrinsic.
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 366076
For background of BPF CO-RE project, please refer to
http://vger.kernel.org/bpfconf2019.html
In summary, BPF CO-RE intends to compile bpf programs
adjustable on struct/union layout change so the same
program can run on multiple kernels with adjustment
before loading based on native kernel structures.
In order to do this, we need keep track of GEP(getelementptr)
instruction base and result debuginfo types, so we
can adjust on the host based on kernel BTF info.
Capturing such information as an IR optimization is hard
as various optimization may have tweaked GEP and also
union is replaced by structure it is impossible to track
fieldindex for union member accesses.
Three intrinsic functions, preserve_{array,union,struct}_access_index,
are introducted.
addr = preserve_array_access_index(base, index, dimension)
addr = preserve_union_access_index(base, di_index)
addr = preserve_struct_access_index(base, gep_index, di_index)
here,
base: the base pointer for the array/union/struct access.
index: the last access index for array, the same for IR/DebugInfo layout.
dimension: the array dimension.
gep_index: the access index based on IR layout.
di_index: the access index based on user/debuginfo types.
If using these intrinsics blindly, i.e., transforming all GEPs
to these intrinsics and later on reducing them to GEPs, we have
seen up to 7% more instructions generated. To avoid such an overhead,
a clang builtin is proposed:
base = __builtin_preserve_access_index(base)
such that user wraps to-be-relocated GEPs in this builtin
and preserve_*_access_index intrinsics only apply to
those GEPs. Such a buyin will prevent performance degradation
if people do not use CO-RE, even for programs which use
bpf_probe_read().
For example, for the following example,
$ cat test.c
struct sk_buff {
int i;
int b1:1;
int b2:2;
union {
struct {
int o1;
int o2;
} o;
struct {
char flags;
char dev_id;
} dev;
int netid;
} u[10];
};
static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr)
= (void *) 4;
#define _(x) (__builtin_preserve_access_index(x))
int bpf_prog(struct sk_buff *ctx) {
char dev_id;
bpf_probe_read(&dev_id, sizeof(char), _(&ctx->u[5].dev.dev_id));
return dev_id;
}
$ clang -target bpf -O2 -g -emit-llvm -S -mllvm -print-before-all \
test.c >& log
The generated IR looks like below:
...
define dso_local i32 @bpf_prog(%struct.sk_buff*) #0 !dbg !15 {
%2 = alloca %struct.sk_buff*, align 8
%3 = alloca i8, align 1
store %struct.sk_buff* %0, %struct.sk_buff** %2, align 8, !tbaa !45
call void @llvm.dbg.declare(metadata %struct.sk_buff** %2, metadata !43, metadata !DIExpression()), !dbg !49
call void @llvm.lifetime.start.p0i8(i64 1, i8* %3) #4, !dbg !50
call void @llvm.dbg.declare(metadata i8* %3, metadata !44, metadata !DIExpression()), !dbg !51
%4 = load i32 (i8*, i32, i8*)*, i32 (i8*, i32, i8*)** @bpf_probe_read, align 8, !dbg !52, !tbaa !45
%5 = load %struct.sk_buff*, %struct.sk_buff** %2, align 8, !dbg !53, !tbaa !45
%6 = call [10 x %union.anon]* @llvm.preserve.struct.access.index.p0a10s_union.anons.p0s_struct.sk_buffs(
%struct.sk_buff* %5, i32 2, i32 3), !dbg !53, !llvm.preserve.access.index !19
%7 = call %union.anon* @llvm.preserve.array.access.index.p0s_union.anons.p0a10s_union.anons(
[10 x %union.anon]* %6, i32 1, i32 5), !dbg !53
%8 = call %union.anon* @llvm.preserve.union.access.index.p0s_union.anons.p0s_union.anons(
%union.anon* %7, i32 1), !dbg !53, !llvm.preserve.access.index !26
%9 = bitcast %union.anon* %8 to %struct.anon.0*, !dbg !53
%10 = call i8* @llvm.preserve.struct.access.index.p0i8.p0s_struct.anon.0s(
%struct.anon.0* %9, i32 1, i32 1), !dbg !53, !llvm.preserve.access.index !34
%11 = call i32 %4(i8* %3, i32 1, i8* %10), !dbg !52
%12 = load i8, i8* %3, align 1, !dbg !54, !tbaa !55
%13 = sext i8 %12 to i32, !dbg !54
call void @llvm.lifetime.end.p0i8(i64 1, i8* %3) #4, !dbg !56
ret i32 %13, !dbg !57
}
!19 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "sk_buff", file: !3, line: 1, size: 704, elements: !20)
!26 = distinct !DICompositeType(tag: DW_TAG_union_type, scope: !19, file: !3, line: 5, size: 64, elements: !27)
!34 = distinct !DICompositeType(tag: DW_TAG_structure_type, scope: !26, file: !3, line: 10, size: 16, elements: !35)
Note that @llvm.preserve.{struct,union}.access.index calls have metadata llvm.preserve.access.index
attached to instructions to provide struct/union debuginfo type information.
For &ctx->u[5].dev.dev_id,
. The "%6 = ..." represents struct member "u" with index 2 for IR layout and index 3 for DI layout.
. The "%7 = ..." represents array subscript "5".
. The "%8 = ..." represents union member "dev" with index 1 for DI layout.
. The "%10 = ..." represents struct member "dev_id" with index 1 for both IR and DI layout.
Basically, traversing the use-def chain recursively for the 3rd argument of bpf_probe_read() and
examining all preserve_*_access_index calls, the debuginfo struct/union/array access index
can be achieved.
The intrinsics also contain enough information to regenerate codes for IR layout.
For array and structure intrinsics, the proper GEP can be constructed.
For union intrinsics, replacing all uses of "addr" with "base" should be enough.
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D61809
llvm-svn: 365438
For background of BPF CO-RE project, please refer to
http://vger.kernel.org/bpfconf2019.html
In summary, BPF CO-RE intends to compile bpf programs
adjustable on struct/union layout change so the same
program can run on multiple kernels with adjustment
before loading based on native kernel structures.
In order to do this, we need keep track of GEP(getelementptr)
instruction base and result debuginfo types, so we
can adjust on the host based on kernel BTF info.
Capturing such information as an IR optimization is hard
as various optimization may have tweaked GEP and also
union is replaced by structure it is impossible to track
fieldindex for union member accesses.
Three intrinsic functions, preserve_{array,union,struct}_access_index,
are introducted.
addr = preserve_array_access_index(base, index, dimension)
addr = preserve_union_access_index(base, di_index)
addr = preserve_struct_access_index(base, gep_index, di_index)
here,
base: the base pointer for the array/union/struct access.
index: the last access index for array, the same for IR/DebugInfo layout.
dimension: the array dimension.
gep_index: the access index based on IR layout.
di_index: the access index based on user/debuginfo types.
If using these intrinsics blindly, i.e., transforming all GEPs
to these intrinsics and later on reducing them to GEPs, we have
seen up to 7% more instructions generated. To avoid such an overhead,
a clang builtin is proposed:
base = __builtin_preserve_access_index(base)
such that user wraps to-be-relocated GEPs in this builtin
and preserve_*_access_index intrinsics only apply to
those GEPs. Such a buyin will prevent performance degradation
if people do not use CO-RE, even for programs which use
bpf_probe_read().
For example, for the following example,
$ cat test.c
struct sk_buff {
int i;
int b1:1;
int b2:2;
union {
struct {
int o1;
int o2;
} o;
struct {
char flags;
char dev_id;
} dev;
int netid;
} u[10];
};
static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr)
= (void *) 4;
#define _(x) (__builtin_preserve_access_index(x))
int bpf_prog(struct sk_buff *ctx) {
char dev_id;
bpf_probe_read(&dev_id, sizeof(char), _(&ctx->u[5].dev.dev_id));
return dev_id;
}
$ clang -target bpf -O2 -g -emit-llvm -S -mllvm -print-before-all \
test.c >& log
The generated IR looks like below:
...
define dso_local i32 @bpf_prog(%struct.sk_buff*) #0 !dbg !15 {
%2 = alloca %struct.sk_buff*, align 8
%3 = alloca i8, align 1
store %struct.sk_buff* %0, %struct.sk_buff** %2, align 8, !tbaa !45
call void @llvm.dbg.declare(metadata %struct.sk_buff** %2, metadata !43, metadata !DIExpression()), !dbg !49
call void @llvm.lifetime.start.p0i8(i64 1, i8* %3) #4, !dbg !50
call void @llvm.dbg.declare(metadata i8* %3, metadata !44, metadata !DIExpression()), !dbg !51
%4 = load i32 (i8*, i32, i8*)*, i32 (i8*, i32, i8*)** @bpf_probe_read, align 8, !dbg !52, !tbaa !45
%5 = load %struct.sk_buff*, %struct.sk_buff** %2, align 8, !dbg !53, !tbaa !45
%6 = call [10 x %union.anon]* @llvm.preserve.struct.access.index.p0a10s_union.anons.p0s_struct.sk_buffs(
%struct.sk_buff* %5, i32 2, i32 3), !dbg !53, !llvm.preserve.access.index !19
%7 = call %union.anon* @llvm.preserve.array.access.index.p0s_union.anons.p0a10s_union.anons(
[10 x %union.anon]* %6, i32 1, i32 5), !dbg !53
%8 = call %union.anon* @llvm.preserve.union.access.index.p0s_union.anons.p0s_union.anons(
%union.anon* %7, i32 1), !dbg !53, !llvm.preserve.access.index !26
%9 = bitcast %union.anon* %8 to %struct.anon.0*, !dbg !53
%10 = call i8* @llvm.preserve.struct.access.index.p0i8.p0s_struct.anon.0s(
%struct.anon.0* %9, i32 1, i32 1), !dbg !53, !llvm.preserve.access.index !34
%11 = call i32 %4(i8* %3, i32 1, i8* %10), !dbg !52
%12 = load i8, i8* %3, align 1, !dbg !54, !tbaa !55
%13 = sext i8 %12 to i32, !dbg !54
call void @llvm.lifetime.end.p0i8(i64 1, i8* %3) #4, !dbg !56
ret i32 %13, !dbg !57
}
!19 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "sk_buff", file: !3, line: 1, size: 704, elements: !20)
!26 = distinct !DICompositeType(tag: DW_TAG_union_type, scope: !19, file: !3, line: 5, size: 64, elements: !27)
!34 = distinct !DICompositeType(tag: DW_TAG_structure_type, scope: !26, file: !3, line: 10, size: 16, elements: !35)
Note that @llvm.preserve.{struct,union}.access.index calls have metadata llvm.preserve.access.index
attached to instructions to provide struct/union debuginfo type information.
For &ctx->u[5].dev.dev_id,
. The "%6 = ..." represents struct member "u" with index 2 for IR layout and index 3 for DI layout.
. The "%7 = ..." represents array subscript "5".
. The "%8 = ..." represents union member "dev" with index 1 for DI layout.
. The "%10 = ..." represents struct member "dev_id" with index 1 for both IR and DI layout.
Basically, traversing the use-def chain recursively for the 3rd argument of bpf_probe_read() and
examining all preserve_*_access_index calls, the debuginfo struct/union/array access index
can be achieved.
The intrinsics also contain enough information to regenerate codes for IR layout.
For array and structure intrinsics, the proper GEP can be constructed.
For union intrinsics, replacing all uses of "addr" with "base" should be enough.
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 365435
A handful of C++ cases as reported in PR42352 didn't actually give an
error when always_inlining with a different target feature list. This
resulted in broken IR.
llvm-svn: 364109
Summary:
Add support for the C++2a [[no_unique_address]] attribute for targets using the Itanium C++ ABI.
This depends on D63371.
Reviewers: rjmccall, aaron.ballman
Subscribers: dschuff, aheejin, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D63451
llvm-svn: 363976
Summary:
This patch implements the source location builtins `__builtin_LINE(), `__builtin_FUNCTION()`, `__builtin_FILE()` and `__builtin_COLUMN()`. These builtins are needed to implement [`std::experimental::source_location`](https://rawgit.com/cplusplus/fundamentals-ts/v2/main.html#reflection.src_loc.creation).
With the exception of `__builtin_COLUMN`, GCC also implements these builtins, and Clangs behavior is intended to match as closely as possible.
Reviewers: rsmith, joerg, aaron.ballman, bogner, majnemer, shafik, martong
Reviewed By: rsmith
Subscribers: rnkovacs, loskutov, riccibruno, mgorny, kunitoki, alexr, majnemer, hfinkel, cfe-commits
Differential Revision: https://reviews.llvm.org/D37035
llvm-svn: 360937
Improved classification of address space cast when qualification
conversion is performed - prevent adding addr space cast for
non-pointer and non-reference types. Take address space correctly
from the pointee.
Also pass correct address space from 'this' object using
AggValueSlot when generating addrspacecast in the constructor
call.
Differential Revision: https://reviews.llvm.org/D59988
llvm-svn: 357682
Future versions of MSVC make these intrinsics available on x86 & x64,
according to:
http://lists.llvm.org/pipermail/cfe-dev/2019-March/061711.html
The purpose of these builtins is to emit plain, non-atomic, volatile
stores when /volatile:ms (-cc1 -fms-volatile) is enabled.
llvm-svn: 357220
This reverts commit r353765. After talking with our c stdlib folks, we decided
to use the existing pass_object_size attribute to implement _FORTIFY_SOURCE
wrappers, like Bionic does (I didn't realize that pass_object_size could be used
for this purpose). Sorry for the flip/flop, and thanks to James Y. Knight for
pointing this out to me.
llvm-svn: 356103
This provides a code size win on the caller side, since the init
message send is done in the runtime function.
rdar://44987038
Differential revision: https://reviews.llvm.org/D57936
llvm-svn: 354056
This attribute applies to declarations of C stdlib functions
(sprintf, memcpy...) that have known fortified variants
(__sprintf_chk, __memcpy_chk, ...). When applied, clang will emit
calls to the fortified variant functions instead of calls to the
defaults.
In GCC, this is done by adding gnu_inline-style wrapper functions,
but that doesn't work for us for variadic functions because we don't
support __builtin_va_arg_pack (and have no intention to).
This attribute takes two arguments, the first is 'type' argument
passed through to __builtin_object_size, and the second is a flag
argument that gets passed through to the variadic checking variants.
rdar://47905754
Differential revision: https://reviews.llvm.org/D57918
llvm-svn: 353765
Emit{Nounwind,}RuntimeCall{,OrInvoke} have been modified to take a
FunctionCallee as an argument, and CreateRuntimeFunction has been
modified to return a FunctionCallee. All callers have been updated.
Additionally, CreateBuiltinFunction is removed, as it was redundant
with CreateRuntimeFunction after some previous changes.
Differential Revision: https://reviews.llvm.org/D57668
llvm-svn: 353184
Summary:
UBSan wants to detect when unreachable code is actually reached, so it
adds instrumentation before every unreachable instruction. However, the
optimizer will remove code after calls to functions marked with
noreturn. To avoid this UBSan removes noreturn from both the call
instruction as well as from the function itself. Unfortunately, ASan
relies on this annotation to unpoison the stack by inserting calls to
_asan_handle_no_return before noreturn functions. This is important for
functions that do not return but access the the stack memory, e.g.,
unwinder functions *like* longjmp (longjmp itself is actually
"double-proofed" via its interceptor). The result is that when ASan and
UBSan are combined, the noreturn attributes are missing and ASan cannot
unpoison the stack, so it has false positives when stack unwinding is
used.
Changes:
Clang-CodeGen now directly insert calls to `__asan_handle_no_return`
when a call to a noreturn function is encountered and both
UBsan-unreachable and ASan are enabled. This allows UBSan to continue
removing the noreturn attribute from functions without any changes to
the ASan pass.
Previously generated code:
```
call void @longjmp
call void @__asan_handle_no_return
call void @__ubsan_handle_builtin_unreachable
```
Generated code (for now):
```
call void @__asan_handle_no_return
call void @longjmp
call void @__asan_handle_no_return
call void @__ubsan_handle_builtin_unreachable
```
rdar://problem/40723397
Reviewers: delcypher, eugenis, vsk
Differential Revision: https://reviews.llvm.org/D57278
> llvm-svn: 352690
llvm-svn: 352829
Summary:
UBSan wants to detect when unreachable code is actually reached, so it
adds instrumentation before every unreachable instruction. However, the
optimizer will remove code after calls to functions marked with
noreturn. To avoid this UBSan removes noreturn from both the call
instruction as well as from the function itself. Unfortunately, ASan
relies on this annotation to unpoison the stack by inserting calls to
_asan_handle_no_return before noreturn functions. This is important for
functions that do not return but access the the stack memory, e.g.,
unwinder functions *like* longjmp (longjmp itself is actually
"double-proofed" via its interceptor). The result is that when ASan and
UBSan are combined, the noreturn attributes are missing and ASan cannot
unpoison the stack, so it has false positives when stack unwinding is
used.
Changes:
Clang-CodeGen now directly insert calls to `__asan_handle_no_return`
when a call to a noreturn function is encountered and both
UBsan-unreachable and ASan are enabled. This allows UBSan to continue
removing the noreturn attribute from functions without any changes to
the ASan pass.
Previously generated code:
```
call void @longjmp
call void @__asan_handle_no_return
call void @__ubsan_handle_builtin_unreachable
```
Generated code (for now):
```
call void @__asan_handle_no_return
call void @longjmp
call void @__asan_handle_no_return
call void @__ubsan_handle_builtin_unreachable
```
rdar://problem/40723397
Reviewers: delcypher, eugenis, vsk
Differential Revision: https://reviews.llvm.org/D57278
llvm-svn: 352690
This builtin has the same UI as __builtin_object_size, but has the
potential to be evaluated dynamically. It is meant to be used as a
drop-in replacement for libraries that use __builtin_object_size when
a dynamic checking mode is enabled. For instance,
__builtin_object_size fails to provide any extra checking in the
following function:
void f(size_t alloc) {
char* p = malloc(alloc);
strcpy(p, "foobar"); // expands to __builtin___strcpy_chk(p, "foobar", __builtin_object_size(p, 0))
}
This is an overflow if alloc < 7, but because LLVM can't fold the
object size intrinsic statically, it folds __builtin_object_size to
-1. With __builtin_dynamic_object_size, alloc is passed through to
__builtin___strcpy_chk.
rdar://32212419
Differential revision: https://reviews.llvm.org/D56760
llvm-svn: 352665
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Lambda captures should be destroyed if an exception is thrown only if
the construction of the complete lambda-expression has not completed.
(If the lambda-expression has been fully constructed, any exception will
invoke its destructor, which will destroy the captures.)
This is directly modeled after how we handle the equivalent situation in
InitListExprs.
Note that EmitLambdaLValue was unreachable because in C++11 onwards the
frontend never creates the awkward situation where a prvalue expression
(such as a lambda) is used in an lvalue context (such as the left-hand
side of a class member access).
llvm-svn: 351487
Summary:
UB isn't nice. It's cool and powerful, but not nice.
Having a way to detect it is nice though.
[[ https://wg21.link/p1007r3 | P1007R3: std::assume_aligned ]] / http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1007r2.pdf says:
```
We propose to add this functionality via a library function instead of a core language attribute.
...
If the pointer passed in is not aligned to at least N bytes, calling assume_aligned results in undefined behaviour.
```
This differential teaches clang to sanitize all the various variants of this assume-aligned attribute.
Requires D54588 for LLVM IRBuilder changes.
The compiler-rt part is D54590.
This is a second commit, the original one was r351105,
which was mass-reverted in r351159 because 2 compiler-rt tests were failing.
Reviewers: ABataev, craig.topper, vsk, rsmith, rnk, #sanitizers, erichkeane, filcab, rjmccall
Reviewed By: rjmccall
Subscribers: chandlerc, ldionne, EricWF, mclow.lists, cfe-commits, bkramer
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D54589
llvm-svn: 351177
Summary:
UB isn't nice. It's cool and powerful, but not nice.
Having a way to detect it is nice though.
[[ https://wg21.link/p1007r3 | P1007R3: std::assume_aligned ]] / http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1007r2.pdf says:
```
We propose to add this functionality via a library function instead of a core language attribute.
...
If the pointer passed in is not aligned to at least N bytes, calling assume_aligned results in undefined behaviour.
```
This differential teaches clang to sanitize all the various variants of this assume-aligned attribute.
Requires D54588 for LLVM IRBuilder changes.
The compiler-rt part is D54590.
Reviewers: ABataev, craig.topper, vsk, rsmith, rnk, #sanitizers, erichkeane, filcab, rjmccall
Reviewed By: rjmccall
Subscribers: chandlerc, ldionne, EricWF, mclow.lists, cfe-commits, bkramer
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D54589
llvm-svn: 351105
It is faster to directly call the ObjC runtime for methods such as retain/release instead of sending a message to those functions.
Differential Revision: https://reviews.llvm.org/D55869
Reviewed By: rjmccall
llvm-svn: 349952
It is faster to directly call the ObjC runtime for methods such as alloc/allocWithZone instead of sending a message to those functions.
This patch adds support for converting messages to alloc/allocWithZone to their equivalent runtime calls.
Tests included for the positive case of applying this transformation, negative tests that we ensure we only convert "alloc" to objc_alloc, not "alloc2", and also a driver test to ensure we enable this only for supported runtime versions.
Reviewed By: rjmccall
https://reviews.llvm.org/D55349
llvm-svn: 348687
As suggested by Richard Smith, and initially put up for review here:
https://reviews.llvm.org/D53341, this patch removes a hack that was used
to ensure that proper target-feature lists were used when emitting
cpu-dispatch (and eventually, target-clones) implementations. As a part
of this, the GlobalDecl object is proliferated to a bunch more
locations.
Originally, this was put up for review (see above) to get acceptance on
the approach, though discussion with Richard in San Diego showed he
approved of the approach taken here. Thus, I believe this is acceptable
for Review-After-commit
Differential Revision: https://reviews.llvm.org/D53341
Change-Id: I0a0bd673340d334d93feac789d653e03d9f6b1d5
llvm-svn: 346757
The artificial variable describing the array size is supposed to be
called "__vla_expr", but this was implemented by retrieving the name
of the associated alloca, which isn't a reliable source for the name,
since nonassert compilers may drop names from LLVM IR.
rdar://problem/45924808
llvm-svn: 346542
The goal is to use `emitConstant` in more places. Didn't move
`ComplexExprEmitter::emitConstant` because it returns a different type.
Reviewers: rjmccall, ahatanak
Reviewed By: rjmccall
Subscribers: dexonsmith, erik.pilkington, cfe-commits
Differential Revision: https://reviews.llvm.org/D53725
llvm-svn: 345897
__tls_guard.
__tls_guard can only ever transition from 0 to 1, and only once. This
permits LLVM to remove repeated checks for TLS initialization and
repeated initialization code in cases like:
int g();
thread_local int n = g();
int a = n + n;
where we could not prove that __tls_guard was still 'true' when checking
it for the second reference to 'n' in the initializer of 'a'.
llvm-svn: 345774
A ConstantExpr class represents a full expression that's in a context where a
constant expression is required. This class reflects the path the evaluator
took to reach the expression rather than the syntactic context in which the
expression occurs.
In the future, the class will be expanded to cache the result of the evaluated
expression so that it's not needlessly re-evaluated
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D53475
llvm-svn: 345692
Similar to how ICC handles CPU-Dispatch on Windows, this patch uses the
resolver function directly to forward the call to the proper function.
This is not nearly as efficient as IFuncs of course, but is still quite
useful for large functions specifically developed for certain
processors.
This is unfortunately still limited to x86, since it depends on
__builtin_cpu_supports and __builtin_cpu_is, which are x86 builtins.
The naming for the resolver/forwarding function for cpu-dispatch was
taken from ICC's implementation, which uses the unmodified name for this
(no mangling additions). This is possible, since cpu-dispatch uses '.A'
for the 'default' version.
In 'target' multiversioning, this function keeps the '.resolver'
extension in order to keep the default function keeping the default
mangling.
Change-Id: I4731555a39be26c7ad59a2d8fda6fa1a50f73284
Differential Revision: https://reviews.llvm.org/D53586
llvm-svn: 345298
These declarations somehow survived a cleanup that combined them with the target
multiversioning functions. This patch removes them as they are no
longer necessary or used.
Change-Id: I318286401ace63bef1aa48018dabb25be0117ca0
llvm-svn: 345145
libgcc supports more than 32 features by adding a new 32-bit variable __cpu_features2.
This adds the clang support for checking these feature bits.
Patches for compiler-rt and llvm to support this are coming as well.
Probably still need an additional patch for target multiversioning in clang.
Differential Revision: https://reviews.llvm.org/D53458
llvm-svn: 344832
from those that aren't.
This patch changes the way __block variables that aren't captured by
escaping blocks are handled:
- Since non-escaping blocks on the stack never get copied to the heap
(see https://reviews.llvm.org/D49303), Sema shouldn't error out when
the type of a non-escaping __block variable doesn't have an accessible
copy constructor.
- IRGen doesn't have to use the specialized byref structure (see
https://clang.llvm.org/docs/Block-ABI-Apple.html#id8) for a
non-escaping __block variable anymore. Instead IRGen can emit the
variable as a normal variable and copy the reference to the block
literal. Byref copy/dispose helpers aren't needed either.
This reapplies r343518 after fixing a use-after-free bug in function
Sema::ActOnBlockStmtExpr where the BlockScopeInfo was dereferenced after
it was popped and deleted.
rdar://problem/39352313
Differential Revision: https://reviews.llvm.org/D51564
llvm-svn: 343542
from those that aren't.
This patch changes the way __block variables that aren't captured by
escaping blocks are handled:
- Since non-escaping blocks on the stack never get copied to the heap
(see https://reviews.llvm.org/D49303), Sema shouldn't error out when
the type of a non-escaping __block variable doesn't have an accessible
copy constructor.
- IRGen doesn't have to use the specialized byref structure (see
https://clang.llvm.org/docs/Block-ABI-Apple.html#id8) for a
non-escaping __block variable anymore. Instead IRGen can emit the
variable as a normal variable and copy the reference to the block
literal. Byref copy/dispose helpers aren't needed either.
This reapplies r341754, which was reverted in r341757 because it broke a
couple of bots. r341754 was calling markEscapingByrefs after the call to
PopFunctionScopeInfo, which caused the popped function scope to be
cleared out when the following code was compiled, for example:
$ cat test.m
struct A {
id data[10];
};
void foo() {
__block A v;
^{ (void)v; };
}
This commit calls markEscapingByrefs before calling PopFunctionScopeInfo
to prevent that from happening.
rdar://problem/39352313
Differential Revision: https://reviews.llvm.org/D51564
llvm-svn: 343518
Summary:
Some lines have a hit counter where they should not have one.
Cleanup stuff is located to the last line of the body which is most of the time a '}'.
And Exception stuff is added at the beginning of a function and at the end (represented by '{' and '}').
So in such cases, the DebugLoc used in GCOVProfiling.cpp must be marked as not covered.
This patch is a followup of https://reviews.llvm.org/D49915.
Tests in projects/compiler_rt are fixed by: https://reviews.llvm.org/D49917
Reviewers: marco-c, davidxl
Reviewed By: marco-c
Subscribers: dblaikie, cfe-commits, sylvestre.ledru
Differential Revision: https://reviews.llvm.org/D49916
llvm-svn: 342717
Previously, both types (plus the future target-clones) of
multiversioning had a separate ResolverOption structure and emission
function. This patch combines the two, at the expense of a slightly
more expensive sorting function.
llvm-svn: 342152
from those that aren't.
This patch changes the way __block variables that aren't captured by
escaping blocks are handled:
- Since non-escaping blocks on the stack never get copied to the heap
(see https://reviews.llvm.org/D49303), Sema shouldn't error out when
the type of a non-escaping __block variable doesn't have an accessible
copy constructor.
- IRGen doesn't have to use the specialized byref structure (see
https://clang.llvm.org/docs/Block-ABI-Apple.html#id8) for a
non-escaping __block variable anymore. Instead IRGen can emit the
variable as a normal variable and copy the reference to the block
literal. Byref copy/dispose helpers aren't needed either.
rdar://problem/39352313
Differential Revision: https://reviews.llvm.org/D51564
llvm-svn: 341754
This is a partial retry of rL340137 (reverted at rL340138 because of gcc host compiler crashing)
with 1 change:
Remove the changes to make microsoft builtins also use the LLVM intrinsics.
This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang
(when both halves of a funnel shift are the same value, it's a rotate).
We're free to name these as we want because we're not copying gcc, but if there's some other
existing art (eg, the microsoft ops) that we want to replicate, we can change the names.
The funnel shift intrinsics were added here:
https://reviews.llvm.org/D49242
With improved codegen in:
https://reviews.llvm.org/rL337966https://reviews.llvm.org/rL339359
And basic IR optimization added in:
https://reviews.llvm.org/rL338218https://reviews.llvm.org/rL340022
...so these are expected to produce asm output that's equal or better to the multi-instruction
alternatives using primitive C/IR ops.
In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.
Differential Revision: https://reviews.llvm.org/D50924
llvm-svn: 340141
This is a retry of rL340135 (reverted at rL340136 because of gcc host compiler crashing)
with 2 changes:
1. Move the code into a helper to reduce code duplication (and hopefully work-around the crash).
2. The original commit had a formatting bug in the docs (missing an underscore).
Original commit message:
This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang
(when both halves of a funnel shift are the same value, it's a rotate).
We're free to name these as we want because we're not copying gcc, but if there's some other
existing art (eg, the microsoft ops that are modified in this patch) that we want to replicate,
we can change the names.
The funnel shift intrinsics were added here:
https://reviews.llvm.org/D49242
With improved codegen in:
https://reviews.llvm.org/rL337966https://reviews.llvm.org/rL339359
And basic IR optimization added in:
https://reviews.llvm.org/rL338218https://reviews.llvm.org/rL340022
...so these are expected to produce asm output that's equal or better to the multi-instruction
alternatives using primitive C/IR ops.
In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.
Differential Revision: https://reviews.llvm.org/D50924
llvm-svn: 340137
Clang generates copy and dispose helper functions for each block literal
on the stack. Often these functions are equivalent for different blocks.
This commit makes changes to merge equivalent copy and dispose helper
functions and reduce code size.
To enable merging equivalent copy/dispose functions, the captured object
infomation is encoded into the helper function name. This allows IRGen
to check whether an equivalent helper function has already been emitted
and reuse the function instead of generating a new helper function
whenever a block is defined. In addition, the helper functions are
marked as linkonce_odr to enable merging helper functions that have the
same name across translation units and marked as unnamed_addr to enable
the linker's deduplication pass to merge functions that have different
names but the same content.
rdar://problem/42640608
Differential Revision: https://reviews.llvm.org/D50152
llvm-svn: 339438
Summary:
Introduces funclet-based unwinding for Objective-C and fixes an issue
where global blocks can't have their isa pointers initialised on
Windows.
After discussion with Dustin, this changes the name mangling of
Objective-C types to prevent a C++ catch statement of type struct X*
from catching an Objective-C object of type X*.
Reviewers: rjmccall, DHowett-MSFT
Reviewed By: rjmccall, DHowett-MSFT
Subscribers: mgrang, mstorsjo, smeenai, cfe-commits
Differential Revision: https://reviews.llvm.org/D50144
llvm-svn: 339428
Summary:
C and C++ are interesting languages. They are statically typed, but weakly.
The implicit conversions are allowed. This is nice, allows to write code
while balancing between getting drowned in everything being convertible,
and nothing being convertible. As usual, this comes with a price:
```
unsigned char store = 0;
bool consume(unsigned int val);
void test(unsigned long val) {
if (consume(val)) {
// the 'val' is `unsigned long`, but `consume()` takes `unsigned int`.
// If their bit widths are different on this platform, the implicit
// truncation happens. And if that `unsigned long` had a value bigger
// than UINT_MAX, then you may or may not have a bug.
// Similarly, integer addition happens on `int`s, so `store` will
// be promoted to an `int`, the sum calculated (0+768=768),
// and the result demoted to `unsigned char`, and stored to `store`.
// In this case, the `store` will still be 0. Again, not always intended.
store = store + 768; // before addition, 'store' was promoted to int.
}
// But yes, sometimes this is intentional.
// You can either make the conversion explicit
(void)consume((unsigned int)val);
// or mask the value so no bits will be *implicitly* lost.
(void)consume((~((unsigned int)0)) & val);
}
```
Yes, there is a `-Wconversion`` diagnostic group, but first, it is kinda
noisy, since it warns on everything (unlike sanitizers, warning on an
actual issues), and second, there are cases where it does **not** warn.
So a Sanitizer is needed. I don't have any motivational numbers, but i know
i had this kind of problem 10-20 times, and it was never easy to track down.
The logic to detect whether an truncation has happened is pretty simple
if you think about it - https://godbolt.org/g/NEzXbb - basically, just
extend (using the new, not original!, signedness) the 'truncated' value
back to it's original width, and equality-compare it with the original value.
The most non-trivial thing here is the logic to detect whether this
`ImplicitCastExpr` AST node is **actually** an implicit conversion, //or//
part of an explicit cast. Because the explicit casts are modeled as an outer
`ExplicitCastExpr` with some `ImplicitCastExpr`'s as **direct** children.
https://godbolt.org/g/eE1GkJ
Nowadays, we can just use the new `part_of_explicit_cast` flag, which is set
on all the implicitly-added `ImplicitCastExpr`'s of an `ExplicitCastExpr`.
So if that flag is **not** set, then it is an actual implicit conversion.
As you may have noted, this isn't just named `-fsanitize=implicit-integer-truncation`.
There are potentially some more implicit conversions to be warned about.
Namely, implicit conversions that result in sign change; implicit conversion
between different floating point types, or between fp and an integer,
when again, that conversion is lossy.
One thing i know isn't handled is bitfields.
This is a clang part.
The compiler-rt part is D48959.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=21530 | PR21530 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=37552 | PR37552 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=35409 | PR35409 ]].
Partially fixes [[ https://bugs.llvm.org/show_bug.cgi?id=9821 | PR9821 ]].
Fixes https://github.com/google/sanitizers/issues/940. (other than sign-changing implicit conversions)
Reviewers: rjmccall, rsmith, samsonov, pcc, vsk, eugenis, efriedma, kcc, erichkeane
Reviewed By: rsmith, vsk, erichkeane
Subscribers: erichkeane, klimek, #sanitizers, aaron.ballman, RKSimon, dtzWill, filcab, danielaustin, ygribov, dvyukov, milianw, mclow.lists, cfe-commits, regehr
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D48958
llvm-svn: 338288
With this change compiler generates alignment checks for wider range
of types. Previously such checks were generated only for the record types
with non-trivial default constructor. So the types like:
struct alignas(32) S2 { int x; };
typedef __attribute__((ext_vector_type(2), aligned(32))) float float32x2_t;
did not get checks when allocated by 'new' expression.
This change also optimizes the checks generated for the arrays created
in 'new' expressions. Previously the check was generated for each
invocation of type constructor. Now the check is generated only once
for entire array.
Differential Revision: https://reviews.llvm.org/D49589
llvm-svn: 338199
When an exception is thrown in a block copy helper function, captured
objects that have previously been copied should be destructed or
released. Similarly, captured objects that are yet to be released should
be released when an exception is thrown in a dispose helper function.
rdar://problem/42410255
Differential Revision: https://reviews.llvm.org/D49718
llvm-svn: 338041
As documented here: https://software.intel.com/en-us/node/682969 and
https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning
is an ICC feature that provides for function multiversioning.
This feature is implemented with two attributes: First, cpu_specific,
which specifies the individual function versions. Second, cpu_dispatch,
which specifies the location of the resolver function and the list of
resolvable functions.
This is valuable since it provides a mechanism where the resolver's TU
can be specified in one location, and the individual implementions
each in their own translation units.
The goal of this patch is to be source-compatible with ICC, so this
implementation diverges from the ICC implementation in a few ways:
1- Linux x86/64 only: This implementation uses ifuncs in order to
properly dispatch functions. This is is a valuable performance benefit
over the ICC implementation. A future patch will be provided to enable
this feature on Windows, but it will obviously more closely fit ICC's
implementation.
2- CPU Identification functions: ICC uses a set of custom functions to identify
the feature list of the host processor. This patch uses the cpu_supports
functionality in order to better align with 'target' multiversioning.
1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function
marked cpu_dispatch be an empty definition. This patch supports that as well,
however declarations are also permitted, since the linker will solve the
issue of multiple emissions.
Differential Revision: https://reviews.llvm.org/D47474
llvm-svn: 337552
The member init list for the sole constructor for CodeGenFunction
has gotten out of hand, so this patch moves the non-parameter-dependent
initializations into the member value inits.
Note: This is what was intended to be committed in r336726
llvm-svn: 336729
The member init list for the sole constructor for CodeGenFunction
has gotten out of hand, so this patch moves the non-parameter-dependent
initializations into the member value inits.
llvm-svn: 336726
This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targetting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU, the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter.
Of course if the user uses explicit vector code in their source code we need to not split those operations. Especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible.
To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be reponsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future.
To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h. Either as an always_inline function containing calls to builtins(can be target specific or target independent) or vector extension code. Or as a macro wrapper around a taget specific builtin. I believe I've removed all cases where the macro was around a target independent builtin.
To support the always_inline function case this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute.
To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per builtin basis and avoid any impact to target independent builtins.
There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch.
Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue.
Differential Revision: https://reviews.llvm.org/D48617
llvm-svn: 336583
Similarly to CFI on virtual and indirect calls, this implementation
tries to use program type information to make the checks as precise
as possible. The basic way that it works is as follows, where `C`
is the name of the class being defined or the target of a call and
the function type is assumed to be `void()`.
For virtual calls:
- Attach type metadata to the addresses of function pointers in vtables
(not the functions themselves) of type `void (B::*)()` for each `B`
that is a recursive dynamic base class of `C`, including `C` itself.
This type metadata has an annotation that the type is for virtual
calls (to distinguish it from the non-virtual case).
- At the call site, check that the computed address of the function
pointer in the vtable has type `void (C::*)()`.
For non-virtual calls:
- Attach type metadata to each non-virtual member function whose address
can be taken with a member function pointer. The type of a function
in class `C` of type `void()` is each of the types `void (B::*)()`
where `B` is a most-base class of `C`. A most-base class of `C`
is defined as a recursive base class of `C`, including `C` itself,
that does not have any bases.
- At the call site, check that the function pointer has one of the types
`void (B::*)()` where `B` is a most-base class of `C`.
Differential Revision: https://reviews.llvm.org/D47567
llvm-svn: 335569
Summary:
Because wasm control flow needs to be structured, using WinEH
instructions to support wasm EH brings several benefits. This patch
makes wasm EH uses Windows EH instructions, with some changes:
1. Because wasm uses a single catch block to catch all C++ exceptions,
this merges all catch clauses into a single catchpad, within which we
test the EH selector as in Itanium EH.
2. Generates a call to `__clang_call_terminate` in case a cleanup
throws. Wasm does not have a runtime to handle this.
3. In case there is no catch-all clause, inserts a call to
`__cxa_rethrow` at the end of a catchpad in order to unwind to an
enclosing EH scope.
Reviewers: majnemer, dschuff
Subscribers: jfb, sbc100, jgravelle-google, sunfish, cfe-commits
Differential Revision: https://reviews.llvm.org/D44931
llvm-svn: 333703
These intrinsics are used by MSVC's header files on AArch64 Windows as
well as AArch32, so we should support them for both targets. I've
factored them out of CodeGenFunction::EmitARMBuiltinExpr into separate
functions that EmitAArch64BuiltinExpr can call as well.
Reviewers: javed.absar, mstorsjo
Reviewed By: mstorsjo
Subscribers: kristof.beyls, cfe-commits
Differential Revision: https://reviews.llvm.org/D47476
llvm-svn: 333513
Introduced CreateMemTempWithoutCast and CreateTemporaryAllocaWithoutCast to emit alloca
without casting to default addr space.
ActiveFlag is a temporary variable emitted for clean up. It is defined as AllocaInst* type and there is
a cast to AlllocaInst in SetActiveFlag. An alloca casted to generic pointer causes assertion in
SetActiveFlag.
Since there is only load/store of ActiveFlag, it is safe to use the original alloca, therefore use
CreateMemTempWithoutCast is called.
Differential Revision: https://reviews.llvm.org/D47099
llvm-svn: 332982
lifetime.start/end expects pointer argument in alloca address space.
However in C++ a temporary variable is in default address space.
This patch changes API CreateMemTemp and CreateTempAlloca to
get the original alloca instruction and pass it lifetime.start/end.
It only affects targets with non-zero alloca address space.
Differential Revision: https://reviews.llvm.org/D45900
llvm-svn: 332593
This is similar to the LLVM change https://reviews.llvm.org/D46290.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46320
llvm-svn: 331834
function if a function delegates to another function.
Fix a bug introduced in r328731, which caused a struct with ObjC __weak
fields that was passed to a function to be destructed twice, once in the
callee function and once in another function the callee function
delegates to. To prevent this, keep track of the callee-destructed
structs passed to a function and disable their cleanups at the point of
the call to the delegated function.
This reapplies r331016, which was reverted in r331019 because it caused
an assertion to fail in EmitDelegateCallArg on a windows bot. I made
changes to EmitDelegateCallArg so that it doesn't try to deactivate
cleanups for structs that have trivial destructors (cleanups for those
structs are never pushed to the cleanup stack in EmitParmDecl).
rdar://problem/39194693
Differential Revision: https://reviews.llvm.org/D45382
llvm-svn: 331020
function if a function delegates to another function.
Fix a bug introduced in r328731, which caused a struct with ObjC __weak
fields that was passed to a function to be destructed twice, once in the
callee function and once in another function the callee function
delegates to. To prevent this, keep track of the callee-destructed
structs passed to a function and disable their cleanups at the point of
the call to the delegated function.
rdar://problem/39194693
Differential Revision: https://reviews.llvm.org/D45382
llvm-svn: 331016
Summary:
A clang builtin for xray typed events. Differs from
__xray_customevent(...) by the presence of a type tag that is vended by
compiler-rt in typical usage. This allows xray handlers to expand logged
events with their type description and plugins to process traced events
based on type.
This change depends on D45633 for the intrinsic definition.
Reviewers: dberris, pelikan, rnk, eizan
Subscribers: cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D45716
llvm-svn: 330220
register destructor functions annotated with __attribute__((destructor))
using __cxa_atexit or atexit.
Register destructor functions annotated with __attribute__((destructor))
calling __cxa_atexit in a synthesized constructor function instead of
emitting references to the functions in a special section.
The primary reason for adding this option is that we are planning to
deprecate the __mod_term_funcs section on Darwin in the future. This
feature is enabled by default only on Darwin. Users who do not want this
can use command line option 'fno_register_global_dtors_with_atexit' to
disable it.
rdar://problem/33887655
Differential Revision: https://reviews.llvm.org/D45578
llvm-svn: 330199
Found via codespell -q 3 -I ../clang-whitelist.txt
Where whitelist consists of:
archtype
cas
classs
checkk
compres
definit
frome
iff
inteval
ith
lod
methode
nd
optin
ot
pres
statics
te
thru
Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few
files that have dubious fixes reverted.)
Differential revision: https://reviews.llvm.org/D44188
llvm-svn: 329399
the tail padding is not reused.
We track on the AggValueSlot (and through a couple of other
initialization actions) whether we're dealing with an object that might
share its tail padding with some other object, so that we can avoid
emitting stores into the tail padding if that's the case. We still
widen stores into tail padding when we can do so.
Differential Revision: https://reviews.llvm.org/D45306
llvm-svn: 329342
Summary:
The following class hierarchy requires that we be able to emit a
this-adjusting thunk for B::foo in C's vftable:
struct Incomplete;
struct A {
virtual A* foo(Incomplete p) = 0;
};
struct B : virtual A {
void foo(Incomplete p) override;
};
struct C : B { int c; };
This TU is valid, but lacks a definition of 'Incomplete', which makes it
hard to build a thunk for the final overrider, B::foo.
Before this change, Clang gives up attempting to emit the thunk, because
it assumes that if the parameter types are incomplete, it must be
emitting the thunk for optimization purposes. This is untrue for the MS
ABI, where the implementation of B::foo has no idea what thunks C's
vftable may require. Clang needs to emit the thunk without necessarily
having access to the complete prototype of foo.
This change makes Clang emit a musttail variadic call when it needs such
a thunk. I call these "unprototyped" thunks, because they only prototype
the "this" parameter, which must always come first in the MS C++ ABI.
These thunks work, but they create ugly LLVM IR. If the call to the
thunk is devirtualized, it will be a call to a bitcast of a function
pointer. Today, LLVM cannot inline through such a call, but I want to
address that soon, because we also use this pattern for virtual member
pointer thunks.
This change also implements an old FIXME in the code about reusing the
thunk's computed CGFunctionInfo as much as possible. Now we don't end up
computing the thunk's mangled name and arranging it's prototype up to
around three times.
Fixes PR25641
Reviewers: rjmccall, rsmith, hans
Subscribers: Prazek, cfe-commits
Differential Revision: https://reviews.llvm.org/D45112
llvm-svn: 329009
Summary:
Libc++'s default allocator uses `__builtin_operator_new` and `__builtin_operator_delete` in order to allow the calls to new/delete to be ellided. However, libc++ now needs to support over-aligned types in the default allocator. In order to support this without disabling the existing optimization Clang needs to support calling the aligned new overloads from the builtins.
See llvm.org/PR22634 for more information about the libc++ bug.
This patch changes `__builtin_operator_new`/`__builtin_operator_delete` to call any usual `operator new`/`operator delete` function. It does this by performing overload resolution with the arguments passed to the builtin to determine which allocation function to call. If the selected function is not a usual allocation function a diagnostic is issued.
One open issue is if the `align_val_t` overloads should be considered "usual" when `LangOpts::AlignedAllocation` is disabled.
In order to allow libc++ to detect this new behavior the value for `__has_builtin(__builtin_operator_new)` has been updated to `201802`.
Reviewers: rsmith, majnemer, aaron.ballman, erik.pilkington, bogner, ahatanak
Reviewed By: rsmith
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D43047
llvm-svn: 328134
source expressions when iterating over a PseudoObjectExpr's semantic
subexpression list.
Previously the loop in emitPseudoObjectExpr would emit the IR for each
OpaqueValueExpr that was in a PseudoObjectExpr's semantic-form
expression list and use the result when the OpaqueValueExpr later
appeared in other expressions. This caused an assertion failure when
AggExprEmitter tried to copy the result of an OpaqueValueExpr and the
copied type didn't have trivial copy/move constructors or assignment
operators.
This patch adds flag IsUnique to OpaqueValueExpr which indicates it is a
unique reference to its source expression (it is not used in multiple
places). The loop in emitPseudoObjectExpr ignores OpaqueValueExprs that
are unique and CodeGen visitors simply traverse the source expressions
of such OpaqueValueExprs.
rdar://problem/34363596
Differential Revision: https://reviews.llvm.org/D39562
llvm-svn: 327939
This patch uses the infrastructure added in r326307 for enabling
non-trivial fields to be declared in C structs to allow __weak fields in
C structs in ARC.
This recommits r327206, which was reverted because it caused
module-enabled builders to fail. I discovered that the
CXXRecordDecl::CanPassInRegisters flag wasn't being set correctly in
some cases after I moved it to RecordDecl.
Thanks to Eric Liu for helping me investigate the bug.
rdar://problem/33599681
https://reviews.llvm.org/D44095
llvm-svn: 327870
This patch uses the infrastructure added in r326307 for enabling
non-trivial fields to be declared in C structs to allow __weak fields in
C structs in ARC.
rdar://problem/33599681
Differential Revision: https://reviews.llvm.org/D44095
llvm-svn: 327206
The indirect function argument is in alloca address space in LLVM IR. However,
during Clang codegen for C++, the address space of indirect function argument
should match its address space in the source code, i.e., default addr space, even
for indirect argument. This is because destructor of the indirect argument may
be called in the caller function, and address of the indirect argument may be
taken, in either case the indirect function argument is expected to be in default
addr space, not the alloca address space.
Therefore, the indirect function argument should be mapped to the temp var
casted to default address space. The caller will cast it to alloca addr space
when passing it to the callee. In the callee, the argument is also casted to the
default address space and used.
CallArg is refactored to facilitate this fix.
Differential Revision: https://reviews.llvm.org/D34367
llvm-svn: 326946
We may emit incorrect lifetime info during codegen for loop counters in
OpenMP constructs because of automatic scope cleanup when we needed
temporarily locations for private loop counters.
llvm-svn: 326922
ARC mode.
Declaring __strong pointer fields in structs was not allowed in
Objective-C ARC until now because that would make the struct non-trivial
to default-initialize, copy/move, and destroy, which is not something C
was designed to do. This patch lifts that restriction.
Special functions for non-trivial C structs are synthesized that are
needed to default-initialize, copy/move, and destroy the structs and
manage the ownership of the objects the __strong pointer fields point
to. Non-trivial structs passed to functions are destructed in the callee
function.
rdar://problem/33599681
Differential Revision: https://reviews.llvm.org/D41228
llvm-svn: 326307
The following test case causes issue with codegen of __enqueue_block
void (^block)(void) = ^{ callee(id, out); };
enqueue_kernel(queue, 0, ndrange, block);
Clang first does codegen for block expression in the first line and deletes its block info.
Clang then tries to do codegen for the same block expression again for the second line,
and fails because the block info is gone.
The fix is to do normal codegen for both lines. Introduce an API to OpenCL runtime to
record llvm block invoke function and llvm block literal emitted for each AST block
expression, and use the recorded information for generating the wrapper kernel.
The EmitBlockLiteral APIs are cleaned up to minimize changes to the normal codegen
of blocks.
Another minor issue is that some clean up AST expression is generated for block
with captures, which can be stripped by IgnoreImplicit.
Differential Revision: https://reviews.llvm.org/D43240
llvm-svn: 325264
Summary:
Fixes PR36247, which is where WinEHPrepare replaces inline asm in
funclets with unreachable.
Make getBundlesForFunclet return by value to simplify some call sites.
Reviewers: smeenai, majnemer
Subscribers: eraman, cfe-commits
Differential Revision: https://reviews.llvm.org/D43033
llvm-svn: 324689
Summary:
Previously, Clang only emitted label names in assert builds.
However there is a CC1 option -discard-value-names that should have been used to control emission instead.
This patch removes the NDEBUG preprocessor block and instead allows LLVM to handle removing the names in accordance with the option.
Reviewers: erichkeane, aaron.ballman, majnemer
Reviewed By: aaron.ballman
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D42829
llvm-svn: 324127
Summary:
This patch enables debugging of C99 VLA types by generating more precise
LLVM Debug metadata, using the extended DISubrange 'count' field that
takes a DIVariable.
This should implement:
Bug 30553: Debug info generated for arrays is not what GDB expects (not as good as GCC's)
https://bugs.llvm.org/show_bug.cgi?id=30553
Reviewers: echristo, aprantl, dexonsmith, clayborg, pcc, kristof.beyls, dblaikie
Reviewed By: aprantl
Subscribers: jholewinski, schweitz, davide, fhahn, JDevlieghere, cfe-commits
Differential Revision: https://reviews.llvm.org/D41698
llvm-svn: 323952
simd`.
Added host codegen + codegen for devices with default codegen for
`#pragma omp target teams distribute parallel for simd` directive.
llvm-svn: 322515
This alignment can be less than 4 on certain embedded targets, which may
not even be able to deal with 4-byte alignment on the stack.
Patch by Jacob Young!
llvm-svn: 322406
getAssociatedStmt() returns the outermost captured statement for the
OpenMP directive. It may return incorrect region in case of combined
constructs. Reworked the code to reduce the number of calls of
getAssociatedStmt() and used getInnermostCapturedStmt() and
getCapturedStmt() functions instead.
In case of firstprivate variables it may lead to an extra allocas
generation for private copies even if the variable is passed by value
into outlined function and could be used directly as private copy.
llvm-svn: 322393
GCC's attribute 'target', in addition to being an optimization hint,
also allows function multiversioning. We currently have the former
implemented, this is the latter's implementation.
This works by enabling functions with the same name/signature to coexist,
so that they can all be emitted. Multiversion state is stored in the
FunctionDecl itself, and SemaDecl manages the definitions.
Note that it ends up having to permit redefinition of functions so
that they can all be emitted. Additionally, all versions of the function
must be emitted, so this also manages that.
Note that this includes some additional rules that GCC does not, since
defining something as a MultiVersion function after a usage has been made illegal.
The only 'history rewriting' that happens is if a function is emitted before
it has been converted to a multiversion'ed function, at which point its name
needs to be changed.
Function templates and virtual functions are NOT yet supported (not supported
in GCC either).
Additionally, constructors/destructors are disallowed, but the former is
planned.
llvm-svn: 322028
only.
Added support for -fopenmp-simd option that allows compilation of
simd-based constructs without emission of OpenMP runtime calls.
llvm-svn: 321560
...when such an operation is done on an object during con-/destruction.
This is the cfe part of a patch covering both cfe and compiler-rt.
Differential Revision: https://reviews.llvm.org/D40295
llvm-svn: 321519
Diagnose 'unreachable' UB when a noreturn function returns.
1. Insert a check at the end of functions marked noreturn.
2. A decl may be marked noreturn in the caller TU, but not marked in
the TU where it's defined. To diagnose this scenario, strip away the
noreturn attribute on the callee and insert check after calls to it.
Testing: check-clang, check-ubsan, check-ubsan-minimal, D40700
rdar://33660464
Differential Revision: https://reviews.llvm.org/D40698
llvm-svn: 321231
Summary:
The -fxray-always-emit-customevents flag instructs clang to always emit
the LLVM IR for calls to the `__xray_customevent(...)` built-in
function. The default behaviour currently respects whether the function
has an `[[clang::xray_never_instrument]]` attribute, and thus not lower
the appropriate IR code for the custom event built-in.
This change allows users calling through to the
`__xray_customevent(...)` built-in to always see those calls lowered to
the corresponding LLVM IR to lay down instrumentation points for these
custom event calls.
Using this flag enables us to emit even just the user-provided custom
events even while never instrumenting the start/end of the function
where they appear. This is useful in cases where "phase markers" using
__xray_customevent(...) can have very few instructions, must never be
instrumented when entered/exited.
Reviewers: rnk, dblaikie, kpw
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D40601
llvm-svn: 319388
This updates -mcount to use the new attribute names (LLVM r318195), and
switches over -finstrument-functions to also use these attributes rather
than inserting instrumentation in the frontend.
It also adds a new flag, -finstrument-functions-after-inlining, which
makes the cygprofile instrumentation get inserted after inlining rather
than before.
Differential Revision: https://reviews.llvm.org/D39331
llvm-svn: 318199
Summary:
We don't want to store cleanup dest slot saved into the coroutine frame (as some of the cleanup code may
access them after coroutine frame destroyed).
This is an alternative to https://reviews.llvm.org/D37093
It is possible to do this for all functions, but, cursory check showed that in -O0, we get slightly longer function (by 1-3 instructions), thus, we are only limiting cleanup.dest.slot elimination to coroutines.
Reviewers: rjmccall, hfinkel, eric_niebler
Reviewed By: eric_niebler
Subscribers: EricWF, cfe-commits
Differential Revision: https://reviews.llvm.org/D39768
llvm-svn: 317981
This patch fixes various places in clang to propagate may-alias
TBAA access descriptors during construction of lvalues, thus
eliminating the need for the LValueBaseInfo::MayAlias flag.
This is part of D38126 reworked to be a separate patch to
simplify review.
Differential Revision: https://reviews.llvm.org/D39008
llvm-svn: 316988
This patch addresses the rest of the cases where we pass lvalue
base info, but do not provide corresponding TBAA info.
This patch should not bring in any functional changes.
This is part of D38126 reworked to be a separate patch to make
reviewing easier.
Differential Revision: https://reviews.llvm.org/D38945
llvm-svn: 315986
In OpenCL the kernel function and non-kernel function has different calling conventions.
For certain targets they have different argument ABIs. Also kernels have special function
attributes and metadata for runtime to launch them.
The blocks passed to enqueue_kernel is supposed to be executed as kernels. As such,
the block invoke function should be emitted as kernel with proper calling convention and
argument ABI.
This patch emits enqueued block as kernel. If a block is both called directly and passed
to enqueue_kernel, separate functions will be generated.
Differential Revision: https://reviews.llvm.org/D38134
llvm-svn: 315804