This patch implements sema and parsing for the 'target parallel for simd' pragma.
Differential Revision: http://reviews.llvm.org/D22096
llvm-svn: 275365
-fxray-instrument: enables XRay annotation of IR
-fxray-instruction-threshold: configures the threshold for function size (measured in IR instructions), and allows LLVM to decide whether to add the nop sleds later on in the process.
Also implements the related xray_always_instrument and xray_never_instrument function attributes.
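For illustration, a rough usage sketch (the function names and the threshold value are placeholders):
// Always receives XRay sleds, regardless of its size.
__attribute__((xray_always_instrument)) void handle_request();
// Never receives sleds, even if it is large.
__attribute__((xray_never_instrument)) void tight_inner_loop();
// Build with, e.g.:
//   clang++ -fxray-instrument -fxray-instruction-threshold=200 service.cpp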
Patch by Dean Michael Berris.
llvm-svn: 275330
Summary: This patch is an implementation of sema and parsing for the OpenMP composite pragma 'distribute simd'.
Differential Revision: http://reviews.llvm.org/D22007
llvm-svn: 274604
Summary: This patch is an implementation of sema and parsing for the OpenMP composite pragma 'distribute parallel for simd'.
Differential Revision: http://reviews.llvm.org/D21977
llvm-svn: 274530
With all MaterializeTemporaryExprs coming with an ExprWithCleanups, it's
easy to add correct lifetime.end marks into the right RunCleanupsScope.
Differential Revision: http://reviews.llvm.org/D20499
llvm-svn: 274385
Replace inheriting constructors implementation with new approach, voted into
C++ last year as a DR against C++11.
Instead of synthesizing a set of derived class constructors for each inherited
base class constructor, we make the constructors of the base class visible to
constructor lookup in the derived class, using the normal rules for
using-declarations.
For constructors, UsingShadowDecl now has a ConstructorUsingShadowDecl derived
class that tracks the requisite additional information. We create shadow
constructors (not found by name lookup) in the derived class to model the
actual initialization, and have a new expression node,
CXXInheritedCtorInitExpr, to model the initialization of a base class from such
a constructor. (This initialization is special because it performs real perfect
forwarding of arguments.)
In cases where argument forwarding is not possible (for inalloca calls,
variadic calls, and calls with callee parameter cleanup), the shadow inheriting
constructor is not emitted and instead we directly emit the initialization code
into the caller of the inherited constructor.
Note that this new model is not perfectly compatible with the old model in some
corner cases. In particular:
* if B inherits a private constructor from A, and C uses that constructor to
construct a B, then we previously required that A befriends B and B
befriends C, but the new rules require A to befriend C directly, and
* if a derived class has its own constructors (and so its implicit default
constructor is suppressed), it may still inherit a default constructor from
a base class.
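For example, the second point means that code like the following sketch is now accepted:
struct A {
  A() = default;
  A(int);
};
struct B : A {
  using A::A;   // base constructors become visible in B via the normal
  B(void *);    // using-declaration rules; B's own constructor suppresses
};              // B's implicit default constructor
B b1(42);       // uses the inherited A(int)
B b2;           // now OK: the default constructor is inherited from A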
llvm-svn: 274049
[OpenMP] Initial implementation of parse and sema for composite pragma 'distribute parallel for'
This patch is an initial implementation of the composite pragma 'distribute parallel for'.
The main differences that affect other pragmas are:
The implementation of 'distribute parallel for' requires blocking of the associated loop, where blocks are "distributed" to different teams and iterations within each block are scheduled to parallel threads within each team. To implement blocking, sema creates two additional worksharing directive fields that are used to pass the team-assigned block lower and upper bounds through the outlined function resulting from 'parallel'. In this way, the scheduling of 'for' iterations to threads can use those bounds.
As a consequence of blocking, the stride of 'distribute' is not 1 but equal to the blocking size. This value is returned by the runtime, and sema prepares a DistIncrExpr variable to hold it.
As a consequence of blocking, the global upper bound (EnsureUpperBound) expression of the 'for' is not the original loop upper bound (e.g. in for(i = 0; i < N; i++) this is 'N') but the team-assigned block upper bound. Sema creates a new expression holding the calculation of the actual upper bound for 'for' as UB = min(UB, PrevUB), where UB is the loop upper bound and PrevUB is the team-assigned block upper bound.
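A rough usage sketch (the enclosing 'target teams' region and the array names are illustrative):
#pragma omp target teams
#pragma omp distribute parallel for
for (int i = 0; i < N; ++i)
  a[i] = b[i] + c[i];
// 'distribute' hands each team a block of iterations with team-assigned lower
// and upper bounds and advances with a stride equal to the block size
// (DistIncrExpr); the inner 'for' then schedules iterations to the team's
// threads using UB = min(UB, PrevUB).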
llvm-svn: 273884
http://reviews.llvm.org/D21564
This patch is an initial implementation of the composite pragma 'distribute parallel for'.
The main differences that affect other pragmas are:
The implementation of 'distribute parallel for' requires blocking of the associated loop, where blocks are "distributed" to different teams and iterations within each block are scheduled to parallel threads within each team. To implement blocking, sema creates two additional worksharing directive fields that are used to pass the team-assigned block lower and upper bounds through the outlined function resulting from 'parallel'. In this way, the scheduling of 'for' iterations to threads can use those bounds.
As a consequence of blocking, the stride of 'distribute' is not 1 but equal to the blocking size. This value is returned by the runtime, and sema prepares a DistIncrExpr variable to hold it.
As a consequence of blocking, the global upper bound (EnsureUpperBound) expression of the 'for' is not the original loop upper bound (e.g. in for(i = 0; i < N; i++) this is 'N') but the team-assigned block upper bound. Sema creates a new expression holding the calculation of the actual upper bound for 'for' as UB = min(UB, PrevUB), where UB is the loop upper bound and PrevUB is the team-assigned block upper bound.
llvm-svn: 273705
Summary:
This patch fixes an issue detected when firstprivate variables are passed to an OpenMP outlined function vararg list. Currently they are not compatible with what the runtime library expects, causing malfunctions on some targets.
This patch fixes the issue by moving the casting logic already in place for offloading to the common code that creates the outlined function and its arguments, and updates the regression tests accordingly.
Reviewers: hfinkel, arpith-jacob, carlo.bertolli, kkwli0, ABataev
Subscribers: cfe-commits, caomhin
Differential Revision: http://reviews.llvm.org/D21150
llvm-svn: 272900
Summary:
This patch is to add parsing and sema support for `target update` directive. Support for the `to` and `from` clauses will be added by a different patch. This patch also adds support for other clauses that are already implemented upstream and apply to `target update`, e.g. `device` and `if`.
This patch is based on the original post by Kelvin Li.
Reviewers: hfinkel, carlo.bertolli, kkwli0, arpith-jacob, ABataev
Subscribers: caomhin, cfe-commits
Differential Revision: http://reviews.llvm.org/D15944
llvm-svn: 270878
Underaligned atomic LValues require libcalls, which MSVC doesn't have.
MSVC doesn't seem to consider such operations as requiring a barrier
anyway.
This fixes PR27843.
llvm-svn: 270576
For better performance and to unify code with the offloading part, we pass
scalar firstprivate values by value instead of by reference. This removes
some extra copying operations.
llvm-svn: 269751
schedule modifiers.
The runtime library expects some additional data in the schedule argument for
loop-based directives that have the additional schedule modifiers
'monotonic|nonmonotonic'.
llvm-svn: 269035
Currently there is a problem with codegen of inlined directives inside
lambdas: it may cause a crash during codegen because of incorrect
capturing of variables. This patch fixes the problem.
llvm-svn: 267677
The taskloop construct specifies that the iterations of one or more associated loops will be executed in parallel using OpenMP tasks. The iterations are distributed across tasks created by the construct and scheduled to be executed.
The following code will be generated for the taskloop directive:
#pragma omp taskloop num_tasks(N) lastprivate(j)
for (i = 0; i < N*GRAIN*STRIDE-1; i += STRIDE) {
  int th = omp_get_thread_num();
  #pragma omp atomic
  counter++;
  #pragma omp atomic
  th_counter[th]++;
  j = i;
}
Generated code:
task = __kmpc_omp_task_alloc(NULL, gtid, 1, sizeof(struct task),
                             sizeof(struct shar), &task_entry);
psh = task->shareds;
psh->pth_counter = &th_counter;
psh->pcounter = &counter;
psh->pj = &j;
task->lb = 0;
task->ub = N*GRAIN*STRIDE-2;
task->st = STRIDE;
__kmpc_taskloop(
    NULL,                      // location
    gtid,                      // gtid
    task,                      // task structure
    1,                         // if clause value
    &task->lb,                 // lower bound
    &task->ub,                 // upper bound
    STRIDE,                    // loop increment
    0,                         // 1 if nogroup specified
    2,                         // schedule type: 0-none, 1-grainsize, 2-num_tasks
    N,                         // schedule value (ignored for type 0)
    (void*)&__task_dup_entry   // tasks duplication routine
);
llvm-svn: 267395
If the loop control variable of simd-based directives is explicitly marked
as linear/lastprivate in clauses, codegen for such constructs would
crash. This patch fixes the problem.
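A minimal sketch of the previously crashing pattern ('N', 'a', and 'b' stand for enclosing declarations):
int i;
#pragma omp simd linear(i)   // loop control variable explicitly listed
for (i = 0; i < N; ++i)
  a[i] = b[i];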
llvm-svn: 267101
Revert the two changes to thread CodeGenOptions into the TargetInfo allocation
and to fix the layering violation by moving CodeGenOptions into Basic.
Code Generation is arguably not particularly "basic". This addresses Richard's
post-commit review comments. This change purely does the mechanical revert and
will be followed up with an alternate approach to thread the desired information
into TargetInfo.
llvm-svn: 265806
This is a mechanical move of CodeGenOptions from libFrontend to libBasic. This
fixes the layering violation introduced earlier by threading CodeGenOptions into
TargetInfo. It should also fix the modules based self-hosting builds. NFC.
llvm-svn: 265702
Summary: See LLVM change D18775 for details, this change depends on it.
Reviewers: jyknight, reames
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D18776
llvm-svn: 265569
The solution unifies the interface of the RegionCodeGenTy type to allow
inserting runtime-specific code before/after the main codegen action defined
in CGStmtOpenMP.cpp. The runtime should not define its own RegionCodeGenTy
for general OpenMP directives, but must be allowed to insert its own
(required) code to support target-specific codegen.
llvm-svn: 264700
The solution unifies the interface of the RegionCodeGenTy type to allow
inserting runtime-specific code before/after the main codegen action defined
in CGStmtOpenMP.cpp. The runtime should not define its own RegionCodeGenTy
for general OpenMP directives, but must be allowed to insert its own
(required) code to support target-specific codegen.
llvm-svn: 264576
The solution unifies the interface of the RegionCodeGenTy type to allow
inserting runtime-specific code before/after the main codegen action defined
in CGStmtOpenMP.cpp. The runtime should not define its own RegionCodeGenTy
for general OpenMP directives, but must be allowed to insert its own
(required) code to support target-specific codegen.
llvm-svn: 264569
This reverts commit r263607.
This change caused more objc_retain/objc_release calls in the IR, but those
are then incorrectly optimized by the ARC optimizer. Work will have to be done
to ensure the ARC optimizer doesn't optimize user-written RR, and that work
should land before this change.
This change will also need to be updated to take into account any changes
required to ensure that user-written calls to RR are distinct from those
inserted by ARC.
llvm-svn: 263984
It is faster to directly call the ObjC runtime for methods such as retain/release instead of sending a message to those functions.
This patch adds support for converting messages to retain/release/alloc/autorelease to their equivalent runtime calls.
Tests are included for the positive case of applying this transformation, negative tests ensuring we only convert "alloc" to objc_alloc (not "alloc2"), and a driver test to ensure we enable this only for supported runtime versions.
Reviewed by John McCall.
Differential Revision: http://reviews.llvm.org/D14737
llvm-svn: 263607
OpenMP 4.5 allows privatization of non-static data members in OpenMP
constructs. This patch adds proper codegen support for data members in the
'linear' clause.
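A small sketch of the kind of construct this enables:
struct S {
  int m;
  void fill(int *a, int n) {
#pragma omp simd linear(m)
    for (int i = 0; i < n; ++i) {
      a[i] = m;
      ++m;   // 'm' is a non-static data member privatized as linear, step 1
    }
  }
};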
llvm-svn: 263003
This patch provides a basic implementation of codegen for the teams directive, excluding all clauses except dist_schedule. It also fixes parts of the AST reader/writer to enable correct pre-compiled header handling.
http://reviews.llvm.org/D17170
llvm-svn: 262832
This patch provides a basic implementation of codegen for the teams directive, excluding all clauses except dist_schedule. It also fixes parts of the AST reader/writer to enable correct pre-compiled header handling.
http://reviews.llvm.org/D17170
llvm-svn: 262741
We'd lose track of the parent CodeGenFunction, leading us to get
confused with regard to which function a nested finally belonged to.
Differential Revision: http://reviews.llvm.org/D17752
llvm-svn: 262379
This patch introduces the -fwhole-program-vtables flag, which enables the
whole-program vtable optimization feature (D16795) in Clang.
Differential Revision: http://reviews.llvm.org/D16821
llvm-svn: 261767
Expressions inside the 'schedule'|'dist_schedule' clauses must be captured in
combined directives to avoid a possible crash during codegen. This patch
improves the handling of such constructs.
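For example ('chunk', 'n', 'nteams', 'a', and 'work' are placeholders), the chunk expressions here must be captured so that each constituent directive of the combined construct can still evaluate them:
int chunk = n / nteams;
#pragma omp target teams distribute parallel for \
    dist_schedule(static, chunk * 4) schedule(dynamic, chunk)
for (int i = 0; i < n; ++i)
  a[i] = work(i);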
llvm-svn: 260954
This patch changes the cc1 option -fprofile-instr-generate to an enum option
-fprofile-instrument={clang|none}. It also changes the cc1 option
-fprofile-instr-generate= to -fprofile-instrument-path=.
The driver-level options -fprofile-instr-generate and -fprofile-instr-generate=
remain intact. This change paves the way to integrating new PGO
instrumentation at the IR level.
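Roughly, the new cc1 spelling looks like this (file names are placeholders):
clang -cc1 -fprofile-instrument=clang -fprofile-instrument-path=code.profraw foo.c
clang -fprofile-instr-generate foo.c    # driver-level spelling, unchanged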
Review: http://reviews.llvm.org/D16730
llvm-svn: 259811
Codegen for array sections/array subscripts worked only for expressions with arrays as the base. This patch fixes codegen for bases with pointer/reference types.
llvm-svn: 259776
Summary:
This patch adds parsing + sema for the target parallel for directive along with testcases.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D16759
llvm-svn: 259654
reclaiming a call result in order to ignore it or assign it
to an __unsafe_unretained variable. This avoids adding
an unwanted retain/release pair when the return value is
not actually returned autoreleased (e.g. when it is returned
from a nonatomic getter or a typical collection accessor).
This runtime function is only available on the latest Apple
OS releases; the backwards-compatibility story is that you
don't get the optimization unless your deployment target is
recent enough. Sorry.
rdar://20530049
llvm-svn: 258962
Summary:
This patch adds parsing + sema for the target parallel directive and its clauses along with testcases.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D16553
Rebased to current trunk and updated test cases.
llvm-svn: 258832
* Runtime diagnostic data for cfi-icall changed to match the rest of the
cfi checks.
* Layout of all CFI diagnostic data changed to put Kind at the
beginning. There is no ABI stability promise yet.
* Call cfi_slowpath_diag instead of cfi_slowpath when needed.
* Emit the __cfi_check_fail function, which dispatches a CFI check
failure according to the trap/recover settings of the current module.
* A tiny driver change to match the way the new handlers are done in
compiler-rt.
llvm-svn: 258745
This is part of a new statistics gathering feature for the sanitizers.
See clang/docs/SanitizerStats.rst for further info and docs.
Differential Revision: http://reviews.llvm.org/D16175
llvm-svn: 257971
Clang-side cross-DSO CFI.
* Adds a command line flag -f[no-]sanitize-cfi-cross-dso.
* Links a runtime library when enabled.
* Emits __cfi_slowpath calls if the bitset test fails.
* Emits extra hash-based bitsets for external CFI checks.
* Sets a module flag to enable __cfi_check generation during LTO.
This mode does not yet support diagnostics.
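A rough driver invocation sketch (file and library names are placeholders):
clang++ -flto -fvisibility=hidden -fsanitize=cfi -fsanitize-cfi-cross-dso -shared -o libfoo.so foo.cpp
clang++ -flto -fvisibility=hidden -fsanitize=cfi -fsanitize-cfi-cross-dso -o main main.cpp -L. -lfoo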
llvm-svn: 255694
`pass_object_size` is our way of enabling `__builtin_object_size` to
produce high quality results without requiring inlining to happen
everywhere.
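A small usage sketch ('fill' and the buffer are illustrative):
// The compiler evaluates __builtin_object_size(buf, 0) at each call site and
// passes the result as an implicit argument, so the callee can see a precise
// size without being inlined.
void fill(char *buf __attribute__((pass_object_size(0))), char value);

void demo() {
  char storage[32];
  fill(storage, 0);   // the implicit size argument here is 32
}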
A link to the design doc for this attribute is available at the
Differential review link below.
Differential Revision: http://reviews.llvm.org/D13263
llvm-svn: 254554
Summary:
This patch implements the OpenMP 4.5 specification for implicit data maps. The 4.5 specification changes the default way data is captured into a target region: all non-aggregate kinds are passed by value by default. This required activating capturing by value during sema for the target region. All non-aggregate values that can be encoded in the size of a pointer are properly cast and forwarded to the runtime library. On top of fixing the previously weird behavior for mapping pointers in nested data regions (an explicit map was always required), this also improves performance, as the number of allocations/transactions to the device per non-aggregate map is reduced from two to only one - instead of passing a reference and the value, only the value is passed.
Explicit maps will be added later on once firstprivate, private, and map clauses' SEMA and parsing are available.
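A sketch of the effect on a target region ('scale' is illustrative):
void scale(int *a, int n, int factor) {
#pragma omp target
  for (int i = 0; i < n; ++i)
    a[i] *= factor;
  // 'n' and 'factor' are non-aggregate scalars: they are now captured by
  // value and forwarded to the runtime as single pointer-sized arguments,
  // with no separate device allocation or copy-back per scalar.
}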
Reviewers: hfinkel, rjmccall, ABataev
Subscribers: cfe-commits, carlo.bertolli
Differential Revision: http://reviews.llvm.org/D14940
llvm-svn: 254521
This patch changes the generation of CGFunctionInfo to contain
the FunctionProtoType if it is available. This enables the code
generation for call instructions to look into this type for
exception information and therefore generate better quality
IR - it will not create invoke instructions for functions that
are known not to throw.
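A sketch of the effect:
void noexcept_callee() noexcept;
void throwing_callee();

void caller() {
  try {
    noexcept_callee();   // plain 'call': the FunctionProtoType says it
                         // cannot throw, so no landing pad is needed here
    throwing_callee();   // still emitted as an 'invoke'
  } catch (...) {
  }
}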
llvm-svn: 253926
When a struct's size is not a power of 2, the corresponding _Atomic() type is
promoted to the nearest power of 2. We already correctly handled normal C++ expressions of
this form, but direct calls to the __c11_atomic_whatever builtins ended up
performing dodgy operations on the smaller non-atomic types (e.g. memcpy too
much). Later optimisations removed this as undefined behaviour.
This patch converts EmitAtomicExpr to allocate its temporaries at the full
atomic width, sidestepping the issue.
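A sketch of the affected pattern (C11-style; the 3-byte struct is illustrative):
struct S3 { char bytes[3]; };          // size 3, not a power of 2
_Atomic struct S3 shared_obj;

struct S3 read_it(void) {
  // The atomic operation happens at the promoted width (4 bytes); the
  // temporaries used by EmitAtomicExpr are now also allocated at that full
  // width, so no over-wide copy of the 3-byte non-atomic type occurs.
  return __c11_atomic_load(&shared_obj, __ATOMIC_SEQ_CST);
}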
llvm-svn: 252507
This works around PR25162. The MSVC tables make it very difficult to
correctly inline a C++ destructor that contains try / catch. We've
attempted to address PR25162 in LLVM's backend, but it feels pretty
infeasible. MSVC and ICC both appear to avoid inlining such complex
destructors.
Long term, we want to fix this by making the inliner smart enough to
know when it is inlining into a cleanup, so it can inline simple
destructors (~unique_ptr and ~vector) while avoiding destructors
containing try / catch.
llvm-svn: 251576
match the feature set of the function that they're being called from.
This ensures that we can effectively diagnose some[1] code that would
instead ICE in the backend with a failure to select message.
Example:
__m128d foo(__m128d a, __m128d b) {
  return __builtin_ia32_addsubps(b, a);
}
compiled for normal x86_64 via:
clang -target x86_64-linux-gnu -c
would fail to compile in the back end because the normal subtarget
features for x86_64 only include sse2 and the builtin requires sse3.
[1] We're still not erroring on:
__m128i bar(__m128i const *p) { return _mm_lddqu_si128(p); }
where we should fail and error on an always_inline function being
inlined into a function that doesn't support the subtarget features
required.
llvm-svn: 250473
This patch implements the outlining for offloading functions for code
annotated with the OpenMP target directive. It uses a temporary naming
of the outlined functions that will have to be updated later on once
target side codegen and registration of offloading libraries is
implemented - the naming needs to be made unique in the produced
library.
llvm-svn: 249148
Summary:
This change adds support for `__builtin_ms_va_list`, a GCC extension for
variadic `ms_abi` functions. The existing `__builtin_va_list` support is
inadequate for this because `va_list` is defined differently in the Win64
ABI vs. the System V/AMD64 ABI.
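A short usage sketch ('sum_ms' is illustrative):
int __attribute__((ms_abi)) sum_ms(int count, ...) {
  __builtin_ms_va_list args;            // Win64-style va_list, usable even
  __builtin_ms_va_start(args, count);   // when the default ABI is System V
  int total = 0;
  for (int i = 0; i < count; ++i)
    total += __builtin_ms_va_arg(args, int);
  __builtin_ms_va_end(args);
  return total;
}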
Depends on D1622.
Reviewers: rsmith, rnk, rjmccall
CC: cfe-commits
Differential Revision: http://reviews.llvm.org/D1623
llvm-svn: 247941
It seems that there is a small bug, and we can't generate assume loads
when some virtual functions have internal visibility.
This reverts commit 982bb7d966947812d216489b3c519c9825cacbf2.
llvm-svn: 247332
Currently all variables used in OpenMP regions are captured into a record and passed to outlined functions in this record. This may result in poor performance because of the overly complex analysis later in optimization passes. This patch makes parallel-based regions emit outlined functions with a list of captured variables instead. It saves at least 2*n GEPs, stores, and loads.
Codegen for task-based regions remains unchanged because runtime requires that all captured variables are passed in captured record.
llvm-svn: 247251
Generate a call to assume(icmp %vtable, %global_vtable) after constructor
calls for devirtualization purposes.
For more info go to:
http://lists.llvm.org/pipermail/cfe-dev/2015-July/044227.html
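In source terms, a rough sketch of where the assume helps (names are illustrative):
struct Base {
  virtual void run();
};

void call_run(Base *p) { p->run(); }

void use() {
  Base b;          // right after this constructor call, IRGen emits
                   // llvm.assume(icmp eq(load of b's vptr, Base's global vtable))
  call_run(&b);    // so that, once call_run is inlined, later passes can
                   // devirtualize the p->run() call
}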
Edit:
Fixed version because of PR24479.
Reapplied after the earlier revert caused by a ScalarEvolution bug (D12719).
Merged after John McCall's big patch (Added Address).
http://reviews.llvm.org/D11859
llvm-svn: 247199
Summary:
Currently clang provides no general way to generate nontemporal loads/stores.
There are some architecture specific builtins for doing so (e.g. in x86), but
there is no way to generate a non-temporal store on, e.g. AArch64. This patch adds
generic builtins which are expanded to a simple store with '!nontemporal'
attribute in IR.
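A short sketch ('stream_copy' is illustrative; the load form shown alongside the store is assumed to be part of the same builtin family):
void stream_copy(float *dst, const float *src, int n) {
  for (int i = 0; i < n; ++i) {
    float v = __builtin_nontemporal_load(&src[i]);  // load with !nontemporal
    __builtin_nontemporal_store(v, &dst[i]);        // store with !nontemporal
  }
}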
Differential Revision: http://reviews.llvm.org/D12313
llvm-svn: 247104
Introduce an Address type to bundle a pointer value with an
alignment. Introduce APIs on CGBuilderTy to work with Address
values. Change core APIs on CGF/CGM to traffic in Address where
appropriate. Require alignments to be non-zero. Update a ton
of code to compute and propagate alignment information.
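Conceptually, the new type is just a pointer bundled with its known alignment; a simplified sketch (not the exact class definition):
class Address {
  llvm::Value *Pointer;
  clang::CharUnits Alignment;   // required to be non-zero
public:
  Address(llvm::Value *P, clang::CharUnits A) : Pointer(P), Alignment(A) {
    assert(!A.isZero() && "Address must carry a real alignment");
  }
  llvm::Value *getPointer() const { return Pointer; }
  clang::CharUnits getAlignment() const { return Alignment; }
};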
As part of this, I've promoted CGBuiltin's EmitPointerWithAlignment
helper function to CGF and made use of it in a number of places in
the expression emitter.
The end result is that we should now be significantly more correct
when performing operations on objects that are locally known to
be under-aligned. Since alignment is not reliably tracked in the
type system, there are inherent limits to this, but at least we
are no longer confused by standard operations like derived-to-base
conversions and array-to-pointer decay. I've also fixed a large
number of bugs where we were applying the complete-object alignment
to a pointer instead of the non-virtual alignment, although most of
these were hidden by the very conservative approach we took with
member alignment.
Also, because IRGen now reliably asserts on zero alignments, we
should no longer be subject to an absurd but frustrating recurring
bug where an incomplete type would report a zero alignment and then
we'd naively do an alignmentAtOffset on it and emit code using an
alignment equal to the largest power-of-two factor of the offset.
We should also now be emitting much more aggressive alignment
attributes in the presence of over-alignment. In particular,
field access now uses alignmentAtOffset instead of min.
Several times in this patch, I had to change the existing
code-generation pattern in order to more effectively use
the Address APIs. For the most part, this seems to be a strict
improvement, like doing pointer arithmetic with GEPs instead of
ptrtoint. That said, I've tried very hard to not change semantics,
but it is likely that I've failed in a few places, for which I
apologize.
ABIArgInfo now always carries the assumed alignment of indirect and
indirect byval arguments. In order to cut down on what was already
a dauntingly large patch, I changed the code to never set align
attributes in the IR on non-byval indirect arguments. That is,
we still generate code which assumes that indirect arguments have
the given alignment, but we don't express this information to the
backend except where it's semantically required (i.e. on byvals).
This is likely a minor regression for those targets that did provide
this information, but it'll be trivial to add it back in a later
patch.
I partially punted on applying this work to CGBuiltin. Please
do not add more uses of the CreateDefaultAligned{Load,Store}
APIs; they will be going away eventually.
llvm-svn: 246985
Fix processing of shared variables with reference types in OpenMP constructs. Previously, if the variable was not marked in one of the private clauses, the reference to this variable was emitted incorrectly and caused an assertion later.
llvm-svn: 246846
This implements basic support for compiling (though not yet assembling
or linking) for a WebAssembly target. Note that ABI details are not yet
finalized, and may change.
Differential Revision: http://reviews.llvm.org/D12002
llvm-svn: 246814
Added codegen for array sections in the 'depend' clause of the 'task' directive. It emits two pointers, one for the beginning of the array section and another for its end. The size of the section is calculated as (end + 1 - start) * sizeof(basic_element_type).
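For example ('process', 'lo', and 'len' are placeholders):
int a[100];
#pragma omp task depend(inout: a[lo : len])
process(&a[lo], len);
// Codegen emits &a[lo] (begin) and &a[lo + len - 1] (end); the dependence
// size is computed as (end + 1 - begin) * sizeof(int).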
llvm-svn: 246422
Added codegen for array sections in the 'depend' clause of the 'task' directive. It emits two pointers, one for the beginning of the array section and another for its end. The size of the section is calculated as (end + 1 - start) * sizeof(basic_element_type).
llvm-svn: 246278
Summary:
float_cast_overflow is the only UBSan check without a source location attached.
This patch propagates SourceLocations where necessary to get them to the
EmitCheck() call.
Reviewers: rsmith, ABataev, rjmccall
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D11757
llvm-svn: 244568
The new EH instructions make it possible for LLVM to generate .xdata
tables that the MSVC personality routines will be happy about. Because
this is experimental, hide it behind a -cc1 flag (-fnew-ms-eh).
Differential Revision: http://reviews.llvm.org/D11405
llvm-svn: 243767