Need to call getRawStmt() function instead, when trying to get inner
associated statement for the executable directive. Not all directives
use captured statements.
In untied tasks, need to allocate the space for local variales, declared
in task region, when the memory for task data is allocated. THe function
can be interrupted and we can exit from the function in untied task
switch. Need to keep the state of the local variables in this case.
Also, the compiler should not call cleanup when exiting in untied task
switch until the real exit out of the declaration scope is met during
execution.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D84457
If the arguments are mapped, but are actually not used in the target
region, the compiler still adds attribute TGT_OMP_TARGET_PARAM for such
arguments. It makes the libomptarget to add such parameters to the list
of arguments, passed to the kernel at the runtime, and may lead to
incorrect results/crashes during execution.
Differential Revision: https://reviews.llvm.org/D85755
Summary:
In untied tasks, need to allocate the space for local variales, declared
in task region, when the memory for task data is allocated. THe function
can be interrupted and we can exit from the function in untied task
switch. Need to keep the state of the local variables in this case.
Also, the compiler should not call cleanup when exiting in untied task
switch until the real exit out of the declaration scope is met during
execution.
Reviewers: jdoerfert
Subscribers: yaxunl, guansong, cfe-commits, sstefan1, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D84457
Replace the `ident_t` handling in Clang with the methods offered by the
OMPIRBuilder. This cuts down on the clang code as well as the
differences between the two, making further transitions easier. Tests
have changed but there should not be a real functional change. The most
interesting difference is probably that we stop generating local ident_t
allocations for now and just use globals. Given that this happens only
with debug info, the location part of the `ident_t` is probably bigger
than the test anyway. As the location part is already a global, we can
avoid the allocation, memcpy, and store in favor of a constant global
that is slightly bigger. This can be revisited if there are
complications.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D80735
Without this patch, the following example fails but shouldn't
according to OpenMP TR8:
```
#pragma omp target enter data map(alloc:i)
#pragma omp target data map(present, alloc: i)
{
#pragma omp target exit data map(delete:i)
} // fails presence check here
```
OpenMP TR8 sec. 2.22.7.1 "map Clause", p. 321, L23-26 states:
> If the map clause appears on a target, target data, target enter
> data or target exit data construct with a present map-type-modifier
> then on entry to the region if the corresponding list item does not
> appear in the device data environment an error occurs and the
> program terminates.
There is no corresponding statement about the exit from a region.
Thus, the `present` modifier should:
1. Check for presence upon entry into any region, including a `target
exit data` region. This behavior is already implemented correctly.
2. Should not check for presence upon exit from any region, including
a `target` or `target data` region. Without this patch, this
behavior is not implemented correctly, breaking the above example.
In the case of `target data`, this patch fixes the latter behavior by
removing the `present` modifier from the map types Clang generates for
the runtime call at the end of the region.
In the case of `target`, we have not found a valid OpenMP program for
which such a fix would matter. It appears that, if a program can
guarantee that data is present at the beginning of a `target` region
so that there's no error there, that data is also guaranteed to be
present at the end. This patch adds a comment to the runtime to
document this case.
Reviewed By: grokos, RaviNarayanaswamy, ABataev
Differential Revision: https://reviews.llvm.org/D84422
When we use the OpenMPIRBuilder for the parallel region we need to also
use it to get the thread ID (among other things) in the body. This is
because CGOpenMPRuntime::getThreadID() and
CGOpenMPRuntime::emitUpdateLocation implicitly assumes that if they are
called from within a parallel region there is a certain structure to the
code and certain members of the OMPRegionInfo are initialized. It might
make sense to initialize them even if we use the OpenMPIRBuilder but we
would preferably get rid of such state instead.
Bug reported by Anchu Rajendran Sudhakumari.
Depends on D82470.
Reviewed By: anchu-rajendran
Differential Revision: https://reviews.llvm.org/D82822
Need to map the base pointer for all directives, not only target
data-based ones.
The base pointer is mapped for array sections, array subscript, array
shaping and other array-like constructs with the base pointer. Also,
codegen for use_device_ptr clause was modified to correctly handle
mapping combination of array like constructs + use_device_ptr clause.
The data for use_device_ptr clause is emitted as the last records in the
data mapping array.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D84767
Need to map the base pointer for all directives, not only target
data-based ones.
The base pointer is mapped for array sections, array subscript, array
shaping and other array-like constructs with the base pointer. Also,
codegen for use_device_ptr clause was modified to correctly handle
mapping combination of array like constructs + use_device_ptr clause.
The data for use_device_ptr clause is emitted as the last records in the
data mapping array.
It applies only for global pointers.
Differential Revision: https://reviews.llvm.org/D84767
This patch implements Clang front end support for the OpenMP TR8
`present` motion modifier for `omp target update` directives. The
next patch in this series implements OpenMP runtime support.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D84711
This patch implements Clang front end support for the OpenMP TR8
`present` motion modifier for `omp target update` directives. The
next patch in this series implements OpenMP runtime support.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D84711
This patch implements Clang front end support for the OpenMP TR8
`present` map type modifier. The next patch in this series implements
OpenMP runtime support.
This patch does not attempt to implement TR8 sec. 2.22.7.1 "map
Clause", p. 319, L14-16:
> If a map clause with a present map-type-modifier is present in a map
> clause, then the effect of the clause is ordered before all other
> map clauses that do not have the present modifier.
Compare to L10-11, which Clang does not appear to implement yet:
> For a given construct, the effect of a map clause with the to, from,
> or tofrom map-type is ordered before the effect of a map clause with
> the alloc, release, or delete map-type.
This patch also does not implement the `present` implicit-behavior for
`defaultmap` or the `present` motion-modifier for `target update`.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D83061
Summary:
Need to avoid an optimization for base pointer mapping for target data
directives.
Reviewers: jdoerfert, ye-luo
Subscribers: yaxunl, guansong, cfe-commits, sstefan1, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D84182
This patch fixes the compilation warnings that L is not a reference.
Thanks to Lingda Li for providing the patch.
Differential Revision: https://reviews.llvm.org/D83959
This patch implements the code generation to use OpenMP 5.0 declare mapper (a.k.a. user-defined mapper) constructs.
Patch written by Lingda Li.
Differential Revision: https://reviews.llvm.org/D67833
Summary:
If user-defined reductions with the initializer are used with classes,
the compiler misses the constructor call when trying to create a private
copy of the reduction variable.
Reviewers: jdoerfert
Subscribers: cfe-commits, yaxunl, guansong, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D83334
Summary:
D82193 exposed a problem with global type definitions in
`OMPConstants.h`. This causes a race when running in thinLTO mode.
Types now live inside of OpenMPIRBuilder to prevent this from happening.
Reviewers: jdoerfert
Subscribers: yaxunl, hiraditya, guansong, dexonsmith, aaron.ballman, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D83176
Summary:
This patch is removing the custom enumeration for OpenMP Directives and Clauses and replace them
with the newly tablegen generated one from llvm/Frontend. This is a first patch and some will follow to share the same
infrastructure where possible. The next patch should use the clauses allowance defined in the tablegen file.
Reviewers: jdoerfert, DavidTruby, sscalpone, kiranchandramohan, ichoyjx
Reviewed By: DavidTruby, ichoyjx
Subscribers: jholewinski, cfe-commits, dblaikie, MaskRay, ymandel, ichoyjx, mgorny, yaxunl, guansong, jfb, sstefan1, aaron.ballman, llvm-commits
Tags: #llvm, #flang, #clang
Differential Revision: https://reviews.llvm.org/D82906
1. Provides no piroirity supoort && disables three priority related
attributes: init_priority, ctor attr, dtor attr;
2. '-qunique' in XL compiler equivalent behavior of emitting sinit
and sterm functions name using getUniqueModuleId() util function
in LLVM (currently no support for InternalLinkage and WeakODRLinkage
symbols);
3. Add testcases to emit IR sample with __sinit80000000, __dtor, and
__sterm80000000;
4. Temporarily side-steps the need to implement the functionality of
llvm.global_ctors and llvm.global_dtors arrays. The uses of that
functionality in this patch (with respect to the name of the functions
involved) are not representative of how the functionality will be used
once implemented.
Differential Revision: https://reviews.llvm.org/D74166
Summary:
Added codegen for use_device_addr clause. The components of the list
items are mapped as a kind of RETURN components and then the returned
base address is used instead of the real address of the base declaration
used in the use_device_addr expressions.
Reviewers: jdoerfert
Subscribers: yaxunl, guansong, sstefan1, cfe-commits, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D80730
Summary:
If the data member is mapped as an array section, need to emit the
pointer to the last element of this array section and use this pointer
as the highest element in partial struct data.
Reviewers: jdoerfert
Subscribers: yaxunl, guansong, sstefan1, cfe-commits, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81037
Summary:
Added initial codegen for 'affinity' clauses on task directives.
Emits next code:
```
kmp_task_affinity_info_t affs[<num_elems>];
void *td = __kmpc_task_alloc(..);
affs[<i>].base = &data_i;
affs[<i>].size = sizeof(data_i);
__kmpc_omp_reg_task_with_affinity(&loc, <gtid>, td, <num_elems>, affs);
```
The result returned by the call of `__kmpc_omp_reg_task_with_affinity`
function is ignored currently sincethe runtime currently ignores args
and returns 0 uncoditionally.
Reviewers: jdoerfert
Subscribers: yaxunl, guansong, sstefan1, llvm-commits, cfe-commits, caomhin
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80240
Summary: This changes Clang's generation of OpenMP runtime functions to use the types and functions defined in OpenMPKinds and OpenMPConstants. New OpenMP runtime function information should now be added to OMPKinds.def. This patch also changed the definitions of __kmpc_push_num_teams and __kmpc_copyprivate to match those found in the runtime.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: jfb, AndreyChurbanov, openmp-commits, fghanim, hiraditya, sstefan1, cfe-commits, llvm-commits
Tags: #openmp, #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80222
Summary:
No need to generate inlined OpenMP region for variables captured in
lambdas or block decls, only for implicitly captured variables in the
OpenMP region.
Reviewers: jdoerfert
Subscribers: yaxunl, guansong, cfe-commits, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D79966
If we're going to assume references are dereferenceable, we should also
assume they're aligned: otherwise, we can't actually dereference them.
See also D80072.
Differential Revision: https://reviews.llvm.org/D80166
Summary:
Predefined allocators should not be mapped at all (they are just enumeric
constants). FOr user-defined allocators need to map the traits only as
firstprivates, the allocator itself is private.
At the beginning of the target region the user-defined allocatores must
be created and then destroyed at the end of the target region:
```
omp_allocator_handle_t my_allocator = __kmpc_init_allocator(<gtid>,
/*default memhandle*/ 0, <number_of_traits>, &<traits>);
...
call void @__kmpc_destroy_allocator(<gtid>, my_allocator);
```
Reviewers: jdoerfert, aaron.ballman
Subscribers: jholewinski, yaxunl, guansong, cfe-commits, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D79257
Summary:
omp.h header file defines omp_null_allocator as a predefined allocator,
need to consider it also as a predefined allocator.
Reviewers: jdoerfert
Subscribers: jholewinski, yaxunl, guansong, cfe-commits, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D79186
Summary:
This change fixes an aarch64-specific bug in the generation of the NDS and WDS values used to compute the signature of the vector functions out of OpenMP directives like `declare simd`. When the directive is used in conjunction with the `linear` clause, the size of the pointee must be used instead of the size of the pointer to compute NDS and WDS.
The code-fix is strictly related to the behavior for `linear`, but given that the only way we have to test the NDS and WDS values is to check the resulting `<vlen>` token in the mangled name of the vector function, the tests have been extended to cover all the possible values of WDS and NDS as defined in the ABI at https://github.com/ARM-software/abi-aa/tree/master/vfabia64.
Reviewers: ABataev, jdoerfert, andwar
Reviewed By: jdoerfert
Subscribers: yaxunl, kristof.beyls, guansong, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78969
Summary:
The linear parameter token in the mangling function must be multiplied
by the pointee size in bytes when the parameter is a pointer.
Reviewers: ABataev, andwar, jdoerfert
Subscribers: yaxunl, guansong, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78965
Summary:
Patch forces codegen to use the new runtime functions for task reductions where
the issue with passing the address of the original variables to the UDR
initializers is fixed. Also, this patch is required for upcoming
support of task modifier inreduction clause.
Reviewers: jdoerfert
Subscribers: yaxunl, guansong, cfe-commits, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D78733
Implemented codegen for the iterator expression in the depend clauses.
Iterator construct is emitted the following way:
iterator(cnt1, cnt2, ...), in : <dep>
<TotalNumDeps> = <cnt1_size> * <cnt2_size> * ...;
kmp_depend_t deps[<TotalNumDeps>];
deps_counter = 0;
for (cnt1) {
for (cnt2) {
...
deps[deps_counter].base_addr = &<dep>;
deps[deps_counter].size = sizeof(<dep>);
deps[deps_counter].flags = in;
deps_counter += 1;
...
}
}
For depobj construct the codegen is very similar, but the memory is
allocated dynamically and added extra first item reserved for internal use.
clauses.
Implemented codegen for array shaping operation in depend clauses. The
begin of the expression is the pointer itself, while the size of the
dependence data is the mukltiplacation of all dimensions in the array
shaping expression.
This is the second part loosely extracted from D71179 and cleaned up.
This patch provides semantic analysis support for `omp begin/end declare
variant`, mostly as defined in OpenMP technical report 8 (TR8) [0].
The sema handling makes code generation obsolete as we generate "the
right" calls that can just be handled as usual. This handling also
applies to the existing, albeit problematic, `omp declare variant
support`. As a consequence a lot of unneeded code generation and
complexity is removed.
A major purpose of this patch is to provide proper `math.h`/`cmath`
support for OpenMP target offloading. See PR42061, PR42798, PR42799. The
current code was developed with this feature in mind, see [1].
The logic is as follows:
If we have seen a `#pragma omp begin declare variant match(<SELECTOR>)`
but not the corresponding `end declare variant`, and we find a function
definition we will:
1) Create a function declaration for the definition we were about to generate.
2) Create a function definition but with a mangled name (according to
`<SELECTOR>`).
3) Annotate the declaration with the `OMPDeclareVariantAttr`, the same
one used already for `omp declare variant`, using and the mangled
function definition as specialization for the context defined by
`<SELECTOR>`.
When a call is created we inspect it. If the target has an
`OMPDeclareVariantAttr` attribute we try to specialize the call. To this
end, all variants are checked, the best applicable one is picked and a
new call to the specialization is created. The new call is used instead
of the original one to the base function. To keep the AST printing and
tooling possible we utilize the PseudoObjectExpr. The original call is
the syntactic expression, the specialized call is the semantic
expression.
[0] https://www.openmp.org/wp-content/uploads/openmp-TR8.pdf
[1] https://reviews.llvm.org/D61399#change-496lQkg0mhRN
Reviewers: kiranchandramohan, ABataev, RaviNarayanaswamy, gtbercea, grokos, sdmitriev, JonChesterfield, hfinkel, fghanim, aaron.ballman
Subscribers: bollu, guansong, openmp-commits, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D75779
This is the first part extracted from D71179 and cleaned up.
This patch provides parsing support for `omp begin/end declare variant`,
as defined in OpenMP technical report 8 (TR8) [0].
A major purpose of this patch is to provide proper math.h/cmath support
for OpenMP target offloading. See PR42061, PR42798, PR42799. The current
code was developed with this feature in mind, see [1].
[0] https://www.openmp.org/wp-content/uploads/openmp-TR8.pdf
[1] https://reviews.llvm.org/D61399#change-496lQkg0mhRN
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D74941
If the ancestor device modifier is used and the value of the device
clause is evaluated to 1, the ancestor device shall be used for the
execution.
Since the reverse offloading is not supported yet, the target construct
execution is always initiated from the host, not from the device. So, if
the ancestor modifier is specified, just execute target region on the
host.
Avoid copying of the orignal variable if it is going to be marked as
firstprivate in task regions. For taskloops, still need to copy the
non-trvially copyable variables to correctly construct them upon task
creation.
Most clients of SourceManager.h need to do things like turning source
locations into file & line number pairs, but this doesn't require
bringing in FileManager.h and LLVM's FS headers.
The main code change here is to sink SM::createFileID into the cpp file.
I reason that this is not performance critical because it doesn't happen
on the diagnostic path, it happens along the paths of macro expansion
(could be hot) and new includes (less hot).
Saves some includes:
309 - /usr/local/google/home/rnk/llvm-project/clang/include/clang/Basic/FileManager.h
272 - /usr/local/google/home/rnk/llvm-project/clang/include/clang/Basic/FileSystemOptions.h
271 - /usr/local/google/home/rnk/llvm-project/llvm/include/llvm/Support/VirtualFileSystem.h
267 - /usr/local/google/home/rnk/llvm-project/llvm/include/llvm/Support/FileSystem.h
266 - /usr/local/google/home/rnk/llvm-project/llvm/include/llvm/Support/Chrono.h
Differential Revision: https://reviews.llvm.org/D75406
Added codegen for update clause in depobj. Reads the number of the
elements from the first element and updates flags for each element in
the loop.
```
omp_depend_t x;
kmp_depend_info *base = (kmp_depend_info *)x;
intptr_t num = x[-1].base_addr;
kmp_depend_info *end = x + num;
kmp_depend_info *el = base;
do {
el.flags = new_flag;
el = &el[1];
} while (el != end);
```
in depobj object.
The first element in the list of the dependencies is used for internal
purposes to store the number of the elements in the provided list.
The first element now is skipped and depobj object poits exactly to the
list of dependencies.
Added codegen for 'depend' clause in depobj directive. The depend clause
is emitted as kmp_depend_info <deps>[<number_of_items_in_clause> + 1]. The
first element in this array is reserved for storing the number of
elements in this array: <deps>[0].base_addr =
<number_of_items_in_clause>;
This extra element is required to implement 'update' and 'destroy'
clauses. It is required to know the size of array to destroy it
correctly and to update depency kind.
Chen
Summary:
Base declaration in pointer arithmetic expression is determined by
binary search with type information. Take "int *a, *b; *(a+*b)" as an
example, we determine the base by checking the type of LHS and RHS. In
this case the type of LHS is "int *", the type of RHS is "int",
therefore, we know that we need to visit LHS in order to find base
declaration.
Reviewers: ABataev, jdoerfert
Reviewed By: ABataev
Subscribers: guansong, cfe-commits, sandoval, dreachem
Tags: #clang
Differential Revision: https://reviews.llvm.org/D75077
This patch implements an almost complete handling of OpenMP
contexts/traits such that we can reuse most of the logic in Flang
through the OMPContext.{h,cpp} in llvm/Frontend/OpenMP.
All but construct SIMD specifiers, e.g., inbranch, and the device ISA
selector are define in `llvm/lib/Frontend/OpenMP/OMPKinds.def`. From
these definitions we generate the enum classes `TraitSet`,
`TraitSelector`, and `TraitProperty` as well as conversion and helper
functions in `llvm/lib/Frontend/OpenMP/OMPContext.{h,cpp}`.
The above enum classes are used in the parser, sema, and the AST
attribute. The latter is not a collection of multiple primitive variant
arguments that contain encodings via numbers and strings but instead a
tree that mirrors the `match` clause (see `struct OpenMPTraitInfo`).
The changes to the parser make it more forgiving when wrong syntax is
read and they also resulted in more specialized diagnostics. The tests
are updated and the core issues are detected as before. Here and
elsewhere this patch tries to be generic, thus we do not distinguish
what selector set, selector, or property is parsed except if they do
behave exceptionally, as for example `user={condition(EXPR)}` does.
The sema logic changed in two ways: First, the OMPDeclareVariantAttr
representation changed, as mentioned above, and the sema was adjusted to
work with the new `OpenMPTraitInfo`. Second, the matching and scoring
logic moved into `OMPContext.{h,cpp}`. It is implemented on a flat
representation of the `match` clause that is not tied to clang.
`OpenMPTraitInfo` provides a method to generate this flat structure (see
`struct VariantMatchInfo`) by computing integer score values and boolean
user conditions from the `clang::Expr` we keep for them.
The OpenMP context is now an explicit object (see `struct OMPContext`).
This is in anticipation of construct traits that need to be tracked. The
OpenMP context, as well as the `VariantMatchInfo`, are basically made up
of a set of active or respectively required traits, e.g., 'host', and an
ordered container of constructs which allows duplication. Matching and
scoring is kept as generic as possible to allow easy extension in the
future.
---
Test changes:
The messages checked in `OpenMP/declare_variant_messages.{c,cpp}` have
been auto generated to match the new warnings and notes of the parser.
The "subset" checks were reversed causing the wrong version to be
picked. The tests have been adjusted to correct this.
We do not print scores if the user did not provide one.
We print spaces to make lists in the `match` clause more legible.
Reviewers: kiranchandramohan, ABataev, RaviNarayanaswamy, gtbercea, grokos, sdmitriev, JonChesterfield, hfinkel, fghanim
Subscribers: merge_guards_bot, rampitec, mgorny, hiraditya, aheejin, fedor.sergeev, simoncook, bollu, guansong, dexonsmith, jfb, s.egerton, llvm-commits, cfe-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71830
The code generation is exactly the same as it was.
But not that the special handling of untied tasks is still handled by
emitUntiedSwitch in clang.
Differential Revision: https://reviews.llvm.org/D69828
According to OpenMP 5.0, cancel and cancellation point constructs are
supported in taskloop directive. Added support for cancellation in
taskloop, master taskloop and parallel master taskloop.
directive.
According to OpenMP 5.0, The atomic_default_mem_order clause specifies the default memory ordering behavior for atomic constructs that must be provided by an implementation. If the default memory ordering is specified as seq_cst, all atomic constructs on which memory-order-clause is not specified behave as if the seq_cst clause appears. If the default memory ordering is specified as relaxed, all atomic constructs on which memory-order-clause is not specified behave as if the relaxed clause appears.
If the default memory ordering is specified as acq_rel, atomic constructs on which memory-order-clause is not specified behave as if the release clause appears if the atomic write or atomic update operation is specified, as if the acquire clause appears if the atomic read operation is specified, and as if the acq_rel clause appears if the atomic captured update operation is specified.
Add support for Flush in the OMPIRBuilder. This patch also adds changes
to clang to use the OMPIRBuilder when '-fopenmp-enable-irbuilder'
commandline option is used.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D70712
regions.
If the lastprivate conditional is passed as shared in inner region, we
shall check if it was ever changed and use this updated value after exit
from the inner region as an update value.
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.
regions with reductions, lastprivates or linears clauses.
If the lastprivate conditional variable is updated in inner parallel
region with reduction, lastprivate or linear clause, the value must be
considred as a candidate for lastprivate conditional. Also, tracking in
inner parallel regions is not required.
Use canonical decls instead of mangled names in the set of already
emitted decls. This allows to reduce the number of function calls for
getting declarations mangled names and speedup the compilation.
Added codegen support for lastprivate conditional. According to the
standard, if when the conditional modifier appears on the clause, if an
assignment to a list item is encountered in the construct then the
original list item is assigned the value that is assigned to the new
list item in the sequentially last iteration or lexically last section
in which such an assignment is encountered.
We look for the assignment operations and check if the left side
references lastprivate conditional variable. Then the next code is
emitted:
if (last_iv_a <= iv) {
last_iv_a = iv;
last_a = lp_a;
}
At the end the implicit barrier is generated to wait for the end of all
threads and then in the check for the last iteration the private copy is
assigned the last value.
if (last_iter) {
lp_a = last_a; // <--- new code
a = lp_a; // <--- store of private value to the original variable.
}
This removes the OpenMPProcBindClauseKind enum in favor of
llvm::omp::ProcBindKind which lives in OpenMPConstants.h and was
introduced in D70109.
No change in behavior is expected.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D70289
As a permanent and generic solution to the problem of variable
finalization (destructors, lastprivate, ...), this patch introduces the
finalization stack. The objects on the stack describe (1) the
(structured) regions the OpenMP-IR-Builder is currently constructing,
(2) if these are cancellable, and (3) the callback that will perform the
finalization (=cleanup) when necessary.
As the finalization can be necessary multiple times, at different source
locations, the callback takes the position at which code is currently
generated. This position will also encode the destination of the "region
exit" block *iff* the finalization call was issues for a region
generated by the OpenMPIRBuilder. For regions generated through the old
Clang OpenMP code geneneration, the "region exit" is determined by Clang
inside the finalization call instead (see getOMPCancelDestination).
As a first user, the parallel + cancel barrier interaction is changed.
In contrast to the temporary solution before, the barrier generation in
Clang does not need to be aware of the "CancelDestination" block.
Instead, the finalization callback is and, as described above, later
even that one does not need to be.
D70109 will be updated to use this scheme.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D70258
Summary:
Basic codegen for the declarations marked as nontemporal. Also, if the
base declaration in the member expression is marked as nontemporal,
lvalue for member decl access inherits nonteporal flag from the base
lvalue.
Reviewers: rjmccall, hfinkel, jdoerfert
Subscribers: guansong, arphaman, caomhin, kkwli0, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D71708
in declare variant.
If the types of the fnction are not equal, but match, at the codegen
thei may have different types. This may lead to compiler crash.
This is a follow up patch to use the OpenMP-IR-Builder, as discussed on
the mailing list ([1] and later) and at the US Dev Meeting'19.
[1] http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000197.html
Reviewers: kiranchandramohan, ABataev, RaviNarayanaswamy, gtbercea, grokos, sdmitriev, JonChesterfield, hfinkel, fghanim
Subscribers: ppenzin, penzn, llvm-commits, cfe-commits, jfb, guansong, bollu, hiraditya, mgorny
Tags: #clang
Differential Revision: https://reviews.llvm.org/D69922
Summary:
The new OpenMPConstants.h is a location for all OpenMP related constants
(and helpers) to live.
This patch moves the directives there (the enum OpenMPDirectiveKind) and
rewires Clang to use the new location.
Initially part of D69785.
Reviewers: kiranchandramohan, ABataev, RaviNarayanaswamy, gtbercea, grokos, sdmitriev, JonChesterfield, hfinkel, fghanim
Subscribers: jholewinski, ppenzin, penzn, llvm-commits, cfe-commits, jfb, guansong, bollu, hiraditya, mgorny
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69853
AggValueSlot
This reapplies 8a5b7c3570 after a null
dereference bug in CGOpenMPRuntime::emitUserDefinedMapper.
Original commit message:
This is needed for the pointer authentication work we plan to do in the
near future.
a63a81bd99/clang/docs/PointerAuthentication.rst
The std::pair<const clang::ValueDecl *, llvm::ArrayRef<clang::OMPClauseMappableExprCommon::MappableComponent>>
type will be copied in a range-based for loop. Make the copy explicit to
avoid the -Wrange-loop-analysis warning.
This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall.
Differential Revision: https://reviews.llvm.org/D70046
If the context selector score was not specified, its value must be set
to 0. Simplify the processing of unspecified scores + save memory in
attribute representation.
If we can determined, that the global tid parameter can be used in the
function, better to use it rather than calling __kmpc_global_thread_num
function.
llvm-svn: 375134
When the parallel region is called directly in the sequential region,
the zeroed tid/bound id are used. But they must point to the different
memory locations as the parameters are marked as noalias.
llvm-svn: 375017
The final list of OpenMP offload targets becomes known only at the link time and since offload registration code depends on the targets list it makes sense to delay offload registration code generation to the link time instead of adding it to the host part of every fat object. This patch moves offload registration code generation from clang to the offload wrapper tool.
This is the last part of the OpenMP linker script elimination patch https://reviews.llvm.org/D64943
Differential Revision: https://reviews.llvm.org/D68746
llvm-svn: 374937
Added parsing/sema/codegen support for 'parallel master taskloop'
constructs. Some of the clauses, like 'grainsize', 'num_tasks', 'final'
and 'priority' are not supported in full, only constant expressions can
be used currently in these clauses.
llvm-svn: 374791
and of vendors, not or.
If several vendors are provided in the same vendor context trait, the
context shall match only if all vendors are matching, not one of them.
This is per OpenMP 5.0, 2.3.3 Matching and Scoring Context Selectors,
all selectors in the construct, device, and implementation sets of the
context selector appear in the corresponding trait set of the OpenMP
context.
llvm-svn: 374107
We previously failed to treat an array with an instantiation-dependent
but not value-dependent bound as being an instantiation-dependent type.
We now track the array bound expression as part of a constant array type
if it's an instantiation-dependent expression.
llvm-svn: 373685
If the context selector has associated score and several contexts
selectors matches current context, the function with the highest score
must be selected.
llvm-svn: 373661
We can point to the target region + emit parent functions names/real var
names if they were not found in host module during device codegen.
llvm-svn: 373620
Linker automatically provides __start_<section name> and __stop_<section name> symbols to satisfy unresolved references if <section name> is representable as a C identifier (see https://sourceware.org/binutils/docs/ld/Input-Section-Example.html for details). These symbols indicate the start address and end address of the output section respectively. Therefore, renaming OpenMP offload entries section name from ".omp.offloading_entries" to "omp_offloading_entries" to use this feature.
This is the first part of the patch for eliminating OpenMP linker script (please see https://reviews.llvm.org/D64943).
Differential Revision: https://reviews.llvm.org/D68070
llvm-svn: 373118
is not provided.
We should not emit any target-dependent code if only -fopenmp flag is
used and device targets are not provided to prevent compiler crash.
llvm-svn: 372623
Runtime function __kmpc_push_tripcount better to call inside of the task
context for target regions. Otherwise, the libomptarget is unable to
link the provided tripcount value for nowait target regions and
completely looses this information.
llvm-svn: 372609
non-ordered loops.
According to OpenMP 5.0, 2.9.2 Worksharing-Loop Construct, Desription, If the static schedule kind is specified or if the ordered clause is specified, and if the nonmonotonic modifier is not specified, the effect is as if the monotonic modifier is specified. Otherwise, unless the monotonic modifier is specified, the effect is as if the nonmonotonic modifier is specified.
The first part of this requirement is implemented in runtime. Patch adds
support for the second, nonmonotonic, part of this requirement.
llvm-svn: 369801
construct.
OpenMP 5.0 introduced new clause for declare target directive, device_type clause, which may accept values host, nohost, and any. Host means
that the function must be emitted only for the host, nohost - only for
the device, and any - for both, device and the host.
llvm-svn: 369775
Summary:
This patch adds support for the close map modifier in Clang.
This ensures that the new map type is marked and passed to the OpenMP runtime appropriately.
Additional regression tests have been merged from patch D55892 (author @saghir).
Reviewers: ABataev, caomhin, jdoerfert, kkwli0
Reviewed By: ABataev
Subscribers: kkwli0, Hahnfeld, saghir, guansong, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D65341
llvm-svn: 368491
This patch implements the code generation for OpenMP 5.0 declare mapper
(user-defined mapper) constructs. For each declare mapper, a mapper
function is generated. These mapper functions will be called by the
runtime and/or other mapper functions to achieve user defined mapping.
The design slides can be found at
https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx
Re-commit after revert in r367773 because r367755 changed the LLVM-IR
output such that a CHECK line failed.
Patch by Lingda Li <lildmh@gmail.com>
Differential Revision: https://reviews.llvm.org/D59474
llvm-svn: 367905
This patch implements the code generation for OpenMP 5.0 declare mapper
(user-defined mapper) constructs. For each declare mapper, a mapper
function is generated. These mapper functions will be called by the
runtime and/or other mapper functions to achieve user defined mapping.
The design slides can be found at
https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx
Patch by Lingda Li <lildmh@gmail.com>
Differential Revision: https://reviews.llvm.org/D59474
llvm-svn: 367773
Summary:
This patch fixes the case where variables in different compilation units or the same compilation unit are under the declare target link clause AND have the same name.
This also fixes the name clash error that occurs when unified memory is activated.
The changes in this patch include:
- Pointers to internal variables are given unique names.
- Externally visible variables are given the same name as before.
- All pointer variables (external or internal) are weakly linked.
Reviewers: ABataev, jdoerfert, caomhin
Reviewed By: ABataev
Subscribers: lebedev.ri, guansong, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D64592
llvm-svn: 367613
by David Truby.
Summary:
This adds a zero length array section mapping for each pointer captured by a lambda that is used in a target region, as per section 2.19.7.1 of the OpenMP 5 specification.
Reviewers: ABataev
Reviewed By: ABataev
Subscribers: guansong, jdoerfert, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D64558
llvm-svn: 365777
Target-based runtime functions use int64_t type for sizes, while the
compiler uses size_t type. It leads to miscompilation in 32 bit mode.
llvm-svn: 364327
Summary:
Add support for the C++2a [[no_unique_address]] attribute for targets using the Itanium C++ ABI.
This depends on D63371.
Reviewers: rjmccall, aaron.ballman
Subscribers: dschuff, aheejin, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D63451
llvm-svn: 363976
Summary:
This patch adds support for the handling of the variables under the declare target to clause.
The variables in this case are handled like link variables are. A pointer is created on the host and then mapped to the device. The runtime will then copy the address of the host variable in the device pointer.
Reviewers: ABataev, AlexEichenberger, caomhin
Reviewed By: ABataev
Subscribers: guansong, jdoerfert, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D63108
llvm-svn: 363959
Summary: This patch avoids the emission of maps for target link variables when unified memory is present.
Reviewers: ABataev, caomhin
Reviewed By: ABataev
Subscribers: guansong, jdoerfert, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D60883
llvm-svn: 363435
Summary:
This patch adds support for the registration of the requires directives with the runtime.
Each requires directive clause will enable a particular flag to be set.
The set of flags is passed to the runtime to be checked for compatibility with other such flags coming from other object files.
The registration function is called whenever OpenMP is present even if a requires directive is not present. This helps detect cases in which requires directives are used inconsistently.
Reviewers: ABataev, AlexEichenberger, caomhin
Reviewed By: ABataev, AlexEichenberger
Subscribers: jholewinski, guansong, jfb, jdoerfert, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D60568
llvm-svn: 361298
performance.
Internally generated functions must be marked as always_inlines in most
cases. Patch marks some extra reduction function + outlined parallel
functions as always_inline for better performance, but only if the
optimization is requested.
llvm-svn: 361269
All target-parallel-based constructs can be run in SPMD mode from now
on. Even if num_threads clauses or if clauses are used, such constructs
can be executed in SPMD mode.
llvm-svn: 358595
regions.
Added more complex analysis for number of teams and number of threads in
the target regions, also merged related common code between CGOpenMPRuntime
and CGOpenMPRuntimeNVPTX classes.
llvm-svn: 358126
If the pointer is captured by reference, it must be mapped as
_PTR_AND_OBJ kind of mapping to correctly translate the pointer address
on the device.
llvm-svn: 357488
For the global variables the allocate directive must specify only the
predefined allocator. This allocator must be translated into the correct
form of the address space for the targets that support different address
spaces.
llvm-svn: 356702
Added initial codegen for the local variables with the #pragma omp
allocate directive. Instead of allocating the variables on the stack,
__kmpc_alloc|__kmpc_free functions are used for memory (de-)allocation.
llvm-svn: 356472
array.
If the firstprivate variable is a reference, we may incorrectly classify
the kind of the private copy. Use the type of the private copy instead
of the original shared variable.
llvm-svn: 356098
If the variable was declared and marked as declare target, a new offload
entry with size 0 is created. But if later a definition is created and
marked as declare target, this definition is not added to the entry set
and the definition remains not mapped to the target. Patch fixes this
problem allowing to redefine the size and linkage for
previously registered declaration.
llvm-svn: 355960
memory.
If the variable with the constant non-scalar type is firstprivatized in
the target region, the local copy is created with the data copying.
Instead, we allocate the copy in the constant memory and avoid extra
copying in the outlined target regions. This global copy is used in the
target regions without loss of the performance.
llvm-svn: 355418
The various EltSize, Offset, DataLayout, and StructLayout arguments
are all computable from the Address's element type and the DataLayout
which the CGBuilder already has access to.
After having previously asserted that the computed values are the same
as those passed in, now remove the redundant arguments from
CGBuilder's Create*GEP functions.
Differential Revision: https://reviews.llvm.org/D57767
llvm-svn: 353629
Some of these functions take some extraneous arguments, e.g. EltSize,
Offset, which are computable from the Type and DataLayout.
Add some asserts to ensure that the computed values are consistent
with the passed-in values, in preparation for eliminating the
extraneous arguments. This also asserts that the Type is an Array for
the calls named "Array" and a Struct for the calls named "Struct".
Then, correct a couple of errors:
1. Using CreateStructGEP on an array type. (this causes the majority
of the test differences, as struct GEPs are created with i32
indices, while array GEPs are created with i64 indices)
2. Passing the wrong Offset to CreateStructGEP in TargetInfo.cpp on
x86-64 NACL (which uses 32-bit pointers).
Differential Revision: https://reviews.llvm.org/D57766
llvm-svn: 353529
Emit{Nounwind,}RuntimeCall{,OrInvoke} have been modified to take a
FunctionCallee as an argument, and CreateRuntimeFunction has been
modified to return a FunctionCallee. All callers have been updated.
Additionally, CreateBuiltinFunction is removed, as it was redundant
with CreateRuntimeFunction after some previous changes.
Differential Revision: https://reviews.llvm.org/D57668
llvm-svn: 353184
Summary: this commit adds support to a new dependence type introduced in OpenMP
5.0. The LLVM OpenMP RTL already supports this feature, so we only need to
modify CLANG to take advantage of them.
Differential Revision: https://reviews.llvm.org/D57576
llvm-svn: 353018
This patch implements parsing and sema for "omp declare mapper"
directive. User defined mapper, i.e., declare mapper directive, is a new
feature in OpenMP 5.0. It is introduced to extend existing map clauses
for the purpose of simplifying the copy of complex data structures
between host and device (i.e., deep copy). An example is shown below:
struct S { int len; int *d; };
#pragma omp declare mapper(struct S s) map(s, s.d[0:s.len]) // Memory region that d points to is also mapped using this mapper.
Contributed-by: Lingda Li <lildmh@gmail.com>
Differential Revision: https://reviews.llvm.org/D56326
llvm-svn: 352906
This fixes most references to the paths:
llvm.org/svn/
llvm.org/git/
llvm.org/viewvc/
github.com/llvm-mirror/
github.com/llvm-project/
reviews.llvm.org/diffusion/
to instead point to https://github.com/llvm/llvm-project.
This is *not* a trivial substitution, because additionally, all the
checkout instructions had to be migrated to instruct users on how to
use the monorepo layout, setting LLVM_ENABLE_PROJECTS instead of
checking out various projects into various subdirectories.
I've attempted to not change any scripts here, only documentation. The
scripts will have to be addressed separately.
Additionally, I've deleted one document which appeared to be outdated
and unneeded:
lldb/docs/building-with-debug-llvm.txt
Differential Revision: https://reviews.llvm.org/D57330
llvm-svn: 352514
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
With commit r351627, LLVM gained the ability to apply (existing) IPO
optimizations on indirections through callbacks, or transitive calls.
The general idea is that we use an abstraction to hide the middle man
and represent the callback call in the context of the initial caller.
It is described in more detail in the commit message of the LLVM patch
r351627, the llvm::AbstractCallSite class description, and the
language reference section on callback-metadata.
This commit enables clang to emit !callback metadata that is
understood by LLVM. It does so in three different cases:
1) For known broker functions declarations that are directly
generated, e.g., __kmpc_fork_call for the OpenMP pragma parallel.
2) For known broker functions that are identified by their name and
source location through the builtin detection, e.g.,
pthread_create from the POSIX thread API.
3) For user annotated functions that carry the "callback(callee, ...)"
attribute. The attribute has to include the name, or index, of
the callback callee and how the passed arguments can be
identified (as many as the callback callee has). See the callback
attribute documentation for detailed information.
Differential Revision: https://reviews.llvm.org/D55483
llvm-svn: 351629
Each we create the target regions with the teams distribute inner
region, we can better estimate number of the teams required to execute
the target region. Function __kmpc_push_target_tripcount() is used for
purpose, which accepts device_id and the number of the iterations,
performed by the associated loop.
llvm-svn: 350571
All of the other constructors already take a reference to the AST context.
This avoids calling Decl::getASTContext in most cases. Additionally move
the definition of the constructor from Expr.h to Expr.cpp since it is calling
DeclRefExpr::computeDependence. NFC.
llvm-svn: 349901
A map clause with the close map-type-modifier is a hint to
prefer that the variables are mapped using a copy into faster
memory.
Patch by Ahsan Saghir (saghir)
Differential Revision: https://reviews.llvm.org/D55719
llvm-svn: 349551
__kmpc_barrier runtime functions must be marked as convergent to prevent
some dangerous optimizations. Also, for NVPTX target all barriers must
be emitted as simple barriers.
llvm-svn: 348271
It seems the two failing tests can be simply fixed after r348037
Fix 3 cases in Analysis/builtin-functions.cpp
Delete the bad CodeGen/builtin-constant-p.c for now
llvm-svn: 348053
Kept the "indirect_builtin_constant_p" test case in test/SemaCXX/constant-expression-cxx1y.cpp
while we are investigating why the following snippet fails:
extern char extern_var;
struct { int a; } a = {__builtin_constant_p(extern_var)};
llvm-svn: 348039
This was reverted in r347656 due to me thinking it caused a miscompile of
Chromium. Turns out it was the Chromium code that was broken.
llvm-svn: 347756
This caused a miscompile in Chrome (see crbug.com/908372) that's
illustrated by this small reduction:
static bool f(int *a, int *b) {
return !__builtin_constant_p(b - a) || (!(b - a));
}
int arr[] = {1,2,3};
bool g() {
return f(arr, arr + 3);
}
$ clang -O2 -S -emit-llvm a.cc -o -
g() should return true, but after r347417 it became false for some reason.
This also reverts the follow-up commits.
r347417:
> Re-Reinstate 347294 with a fix for the failures.
>
> Don't try to emit a scalar expression for a non-scalar argument to
> __builtin_constant_p().
>
> Third time's a charm!
r347446:
> The result of is.constant() is unsigned.
r347480:
> A __builtin_constant_p() returns 0 with a function type.
r347512:
> isEvaluatable() implies a constant context.
>
> Assume that we're in a constant context if we're asking if the expression can
> be compiled into a constant initializer. This fixes the issue where a
> __builtin_constant_p() in a compound literal was diagnosed as not being
> constant, even though it's always possible to convert the builtin into a
> constant.
r347531:
> A "constexpr" is evaluated in a constant context. Make sure this is reflected
> if a __builtin_constant_p() is a part of a constexpr.
llvm-svn: 347656
For the NVPTX target default locations should be emitted as constants +
additional info must be emitted in the reserved_2 field of the ident_t
structure. The 1st bit controls the execution mode and the 2nd bit
controls use of the lightweight runtime. The combination of the bits for
Non-SPMD mode + lightweight runtime represents special undefined mode,
used outside of the target regions for orphaned directives or functions.
Should allow and additional optimization inside of the target regions.
llvm-svn: 347425
The base pointer for the lambda mapping must point to the lambda capture
placement and pointer must point to the captured variable itself. Patch
fixes this problem.
llvm-svn: 346408
Fixed lookup for the target regions in unused virtual functions + fixed
processing of the global variables not marked as declare target but
emitted during debug info emission.
llvm-svn: 346343
The previously used combination `PTR_AND_OBJ | PRIVATE` could be used for mapping of some data in Fortran. Changed it to `PTR_AND_OBJ | LITERAL`.
llvm-svn: 345982
Added support for mapping of lambdas in the target regions. It scans all
the captures by reference in the lambda, implicitly maps those variables
in the target region and then later reinstate the addresses of
references in lambda to the correct addresses of the captured|privatized
variables.
llvm-svn: 345609
Summary: This patch adds a new code generation path for bound sharing directives containing distribute parallel for. The new code generation scheme applies to chunked schedules on distribute and parallel for directives. The scheme simplifies the code that is being generated by eliminating the need for an outer for loop over chunks for both distribute and parallel for directives. In the case of distribute it applies to any sized chunk while in the parallel for case it only applies when chunk size is 1.
Reviewers: ABataev, caomhin
Reviewed By: ABataev
Subscribers: jholewinski, guansong, cfe-commits
Differential Revision: https://reviews.llvm.org/D53448
llvm-svn: 345509
Summary:
For the following code:
```
int i;
#pragma omp taskloop
for (i = 0; i < 100; ++i)
{}
#pragma omp taskloop nogroup
for (i = 0; i < 100; ++i)
{}
```
Clang emits the following LLVM IR:
```
...
call void @__kmpc_taskgroup(%struct.ident_t* @0, i32 %0)
%2 = call i8* @__kmpc_omp_task_alloc(%struct.ident_t* @0, i32 %0, i32 1, i64 80, i64 8, i32 (i32, i8*)* bitcast (i32 (i32, %struct.kmp_task_t_with_privates*)* @.omp_task_entry. to i32 (i32, i8*)*))
...
call void @__kmpc_taskloop(%struct.ident_t* @0, i32 %0, i8* %2, i32 1, i64* %8, i64* %9, i64 %13, i32 0, i32 0, i64 0, i8* null)
call void @__kmpc_end_taskgroup(%struct.ident_t* @0, i32 %0)
...
%15 = call i8* @__kmpc_omp_task_alloc(%struct.ident_t* @0, i32 %0, i32 1, i64 80, i64 8, i32 (i32, i8*)* bitcast (i32 (i32, %struct.kmp_task_t_with_privates.1*)* @.omp_task_entry..2 to i32 (i32, i8*)*))
...
call void @__kmpc_taskloop(%struct.ident_t* @0, i32 %0, i8* %15, i32 1, i64* %21, i64* %22, i64 %26, i32 0, i32 0, i64 0, i8* null)
```
The first set of instructions corresponds to the first taskloop construct. It is important to note that the implicit taskgroup region associated with the taskloop construct has been materialized in our IR: the `__kmpc_taskloop` occurs inside a taskgroup region. Note also that this taskgroup region does not exist in our second taskloop because we are using the `nogroup` clause.
The issue here is the 4th argument of the kmpc_taskloop call, starting from the end, is always a zero. Checking the LLVM OpenMP RT implementation, we see that this argument corresponds to the nogroup parameter:
```
void __kmpc_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int if_val,
kmp_uint64 *lb, kmp_uint64 *ub, kmp_int64 st, int nogroup,
int sched, kmp_uint64 grainsize, void *task_dup);
```
So basically we always tell to the RT to do another taskgroup region. For the first taskloop, this means that we create two taskgroup regions. For the second example, it means that despite the fact we had a nogroup clause we are going to have a taskgroup region, so we unnecessary wait until all descendant tasks have been executed.
Reviewers: ABataev
Reviewed By: ABataev
Subscribers: rogfer01, cfe-commits
Differential Revision: https://reviews.llvm.org/D53636
llvm-svn: 345180
Fixed emission of the __kmpc_global_thread_num() so that it is not
messed up with alloca instructions anymore. Plus, fixes emission of the
__kmpc_global_thread_num() functions in the target outlined regions so
that they are not called before runtime is initialized.
llvm-svn: 343856
Add support for OMP5.0 requires directive and unified_address clause.
Patches to follow will include support for additional clauses.
Differential Revision: https://reviews.llvm.org/D52359
llvm-svn: 343063
Comparison functions used in sorting algorithms need to have strict weak
ordering. Remove the assert and allow comparisons on all lists.
llvm-svn: 342774
declare reduction.
If the declare reduction construct with the non-dependent type is
defined in the template construct, the compiler might crash on the
template instantition. Reworked the whole instantiation scheme for the
declare reduction constructs to fix this problem correctly.
llvm-svn: 342151
Currently ident_t objects are created const when debug info is not
enabled, but the libittnotify libray in the OpenMP runtime writes to
the reserved_2 field (See __kmp_itt_region_forking in
openmp/runtime/src/kmp_itt.inl). Now create ident_t objects non-const.
Differential Revision: https://reviews.llvm.org/D51331
llvm-svn: 340934
The compiler may produce unexpected error messages/crashes when declare
target variables were used. Patch fixes problems with the declarations
marked as declare target to or link.
llvm-svn: 339805