Commit Graph

220 Commits

Author SHA1 Message Date
Gheorghe-Teodor Bercea 8233af90e1 [OpenMP] Make default parallel for schedule in NVPTX target regions in SPMD mode achieve coalescing
Summary: Set default schedule for parallel for loops to schedule(static, 1) when using SPMD mode on the NVPTX device offloading toolchain to ensure coalescing.

Reviewers: ABataev, Hahnfeld, caomhin

Reviewed By: ABataev

Subscribers: jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D52629

llvm-svn: 343260
2018-09-27 20:29:00 +00:00
Gheorghe-Teodor Bercea 02650d4c2c [OpenMP] Make default distribute schedule for NVPTX target regions in SPMD mode achieve coalescing
Summary: For the OpenMP NVPTX toolchain choose a default distribute schedule that ensures coalescing on the GPU when in SPMD mode. This significantly increases the performance of offloaded target code and reduces the number of registers used on the GPU side.

Reviewers: ABataev, caomhin, Hahnfeld

Reviewed By: ABataev, Hahnfeld

Subscribers: Hahnfeld, jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D52434

llvm-svn: 343253
2018-09-27 19:22:56 +00:00
Alexey Bataev e6aa4694de [OPENMP] Fix PR38903: Crash on instantiation of the non-dependent
declare reduction.

If the declare reduction construct with the non-dependent type is
defined in the template construct, the compiler might crash on the
template instantition. Reworked the whole instantiation scheme for the
declare reduction constructs to fix this problem correctly.

llvm-svn: 342151
2018-09-13 16:54:05 +00:00
Alexey Bataev f138fda5ed [OPENMP] Fix emission of the loop doacross constructs.
The number of loops associated with the OpenMP loop constructs should
not be considered as the number loops to collapse.

llvm-svn: 339603
2018-08-13 19:04:24 +00:00
Alexey Bataev 23647171ea Revert "[OPENMP] Fix emission of the loop doacross constructs."
This reverts commit r339568 because of the problems with the buildbots.

llvm-svn: 339574
2018-08-13 14:42:18 +00:00
Alexey Bataev 0ce6360e0e [OPENMP] Fix emission of the loop doacross constructs.
The number of loops associated with the OpenMP loop constructs should
not be considered as the number loops to collapse.

llvm-svn: 339568
2018-08-13 14:05:43 +00:00
Alexey Bataev bf8fe71b91 [OPENMP] Mark variables captured in declare target region as implicitly
declare target.

According to OpenMP 5.0, variables captured in lambdas in declare target
regions must be considered as implicitly declare target.

llvm-svn: 339152
2018-08-07 16:14:36 +00:00
Alexey Bataev c52f01d1d7 [OPENMP] Fix checks for declare target link entries.
If the declare target link entries are created but not used, the
compiler will produce an error message. Patch improves handling of such
situations + improves checks for possibly lost declare target variables.

llvm-svn: 337207
2018-07-16 20:05:25 +00:00
Adrian Prantl 9fc8faf9e6 Remove \brief commands from doxygen comments.
This is similar to the LLVM change https://reviews.llvm.org/D46290.

We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46320

llvm-svn: 331834
2018-05-09 01:00:01 +00:00
Alexey Bataev 6d94410914 [OPENMP] Support C++ member functions in the device constructs.
Added correct emission of the C++ member functions for the device
function when they are used in the device constructs.

llvm-svn: 331365
2018-05-02 15:45:28 +00:00
Alexey Bataev 18fa2323b6 [OPENMP] Emit names of the globals depending on target.
Some symbols are not allowed to be used as names on some targets. Patch
ries to unify the emission of the names of LLVM globals so they could be
used on different targets.

llvm-svn: 331358
2018-05-02 14:20:50 +00:00
Alexey Bataev a4fa0b880a [OPENMP] General code improvements.
llvm-svn: 330140
2018-04-16 17:59:34 +00:00
Alexander Kornienko 2a8c18d991 Fix typos in clang
Found via codespell -q 3 -I ../clang-whitelist.txt
Where whitelist consists of:

  archtype
  cas
  classs
  checkk
  compres
  definit
  frome
  iff
  inteval
  ith
  lod
  methode
  nd
  optin
  ot
  pres
  statics
  te
  thru

Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few
files that have dubious fixes reverted.)

Differential revision: https://reviews.llvm.org/D44188

llvm-svn: 329399
2018-04-06 15:14:32 +00:00
Alexey Bataev 03f270c900 [OPENMP] Added emission of offloading data sections for declare target
variables.

Added emission of the offloading data sections for the variables within
declare target regions + fixes emission of the declare target variables
marked as declare target not within the declare target region.

llvm-svn: 328888
2018-03-30 18:31:07 +00:00
Alexey Bataev 34f8a7043b [OPENMP] Codegen for ctor|dtor of declare target variables.
When the declare target variables are emitted for the device,
constructors|destructors for these variables must emitted and registered
by the runtime in the offloading sections.

llvm-svn: 328705
2018-03-28 14:28:54 +00:00
Alexey Bataev 92327c50d3 [OPENMP] Codegen for declare target with link clause.
If the link clause is used on the declare target directive, the object
should be linked on target or target data directives, not during the
codegen. Patch adds support for this clause.

llvm-svn: 328544
2018-03-26 16:40:55 +00:00
Alexey Bataev b7f3cba84c [OPENMP, NVPTX] Emit correct thread id.
We emitted fake thread id for the outined function in NVPTX codegen.
Patch adds emission of the real thread id.

llvm-svn: 327867
2018-03-19 17:04:07 +00:00
Alexey Bataev 4f4bf7c348 [OPENMP] Codegen for `omp declare target` construct.
Added initial codegen for device side of declarations inside `omp
declare target` construct + codegen for implicit `declare target`
functions, which are used in the target regions.

llvm-svn: 327636
2018-03-15 15:47:20 +00:00
Gheorghe-Teodor Bercea d3dcf2f05d [OpenMP] Add OpenMP data sharing infrastructure using global memory
Summary:
This patch handles the Clang code generation phase for the OpenMP data sharing infrastructure.

TODO: add a more detailed description.

Reviewers: ABataev, carlo.bertolli, caomhin, hfinkel, Hahnfeld

Reviewed By: ABataev

Subscribers: jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D43660

llvm-svn: 327513
2018-03-14 14:17:45 +00:00
Alexey Bataev 1c44e15f6d [OPENMP] Fix generation of the unique names for task reduction
variables.

If the task has reduction construct and this construct for some variable
requires unique threadprivate storage, we may generate different names
for variables used in taskgroup task_reduction clause and in task
  in_reduction clause. Patch fixes this problem.

llvm-svn: 326827
2018-03-06 18:59:43 +00:00
Alexey Bataev 7ef47a67a5 [OPENMP] Require valid SourceLocation in function call, NFC.
Removed default empty SourceLocation argument from `emitCall` function
and require valid location.

llvm-svn: 325812
2018-02-22 18:33:31 +00:00
Alexey Bataev 8451efad89 [OPENMP] Add codegen for `depend` clauses on `target` directive.
Added basic support for codegen of `depend` clauses on `target`
directive.

llvm-svn: 322501
2018-01-15 19:06:12 +00:00
Alexey Bataev 7cae94e74c [OPENMP] Add debug info for generated functions.
Most of the generated functions for the OpenMP were generated with
disabled debug info. Patch fixes this for better user experience.

llvm-svn: 321816
2018-01-04 19:45:16 +00:00
Alexey Bataev a8a9153a37 [OPENMP] Support for -fopenmp-simd option with compilation of simd loops
only.

Added support for -fopenmp-simd option that allows compilation of
simd-based constructs without emission of OpenMP runtime calls.

llvm-svn: 321560
2017-12-29 18:07:07 +00:00
Alexey Bataev e213f3e61a [OPENMP] Fix PR34916: Crash on mixing taskloop|tasks directives.
If both taskloop and task directives are used at the same time in one
program, we may ran into the situation when the particular type for task
directive is reused for taskloop directives. Patch fixes this problem.

llvm-svn: 315464
2017-10-11 15:29:40 +00:00
Alexey Bataev f43f714213 [OPENMP] Fix for PR33922: New ident_t flags for
__kmpc_for_static_fini().

Added special flags for calls of __kmpc_for_static_fini(), like previous
ly for __kmpc_for_static_init(). Added flag OMP_IDENT_WORK_DISTRIBUTE
for distribute cnstruct, OMP_IDENT_WORK_SECTIONS for sections-based
  constructs and OMP_IDENT_WORK_LOOP for loop-based constructs in
  location flags.

llvm-svn: 312642
2017-09-06 16:17:35 +00:00
Alexey Bataev 0f87dbee4e [OPENMP] Fix for PR33922: New ident_t flags for
__kmpc_for_static_init().

OpenMP 5.0 will include OpenMP Tools interface that requires distinguishing different worksharing constructs.

Since the same entry point (__kmp_for_static_init(ident_t *loc,
kmp_int32 global_tid,........)) is called in case static
loop/sections/distribute it is suggested using 'flags' field of the
ident_t structure to pass the type of the construct.

llvm-svn: 310865
2017-08-14 17:56:13 +00:00
Alexey Bataev 3c595a6b2c [OPENMP] Generalization of calls of the outlined functions.
General improvement of the outlined functions calls.

llvm-svn: 310840
2017-08-14 15:01:03 +00:00
Alexey Bataev 3b8d5586ec [OPENMP][DEBUG] Set proper address space info if required by target.
Arguments, passed to the outlined function, must have correct address
space info for proper Debug info support. Patch sets global address
space for arguments that are mapped and passed by reference.

Also, cuda-gdb does not handle reference types correctly, so reference
arguments are represented as pointers.

llvm-svn: 310387
2017-08-08 18:04:06 +00:00
Alexey Bataev 4aa19052f3 Revert "[OPENMP][DEBUG] Set proper address space info if required by target."
This reverts commit r310377.

llvm-svn: 310379
2017-08-08 16:45:36 +00:00
Alexey Bataev 5a497136be [OPENMP][DEBUG] Set proper address space info if required by target.
Arguments, passed to the outlined function, must have correct address
space info for proper Debug info support. Patch sets global address
space for arguments that are mapped and passed by reference.

Also, cuda-gdb does not handle reference types correctly, so reference
arguments are represented as pointers.

llvm-svn: 310377
2017-08-08 16:29:11 +00:00
Alexey Bataev 6a824b9a45 Revert "[OPENMP][DEBUG] Set proper address space info if required by target."
This reverts commit r310360.

llvm-svn: 310364
2017-08-08 14:44:43 +00:00
Alexey Bataev 59b81e51d3 [OPENMP][DEBUG] Set proper address space info if required by target.
Arguments, passed to the outlined function, must have correct address
space info for proper Debug info support. Patch sets global address
space for arguments that are mapped and passed by reference.

Also, cuda-gdb does not handle reference types correctly, so reference
arguments are represented as pointers.

llvm-svn: 310360
2017-08-08 14:25:14 +00:00
Alexey Bataev d90ec748a8 Revert "[OPENMP][DEBUG] Set proper address space info if required by target."
This reverts commit r310104.

llvm-svn: 310135
2017-08-04 21:27:11 +00:00
Alexey Bataev be83fad57e [OPENMP][DEBUG] Set proper address space info if required by target.
Arguments, passed to the outlined function, must have correct address
space info for proper Debug info support. Patch sets global address
space for arguments that are mapped and passed by reference.

Also, cuda-gdb does not handle reference types correctly, so reference
arguments are represented as pointers.

llvm-svn: 310104
2017-08-04 19:46:10 +00:00
Alexey Bataev 2c7eee5b84 [OPENMP] Unify generation of outlined function calls.
llvm-svn: 310098
2017-08-04 19:10:54 +00:00
Alexey Bataev be5a8b42cd [OPENMP] Codegen for reduction clauses in 'taskloop' directives.
Adds codegen for taskloop-based directives.

llvm-svn: 308174
2017-07-17 13:30:36 +00:00
Simon Pilgrim 6c0eeffe71 Fix spelling mistakes in comments. NFCI.
llvm-svn: 307932
2017-07-13 17:34:44 +00:00
Simon Pilgrim f87eb880dd Fix -Wdocumentation warning. NFCI
llvm-svn: 307931
2017-07-13 17:29:48 +00:00
Alexey Bataev 5c40bec5eb [OPENMP] Generalization of codegen for reduction clauses.
Reworked codegen for reduction clauses for future support of reductions
in task-based directives.

llvm-svn: 307910
2017-07-13 13:36:14 +00:00
Carlo Bertolli b0ff0a69c3 Recommit of
[OpenMP] Initial implementation of code generation for pragma 'distribute parallel for' on host

https://reviews.llvm.org/D29508

This patch makes the following additions:

It abstracts away loop bound generation code from procedures associated with pragma 'for' and loops in general, in such a way that the same procedures can be used for 'distribute parallel for' without the need for a full re-implementation.
It implements code generation for 'distribute parallel for' and adds regression tests. It includes tests for clauses.
It is important to notice that most of the clauses are implemented as part of existing procedures. For instance, firstprivate is already implemented for 'distribute' and 'for' as separate pragmas. As the implementation of 'distribute parallel for' is based on the same procedures, then we automatically obtain implementation for such clauses without the need to add new code. However, this requires regression tests that verify correctness of produced code.

llvm-svn: 301340
2017-04-25 17:52:12 +00:00
Carlo Bertolli f09daae75d Revert r301223
llvm-svn: 301233
2017-04-24 19:50:35 +00:00
Carlo Bertolli 4287d65c10 [OpenMP] Initial implementation of code generation for pragma 'distribute parallel for' on host
https://reviews.llvm.org/D29508

This patch makes the following additions:

1. It abstracts away loop bound generation code from procedures associated with pragma 'for' and loops in general, in such a way that the same procedures can be used for 'distribute parallel for' without the need for a full re-implementation.
2. It implements code generation for 'distribute parallel for' and adds regression tests. It includes tests for clauses.

It is important to notice that most of the clauses are implemented as part of existing procedures. For instance, firstprivate is already implemented for 'distribute' and 'for' as separate pragmas. As the implementation of 'distribute parallel for' is based on the same procedures, then we automatically obtain implementation for such clauses without the need to add new code. However, this requires regression tests that verify correctness of produced code.

Looking forward to comments.

llvm-svn: 301223
2017-04-24 19:26:11 +00:00
Simon Pilgrim 2c51880a82 Spelling mistakes in comments. NFCI. (PR27635)
llvm-svn: 299083
2017-03-30 14:13:19 +00:00
Arpith Chacko Jacob 101e8fb1f3 [OpenMP] Parallel reduction on the NVPTX device.
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types.  An efficient
implementation requires hierarchical reduction within a
warp and a threadblock.  It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.

The patch creates a struct to hold reduction variables and
a number of helper functions.  The OpenMP runtime on the GPU
implements reduction algorithms that uses these helper
functions to perform reductions within a team.  Variables are
shared between CUDA threads using shuffle intrinsics.

An implementation of reductions on the NVPTX device is
substantially different to that of CPUs.  However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.

The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions.  The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.

While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.

Patch by Tian Jin in collaboration with Arpith Jacob

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29758

llvm-svn: 295333
2017-02-16 16:20:16 +00:00
Arpith Chacko Jacob bd6344c0be Revert r295319 while investigating buildbot failure.
llvm-svn: 295323
2017-02-16 14:25:35 +00:00
Arpith Chacko Jacob 8e170fc857 [OpenMP] Parallel reduction on the NVPTX device.
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types.  An efficient
implementation requires hierarchical reduction within a
warp and a threadblock.  It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.

The patch creates a struct to hold reduction variables and
a number of helper functions.  The OpenMP runtime on the GPU
implements reduction algorithms that uses these helper
functions to perform reductions within a team.  Variables are
shared between CUDA threads using shuffle intrinsics.

An implementation of reductions on the NVPTX device is
substantially different to that of CPUs.  However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.

The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions.  The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.

While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.

Patch by Tian Jin in collaboration with Arpith Jacob

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29758

llvm-svn: 295319
2017-02-16 14:03:36 +00:00
Arpith Chacko Jacob 19b911cb75 [OpenMP] Codegen support for 'target parallel' on the host.
This patch adds support for codegen of 'target parallel' on the host.
It is also the first combined directive that requires two or more
captured statements.  Support for this functionality is included in
the patch.

A combined directive such as 'target parallel' has two captured
statements, one for the 'target' and the other for the 'parallel'
region.  Two captured statements are required because each has
different implicit parameters (see SemaOpenMP.cpp).  For example,
the 'parallel' has 'global_tid' and 'bound_tid' while the 'target'
does not.  The patch adds support for handling multiple captured
statements based on the combined directive.

When codegen'ing the 'target parallel' directive, the 'target'
outlined function is created using the outer captured statement
and the 'parallel' outlined function is created using the inner
captured statement.

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28753

llvm-svn: 292419
2017-01-18 18:18:53 +00:00
Arpith Chacko Jacob 42793e000a Revert r292374 to debug Windows buildbot failure.
llvm-svn: 292400
2017-01-18 15:36:05 +00:00
Arpith Chacko Jacob 68019578a3 [OpenMP] Codegen support for 'target parallel' on the host.
This patch adds support for codegen of 'target parallel' on the host.
It is also the first combined directive that requires two or more
captured statements.  Support for this functionality is included in
the patch.

A combined directive such as 'target parallel' has two captured
statements, one for the 'target' and the other for the 'parallel'
region.  Two captured statements are required because each has
different implicit parameters (see SemaOpenMP.cpp).  For example,
the 'parallel' has 'global_tid' and 'bound_tid' while the 'target'
does not.  The patch adds support for handling multiple captured
statements based on the combined directive.

When codegen'ing the 'target parallel' directive, the 'target'
outlined function is created using the outer captured statement
and the 'parallel' outlined function is created using the inner
captured statement.

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28753

llvm-svn: 292374
2017-01-18 15:14:52 +00:00
Arpith Chacko Jacob bb36fe8dba [OpenMP] Basic support for a parallel directive in a target region on an NVPTX device
Summary:

This patch introduces support for the execution of parallel constructs in a target
region on the NVPTX device.  Parallel regions must be in the lexical scope of the
target directive.

The master thread in the master warp signals parallel work for worker threads in worker
warps on encountering a parallel region.

Note: The patch does not yet support capture of arguments in a parallel region so
the test cases are simple.

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28145

llvm-svn: 291565
2017-01-10 15:42:51 +00:00
Samuel Antao f83efdb77a [OpenMP] Add fields for flags in the offload entry descriptor.
Summary:
This patch adds two fields to the offload entry descriptor. One field is meant to signal Ctors/Dtors and `link` global variables, and the other is reserved for runtime library use. 

 Currently, these fields are only filled with zeros in the current code generation, but that will change when `declare target` is added. 

The reason, we are adding these fields now is to make the code generation consistent with the runtime library proposal under review in https://reviews.llvm.org/D14031.

Reviewers: ABataev, hfinkel, carlo.bertolli, kkwli0, arpith-jacob, Hahnfeld

Subscribers: cfe-commits, caomhin, jholewinski

Differential Revision: https://reviews.llvm.org/D28298

llvm-svn: 291124
2017-01-05 16:02:49 +00:00
Samuel Antao cc10b85789 [OpenMP] Codegen for use_device_ptr clause.
Summary: This patch adds support for the use_device_ptr clause. It includes changes in SEMA that could not be tested without codegen, namely, the use of the first private logic and mappable expressions support.

Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev

Subscribers: caomhin, cfe-commits

Differential Revision: https://reviews.llvm.org/D22691

llvm-svn: 276977
2016-07-28 14:23:26 +00:00
Samuel Antao 8d2d730f2a [OpenMP] Codegen for target update directive.
Summary: This patch implements the code generation for the `target update` directive. The implemntation relies on the logic already in place for target data standalone directives, i.e. target enter/exit data.

Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev

Subscribers: caomhin, cfe-commits

Differential Revision: http://reviews.llvm.org/D20650

llvm-svn: 270886
2016-05-26 18:30:22 +00:00
Alexey Bataev 8b42706a6e [OPENMP 4.5] Codegen for dacross loop synchronization constructs.
OpenMP 4.5 adds support for doacross loop synchronization. Patch
implements codegen for this construct.

llvm-svn: 270690
2016-05-25 12:36:08 +00:00
Alexey Bataev 1e1e286a6b [OPENMP 4.5] Initial codegen for 'priority' clause in task-based
directives.

OpenMP 4.5 supports clause 'priority' in task-based directives. Patch
adds initial codegen support for this clause in codegen.

llvm-svn: 269050
2016-05-10 12:21:02 +00:00
Alexey Bataev 8a83159731 [OPENMP 4.0] Fixed codegen for destructors in task-based directives.
If private variables require destructors call at the deletion of the
task, additional flag in task flags must be set. Patch fixes this
problem.

llvm-svn: 269039
2016-05-10 10:36:51 +00:00
Alexey Bataev 9ebd742748 [OPENMP 4.5] Add codegen support in runtime for '[non]monotonic'
schedule modifiers.

Runtime library expects some additional data in schedule argument for
loop-based directives, that have additional schedule modifiers
'monotonic|nonmonotonic'.

llvm-svn: 269035
2016-05-10 09:57:36 +00:00
Alexey Bataev c7a82b41a7 [OPENMP 4.0] Codegen for 'declare simd' directive.
OpenMP 4.0 adds support for elemental functions using declarative
directive '#pragma omp declare simd'. Patch adds mangling for simd
functions in accordance with
https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt

llvm-svn: 268721
2016-05-06 09:40:08 +00:00
Alexey Bataev f93095a003 [OPENMP 4.5] Codegen for 'lastprivate' clauses in 'taskloop' directives.
OpenMP 4.5 adds taskloop/taskloop simd directives. These directives
allow to use lastprivate clause. Patch adds codegen for this clause.

llvm-svn: 268618
2016-05-05 08:46:22 +00:00
Alexey Bataev 24b5baed27 [OPENMP] Simplified interface for codegen of tasks, NFC.
Reduced number of arguments in member functions of runtime support
library for task-based directives.

llvm-svn: 267863
2016-04-28 09:23:51 +00:00
Alexey Bataev 2b19a6fe53 [OPENMP 4.5] Codegen for 'grainsize/num_tasks' clauses of 'taskloop'
directive.

OpenMP 4.5 defines 'taskloop' directive and 2 additional clauses
'grainsize' and 'num_tasks' for this directive. Patch adds codegen for
these clauses.
These clauses are generated as arguments of the '__kmpc_taskloop'
libcall and are encoded the following way:

void __kmpc_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int if_val, kmp_uint64 *lb, kmp_uint64 *ub, kmp_int64 st, int nogroup,  int sched, kmp_uint64 grainsize, void *task_dup);

If 'grainsize' is specified, 'sched' argument must be set to '1' and
'grainsize' argument must be set to the value of the 'grainsize' clause.
If 'num_tasks' is specified, 'sched' argument must be set to '2' and
'grainsize' argument must be set to the value of the 'num_tasks' clause.
It is possible because these 2 clauses are mutually exclusive and can't
be used at the same time on the same directive.
If none of these clauses is specified, 'sched' argument must be set to
'0'.

llvm-svn: 267862
2016-04-28 09:15:06 +00:00
NAKAMURA Takumi 01d07dbee5 CGOpenMPRuntime.h: Prune extra comma in \param. [-Wdocumentation]
llvm-svn: 267845
2016-04-28 02:45:21 +00:00
Samuel Antao 8dd6628743 [OpenMP] Code generation for target exit data directive
Summary:
This patch adds support for the target exit data directive code generation.

Given that, apart from the employed runtime call, target exit data requires the same code generation pattern as target enter data, the OpenMP codegen entry point was renamed and reused for both.

Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev

Subscribers: cfe-commits, fraggamuffin, caomhin

Differential Revision: http://reviews.llvm.org/D17369

llvm-svn: 267814
2016-04-27 23:14:30 +00:00
Samuel Antao bd0ae2e14c [OpenMP] Code generation for target enter data directive
Summary: This patch adds support for the target enter data directive code generation.

Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev

Subscribers: cfe-commits, fraggamuffin, caomhin

Differential Revision: http://reviews.llvm.org/D17368

llvm-svn: 267812
2016-04-27 23:07:29 +00:00
Samuel Antao df158d5567 [OpenMP] Code generation for target data directive
Summary:
This patch adds support for the target data directive code generation.

Part of the already existent functionality related with data maps is moved to a new function so that it could be reused.

Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev

Subscribers: cfe-commits, fraggamuffin, caomhin

Differential Revision: http://reviews.llvm.org/D17367

llvm-svn: 267811
2016-04-27 22:58:19 +00:00
NAKAMURA Takumi 90d96bf41e CGOpenMPRuntime.h: Prune '\param IfCond' in r267395. [-Wdocumentation]
llvm-svn: 267503
2016-04-26 00:45:00 +00:00
Alexey Bataev 7292c29bb5 [OPENMP 4.5] Codegen for 'taskloop' directive.
The taskloop construct specifies that the iterations of one or more associated loops will be executed in parallel using OpenMP tasks. The iterations are distributed across tasks created by the construct and scheduled to be executed.
The next code will be generated for the taskloop directive:
    #pragma omp taskloop num_tasks(N) lastprivate(j)
        for( i=0; i<N*GRAIN*STRIDE-1; i+=STRIDE ) {
          int th = omp_get_thread_num();
          #pragma omp atomic
            counter++;
          #pragma omp atomic
            th_counter[th]++;
          j = i;
    }

Generated code:
task = __kmpc_omp_task_alloc(NULL,gtid,1,sizeof(struct
task),sizeof(struct shar),&task_entry);
psh = task->shareds;
psh->pth_counter = &th_counter;
psh->pcounter = &counter;
psh->pj = &j;
task->lb = 0;
task->ub = N*GRAIN*STRIDE-2;
task->st = STRIDE;
__kmpc_taskloop(
NULL,             // location
gtid,             // gtid
task,             // task structure
1,                // if clause value
&task->lb,        // lower bound
&task->ub,        // upper bound
STRIDE,           // loop increment
0,                // 1 if nogroup specified
2,                // schedule type: 0-none, 1-grainsize, 2-num_tasks
N,                // schedule value (ignored for type 0)
(void*)&__task_dup_entry // tasks duplication routine
);

llvm-svn: 267395
2016-04-25 12:22:29 +00:00
Alexey Bataev 48591dd98c [OPENMP] Codegen for untied tasks.
If the untied clause is present on a task construct, any thread in the
team can resume the task region after a suspension. Patch adds proper
codegen for untied tasks.

llvm-svn: 266853
2016-04-20 04:01:36 +00:00
Alexey Bataev 995e861ba6 Revert "[OPENMP] Codegen for untied tasks."
This reverts commit r266754.

llvm-svn: 266755
2016-04-19 16:36:01 +00:00
Alexey Bataev 823acfacdf [OPENMP] Codegen for untied tasks.
If the untied clause is present on a task construct, any thread in the
team can resume the task region after a suspension. Patch adds proper
codegen for untied tasks.

llvm-svn: 266754
2016-04-19 16:27:55 +00:00
Alexey Bataev bec9572213 Revert "[OPENMP] Codegen for untied tasks."
This reverts commit 266722.

llvm-svn: 266724
2016-04-19 09:27:38 +00:00
Alexey Bataev 26b2577f6b [OPENMP] Codegen for untied tasks.
If the untied clause is present on a task construct, any thread in the team can resume the task region after a suspension. Patch adds proper codegen for untied tasks.

llvm-svn: 266722
2016-04-19 09:10:27 +00:00
Carlo Bertolli c687225b43 [OPENMP] Codegen for teams directive for NVPTX
This patch implements the teams directive for the NVPTX backend. It is different from the host code generation path as it:

Does not call kmpc_fork_teams. All necessary teams and threads are started upon touching the target region, when launching a CUDA kernel, and their execution is coordinated through sequential and parallel regions within the target region.
Does not call kmpc_push_num_teams even if a num_teams of thread_limit clause is present. Setting the number of teams and the thread limit is implemented by the nvptx-related runtime.
Please note that I am now passing a Clang Expr * to emitPushNumTeams instead of the originally chosen llvm::Value * type. The reason for that is that I want to avoid emitting expressions for num_teams and thread_limit if they are not needed in the target region.

http://reviews.llvm.org/D17963

llvm-svn: 265304
2016-04-04 15:55:02 +00:00
Alexey Bataev 14fa1c6b60 [OPENMP] Allow runtime insert its own code inside OpenMP regions.
Solution unifies interface of RegionCodeGenTy type to allow insert
runtime-specific code before/after main codegen action defined in
CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy
for general OpenMP directives, but must be allowed to insert its own
 (required) code to support target specific codegen.

llvm-svn: 264700
2016-03-29 05:34:15 +00:00
Alexey Bataev f539faa733 Revert "[OPENMP] Allow runtime insert its own code inside OpenMP regions."
Reverting because of failed tests.

llvm-svn: 264577
2016-03-28 12:58:34 +00:00
Alexey Bataev 424be92831 [OPENMP] Allow runtime insert its own code inside OpenMP regions.
Solution unifies interface of RegionCodeGenTy type to allow insert
runtime-specific code before/after main codegen action defined in
CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy
for general OpenMP directives, but must be allowed to insert its own
 (required) code to support target specific codegen.

llvm-svn: 264576
2016-03-28 12:52:58 +00:00
Alexey Bataev f662b5943c Revert "[OPENMP] Allow runtime insert its own code inside OpenMP regions."
This reverts commit 3ee791165100607178073f14531a0dc90c622b36.

llvm-svn: 264570
2016-03-28 10:12:03 +00:00
Alexey Bataev b8c425c4f7 [OPENMP] Allow runtime insert its own code inside OpenMP regions.
Solution unifies interface of RegionCodeGenTy type to allow insert
runtime-specific code before/after main codegen action defined in
CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy
for general OpenMP directives, but must be allowed to insert its own
  (required) code to support target specific codegen.

llvm-svn: 264569
2016-03-28 09:53:43 +00:00
Arpith Chacko Jacob 5c309e475d [OpenMP] Base support for target directive codegen on NVPTX device.
Summary:
This patch adds base support for codegen of the target directive on the NVPTX device.

Reviewers: ABataev

Differential Revision: http://reviews.llvm.org/D17877

Reworked test case after buildbot failure on windows.
Updated patch to integrate r263837 and test case nvptx_target_firstprivate_codegen.cpp.

llvm-svn: 264018
2016-03-22 01:48:56 +00:00
Arpith Chacko Jacob 129fa9a048 Revert r263783 as buildbot failure is being investigated.
llvm-svn: 263784
2016-03-18 12:39:40 +00:00
Arpith Chacko Jacob ac563708ab [OpenMP] Base support for target directive codegen on NVPTX device.
Summary:
Reworked test case after buildbot failure on windows.

This patch adds base support for codegen of the target directive on the NVPTX device.

Reviewers: ABataev

Differential Revision: http://reviews.llvm.org/D17877

llvm-svn: 263783
2016-03-18 11:47:43 +00:00
Alexey Bataev a839dddf92 [OPENMP 4.0] Use 'declare reduction' constructs in 'reduction' clauses.
OpenMP 4.0 allows to define custom reduction operations using '#pragma
omp declare reduction' construct. Patch allows to use this custom
defined reduction operations in 'reduction' clauses.

llvm-svn: 263701
2016-03-17 10:19:46 +00:00
Arpith Chacko Jacob 9cb61faa61 Revert commit http://reviews.llvm.org/D17877 to fix tests on x86.
llvm-svn: 263589
2016-03-15 21:26:34 +00:00
Arpith Chacko Jacob 5e1493b560 [OpenMP] Base support for target directive codegen on NVPTX device.
Summary:
This patch adds base support for codegen of the target directive on the NVPTX device.

Reviewers: ABataev

Differential Revision: http://reviews.llvm.org/D17877

llvm-svn: 263587
2016-03-15 21:04:57 +00:00
Arpith Chacko Jacob fc46c25d74 Reverted http://reviews.llvm.org/D17877 to fix tests.
llvm-svn: 263555
2016-03-15 16:19:13 +00:00
Arpith Chacko Jacob c61744c26b [OpenMP] Base support for target directive codegen on NVPTX device.
Summary:
This patch adds base support for codegen of the target directive on the NVPTX device.

Reviewers: ABataev

Differential Revision: http://reviews.llvm.org/D17877

llvm-svn: 263552
2016-03-15 15:24:52 +00:00
Carlo Bertolli fc35ad2bbc Reapply r262741 [OPENMP] Codegen for distribute directive
This patch provide basic implementation of codegen for teams directive, excluding all clauses except dist_schedule. It also fixes parts of AST reader/writer to enable correct pre-compiled header handling.

http://reviews.llvm.org/D17170

llvm-svn: 262832
2016-03-07 16:04:49 +00:00
Simon Pilgrim 1feae0d0bb Fixed -Wdocumentation warning - typo in a parameter name
llvm-svn: 262783
2016-03-05 22:35:55 +00:00
Samuel Antao bf4d18d3d2 Revert r262741 - [OPENMP] Codegen for distribute directive
Was causing a failure in one of the buildbot slaves.

llvm-svn: 262744
2016-03-04 21:02:14 +00:00
Carlo Bertolli 4a56e3831d [OPENMP] Codegen for distribute directive
This patch provide basic implementation of codegen for teams directive, excluding all clauses except dist_schedule. It also fixes parts of AST reader/writer to enable correct pre-compiled header handling.

http://reviews.llvm.org/D17170

llvm-svn: 262741
2016-03-04 20:24:58 +00:00
Alexey Bataev c5b1d320b8 [OPENMP 4.0] Codegen for 'declare reduction' construct.
Emit function for 'combiner' part of 'declare reduction' construct and
'initialilzer' part, if any.

llvm-svn: 262699
2016-03-04 09:22:22 +00:00
Carlo Bertolli 430d8ecc55 Add code generation for teams directive inside target region
llvm-svn: 262652
2016-03-03 20:34:23 +00:00
Alexey Bataev 50b3c95992 [OPENMP] Improved layout of CGOpenMPRuntime class, NFC.
llvm-svn: 261315
2016-02-19 10:38:26 +00:00
Samuel Antao 2de62b0c89 [OpenMP] Rename the offload entry points.
Summary:
Unlike other outlined regions in OpenMP, offloading entry points have to have be visible (external linkage) for the device side. Using dots in the names of the entries can be therefore problematic for some toolchains, e.g. NVPTX.

Also the patch drops the column information in the unique name of the entry points. The parsing of directives ignore unknown tokens, preventing several target  regions to be implemented in the same line. Therefore, the line information is sufficient for the name to be unique. Also, the preprocessor printer does not preserve the column information, causing offloading-entry detection issues if the host uses an integrated preprocessor and the target doesn't (or vice versa).

Reviewers: hfinkel, arpith-jacob, carlo.bertolli, kkwli0, ABataev

Subscribers: cfe-commits, fraggamuffin, caomhin

Differential Revision: http://reviews.llvm.org/D17179

llvm-svn: 260837
2016-02-13 23:35:10 +00:00
Benjamin Kramer 8fdba91bc9 Make the remaining headers self-contained.
llvm-svn: 259507
2016-02-02 14:24:21 +00:00
Reid Kleckner dc78f95dc2 Fix -Wmicrosoft-enum-value warning
llvm-svn: 257383
2016-01-11 20:55:16 +00:00
Nico Weber a2abe8c66b Fix -Wdocumentation warning after r256933
llvm-svn: 256960
2016-01-06 19:13:49 +00:00
Samuel Antao ee8fb302f5 [OpenMP] Reapply rL256842: [OpenMP] Offloading descriptor registration and device codegen.
This patch attempts to fix the regressions identified when the patch was committed initially. 

Thanks to Michael Liao for identifying the fix in the offloading metadata generation 
related with side effects in evaluation of function arguments. 
 

llvm-svn: 256933
2016-01-06 13:42:12 +00:00
Samuel Antao 7d5de9a1ee [OpenMP] Revert rL256842: [OpenMP] Offloading descriptor registration and device codegen.
It was causing two regression, so I'm reverting until the cause is found.

llvm-svn: 256858
2016-01-05 19:16:13 +00:00