Commit Graph

157 Commits

Author SHA1 Message Date
James Y Knight 9871db064d [opaque pointer types] Pass function types for runtime function calls.
Emit{Nounwind,}RuntimeCall{,OrInvoke} have been modified to take a
FunctionCallee as an argument, and CreateRuntimeFunction has been
modified to return a FunctionCallee. All callers have been updated.

Additionally, CreateBuiltinFunction is removed, as it was redundant
with CreateRuntimeFunction after some previous changes.

Differential Revision: https://reviews.llvm.org/D57668

llvm-svn: 353184
2019-02-05 16:42:33 +00:00
Michael Kruse 251e1488e1 [OpenMP 5.0] Parsing/sema support for "omp declare mapper" directive.
This patch implements parsing and sema for "omp declare mapper"
directive. User defined mapper, i.e., declare mapper directive, is a new
feature in OpenMP 5.0. It is introduced to extend existing map clauses
for the purpose of simplifying the copy of complex data structures
between host and device (i.e., deep copy). An example is shown below:

    struct S {  int len;  int *d; };
    #pragma omp declare mapper(struct S s) map(s, s.d[0:s.len]) // Memory region that d points to is also mapped using this mapper.

Contributed-by: Lingda Li <lildmh@gmail.com>

Differential Revision: https://reviews.llvm.org/D56326

llvm-svn: 352906
2019-02-01 20:25:04 +00:00
Alexey Bataev e4e9ba2bea [OPENMP][NVPTX]Emit service debug variable for NVPTX.
In case of the empty module, the ptxas tool may emit error message about
empty debug info sections. This patch fixes this bug.

llvm-svn: 352421
2019-01-28 20:03:02 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Alexey Bataev 7bb3353f6a [OPENMP]Add call to __kmpc_push_target_tripcount() function.
Each we create the target regions with the teams distribute inner
region, we can better estimate number of the teams required to execute
the target region. Function __kmpc_push_target_tripcount() is used for
purpose, which accepts device_id and the number of the iterations,
performed by the associated loop.

llvm-svn: 350571
2019-01-07 21:30:43 +00:00
Alexey Bataev 25d3de8a0a [OPENMP][NVPTX]Reduce number of barriers in reductions.
After the fix for the syncthreads we don't need to generate extra
barriers for the parallel reductions.

llvm-svn: 350530
2019-01-07 15:45:09 +00:00
Alexey Bataev 8e009036c9 [OPENMP][NVPTX]Use new functions from the runtime library.
Updated codegen to use the new functions from the runtime library.

llvm-svn: 350415
2019-01-04 17:25:09 +00:00
Alexey Bataev a3924b517e [OPENMP][NVPTX]Use __kmpc_barrier_simple_spmd(nullptr, 0) instead of
nvvm_barrier0.

Use runtime functions instead of the direct call to the nvvm intrinsics.
It allows to prevent some dangerous LLVM optimizations, that breaks the
code for the NVPTX target.

llvm-svn: 350328
2019-01-03 16:25:35 +00:00
Alexey Bataev 6a1b06bcd4 [OPENMP][NVPTX]Emit shared memory buffer for reduction as 128 bytes
buffer.

Seems to me, nvlink has a bug with the proper support of the weakly
linked symbols. It does not allow to define several shared memory buffer
with the different sizes even with the weak linkage. Instead we always
use 128 bytes buffer to prevent nvlink from the error message emission.

llvm-svn: 349540
2018-12-18 21:01:42 +00:00
Alexey Bataev 29d47fcb30 [OPENMP][NVPTX]Added extra sync point to the inter-warp copy function.
The parallel reduction operation requires an extra synchronization point
in the inter-warp copy function to avoid divergence.

llvm-svn: 349525
2018-12-18 19:20:15 +00:00
Alexey Bataev ae51b96f99 [OPENMP][NVPTX]Improved interwarp copy function.
Inlined runtime with the current implementation of the interwarp copy
function leads to the undefined behavior because of the not quite
correct implementation of the barriers. Start using generic
__kmpc_barier function instead of the custom made barriers.

llvm-svn: 349192
2018-12-14 21:00:58 +00:00
Raphael Isemann b23ccecbb0 Misc typos fixes in ./lib folder
Summary: Found via `codespell -q 3 -I ../clang-whitelist.txt -L uint,importd,crasher,gonna,cant,ue,ons,orign,ned`

Reviewers: teemperor

Reviewed By: teemperor

Subscribers: teemperor, jholewinski, jvesely, nhaehnle, whisperity, jfb, cfe-commits

Differential Revision: https://reviews.llvm.org/D55475

llvm-svn: 348755
2018-12-10 12:37:46 +00:00
Alexey Bataev 6393eb7ec6 [OPENMP][NVPTX] Fix globalization of the mapped array sections.
If the array section is based on pointer and this sections is mapped in
target region + then it is used in the inner parallel region, it also
must be globalized as the pointer itself is passed by value, not by
reference.

llvm-svn: 348492
2018-12-06 15:35:13 +00:00
Alexey Bataev 2c1ff9dda7 [OPENMP][NVPTX]Fixed emission of the critical region.
Critical regions in NVPTX are the constructs, which, generally speaking,
are not supported by the NVPTX target. Instead we're using special
technique to handle the critical regions. Currently they are supported
only within the loop and all the threads in the loop must execute the
same critical region.
Inside of this special regions the regions still must be emitted as
critical, to avoid possible data races between the teams +
synchronization must use __kmpc_barrier functions.

llvm-svn: 348272
2018-12-04 15:25:01 +00:00
Alexey Bataev c3028cac24 [OPENMP][NVPTX]Mark __kmpc_barrier functions as convergent.
__kmpc_barrier runtime functions must be marked as convergent to prevent
some dangerous optimizations. Also, for NVPTX target all barriers must
be emitted as simple barriers.

llvm-svn: 348271
2018-12-04 15:03:25 +00:00
Alexey Bataev 3ce5d827fc [OPENMP][NVPTX]Call get __kmpc_global_thread_num in worker after
initialization.

Function __kmpc_global_thread_num  should be called only after
initialization, not earlier.

llvm-svn: 347919
2018-11-29 21:21:32 +00:00
Gheorghe-Teodor Bercea 2b40470c61 [OpenMP] Add a new version of the SPMD deinit kernel function
Summary: This patch adds a new runtime for the SPMD deinit kernel function which replaces the previous function. The new function takes as argument the flag which signals whether the runtime is required or not. This enables the compiler to optimize out the part of the deinit function which are not needed.

Reviewers: ABataev, caomhin

Reviewed By: ABataev

Subscribers: jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D54970

llvm-svn: 347915
2018-11-29 20:53:49 +00:00
Alexey Bataev a116602475 [OPENMP][NVPTX]Basic support for reductions across the teams.
Added basic codegen support for the reductions across the teams.

llvm-svn: 347715
2018-11-27 21:24:54 +00:00
Alexey Bataev e8ad4b7124 [OPENMP][NVPTX]Emit default locations with the correct Exec|Runtime
modes.

If the region is inside target|teams|distribute region, we can emit the
locations with the correct info for execution mode and runtime mode.
Patch adds this ability to the NVPTX codegen to help the optimizer to
produce better code.

llvm-svn: 347583
2018-11-26 18:37:09 +00:00
Alexey Bataev ceeaa48052 [OPENMP][NVPTX]Emit default locations as constant with undefined mode.
For the NVPTX target default locations should be emitted as constants +
additional info must be emitted in the reserved_2 field of the ident_t
structure. The 1st bit controls the execution mode and the 2nd bit
controls use of the lightweight runtime. The combination of the bits for
Non-SPMD mode + lightweight runtime represents special undefined mode,
used outside of the target regions for orphaned directives or functions.
Should allow and additional optimization inside of the target regions.

llvm-svn: 347425
2018-11-21 21:04:34 +00:00
Patrick Lyster 8f7f586e53 [OpenMP] Check target architecture supports unified shared memory for requires directive. Differential Review: https://reviews.llvm.org/D54493
llvm-svn: 347214
2018-11-19 15:09:33 +00:00
David L. Jones 085ec01d6b Fix unused variable warning.
llvm-svn: 347133
2018-11-17 04:48:54 +00:00
Alexey Bataev f2f39be9ed [OPENMP][NVPTX]Emit correct reduction code for teams/parallel
reductions.

Fixed previously committed code for the reduction support in
teams/parallel constructs taking into account new design of the NVPTX
support in the compiler. Teams reduction are not fully functional yet,
it is going to be fixed in the following patches.

llvm-svn: 347081
2018-11-16 19:38:21 +00:00
Alexey Bataev 8bcc69c054 [OPENMP][NVPTX]Extend number of constructs executed in SPMD mode.
If the statements between target|teams|distribute directives does not
require execution in master thread, like constant expressions, null
statements, simple declarations, etc., such construct can be xecuted in
SPMD mode.

llvm-svn: 346551
2018-11-09 20:03:19 +00:00
Alexey Bataev 09c9eea78f [OPENMP][NVPTX]Allow to use shared memory for the
target|teams|distribute variables.

If the total size of the variables, declared in target|teams|distribute
regions, is less than the maximal size of shared memory available, the
buffer is allocated in the shared memory.

llvm-svn: 346507
2018-11-09 16:18:04 +00:00
Alexey Bataev 1fc1f8e819 [OPENMP][NVPTX]Use __kmpc_data_sharing_coalesced_push_stack function.
Coalesced memory access requires use of the new function
`__kmpc_data_sharing_coalesced_push_stack` instead of the
`__kmpc_data_sharing_push_stack`.

llvm-svn: 345991
2018-11-02 16:08:31 +00:00
Alexey Bataev e40901806f [OPENMP][NVPTX]Improve emission of the globalized variables for
target/teams/distribute regions.

Target/teams/distribute regions exist for all the time the kernel is
executed. Thus, if the variable is declared in their context and then
escape it, we can allocate global memory statically instead of
allocating it dynamically.
Patch captures all the globalized variables in target/teams/distribute
contexts, merges them into the records, one per each target region.
Those records are then joined into the union, one per compilation unit
(to save the global memory). Those units are organized into
2 x dimensional arrays, where the first dimension is
the number of blocks per SM and the second one is the number of SMs.
Runtime functions manage this global memory space between the executing
teams.

llvm-svn: 345978
2018-11-02 14:54:07 +00:00
Alexey Bataev 6070542296 [OPENMP] Support for mapping of the lambdas in target regions.
Added support for mapping of lambdas in the target regions. It scans all
the captures by reference in the lambda, implicitly maps those variables
in the target region and then later reinstate the addresses of
references in lambda to the correct addresses of the captured|privatized
variables.

llvm-svn: 345609
2018-10-30 15:50:12 +00:00
Gheorghe-Teodor Bercea e92567601b [OpenMP][NVPTX] Use single loops when generating code for distribute parallel for
Summary: This patch adds a new code generation path for bound sharing directives containing distribute parallel for. The new code generation scheme applies to chunked schedules on distribute and parallel for directives. The scheme simplifies the code that is being generated by eliminating the need for an outer for loop over chunks for both distribute and parallel for directives. In the case of distribute it applies to any sized chunk while in the parallel for case it only applies when chunk size is 1.

Reviewers: ABataev, caomhin

Reviewed By: ABataev

Subscribers: jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D53448

llvm-svn: 345509
2018-10-29 15:45:47 +00:00
Gheorghe-Teodor Bercea 669dbde7a5 [OpenMP][NVPTX] Enable default scheduling for parallel for in non-SPMD cases.
Summary: This patch enables the choosing of the default schedule for parallel for loops even in non-SPMD cases.

Reviewers: ABataev, caomhin

Reviewed By: ABataev

Subscribers: jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D53443

llvm-svn: 345507
2018-10-29 15:23:23 +00:00
Alexey Bataev 93a38d60be [OPENMP][NVPTX]Increment iterator only when it is used, NFC.
llvm-svn: 344574
2018-10-16 00:09:06 +00:00
Alexey Bataev 4ac58d1a4b [OPENMP][NVPTX]Reduce memory usage in target region.
Additional reduction of the global memory usage in the target regions
without parallel regions.

llvm-svn: 344413
2018-10-12 20:19:59 +00:00
Alexey Bataev 9bfe91da3d [OPENMP][NVPTX]Reduce memory usage in orphaned functions.
if the function has globalized variables and called in context of
target/teams/distribute regions, it does not need to globalize 32
copies of the same variables for memory coalescing, it is enough to
have just one copy, because there is parallel region.
Patch does this by adding call for `__kmpc_parallel_level` function and
checking its return value. If the code sees that the parallel level is
0, then only one variable is allocated, not 32.

llvm-svn: 344356
2018-10-12 16:04:20 +00:00
Alexey Bataev ff23bb6622 [OPENMP][NVPTX]Reduce memory use for globalized vars in
target/teams/distribute regions.

Previously introduced globalization scheme that uses memory coalescing
scheme may increase memory usage fr the variables that are devlared in
target/teams/distribute contexts. We don't need 32 copies of such
variables, just 1. Patch reduces memory use in this case.

llvm-svn: 344273
2018-10-11 18:30:31 +00:00
Alexey Bataev 9ea3c38597 [OPENMP][NVPTX] Support memory coalescing for globalized variables.
Added support for memory coalescing for better performance for
globalized variables. From now on all the globalized variables are
represented as arrays of 32 elements and each thread accesses these
elements using `tid & 31` as index.

llvm-svn: 344049
2018-10-09 14:49:00 +00:00
Alexey Bataev 6bc2732f71 [OPENMP][NVPTX] Fix emission of __kmpc_global_thread_num() for non-SPMD
mode.

__kmpc_global_thread_num() should be called before initialization of the
runtime.

llvm-svn: 343857
2018-10-05 15:27:47 +00:00
Alexey Bataev fd006c44ee [OPENMP] Fix emission of the __kmpc_global_thread_num.
Fixed emission of the __kmpc_global_thread_num() so that it is not
messed up with alloca instructions anymore. Plus, fixes emission of the
__kmpc_global_thread_num() functions in the target outlined regions so
that they are not called before runtime is initialized.

llvm-svn: 343856
2018-10-05 15:08:53 +00:00
Jonas Hahnfeld 3ca4701d35 [OpenMP][NVPTX] Simplify codegen for orphaned parallel, NFCI.
Worker threads fork off to the compiler generated worker function
directly after entering the kernel function. Hence, there is no
need to check whether the current thread is the master if we are
outside of a parallel region (neither SPMD nor parallel_level > 0).

Differential Revision: https://reviews.llvm.org/D52732

llvm-svn: 343618
2018-10-02 19:12:54 +00:00
Alexey Bataev e79451a517 [OPENMP][NVPTX] Handle `requires datasharing` flag correctly with
lightweight runtime.

The datasharing flag must be set to `1` when executing SPMD-mode compatible directive with reduction|lastprivate clauses.

llvm-svn: 343492
2018-10-01 16:20:57 +00:00
Alexey Bataev 4da1d5c33f [OPENMP] Simplify code, NFC.
llvm-svn: 343483
2018-10-01 14:40:06 +00:00
Gheorghe-Teodor Bercea 8233af90e1 [OpenMP] Make default parallel for schedule in NVPTX target regions in SPMD mode achieve coalescing
Summary: Set default schedule for parallel for loops to schedule(static, 1) when using SPMD mode on the NVPTX device offloading toolchain to ensure coalescing.

Reviewers: ABataev, Hahnfeld, caomhin

Reviewed By: ABataev

Subscribers: jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D52629

llvm-svn: 343260
2018-09-27 20:29:00 +00:00
Gheorghe-Teodor Bercea 02650d4c2c [OpenMP] Make default distribute schedule for NVPTX target regions in SPMD mode achieve coalescing
Summary: For the OpenMP NVPTX toolchain choose a default distribute schedule that ensures coalescing on the GPU when in SPMD mode. This significantly increases the performance of offloaded target code and reduces the number of registers used on the GPU side.

Reviewers: ABataev, caomhin, Hahnfeld

Reviewed By: ABataev, Hahnfeld

Subscribers: Hahnfeld, jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D52434

llvm-svn: 343253
2018-09-27 19:22:56 +00:00
Kelvin Li 1408f91a25 [OPENMP] Add support for OMP5 requires directive + unified_address clause
Add support for OMP5.0 requires directive and unified_address clause.
Patches to follow will include support for additional clauses.

Differential Revision: https://reviews.llvm.org/D52359

llvm-svn: 343063
2018-09-26 04:28:39 +00:00
Alexey Bataev 2adecff1aa [OPENMP][NVPTX] Enable support for lastprivates in SPMD constructs.
Previously we could not use lastprivates in SPMD constructs, patch
allows supporting lastprivates in SPMD with uninitialized runtime.

llvm-svn: 342738
2018-09-21 14:22:53 +00:00
Alexey Bataev bd8ff9bd70 [OPENMP] Fix PR38710: static functions are not emitted as implicitly
'declare target'.

All the functions, referenced in implicit|explicit target regions must
be emitted during code emission for the device.

llvm-svn: 341093
2018-08-30 18:56:11 +00:00
Alexey Bataev 80a9a61ded [OPENMP][NVPTX] Add options -f[no-]openmp-cuda-force-full-runtime.
Added options -f[no-]openmp-cuda-force-full-runtime to [not] force use
of the full runtime for OpenMP offloading to CUDA devices.

llvm-svn: 341073
2018-08-30 14:45:24 +00:00
Alexey Bataev 8d8e1235ab [OPENMP][NVPTX] Add support for lightweight runtime.
If the target construct can be executed in SPMD mode + it is a loop
based directive with static scheduling, we can use lightweight runtime
support.

llvm-svn: 340953
2018-08-29 18:32:21 +00:00
Alexey Bataev 97b722121e [OPENMP] Fix processing of declare target construct.
The attribute marked as inheritable since OpenMP 5.0 supports it +
additional fixes to support new functionality.

llvm-svn: 339704
2018-08-14 18:31:20 +00:00
Stephen Kelly f2ceec4811 Port getLocStart -> getBeginLoc
Reviewers: teemperor!

Subscribers: jholewinski, whisperity, jfb, cfe-commits

Differential Revision: https://reviews.llvm.org/D50350

llvm-svn: 339385
2018-08-09 21:08:08 +00:00
Alexey Bataev 8521ff6ec4 [OPENMP] ThreadId in serialized parallel regions is 0.
The first argument for the parallel outlined functions, called as
serialized parallel regions, should be a pointer to the global thread id
that always is 0.

llvm-svn: 337957
2018-07-25 20:03:01 +00:00