Commit Graph

35 Commits

Author SHA1 Message Date
Ye Luo 03111e5e7a [OpenMP] Protect unrecognized CUDA error code
If an error code cannot be recognized by cuGetErrorString, errStr remains null and causes a crash when DP() prints it.
Protect this case.
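
A minimal sketch of the guarded lookup (the helper name and fprintf stand in for the plugin's DP() macro and are illustrative, not the actual libomptarget code):

```cpp
#include <cstdio>
#include <cuda.h>

// Hypothetical helper illustrating the guard: only print the message if
// cuGetErrorString succeeded and actually filled in the string pointer.
void reportCUDAError(CUresult Err) {
  const char *ErrStr = nullptr;
  CUresult Rc = cuGetErrorString(Err, &ErrStr);
  if (Rc == CUDA_SUCCESS && ErrStr)
    fprintf(stderr, "CUDA error is: %s\n", ErrStr);
  else
    fprintf(stderr, "Unrecognized CUDA error code: %d\n", (int)Err);
}
```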

Reviewed By: jhuber6, tianshilei1992

Differential Revision: https://reviews.llvm.org/D87980
2020-09-21 13:43:08 -04:00
Joseph Huber ae209397b1 [OpenMP] Begin Printing Information Dumps In Libomptarget and Plugins
Summary:
This patch starts adding support for information dumps to libomptarget
and the RTL plugins. The information printing is controlled by the
LIBOMPTARGET_INFO environment variable introduced in D86483. The goal of this
patch is to provide the user with additional information about the device
during kernel execution and with information dumps in the case of failure.
This patch adds the ability to dump the pointer mapping table as well as to
print the number of blocks and threads in the CUDA RTL.
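
As an illustration only (the env-var name comes from D86483; the parsing helper and the bit assignment below are assumptions, not the actual implementation), the dumps are gated on a value read from the environment:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical sketch: read LIBOMPTARGET_INFO once and print extra
// information only when the corresponding bit is set.
static uint32_t getInfoLevel() {
  if (const char *Env = std::getenv("LIBOMPTARGET_INFO"))
    return static_cast<uint32_t>(std::strtoul(Env, nullptr, 0));
  return 0;
}

void dumpLaunchInfo(int Blocks, int Threads) {
  if (getInfoLevel() & 0x1) // assumed bit for kernel-launch information
    fprintf(stderr, "Launching kernel with %d blocks and %d threads\n",
            Blocks, Threads);
}
```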

Reviewers: jdoerfert gkistanova ye-luo

Subscribers: guansong openmp-commits sstefan1 yaxunl ye-luo

Tags: #OpenMP

Differential Revision: https://reviews.llvm.org/D87165
2020-09-09 12:03:56 -04:00
Joseph Huber ae95ceeb8f [OpenMP] Consolidate error handling and debug messages in Libomptarget
Summary:

This patch consolidates the error handling and messaging routines into a single
file, omptargetmessage. The goal is to simplify the error handling interface
prior to adding more error handling support.

Reviewers: jdoerfert grokos ABataev AndreyChurbanov ronlieb JonChesterfield ye-luo tianshilei1992

Subscribers: danielkiss guansong jvesely kerbowa nhaehnle openmp-commits sstefan1 yaxunl
2020-09-01 15:28:19 -04:00
Johannes Doerfert 5272d29e2c [OpenMP][CUDA] Keep one kernel list per device, not globally.
Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D86039
2020-08-16 14:38:35 -05:00
Johannes Doerfert aa27cfc1e7 [OpenMP][CUDA] Cache the maximal number of threads per block (per kernel)
Instead of calling `cuFuncGetAttribute` with
`CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK` for every kernel invocation,
we can do it for the first one and cache the result as part of the
`KernelInfo` struct. The only functional change is that we now expect
`cuFuncGetAttribute` to succeed and otherwise propagate the error.
Ignoring any error seems like a slippery slope...
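
A sketch of the caching, assuming a simplified KernelInfo (the field and helper names are illustrative):

```cpp
#include <cuda.h>

// Simplified sketch: query the attribute once when the kernel is first used
// and propagate any failure instead of silently ignoring it.
struct KernelInfo {
  CUfunction Func;
  int MaxThreadsPerBlock = -1; // -1 means "not queried yet"
};

CUresult getMaxThreadsPerBlock(KernelInfo &KI, int &Out) {
  if (KI.MaxThreadsPerBlock < 0) {
    CUresult Err = cuFuncGetAttribute(&KI.MaxThreadsPerBlock,
                                      CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK,
                                      KI.Func);
    if (Err != CUDA_SUCCESS)
      return Err; // propagate instead of ignoring
  }
  Out = KI.MaxThreadsPerBlock;
  return CUDA_SUCCESS;
}
```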

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D86038
2020-08-16 14:38:33 -05:00
George Rokos 40470eb27a [libomptarget][NFC] Replace `%ld` with PRId64 for data of type int64_t.
The standard way of printing `int64_t` data is via the PRId64 macro; `%ld`
is for `long int`, and `int64_t` is not guaranteed to be typedef'ed as `long int`
on all platforms. E.g. on Windows we get mismatch warnings.
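
For example (illustrative values):

```cpp
#include <cinttypes>
#include <cstdio>

int main() {
  int64_t Size = 1024;
  // Portable: expands to the correct conversion for int64_t on every platform.
  printf("arg size %" PRId64 "\n", Size);
  // Non-portable: assumes int64_t is 'long int', which e.g. MSVC does not.
  // printf("arg size %ld\n", Size);
  return 0;
}
```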

Differential Revision: https://reviews.llvm.org/D85353
2020-08-05 13:28:35 -07:00
Ye Luo c5348aecd7 [OpenMP] Use primary context in CUDA plugin
Summary:
Retaining the per-device primary context is preferred to creating a context owned by the plugin.

From CUDA documentation
1. "Note that the use of multiple CUcontexts per device within a single process will substantially degrade performance and is strongly discouraged. Instead, it is highly recommended that the implicit one-to-one device-to-context mapping for the process provided by the CUDA Runtime API be used." from https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DRIVER.html
2. Right under cuCtxCreate: "In most cases it is recommended to use cuDevicePrimaryCtxRetain." https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g65dc0012348bc84810e2103a40d8e2cf
3. The primary context is unique per device and shared with the CUDA runtime API. These functions allow integration with other libraries using CUDA.  https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX

Two issues are addressed by this patch:
1. Not using the primary context caused interoperability issues with libraries like cuBLAS and cuSOLVER (CUBLAS_STATUS_EXECUTION_FAILED, cudaErrorInvalidResourceHandle).
2. On OLCF Summit, "Error returned from cuCtxCreate" and "CUDA error is: invalid device ordinal".

Regarding the flags of the primary context: if it is inactive, we set CU_CTX_SCHED_BLOCKING_SYNC; if it is already active, we respect the current flags.
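
A minimal sketch of the retain logic described above (error handling trimmed; the function name is illustrative, not the plugin's actual code):

```cpp
#include <cuda.h>

// Sketch: retain the per-device primary context instead of cuCtxCreate.
// If the primary context is inactive, request blocking-sync scheduling;
// if it is already active, keep whatever flags it has.
CUresult initDeviceContext(CUdevice Dev, CUcontext &Ctx) {
  unsigned Flags = 0;
  int Active = 0;
  CUresult Err = cuDevicePrimaryCtxGetState(Dev, &Flags, &Active);
  if (Err != CUDA_SUCCESS)
    return Err;
  if (!Active) {
    Err = cuDevicePrimaryCtxSetFlags(Dev, CU_CTX_SCHED_BLOCKING_SYNC);
    if (Err != CUDA_SUCCESS)
      return Err;
  }
  return cuDevicePrimaryCtxRetain(&Ctx, Dev);
}
```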

Reviewers: grokos, ABataev, jdoerfert, protze.joachim, AndreyChurbanov, Hahnfeld

Reviewed By: jdoerfert

Subscribers: openmp-commits, yaxunl, guansong, sstefan1, tianshilei1992

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D82718
2020-07-07 10:14:51 -04:00
Ye Luo 45bb073da8 [OpenMP] fix clang warning about printf format in CUDA plugin
Summary: Warnings are printed by clang when building with LIBOMPTARGET_ENABLE_DEBUG=ON due to incorrect format strings.

Reviewers: tianshilei1992, jdoerfert

Reviewed By: tianshilei1992

Subscribers: yaxunl, guansong, sstefan1, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D82789
2020-06-29 22:35:39 -04:00
Shilei Tian a014fbbc21 [OpenMP] Improve D2D memcpy to use more efficient driver API
Summary:
In the current implementation, a D2D memcpy first copies the data back to the
host and then from the host to the destination device. This is very inefficient
if the device supports native D2D memcpy, like CUDA.

In this patch, D2D memcpy will first try the natively supported driver API; if
that fails, it falls back to the original way. It is worth noting that D2D
memcpy in this scenario covers two cases:
- Same device: this is the D2D memcpy in the CUDA context.
- Different devices: this is the peer-to-peer memcpy in the CUDA context.
My implementation merges these two parts and chooses the best API according to
the source device and destination device.
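
A sketch of the selection logic, assuming per-device contexts are already available (the function name is illustrative; the fallback to the old D2H + H2D path is elided):

```cpp
#include <cuda.h>

// Sketch: pick the best driver API for a device-to-device copy.
// Same device: cuMemcpyDtoDAsync. Different devices: cuMemcpyPeerAsync.
// On failure the caller would fall back to the old D2H + H2D path.
CUresult deviceToDeviceCopy(int SrcId, CUcontext SrcCtx, CUdeviceptr Src,
                            int DstId, CUcontext DstCtx, CUdeviceptr Dst,
                            size_t Bytes, CUstream Stream) {
  if (SrcId == DstId)
    return cuMemcpyDtoDAsync(Dst, Src, Bytes, Stream);
  return cuMemcpyPeerAsync(Dst, DstCtx, Src, SrcCtx, Bytes, Stream);
}
```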

Reviewers: jdoerfert, AndreyChurbanov, grokos

Reviewed By: jdoerfert

Subscribers: yaxunl, guansong, sstefan1, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D80649
2020-06-04 16:59:06 -04:00
Shilei Tian cb038927ef [OpenMP] Fix an issue of wrong return type of DeviceRTLTy::getNumOfDevices
Summary: There is a typo in DeviceRTLTy::getNumOfDevices: its return type is bool, which leads to a wrong device number being returned from omp_get_num_devices.
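
The effect is easy to reproduce (illustrative snippet; the function names below are made up for the example):

```cpp
#include <cstdio>

// With a bool return type, any nonzero device count collapses to 1.
bool getNumOfDevicesBuggy() { return 4; } // callers see 1
int getNumOfDevicesFixed() { return 4; }  // callers see 4

int main() {
  printf("buggy: %d, fixed: %d\n", (int)getNumOfDevicesBuggy(),
         getNumOfDevicesFixed());
  return 0;
}
```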

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: yaxunl, guansong, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D79255
2020-05-03 15:59:06 -04:00
Shilei Tian 4031bb982b [OpenMP] Refined CUDA plugin to put all CUDA operations into class
Summary: The current implementation mixes everything up so that there is almost no encapsulation. In this patch, all CUDA-related operations are put into a new class, DeviceRTLTy, and only the necessary functions are exposed. In addition, all C++ code now conforms to the LLVM coding standard, while the API functions keep their C style.
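
A skeletal sketch of the shape of this refactoring (everything beyond the DeviceRTLTy name is illustrative and greatly simplified):

```cpp
#include <cuda.h>
#include <vector>

// Sketch: CUDA state lives inside the class; the plugin only exposes
// C-style entry points that forward to a single instance.
class DeviceRTLTy {
  std::vector<CUcontext> Contexts;
  std::vector<CUstream> Streams;

public:
  int getNumOfDevices() const { return static_cast<int>(Contexts.size()); }
  // ... data-transfer and kernel-launch members ...
};

static DeviceRTLTy DeviceRTL;

extern "C" int __tgt_rtl_number_of_devices() {
  return DeviceRTL.getNumOfDevices();
}
```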

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: jfb, yaxunl, guansong, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D77951
2020-04-13 13:32:46 -04:00
Shilei Tian feed674dec [OpenMP] Introduce stream pool to make sure the correctness of device synchronization

Summary: In a previous patch, in order to optimize performance, we only synchronize once
for each target region. The synchronization is done via stream synchronization.
However, in extreme situations the performance might be bad. Consider the
following case: there is a task that requires transferring a huge amount of data
(it calls the data-transferring functions many times). It is scheduled to the first
stream. Then we have 255 very light tasks scheduled to the remaining 255
streams (by default we have 256 streams). They can finish before we do the
synchronization at the end of the first task. Next, we get another very huge
task. It is scheduled again to the first stream. Now the first task
finishes its kernel launch and calls stream synchronization. At that point the
stream already contains two kernels, and the synchronization waits until both
kernels finish instead of just the first one belonging to the first task.

In this patch, we introduce a stream pool. After each synchronization, the
stream is returned to the pool, making sure that each synchronization waits
only on the expected operations.
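
A minimal sketch of such a pool (locking, growth policy, and error handling omitted; the class and method names are illustrative):

```cpp
#include <cuda.h>
#include <vector>

// Sketch: streams are handed out per target region and returned after the
// region's synchronization, so a later region never waits on unrelated work.
class StreamPool {
  std::vector<CUstream> Free;

public:
  explicit StreamPool(int N) {
    Free.resize(N);
    for (CUstream &S : Free)
      cuStreamCreate(&S, CU_STREAM_NON_BLOCKING);
  }
  CUstream acquire() {
    CUstream S = Free.back();
    Free.pop_back();
    return S;
  }
  void release(CUstream S) { Free.push_back(S); }
};
```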

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: gregrodgers, yaxunl, lildmh, guansong, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D77412
2020-04-11 07:08:56 -04:00
Shilei Tian 03ff643d2e [OpenMP] Put old APIs back and added new _async series for backward compatibility
Summary: Following comments from the bi-weekly meeting, this patch puts the old APIs back and adds a new `_async` series.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: yaxunl, guansong, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D77822
2020-04-09 22:40:58 -04:00
Shilei Tian 32ed29271f [OpenMP] Optimized stream selection by scheduling data mapping for the same target region into a same stream
Summary:
This patch introduces two things for offloading:
1. Asynchronous data transferring: those functions are suffixed with `_async`. They have one more argument compared with their synchronous counterparts: `__tgt_async_info*`, a new struct that has only one field, `void *Identifier`. This struct is used for information exchange between different asynchronous operations. It can be used for stream selection, like in this case, or for operation synchronization, which is also used here. We may expect more usages in the future.
2. Optimization of stream selection for data mapping. The previous implementation used asynchronous device memory transfers but synchronized after each transfer. If, say, kernel A needs four memory copies to the device and two memory copies back to the host, then we can schedule these seven operations (four H2D, two D2H, and one kernel launch) into the same stream and only need to synchronize after the copies from device to host. In this way, we save a huge overhead compared with synchronizing after each operation.
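
In sketch form (the struct is as described above; the surrounding calls are illustrative placeholders, not the real plugin entry points):

```cpp
#include <cuda.h>

// As described above: a single opaque field that carries the selected
// stream between the _async data-transfer calls and the kernel launch.
struct __tgt_async_info {
  void *Identifier = nullptr; // holds a CUstream in the CUDA plugin
};

// Illustrative flow for one target region: all H2D copies, the kernel
// launch, and the D2H copies go into the same stream; only one
// synchronization is needed at the very end.
void runRegion(__tgt_async_info &AsyncInfo) {
  // dataSubmitAsync(..., &AsyncInfo);   // four H2D copies
  // launchKernelAsync(..., &AsyncInfo); // kernel
  // dataRetrieveAsync(..., &AsyncInfo); // two D2H copies
  cuStreamSynchronize(static_cast<CUstream>(AsyncInfo.Identifier));
}
```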

Reviewers: jdoerfert, ye-luo

Reviewed By: jdoerfert

Subscribers: yaxunl, lildmh, guansong, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D77005
2020-04-07 14:55:47 -04:00
George Rokos 0a42c9bfe4 Enable CUDA offloading on aarch64 host
Differential Revision: https://reviews.llvm.org/D76469
2020-03-20 15:38:47 -07:00
Johannes Doerfert a5153dbc36 [OpenMP][Offloading] Added support for multiple streams so that multiple kernels can be executed concurrently
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D74145
2020-02-11 22:07:14 -06:00
Kazuaki Ishizaki 4c6a098ad5 [OpenMP] NFC: Fix trivial typos in comments
Reviewers: jdoerfert, Jim

Reviewed By: Jim

Subscribers: Jim, mgorny, guansong, jfb, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D72285
2020-01-07 14:05:03 +08:00
Ron Lieberman dc34b1c94d Test commit: adds a . to comment. NFC 2019-11-04 16:51:03 -06:00
Michael Kruse 78769ec403 [libomptarget] Harmonize emitting CUDA errors and general debug messages.
Ensures that CUDA fail reasons (such as "No CUDA-capable device detected")
are printed together with libomptarget's debug message
(e.g. "Error when setting CUDA context"). Previously, the former was
printed only in CMAKE_BUILD_TYPE=Debug builds while the latter was
enabled by LIBOMPTARGET_ENABLE_DEBUG.

With this change, also only call cuGetErrorString when the error will be
printed.

Suggested-by: Ye Luo <xw111luoye@gmail.com>

Differential Revision: https://reviews.llvm.org/D65687

llvm-svn: 367910
2019-08-05 19:12:10 +00:00
Gheorghe-Teodor Bercea aace6d285d [OpenMP][libomptarget] Add support for declare target to clause under unified memory
Summary:
This patch adds support for handling variables under the:

```
#pragma omp declare target to()
```

clause when the 

```
#pragma omp requires unified_shared_memory
```

is used.

The address of the host variable is copied into the device pointer just like for the declare target link case.
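
A small usage example of the combination this patch handles (illustrative program; variable names are made up):

```cpp
#pragma omp requires unified_shared_memory

int Counter = 0;
#pragma omp declare target to(Counter)

int main() {
  // With unified shared memory, the device reference to Counter resolves
  // to the host variable's address, as described above.
  #pragma omp target
  { Counter += 1; }
  return Counter == 1 ? 0 : 1;
}
```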

Reviewers: ABataev, caomhin, grokos, AlexEichenberger

Reviewed By: grokos

Subscribers: jcownie, guansong, jdoerfert, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D63106

llvm-svn: 363825
2019-06-19 15:48:10 +00:00
Gheorghe-Teodor Bercea c5fe030c16 [OpenMP][libomptarget] Enable usage of unified memory for declare target link variables
Summary: This patch enables the usage of a host variable on the device for declare target link variables when unified memory is available.

Reviewers: ABataev, caomhin, grokos

Reviewed By: grokos

Subscribers: Hahnfeld, guansong, jdoerfert, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D60884

llvm-svn: 362505
2019-06-04 15:05:53 +00:00
Chandler Carruth 57b08b0944 Update more file headers across all of the LLVM projects in the monorepo
to reflect the new license. These used slightly different spellings that
defeated my regular expressions.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351648
2019-01-19 10:56:40 +00:00
Jonas Hahnfeld bb51d39871 [libomptarget][CUDA] Use cuDeviceGetAttribute, NFCI.
cuDeviceGetProperties has apparently been deprecated since CUDA 5.0.
Nvidia started using annotations only in CUDA 9.2, so nobody noticed
or cared before.
The new function returns the same values, tested with a P100.
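
For example, the same limits can be queried one attribute at a time (illustrative function and attribute selection):

```cpp
#include <cuda.h>

// Sketch: query values previously taken from the deprecated
// cuDeviceGetProperties via cuDeviceGetAttribute instead.
CUresult queryDeviceLimits(CUdevice Dev, int &MaxThreadsPerBlock,
                           int &WarpSize) {
  CUresult Err = cuDeviceGetAttribute(
      &MaxThreadsPerBlock, CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK, Dev);
  if (Err != CUDA_SUCCESS)
    return Err;
  return cuDeviceGetAttribute(&WarpSize, CU_DEVICE_ATTRIBUTE_WARP_SIZE, Dev);
}
```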

Differential Revision: https://reviews.llvm.org/D51624

llvm-svn: 341372
2018-09-04 15:13:28 +00:00
Alexey Bataev 2622e9e5b3 [OPENMP, NVPTX] Support several images in the executable.
Summary:
Currently the CUDA plugin supports loading only a single image, though an
executable may have several images if it has target regions inside of a
dynamically loaded library. This patch allows loading multiple images.

Reviewers: grokos

Subscribers: guansong, openmp-commits, kkwli0

Differential Revision: https://reviews.llvm.org/D49036

llvm-svn: 336569
2018-07-09 17:46:55 +00:00
Jonas Hahnfeld 65e0b8784c [CMake] Unify install path for libraries
Introduce OPENMP_INSTALL_LIBDIR and use in all install() commands.
This also fixes installation of libomptarget-nvptx that previously
didn't honor {OPENMP,LLVM}_LIBDIR_SUFFIX.

Differential Revision: https://reviews.llvm.org/D47130

llvm-svn: 333284
2018-05-25 15:56:41 +00:00
Guansong Zhang e1c7a46d5b [OpenMP] Use LIBOMPTARGET_DEVICE_RTL_DEBUG env var to control debug messages on the device side
Summary:
Enable the device-side debug messages at compile time and use an env var to control them at runtime.

To achieve this, an environment data block is passed to the device lib when it is loaded.

By default, the messages are off; to enable them, a user needs to set LIBOMPDEVICE_DEBUG=1.

Reviewers: grokos

Reviewed By: grokos

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D46210

llvm-svn: 331550
2018-05-04 19:29:28 +00:00
Jonas Hahnfeld a349d4820c [libomptarget] Check for library with CUDA Driver API
That's what we really need to link the CUDA plugin against,
not the CUDA runtime API in CUDA_LIBRARIES! While the latter
comes with the CUDA SDK, the Driver API is installed with
the kernel driver and there is at most one per system. As a
fallback we can use the stubs library distributed with the
CUDA SDK for linking.

Differential Revision: https://reviews.llvm.org/D42643

llvm-svn: 323787
2018-01-30 16:49:13 +00:00
Jonas Hahnfeld c189523529 [libomptarget] Only use CUDA Driver API
Use equivalents for the last calls to the Runtime API. Also remove
a stray assert in the error case, found during review; we should
only return OFFLOAD_FAIL.

Differential Revision: https://reviews.llvm.org/D42686

llvm-svn: 323786
2018-01-30 16:49:06 +00:00
Jonas Hahnfeld 5af381acad [CMake] Refactor common settings and flags
These are needed by both libraries, so we can do that in a
common namespace and unify configuration parameters.
Also make sure that the user isn't requesting libomptarget
if the library cannot be built on the system. Issue an error
in that case.

Differential Revision: https://reviews.llvm.org/D40081

llvm-svn: 319342
2017-11-29 19:31:48 +00:00
Sergey Dmitriev b305d26b57 [OpenMP] libomptarget: move debugging dumps under control of env var LIBOMPTARGET_DEBUG
Disable default debugging dumps for libomptarget and plugins and move dumps
under control of environment variable LIBOMPTARGET_DEBUG=<integer>. Dumps
are enabled when LIBOMPTARGET_DEBUG is set to a positive integer value.

Debugging dumps are available only in debug builds; release builds do not
support them.

Differential Revision: https://reviews.llvm.org/D33227

llvm-svn: 310841
2017-08-14 15:09:59 +00:00
George Rokos 1546d31924 [OpenMP] Changes in the plugin interface
This patch changes the plugin interface so that:
1) future plugins can take advantage of systems with shared CPU/device storage
2) instead of using base addresses, target regions are launched by providing target addresses and base offsets explicitly.

Differential revision: https://reviews.llvm.org/D33028
 

llvm-svn: 302663
2017-05-10 14:12:36 +00:00
George Rokos c13df8e5e0 [OpenMP] Optimized default kernel launch parameters in CUDA plugin
Differential Revision: https://reviews.llvm.org/D32321

llvm-svn: 301321
2017-04-25 16:34:13 +00:00
George Rokos 01954092d0 [OpenMP] CUDA plugin: More descriptive error messages
Differential Revision: https://reviews.llvm.org/D31206

llvm-svn: 298527
2017-03-22 17:36:22 +00:00
George Rokos f3fe2dd235 [OpenMP] CUDA plugin: add include directory for libelf
Allow the user to manually specify where libelf is installed.

Differential Revision: https://reviews.llvm.org/D31207

llvm-svn: 298515
2017-03-22 16:41:46 +00:00
George Rokos 3de4cd1281 [OpenMP] Initial implementation of OpenMP offloading library - libomptarget plugins.
This is the patch upstreaming the plugins part of libomptarget (CUDA, generic-elf-64).

Differential Revision: https://reviews.llvm.org/D14253

llvm-svn: 293724
2017-02-01 00:14:41 +00:00