Sometimes libomptarget's CUDA plugin produces unhelpful diagnostics
about a lack of CUDA devices before an application runs:
```
$ clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa hello-world.c
$ ./a.out
CUDA error: Error returned from cuInit
CUDA error: no CUDA-capable device is detected
Hello World: 4
```
This can happen when the CUDA plugin was built but all CUDA devices
are currently disabled in some manner, perhaps because
`CUDA_VISIBLE_DEVICES` is set to the empty string. As shown in the
above example, it can even happen when we haven't compiled the
application for offloading to CUDA.
The following code from `openmp/libomptarget/plugins/cuda/src/rtl.cpp`
appears to be intended to handle this case, and it chooses not to
write a diagnostic to stderr unless debugging is enabled:
```
if (NumberOfDevices == 0) {
DP("There are no devices supporting CUDA.\n");
return;
}
```
The problem is that the above code is never reached because the
earlier `cuInit` returns `CUDA_ERROR_NO_DEVICE`. This patch handles
that `cuInit` case in the same manner as the above code handles the
`NumberOfDevices == 0` case.
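The intended behavior can be sketched as follows. This is a minimal illustration, not the code from the patch: `classifyInit`, `InitStatus`, and the `DebugEnabled` flag are hypothetical names, and the `CUresult` enum is a local stand-in so the sketch compiles without the CUDA headers (the numeric values match the CUDA driver API).
```cpp
#include <cstdio>

// Local stand-in for the CUDA driver API result type, so this sketch
// builds without CUDA installed. Values match the driver API.
enum CUresult {
  CUDA_SUCCESS = 0,
  CUDA_ERROR_NO_DEVICE = 100,
  CUDA_ERROR_UNKNOWN = 999
};

// Hypothetical classification of what the plugin should do after cuInit.
enum class InitStatus { Ready, NoDevices, Failed };

// Hypothetical helper: treat CUDA_ERROR_NO_DEVICE from cuInit like the
// NumberOfDevices == 0 case, i.e. report it only when debugging is on.
InitStatus classifyInit(CUresult Err, bool DebugEnabled) {
  if (Err == CUDA_SUCCESS)
    return InitStatus::Ready;
  if (Err == CUDA_ERROR_NO_DEVICE) {
    if (DebugEnabled)
      std::fprintf(stderr, "There are no devices supporting CUDA.\n");
    return InitStatus::NoDevices;
  }
  // Any other failure is still reported unconditionally.
  std::fprintf(stderr, "CUDA error: Error returned from cuInit\n");
  return InitStatus::Failed;
}
```
With this shape, `classifyInit(CUDA_ERROR_NO_DEVICE, false)` writes nothing to stderr, while other `cuInit` failures are still reported.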
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D130371
`CU_EVENT_DEFAULT` is defined in the CUDA header. It should be added to
`openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h` for the CUDA-free build.
Reviewed By: ronlieb
Differential Revision: https://reviews.llvm.org/D108878
This patch adds support for event-related interfaces, which will be used
later to fix a data race. See D104418 for more details.
Reviewed By: jdoerfert, ye-luo
Differential Revision: https://reviews.llvm.org/D108528
This patch fixes the error reported in D106751. When there is no CUDA SDK
installed in the system, the build fails due to missing `CU_DEVICE_ATTRIBUTE`
variables.
Uses the fix suggested by @zsrkmyn.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106933
This patch introduces a function in the device's plugin to print the
device information. This patch relates to another patch that introduces
a CLI tool to obtain the device information from the OpenMP library directly.
It is inspired by PGI's pgaccelinfo.
The modifications are as follows:
1. Introduce the optional `void __tgt_rtl_print_device_info(RTLdevID)` function into the RTL.
2. Introduce the `bool __tgt_print_device_info(devID)` function into the `omptarget` interface. It returns false if the RTL does not implement it.
3. Add `bool printDeviceInfo(RTLDevID)` to `DeviceTy`.
4. Implement `__tgt_rtl_print_device_info` for CUDA, adding additional CUDA Runtime calls.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106751
[libomptarget] Add support for target allocators to dynamic cuda RTL
Follow on to D102000 which introduced new calls into libcuda. This patch adds
the corresponding entry points to dynamic_cuda, fixing the build for systems
that do not have the cuda toolkit installed.
Function types and enum from https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html
Reviewed By: pdhaliwal
Differential Revision: https://reviews.llvm.org/D102169
[libomptarget][cuda] Handle missing _v2 symbols gracefully
Follow on from D95367. Dlsym the _v2 symbols if present, otherwise use the
unsuffixed version. Builds a hashtable for the check; this can be revised to
use zero heap allocations later if necessary.
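A minimal sketch of the fallback, assuming a generic lookup callback in place of the real `dlsym` call. `SymbolLookup`, `lookupFromTable`, and `resolvePreferV2` are hypothetical names used for illustration only; the hashtable here stands in for the set of symbols a library exports.
```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical abstraction over dlsym: returns nullptr when the symbol
// is not available.
using SymbolLookup = std::function<void *(const std::string &)>;

// Build a lookup backed by a hashtable of known symbols. In a real shim
// the table would be replaced by (or filled from) dlsym on the library.
SymbolLookup lookupFromTable(std::unordered_map<std::string, void *> Table) {
  return [Table = std::move(Table)](const std::string &Name) -> void * {
    auto It = Table.find(Name);
    return It == Table.end() ? nullptr : It->second;
  };
}

// Prefer the _v2 variant of a symbol when present, otherwise fall back
// to the unsuffixed name. Returns nullptr if neither exists.
void *resolvePreferV2(const std::string &Name, const SymbolLookup &Lookup) {
  if (void *Sym = Lookup(Name + "_v2"))
    return Sym;
  return Lookup(Name);
}
```
For a library that exports both `cuMemFree` and `cuMemFree_v2`, `resolvePreferV2("cuMemFree", ...)` yields the `_v2` entry; for a symbol with no `_v2` variant it yields the unsuffixed one.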
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95415
[libomptarget][cuda] Gracefully handle missing cuda library
When dynamic cuda is in use and the library failed to load, it is not safe
to call `cuGetErrorString`.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95412
[libomptarget][cuda] Call v2 functions explicitly
rtl.cpp calls functions like cuMemFree that are replaced by a macro
in cuda.h with cuMemFree_v2. This patch changes the source to use
the v2 names consistently.
See also D95104, D95155 for the idea. Alternatives are to use a mixture,
e.g. call the macro names and explicitly dlopen the _v2 names, or to keep
the current status where the symbols are replaced by macros in both files.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95274
[libomptarget] Build cuda plugin without cuda installed locally
Compiles a new file, `plugins/cuda/dynamic_cuda/cuda.cpp`, to an object file that exposes the same symbols that the plugin presently uses from libcuda. The object file contains dlopen of libcuda and cached dlsym calls. Also provides a cuda.h containing the subset that is used.
This lets the cmake file choose between the system cuda and a dlopen shim, with no changes to rtl.cpp.
The corresponding change to amdgpu is postponed until after a refactor of the plugin to reduce the size of the hsa.h stub required.
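The dlopen-plus-cached-dlsym idea can be sketched as below. This is a simplified illustration under stated assumptions, not the contents of `cuda.cpp`: the real file exports C symbols with the driver API names, whereas this sketch wraps a single entry point in a hypothetical `DriverShim` class, uses `int` in place of `CUresult`, and invents the `ShimNotLoaded` error value. The library name is a constructor parameter so the failure path can be exercised on a machine without libcuda.
```cpp
#include <dlfcn.h>

// Hypothetical error value returned when the library or symbol is missing.
constexpr int ShimNotLoaded = -1;

// Hypothetical shim around one driver entry point (cuInit). The handle
// and the function pointer are resolved once and then reused.
class DriverShim {
  using InitFnTy = int (*)(unsigned);

  void *Handle = nullptr;    // result of dlopen, kept for process lifetime
  InitFnTy InitFn = nullptr; // cached result of dlsym("cuInit")

public:
  explicit DriverShim(const char *LibraryName)
      : Handle(dlopen(LibraryName, RTLD_NOW)) {
    // Only look up symbols if the library was actually loaded.
    if (Handle)
      InitFn = reinterpret_cast<InitFnTy>(dlsym(Handle, "cuInit"));
  }

  bool loaded() const { return Handle != nullptr; }

  // Forward to the real function if available; otherwise fail without
  // calling into a library that is not there.
  int init(unsigned Flags) const {
    return InitFn ? InitFn(Flags) : ShimNotLoaded;
  }
};
```
Constructing `DriverShim` with the name of a library that does not exist leaves `loaded()` false and makes `init` return `ShimNotLoaded` instead of crashing. On some systems the program must be linked with `-ldl`.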
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95155