Commit Graph

306 Commits

Author SHA1 Message Date
Shilei Tian c060c634ef [OpenMP][NVPTX] Fix an error in configuring #teams and #threads
It must be a copy mistake.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D111407
2021-10-08 11:07:43 -04:00
Jon Chesterfield 1bc3a6e41b [libomptarget] Reapply 2bc4d48a78 which was accidentally reverted 2021-10-07 20:17:48 +01:00
Jon Chesterfield 0c554a4769 [libomptarget] Move device environment to shared header, remove divergence
Follow on to D110006, related to D110957

Where implementations have diverged this resolves to match the new DeviceRTL

- replaces definitions of this struct in deviceRTL and plugins with include
- changes the dynamic_shared_size field from D110006 to 32 bits
- handles stdint being unavailable in DeviceRTL
- adds a zero initializer for the field to amdgpu
- moves the extern declaration for deviceRTL to target_interface
  (omptarget.h is more natural, but doesn't work due to include order
  with debug.h)
- Renames the fields everywhere to match the LLVM format used in DeviceRTL
- Makes debug_level uint32_t everywhere (previously sometimes int32_t)

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D111069
2021-10-07 12:03:48 +01:00
Michał Górny 0873b9bef4 [openmp] [elf_common] Fix linking against LLVM dylib
The hand-rolled linking logic in elf_common does not account for
the possibility of using LLVM dylib rather than a dozen static
libraries.  Since it does not seem to be easily convertible
to add_llvm_library, just hand-roll support for LLVM_LINK_LLVM_DYLIB.
This is necessary to support stand-alone builds against installed LLVM.

Differential Revision: https://reviews.llvm.org/D111038
2021-10-04 09:29:06 +02:00
Jon Chesterfield 05ba9ff6a6 [libomptarget][amdgpu] Refactor memory pool collection 2021-10-01 14:58:01 +01:00
Jon Chesterfield b75a7481ba [libomptarget] Apply D110029 to amdgpu
Use enum for execution mode.

This is partly a port from ROCm and partly a port from D110029. Attempted to
make the same choices as ROCm as far as comments etc go to reduce the merge
conflicts.

There is some cleanup warranted here - in particular I like the cuda patch
factoring out the comparisons into named variables - but I'd like to leave
that for a follow up patch, keeping this one minimal.

Reviewed By: carlo.bertolli

Differential Revision: https://reviews.llvm.org/D110845
2021-09-30 21:29:37 +01:00
Dhruva Chakrabarti 6226270253 [libomptarget] [amdgpu] After a kernel dispatch packet is published, its contents must not be accessed.
Fixes: SWDEV-275232 (With contributions from Ammar Elwazir, Laurent Morichetti, and Tony Tye)

The current code is racy. After the packet is submitted, the GPU will increment the read index. If this wraps around before the memory is read from it'll refer to a signal from an unrelated packet. Change avoids reading from the packet post-submission.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D110679
2021-09-29 09:22:07 -07:00
Jon Chesterfield 2bc4d48a78 [libomptarget][amdgpu] Follow on to D110513, empty kernarg pools are not fatal 2021-09-27 22:44:35 +01:00
Jon Chesterfield 738734f655 [libomptarget][amdgpu] Report zero devices if plugin construction fails, instead of segv 2021-09-27 22:13:12 +01:00
Pushpinder Singh b1695c2eb8 [AMDGPU][OpenMP] Add memory pool size check to isValidMemoryPool
Keeping all the checks in one place for future simplification.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D110513
2021-09-27 12:29:00 +00:00
Pushpinder Singh 9d0eb440ff [libomptarget][nfc][amdgpu] Reorder function to clarify review diff 2021-09-27 09:30:55 +00:00
Jon Chesterfield 726a34f063 [libomptarget][amdgpu] Replace dead exit call with returning error 2021-09-27 09:43:37 +01:00
Jon Chesterfield 8cf93a35d4 [libomptarget][amdgpu] Destruct HSA queues
Store queues in unique_ptr so they are destroyed when the global DeviceInfo is. Currently they leak which raises an assert in debug builds of hsa.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D109511
2021-09-26 15:34:21 +01:00
Shilei Tian ca999f7191 [OpenMP][Offloading] Use bitset to indicate execution mode instead of value
The execution mode of a kernel is stored in a global variable, whose value means:
- 0 - SPMD mode
- 1 - indicates generic mode
- 2 - SPMD mode execution with generic mode semantics

We are going to add support for SIMD execution mode. It will be come with another
execution mode, such as SIMD-generic mode. As a result, this value-based indicator
is not flexible.

This patch changes to bitset based solution to encode execution mode. Each
position is:
[0] - generic mode
[1] - SPMD mode
[2] - SIMD mode (will be added later)

In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode
execution with generic mode semantics. In the future after we add the support for
SIMD mode, `0b1xx` will be in SIMD mode.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110029
2021-09-22 11:40:52 -04:00
Shilei Tian 49e976c934 [OpenMP][NVPTX] Fix a warning that data argument not used by format string
Reviewed By: jhuber6, grokos

Differential Revision: https://reviews.llvm.org/D110104
2021-09-20 17:22:14 -04:00
Joseph Huber f1c821fa85 [OpenMP] Add support for dynamic shared memory in new RTL
This patch adds support for using dynamic shared memory in the new
device runtime. The new function `__kmpc_get_dynamic_shared` will return a
pointer to the buffer of dynamic shared memory. Currently the amount of memory
allocated is set by an environment variable.

In the future this amount will be added to the amount used for the smart stack
which will be configured in a similar way.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D110006
2021-09-17 21:25:36 -04:00
Joseph Huber ec02c34b6d [OpenMP] Add additional fields to device environment
This patch adds fields for the device number and number of devices into
the device environment struct and debugging values.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110004
2021-09-17 21:25:32 -04:00
Hansang Bae ae2a5facce [OpenMP][libomptarget] Minor fix in x86_64 plugin
Call to remove() was passing invalid address for the file name.

Differential Revision: https://reviews.llvm.org/D109846
2021-09-15 15:57:06 -05:00
Jon Chesterfield 6760234e8d [libomptarget][amdgpu] Precisely manage hsa lifetime
The hsa library must be initialized before any calls into it and
destructed after the last call into it. There have been a number of bugs in
this area related to member variables which would like to use raii to manage
resources acquired from hsa.

This patch moves the init/shutdown of hsa into a class, such that when used as
the first member variable (could be a base), the lifetime of other member
variables are reliably scoped within it. This will allow other classes to use
raii reliably when used as member variables within the global.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D109512
2021-09-09 17:28:11 +01:00
Jon Chesterfield d642156f8f [libomptarget][nfc] Hoist hsa_init into rtl.cpp 2021-09-09 16:09:34 +01:00
Jon Chesterfield 3153bdd547 [libomptarget][amdgpu] Drop env variables
Use the same debug print as the rest of libomptarget plugins with
the same environment control. Also drop the max queue size debugging hook as
I don't believe it is still in use, can bring it back near the rest of the env
handling in rtl.cpp if someone objects.

That makes most of rt.h and all of utils.cpp unused. Clean that up and simplify
control flow in a couple of places.

Behaviour change is that debug prints that used to use the old environment
variable now use the new one and print in slightly different format, and the
removal of the max queue size variable.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D108784
2021-09-02 11:02:39 +01:00
Jon Chesterfield f8bcbb82a7 [libomptarget] Normalise a cmake debug string, checking it triggers CI 2021-09-01 14:24:28 +01:00
Shilei Tian e8fdacfd81 [OpenMP][NVPTX] Fixed missing variables for CUDA free compilation in NVPTX plugin
`CU_EVENT_DEFAULT` is defined in CUDA header. It should be added to
`openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h` for CUDA free build.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D108878
2021-08-28 18:08:10 -04:00
Shilei Tian 29df4ab3f3 [OpenMP][Offloading] Add support for event related interfaces
This patch adds the support form event related interfaces, which will be used
later to fix data race. See D104418 for more details.

Reviewed By: jdoerfert, ye-luo

Differential Revision: https://reviews.llvm.org/D108528
2021-08-28 16:24:14 -04:00
Jon Chesterfield 78f92c3810 [openmp][amdgpu] Initial gfx10 offloading implementation
Lets wavefront size be 32 for amdgpu openmp, as well as 64.

Fixes up as little as possible to pass that through the libraries. This change
is end to end, as opposed to updating clang/devicertl/plugin separately. It can
be broken up for review/commit if preferred. Posting as-is so that others with
a gfx10 can try it out. It works roughly as well as gfx9 for me, but there are
probably bugs remaining as well as the todo: for letting grid values vary more.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D108708
2021-08-27 12:34:03 +01:00
Jon Chesterfield 3d85342982 [libomptarget][amdgpu][nfc] Rename variables, delete dead code 2021-08-26 19:58:38 +01:00
Jon Chesterfield 68ab93f4d7 [libomptarget][amdgpu][nfc] Rename source files 2021-08-26 18:29:44 +01:00
Jon Chesterfield ba0af885e7 [libomptarget][amdgpu][nfc] Make grid value access match devicertl 2021-08-25 15:11:19 +01:00
Jon Chesterfield 9b2c6c07b5 [libomptarget][amdgpu] Refactor debug printing
Move most debug printing in rtl.cpp behind DP() macro
Adjust the print output for gpu arch mismatch when the architectures match
Convert an assert into graceful failure

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108562
2021-08-25 14:57:51 +01:00
Jon Chesterfield ba8547775b [libomptarget][amdgpu] Fix debug build from D104696 2021-08-25 01:27:51 +01:00
Pushpinder Singh 9b8b7c1180 [AMDGPU][Libomptarget] Delete g_atl_machine global
With uses of g_atl_machine gone, a significant portion of dead
code has been removed.

This patch depends on D104691 and D104695.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D104696
2021-08-24 07:59:40 +00:00
Jon Chesterfield 77579b99e9 [openmp][nfc] Replace OMPGridValues array with struct
[nfc] Replaces enum indices into an array with a struct. Named the
fields to match the enum, leaves memory layout and initialization unchanged.

Motivation is to later safely remove dead fields and replace redundant ones
with (compile time) computation. It should also be possible to factor some
common fields into a base and introduce a gfx10 amdgpu instance with less
duplication than the arrays of integers require.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D108339
2021-08-19 13:25:42 +01:00
Joseph Huber edb8acdc6e [Libomptarget] Correctly default to Generic if exec_mode is not present
Currently, the runtime returns an error when the `exec_mode` global is
not present. The expected behvaiour is that the region will default to
Generic. This prevents global constructors from being called because
they do not contain execution mode globals.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108255
2021-08-18 11:24:28 -04:00
Dimitry Andric 400cd6d2f0 [libomptarget][amdgpu] use --allow-shlib-undefined to link on FreeBSD
On FreeBSD, the `environ` symbol is undefined at link time for shared
libraries, but resolved by the dynamic linker at runtime. Therefore,
allow the symbol to be undefined when creating a shared library, by
using the `--allow-shlib-undefined` linker flag, instead of `-z defs`
(a.k.a `--no-undefined`).

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D107698
2021-08-08 13:52:44 +02:00
Dimitry Andric 71ae2e0221 [libomptarget][amdgpu] don't declare Elf_Note on FreeBSD
On FreeBSD, the system `<libelf.h>` already declares `struct Elf_Note`
indirectly (via `<sys/elf_common.h>`). This results in compile errors
when building the libomptarget amdgpu plugin. Avoid redeclaring `struct
Elf_Note` on FreeBSD to fix the errors.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D107661
2021-08-06 21:45:26 +02:00
Shilei Tian 28939b6ae5 [NFC] Clean up and clang-format openmp/libomptarget/plugins/cuda/src/rtl.cpp 2021-08-05 22:32:28 -04:00
Jon Chesterfield a90da62adb [libomptarget][amdgpu] Update printed plugin name 2021-07-29 14:46:42 +01:00
Jose M Monsalve Diaz 88e66fa60a [OpenMP] Fixing missing variables when CUDA SDK not in system
This patch fixes the error reported in D106751. When there is no CUDA SDK
installed in the system, the build fails due to missing `CU_DEVICE_ATTRIBUTE`
variables.

Using @zsrkmyn sugested fix

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106933
2021-07-27 23:46:15 -05:00
Jose M Monsalve Diaz 313c523995 [OpenMP][Tool] Introducing the `llvm-omp-device-info` tool
This patch introduces the `llvm-omp-device-info` tool, which uses the
omptarget library and interface to query the device info from all the
available devices as seen by OpenMP. This is inspired by PGI's `pgaccelinfo`

Since omptarget usually requires a description structure with executable
kernels, I split the initialization of the RTLs and Devices to be able to
initialize all possible devices and query each of them.

This revision relies on the patch that introduces the print device info.

A limitation is that the order in which the devices are initialized, and the
corresponding device ID is not necesarily the one seen by OpenMP.

The changes are as follows:
1. Separate the RTL initialization that was performed in `RegisterLib` to its own `initRTLonce` function
2. Create an `initAllRTLs` method that initializes all available RTLs at runtime
3. Created the `llvm-deviceinfo.cpp` tool that uses `omptarget` to query each device and prints its information.

Example Output:
```
Device (0):
    print_device_info not implemented

Device (1):
    print_device_info not implemented

Device (2):
    print_device_info not implemented

Device (3):
    print_device_info not implemented

Device (4):
    CUDA Driver Version:                11000
    CUDA Device Number:                 0
    Device Name:                        Quadro P1000
    Global Memory Size:                 4236312576 bytes
    Number of Multiprocessors:          5
    Concurrent Copy and Execution:      Yes
    Total Constant Memory:              65536 bytes
    Max Shared Memory per Block:        49152 bytes
    Registers per Block:                65536
    Warp Size:                          32 Threads
    Maximum Threads per Block:          1024
    Maximum Block Dimensions:           1024, 1024, 64
    Maximum Grid Dimensions:            2147483647 x 65535 x 65535
    Maximum Memory Pitch:               2147483647 bytes
    Texture Alignment:                  512 bytes
    Clock Rate:                         1480500 kHz
    Execution Timeout:                  Yes
    Integrated Device:                  No
    Can Map Host Memory:                Yes
    Compute Mode:                       DEFAULT
    Concurrent Kernels:                 Yes
    ECC Enabled:                        No
    Memory Clock Rate:                  2505000 kHz
    Memory Bus Width:                   128 bits
    L2 Cache Size:                      1048576 bytes
    Max Threads Per SMP:                2048
    Async Engines:                      Yes (2)
    Unified Addressing:                 Yes
    Managed Memory:                     Yes
    Concurrent Managed Memory:          Yes
    Preemption Supported:               Yes
    Cooperative Launch:                 Yes
    Multi-Device Boars:                 No
    Compute Capabilities:               61
```

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D106752
2021-07-27 22:38:35 -04:00
Jose M Monsalve Diaz d2f85d0910 [OpenMP][Libomptarget] Adding `print_device_info` to RTL and `omptarget`
This patch introduces a function in the device's plugin to print the
device information. This patch relates to another patch that introduces
a CLI tool to obtain the device information from the omplibrary directly.
It is inspired by PGI's pgaccelinfo.

The modifications are as follows:
1. Introduce the optional `void __tgt_rtl_print_device_info(RTLdevID)` function into the RTL.
2. Introduce the `bool __tgt_print_device_info(devID)` function into `omptarget` interface. Returns false if the RTL is not implemented
3. Added `bool printDeviceInfo(RTLDevID)` to the `DeviceTy`
4. Implement the `__tgt_rtl_print_device_info` for CUDA. Added additional CUDA Runtime calls.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106751
2021-07-27 21:47:57 -04:00
Jon Chesterfield 2a613a7790 [libomptarget] Build amdgpu plugin without hsa
Default to building the amdgpu plugin to use dlopen when hsa is
not found instead of disabling it.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106600
2021-07-26 09:54:51 +01:00
Jon Chesterfield dd0b463dd9 [libomptarget][amdgpu] More robust handling of failure to init HSA
If hsa_init fails, subsequent calls into hsa are not safe. Except for
hsa_init, but we don't retry on failure.

This patch:
- deletes a print that called into hsa to ask why it can't call into hsa
- drops a merge conflict block next to that print
- reliably initializes number of devices to zero
- skips the plugin destructor contents if the constructor failed to init hsa

Tested by making hsa_init return error, and by forcing the dynamic library
use which was then deleted from disk. Before this patch, both segv. After it,
friendly message about offloading being unavailable.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106774
2021-07-25 23:15:58 +01:00
Jon Chesterfield e3251f2ec4 Revert "[libomptarget] Build amdgpu plugin without hsa"
Inaccurate error handling around hsa_init

This reverts commit e30b3b23a4.
2021-07-25 21:03:51 +01:00
Jon Chesterfield e30b3b23a4 [libomptarget] Build amdgpu plugin without hsa
Default to building the amdgpu plugin to use dlopen when hsa is
not found instead of disabling it.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106600
2021-07-25 19:33:36 +01:00
Abhinav Gaba f7c92995c0 [OpenMP] Fix CUDA plugin build after 3817ba13ae.
The build was broken on machines that don't have Cuda SDK installed.

See https://reviews.llvm.org/D106627 for the original discussion.
2021-07-23 16:50:00 +08:00
Joseph Huber 76c0c0ca86 [OpenMP][NFC] Fix formatting in CUDA plugin 2021-07-22 21:50:40 -04:00
Joseph Huber 3817ba13ae [OpenMP] Add environment variables to change stack / heap size in the CUDA plugin
This patch adds support for two environment variables to configure the device.
``LIBOMPTARGET_STACK_SIZE`` sets the amount of memory in bytes that each thread
has for its stack. ``LIBOMPTARGET_HEAP_SIZE`` sets the amount of heap memory
that can be allocated using malloc / free on the device.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106627
2021-07-22 21:40:02 -04:00
Jon Chesterfield 9e05c084e5 [libomptarget][amdgpu][nfc] Normalise license headers
Reviewed By: gregrodgers, jdoerfert

Differential Revision: https://reviews.llvm.org/D106581
2021-07-22 20:23:41 +01:00
Jon Chesterfield 14e34a83b0 [libomptarget][amdgpu][nfc] Replace use of gelf.h with libelf.h
AMDGPU can assume Elf64 so doesn't need to abstract over Elf32

Drop a few other unused headers at the same time. Now only llvm elf
and libelf are used by the plugin.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106579
2021-07-22 20:04:13 +01:00
Jon Chesterfield 1a96570621 [libomptarget][amdgpu] Implement dlopen of libhsa
AMDGPU plugin equivalent of D95155, build without HSA installed locally

Compiles a new file, plugins/amdgpu/dynamic_hsa/hsa.cpp, to an object file that
exposes the same symbols that the plugin presently uses from hsa. The object
file contains dlopen of hsa and cached dlsym calls. Also provides header files
corresponding to the subset that is used.

This is behind a feature flag, LIBOMPTARGET_FORCE_DLOPEN_LIBHSA, default off.
That allows developers to build against the dlopen/dlsym implementation, e.g.
while testing this mode.

Enabling by default will cause this plugin to build on a wider variety of
machines than it does at present so may break some CI builds. That risk can
be minimised by reviewing the header dependencies of the library and ensuring
it doesn't use any libraries that are not already used by libomptarget.

Separating the implementation from enabling by default in case the latter needs
to be rolled back after wider CI results.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106559
2021-07-22 16:54:10 +01:00
Joseph Huber a158d3663f [OpenMP] Fix warnings for uninitialized block counts
Summary:
Fixes some warning given for uninitialized block counts if the exection mode is
not recognized. This shouldn't happen in practice because the execution mode is
checked when it's read from the device.
2021-07-22 09:24:07 -04:00
Jon Chesterfield dc1f6f8b92 [libomptarget][amdgpu][nfc] Drop dead signal pool setup
This class is instantiated once in rtl.cpp before hsa_init is
called. The hsa_signal_create call therefore fails leaving the pool empty.

This signal pool is a legacy from ATMI where it was constructed after hsa_init.
Moving the state into the rtl.cpp global class disabled the initial populating
of the pool without noticeably changing performance. Just rechecked with a fix
that allocates the signals after hsa_init and that also doesn't noticeably
change performance.

This patch therefore drops the initialisation. Only change from main is to
drop a DEBUG_PRINT statement that would say the pool initial size is zero.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106515
2021-07-22 10:29:32 +01:00
Joseph Huber 7d57639264 [OpenMP] Add new execution mode for SPMD execution with Generic semantics
Qualified kernels can be transformed from generic-mode to SPMD mode using an
optimization in OpenMPOpt. This patch introduces a new execution mode to
indicate kernels that have been transformed from generic-mode to SPMD-mode.
These kernels have SPMD-mode execution, but need generic-mode semantics for
scheduling the blocks and threads. Without this far too few blocks will be
scheduled for a generic region as SPMD mode expects the trip count to be
divided by the number of threads.

Reviewed By: ggeorgakoudis

Differential Revision: https://reviews.llvm.org/D106460
2021-07-21 20:57:28 -04:00
Jon Chesterfield a733bbbd17 [libomptarget][amdgpu][nfc] Refactor #includes
Create a hsa_api.h header that includes the ROCr headers in use
Drop some unused headers and _cplusplus macros

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106455
2021-07-21 17:28:07 +01:00
Jon Chesterfield ddfb074a80 [libomptarget][nfc] Group environment variables, drop accesses to DeviceInfo global
[libomptarget][nfc] Group environment variables, drop accesses to DeviceInfo global

Folds some duplicates logic into a helper function, passes the new environment
struct into getLaunchVals which no longer reads the DeviceInfo global.

Implemented on top of D105237

Reviewed By: dhruvachak

Differential Revision: https://reviews.llvm.org/D105239
2021-07-06 17:06:38 +01:00
Atmn Patel 21e92612c0 [Libomptarget] Experimental Remote Plugin Fixes
D97883 introduced a compile-time error in the experimental remote offloading
libomptarget plugin, this patch fixes it and resolves a number of
inconsistencies in the plugin as well:

1. Non-functional Asynchronous API
2. Unnecessarily verbose debug printing
3. Misc. code clean ups

This is not intended to make any functional changes to the plugin.

Differential Revision: https://reviews.llvm.org/D105325
2021-07-02 12:38:34 -04:00
Jon Chesterfield db89414da4 [libomptarget][nfc] Move grid size computation
Change getLaunchVals to return the integers used for launch

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D105237
2021-07-01 12:53:04 +01:00
Dhruva Chakrabarti 98c36f0079 Revert "[libomptarget] [amdgpu] Fix default setting of max flat workgroup size"
This reverts commit 2240b41ee4.
A value of 0 for KernDescVal WG_Size implies it is unknown, so it should be
set to the default. The above change was made without this assumption.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D105250
2021-06-30 17:15:00 -07:00
Jon Chesterfield 4b0926b044 [libomptarget][nfc] Replace out arguments with struct return
A step towards making this function adequately self contained that it
can be tested easily. No functional change intended here, left variable
names unchanged.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D105229
2021-06-30 22:40:07 +01:00
Jon Chesterfield d86b0073cf [libomptarget][amdgpu][nfc] Fix build warnings, drop some headers
Removes stdarg header, drops uses of iostream, fix some format string errors.
Also changes a C style struct to C++ style to avoid a warning from clang/

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D104923
2021-06-30 22:23:36 +01:00
Dhruva Chakrabarti e0b713a035 [libomptarget] [amdgpu] Change default number of teams per computation unit
This patch is related to https://reviews.llvm.org/D98832. Based on discussions there, I decided to separate out the teams default as this patch. This change is to increase the number of teams per computation unit so as to provide more wavefronts for hiding latency. This change improves performance for some programs, including 20-50% for some Stream benchmarks.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D99003
2021-06-29 15:34:35 -07:00
Dhruva Chakrabarti 2240b41ee4 [libomptarget] [amdgpu] Fix default setting of max flat workgroup size
When max flat workgroup size is not specified, it is set to the default
workgroup size. This prevents kernel launch with a workgroup size larger
than the default. The fix is to ignore a size of 0 and treat it as
unspecified.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D105073
2021-06-29 13:47:24 -07:00
Pushpinder Singh 20df2c7052 [AMDGPU][Libomptarget] Collect allocatable memory pools using HSA
The logic is almost similar to that of system.cpp with one change that
instead of adding all the memory pools to a device struct it only
keeps a single pool. The existing approach also always allocated memory on
the first HSA pool found for a GPU.

This depends on D104691. The goal of this series of patches is to remove
_atl_machine global. The next patch will drop g_atl_machine entirely.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D104695
2021-06-28 11:28:04 +00:00
Jon Chesterfield 96f6873dff [OpenMP][NFC] Drop unused headers from amdgpu plugin 2021-06-25 12:08:56 +01:00
Aakanksha Patil 3453f3dd46 [AMDGPU] Add gfx1035 target
Differential Revision: https://reviews.llvm.org/D104804
2021-06-24 14:32:41 -04:00
Joseph Huber 422adaa879 [OpenMP] Add thread limit environment variable support to plugins
The OpenMP 5.1 standard defines the environment variable
`OMP_TEAMS_THREAD_LIMIT` to limit the number of threads that will be run in a
single block. This patch adds support for this into the AMDGPU and CUDA
plugins.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D103923
2021-06-22 16:25:40 -04:00
Pushpinder Singh 9d110f9159 [AMDGPU][Libomptarget] Move allow_access_to_all_gpu_agents to rtl.cpp
Moving this method helps eliminate a use of g_atl_machine.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D104691
2021-06-22 11:44:52 +00:00
Vyacheslav Zakharin aad9e48c5f [NFC][libomptarget] Remove redundant libelf dependency for elf_common.
Differential Revision: https://reviews.llvm.org/D104549
2021-06-21 07:19:55 -07:00
Pushpinder Singh 7a97cd9da7 [AMDGPU][Libomptarget] Remove redundant functions
There does not seem to be any use of these functions. They just
put the value to a local which is never used again.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D104512
2021-06-21 06:13:24 +00:00
Vyacheslav Zakharin 836992ab9a [NFC][libomptarget] Build elf_common with PIC.
Differential Revision: https://reviews.llvm.org/D104545
2021-06-18 09:20:10 -07:00
Vyacheslav Zakharin c5b7c7c8f7 [NFC][libomptarget] Fixed -DLLVM_ENABLE_RUNTIMES="openmp" build.
Differential Revision: https://reviews.llvm.org/D104535
2021-06-18 09:20:10 -07:00
Vyacheslav Zakharin b5c4fc0f23 [NFC][libomptarget] Reduce the dependency on libelf
This change-set removes libelf usage from elf_common part of the plugins.
libelf is still used in x86_64 generic plugin code and in some plugins
(e.g. amdgpu) - these will have to be cleaned up in separate checkins.

Differential Revision: https://reviews.llvm.org/D103545
2021-06-16 08:34:23 -07:00
Pushpinder Singh cadcaf3f46 [AMDGPU][Libomptarget] Drop dead code related to g_atl_machine
This patch includes some changes which deletes the code accessing
g_atl_machine global. Some accesses related to memory_pools are
still remaining.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103813
2021-06-15 05:21:35 +00:00
Ron Lieberman 91f147792e [libomptarget][amdgpu] Remove stray fprintf in rtl.cpp
remove unintended fprintf in rtl.cpp

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D104003
2021-06-10 01:57:30 +00:00
Brendon Cahoon 294efbbd3e Reland "[AMDGPU] Add gfx1013 target"
This reverts commit 211e584fa2.

Fixed a use-after-free error that caused the sanitizers to fail.
2021-06-08 21:15:35 -04:00
Brendon Cahoon 211e584fa2 Revert "[AMDGPU] Add gfx1013 target"
This reverts commit ea10a86984.

A sanitizer buildbot reports an error.
2021-06-08 16:29:41 -04:00
Brendon Cahoon ea10a86984 [AMDGPU] Add gfx1013 target
Differential Revision: https://reviews.llvm.org/D103663
2021-06-08 12:49:49 -04:00
Pushpinder Singh 4f8bc7caf4 [AMDGPU][Libomptarget] Remove atlc global
This global struct used to hold various flags for monitoring the
initialization of hsa.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103795
2021-06-07 11:09:01 +00:00
Pushpinder Singh f5f329a371 [AMDGPU][Libomptarget] Rework logic for locating kernarg pools
Previous logic was to always use the first kernarg pool found to allocate
kernel args. This patch changes this to use only the kernarg pool which
has non-zero size. This logic is also reworked to not use any globals.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103600
2021-06-07 06:41:37 +00:00
Pushpinder Singh b25546a4b4 [AMDGPU][Libomptarget][NFC] Remove bunch of dead structs
Dropped structs are atmi_machine_t, atmi_device_t and atmi_memory_t

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103509
2021-06-02 10:40:51 +00:00
Pushpinder Singh 2368170a8d [AMDGPU][Libomptarget][NFC] Remove atmi_place_t
atmi_place_t has been replaced with int DeviceId.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103508
2021-06-02 10:35:28 +00:00
Pushpinder Singh fb113264a8 [AMDGPU][Libomptarget] Remove g_atmi_machine global
Turns out the only purpose of this class was verify if device ID
was in range or not which could be done easily by using g_atl_machine.

Still getting rid of g_atl_machine is pending which would be done in
a later patch.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103443
2021-06-01 12:34:24 +00:00
Pushpinder Singh 4fc3286951 [AMDGPU][Libomptarget][NFC] Split host and device malloc
This patch splits the code path for host and device malloc.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103389
2021-05-31 12:09:18 +00:00
Pushpinder Singh 8b79dfb302 [AMDGPU][Libomptarget][NFC] Remove atmi_mem_place_t
This struct was used to specify the device on which memory was
being allocated/free in atmi_malloc/free. It has now been replaced
with int DeviceId.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103239
2021-05-27 11:53:18 +00:00
Jon Chesterfield 2fdf8bbd19 [libomptarget][nfc][amdgpu] Factor out setting upper bounds
Refactor suggested in D103037 to help avoid similar copy-paste errors.
Change is mechanical. Some parts of this would be more robust with unsigned.

Reviewed By: dhruvachak

Differential Revision: https://reviews.llvm.org/D103090
2021-05-26 19:57:49 +01:00
Jon Chesterfield c5c1ec7945 [libomptarget][nfc][amdgpu] Refactor uses of KernelInfoTable
Suggested in D103059. Use a single lookup instead of two, more const, less mutation.

Reviewed By: dhruvachak

Differential Revision: https://reviews.llvm.org/D103093
2021-05-26 19:25:25 +01:00
Jon Chesterfield 07f59baad6 [libomptarget][nfc][amdgpu] Remove atmi_status_t type
ATMI_STATUS_UNKNOWN was unused, deleted references to it.
Replaced ATMI_STATUS_{SUCCESS,ERROR} with HSA_STATUS_{SUCCESS,ERROR}
Replaced atmi_status_t with hsa_status_t

Otherwise no change. In particular, conversions between atmi_status_t and
hsa_status_t will now be conversions between hsa_status_t and itself.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D103115
2021-05-26 17:02:19 +01:00
Pushpinder Singh a2d6ef5876 [AMDGPU][Libomptarget] Inline atmi_init/atmi_finalize
After D102847, these functions can be inlined.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103075
2021-05-26 10:50:08 +00:00
Pushpinder Singh cc8661ac4a [AMDGPU][Libomptarget] Delete g_atmi_initialized
This patch drops g_atmi_initialized and inlines the Initialize &
Finalize methods from Runtime class.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D102847
2021-05-26 10:46:54 +00:00
Pushpinder Singh 7648b6978e [AMDGPU][Libomptarget] Move Kernel/Symbol info tables to RTLDeviceInfoTy
Two globals KernelInfoTable & SymbolInfoTable are moved
into RTLDeviceInfoTy class.
This builds on the top of D102691.
[2/2]

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D102692
2021-05-26 10:02:28 +00:00
Jon Chesterfield df005fa364 [libomptarget][nfc] Move hostcall required test to rtl
[libomptarget][nfc] Move hostcall required test to rtl

Remove a global, fix minor race. First of N patches to bring up hostcall.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103058
2021-05-25 22:43:17 +01:00
Jon Chesterfield 75492e20fb [libomptarget][nfc] Accept callable for hsa iterate_symbols
[libomptarget][nfc] Accept callable for hsa iterate_symbols
Candidate refactor to simplify D102692

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D103030
2021-05-25 09:29:11 +01:00
Dhruva Chakrabarti 96d70f4d28 [libomptarget] [amdgpu] Added LDS usage to the kernel trace
Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103059
2021-05-24 19:33:48 -07:00
Dhruva Chakrabarti ca17b26d4d [libomptarget] [amdgpu] Fix copy-paste error setting NumThreads for a corner case.
Fix the case where NumTeams was set incorrectly instead of NumThreads

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D103037
2021-05-24 15:23:15 -07:00
Pushpinder Singh 486110eb41 [AMDGPU][Libomptarget] Remove global KernelNameMap
KernelNameMap contains entries like "key.kd" => key which clearly
could be replaced by simple logic of removing suffix from the key.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D102691
2021-05-24 08:46:08 +00:00
Jon Chesterfield d18fb09c69 [libomptarget][amdgpu] Remove majority of fatal errors
[libomptarget][amdgpu] Remove majority of fatal errors

Replaces most calls to exit() with returning an error to the library entry
point. Minor changes to error handling for clear bugs, remove some dead code.

Each exit() call site replaced is either in a library entry point or a
function that already returns error codes on some paths. The existing handling
is not well tested but replacing exit() with a fallback path should be a strict
improvement.

Remaining two early exit points are an abort() from a callback and exit() from
within msgpack. Fixes for those are less obvious and left for a later patch.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D102346
2021-05-20 16:26:43 +01:00
Pushpinder Singh d7503c3bce [AMDGPU][Libomptarget] Rename & move g_executables to private
This patch moves g_executables to private member of Runtime class
and is renamed to HSAExecutables following LLVM naming convention.

This movement required making Runtime::Initialize and Runtime::Finalize
non-static. Verified the correctness of this change by running
libomptarget tests on gfx906.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D102600
2021-05-18 05:43:23 +00:00
Pushpinder Singh 3bc2b97b34 [AMDGPU][libomptarget] Remove unused global variables
This initial patch removes some unused variables from global namespace.
There will more incoming patches for moving global variables to classes
or static members.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D102598
2021-05-18 05:40:49 +00:00
Aakanksha Patil 464e4dc50f [AMDGPU] Add gfx1034 target
Differential Revision: https://reviews.llvm.org/D102306
2021-05-13 14:25:18 -04:00
Jon Chesterfield b049870d3b [libomptarget][amdgpu] Convert an assert to print and offload_fail
[libomptarget][amdgpu] Convert an assert to print and offload_fail

The kernel launched is supposed to be present in the binary, but a not yet
diagnosed bug means it is missing for some of the qmcpack test cases. Changing
from assert to print and offload_fail should help diagnose that and similar bugs.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102378
2021-05-13 17:31:36 +01:00
Jon Chesterfield 9934571eab [libomptarget][amdgpu][nfc] Expand errorcheck macros
[libomptarget][amdgpu][nfc] Expand errorcheck macros

These macros expand to continue, which is confusing, or exit,
which is incompatible with continuing execution on offloading fail.

Expanding the macros in place makes the code look untidy but the
control flow obvious and amenable to improving. In particular, exit
becomes easier to eliminate.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D102230
2021-05-12 17:30:41 +01:00
Jon Chesterfield dedca78d48 [libomptarget][nfc] Drop stringify in macro
[libomptarget][nfc] Drop stringify in macro
A step towards deleting the macros entirely.

Differential Revision: https://reviews.llvm.org/D102228
2021-05-11 12:19:55 +01:00
Jon Chesterfield 6da348569c [libomptarget] Add support for target allocators to dynamic cuda RTL
[libomptarget] Add support for target allocators to dynamic cuda RTL

Follow on to D102000 which introduced new calls into libcuda. This patch adds
the corresponding entry points to dynamic_cuda, fixing the build for systems
that do not have the cuda toolkit installed.

Function types and enum from https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D102169
2021-05-10 15:27:50 +01:00
Pushpinder Singh 9586937ef5 [AMDGPU][OpenMP] Disable tests when amdgpu-arch fails
This patch prevents runtime tests running on systems without amdgpu.

Reviewed By: protze.joachim, tianshilei1992

Differential Revision: https://reviews.llvm.org/D102054
2021-05-10 07:37:27 +00:00
Joseph Huber a15f8589f4 [libomptarget] Add support for target memory allocators to cuda RTL
Summary:
The allocator interface added in D97883 allows the RTL to allocate shared and
host-pinned memory from the cuda plugin. This patch adds support for these to
the runtime.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D102000
2021-05-07 10:27:02 -04:00
Jon Chesterfield 7e9351b9de [libomptarget][amdgpu][nfc] Remove dead code from amdgpu plugin
[libomptarget][amdgpu][nfc] Remove dead code from amdgpu plugin

Drops an enum that was identical to a HSA one, localises some functions where
they were only called from one TU. Covers everything internalize + adce can
identify as dead, except for msgpack::dump which is useful when debugging.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102014
2021-05-06 23:16:32 +01:00
Joseph Huber 59b6849012 [OpenMP] Replace global InfoLevel with a reference to an internal one.
Summary:
This patch improves the implementation of D100774 by replacing the global
variable introduced with a function that returns a reference to an internal
one. This removes the need to define the variable in every plugin that uses it.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D101102
2021-04-23 09:43:46 -04:00
Joseph Huber 2b6f20082e [OpenMP] Add function for setting LIBOMPTARGET_INFO at runtime
Summary:
This patch adds a new runtime function __tgt_set_info_flag that allows the
user to set the information level at runtime without using the environment
variable. Using this will require an extern function, but will eventually be
added into an auxilliary library for OpenMP support functions.

This patch required moving the current InfoLevel to a global variable which must
be instantiated by each plugin.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D100774
2021-04-22 12:48:11 -04:00
Joseph Huber 29338459fb [OpenMP] Trim error messages in CUDA plugin
Summary:
Remove some of the error messages printed when the CUDA plugin fails. The current error messages can be confusing because they are the first error messages printed after the async stream finds an error. This means that the printed values aren't related to what caused the issue, but are simply the last asyncronous operation that succeeded on the device. Remove these as they can be misleading.

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D99510
2021-03-29 12:20:19 -04:00
Joseph Huber 16064e71e9 [OpenMP] Reset async stream properly upon failure
Summary:
If the call to `synchronize` fails, it will currently block the stream indefinitely if execution is continued from this point. Additionally, if the program exits it will trigger an assertion on the non-null value of the async queue and prevent the runtime from printing debugging information.

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D99443
2021-03-26 19:05:06 -04:00
Jon Chesterfield 626a31de15 [libomptarget] Add register usage info to kernel metadata
Add register usage information to the runtime metadata so that it can be used during kernel launch (that change will be in a different commit). Add this information to the kernel trace.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D98829
2021-03-18 17:00:42 +00:00
Jon Chesterfield 7da76aaaf4 [libomptarget] Build amdgpu plugin by default
[libomptarget] Build amdgpu plugin by default

This will build the amdgpu plugin if cmake is able to find the hsa
runtime library, which will be the case if rocm is installed or if
the hsa library has been installed somewhere cmake looks.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D98654
2021-03-15 20:12:01 +00:00
George Rokos 2468fdd9af [libomptarget] Add allocator support for target memory
This patch adds the infrastructure for allocator support for target memory.
Three allocators are introduced for device, host and shared memory.
The corresponding API functions have the llvm_ prefix temporarily, until they become part of the OpenMP standard.

Differential Revision: https://reviews.llvm.org/D97883
2021-03-13 03:47:07 -08:00
Johannes Doerfert 5449fbb5d4 [OpenMP][NFC] Use `AsyncInfo` as the variable name for a `__tgt_async_info`
Reviewed By: grokos, tianshilei1992

Differential Revision: https://reviews.llvm.org/D96444
2021-03-11 23:31:34 -06:00
Manoel Roemmer 542d9c2154 [libomptarget] Load images in order of registration
This makes sure that images are loaded in the order in which they are registered with libomptarget.

If a target can load multiple images and these images depend on each other (for example if one image contains the programs target regions and one image contains library code), then the order in which images are loaded can be important for symbol resolution (for example, in the VE plugin).
In this case: because the same code exist in the host binaries, the order in which the host linker loads them (which is also the order in which images are registered with libomptarget) is the order in which the images have to be loaded onto the device.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95530
2021-02-24 18:15:41 +01:00
Ron Lieberman 30c0d5b4c3 [OPENMP][AMDGCN] Improvements to print_kernel_trace (bit mask)
allow bit masking to select various trace features.
  bit 0 => Launch tracing           (stderr)
  bit 1 => timing of runtime        (stdout)
  bit 2 => detailed launch tracing  (stderr)
  bit 3 => timing goes to stdout instead of stderr

  example: LIBOMPTARGET_KERNEL_TRACE=7     does it all
           LIBOMPTARGET_KERNEL_TRACE=5     Launch + details
           LIBOMPTARGET_KERNEL_TRACE=2     timings + launch to stderr
           LIBOMPTARGET_KERNEL_TRACE=10    timings + launch to stdout

Differential Revision: https://reviews.llvm.org/D96998
2021-02-19 06:47:22 -05:00
Jon Chesterfield 53d7fd3762 [libomptarget][amdgcn] Remove lookup of .language msgpack field 2021-02-17 23:02:16 +00:00
Johannes Doerfert ea9395716e [OpenMP][NFC] Clang format the libomptarget plugins
Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D96445
2021-02-16 15:37:46 -06:00
Johannes Doerfert ad94fce845 [OpenMP][NFC] Eliminate sign comparison warning via explicit casts
Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D96812
2021-02-16 15:37:41 -06:00
Jon Chesterfield 56c446a878 [libomptarget][amdgcn] Tolerate deadstripped device_state variable
[libomptarget][amdgcn] Tolerate deadstripped device_state variable

The device_state variable may have been deadstripped. Similar to
device_environment, leave detection of missing but used symbol to loader.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D96330
2021-02-09 16:29:53 +00:00
Jon Chesterfield 4756f76bce [libomptarget][amdgcn] Tolerate deadstripped env variable
[libomptarget][amdgcn] Tolerate deadstripped env variable

Discovered by Pushpinder. If the device_environment variable is unused
it can be deadstripped, in which case we should not abort due to it
missing. This change is safe in that a missing symbol which is actually
used can be reported by both linker and loader, and a missing unused
symbol is better deadstripped than left in the image.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D96329
2021-02-09 11:58:37 +00:00
Vyacheslav Zakharin 0fc90873b2 [libomptarget][NFC] Link plugins with threads support library due to std::call_once usage.
Differential Revision: https://reviews.llvm.org/D95572
2021-01-27 19:26:18 -08:00
Atmn Patel 8a77056256 [OpenMP][Libomptarget] Fix conditional in CMake for remote plugin
The remote offloading plugin's CMakeLists was trying to build if its
flag was enabled even if it didn't find gRPC/protobuf. The conditional
was wrong, it's fixed by this.

Differential Revision: https://reviews.llvm.org/D95574
2021-01-27 21:28:25 -05:00
Jon Chesterfield 653655040f [libomptarget][cuda] Handle missing _v2 symbols gracefully
[libomptarget][cuda] Handle missing _v2 symbols gracefully

Follow on from D95367. Dlsym the _v2 symbols if present, otherwise use the
unsuffixed version. Builds a hashtable for the check, can revise for zero
heap allocations later if necessary.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95415
2021-01-27 00:22:29 +00:00
Atmn Patel 810572cc96 [OpenMP][Libomptarget] Fix cmake error on remote plugin
Requiring 3.15 causes a build breakage, I'm sure none of the contents actually require
3.15 or above.

Differential Revision: https://reviews.llvm.org/D95474
2021-01-26 16:00:40 -05:00
Jon Chesterfield 7baff00eee [libomptarget][cuda] Gracefully handle missing cuda library
[libomptarget][cuda] Gracefully handle missing cuda library

If using dynamic cuda, and it failed to load, it is not safe to call
cuGetErrorString.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95412
2021-01-26 20:43:07 +00:00
Jon Chesterfield fdeffd6fb0 [libomptarget][cuda] Only run tests when sure there is cuda available
[libomptarget][cuda] Only run tests when sure there is cuda available

Prior to D95155, building the cuda plugin implied cuda was installed locally.
With that change, every machine can build a cuda plugin, but they won't all have
cuda and/or an nvptx card installed locally.

This change enables the nvptx tests when either:
- libcuda is present
- the user has forced use of the dlopen stub

The default case when there is no cuda detected will no longer attempt to
run the tests on nvptx hardware, as was the case before D95155.

Reviewed By: jdoerfert, ronlieb

Differential Revision: https://reviews.llvm.org/D95467
2021-01-26 20:41:06 +00:00
Atmn Patel ec8f4a38c8 [OpenMP][Libomptarget] Introduce Remote Offloading Plugin
This introduces a remote offloading plugin for libomptarget. This
implementation relies on gRPC and protobuf, so this library will only
build if both libraries are available on the system. The corresponding
server is compiled to `openmp-offloading-server`.

This is a large change, but the only way to split this up is into RTL/server
but I fear that could introduce an inconsistency amongst them.

Ideally, tests for this should be added to the current ones that but that is
problematic for at least one reason. Given that libomptarget registers plugin
on a first-come-first-serve basis, if we wanted to offload onto a local x86
through a different process, then we'd have to either re-order the plugin list
in `rtl.cpp` (which is what I did locally for testing) or find a better
solution for runtime plugin registration in libomptarget.

Differential Revision: https://reviews.llvm.org/D95314
2021-01-26 15:33:38 -05:00
Jon Chesterfield 357eea6e8b Revert "[libomptarget][cuda] Gracefully handle missing cuda library"
This reverts commit fafd45c01f.
2021-01-26 03:14:53 +00:00
Jon Chesterfield fafd45c01f [libomptarget][cuda] Gracefully handle missing cuda library
[libomptarget][cuda] Gracefully handle missing cuda library

If using dynamic cuda, and it failed to load, it is not safe to call
cuGetErrorString.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95412
2021-01-26 02:54:00 +00:00
Jon Chesterfield 95f0d1edaf [libomptarget] Compile with older cuda, revert D95274
[libomptarget] Compile with older cuda, revert D95274

Fixes regression reported in comments of D95274.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95367
2021-01-25 16:12:56 +00:00
Jon Chesterfield e5e448aafa [libomptarget][cuda] Fix build, change missed from D95274 2021-01-24 18:30:04 +00:00
Jon Chesterfield dc70c56be5 [libomptarget][amdgpu][nfc] Update comments
[libomptarget][amdgpu][nfc] Update comments

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95295
2021-01-23 22:53:58 +00:00
Jon Chesterfield 78b0630b72 [libomptarget][cuda] Call v2 functions explicitly
[libomptarget][cuda] Call v2 functions explicitly

rtl.cpp calls functions like cuMemFree that are replaced by a macro
in cuda.h with cuMemFree_v2. This patch changes the source to use
the v2 names consistently.

See also D95104, D95155 for the idea. Alternatives are to use a mixture,
e.g. call the macro names and explictly dlopen the _v2 names, or to keep
the current status where the symbols are replaced by macros in both files

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95274
2021-01-23 20:33:13 +00:00
Jon Chesterfield 47e95e87a3 [libomptarget] Build cuda plugin without cuda installed locally
[libomptarget] Build cuda plugin without cuda installed locally

Compiles a new file, `plugins/cuda/dynamic_cuda/cuda.cpp`, to an object file that exposes the same symbols that the plugin presently uses from libcuda. The object file contains dlopen of libcuda and cached dlsym calls. Also provides a cuda.h containing the subset that is used.

This lets the cmake file choose between the system cuda and a dlopen shim, with no changes to rtl.cpp.

The corresponding change to amdgpu is postponed until after a refactor of the plugin to reduce the size of the hsa.h stub required

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95155
2021-01-23 00:15:04 +00:00
Joseph Huber 119a9ea13f [OpenMP] Fix failing test due to change in offloading flags
Summary:
Prior to D91261 the information checked the OMP_MAP_TARGET_PARAM flag, change this as it has been removed. The INFO macro was changed to accept a flag as input to make conditionally printing information easier.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95133
2021-01-21 14:09:36 -05:00
Jon Chesterfield 5d165f0b89 [libomptarget][amdgpu] Fix kernel launch tracing to match previous behavior
Restore control of kernel launch tracing to be >= 1 as it was before

export LIBOMPTARGET_KERNEL_TRACE=1

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D94695
2021-01-14 18:13:22 +00:00
Shilei Tian 68ff52ffea [OpenMP] Fixed the link error that cannot find static data member
Constant static data member can be defined in the class without another
define after the class in C++17. Although it is C++17, Clang can still handle it
even w/o the flag for C++17. Unluckily, GCC cannot handle that.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D94541
2021-01-12 16:48:28 -05:00
Jon Chesterfield 33e2494bea [libomptarget][amdgpu][nfc] Fix build on centos
[libomptarget][amdgpu][nfc] Fix build on centos

rtl.cpp replaced 224 with a #define from elf.h, but that
doesn't work on a centos 7 build machine with an old elf.h

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D94528
2021-01-12 19:40:03 +00:00
Shilei Tian bdd1ad5e5c [OpenMP] Fixed include directories for OpenMP when building OpenMP with LLVM_ENABLE_RUNTIMES
Some LLVM headers are generated by CMake. Before the installation,
LLVM's headers are distributed everywhere, some of which are in
`${LLVM_SRC_ROOT}/llvm/include/llvm`, and some are in
`${LLVM_BINARY_ROOT}/include/llvm`. After intallation, they're all in
`${LLVM_INSTALLATION_ROOT}/include/llvm`.

OpenMP now depends on LLVM headers. Some headers depend on headers generated
by CMake. When building OpenMP along with LLVM, a.k.a via `LLVM_ENABLE_RUNTIMES`,
we need to tell OpenMP where it can find those headers, especially those still
have not been copied/installed.

Reviewed By: jdoerfert, jhuber6

Differential Revision: https://reviews.llvm.org/D94534
2021-01-12 14:32:38 -05:00
Shilei Tian 0871d6d516 [OpenMP] Move memory manager to plugin and make it a common interface
The lifetime of `libomptarget` and its opened plugins are not aligned
and it's hard for `libomptarget` to determine when the plugins are destroyed.
As a result, some issues (see D94256 for details) occur on some platforms.
Actually, if we take target memory as target resources, same as other resources,
such as CUDA streams, in each plugin, then the memory manager should also be in
the plugin. Also considering some platforms may want to opt out the feature, it
makes sense to move the memory manager to plugin, make it a common interface, and
let plguin developers determine whether they need it. This is what this patch does.
CUDA plugin is taken as example to show how to integrate it. In this way, we can
also get a bonus that different thresholds can be set for different platforms.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D94379
2021-01-11 21:33:42 -05:00
Shilei Tian a81c68ae6b [OpenMP] Take elf_common.c as a interface library
For now `elf_common.c` is taken as a common part included into
different plugin implementations directly via
`#include "../../common/elf_common.c"`, which is not a best practice. Since it
is simple enough such that we don't need to create a real library for it, we just
take it as a interface library so that other targets can link it directly. Another
advantage of this method is, we don't need to add the folder into header search
path which can potentially pollute the search path.

VE and AMD platforms have not been tested because I don't have target machines.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D94443
2021-01-11 17:34:26 -05:00
Joseph Huber 2ce16810f2 [OpenMP] Always print error messages in libomptarget CUDA plugin
Summary:
Currently error messages from the CUDA plugins are only printed to the user if they have debugging enabled. Change this behaviour to always print the messages that result in offloading failure. This improves the error messages by indidcating what happened when the error occurs in the plugin library, such as a segmentation fault on the device.

Reviewed by: jdoerfert

Differential Revision: https://reviews.llvm.org/D94263
2021-01-07 17:47:32 -05:00
Joseph Huber fe5d51a489 [OpenMP] Add using bit flags to select Libomptarget Information
Summary:
This patch adds more fine-grained support over which information is output from the libomptarget runtime when run with the environment variable LIBOMPTARGET_INFO set. An extensible set of flags can be used to pick and choose which information the user is interested in.

Reviewers: jdoerfert JonChesterfield grokos

Differential Revision: https://reviews.llvm.org/D93727
2021-01-04 12:03:15 -05:00
Jon Chesterfield 7c59614394 [libomptarget][amdgpu] clang-format src/rtl.cpp 2020-12-09 19:45:51 +00:00
Jon Chesterfield c9bc414840 [libomptarget][amdgpu] Let default number of teams equal number of CUs 2020-12-09 19:35:34 +00:00
Jon Chesterfield e191d31159 [libomptarget][amdgpu] Robust handling of device_environment symbol 2020-12-09 19:21:51 +00:00
Jon Chesterfield cab9f69235 [libomptarget][amdgpu] Improve diagnostics on arch mismatch 2020-12-09 18:55:53 +00:00
Jon Chesterfield 71f4693020 [libomptarget][amdgpu] Add plumbing to call into hostrpc lib, if linked 2020-12-07 15:24:01 +00:00
Jon Chesterfield e1b8e8a1f4 [libomptarget][amdgpu] Skip device_State allocation when using bss global 2020-12-06 12:13:56 +00:00