Add OpenMP device runtime build support for the gfx1100, gfx1101,
gfx1102, and gfx1103 targets.
Differential Revision: https://reviews.llvm.org/D134465
These patches exposed a lot of problems in the AMD toolchain. Rather
than keep it broken we should revert it to its old semi-functional
state. This will prevent us from using device destructors but should
remove some new bugs. In the future this interface should be changed
once these problems are addressed more correctly.
This reverts commit ed0f218115.
This reverts commit 2b7203a359.
Fixes#57536
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D133997
Previous support for device memory allocators used a single free
routine and did not provide the original kind of the allocation. This is
problematic as some of these memory types required different handling.
Previously this was worked around using a map in runtime to record the
original kind of each pointer. Instead, this patch introduces new free
routines similar to the existing allocation routines. This allows us to
avoid a map traversal every time we free a device pointer.
The only interfaces defined by the standard are `omp_target_alloc` and
`omp_target_free`, these do not take a kind as `omp_alloc` does. The
standard dictates the following:
"The omp_target_alloc routine returns a device pointer that references
the device address of a storage location of size bytes. The storage
location is dynamically allocated in the device data environment of the
device specified by device_num."
Which suggests that these routines only allocate the default device
memory for the kind. So this has been changed to reflect this. This
change is somewhat breaking if users were using `omp_target_free` as
previously shown in the tests.
Reviewed By: JonChesterfield, tianshilei1992
Differential Revision: https://reviews.llvm.org/D133053
Summary:
The AMDGPU and CUDA plugins now relies on the Object and Support
libraries. This patch adds them explicitly rather than hoping that they
share the symbols loaded from the standard `libomptarget`.
This patch replaces the dependency on `libelf` with LLVM's ELF support.
With this patch the user no-longer needs to have `libelf` on their
system to build and configure OpenMP offloading. The replacement is
mostly mechanical, with the exception of the hash table support which
was added in D131309.
Depends on D131309
Reviewed By: JonChesterfield, saiislam
Differential Revision: https://reviews.llvm.org/D131401
The `SHT_HASH` sections in an ELF are used to look up a symbol in the
symbol table using a symbol's name. This is done by obtaining the
`SHT_HASH` section and using its `sh_link` attribute to access the
associated symbol table, from which we can access the string table
containing the associated name. We can then search for the symbol using
the hash of the name and the buckets and chains in the hash table
itself
This patch adds utility functions that allow us to look up a symbol in
an ELF file by name. It will first attempt to look through the hash
tables, and then search the section tables manually if failed. This
allows us to pull out constants necessary for setting up offloading
without first loading the object.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D131309
This patch adds support for the device memory type, this is currently equivalent
to the default type so it should be treated as the same.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D133128
The cuda plugin maps TARGET_ALLOC_HOST onto cuMemAllocHost
which is page locked host memory. Fine grain HSA memory is not
necessarily page locked but has the same read/write from host or
device semantics.
The cuda plugin does this per-gpu and this patch makes it accessible
from any gpu, but it can be locked down to match the cuda behaviour
if preferred.
Enabling tests requires an equivalent to
// RUN: %libomptarget-compile-run-and-check-nvptx64-nvidia-cuda
for amdgpu which doesn't seem to be in use yet.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D132660
if(TARGET amdgpu-arch) doesn't work when ENABLE_LLVM_PROJECTS=openmp because openmp subdirectory is processed before clang subdirectory. Adopt the same logic of enabling tests like the CUDA plugin.
Differential Revision: https://reviews.llvm.org/D132579
This patch replaces uses of `dlopen` and `dlsym` with LLVM's support
with `loadPermanentLibrary` and `getSymbolAddress`. This allows us to
remove the explicit dependency on the `dl` libraries in the CMake. This
removes another explicit dependency and solves an issue encountered
while building on Windows platforms. The one downside to this is that
the LLVM library does not currently support `dlclose` functionality, but
this could be added in the future.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D131507
A previous patch made the destruction of the HSA plugin more
deterministic. However, there were still other global values that are not
handled this way. When attempting to call a destructor kernel, the
device would have already been uninitialized and we could not find the
appropriate kernel to call. This is because they were stored in global
containers that had their destructors called already. Merges this global
state into the rest of the info state by putting those global values
inside of the global pointer already allocated and deallocated by the
constructor and destructor. This should allow the AMDGPU plugin to
correctly identify the destructors if we were to run them.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D131011
Moves DeviceInfo global to heap to accurately control lifetime.
Moves calls from libomptarget to deinit_plugin later, plugins need to stay
alive until very shortly before libomptarget is destructed.
Leaving the deinit_plugin calls where initially inserted hits use after
free from the dynamic_module.c offloading test (verified with valgrind
that the new location is sound with respect to this)
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D130714
Will allow plugins to migrate away from using global variables to
manage lifetime, which will fix a segfault discovered in relation to D127432
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D130712
This patch extends the is_valid_binary routine to also check if the
binary's target ID matches the one parsed from the system's runtime
environment.
This should allow us to only use the binary whose compute capability
matches, allowing us to support basic multi-architecture binaries for
AMDGPU.
It also handles compatibility testing of target IDs of the image and
the enviornment.
Depends on D127432
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D127769
This patch extends the is_valid_binary routine to also check if the
binary's target ID matches the one parsed from the system's runtime
environment.
This should allow us to only use the binary whose compute capability
matches, allowing us to support basic multi-architecture binaries for
AMDGPU.
It also handles compatibility testing of target IDs of the image and
the enviornment.
Depends on D127432
Differential Revision: https://reviews.llvm.org/D127769
Previously we made `libomptarget` link as an LLVM library so we have
access to the LLVM core libraries. After the initial patch stuck we can
now apply the same changes to the plugins. This will allow us to use
LLVM in all of `libomptarget` when we have uses for them. In the future
this should allow us to remove the dependencies on `libelf`, `libffi`,
and `dl`.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D130262
Libomptarget grew out of a project that was originally not in LLVM. As
we develop libomptarget this has led to an increasingly large clash
between the naming conventions used. This patch fixes most of the
variable names that did not confrom to the LLVM standard, that is
`VariableName` for variables and `functionName` for functions.
This patch was primarily done using my editor's linting messages, if
there are any issues I missed arising from the automation let me know.
Reviewed By: saiislam
Differential Revision: https://reviews.llvm.org/D128997
We are planning on making LTO the default compilation mode for
offloading. In order to make sure it works we should run these tests on
the test suite. AMDGPU already uses the LTO compilation path for its
linking, but in LTO mode it also links the static library late.
Performing LTO requires the static library to be built, if we make the
change this will be a hard requirement and the old bitcode library will
go away. This means users will need to use either a two-step build or a
runtimes build for libomptarget.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D127512
This patchs adds the arguments necessary to allocate the size of the
dynamic shared memory via the `LIBOMPTARGET_SHARED_MEMORY_SIZE`
environment variable. This patch only allocates the memory, AMDGPU has a
limitation that shared memory can only be accessed from the kernel
directly. So this will currently only work with optimizations to inline
the accessor function.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D125252
We used to globally include the libomptarget include directory for all
projects. This caused some conflicts with the other files named
"Debug.h". This patch changes the cmake to include these files via the
target include instead.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D125563
Previously an opt-in flag `-fopenmp-new-driver` was used to enable the
new offloading driver. After passing tests for a few months it should be
sufficiently mature to flip the switch and make it the default. The new
offloading driver is now enabled if there is OpenMP and OpenMP
offloading present and the new `-fno-openmp-new-driver` is not present.
The new offloading driver has three main benefits over the old method:
- Static library support
- Device-side LTO
- Unified clang driver stages
Depends on D122683
Differential Revision: https://reviews.llvm.org/D122831
In a clean build directory, `check-openmp` or `check-libomptarget` will fail because of missing device RTL .bc files. Ensure that the new targets new custom targets `omptarget.devicertl.nvptx` and `omptarget.devicertl.amdgpu` (corresponding to the plugin rtl targets `omptarget.rtl.cuda`, respectively `omptarget.rlt.amdgpu` ) are dependencies of the regression tests.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D123177
`LIBOMPTARGET_LLVM_INCLUDE_DIRS` is currently checked and included for
multiple times redundantly. This patch is simply a clean up.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D121055
This patch enables running the new driver tests for AMDGPU. Previously
this was disabled because some tests failed. This was only because the
new driver tests hadn't been listed as unsupported or expected to fail.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D119240
This patch completely removes the old OpenMP device runtime. Previously,
the old runtime had the prefix `libomptarget-new-` and the old runtime
was simply called `libomptarget-`. This patch makes the formerly new
runtime the only runtime available. The entire project has been deleted,
and all references to the `libomptarget-new` runtime has been replaced
with `libomptarget-`.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D118934
This patch adds a new target to the tests to run using the new driver as
the method for generating offloading code.
Depends on D116541
Differential Revision: https://reviews.llvm.org/D118637
I missed the async info parameter in the first version of this API.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D115887
This patch extends the AMDGPU plugin for OpenMP target offloading from using a single HSA queue to multiple queues (four in this patch) per device. This enables concurrent threads to concurrently submit kernel launches to the same GPU.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D115771
and synchronous kernel launch implementations into a single
synchronous version. This patch prepares the plugin for asynchronous
implementation by:
Privatizing actual kernel launch code (valid in both cases) into
an anonymous namespace base function (submitted at D115267)
- Separating the control flow path of asynchronous and synchronous
kernel launch functions** (this diff)
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D115273