Summary:
This patch changes the `exports` file to export all `__tgt_rtl`
functions. This is a better option as not each plugin implements all of
these functions, furthermore any new functions added will be
automatically included.
This reverts commit 096f93e73d.
Revert "[Libomptarget] Make the plugins ingore undefined exported symbols"
This reverts commit 3f62314c23.
Revert "[LLD] Enable --no-undefined-version by default."
This reverts commit 7ec8b0d162.
Three commits are reverted because of the current omp build fail
with GNU ld. See discussion here: https://reviews.llvm.org/rG096f93e73dc3
Summary:
Recent changes made the default behaviour to error when given an
undefined symbol in a version script. A previous patch fixed this for
`libomptarget` by removing the single undefined symbol. However, the
plguins are expected to only define a subset of the availible functions
so we shouldn't treat it as an error. This patch updates the build flags
to work appropriately.
Summary:
A recent patch made undefined symbols in version scripts cause errors by
default. The `omp_get_interop_rc_desc` function is declared but not
defined, so it is undefined in the final link unit. This patch removes
it from the exports list, it should be added back in when actually
defined and used.
File-level dependency should not be used on files generated during the build. The next command may execute before the generating command finishes writing the file. Use add_custom_target and use target-level dependency.
Differential Revision: https://reviews.llvm.org/D135630
These debugging definitions are no longer used in the new runtime. The
old runtime has been removed since Clang-14 so we can safely get rid of
these leftover variables.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D135452
Summary:
This patch just cleans up the unused flags in the DeviceRTL. These
should no longer be necessary or are redundant. Also add the extract
tool and packager to the check and error message if not found. This will
make it easier to tell if they are not present.
The shared memory stack in the device runtime assumes no intervined uses.
D135037 breaks the assumption, potentially causing the shared stack corruption.
This patch moves the thread array to heap memory. Since it is already the slow
path, it doesn't matter that much anyway.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D135391
A previous patch merged the static and bitcode versions of the
deviceRTL. We previously used the static library's separate compilation
to set a special flag that prevented `IsSPMDMode` from being put in the
used list and preventing it from being optimized out. When they were
merged we could no longer do this separate compilation that allowed
users of LTO to get more optimal code.
This patch rearranges the code. The `IsSPMDMode` global is now
transitively used by its inclusion in the changed `__keep_alive`
function. This allows us to then manually delete the `__keep_alive`
function from the module when building the static library via
`llvm-extract`. The result is that the bitcode library correctly will
maintain the needed shared state, while the static library will be able
to internalize it and optimize it out.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D135280
If we have thread states, the program is going to be rather slow. If we
don't, we want to avoid wasting shared memory. This patch introduces a
slight penalty (malloc + indirection) for the slow path and reduces
resource usage for the fast path.
Differential Revision: https://reviews.llvm.org/D135037
We should use OpenMP atomics but they don't take variable orderings.
Maybe we should expose all of this in the header but that solves only
part of the problem anyway.
Differential Revision: https://reviews.llvm.org/D135036
It is data mapping ordering problem.
According omp spec
If one or more map clauses are present, the list item conversions that
are performed for any use_device_ptr or use_device_addr clause occur
after all variables are mapped on entry to the region according to those
map clauses.
The change is to put mapping data for use_device_addr at end of data
mapping array.
Differential Revision: https://reviews.llvm.org/D134556
Add OpenMP device runtime build support for the gfx1100, gfx1101,
gfx1102, and gfx1103 targets.
Differential Revision: https://reviews.llvm.org/D134465
This patch add codegen support for the has_device_addr clause. It use
the same logic of is_device_ptr. But passing &var instead pointer to var
to kernal.
Differential Revision: https://reviews.llvm.org/D134268
Summary: This patch add codegen support for the has_device_addr clause. It
use the same logic of is_device_ptr.
Differential Revision: https://reviews.llvm.org/D134186
These patches exposed a lot of problems in the AMD toolchain. Rather
than keep it broken we should revert it to its old semi-functional
state. This will prevent us from using device destructors but should
remove some new bugs. In the future this interface should be changed
once these problems are addressed more correctly.
This reverts commit ed0f218115.
This reverts commit 2b7203a359.
Fixes#57536
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D133997
This patch changes the CMake to instead embed the already generated
LLVM-IR bitcode library into an object file to create the static
library. This is different from the previous method which generated them
separately. This will make the build faster and allow us to perform the
same internalization into a single library we do with the bitcode
library.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D133952
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.
Reviewed By: jdoerfert, jhuber6, ABataev
Differential Revision: https://reviews.llvm.org/D102107
Previous support for device memory allocators used a single free
routine and did not provide the original kind of the allocation. This is
problematic as some of these memory types required different handling.
Previously this was worked around using a map in runtime to record the
original kind of each pointer. Instead, this patch introduces new free
routines similar to the existing allocation routines. This allows us to
avoid a map traversal every time we free a device pointer.
The only interfaces defined by the standard are `omp_target_alloc` and
`omp_target_free`, these do not take a kind as `omp_alloc` does. The
standard dictates the following:
"The omp_target_alloc routine returns a device pointer that references
the device address of a storage location of size bytes. The storage
location is dynamically allocated in the device data environment of the
device specified by device_num."
Which suggests that these routines only allocate the default device
memory for the kind. So this has been changed to reflect this. This
change is somewhat breaking if users were using `omp_target_free` as
previously shown in the tests.
Reviewed By: JonChesterfield, tianshilei1992
Differential Revision: https://reviews.llvm.org/D133053
Sumnmary:
A previous patch introduces an `exports` file which contains all the
symbol names that are not internalized in the bitcode library. This is
done to reduce the size of the bitcode library and only export needed
functions. This export file must contain all the functoins expected to
be called from the device. Since its introduction the `__assert_fail`
function used to be provided but was mistakenly not included. This patch
adds it.
Fixes#57656
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D133594
Summary:
The AMDGPU and CUDA plugins now relies on the Object and Support
libraries. This patch adds them explicitly rather than hoping that they
share the symbols loaded from the standard `libomptarget`.
In OpenMP 5.2, §5.8.6, page 160 line 32-33, when a device pointer
allocated by omp_target_alloc has implicitly been included on a target
construct as a zero-length array, the pointer initialisation should not
find a matching mapped list item, and so should retain its value as a
firstprivate variable. Previously, we would return a null pointer if the
list item was not found. This patch updates the map handling to the
OpenMP 5.2 semantics.
Reviewed By: jdoerfert, ye-luo
Differential Revision: https://reviews.llvm.org/D133447
This patch replaces the dependency on `libelf` with LLVM's ELF support.
With this patch the user no-longer needs to have `libelf` on their
system to build and configure OpenMP offloading. The replacement is
mostly mechanical, with the exception of the hash table support which
was added in D131309.
Depends on D131309
Reviewed By: JonChesterfield, saiislam
Differential Revision: https://reviews.llvm.org/D131401
The `SHT_HASH` sections in an ELF are used to look up a symbol in the
symbol table using a symbol's name. This is done by obtaining the
`SHT_HASH` section and using its `sh_link` attribute to access the
associated symbol table, from which we can access the string table
containing the associated name. We can then search for the symbol using
the hash of the name and the buckets and chains in the hash table
itself
This patch adds utility functions that allow us to look up a symbol in
an ELF file by name. It will first attempt to look through the hash
tables, and then search the section tables manually if failed. This
allows us to pull out constants necessary for setting up offloading
without first loading the object.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D131309
This patch adds support for the device memory type, this is currently equivalent
to the default type so it should be treated as the same.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D133128
Some code previous needed the `used` attribute to prevent the GCC
compiler versions 5 and 6 from removing it. This is no longer required
as the minimum supported GCC version for LLVM 16 is >=7.1.0.
Reviewed By: JonChesterfield, vzakhari
Differential Revision: https://reviews.llvm.org/D132976
This test is an expected failure on AMDGPU. The expected failure is a GPU memory
failure, which will typically result in the device totally failing. This isn't
an issue for some GPU configurations that do not use the offloading device to
also drive the display server. However, if the main GPU is used for testing it
will reliably result in the user's display becoming unresponsive. This makes it
difficult to run the GPU offloading tests on many systems.
This patch simply makes this test unsupported so it no longer runs and freezes
my computer when using `ninja check-openmp`.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D132891
Previously, the tripcount was set by a push call. We moved away from
this with the new interface that added the tripcount to the kernel
arguments struct, but kept around the old interface for legacy purposes
for the LLVM 15 release. This patch removes the support for the legacy
method.
This removes the support for the old method, but does not break
backwards compatibility. This will result in applications using the old
interface being slower when run on the device.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D132885
Previously time tracing features were hidden behind an optional CMake
option. This was because `libomptarget` was not based on the LLVM
libraries at that time. Now that `libomptarget` is an LLVM library we
should be able to freely use the `LLVMSupport` library whenever we want
and do not need to guard it in this way.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D132852
The only RTLs that get added to the `UsedRTLs` list have already been
checked is they were valid binaries. We shouldn't need to do this again
when we unregister all the used binaries as they wouldn't have been used
if they were invalid anyway. Let me know if I'm incorrect in this
assumption.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D131443
Recently OpenMP has transitioned to using the "new" driver which
primarily merges the device and host linking phases into a single
wrapper that handles both at the same time. This replaced a few tools
that were only used for OpenMP offloading, such as the
`clang-offload-wrapper` and `clang-nvlink-wrapper`. The new driver
carries some marked benefits compared to the old driver that is now
being deprecated. Things like device-side LTO, static library
support, and more compatible tooling. As such, we should be able to
completely deprecate the old driver, at least for OpenMP. The old driver
support will still exist for CUDA and HIP, although both of these can
currently be compiled on Linux with `--offload-new-driver` to use the new
method.
Note that this does not deprecate the `clang-offload-bundler`, although
it is unused by OpenMP now, it is still used by the HIP toolchain both
as their device binary format and object format.
When I proposed deprecating this code I heard some vendors voice
concernes about needing to update their code in their fork. They should
be able to just revert this commit if it lands.
Reviewed By: jdoerfert, MaskRay, ye-luo
Differential Revision: https://reviews.llvm.org/D130020
The cuda plugin maps TARGET_ALLOC_HOST onto cuMemAllocHost
which is page locked host memory. Fine grain HSA memory is not
necessarily page locked but has the same read/write from host or
device semantics.
The cuda plugin does this per-gpu and this patch makes it accessible
from any gpu, but it can be locked down to match the cuda behaviour
if preferred.
Enabling tests requires an equivalent to
// RUN: %libomptarget-compile-run-and-check-nvptx64-nvidia-cuda
for amdgpu which doesn't seem to be in use yet.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D132660
if(TARGET amdgpu-arch) doesn't work when ENABLE_LLVM_PROJECTS=openmp because openmp subdirectory is processed before clang subdirectory. Adopt the same logic of enabling tests like the CUDA plugin.
Differential Revision: https://reviews.llvm.org/D132579
This patch replaces uses of `dlopen` and `dlsym` with LLVM's support
with `loadPermanentLibrary` and `getSymbolAddress`. This allows us to
remove the explicit dependency on the `dl` libraries in the CMake. This
removes another explicit dependency and solves an issue encountered
while building on Windows platforms. The one downside to this is that
the LLVM library does not currently support `dlclose` functionality, but
this could be added in the future.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D131507
We use the offloading entires array to determine the relative names and
addressed of device-side kernel functions. The x86_64 plugin previously
derived the device-side entry table by first identifying the
`omp_offloading_entries` section offset in the loaded elf. Then we would
use the base offset of the loaded dyanmic library to identify the
entries array within the loaded image. This relied on some more
unconventional methods which prevented us from using the LLVM dynamic
library loader for this plugin. This patch simplifies this by instead
copying the host-side entry and replacing its address with the
device-side address looked up through `dlsym`.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D131516