Commit Graph

2395 Commits

Author SHA1 Message Date
Dhruva Chakrabarti 667af48179 [OpenMP] [OMPT] [1/8] Create separate categories for host, device, [no]emi events
In preparation for OMPT target changes, create separate categories of events that will be used by OMPT target support.

Split up existing macro FOREACH_OMPT_EVENT into new ones. There is no change to the original macro. Created new macros FOREACH_OMPT_HOST_EVENT, FOREACH_OMPT_DEVICE_EVENT, FOREACH_OMPT_NOEMI_EVENT, FOREACH_OMPT_EMI_EVENT, and a few other sub-categories that can be used as required. One such use is in D123974 which uses events selectively.

Patch from John Mellor-Crummey <johnmc@rice.edu>

Reviewed By: dreachem

Differential Revision: https://reviews.llvm.org/D123429
2022-10-01 00:46:40 +00:00
Vitaly Buka adf4eda004 [test][openmp] Tsan may report more warnings here 2022-09-28 18:53:09 -07:00
Jennifer Yu 30cc712eb6 [Clang][OpenMP] Fix run time crash when use_device_addr is used.
It is data mapping ordering problem.

According omp spec
If one or more map clauses are present, the list item conversions that
are performed for any use_device_ptr or use_device_addr clause occur
after all variables are mapped on entry to the region according to those
map clauses.

The change is to put mapping data for use_device_addr at end of data
mapping array.

Differential Revision: https://reviews.llvm.org/D134556
2022-09-27 11:53:57 -07:00
Dan Palermo db021abf33 [OpenMP][AMDGPU] Enable OpenMP device runtime build for gfx110[0123]
Add OpenMP device runtime build support for the gfx1100, gfx1101,
gfx1102, and gfx1103 targets.

Differential Revision: https://reviews.llvm.org/D134465
2022-09-23 01:49:51 +00:00
Jennifer Yu 48ffd40ba2 [Clang][OpenMP] Codegen generation for has_device_addr claues.
This patch add codegen support for the has_device_addr clause. It use
the same logic of is_device_ptr. But passing &var instead pointer to var
to kernal.

Differential Revision: https://reviews.llvm.org/D134268
2022-09-20 21:12:30 -07:00
Ron Lieberman d5b5289561 revert 684f76643 [Clang][OpenMP] Codegen generation for has_device_addr claues.
breaks amdgpu buildbot
2022-09-20 01:37:27 +00:00
Jennifer Yu a1df13ecd6 Fix test case which is not working for AMDGPU.
This is for the change of
Differential Revision: https://reviews.llvm.org/D134186
2022-09-19 17:07:01 -07:00
Jennifer Yu 684f766431 [Clang][OpenMP] Codegen generation for has_device_addr claues.
Summary: This patch add codegen support for the has_device_addr clause.  It
use the same logic of is_device_ptr.

Differential Revision: https://reviews.llvm.org/D134186
2022-09-19 16:14:57 -07:00
SignKirigami 6772987fc3 [OpenMP] Add LoongArch64 support
GCC, glibc, binutils, and LLVM have added support for LoongArch64.
This patch adds support for LLVM OpenMP following D59880 for RISCV64.

Reviewed By: MaskRay, SixWeining

Differential Revision: https://reviews.llvm.org/D132925
2022-09-19 22:49:15 +00:00
Joseph Huber 292cb114b0 [Libomptarget] Revert changes to AMDGPU plugin destructors
These patches exposed a lot of problems in the AMD toolchain. Rather
than keep it broken we should revert it to its old semi-functional
state. This will prevent us from using device destructors but should
remove some new bugs. In the future this interface should be changed
once these problems are addressed more correctly.

This reverts commit ed0f218115.

This reverts commit 2b7203a359.

Fixes #57536

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D133997
2022-09-16 06:55:51 -05:00
Joseph Huber 4b004a0b83 [Libomptarget] Embed bitcode library in static library instead.
This patch changes the CMake to instead embed the already generated
LLVM-IR bitcode library into an object file to create the static
library. This is different from the previous method which generated them
separately. This will make the build faster and allow us to perform the
same internalization into a single library we do with the bitcode
library.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D133952
2022-09-15 14:05:18 -05:00
Dhruva Chakrabarti 839ac62c50 Revert "[OpenMP] Codegen aggregate for outlined function captures"
This reverts commit 7539e9cf81.
2022-09-15 03:08:46 +00:00
Giorgis Georgakoudis 7539e9cf81 [OpenMP] Codegen aggregate for outlined function captures
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3)  forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.

Reviewed By: jdoerfert, jhuber6, ABataev

Differential Revision: https://reviews.llvm.org/D102107
2022-09-15 00:54:05 +00:00
Joseph Huber 23bc343855 [Libomptarget] Change device free routines to accept the allocation kind
Previous support for device memory allocators used a single free
routine and did not provide the original kind of the allocation. This is
problematic as some of these memory types required different handling.
Previously this was worked around using a map in runtime to record the
original kind of each pointer. Instead, this patch introduces new free
routines similar to the existing allocation routines. This allows us to
avoid a map traversal every time we free a device pointer.

The only interfaces defined by the standard are `omp_target_alloc` and
`omp_target_free`, these do not take a kind as `omp_alloc` does. The
standard dictates the following:

"The omp_target_alloc routine returns a device pointer that references
the device address of a storage location of size bytes. The storage
location is dynamically allocated in the device data environment of the
device specified by device_num."

Which suggests that these routines only allocate the default device
memory for the kind. So this has been changed to reflect this. This
change is somewhat breaking if users were using `omp_target_free` as
previously shown in the tests.

Reviewed By: JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D133053
2022-09-14 12:14:07 -05:00
Joseph Huber c2acb1e5d3 [Libomptarget][NFC] Remove unused variable 2022-09-09 15:26:02 -05:00
Joseph Huber 86587f2891 [Libomptarget] Fix compiling with asserts using the bitcode library
Sumnmary:
A previous patch introduces an `exports` file which contains all the
symbol names that are not internalized in the bitcode library. This is
done to reduce the size of the bitcode library and only export needed
functions. This export file must contain all the functoins expected to
be called from the device. Since its introduction the `__assert_fail`
function used to be provided but was mistakenly not included. This patch
adds it.

Fixes #57656

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D133594
2022-09-09 15:25:24 -05:00
Joseph Huber 83fcba82cc [Libomptarget] Add proper LLVM libraries now that the AMDGPU plugin uses them
Summary:
The AMDGPU and CUDA plugins now relies on the Object and Support
libraries. This patch adds them explicitly rather than hoping that they
share the symbols loaded from the standard `libomptarget`.
2022-09-09 10:33:26 -05:00
serge-sans-paille 6f2ed8fd3f [OpenMP] Install ompt-multiplex.h alongside omp.h
The default install direction may not be in the compiler search path.

Differential Revision: https://reviews.llvm.org/D133420
2022-09-09 09:42:08 +02:00
Jonathan Peyton e5ac98fa01 [OpenMP][libomp] Cleanup __kmpc_flush() code
Have it be simple KMP_MFENCE() which incorporates x86-specific logic and
reduces to KMP_MB() for other architectures.

Differential Revision: https://reviews.llvm.org/D130928
2022-09-08 16:17:20 -05:00
Joseph Huber 6e8d93e5c2 [Libomptarget] Implement OpenMP 5.2 semantics for device pointers
In OpenMP 5.2, §5.8.6, page 160 line 32-33, when a device pointer
allocated by omp_target_alloc has implicitly been included on a target
construct as a zero-length array, the pointer initialisation should not
find a matching mapped list item, and so should retain its value as a
firstprivate variable. Previously, we would return a null pointer if the
list item was not found. This patch updates the map handling to the
OpenMP 5.2 semantics.

Reviewed By: jdoerfert, ye-luo

Differential Revision: https://reviews.llvm.org/D133447
2022-09-07 17:01:14 -05:00
Joseph Huber 8d2a447bf9 [Libomptarget] Remove leftover ELF header from x86 plugin
Summary:
We removed the linking support for `gelf.h` in a previous patch. This
header was incorrectly leftover causing build problems on some systems.
2022-09-07 13:41:40 -05:00
Joseph Huber 300155911a [Libomptarget] Replace libelf with LLVM's Elf libraries
This patch replaces the dependency on `libelf` with LLVM's ELF support.
With this patch the user no-longer needs to have `libelf` on their
system to build and configure OpenMP offloading. The replacement is
mostly mechanical, with the exception of the hash table support which
was added in D131309.

Depends on D131309

Reviewed By: JonChesterfield, saiislam

Differential Revision: https://reviews.llvm.org/D131401
2022-09-07 12:38:51 -05:00
Joseph Huber 894531f59b [Libomptarget] Add utility functions for loading an ELF symbol by name
The `SHT_HASH` sections in an ELF are used to look up a symbol in the
symbol table using a symbol's name. This is done by obtaining the
`SHT_HASH` section and using its `sh_link` attribute to access the
associated symbol table, from which we can access the string table
containing the associated name. We can then search for the symbol using
the hash of the name and the buckets and chains in the hash table
itself

This patch adds utility functions that allow us to look up a symbol in
an ELF file by name. It will first attempt to look through the hash
tables, and then search the section tables manually if failed. This
allows us to pull out constants necessary for setting up offloading
without first loading the object.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D131309
2022-09-07 12:38:50 -05:00
Joseph Huber 31f434ee3b [Libomptarget][NFC] Clean up CUDA plugin and address warnings 2022-09-06 15:28:57 -05:00
Vignesh Balasubramanian d2a6e165e8 [OpenMP][OMPD] GDB plugin code to leverage libompd to provide debugging
support for OpenMP programs.

This is 5th of 6 patches started from https://reviews.llvm.org/D100181
This plugin code, when loaded in gdb, adds a few commands like
ompd icv, ompd bt, ompd parallel.
These commands create an interface for GDB to read the OpenMP
runtime through libompd.

Reviewed By: @dreachem
Differential Revision: https://reviews.llvm.org/D100185
2022-09-06 11:28:55 +05:30
Ye Luo 0e68f483d4 [OpenMP] add a offload test involving std::complex
Taken from the https://github.com/llvm/llvm-project/issues/57064 reproducer.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D133258
2022-09-03 13:28:11 -05:00
Joseph Huber f8b1f93f26 [libomptarget] Enable the device allocator for AMDGPU
This patch adds support for the device memory type, this is currently equivalent
to the default type so it should be treated as the same.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D133128
2022-09-01 12:40:59 -05:00
Joseph Huber 56cf3d626f [Libomptarget] Remove old workaround for GCC 5,6 from libomptarget
Some code previous needed the `used` attribute to prevent the GCC
compiler versions 5 and 6 from removing it. This is no longer required
as the minimum supported GCC version for LLVM 16 is >=7.1.0.

Reviewed By: JonChesterfield, vzakhari

Differential Revision: https://reviews.llvm.org/D132976
2022-08-30 19:13:48 -05:00
Joseph Huber 52556c3c0f [Libomptarget] Make unified shared memory test unsupported on AMDGPU
This test is an expected failure on AMDGPU. The expected failure is a GPU memory
failure, which will typically result in the device totally failing. This isn't
an issue for some GPU configurations that do not use the offloading device to
also drive the display server. However, if the main GPU is used for testing it
will reliably result in the user's display becoming unresponsive. This makes it
difficult to run the GPU offloading tests on many systems.

This patch simply makes this test unsupported so it no longer runs and freezes
my computer when using `ninja check-openmp`.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D132891
2022-08-30 12:14:25 -05:00
Joseph Huber dc400f8612 [libomptarget] Deprecate old method for setting the tripcount
Previously, the tripcount was set by a push call. We moved away from
this with the new interface that added the tripcount to the kernel
arguments struct, but kept around the old interface for legacy purposes
for the LLVM 15 release. This patch removes the support for the legacy
method.

This removes the support for the old method, but does not break
backwards compatibility. This will result in applications using the old
interface being slower when run on the device.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D132885
2022-08-29 20:08:26 -05:00
Joseph Huber 04ae35e592 [libomptarget] Always enable time tracing in libomptarget
Previously time tracing features were hidden behind an optional CMake
option. This was because `libomptarget` was not based on the LLVM
libraries at that time. Now that `libomptarget` is an LLVM library we
should be able to freely use the `LLVMSupport` library whenever we want
and do not need to guard it in this way.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D132852
2022-08-29 14:49:03 -05:00
Joseph Huber 22d71e72c9 [Libomptarget] Do not check for valid binaries twice.
The only RTLs that get added to the `UsedRTLs` list have already been
checked is they were valid binaries. We shouldn't need to do this again
when we unregister all the used binaries as they wouldn't have been used
if they were invalid anyway. Let me know if I'm incorrect in this
assumption.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D131443
2022-08-29 08:36:50 -05:00
Joseph Huber 47166968db [OpenMP] Deprecate the old driver for OpenMP offloading
Recently OpenMP has transitioned to using the "new" driver which
primarily merges the device and host linking phases into a single
wrapper that handles both at the same time. This replaced a few tools
that were only used for OpenMP offloading, such as the
`clang-offload-wrapper` and `clang-nvlink-wrapper`. The new driver
carries some marked benefits compared to the old driver that is now
being deprecated. Things like device-side LTO, static library
support, and more compatible tooling. As such, we should be able to
completely deprecate the old driver, at least for OpenMP. The old driver
support will still exist for CUDA and HIP, although both of these can
currently be compiled on Linux with `--offload-new-driver` to use the new
method.

Note that this does not deprecate the `clang-offload-bundler`, although
it is unused by OpenMP now, it is still used by the HIP toolchain both
as their device binary format and object format.

When I proposed deprecating this code I heard some vendors voice
concernes about needing to update their code in their fork. They should
be able to just revert this commit if it lands.

Reviewed By: jdoerfert, MaskRay, ye-luo

Differential Revision: https://reviews.llvm.org/D130020
2022-08-26 13:47:09 -05:00
Jon Chesterfield ffabe997a5 [openmp][amdgpu] Implement target_alloc_host as fine grain HSA memory
The cuda plugin maps TARGET_ALLOC_HOST onto cuMemAllocHost
which is page locked host memory. Fine grain HSA memory is not
necessarily page locked but has the same read/write from host or
device semantics.

The cuda plugin does this per-gpu and this patch makes it accessible
from any gpu, but it can be locked down to match the cuda behaviour
if preferred.

Enabling tests requires an equivalent to
// RUN: %libomptarget-compile-run-and-check-nvptx64-nvidia-cuda
for amdgpu which doesn't seem to be in use yet.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D132660
2022-08-25 16:27:52 +01:00
Ye Luo 322ea53144 [libomptarget][amdgpu] enable tests whenever possible.
if(TARGET amdgpu-arch) doesn't work when ENABLE_LLVM_PROJECTS=openmp because openmp subdirectory is processed before clang subdirectory. Adopt the same logic of enabling tests like the CUDA plugin.

Differential Revision: https://reviews.llvm.org/D132579
2022-08-24 14:33:28 -05:00
Joseph Huber 540a13652f [Libomptarget] Replace use of `dlopen` with LLVM's dynamic library support
This patch replaces uses of `dlopen` and `dlsym` with LLVM's support
with `loadPermanentLibrary` and `getSymbolAddress`. This allows us to
remove the explicit dependency on the `dl` libraries in the CMake. This
removes another explicit dependency and solves an issue encountered
while building on Windows platforms. The one downside to this is that
the LLVM library does not currently support `dlclose` functionality, but
this could be added in the future.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D131507
2022-08-24 10:46:21 -05:00
Joseph Huber 30efb459e0 [Libomptarget] Remove use of ELF link_address in x86_64 plugin
We use the offloading entires array to determine the relative names and
addressed of device-side kernel functions. The x86_64 plugin previously
derived the device-side entry table by first identifying the
`omp_offloading_entries` section offset in the loaded elf. Then we would
use the base offset of the loaded dyanmic library to identify the
entries array within the loaded image. This relied on some more
unconventional methods which prevented us from using the LLVM dynamic
library loader for this plugin. This patch simplifies this by instead
copying the host-side entry and replacing its address with the
device-side address looked up through `dlsym`.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D131516
2022-08-24 10:46:20 -05:00
Vitaly Buka 3195449f2b [test][openmp] Relax condition in test
It runs 8 threads. Sometimes tsan is able to detect more than one of the
same race.
2022-08-23 14:29:06 -07:00
Joseph Huber 2b8f722e63 [OpenMP] Add option to assert no nested OpenMP parallelism on the GPU
The OpenMP device runtime needs to support the OpenMP standard. However
constructs like nested parallelism are very uncommon in real application
yet lead to complexity in the runtime that is sometimes difficult to
optimize out. As a stop-gap for performance we should supply an argument
that selectively disables this feature. This patch adds the
`-fopenmp-assume-no-nested-parallelism` argument which explicitly
disables the usee of nested parallelism in OpenMP.

Reviewed By: carlo.bertolli

Differential Revision: https://reviews.llvm.org/D132074
2022-08-23 14:09:51 -05:00
utsumi 2e2caea37f [Clang][OpenMP] Make copyin clause on combined and composite construct work (patch by Yuichiro Utsumi (utsumi.yuichiro@fujitsu.com))
Make copyin clause on the following constructs work.

- parallel for
- parallel for simd
- parallel sections

Fixes https://github.com/llvm/llvm-project/issues/55547

Patch by Yuichiro Utsumi (utsumi.yuichiro@fujitsu.com)

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D132209
2022-08-23 07:58:35 -07:00
John Ericson e941b031d3 Revert "[cmake] Use `CMAKE_INSTALL_LIBDIR` too"
This reverts commit f7a33090a9.

Unfortunately this causes a number of failures that didn't show up in my
local build.
2022-08-18 22:46:32 -04:00
John Ericson f7a33090a9 [cmake] Use `CMAKE_INSTALL_LIBDIR` too
We held off on this before as `LLVM_LIBDIR_SUFFIX` conflicted with it.
Now we return this.

`LLVM_LIBDIR_SUFFIX` is kept as a deprecated way to set
`CMAKE_INSTALL_LIBDIR`. The other `*_LIBDIR_SUFFIX` are just removed
entirely.

I imagine this is too potentially-breaking to make LLVM 15. That's fine.
I have a more minimal version of this in the disto (NixOS) patches for
LLVM 15 (like previous versions). This more expansive version I will
test harder after the release is cut.

Reviewed By: sebastian-ne, ldionne, #libc, #libc_abi

Differential Revision: https://reviews.llvm.org/D130586
2022-08-18 15:33:35 -04:00
Kevin Sala Penads 1081bb08cc [OpenMP][libomptarget] Fix run region async condition
This patch fixes a condition in the openmp/libomptarget/src/device.cpp file. The code was checking if the run_region plugin API function was implemented, but it should actually check the run_region_async function instead.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D131782
2022-08-15 13:08:45 -04:00
Fangrui Song acfe0d3b15 [openmp] Remove __ANDROID_API__ < 19 workaround
https://github.com/android/ndk/wiki/Changelog-r24 shows that the NDK has
moved forward to at least a minimum target API of 19. Remove old workaround.
2022-08-12 22:15:38 -07:00
Jennifer Yu 2ca27206f9 [OpenMP] Fix segmentation fault when data field is used in is_device_pt
Currently, the field just emit map info for this pointer variable. It is
failed at run time. For the fields, the PartialStruct is created and it
needs call to emitCombinedEntry which create the base that covers all
the pieces.

The change is to generate map info as regular fields.

Differential Revision: https://reviews.llvm.org/D129608
2022-08-12 17:10:26 -07:00
Jonathan Peyton 56f36f85e0 [OpenMP][OMPT] Fix memory leak when using GCC compatibility code
Serialized parallels allocate lightweight task teams on the heap
but never free them in the corresponding join. This patch adds a wrapper
around the allocation (if ompt enabled) and also adds the corresponding
free in the join call.

Differential Revision: https://reviews.llvm.org/D131690
2022-08-11 15:26:09 -05:00
Johannes Doerfert a8cda32909 [OpenMP][FIX] Ensure __kmpc_kernel_parallel is reachable
The problem is we create the call to __kmpc_kernel_parallel in the
openmp-opt pass but while we optimize the code, the call is not there
yet. Thus, we assume we never reach it from __kmpc_target_deinit. That
allows us to remove the store in there (`ParallelRegionFn = nullptr`),
which leads to bad results later on.

This is a shortstop solution until we come up with something better.

Fixes https://github.com/llvm/llvm-project/issues/57064
2022-08-11 09:55:56 -05:00
Joseph Huber fdbb15355e [Libomptarget][CUDA] Check CUDA compatibilty correctly
We recently added support for multi-architecture binaries in
libomptarget. This is done by extracting the architecture from the
embedded image and comparing it with the major and minor version
supported by the current CUDA installation. Previously we just compared
these directly, which was not correct for binary compatibility. The CUDA
documentation states that we can consider any image with an equivalent
major or a greater or equal to minor compatible with the current image.
Change the check to use this new logic in the CUDA plugin.

Fixes #57049

Reviewed By: jdoerfert, ye-luo

Differential Revision: https://reviews.llvm.org/D131567
2022-08-10 11:15:27 -04:00
Ron Lieberman 9ff0cc7e0f [openmp] Fix enumeration build issue for openmp library
integer value 40962 is outside the valid range of values [0, 31] for this enumeration type [-Wenum-constexpr-conversion]` (Issue #57022)

turn on -Wno-enum-constexpr-conversion to buy some time to fix the more egregious issue in hsa_agent_into_t and hsa_amd_agent_info_t interfaces.

relates to https://reviews.llvm.org/D131307/new/

Differential Revision: https://reviews.llvm.org/D131477
2022-08-09 10:25:03 +00:00
Fangrui Song 0972a390b9 LLVM_FALLTHROUGH => [[fallthrough]]. NFC 2022-08-09 04:06:52 +00:00