Commit Graph

120 Commits

Author SHA1 Message Date
Joseph Huber fd5853dae6 [Libomptarget] Reduce shared memory stack size to 512 and a message when it is exceeded
Reduces the shared memory size used for globalization to 512 bytes from
2048 to reduce the pressure on shared memory. This patch ado adds a
debug mesage to indicate when the shared memory was insufficient.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D118625
2022-01-31 15:19:48 -05:00
Ron Lieberman 619f44b0ed Revert "[OpenMP] Ensure broken assumptions print once, not thousands of times."
This reverts commit 27c799ecc9.
2022-01-28 01:41:10 +00:00
Joseph Huber 27c799ecc9 [OpenMP] Ensure broken assumptions print once, not thousands of times.
If we have a broken assumption we want to print a message to the user.
If the assumption is broken by many threads in many teams this can
become a problem. To avoid it we use a hash that tracks if a broken
assumption has (likely) been printed and avoid printing it again. This
is not fool proof and has some caveats that might cause problems in
the future (see comment) but it should improve the situation
considerably for now.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D112156
2022-01-27 18:43:45 -05:00
Johannes Doerfert 1e12156896 [OpenMP][NFCI] Pipe the IdentTy object through more new RT functions
IdentTy objects are useful for debugging and profiling so we want to
keep them around in more places, especially those that have a large
impact on performance, e.g., everything related to state.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112494
2022-01-27 15:36:55 -05:00
Joseph Huber 26feef0846 [Libomptarget] Change visibility to hidden for device RTL
This patch changes the visibility for all construct in the new device
RTL to be hidden by default. This is done after the changes introduced
in D117806 changed the visibility from being hidden by default for all
device compilations. This asserts that the visibility for the device
runtime library will be hidden except for the internal environment
variable. This is done to aid optimization and linking of the device
library.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D117807
2022-01-20 21:06:28 -05:00
Joseph Huber 28d718602a [OpenMP] Expand short verisions of OpenMP offloading triples
The OpenMP offloading libraries are built with fixed triples and linked
in during compile time. This would cause un-helpful errors if the user
passed in the wrong expansion of the triple used for the bitcode
library. because we only support these triples for OpenMP offloading we
can normalize them to the full verion used in the bitcode library.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D117634
2022-01-19 20:26:37 -05:00
Joseph Huber 4863fed933 [Libomptarget] Fix external visibility for internal variables
After the changes in D117362 made variables declared inside of a target
declare directive visible outside the plugin, some variables inside the
runtime were given visiblity that conflicted with their address space
type. This caused problems when shared or local memory was made
externally visible. This patch fixes this issue by making these
varialbes static within the module, therefore limiting their visibility
to being internal.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D117526
2022-01-18 18:19:57 -05:00
Joseph Huber 138cc5a001 Revert "[Libomptarget] Fix external visibility for internal variables"
Reverting to investigate break on AMDGPU. This reverts commit
0203ff1960.
2022-01-18 14:44:11 -05:00
Joseph Huber 0203ff1960 [Libomptarget] Fix external visibility for internal variables
After the changes in D117362 made variables declared inside of a target
declare directive visible outside the plugin, some variables inside the
runtime were given visiblity that conflicted with their address space
type. This caused problems when shared or local memory was made
externally visible. This patch fixes this issue by making these
varialbes static within the module, therefore limiting their visibility
to being internal.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D117526
2022-01-18 12:53:24 -05:00
Joseph Huber 4869a22d1d [Libomptarget] Add `cold` to KeepAlive attributes
This patch adds the `cold` attribute to the keepAlive functions in the
RTL. This dummy function exists to keep certain RTL calls alive without
them being optimized out, but it is never called and can be declared
cold. This also helps some erroneous remarks being given on this
function because it has weak linkage and cannot be made internal.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D117513
2022-01-17 17:29:26 -05:00
Jon Chesterfield d53b979596 [openmp][devicertl] Handle missing clang_tool
Fixes github issues/52910

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D117230
2022-01-13 22:43:26 +00:00
Joseph Huber 4746e38f67 [Libomptarget] Fix multiply defined symbol during linking
This patch adds the `weak` identifier to the openmp device environment
variable. The changes introduced in https://reviews.llvm.org/D117211
result in multiply defined symbols. Because the symbol is potentially
included multiple times for each offloading file we will get symbol
colisions, and because it needs to have external visiblity it should be
weak.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D117231
2022-01-13 11:57:33 -05:00
Jon Chesterfield 4395608939 [openmp] Mark used variables as retain as well
D97446 changed the behaviour of 'used'. Compensate.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D117211
2022-01-13 13:57:32 +00:00
Joseph Huber 7cdaa5a94e [OpenMP][FIX] Change globalization alignment to 16
This patch changes the default aligntment from 8 to 16, and encodes this
information in the `__kmpc_alloc_shared` runtime call to communicate it
to the HeapToStack pass. The previous alignment of 8 was not sufficient
for the maximum size of primitive types on 64-bit systems, and needs to
be increaesd. This reduces the amount of space availible in the data
sharing stack, so this implementation will need to be improved later to
include the alignment requirements in the allocation call, and use it
properly in the data sharing stack in the runtime.

Depends on D115888

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D115971
2021-12-27 16:58:25 -05:00
Joseph Huber bc9c4d7216 [OpenMP][FIX] Pass the num_threads value directly to parallel_51
The problem with the old scheme is that we would need to keep track of
the "next region" and reset the num_threads value after it. The new RT
doesn't do it and an assertion is triggered. The old RT doesn't do it
either, I haven't tested it but I assume a num_threads clause might
impact multiple parallel regions "accidentally". Further, in SPMD mode
num_threads was simply ignored, for some reason beyond me.

In any case, parallel_51 is designed to take the clause value directly,
so let's do that instead.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D113623
2021-12-09 16:30:29 -05:00
Jon Chesterfield 3ab150f6e4 [openmp][devicertl] Add a missing loader_uninitialized attribute 2021-11-29 23:54:37 +00:00
Joseph Huber 374cd0fb61 [OpenMP] Fix initializer not working on AMDGPU
The RAII class used for debugging RTL entry used a shared variable to
keep track of the current depth. This used a global initializer, which
isn't supported on AMDGPU. This patch removes the initializer and
instead sets it to zero when the state is initialized in the runtime.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D113963
2021-11-16 08:17:15 -05:00
Joel E. Denny c9dfe322ee [OpenMP] Fix main thread barrier for Pascal and amdgpu
Fixes what's left of https://bugs.llvm.org/show_bug.cgi?id=51781.

Reviewed By: jdoerfert, JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D113602
2021-11-12 11:18:45 -05:00
Jon Chesterfield 27177b82d4 [OpenMP] Lower printf to __llvm_omp_vprintf
Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf`
which takes the same const char*, void* arguments as cuda vprintf and also
passes the size of the void* alloca which will be needed by a non-stub
implementation of `__llvm_omp_vprintf` for amdgpu.

This removes the amdgpu link error on any printf in a target region in favour
of silently compiling code that doesn't print anything to stdout.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112680
2021-11-10 15:30:56 +00:00
Atmn Patel 737c4a2673 [clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files
The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are
just code bloat. By removing them, the codebase gets a bit cleaner.

Reviewed By: jdoerfert, JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D113421
2021-11-09 15:11:05 -05:00
Atmn Patel ef717f3852 Revert "[clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files"
This reverts commit 81a7cad2ff.
2021-11-09 02:10:42 -05:00
Atmn Patel 81a7cad2ff [clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files
The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are
just code bloat. By removing them, the codebase gets a bit cleaner.

Reviewed By: jdoerfert, JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D113421
2021-11-09 01:52:52 -05:00
Jon Chesterfield 0fa45d6d80 Revert "[OpenMP] Lower printf to __llvm_omp_vprintf"
This reverts commit db81d8f6c4.
2021-11-08 20:28:57 +00:00
Jon Chesterfield db81d8f6c4 [OpenMP] Lower printf to __llvm_omp_vprintf
Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf`
which takes the same const char*, void* arguments as cuda vprintf and also
passes the size of the void* alloca which will be needed by a non-stub
implementation of `__llvm_omp_vprintf` for amdgpu.

This removes the amdgpu link error on any printf in a target region in favour
of silently compiling code that doesn't print anything to stdout.

The exact set of changes to check-openmp probably needs revision before commit

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112680
2021-11-08 18:38:00 +00:00
Johannes Doerfert d4b1cf8f9c [OpenMP] Build device runtimes for sm_86
Reviewed By: carlo.bertolli

Differential Revision: https://reviews.llvm.org/D113111
2021-11-04 17:54:59 -05:00
Johannes Doerfert 93bebdc78f [OpenMP][NFCI] Cleanup new device RT mapping interface
Minimize the `impl` interface and clean up some uses of mapping
functions.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D112154
2021-11-04 17:54:53 -05:00
Johannes Doerfert 73720c8059 [OpenMP][FIX] Introduce and use a simple generic-mode barrier
Before we had aligned barriers the `__kmpc_barrier_simple_spmd` was
OK to be used in the custom state machine. Now that SPMD barriers are
assumed to be aligned we need to use a "generic" barrier in places
that are not aligned.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112893
2021-11-02 23:22:01 -05:00
Johannes Doerfert ccb5d2726a [OpenMP][FIX] Avoid a race between initialization and first state reads
When we pick state 0 to initialize state but thread N is going to be the
"main thread", in generic mode, we would require extra synchronization.
Instead, we should pick the main thread to initialize state in generic
mode and any thread in SPMD mode.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112874
2021-11-02 23:21:49 -05:00
Shilei Tian 025f549240 [OpenMP][DeviceRTL] Fixed an issue that causes hang in SU3
The synchronization at the end of parallel region cannot make sure all threads
exit the scope. As a result, the assertions right after it might be hit, and
further the `state::assumeInitialState(IsSPMD)` in `__kmpc_target_deinit` may
not hold as well. We either add a synchronization right after the parallel region,
or remove the assertions and assuptions. Here we choose the first one as those
assertions and assumptions can help optimizations.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112861
2021-10-30 14:44:29 -04:00
Joseph Huber 927c74d4da [OpenMP] Fix assert macro expr
Summary:
A previous patch changed the check and mistakenly only did `!expr` when
this is a macro expansion and could only apply to the left side of an
expression.
2021-10-29 17:44:13 -04:00
Joseph Huber 2c6a4e5678 [OpenMP] Use the assertion formatting from assert.h
This patch changes the `assert_assume` function used for internal
assumptions in the device runtime to use a more standard formatting for
the assumption message.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112842
2021-10-29 16:44:01 -04:00
Joseph Huber 6dd791bca8 [OpenMP] Check output of malloc in the device for debug
A common problem is the device running out of global heap memory and
crashing due to a nullptr dereference when using the data sharing stack.
This explicitly checks that a nullptr was not returned by malloc when
debugging field 1 is enabled.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112005
2021-10-29 14:57:12 -04:00
Joseph Huber 74f91741b6 [OpenMP] Use function tracing RAII for runtime functions.
This patch adds support for using function tracing features to track the
executino of runtime functions in the device runtime library. This is
enabled by first compiling the new runtime with
`-fopenmp-target-debug=3` and running with
`LIBOMPTARGET_DEVICE_RTL_DEBUG=3`. The output only tracks team 0 and
thread 0 so there isn't much output when using a generic region.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112002
2021-10-29 14:57:11 -04:00
Jon Chesterfield 4d50803ce4 [libomptarget] Build DeviceRTL for amdgpu
Passes same tests as the current deviceRTL. Includes cmake change from D111987.
CI is showing a different set of pass/fails to local, committing this
without the tests enabled by default while debugging that difference.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112227
2021-10-28 12:34:01 +01:00
Jon Chesterfield 22bd75be70 [openmp] Fix a git misfire in cf37a94c1e 2021-10-28 01:35:25 +01:00
Jon Chesterfield 6c7b203d1d Revert "[libomptarget] Build DeviceRTL for amdgpu"
- more tests failing on CI than failed locally when writing this patch

This reverts commit 33427fdb7b.
2021-10-28 01:01:53 +01:00
Jon Chesterfield cf37a94c1e [openmp] Add amdgpu impl missed from D112153 2021-10-28 00:55:53 +01:00
Jon Chesterfield 33427fdb7b [libomptarget] Build DeviceRTL for amdgpu
Passes same tests as the current deviceRTL. Includes cmake change from D111987.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112227
2021-10-28 00:41:45 +01:00
Johannes Doerfert 48877525cf [OpenMP] Remove obsolete external interface for device RT
We do not generate _serialized_parallel calls in device mode, no
need for an external API.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D112145
2021-10-27 18:22:35 -05:00
Johannes Doerfert 5102c3c61e [OpenMP][FIX] Do not adjust the level after the environment was popped
Exiting a data environment will reset all values, it is wrong to adjust
them afterwards.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112144
2021-10-27 18:22:33 -05:00
Johannes Doerfert b16aadf0a7 [OpenMP] Introduce aligned synchronization into the new device RT
We will later use the fact that a barrier is aligned to reason about
thread divergence. For now we introduce the assumption and some more
documentation.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112153
2021-10-27 18:22:31 -05:00
Johannes Doerfert ef922c692f [OpenMP][FIX] Query proper thread ID information to support nesting
The OpenMP thread ID is not the hardware thread ID if we have nesting.
We need to ask the runtime properly to ensure correct results.

Note that the loop interface is going to change soon so we do not adjust
it now but simply ignore the extra argument.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D111950
2021-10-27 18:18:44 -05:00
Johannes Doerfert 4c88341d17 [OpenMP][FIX] Do check the level before return team size
The team size could/should be an ICV but since we know it is either 1 or
a value we can leave it in the team state for now. However, we still
need to determine if the current level is nested before we use it.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D111949
2021-10-27 18:18:42 -05:00
Johannes Doerfert dc72960967 [OpenMP][FIX] Do not dereference a potential nullptr
The first thread state in the new GPU runtime doesn't have a previous
one and we should not dereference the nullptr placeholder.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D111946
2021-10-27 18:18:39 -05:00
Jon Chesterfield e42e5785ad [libomptarget][nfc]Generalise DeviceRTL cmake to allow building for amdgpu
Essentially moves the foreach over sm integers into a macro and instantiates it for nvptx.

NFC in that the macro is not presently instantiated for amdgpu as the corresponding code doesn't compile yet.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D111987
2021-10-26 21:18:21 +01:00
Jon Chesterfield a602c2b51d [libomptarget][DeviceRTL] Generalise and simplify cmakelists
Step towards building the DeviceRTL for amdgpu.

Mostly replaces cuda-specific toolchain finding logic with the
generic logic currently found in the amdgpu deviceRTL cmake. Also
deletes dead code and changes the default to build on systems
without cuda installed, as the library doesn't use cuda and the
amdgpu-only systems generally won't have cuda installed.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D111983
2021-10-21 16:14:29 +01:00
Jon Chesterfield 7272982e1d [libomptarget] Refactor DeviceRTL prior to AMDGPU bringup
Subset of D111993. Fix typos, rename read to load.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D111999
2021-10-19 08:05:06 +01:00
Joseph Huber bad44d5f39 [OpenMP] Add RTL function for getting number of threads in block.
This patch adds support for the
`__kmpc_get_hardware_num_threads_in_block` function that returns the
number of threads. This was missing in the new runtime and was used by
the AMDGPU plugin which prevented it from using the new runtime. This
patchs also unified the interface for getting the thread numbers in the
frontend.

Originally authored by jdoerfert.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D111475
2021-10-08 22:21:59 -04:00
Joseph Huber 85ad566335 [OpenMP] Avoid calling `isSPMDMode` during RT initialization
Until we hit the first barrier we should not call `mapping::isSPMDMode`
with all threads. Instead, we now have (and use during initialization) a
`mapping::isMainThreadInGenericMode` overload that takes the known
SPMD-mode state and one that queries it.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D111381
2021-10-08 22:00:41 -04:00
Joseph Huber 208f900527 [Libomptarget] Add an external interface to dynamic shared memory
This patch adds an external interface to access the dynamic shared
memory buffer in the device runtime. The function introduced is
``llvm_omp_get_dynamic_shared``. This includes a host-side
definition that only returns a null pointer so that it can be used when
host-fallback is enabled without crashing. Support for dynamic shared
memory was also ported to the old device runtime.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D110957
2021-10-08 15:36:57 -04:00
Shilei Tian af4599b8ab [OpenMP][DeviceRTL] Add the support for printf in a freestanding way
For NVPTX, `printf` can be used just with a function declaration. For AMDGCN, an
function definition is added, but it simply returns.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109728
2021-10-07 22:15:37 -04:00
Johannes Doerfert 44710940af [OpenMP][FIX] Data race in the SPMD execution of the new runtime
We need to synchronize the threads *before* we destroy the RAII objects
that hold the old values and not after to avoid threads executing the
parallel region but seeing an inconsistent state.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D111369
2021-10-07 21:01:24 -04:00
Jon Chesterfield 0c554a4769 [libomptarget] Move device environment to shared header, remove divergence
Follow on to D110006, related to D110957

Where implementations have diverged this resolves to match the new DeviceRTL

- replaces definitions of this struct in deviceRTL and plugins with include
- changes the dynamic_shared_size field from D110006 to 32 bits
- handles stdint being unavailable in DeviceRTL
- adds a zero initializer for the field to amdgpu
- moves the extern declaration for deviceRTL to target_interface
  (omptarget.h is more natural, but doesn't work due to include order
  with debug.h)
- Renames the fields everywhere to match the LLVM format used in DeviceRTL
- Makes debug_level uint32_t everywhere (previously sometimes int32_t)

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D111069
2021-10-07 12:03:48 +01:00
Joseph Huber 74d622dea4 [OpenMP] Add new worksharing definitions into device RTL
This path defines the newly added `__kmpc_disitrute_static_init`
functions in the device runtime library. These functions are currently
exact copies of the current worksharing method but can be tuned later.

Depends on D110429

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D110430
2021-09-27 11:36:41 -04:00
Michael Kruse 1b242dccff [OpenMP][CMake] Use in-project clang as CUDA->IR compiler for new DeviceRTL.
Use the in-project clang, llvm-link and opt if available and unless
CMake cache variables specify to use a different compiler. This applies
D101265 to the new DeviceRTL's CMakeLists.txt which was copied before
D101265 was applied.

Fixes the openmp-offloading-cuda-runtime builder which was failing
since D110006.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D110251
2021-09-27 07:14:19 -05:00
Joseph Huber d83ca624a1 [OpenMP] Fix data-race in new device RTL
This patch fixes a data-race observed when using the new device runtime
library. The Internal control variable for the parallel level is read in
the `__kmpc_parallel_51` function while it could potentially be written
by other threads. This causes data corruption and will cause
nondetermistic behaviour in the runtime. This patch fixes this by adding
an explicit synchronization before the region starts.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110366
2021-09-23 17:28:07 -04:00
Shilei Tian 423d34f74a [OpenMP][Offloading] Change `bool IsSPMD` to `int8_t Mode` in `__kmpc_target_init` and `__kmpc_target_deinit`
This is a follow-up of D110029, which uses bitset to indicate execution mode. This patches makes the changes in the function call.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110279
2021-09-22 17:16:41 -04:00
Joseph Huber 60a40cf379 [OpenMP] Fix KeepAlive usage
Summary:
Functions were called the wrong way around, this didn't keep the symbol
alive.
2021-09-22 14:38:19 -04:00
Joseph Huber 277b681ede [OpenMP] Add function tracing debugging to device RTL
This patch adds support for an RAII struct that will print function
traces when placed inside of a function declaration. Each successive
call will increase the indentation to make it easier to visually
inspect.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110202
2021-09-22 12:25:29 -04:00
Joseph Huber 1cf86df883 [OpenMP] Make sure the Thread ID function is not removed
Summary:
The thread ID function was reintroduced in D110195, but could
potentially be removed by the optimizer. Make the function noinline to
preserve the call sites and add it to the externalization RAII so its
definition is not removed by the attributor.
2021-09-22 10:13:18 -04:00
Joseph Huber e95731cca7 [OpenMP] Add thread ID function into new RTL
The new device runtime library currently lacks the
`kmpc_get_hardware_thread_id_in_block` function which is currently used
when doing the SPMDzation optimization. This call would be introduced
through the optimization and then cause a linking error because it was
not present. This patch adds support for this runtime call.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D110195
2021-09-21 17:43:50 -04:00
Joseph Huber f1c821fa85 [OpenMP] Add support for dynamic shared memory in new RTL
This patch adds support for using dynamic shared memory in the new
device runtime. The new function `__kmpc_get_dynamic_shared` will return a
pointer to the buffer of dynamic shared memory. Currently the amount of memory
allocated is set by an environment variable.

In the future this amount will be added to the amount used for the smart stack
which will be configured in a similar way.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D110006
2021-09-17 21:25:36 -04:00
Joseph Huber ec02c34b6d [OpenMP] Add additional fields to device environment
This patch adds fields for the device number and number of devices into
the device environment struct and debugging values.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110004
2021-09-17 21:25:32 -04:00
Joseph Huber b266bcb135 [OpenMP] Implement __assert_fail in the new device runtime
This patch implements the `__assert_fail` function in the new device
runtime. This allows users and developers to use the standars assert
function inside of the device.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D109886
2021-09-17 21:25:28 -04:00
Jon Chesterfield 842f875c8b [openmp] Use llvm GridValues from devicertl
Add include path to the cmakefiles and set the target_impl enums
from the llvm constants instead of copying the values.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108391
2021-08-23 20:25:24 +01:00
Joachim Protze 4bb36df144 [libomptarget][amdcgn] Add build dependency for llvm-link and opt
D107156 and D107320 are not sufficient when OpenMP is built as llvm runtime
(LLVM_ENABLE_RUNTIMES=openmp) because dependencies only work within the same
cmake instance.

We could limit the dependency to cases where libomptarget/plugins are really
built. But compared to the whole llvm project, building openmp runtime is
negligible and postponing the build of OpenMP runtime after the dependencies
are ready seems reasonable.

The direct dependency introduced in D107156 and D107320 is necessary for the
case where OpenMP is built as llvm project (LLVM_ENABLE_PROJECTS=openmp).

Differential Revision: https://reviews.llvm.org/D108404
2021-08-20 01:57:58 +02:00
Jon Chesterfield 21d91a8ef3 [libomptarget][devicertl] Replace lanemask with uint64 at interface
Use uint64_t for lanemask on all GPU architectures at the interface
with clang. Updates tests. The deviceRTL is always linked as IR so the zext
and trunc introduced for wave32 architectures will fold after inlining.

Simplification partly motivated by amdgpu gfx10 which will be wave32 and
is awkward to express in the current arch-dependant typedef interface.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108317
2021-08-18 20:47:33 +01:00
Johannes Doerfert ed7ec860f0 [OpenMP] Improve alignment handling in the new device runtime 2021-07-27 17:50:27 -05:00
Joseph Huber e3ee76245e [Libomptarget] Revert new variable sharing to use the old method
The new method of sharing variables introduces a `__kmpc_alloc_shared` call
that cannot be removed in the middle end because of its non-constant argument
and unconnected free. This patch reverts this to the old method that used a
static amount of shared memory for sharing variables.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106905
2021-07-27 18:14:01 -04:00
Johannes Doerfert 67ab875ff5 [OpenMP] Prototype opt-in new GPU device RTL
The "old" OpenMP GPU device runtime (D14254) has served us well for many
years but modernizing it has caused some pain recently. This patch
introduces an alternative which is mostly written from scratch embracing
OpenMP 5.X, C++, LLVM coding style (where applicable), and conceptual
interfaces. This new runtime is opt-in through a clang flag (D106793).
The new runtime is currently only build for nvptx and has "-new" in its
name.

The design is tailored towards middle-end optimizations rather than
front-end code generation choices, a trend we already started in the old
runtime a while back. In contrast to the old one, state is organized in
a simple manner rather than a "smart" one. While this can induce costs
it helps optimizations. Our expectation is that the majority of codes
can be optimized and a "simple" design is therefore preferable. The new
runtime does also avoid users to pay for things they do not use,
especially wrt. memory. The unlikely case of nested parallelism is
supported but costly to make the more likely case use less resources.

The worksharing and reduction implementation have been taken from the
old runtime and will be rewritten in the future if necessary.

Documentation and debug features are still mostly missing and will be
added over time.

All external symbols start with `__kmpc` for legacy reasons but should
be renamed once we switch over to a single runtime. All internal symbols
are placed in appropriate namespaces (anonymous or `_OMP`) to avoid name
clashes with user symbols.

Differential Revision: https://reviews.llvm.org/D106803
2021-07-27 00:56:05 -05:00