llvm-project

Commit Graph

Author	SHA1	Message	Date
Joseph Huber	a15f8589f4	[libomptarget] Add support for target memory allocators to cuda RTL Summary: The allocator interface added in D97883 allows the RTL to allocate shared and host-pinned memory from the cuda plugin. This patch adds support for these to the runtime. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D102000	2021-05-07 10:27:02 -04:00
Jon Chesterfield	7e9351b9de	[libomptarget][amdgpu][nfc] Remove dead code from amdgpu plugin Drops an enum that was identical to a HSA one, localises some functions where they were only called from one TU. Covers everything internalize + adce can identify as dead, except for msgpack::dump which is useful when debugging. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102014	2021-05-06 23:16:32 +01:00
Joseph Huber	59b6849012	[OpenMP] Replace global InfoLevel with a reference to an internal one. Summary: This patch improves the implementation of D100774 by replacing the global variable introduced with a function that returns a reference to an internal one. This removes the need to define the variable in every plugin that uses it. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D101102	2021-04-23 09:43:46 -04:00
Joseph Huber	2b6f20082e	[OpenMP] Add function for setting LIBOMPTARGET_INFO at runtime Summary: This patch adds a new runtime function __tgt_set_info_flag that allows the user to set the information level at runtime without using the environment variable. Using this will require an extern function, but will eventually be added into an auxiliary library for OpenMP support functions. This patch required moving the current InfoLevel to a global variable which must be instantiated by each plugin. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D100774	2021-04-22 12:48:11 -04:00
Joseph Huber	29338459fb	[OpenMP] Trim error messages in CUDA plugin Summary: Remove some of the error messages printed when the CUDA plugin fails. The current error messages can be confusing because they are the first error messages printed after the async stream finds an error. This means that the printed values aren't related to what caused the issue, but are simply the last asynchronous operation that succeeded on the device. Remove these as they can be misleading. Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D99510	2021-03-29 12:20:19 -04:00
Joseph Huber	16064e71e9	[OpenMP] Reset async stream properly upon failure Summary: If the call to `synchronize` fails, it will currently block the stream indefinitely if execution is continued from this point. Additionally, if the program exits it will trigger an assertion on the non-null value of the async queue and prevent the runtime from printing debugging information. Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D99443	2021-03-26 19:05:06 -04:00
Jon Chesterfield	626a31de15	[libomptarget] Add register usage info to kernel metadata Add register usage information to the runtime metadata so that it can be used during kernel launch (that change will be in a different commit). Add this information to the kernel trace. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D98829	2021-03-18 17:00:42 +00:00
Jon Chesterfield	7da76aaaf4	[libomptarget] Build amdgpu plugin by default This will build the amdgpu plugin if cmake is able to find the hsa runtime library, which will be the case if rocm is installed or if the hsa library has been installed somewhere cmake looks. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D98654	2021-03-15 20:12:01 +00:00
George Rokos	2468fdd9af	[libomptarget] Add allocator support for target memory This patch adds the infrastructure for allocator support for target memory. Three allocators are introduced for device, host and shared memory. The corresponding API functions have the llvm_ prefix temporarily, until they become part of the OpenMP standard. Differential Revision: https://reviews.llvm.org/D97883	2021-03-13 03:47:07 -08:00
Johannes Doerfert	5449fbb5d4	[OpenMP][NFC] Use `AsyncInfo` as the variable name for a `__tgt_async_info` Reviewed By: grokos, tianshilei1992 Differential Revision: https://reviews.llvm.org/D96444	2021-03-11 23:31:34 -06:00
Manoel Roemmer	542d9c2154	[libomptarget] Load images in order of registration This makes sure that images are loaded in the order in which they are registered with libomptarget. If a target can load multiple images and these images depend on each other (for example if one image contains the program's target regions and one image contains library code), then the order in which images are loaded can be important for symbol resolution (for example, in the VE plugin). In this case: because the same code exists in the host binaries, the order in which the host linker loads them (which is also the order in which images are registered with libomptarget) is the order in which the images have to be loaded onto the device. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95530	2021-02-24 18:15:41 +01:00
Ron Lieberman	30c0d5b4c3	[OPENMP][AMDGCN] Improvements to print_kernel_trace (bit mask) allow bit masking to select various trace features. bit 0 => Launch tracing (stderr) bit 1 => timing of runtime (stdout) bit 2 => detailed launch tracing (stderr) bit 3 => timing goes to stdout instead of stderr example: LIBOMPTARGET_KERNEL_TRACE=7 does it all LIBOMPTARGET_KERNEL_TRACE=5 Launch + details LIBOMPTARGET_KERNEL_TRACE=2 timings + launch to stderr LIBOMPTARGET_KERNEL_TRACE=10 timings + launch to stdout Differential Revision: https://reviews.llvm.org/D96998	2021-02-19 06:47:22 -05:00
Jon Chesterfield	53d7fd3762	[libomptarget][amdgcn] Remove lookup of .language msgpack field	2021-02-17 23:02:16 +00:00
Johannes Doerfert	ea9395716e	[OpenMP][NFC] Clang format the libomptarget plugins Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D96445	2021-02-16 15:37:46 -06:00
Johannes Doerfert	ad94fce845	[OpenMP][NFC] Eliminate sign comparison warning via explicit casts Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D96812	2021-02-16 15:37:41 -06:00
Jon Chesterfield	56c446a878	[libomptarget][amdgcn] Tolerate deadstripped device_state variable The device_state variable may have been deadstripped. Similar to device_environment, leave detection of missing but used symbol to loader. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D96330	2021-02-09 16:29:53 +00:00
Jon Chesterfield	4756f76bce	[libomptarget][amdgcn] Tolerate deadstripped env variable Discovered by Pushpinder. If the device_environment variable is unused it can be deadstripped, in which case we should not abort due to it missing. This change is safe in that a missing symbol which is actually used can be reported by both linker and loader, and a missing unused symbol is better deadstripped than left in the image. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D96329	2021-02-09 11:58:37 +00:00
Vyacheslav Zakharin	0fc90873b2	[libomptarget][NFC] Link plugins with threads support library due to std::call_once usage. Differential Revision: https://reviews.llvm.org/D95572	2021-01-27 19:26:18 -08:00
Atmn Patel	8a77056256	[OpenMP][Libomptarget] Fix conditional in CMake for remote plugin The remote offloading plugin's CMakeLists was trying to build if its flag was enabled even if it didn't find gRPC/protobuf. The conditional was wrong, it's fixed by this. Differential Revision: https://reviews.llvm.org/D95574	2021-01-27 21:28:25 -05:00
Jon Chesterfield	653655040f	[libomptarget][cuda] Handle missing _v2 symbols gracefully Follow on from D95367. Dlsym the _v2 symbols if present, otherwise use the unsuffixed version. Builds a hashtable for the check, can revise for zero heap allocations later if necessary. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95415	2021-01-27 00:22:29 +00:00
Atmn Patel	810572cc96	[OpenMP][Libomptarget] Fix cmake error on remote plugin Requiring 3.15 causes a build breakage, I'm sure none of the contents actually require 3.15 or above. Differential Revision: https://reviews.llvm.org/D95474	2021-01-26 16:00:40 -05:00
Jon Chesterfield	7baff00eee	[libomptarget][cuda] Gracefully handle missing cuda library If using dynamic cuda, and it failed to load, it is not safe to call cuGetErrorString. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95412	2021-01-26 20:43:07 +00:00
Jon Chesterfield	fdeffd6fb0	[libomptarget][cuda] Only run tests when sure there is cuda available Prior to D95155, building the cuda plugin implied cuda was installed locally. With that change, every machine can build a cuda plugin, but they won't all have cuda and/or an nvptx card installed locally. This change enables the nvptx tests when either: - libcuda is present - the user has forced use of the dlopen stub The default case when there is no cuda detected will no longer attempt to run the tests on nvptx hardware, as was the case before D95155. Reviewed By: jdoerfert, ronlieb Differential Revision: https://reviews.llvm.org/D95467	2021-01-26 20:41:06 +00:00
Atmn Patel	ec8f4a38c8	[OpenMP][Libomptarget] Introduce Remote Offloading Plugin This introduces a remote offloading plugin for libomptarget. This implementation relies on gRPC and protobuf, so this library will only build if both libraries are available on the system. The corresponding server is compiled to `openmp-offloading-server`. This is a large change, but the only way to split this up is into RTL/server but I fear that could introduce an inconsistency amongst them. Ideally, tests for this should be added to the current ones, but that is problematic for at least one reason. Given that libomptarget registers plugins on a first-come-first-served basis, if we wanted to offload onto a local x86 through a different process, then we'd have to either re-order the plugin list in `rtl.cpp` (which is what I did locally for testing) or find a better solution for runtime plugin registration in libomptarget. Differential Revision: https://reviews.llvm.org/D95314	2021-01-26 15:33:38 -05:00
Jon Chesterfield	357eea6e8b	Revert "[libomptarget][cuda] Gracefully handle missing cuda library" This reverts commit `fafd45c01f`.	2021-01-26 03:14:53 +00:00
Jon Chesterfield	fafd45c01f	[libomptarget][cuda] Gracefully handle missing cuda library If using dynamic cuda, and it failed to load, it is not safe to call cuGetErrorString. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95412	2021-01-26 02:54:00 +00:00
Jon Chesterfield	95f0d1edaf	[libomptarget] Compile with older cuda, revert D95274 Fixes regression reported in comments of D95274. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95367	2021-01-25 16:12:56 +00:00
Jon Chesterfield	e5e448aafa	[libomptarget][cuda] Fix build, change missed from D95274	2021-01-24 18:30:04 +00:00
Jon Chesterfield	dc70c56be5	[libomptarget][amdgpu][nfc] Update comments Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95295	2021-01-23 22:53:58 +00:00
Jon Chesterfield	78b0630b72	[libomptarget][cuda] Call v2 functions explicitly rtl.cpp calls functions like cuMemFree that are replaced by a macro in cuda.h with cuMemFree_v2. This patch changes the source to use the v2 names consistently. See also D95104, D95155 for the idea. Alternatives are to use a mixture, e.g. call the macro names and explicitly dlopen the _v2 names, or to keep the current status where the symbols are replaced by macros in both files. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95274	2021-01-23 20:33:13 +00:00
Jon Chesterfield	47e95e87a3	[libomptarget] Build cuda plugin without cuda installed locally Compiles a new file, `plugins/cuda/dynamic_cuda/cuda.cpp`, to an object file that exposes the same symbols that the plugin presently uses from libcuda. The object file contains dlopen of libcuda and cached dlsym calls. Also provides a cuda.h containing the subset that is used. This lets the cmake file choose between the system cuda and a dlopen shim, with no changes to rtl.cpp. The corresponding change to amdgpu is postponed until after a refactor of the plugin to reduce the size of the hsa.h stub required. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95155	2021-01-23 00:15:04 +00:00
Joseph Huber	119a9ea13f	[OpenMP] Fix failing test due to change in offloading flags Summary: Prior to D91261 the information checked the OMP_MAP_TARGET_PARAM flag, change this as it has been removed. The INFO macro was changed to accept a flag as input to make conditionally printing information easier. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95133	2021-01-21 14:09:36 -05:00
Jon Chesterfield	5d165f0b89	[libomptarget][amdgpu] Fix kernel launch tracing to match previous behavior Restore control of kernel launch tracing to be >= 1 as it was before export LIBOMPTARGET_KERNEL_TRACE=1 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D94695	2021-01-14 18:13:22 +00:00
Shilei Tian	68ff52ffea	[OpenMP] Fixed the link error that cannot find static data member Constant static data member can be defined in the class without another define after the class in C++17. Although it is C++17, Clang can still handle it even w/o the flag for C++17. Unluckily, GCC cannot handle that. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D94541	2021-01-12 16:48:28 -05:00
Jon Chesterfield	33e2494bea	[libomptarget][amdgpu][nfc] Fix build on centos rtl.cpp replaced 224 with a #define from elf.h, but that doesn't work on a centos 7 build machine with an old elf.h Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D94528	2021-01-12 19:40:03 +00:00
Shilei Tian	bdd1ad5e5c	[OpenMP] Fixed include directories for OpenMP when building OpenMP with LLVM_ENABLE_RUNTIMES Some LLVM headers are generated by CMake. Before the installation, LLVM's headers are distributed everywhere, some of which are in `${LLVM_SRC_ROOT}/llvm/include/llvm`, and some are in `${LLVM_BINARY_ROOT}/include/llvm`. After installation, they're all in `${LLVM_INSTALLATION_ROOT}/include/llvm`. OpenMP now depends on LLVM headers. Some headers depend on headers generated by CMake. When building OpenMP along with LLVM, a.k.a. via `LLVM_ENABLE_RUNTIMES`, we need to tell OpenMP where it can find those headers, especially those that still have not been copied/installed. Reviewed By: jdoerfert, jhuber6 Differential Revision: https://reviews.llvm.org/D94534	2021-01-12 14:32:38 -05:00
Shilei Tian	0871d6d516	[OpenMP] Move memory manager to plugin and make it a common interface The lifetime of `libomptarget` and its opened plugins are not aligned and it's hard for `libomptarget` to determine when the plugins are destroyed. As a result, some issues (see D94256 for details) occur on some platforms. Actually, if we take target memory as target resources, same as other resources, such as CUDA streams, in each plugin, then the memory manager should also be in the plugin. Also considering some platforms may want to opt out of the feature, it makes sense to move the memory manager to the plugin, make it a common interface, and let plugin developers determine whether they need it. This is what this patch does. CUDA plugin is taken as example to show how to integrate it. In this way, we can also get a bonus that different thresholds can be set for different platforms. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D94379	2021-01-11 21:33:42 -05:00
Shilei Tian	a81c68ae6b	[OpenMP] Take elf_common.c as an interface library For now `elf_common.c` is taken as a common part included into different plugin implementations directly via `#include "../../common/elf_common.c"`, which is not a best practice. Since it is simple enough such that we don't need to create a real library for it, we just take it as an interface library so that other targets can link it directly. Another advantage of this method is, we don't need to add the folder into header search path which can potentially pollute the search path. VE and AMD platforms have not been tested because I don't have target machines. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D94443	2021-01-11 17:34:26 -05:00
Joseph Huber	2ce16810f2	[OpenMP] Always print error messages in libomptarget CUDA plugin Summary: Currently error messages from the CUDA plugins are only printed to the user if they have debugging enabled. Change this behaviour to always print the messages that result in offloading failure. This improves the error messages by indicating what happened when the error occurs in the plugin library, such as a segmentation fault on the device. Reviewed by: jdoerfert Differential Revision: https://reviews.llvm.org/D94263	2021-01-07 17:47:32 -05:00
Joseph Huber	fe5d51a489	[OpenMP] Add using bit flags to select Libomptarget Information Summary: This patch adds more fine-grained support over which information is output from the libomptarget runtime when run with the environment variable LIBOMPTARGET_INFO set. An extensible set of flags can be used to pick and choose which information the user is interested in. Reviewers: jdoerfert JonChesterfield grokos Differential Revision: https://reviews.llvm.org/D93727	2021-01-04 12:03:15 -05:00
Jon Chesterfield	7c59614394	[libomptarget][amdgpu] clang-format src/rtl.cpp	2020-12-09 19:45:51 +00:00
Jon Chesterfield	c9bc414840	[libomptarget][amdgpu] Let default number of teams equal number of CUs	2020-12-09 19:35:34 +00:00
Jon Chesterfield	e191d31159	[libomptarget][amdgpu] Robust handling of device_environment symbol	2020-12-09 19:21:51 +00:00
Jon Chesterfield	cab9f69235	[libomptarget][amdgpu] Improve diagnostics on arch mismatch	2020-12-09 18:55:53 +00:00
Jon Chesterfield	71f4693020	[libomptarget][amdgpu] Add plumbing to call into hostrpc lib, if linked	2020-12-07 15:24:01 +00:00
Jon Chesterfield	e1b8e8a1f4	[libomptarget][amdgpu] Skip device_State allocation when using bss global	2020-12-06 12:13:56 +00:00
Jon Chesterfield	f628eef98a	[libomptarget][amdgpu] Fix latent race in load binary	2020-12-04 16:29:09 +00:00
Jon Chesterfield	ae9d96a656	[libomptarget][amdgpu] Address compiler warnings, drive by fixes Initialize some variables, remove unused ones. Changes the debug printing condition to align with the aomp test suite. Differential Revision: https://reviews.llvm.org/D92559	2020-12-03 11:09:12 +00:00
Pushpinder Singh	afc09c6fe4	[libomptarget][AMDGPU] Remove MaxParallelLevel Removes MaxParallelLevel references from rtl.cpp and drops resulting dead code. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D92463	2020-12-03 00:27:03 -05:00
Jon Chesterfield	89a0f48c58	[libomptarget][cuda] Detect missing symbols in plugin at build time Passes -z,defs to the linker. Error on unresolved symbol references. Otherwise, those unresolved symbols present as target code running on the host as the plugin fails to load. This is significantly harder to debug than a link time error. Flag matches that passed by amdgcn and ve plugins. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D92143	2020-11-27 15:39:41 +00:00
Jon Chesterfield	26790ed248	[libomptarget] Require LLVM source tree to build libomptarget This is to permit reliably #including files from the LLVM tree in libomptarget, as an improvement on the copy and paste that is currently in use. See D87841 for the first example of removing duplication given this new requirement. The weekly openmp dev call reached consensus on this approach. See also D87841 for some alternatives that were considered. In the future, we may want to introduce a new top level repo for shared constants, or start using the ADT library within openmp. This will break sufficiently exotic build systems, trivial fixes as below. Building libomptarget as part of the monorepo will continue to work. If openmp is built separately, it now requires a cmake macro indicating where to find the LLVM source tree. If openmp is built separately, without the llvm source tree already on disk, the build machine will need a copy of a subset of the llvm source tree and the cmake macro indicating where it is. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D89426	2020-10-21 18:53:00 +01:00
JonChesterfield	55dc123555	[libomptarget][amdgcn] Refactor memcpy to eliminate maps Builds on D89776 to remove now dead code. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D89888	2020-10-21 16:59:33 +01:00
Pushpinder Singh	aa616efbb3	[libomptarget][AMDGPU][NFC] Split atmi_memcpy for h2d and d2h The calls to atmi_memcpy presently determine the direction of copy (host to device or device to host) by storing pointers in a map during malloc and looking up the pointers during memcpy. As each call site already knows the direction, this stash+lookup can be eliminated. This NFC will be followed by a functional one that deletes those map lookups. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D89776 Change-Id: I1d9089bc1e56b3a9a30e334735fa07dee1f84990	2020-10-20 06:29:32 -04:00
JonChesterfield	7d2ecef5ed	[openmp][libomptarget] Include header from LLVM source tree The change is to the amdgpu plugin so is unlikely to break anything. The point of contention is whether libomptarget can depend on LLVM. A community discussion was cautiously not opposed yesterday. This introduces a compile time dependency on the LLVM source tree, in this case expressed as skipping the building of the plugin if LLVM_MAIN_INCLUDE_DIR is not set. One of the source files will #include llvm/Frontend/OpenMP/OMPGridValues.h, instead of copy&pasting the numbers across. For users that download the monorepo, the llvm tree is already on disk. This will inconvenience users who download only the openmp source as a tar, as they would now also have to download (at least a file or two) from the llvm source, if they want to build the parts of the openmp project that (post this patch) depend on llvm. There was interest expressed in going further - using llvm tools as part of building libomp, or linking against llvm libraries. That seems less clear cut an improvement and worthy of further discussion. This patch seeks only to change policy to support openmp depending on the llvm source tree. Including in the other direction, or using libraries / tools etc, are purposefully out of scope. Reviewers are a best guess at interested parties, please feel free to add others. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D87841	2020-10-15 15:46:19 +01:00
Manoel Roemmer	c816ee13ad	[OpenMP][VE plugin] Fixing failure to build VE plugin with consolidated error handling in libomptarget The libomptarget VE plugin [[ http://lab.llvm.org:8014/builders/clang-ve-ninja/builds/8937/steps/build-unified-tree/logs/stdio \| fails to build ]] after `ae95ceeb8f`. Differential Revision: https://reviews.llvm.org/D88476	2020-09-29 17:38:01 +02:00
Ye Luo	03111e5e7a	[OpenMP] Protect unrecognized CUDA error code If an error code cannot be recognized by cuGetErrorString, errStr remains null and causes a crash at DP() printing. Protect this case. Reviewed By: jhuber6, tianshilei1992 Differential Revision: https://reviews.llvm.org/D87980	2020-09-21 13:43:08 -04:00
JonChesterfield	a9be2b5cb2	[libomptarget] Disable build of amdgpu plugin as it doesn't build with rocm.	2020-09-18 18:10:27 +01:00
Joseph Huber	ae209397b1	[OpenMP] Begin Printing Information Dumps In Libomptarget and Plugins Summary: This patch starts adding support for adding information dumps to libomptarget and rtl plugins. The information printing is controlled by the LIBOMPTARGET_INFO environment variable introduced in D86483. The goal of this patch is to provide the user with additional information about the device during kernel execution and providing the user with information dumps in the case of failure. This patch added the ability to dump the pointer mapping table as well as printing the number of blocks and threads in the cuda RTL. Reviewers: jdoerfort gkistanova ye-luo Subscribers: guansong openmp-commits sstefan1 yaxunl ye-luo Tags: #OpenMP Differential Revision: https://reviews.llvm.org/D87165	2020-09-09 12:03:56 -04:00
Joseph Huber	ae95ceeb8f	[OpenMP] Consolidate error handling and debug messages in Libomptarget Summary: This patch consolidates the error handling and messaging routines to a single file omptargetmessage. The goal is to simplify the error handling interface prior to adding more error handling support Reviewers: jdoerfert grokos ABataev AndreyChurbanov ronlieb JonChesterfield ye-luo tianshilei1992 Subscribers: danielkiss guansong jvesely kerbowa nhaehnle openmp-commits sstefan1 yaxunl	2020-09-01 15:28:19 -04:00
JonChesterfield	5d989fb37d	[libomptarget][amdgpu] Improve thread safety, remove dead code	2020-08-26 22:04:03 +01:00
Jon Chesterfield	28fbf422f2	[libomptarget][amdgpu] Update plugin CMake to work with latest rocr library	2020-08-26 20:01:42 +01:00
Jon Chesterfield	6e1b11087f	[libomptarget][amdgpu] Support building with static rocm libraries	2020-08-19 15:44:30 +01:00
Johannes Doerfert	5272d29e2c	[OpenMP][CUDA] Keep one kernel list per device, not globally. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D86039	2020-08-16 14:38:35 -05:00
Johannes Doerfert	aa27cfc1e7	[OpenMP][CUDA] Cache the maximal number of threads per block (per kernel) Instead of calling `cuFuncGetAttribute` with `CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK` for every kernel invocation, we can do it for the first one and cache the result as part of the `KernelInfo` struct. The only functional change is that we now expect `cuFuncGetAttribute` to succeed and otherwise propagate the error. Ignoring any error seems like a slippery slope... Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D86038	2020-08-16 14:38:33 -05:00
Jon Chesterfield	d0b312955f	[libomptarget] Implement host plugin for amdgpu Replacement for D71384. Primary difference is inlining the dependency on atmi followed by extensive simplification and bugfixes. This is the latest version from https://github.com/ROCm-Developer-Tools/amd-llvm-project/tree/aomp12 with minor patches and a rename from hsa to amdgpu, on the basis that this can't be used by other implementations of hsa without additional work. This will not build unless the ROCM_DIR variable is passed so won't break other builds. That variable is used to locate two amdgpu specific libraries that ship as part of rocm: libhsakmt at https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface libhsa-runtime64 at https://github.com/RadeonOpenCompute/ROCR-Runtime These libraries build from source. The build scripts in those repos are for shared libraries, but can be adapted to statically link both into this plugin. There are caveats. - This works well enough to run various tests and benchmarks, and will be used to support the current clang bring up - It is adequately thread safe for the above but there will be races remaining - It is not stylistically correct for llvm, though has had clang-format run - It has suboptimal memory management and locking strategies - The debug printing / error handling is inconsistent I would like to contribute this pretty much as-is and then improve it in-tree. This would be advantageous because the aomp12 branch that was in use for fixing this codebase has just been joined with the amd internal rocm dev process. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D85742	2020-08-15 23:58:28 +01:00
George Rokos	40470eb27a	[libomptarget][NFC] Replace `%ld` with PRId64 for data of type int64_t. The standard way of printing `int64_t` data is via the PRId64 macro, `ld` is for `long int` and int64_t is not guaranteed to be typedef'ed as `long int` on all platforms. E.g. on Windows we get mismatch warnings. Differential Revision: https://reviews.llvm.org/D85353	2020-08-05 13:28:35 -07:00
Ye Luo	c5348aecd7	[OpenMP] Use primary context in CUDA plugin Summary: Retaining per device primary context is preferred to creating a context owned by the plugin. From CUDA documentation 1. Note that the use of multiple CUcontext s per device within a single process will substantially degrade performance and is strongly discouraged. Instead, it is highly recommended that the implicit one-to-one device-to-context mapping for the process provided by the CUDA Runtime API be used." from https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DRIVER.html 2. Right under cuCtxCreate. In most cases it is recommended to use cuDevicePrimaryCtxRetain. https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g65dc0012348bc84810e2103a40d8e2cf 3. The primary context is unique per device and shared with the CUDA runtime API. These functions allow integration with other libraries using CUDA. https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX Two issues are addressed by this patch: 1. Not using the primary context caused interoperability issue with libraries like cublas, cusolver. CUBLAS_STATUS_EXECUTION_FAILED and cudaErrorInvalidResourceHandle 2. On OLCF summit, "Error returned from cuCtxCreate" and "CUDA error is: invalid device ordinal" Regarding the flags of the primary context. If it is inactive, we set CU_CTX_SCHED_BLOCKING_SYNC. If it is already active, we respect the current flags. Reviewers: grokos, ABataev, jdoerfert, protze.joachim, AndreyChurbanov, Hahnfeld Reviewed By: jdoerfert Subscribers: openmp-commits, yaxunl, guansong, sstefan1, tianshilei1992 Tags: #openmp Differential Revision: https://reviews.llvm.org/D82718	2020-07-07 10:14:51 -04:00
Ye Luo	45bb073da8	[OpenMP] fix clang warning about printf format in CUDA plugin Summary: Warnings are printed by clang when building with LIBOMPTARGET_ENABLE_DEBUG=ON due to an incorrect format string. Reviewers: tianshilei1992, jdoerfert Reviewed By: tianshilei1992 Subscribers: yaxunl, guansong, sstefan1, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D82789	2020-06-29 22:35:39 -04:00
Shilei Tian	a014fbbc21	[OpenMP] Improve D2D memcpy to use more efficient driver API Summary: In the current implementation, D2D memcpy first copies data back to the host and then copies it from the host to the device. This is very inefficient if the device supports D2D memcpy, like CUDA. In this patch, D2D memcpy will first try to use the natively supported driver API. If it fails, it falls back to the original way. It is worth noting that D2D memcpy in this scenario contains two ideas: - Same devices: this is the D2D memcpy in the CUDA context. - Different devices: this is the PeerToPeer memcpy in the CUDA context. My implementation merges these two parts. It chooses the best API according to the source device and destination device. Reviewers: jdoerfert, AndreyChurbanov, grokos Reviewed By: jdoerfert Subscribers: yaxunl, guansong, sstefan1, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D80649	2020-06-04 16:59:06 -04:00
Manoel Roemmer	6b9e43c67e	[Openmp][VE] Libomptarget plugin for NEC SX-Aurora This patch adds a libomptarget plugin for the NEC SX-Aurora TSUBASA Vector Engine (VE target). The code is largely based on the existing generic-elf plugin and uses the NEC VEO and VEOSINFO libraries for offloading. Differential Revision: https://reviews.llvm.org/D76843	2020-05-12 10:47:30 +02:00
Shilei Tian	cb038927ef	[OpenMP] Fix an issue of wrong return type of DeviceRTLTy::getNumOfDevices Summary: There is a typo in DeviceRTLTy::getNumOfDevices that the type of its return value is bool. It will lead to a problem of wrong device number returned from omp_get_num_devices. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: yaxunl, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D79255	2020-05-03 15:59:06 -04:00
Shilei Tian	4031bb982b	[OpenMP] Refined CUDA plugin to put all CUDA operations into class Summary: Current implementation mixed everything up so that there is almost no encapsulation. In this patch, all CUDA related operations are put into a new class DeviceRTLTy and only necessary functions are exposed. In addition, all C++ code now conforms with LLVM code standard, keeping those API functions following C style. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: jfb, yaxunl, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77951	2020-04-13 13:32:46 -04:00
Shilei Tian	feed674dec	[OpenMP] Introduce stream pool to make sure the correctness of device synchronization Summary: In the previous patch, in order to optimize performance, we only synchronize once for each target region. The synchronization is done via stream synchronization. However, in an extreme situation, the performance might be bad. Consider the following case: There is a task that requires transferring a huge amount of data (it calls the data transfer function many times). It is scheduled to the first stream. Then we have 255 very light tasks scheduled to the remaining 255 streams (by default we have 256 streams). They can finish before we synchronize at the end of the first task. Next, we get another very large task. It will be scheduled again to the first stream. Now the first task finishes its kernel launch and calls stream synchronization. At this point the stream already contains two kernels, and the synchronization will wait until both kernels finish instead of just the first one for the first task. In this patch, we introduce a stream pool. After each synchronization, the stream is returned to the pool to make sure that each synchronization waits only for the expected operations. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: gregrodgers, yaxunl, lildmh, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77412	2020-04-11 07:08:56 -04:00
Shilei Tian	03ff643d2e	[OpenMP] Put old APIs back and added new _async series for backward compatibility Summary: According to comments on bi-weekly meeting, this patch put back old APIs and added new `_async` series Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: yaxunl, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77822	2020-04-09 22:40:58 -04:00
Shilei Tian	32ed29271f	[OpenMP] Optimized stream selection by scheduling data mapping for the same target region into a same stream Summary: This patch introduces two things for offloading: 1. Asynchronous data transferring: those functions are suffixed with `_async`. They have one more argument compared with their synchronous counterparts: `__tgt_async_info`, which is a new struct that only has one field, `void *Identifier`. This struct is for information exchange between different asynchronous operations. It can be used for stream selection, as in this case, or for operation synchronization, which is also used. We may expect more usages in the future. 2. Optimization of stream selection for data mapping. The previous implementation used asynchronous device memory transfers but synchronized after each memory transfer. In fact, if kernel A needs four memory copies to the device and two memory copies back to the host, then we can schedule these seven operations (four H2D, two D2H, and one kernel launch) into the same stream and only need to synchronize after the memory copies from device to host. In this way, we can save a huge overhead compared with synchronizing after each operation. Reviewers: jdoerfert, ye-luo Reviewed By: jdoerfert Subscribers: yaxunl, lildmh, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77005	2020-04-07 14:55:47 -04:00
Jon Chesterfield	856c995436	[libomptarget] Add missing elf_end call in elf_common.c Summary: [libomptarget] Add missing elf_end call in elf_common.c Noticed when reviewing D76843. Reviewers: simoll, jdoerfert, efocht, AndreyChurbanov, grokos, manorom Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D76874	2020-03-26 19:07:33 +00:00
George Rokos	0a42c9bfe4	Enable CUDA offloading on aarch64 host Differential Revision: https://reviews.llvm.org/D76469	2020-03-20 15:38:47 -07:00
Johannes Doerfert	a5153dbc36	[OpenMP][Offloading] Added support for multiple streams so that multiple kernels can be executed concurrently Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D74145	2020-02-11 22:07:14 -06:00
Kazuaki Ishizaki	4c6a098ad5	[OpenMP] NFC: Fix trivial typos in comments Reviewers: jdoerfert, Jim Reviewed By: Jim Subscribers: Jim, mgorny, guansong, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D72285	2020-01-07 14:05:03 +08:00
Bryan Chan	4d3198e243	[OpenMP] build offload plugins before testing them Summary: "make check-all" or "make check-libomptarget" would attempt to run offloading tests before the offload plugins are built. This patch corrects that by adding dependencies to the libomptarget CMake rules. Reviewers: jdoerfert Subscribers: mgorny, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D70803	2019-11-28 17:43:56 -05:00
Ron Lieberman	dc34b1c94d	Test commit: adds a . to comment. NFC	2019-11-04 16:51:03 -06:00
Sergey Dmitriev	4b343fd84c	[Clang][OpenMP Offload] Create start/end symbols for the offloading entry table with a help of a linker Linker automatically provides __start_<section name> and __stop_<section name> symbols to satisfy unresolved references if <section name> is representable as a C identifier (see https://sourceware.org/binutils/docs/ld/Input-Section-Example.html for details). These symbols indicate the start address and end address of the output section respectively. Therefore, renaming OpenMP offload entries section name from ".omp.offloading_entries" to "omp_offloading_entries" to use this feature. This is the first part of the patch for eliminating OpenMP linker script (please see https://reviews.llvm.org/D64943). Differential Revision: https://reviews.llvm.org/D68070 llvm-svn: 373118	2019-09-27 20:00:51 +00:00
Michael Kruse	78769ec403	[libomptarget] Harmonize emitting CUDA errors and general debug messages. Ensures that CUDA fail reasons (such as "No CUDA-capable device detected") are printed together with libomptarget's debug message (e.g. "Error when setting CUDA context"). Previously, the former was printed only in CMAKE_BUILD_TYPE=Debug builds while the latter was enabled by LIBOMPTARGET_ENABLE_DEBUG. With this change, also only call cuGetErrorString when the error will be printed. Suggested-by: Ye Luo <xw111luoye@gmail.com> Differential Revision: https://reviews.llvm.org/D65687 llvm-svn: 367910	2019-08-05 19:12:10 +00:00
Gheorghe-Teodor Bercea	aace6d285d	[OpenMP][libomptarget] Add support for declare target to clause under unified memory Summary: This patch adds support for handling variables under the: ``` #pragma omp declare target to() ``` clause when the ``` #pragma omp requires unified_shared_memory ``` is used. The address of the host variable is copied into the device pointer just like for the declare target link case. Reviewers: ABataev, caomhin, grokos, AlexEichenberger Reviewed By: grokos Subscribers: jcownie, guansong, jdoerfert, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D63106 llvm-svn: 363825	2019-06-19 15:48:10 +00:00
Gheorghe-Teodor Bercea	c5fe030c16	[OpenMP][libomptarget] Enable usage of unified memory for declare target link variables Summary: This patch enables the usage of a host variable on the device for declare target link variables when unified memory is available. Reviewers: ABataev, caomhin, grokos Reviewed By: grokos Subscribers: Hahnfeld, guansong, jdoerfert, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D60884 llvm-svn: 362505	2019-06-04 15:05:53 +00:00
Chandler Carruth	57b08b0944	Update more file headers across all of the LLVM projects in the monorepo to reflect the new license. These used slightly different spellings that defeated my regular expressions. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351648	2019-01-19 10:56:40 +00:00
Jonas Hahnfeld	bb51d39871	[libomptarget][CUDA] Use cuDeviceGetAttribute, NFCI. cuDeviceGetProperties has apparently been deprecated since CUDA 5.0. Nvidia started using annotations only in CUDA 9.2, so nobody noticed nor cared before. The new function returns the same values, tested with a P100. Differential Revision: https://reviews.llvm.org/D51624 llvm-svn: 341372	2018-09-04 15:13:28 +00:00
Joachim Protze	bb869f42b7	[libomptarget] Also support several images for elf In revision r336569 (D49036), libomptarget support for multiple nvidia images was fixed for the case where a target region resides inside one or multiple libraries and in the compiled application. But the issue is still present for elf images. This fix also supports multiple images for elf. Patch by Jannis Klinkenberg Reviewers: protze.joachim, ABataev, grokos Reviewed By: protze.joachim, ABataev, grokos Subscribers: openmp-commits Differential Revision: https://reviews.llvm.org/D49418 llvm-svn: 337355	2018-07-18 07:23:46 +00:00
Alexey Bataev	2622e9e5b3	[OPENMP, NVPTX] Support several images in the executable. Summary: Currently Cuda plugin supports loading of the single image, though we may have the executable with the several images, if it has target regions inside of the dynamically loaded library. Patch allows to load multiple images. Reviewers: grokos Subscribers: guansong, openmp-commits, kkwli0 Differential Revision: https://reviews.llvm.org/D49036 llvm-svn: 336569	2018-07-09 17:46:55 +00:00
Jonas Hahnfeld	65e0b8784c	[CMake] Unify install path for libraries Introduce OPENMP_INSTALL_LIBDIR and use in all install() commands. This also fixes installation of libomptarget-nvptx that previously didn't honor {OPENMP,LLVM}_LIBDIR_SUFFIX. Differential Revision: https://reviews.llvm.org/D47130 llvm-svn: 333284	2018-05-25 15:56:41 +00:00
Guansong Zhang	e1c7a46d5b	[OpenMP] Use LIBOMPTARGET_DEVICE_RTL_DEBUG env var to control debug messages on the device side Summary: Enable the device side debug messages at compile time, and use an env var to control them at runtime. To achieve this, an environment data block is passed to the device lib when it is loaded. By default, the messages are off; to enable them, a user needs to set LIBOMPDEVICE_DEBUG=1. Reviewers: grokos Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D46210 llvm-svn: 331550	2018-05-04 19:29:28 +00:00
Jonas Hahnfeld	a349d4820c	[libomptarget] Check for library with CUDA Driver API That's what we really need to link the CUDA plugin against, not the CUDA runtime API in CUDA_LIBRARIES! While the latter comes with the CUDA SDK, the Driver API is installed with the kernel driver and there is at most one per system. As fallback we can use the stubs library distributed with the CUDA SDK for linking. Differential Revision: https://reviews.llvm.org/D42643 llvm-svn: 323787	2018-01-30 16:49:13 +00:00
Jonas Hahnfeld	c189523529	[libomptarget] Only use CUDA Driver API Use equivalents for the last calls to the Runtime API. Remove stray assert in case of an error found during review, we should only return OFFLOAD_FAIL. Differential Revision: https://reviews.llvm.org/D42686 llvm-svn: 323786	2018-01-30 16:49:06 +00:00
Jonas Hahnfeld	5af381acad	[CMake] Refactor common settings and flags These are needed by both libraries, so we can do that in a common namespace and unify configuration parameters. Also make sure that the user isn't requesting libomptarget if the library cannot be built on the system. Issue an error in that case. Differential Revision: https://reviews.llvm.org/D40081 llvm-svn: 319342	2017-11-29 19:31:48 +00:00
Sergey Dmitriev	b305d26b57	[OpenMP] libomptarget: move debugging dumps under control of env var LIBOMPTARGET_DEBUG Disable default debugging dumps for libomptarget and plugins and move dumps under control of environment variable LIBOMPTARGET_DEBUG=<integer>. Dumps are enabled when LIBOMPTARGET_DEBUG is set to a positive integer value. Debugging dumps are available only in debug build; release build does not support it. Differential Revision: https://reviews.llvm.org/D33227 llvm-svn: 310841	2017-08-14 15:09:59 +00:00
George Rokos	0e86bfb5bb	[OpenMP] libomptarget: eliminate compiler warnings at build Thanks to Sergey Dmitriev for submitting the patch. Differential Revision: https://reviews.llvm.org/D33851 llvm-svn: 304601	2017-06-02 22:41:35 +00:00
George Rokos	1546d31924	[OpenMP] Changes in the plugin interface This patch changes the plugin interface so that: 1) future plugins can take advantage of systems with shared CPU/device storage 2) instead of using base addresses, target regions are launched by providing target addresses and base offsets explicitly. Differential revision: https://reviews.llvm.org/D33028 llvm-svn: 302663	2017-05-10 14:12:36 +00:00
George Rokos	c13df8e5e0	[OpenMP] Optimized default kernel launch parameters in CUDA plugin Differential Revision: https://reviews.llvm.org/D32321 llvm-svn: 301321	2017-04-25 16:34:13 +00:00
George Rokos	01954092d0	[OpenMP] CUDA plugin: More descriptive error messages Differential Revision: https://reviews.llvm.org/D31206 llvm-svn: 298527	2017-03-22 17:36:22 +00:00
George Rokos	f3fe2dd235	[OpenMP] CUDA plugin: add include directory for libelf Allow the user to manually specify where libelf is installed. Differential Revision: https://reviews.llvm.org/D31207 llvm-svn: 298515	2017-03-22 16:41:46 +00:00

152 Commits