llvm-project


Author	SHA1	Message	Date
Stanislav Mekhanoshin	e0b9364b5c	[AMDGPU] Add gfx90a and gfx940 to get_elf_mach_gfx_name.cpp Differential Revision: https://reviews.llvm.org/D120849	2022-03-17 13:05:07 -07:00
Shilei Tian	f6639a424b	[OpenMP][CUDA] Fix the check of `setContext`	2022-03-09 18:48:44 -05:00
Shilei Tian	39d3283a08	[OpenMP][CUDA] Avoid calling `cuCtxSetCurrent` redundantly Currently we set the context everywhere, but that causes many unnecessary function calls. For example, when the resource pool needs to resize, it gets new resources from the allocator, and each call to allocate sets the current context again, which is unnecessary. In this patch, we set the context only in the entry interface functions, where needed. Ideally this would be implemented via RAII, but since `cuCtxSetCurrent` can return an error and we don't use exceptions, an RAII wrapper could not stop execution if setting the context fails. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D121322	2022-03-09 16:32:47 -05:00
Shilei Tian	5105c7cd78	[OpenMP][CUDA] Fix an issue where multiple `CUmodule`s could be overwritten This patch fixes an issue, introduced in `14de0820e8` and D120089, where the `CUmodule` array could be overwritten if dynamic libraries are used. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D121308	2022-03-09 14:55:20 -05:00
Johannes Doerfert	14de0820e8	[OpenMP][FIX] Ensure the modules vector is filled as others are The modules vector was, for some reason, handled specially, which could leave it a different size from the others (= number of devices). The easiest solution is to treat it like all the other vectors.	2022-03-08 23:45:43 -06:00
Johannes Doerfert	1660288b28	[OpenMP][CUDA] Use one event pool per device An event pool, like the stream pool, needs to be kept per device. For one, events are associated with CUDA contexts, which means we cannot destroy an event after its context has been destroyed. Also, the CUDA documentation states that streams and events need to be associated with the same context, which we did not ensure at all. Differential Revision: https://reviews.llvm.org/D120142	2022-03-07 23:43:05 -06:00
Johannes Doerfert	10aa83ff74	[OpenMP] Allow to explicitly deinitialize device resources There are two problems this patch tries to address: 1) We currently free resources in a random order with respect to plugin and libomptarget destruction. This patch should make the CUDA plugin less fragile if something goes wrong during deinitialization. 2) We eventually need to support (hard) pause runtime calls. This patch allows us to free all associated resources, though we cannot reinitialize the device yet. A follow-up patch will associate one event pool per device/context. Differential Revision: https://reviews.llvm.org/D120089	2022-03-07 23:43:04 -06:00
Shilei Tian	7f7c2c34b6	[OpenMP][CMake] Clean up the CMake variable `LIBOMPTARGET_LLVM_INCLUDE_DIRS` `LIBOMPTARGET_LLVM_INCLUDE_DIRS` is currently checked and included redundantly multiple times. This patch simply cleans that up. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D121055	2022-03-05 22:37:59 -05:00
Aakanksha	840695814a	[AMDGPU] Add gfx1036 target Differential Revision: https://reviews.llvm.org/D120846	2022-03-02 23:26:38 +00:00
Joseph Huber	777039a51c	[Libomptarget] Run CPU offloading tests using the new driver This patch adds a new target to the OpenMP CPU offloading tests that exercises the new driver for CPU offloading. If this all works, we can then transition to the new driver as the default. Depends on D119613 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119736	2022-02-15 15:05:32 -05:00
Shilei Tian	aca33b0b37	[OpenMP][CUDA] Remove the hard team limit Currently we have a hard team limit of 65536: no matter whether the device supports more teams or the user requests more, any larger number is clamped to that hard limit when launching the kernel. This is far below the actual hardware limit. For example, my workstation has a GTX2080, whose hardware grid-size limit is 2147483647, exactly the largest number an `int32_t` can represent. The spec mentions no such limit. This patch simply removes it. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119313	2022-02-10 18:07:46 -05:00
Shilei Tian	f6685f7746	[OpenMP][CUDA] Refine the logic to determine grid size This patch refines the logic that determines the grid size, as the previous method could bypass the check of whether `CudaBlocksPerGrid` exceeds the actual hardware limit. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119311	2022-02-10 14:13:32 -05:00
Joseph Huber	f8ffac5987	[OpenMP] Enable new driver tests for AMDGPU This patch enables running the new driver tests for AMDGPU. Previously this was disabled because some tests failed. This was only because the new driver tests hadn't been listed as unsupported or expected to fail. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D119240	2022-02-08 09:55:29 -05:00
Joseph Huber	034adaf5be	[OpenMP] Completely remove old device runtime This patch completely removes the old OpenMP device runtime. Previously, the new runtime had the prefix `libomptarget-new-` and the old runtime was simply called `libomptarget-`. This patch makes the formerly new runtime the only runtime available. The entire project has been deleted, and all references to the `libomptarget-new` runtime have been replaced with `libomptarget-`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D118934	2022-02-04 15:31:33 -05:00
Joseph Huber	4d4587d5b0	[OpenMP] Remove new driver tests for AMDGPU Some of the new driver tests are flaky on AMDGPU, remove for now.	2022-01-31 23:32:33 -05:00
Joseph Huber	0ac799b5c9	[Libomptarget] Run GPU offloading tests using the new driver This patch adds a new target to the tests that uses the new driver to generate offloading code. Depends on D116541 Differential Revision: https://reviews.llvm.org/D118637	2022-01-31 23:11:43 -05:00
Sri Hari Krishna Narayanan	f44e41af41	Runtime for Interop directive This implements the runtime portion of the interop directive. It expects the frontend and IRBuilder portions to be in place for proper execution. It currently works only for GPUs and has several TODOs that should be addressed going forward. Reviewed By: RaviNarayanaswamy Differential Revision: https://reviews.llvm.org/D106674	2022-01-27 15:16:24 -05:00
Jon Chesterfield	e08f3bfe58	[openmp] Disable build of old runtimes by default The old runtime is not tested by CI. Disable the build prior to the llvm-14 branch. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D118268	2022-01-26 19:17:31 +00:00
Jon Chesterfield	ca84c43d69	[openmp][amdgpu] Disable tests on old runtime, enable tests on new one	2022-01-19 15:49:47 +00:00
Jon Chesterfield	e35c8f541c	[openmp][amdgpu] Temporarily disable tests on old runtime	2022-01-19 15:39:00 +00:00
Jon Chesterfield	a74826d30a	[openmp][amdgpu] Replace unsigned long with uint64_t Some types need to be 64 bit. Unsigned long is a hazard there. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D116963	2022-01-10 22:19:30 +00:00
Shilei Tian	943d1d83dd	[OpenMP][CUDA] Add resource pool for CUevent Following D111954, this patch adds the resource pool for CUevent. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D116315	2021-12-28 17:42:38 -05:00
Shilei Tian	357c8031ff	[OpenMP][Plugin] Minor adjustments to ResourcePool This patch makes some minor adjustments to `ResourcePool`: - Don't initialize the resources if `Size` is 0, which avoids an assertion failure. - Add a new interface function `clear` to release all held resources. - If the initial size is 0, resize to 1 when the first request is encountered. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D116340	2021-12-28 16:11:03 -05:00
Shilei Tian	a697a0a4b6	[OpenMP][Plugin] Introduce generic resource pool Currently CUDA streams are managed by `StreamManagerTy`, which works very well. Now some resources, such as CUDA streams and events, need to be held by `libomptarget`, and it is always good to buffer those resources. More importantly, given the way `libomptarget` and the plugins are connected, we cannot be sure whether the plugins are still alive when `libomptarget` is destroyed, so resources held by `libomptarget` might not be released correctly. As a result, we need unified management of all the resources that can be shared between `libomptarget` and the plugins. `ResourcePoolTy` is designed to manage one type of resource for one device. It works with an allocator that is supposed to provide `create` and `destroy`. This way, when the plugin is destroyed, we can make sure that all resources allocated from the native runtime library are released correctly, regardless of whether `libomptarget` has started its destruction. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D111954	2021-12-27 11:32:14 -05:00
Jon Chesterfield	38af5b4fd1	[libomptarget][nfc] Refactor dlwrap.h for easier reuse in D115966 and upcoming patches	2021-12-17 22:28:31 +00:00
Jon Chesterfield	91dfb32f2f	[openmp][amdgpu][nfc] Mark all external functions extern C to get type checking	2021-12-17 18:46:43 +00:00
Carlo Bertolli	d3abb04e14	[OpenMP][libomptarget] Fix __tgt_rtl_run_target_team_region_async API with missing parameter I missed the async info parameter in the first version of this API. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115887	2021-12-17 15:58:18 +00:00
Carlo Bertolli	d83dc4c648	[OpenMP] Increase opportunity for parallel kernel launch in AMDGPUs: add multiple HSA queues per device in plugin This patch extends the AMDGPU plugin for OpenMP target offloading from a single HSA queue to multiple queues (four in this patch) per device. This enables multiple threads to submit kernel launches to the same GPU concurrently. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115771	2021-12-15 15:33:17 +00:00
Carlo Bertolli	28309c5436	[OpenMP] Part 2 of At present, the amdgpu plugin merges both asynchronous and synchronous kernel launch implementations into a single synchronous version. This patch prepares the plugin for an asynchronous implementation by: - Privatizing the actual kernel launch code (valid in both cases) into an anonymous-namespace base function (submitted at D115267) - Separating the control flow paths of the asynchronous and synchronous kernel launch functions (this diff) Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115273	2021-12-10 19:21:05 +00:00
Carlo Bertolli	cc8dc5e28b	[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version Prepare the amdgpu plugin for an asynchronous implementation. This patch switches to the HSA API for asynchronous memory copy. Moving away from hsa_memory_copy means that the plugin is responsible for locking/unlocking host memory pointers. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115279	2021-12-08 23:02:39 +00:00
Jon Chesterfield	14ff611fe1	Revert "[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version" This reverts commit `6de698bf10`. It didn't build in the dynamic_hsa configuration	2021-12-08 08:23:12 +00:00
Carlo Bertolli	6de698bf10	[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version Prepare the amdgpu plugin for an asynchronous implementation. This patch switches to the HSA API for asynchronous memory copy. Moving away from hsa_memory_copy means that the plugin is responsible for locking/unlocking host memory pointers. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115279	2021-12-07 23:05:23 +00:00
Carlo Bertolli	d9b1d827d2	[NFC][OpenMP] Prepare amdgpu plugin for asynchronous implementation of target region launch At present, the amdgpu plugin merges both asynchronous and synchronous kernel launch implementations into a single synchronous version. This patch prepares the plugin for an asynchronous implementation by: - Privatizing the actual kernel launch code (valid in both cases) into an anonymous-namespace base function. The actual separation of the kernel launch code (async vs sync) is left to a follow-up patch. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115267	2021-12-07 21:02:45 +00:00
Ye Luo	21a51cebf1	[OpenMP][libomptarget] amdgpu plugin adds runpath for dependencies The amdgpu plugin depends on the libhsa-runtime64 library. Add a runpath in case it is not on the LD_LIBRARY_PATH. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D115198	2021-12-06 18:19:18 -06:00
Jon Chesterfield	a05a0c3c2f	[libomptarget] Add cmake variables to disable building the amdgpu or cuda plugins Analogous to the controls on building device runtimes Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D115148	2021-12-06 16:42:26 +00:00
Jon Chesterfield	9e08c2054a	[openmp] Enable tests on new devicertl on amdgpu Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D114891	2021-12-06 15:26:18 +00:00
Matt Arsenault	935abeaace	OpenMP: Correctly query location for amdgpu-arch This was trying to figure out the build path for amdgpu-arch, and making assumptions about where it is which were not working on my system. Whether a standalone build or not, we should have a proper imported target to get the location from.	2021-11-29 16:31:32 -05:00
Jon Chesterfield	ae5348a38e	[openmp][amdgpu] Make plugin robust to presence of explicit implicit arguments OpenMP (compiler) does not currently request any implicit kernel arguments. OpenMP (runtime) allocates and initialises a reasonable guess at the implicit kernel arguments anyway. This change makes the plugin check the number of explicit arguments, instead of all arguments, and puts the pointer to hostcall buffer in both the current location and at the offset expected when implicit arguments are added to the metadata by D113538. This is intended to keep things running while fixing the oversight in the compiler (in D113538). Once that patch lands, and a following one marks openmp kernels that use printf such that the backend emits an args element with the right type (instead of hidden_node), the over-allocation can be removed and the hardcoded 8*e+3 offset replaced with one read from the .offset of the corresponding metadata element. Reviewed By: estewart08 Differential Revision: https://reviews.llvm.org/D114274	2021-11-22 23:00:20 +00:00
Jon Chesterfield	04954824ee	[openmp][amdgpu][nfc] Simplify implicit args handling Removes a +x/-x pair on the only store/load of a variable and deletes some nearby dead code. Also reduces the size of the implicit struct to reflect the code currently emitted by clang. Differential Revision: https://reviews.llvm.org/D114270	2021-11-19 20:18:23 +00:00
Jon Chesterfield	9cdaf0b01b	[openmp][amdgpu][nfc] Inline interop_hsa_get_kernel_info into only caller	2021-11-19 18:45:17 +00:00
Jon Chesterfield	4f4c826e75	[libomptarget] Drop remote plugin cmake version requirement to match llvm LLVM docs at https://llvm.org/docs/CMake.html#quick-start state 3.13.4 Reviewed By: atmnpatel Differential Revision: https://reviews.llvm.org/D113271	2021-11-05 17:34:28 +00:00
Kazu Hirata	3cfc1757c5	Ensure newlines at the end of files (NFC)	2021-10-29 20:26:09 -07:00
Jon Chesterfield	6c7b203d1d	Revert "[libomptarget] Build DeviceRTL for amdgpu" - more tests failing on CI than failed locally when writing this patch This reverts commit `33427fdb7b`.	2021-10-28 01:01:53 +01:00
Jon Chesterfield	33427fdb7b	[libomptarget] Build DeviceRTL for amdgpu Passes same tests as the current deviceRTL. Includes cmake change from D111987. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D112227	2021-10-28 00:41:45 +01:00
Kazu Hirata	d8e4170b0a	Ensure newlines at the end of files (NFC)	2021-10-23 08:45:29 -07:00
Jon Chesterfield	bf6f955f39	[libomptarget] Run GPU offloading tests on both new and old runtime Implemented by patching python config instead of modifying all the tests so that -generic and XFAIL work as usual. Expectation is for this to be reverted once the old runtime is deleted. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D112225	2021-10-22 23:28:44 +01:00
Joseph Huber	b1ce454930	[OpenMP] Remove macro guards for device debugging The plugin currently uses a macro to check if this is a debug build before assigning the debug kind variable to the device environment struct. This is being deprecated because the new device runtime does not maintain separate debug builds, and debugging should always be available. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112083	2021-10-19 12:21:43 -04:00
Ron Lieberman	d022f39d9f	[libomptarget][amdgpu][NFC] tweak a comment	2021-10-09 12:51:53 -04:00
Shilei Tian	c060c634ef	[OpenMP][NVPTX] Fix an error in configuring #teams and #threads It must have been a copy-paste mistake. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D111407	2021-10-08 11:07:43 -04:00
Jon Chesterfield	1bc3a6e41b	[libomptarget] Reapply `2bc4d48a78` which was accidentally reverted	2021-10-07 20:17:48 +01:00


254 Commits
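The resource pool described in commits a697a0a4b6 and 357c8031ff (one pool per resource type per device, backed by an allocator that provides `create` and `destroy`, no eager allocation when the initial size is 0, growing to 1 on the first request, and a `clear` that releases everything held) can be sketched in C++ as follows. This is a minimal illustration, not the libomptarget `ResourcePoolTy` itself: the class name `ResourcePool`, the `bool create(T &)` / `void destroy(T)` allocator signatures, the doubling growth policy, and the `CountingAllocator` test allocator are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical pool for one resource type on one device. AllocatorTy must
// provide `bool create(T &)` (true on success) and `void destroy(T)`.
template <typename T, typename AllocatorTy> class ResourcePool {
public:
  ResourcePool(AllocatorTy &A, std::size_t InitialSize) : Allocator(A) {
    // Nothing is allocated eagerly when the initial size is 0.
    if (InitialSize > 0)
      grow(InitialSize);
  }

  // Destroys the resources still held, so they are freed while the
  // allocator (e.g. the plugin) is still alive.
  ~ResourcePool() { clear(); }

  // Hands out a free resource, growing the pool when none is left:
  // to 1 if the pool is empty, otherwise by doubling its capacity.
  bool acquire(T &Out) {
    std::lock_guard<std::mutex> Lock(Mutex);
    if (Free.empty())
      grow(Capacity == 0 ? 1 : Capacity);
    if (Free.empty())
      return false; // The allocator could not create anything.
    Out = Free.back();
    Free.pop_back();
    return true;
  }

  // Returns a resource to the pool for reuse.
  void release(T Resource) {
    std::lock_guard<std::mutex> Lock(Mutex);
    Free.push_back(Resource);
  }

  // Destroys every resource currently in the pool. Resources still checked
  // out by callers are not destroyed here and should be released first.
  void clear() {
    std::lock_guard<std::mutex> Lock(Mutex);
    for (T &R : Free)
      Allocator.destroy(R);
    Capacity -= Free.size();
    Free.clear();
  }

  std::size_t capacity() const {
    std::lock_guard<std::mutex> Lock(Mutex);
    return Capacity;
  }

  std::size_t available() const {
    std::lock_guard<std::mutex> Lock(Mutex);
    return Free.size();
  }

private:
  // Creates up to N new resources, stopping early if the allocator fails.
  // The caller must hold Mutex (or be the constructor).
  void grow(std::size_t N) {
    for (std::size_t I = 0; I < N; ++I) {
      T R;
      if (!Allocator.create(R))
        return;
      Free.push_back(R);
      ++Capacity;
    }
  }

  AllocatorTy &Allocator;
  std::vector<T> Free;
  std::size_t Capacity = 0;
  mutable std::mutex Mutex;
};

// Hypothetical allocator for illustration: hands out increasing integer IDs
// and counts how many are currently alive.
struct CountingAllocator {
  int Next = 0;
  int Live = 0;
  bool create(int &R) {
    R = Next++;
    ++Live;
    return true;
  }
  void destroy(int) {
    assert(Live > 0 && "destroy() called more often than create()");
    --Live;
  }
};
```

For the one-pool-per-device arrangement of commit 1660288b28, one option would be a `std::vector<std::unique_ptr<ResourcePool<...>>>` indexed by device ID; the indirection is needed because the `std::mutex` member makes the pool non-movable.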
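Commits aca33b0b37 and f6685f7746 describe dropping the fixed 65536 team cap and making sure every way of picking the grid size is checked against the hardware limit. A minimal sketch of that idea, assuming a hypothetical helper `chooseNumBlocks` with made-up parameters: explicitly requested teams, a trip-count-based default, or a fallback default. All three paths pass through a single clamp against the device's reported maximum, so none of them can skip the check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: chooses the number of blocks (teams) for a kernel
// launch. Every path ends in the same clamp against the device's reported
// maximum grid size; there is no extra fixed cap such as 65536.
uint32_t chooseNumBlocks(uint64_t RequestedTeams, uint64_t LoopTripCount,
                         uint32_t ThreadsPerBlock, uint32_t DefaultBlocks,
                         uint32_t HardwareMaxBlocks) {
  assert(HardwareMaxBlocks > 0 && "device must report a grid-size limit");
  uint64_t Blocks;
  if (RequestedTeams > 0) {
    // The user asked for a specific number of teams.
    Blocks = RequestedTeams;
  } else if (LoopTripCount > 0 && ThreadsPerBlock > 0) {
    // One thread per loop iteration, rounded up to whole blocks.
    Blocks = (LoopTripCount + ThreadsPerBlock - 1) / ThreadsPerBlock;
  } else {
    Blocks = DefaultBlocks;
  }
  // Single clamp shared by all paths.
  Blocks = std::min<uint64_t>(Blocks, HardwareMaxBlocks);
  // Never launch an empty grid.
  return static_cast<uint32_t>(std::max<uint64_t>(Blocks, 1));
}
```

With the grid-size limit quoted in aca33b0b37 (2147483647), a request for 100000 teams is honored as-is instead of being cut to 65536.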
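Commit a74826d30a replaces `unsigned long` with `uint64_t` because some values must be exactly 64 bits wide. The C++ standard only guarantees that `unsigned long` is at least 32 bits. It is 64 bits on LP64 targets such as x86-64 Linux but 32 bits on LLP64 targets such as 64-bit Windows, whereas `uint64_t` is always exactly 64 bits. The sketch below illustrates this with hypothetical helpers, `writeU64` and `readU64`, that store and load a 64-bit value (for example a device pointer) at a fixed offset in an argument buffer. These helpers are illustrative only and not part of the plugin.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <cstring>

// uint64_t is exactly 64 bits on every platform that provides it.
static_assert(sizeof(std::uint64_t) * CHAR_BIT == 64, "uint64_t is 64 bits");
// unsigned long is only guaranteed to be at least 32 bits, so it cannot
// stand in for a field whose layout is fixed at 8 bytes.
static_assert(sizeof(unsigned long) * CHAR_BIT >= 32, "guaranteed minimum");

// Hypothetical helpers: store/load a 64-bit value at a byte offset in a
// buffer. memcpy avoids misaligned and type-punned pointer accesses.
void writeU64(unsigned char *Buffer, std::size_t Offset, std::uint64_t Value) {
  assert(Buffer != nullptr);
  std::memcpy(Buffer + Offset, &Value, sizeof(Value));
}

std::uint64_t readU64(const unsigned char *Buffer, std::size_t Offset) {
  assert(Buffer != nullptr);
  std::uint64_t Value;
  std::memcpy(&Value, Buffer + Offset, sizeof(Value));
  return Value;
}
```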