llvm-project

Commit Graph

Author	SHA1	Message	Date
Dan Palermo	db021abf33	[OpenMP][AMDGPU] Enable OpenMP device runtime build for gfx110[0123] Add OpenMP device runtime build support for the gfx1100, gfx1101, gfx1102, and gfx1103 targets. Differential Revision: https://reviews.llvm.org/D134465	2022-09-23 01:49:51 +00:00
Joseph Huber	292cb114b0	[Libomptarget] Revert changes to AMDGPU plugin destructors These patches exposed a lot of problems in the AMD toolchain. Rather than keep it broken we should revert it to its old semi-functional state. This will prevent us from using device destructors but should remove some new bugs. In the future this interface should be changed once these problems are addressed more correctly. This reverts commit `ed0f218115`. This reverts commit `2b7203a359`. Fixes #57536 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D133997	2022-09-16 06:55:51 -05:00
Joseph Huber	23bc343855	[Libomptarget] Change device free routines to accept the allocation kind Previous support for device memory allocators used a single free routine and did not provide the original kind of the allocation. This is problematic as some of these memory types required different handling. Previously this was worked around using a map in runtime to record the original kind of each pointer. Instead, this patch introduces new free routines similar to the existing allocation routines. This allows us to avoid a map traversal every time we free a device pointer. The only interfaces defined by the standard are `omp_target_alloc` and `omp_target_free`, these do not take a kind as `omp_alloc` does. The standard dictates the following: "The omp_target_alloc routine returns a device pointer that references the device address of a storage location of size bytes. The storage location is dynamically allocated in the device data environment of the device specified by device_num." Which suggests that these routines only allocate the default device memory for the kind. So this has been changed to reflect this. This change is somewhat breaking if users were using `omp_target_free` as previously shown in the tests. Reviewed By: JonChesterfield, tianshilei1992 Differential Revision: https://reviews.llvm.org/D133053	2022-09-14 12:14:07 -05:00
Joseph Huber	c2acb1e5d3	[Libomptarget][NFC] Remove unused variable	2022-09-09 15:26:02 -05:00
Joseph Huber	83fcba82cc	[Libomptarget] Add proper LLVM libraries now that the AMDGPU plugin uses them Summary: The AMDGPU and CUDA plugins now relies on the Object and Support libraries. This patch adds them explicitly rather than hoping that they share the symbols loaded from the standard `libomptarget`.	2022-09-09 10:33:26 -05:00
Joseph Huber	8d2a447bf9	[Libomptarget] Remove leftover ELF header from x86 plugin Summary: We removed the linking support for `gelf.h` in a previous patch. This header was incorrectly leftover causing build problems on some systems.	2022-09-07 13:41:40 -05:00
Joseph Huber	300155911a	[Libomptarget] Replace libelf with LLVM's Elf libraries This patch replaces the dependency on `libelf` with LLVM's ELF support. With this patch the user no-longer needs to have `libelf` on their system to build and configure OpenMP offloading. The replacement is mostly mechanical, with the exception of the hash table support which was added in D131309. Depends on D131309 Reviewed By: JonChesterfield, saiislam Differential Revision: https://reviews.llvm.org/D131401	2022-09-07 12:38:51 -05:00
Joseph Huber	894531f59b	[Libomptarget] Add utility functions for loading an ELF symbol by name The `SHT_HASH` sections in an ELF are used to look up a symbol in the symbol table using a symbol's name. This is done by obtaining the `SHT_HASH` section and using its `sh_link` attribute to access the associated symbol table, from which we can access the string table containing the associated name. We can then search for the symbol using the hash of the name and the buckets and chains in the hash table itself This patch adds utility functions that allow us to look up a symbol in an ELF file by name. It will first attempt to look through the hash tables, and then search the section tables manually if failed. This allows us to pull out constants necessary for setting up offloading without first loading the object. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D131309	2022-09-07 12:38:50 -05:00
Joseph Huber	31f434ee3b	[Libomptarget][NFC] Clean up CUDA plugin and address warnings	2022-09-06 15:28:57 -05:00
Joseph Huber	f8b1f93f26	[libomptarget] Enable the device allocator for AMDGPU This patch adds support for the device memory type, this is currently equivalent to the default type so it should be treated as the same. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D133128	2022-09-01 12:40:59 -05:00
Jon Chesterfield	ffabe997a5	[openmp][amdgpu] Implement target_alloc_host as fine grain HSA memory The cuda plugin maps TARGET_ALLOC_HOST onto cuMemAllocHost which is page locked host memory. Fine grain HSA memory is not necessarily page locked but has the same read/write from host or device semantics. The cuda plugin does this per-gpu and this patch makes it accessible from any gpu, but it can be locked down to match the cuda behaviour if preferred. Enabling tests requires an equivalent to // RUN: %libomptarget-compile-run-and-check-nvptx64-nvidia-cuda for amdgpu which doesn't seem to be in use yet. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D132660	2022-08-25 16:27:52 +01:00
Ye Luo	322ea53144	[libomptarget][amdgpu] enable tests whenever possible. if(TARGET amdgpu-arch) doesn't work when ENABLE_LLVM_PROJECTS=openmp because openmp subdirectory is processed before clang subdirectory. Adopt the same logic of enabling tests like the CUDA plugin. Differential Revision: https://reviews.llvm.org/D132579	2022-08-24 14:33:28 -05:00
Joseph Huber	540a13652f	[Libomptarget] Replace use of `dlopen` with LLVM's dynamic library support This patch replaces uses of `dlopen` and `dlsym` with LLVM's support with `loadPermanentLibrary` and `getSymbolAddress`. This allows us to remove the explicit dependency on the `dl` libraries in the CMake. This removes another explicit dependency and solves an issue encountered while building on Windows platforms. The one downside to this is that the LLVM library does not currently support `dlclose` functionality, but this could be added in the future. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D131507	2022-08-24 10:46:21 -05:00
Joseph Huber	30efb459e0	[Libomptarget] Remove use of ELF link_address in x86_64 plugin We use the offloading entires array to determine the relative names and addressed of device-side kernel functions. The x86_64 plugin previously derived the device-side entry table by first identifying the `omp_offloading_entries` section offset in the loaded elf. Then we would use the base offset of the loaded dyanmic library to identify the entries array within the loaded image. This relied on some more unconventional methods which prevented us from using the LLVM dynamic library loader for this plugin. This patch simplifies this by instead copying the host-side entry and replacing its address with the device-side address looked up through `dlsym`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D131516	2022-08-24 10:46:20 -05:00
Joseph Huber	fdbb15355e	[Libomptarget][CUDA] Check CUDA compatibilty correctly We recently added support for multi-architecture binaries in libomptarget. This is done by extracting the architecture from the embedded image and comparing it with the major and minor version supported by the current CUDA installation. Previously we just compared these directly, which was not correct for binary compatibility. The CUDA documentation states that we can consider any image with an equivalent major or a greater or equal to minor compatible with the current image. Change the check to use this new logic in the CUDA plugin. Fixes #57049 Reviewed By: jdoerfert, ye-luo Differential Revision: https://reviews.llvm.org/D131567	2022-08-10 11:15:27 -04:00
Fangrui Song	0972a390b9	LLVM_FALLTHROUGH => [[fallthrough]]. NFC	2022-08-09 04:06:52 +00:00
Jon Chesterfield	104f11630a	[nfc][openmp] clang-format system.cpp prior to D131401	2022-08-08 16:24:34 +01:00
Joseph Huber	b3335e8ed7	[Libomptarget][NFC] Clang format the AMDGPU plugin Summary: A previous patch did not format the plugin again after making changes. Ensure that libomptarget stays formatted.	2022-08-03 15:18:16 -04:00
Joseph Huber	2b7203a359	[Libomptarget] Deinitialize AMDGPU global state more intentionally A previous patch made the destruction of the HSA plugin more deterministic. However, there were still other global values that are not handled this way. When attempting to call a destructor kernel, the device would have already been uninitialized and we could not find the appropriate kernel to call. This is because they were stored in global containers that had their destructors called already. Merges this global state into the rest of the info state by putting those global values inside of the global pointer already allocated and deallocated by the constructor and destructor. This should allow the AMDGPU plugin to correctly identify the destructors if we were to run them. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D131011	2022-08-02 18:24:39 -04:00
Jon Chesterfield	ed0f218115	[openmp][amdgpu] Tear down amdgpu plugin accurately Moves DeviceInfo global to heap to accurately control lifetime. Moves calls from libomptarget to deinit_plugin later, plugins need to stay alive until very shortly before libomptarget is destructed. Leaving the deinit_plugin calls where initially inserted hits use after free from the dynamic_module.c offloading test (verified with valgrind that the new location is sound with respect to this) Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D130714	2022-07-28 20:00:03 +01:00
Jon Chesterfield	c214cb6a68	[amdgpu][openmp][nfc] Restore stb_local on DeviceInfo symbol	2022-07-28 16:50:46 +01:00
Jon Chesterfield	75aa521064	[openmp][amdgpu] Move global DeviceInfo behind call syntax prior to using D130712	2022-07-28 16:40:42 +01:00
Jon Chesterfield	1f9d3974e4	[openmp] Introduce optional plugin init/deinit functions Will allow plugins to migrate away from using global variables to manage lifetime, which will fix a segfault discovered in relation to D127432 Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D130712	2022-07-28 16:21:38 +01:00
Saiyedul Islam	4075a811ad	[Libomptarget] Add checks for AMDGPU TargetID using new image info This patch extends the is_valid_binary routine to also check if the binary's target ID matches the one parsed from the system's runtime environment. This should allow us to only use the binary whose compute capability matches, allowing us to support basic multi-architecture binaries for AMDGPU. It also handles compatibility testing of target IDs of the image and the enviornment. Depends on D127432 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D127769	2022-07-26 02:44:31 -05:00
Saiyedul Islam	4cf30c5157	Revert "Revert "Revert "[Libomptarget] Add checks for AMDGPU TargetID using new image info""" This reverts commit `281eb9223c`.	2022-07-25 11:35:37 -05:00
Saiyedul Islam	281eb9223c	Revert "Revert "[Libomptarget] Add checks for AMDGPU TargetID using new image info"" This reverts commit `8cbf4a386b`.	2022-07-25 08:32:26 -05:00
Saiyedul Islam	8cbf4a386b	Revert "[Libomptarget] Add checks for AMDGPU TargetID using new image info" This reverts commit `471f2abc62`.	2022-07-25 05:32:59 -05:00
Saiyedul Islam	471f2abc62	[Libomptarget] Add checks for AMDGPU TargetID using new image info This patch extends the is_valid_binary routine to also check if the binary's target ID matches the one parsed from the system's runtime environment. This should allow us to only use the binary whose compute capability matches, allowing us to support basic multi-architecture binaries for AMDGPU. It also handles compatibility testing of target IDs of the image and the enviornment. Depends on D127432 Differential Revision: https://reviews.llvm.org/D127769	2022-07-25 04:44:36 -05:00
Joel E. Denny	cfa6e79df3	[Libomptarget] Don't report lack of CUDA devices Sometimes libomptarget's CUDA plugin produces unhelpful diagnostics about a lack of CUDA devices before an application runs: ``` $ clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa hello-world.c $ ./a.out CUDA error: Error returned from cuInit CUDA error: no CUDA-capable device is detected Hello World: 4 ``` This can happen when the CUDA plugin was built but all CUDA devices are currently disabled in some manner, perhaps because `CUDA_VISIBLE_DEVICES` is set to the empty string. As shown in the above example, it can even happen when we haven't compiled the application for offloading to CUDA. The following code from `openmp/libomptarget/plugins/cuda/src/rtl.cpp` appears to be intended to handle this case, and it chooses not to write a diagnostic to stderr unless debugging is enabled: ``` if (NumberOfDevices == 0) { DP("There are no devices supporting CUDA.\n"); return; } ``` The problem is that the above code is never reached because the earlier `cuInit` returns `CUDA_ERROR_NO_DEVICE`. This patch handles that `cuInit` case in the same manner as the above code handles the `NumberOfDevices == 0` case. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D130371	2022-07-22 14:46:45 -04:00
Joseph Huber	a3804a3145	[Libomptarget] Make the plugins link as LLVM libraries Previously we made `libomptarget` link as an LLVM library so we have access to the LLVM core libraries. After the initial patch stuck we can now apply the same changes to the plugins. This will allow us to use LLVM in all of `libomptarget` when we have uses for them. In the future this should allow us to remove the dependencies on `libelf`, `libffi`, and `dl`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D130262	2022-07-22 09:34:12 -04:00
Joseph Huber	e01ce4e88a	[Libomptarget] Add checks for CUDA subarchitecture using new info This patch extends the `is_valid_binary` routine to also check if the binary's architecture string matches the one parsed from the runtime. This should allow us to only use the binary whose compute capability matches, allowing us to support basic multi-architecture binaries for CUDA. Depends on D127432 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D127505	2022-07-21 13:20:06 -04:00
Joseph Huber	1f940b69c3	[Libomptarget][NFC] Fix signed comparison warnings Summary: Non-functional change, just fixing some sign comparison warnings by making both match.	2022-07-15 13:22:55 -04:00
Joseph Huber	d27d0a673c	[Libomptarget][NFC] Make Libomptarget use the LLVM naming convention Libomptarget grew out of a project that was originally not in LLVM. As we develop libomptarget this has led to an increasingly large clash between the naming conventions used. This patch fixes most of the variable names that did not confrom to the LLVM standard, that is `VariableName` for variables and `functionName` for functions. This patch was primarily done using my editor's linting messages, if there are any issues I missed arising from the automation let me know. Reviewed By: saiislam Differential Revision: https://reviews.llvm.org/D128997	2022-07-05 14:53:38 -04:00
Shilei Tian	696bca9bb2	[NFC][OpenMP][CUDA] Remove unnecessary default label	2022-07-01 09:50:29 -04:00
Shilei Tian	2695e23ad9	[OpenMP][CUDA] Fix the issue that P2P memcpy doesn't work This patch fixes the issue that P2P memcpy doesn't work. The root cause is we didn't set current context when calling the API function. In addition, a matrix to track the states of each pair of devices is also added such that we only need to query and configure the device once. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D122764	2022-06-28 15:32:03 -04:00
Joseph Huber	d5d836635c	[Libomptarget] Add test config for compiling in LTO-mode We are planning on making LTO the default compilation mode for offloading. In order to make sure it works we should run these tests on the test suite. AMDGPU already uses the LTO compilation path for its linking, but in LTO mode it also links the static library late. Performing LTO requires the static library to be built, if we make the change this will be a hard requirement and the old bitcode library will go away. This means users will need to use either a two-step build or a runtimes build for libomptarget. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D127512	2022-06-14 10:16:03 -04:00
Jose Manuel Monsalve Diaz	15ed5c0a07	[LIBOMPTARGET] Adding AMD to llvm-omp-device-info Adding device information print for AMD devices on the `llvm-omp-device-info` command line tool. The output is inspired by the rocminfo command line tool. This commit adds missing HSA functions, enums and structs needed to query additional information from the HSA agents. A generic message for the `generic-elf-64bit` plugin is also added Example of an output: ``` llvm-omp-device-info Device (0): This is a generic-elf-64bit device Device (1): This is a generic-elf-64bit device Device (2): This is a generic-elf-64bit device Device (3): This is a generic-elf-64bit device Device (4): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 0 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (5): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 1 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (6): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 2 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (7): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 3 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE ``` Differential Revision: https://reviews.llvm.org/D126836	2022-06-09 11:58:39 +00:00
Jose Manuel Monsalve Diaz	84e020a061	Revert "[LIBOMPTARGET] Adding AMD to llvm-omp-device-info" This reverts commit `d16a0877d8`.	2022-06-09 10:46:03 +00:00
Jose Manuel Monsalve Diaz	d16a0877d8	[LIBOMPTARGET] Adding AMD to llvm-omp-device-info Adding device information print for AMD devices on the `llvm-omp-device-info` command line tool. The output is inspired by the rocminfo command line tool. This commit adds missing HSA functions, enums and structs needed to query additional information from the HSA agents. A generic message for the `generic-elf-64bit` plugin is also added Example of an output: ``` llvm-omp-device-info Device (0): This is a generic-elf-64bit device Device (1): This is a generic-elf-64bit device Device (2): This is a generic-elf-64bit device Device (3): This is a generic-elf-64bit device Device (4): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 0 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (5): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 1 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (6): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 2 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (7): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 3 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE ``` Differential Revision: https://reviews.llvm.org/D126836	2022-06-08 16:31:12 +00:00
Joseph Huber	f4f23de1a4	[Libomptarget] Add basic support for dynamic shared memory on AMDGPU This patchs adds the arguments necessary to allocate the size of the dynamic shared memory via the `LIBOMPTARGET_SHARED_MEMORY_SIZE` environment variable. This patch only allocates the memory, AMDGPU has a limitation that shared memory can only be accessed from the kernel directly. So this will currently only work with optimizations to inline the accessor function. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D125252	2022-06-01 13:32:50 -04:00
Joseph Huber	9ffa945c40	[Libomptarget] Remove global include directory from libomptarget We used to globally include the libomptarget include directory for all projects. This caused some conflicts with the other files named "Debug.h". This patch changes the cmake to include these files via the target include instead. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D125563	2022-05-13 14:38:47 -04:00
Atmn Patel	c44420e90d	[Libomptarget][remote] Add OpenMP linker flag to the plugin The remote offloading server and plugin rely on OpenMP, so this needs to be added as a linker flag. Without this, applications segfault. Differential Revision: https://reviews.llvm.org/D124200	2022-04-21 15:45:29 -04:00
Atmn Patel	489894f363	[Libomptarget][remote] Fix compile-time error This fixes a compile-time error recently introduced within the remote offloading plugin. This patch also removes some extra linker flags that are unnecessary, and adds an explicit abseil linker flag without which we occasionally get problems. Differential Revision: https://reviews.llvm.org/D119984	2022-04-19 16:46:01 -04:00
Joseph Huber	ae23be84cb	[OpenMP] Make the new offloading driver the default Previously an opt-in flag `-fopenmp-new-driver` was used to enable the new offloading driver. After passing tests for a few months it should be sufficiently mature to flip the switch and make it the default. The new offloading driver is now enabled if there is OpenMP and OpenMP offloading present and the new `-fno-openmp-new-driver` is not present. The new offloading driver has three main benefits over the old method: - Static library support - Device-side LTO - Unified clang driver stages Depends on D122683 Differential Revision: https://reviews.llvm.org/D122831	2022-04-18 15:05:09 -04:00
Dhruva Chakrabarti	7086a1db80	[libomptarget] [amdgpu] Hostcall offset check should consider implicit args Fixed hostcall offset check to compare against kernarg segment size and implicit arguments. Improved the corresponding debug print. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D123827	2022-04-15 00:53:47 +00:00
Saiyedul Islam	54a6cc3405	[libomptarget][amdgpu] Add hidden_heap_v1 kernarg metadata Code object version 5 adds support of hidden_heap_v1 kernarg metadata field [1]. It is a global address space pointer to an initialized memory buffer that conforms to the requirements of the malloc/free device library V1 version implementation. [1] https://llvm.org/docs/AMDGPUUsage.html#amdgpu-amdhsa-code-object-kernel-argument-metadata-map-table-v5 Reviewed By: carlo.bertolli Differential Revision: https://reviews.llvm.org/D123527	2022-04-13 03:57:29 +00:00
Johannes Doerfert	a3a42c3ca2	[OpenMP][FIX] Ensure to set the context for wait events if necessary Differential Revision: https://reviews.llvm.org/D123445	2022-04-12 16:42:50 -05:00
Michael Kruse	7fa7b0cbd8	[libomptarget] Add device RTL to regression test dependencies. In a clean build directory, `check-openmp` or `check-libomptarget` will fail because of missing device RTL .bc files. Ensure that the new targets new custom targets `omptarget.devicertl.nvptx` and `omptarget.devicertl.amdgpu` (corresponding to the plugin rtl targets `omptarget.rtl.cuda`, respectively `omptarget.rlt.amdgpu` ) are dependencies of the regression tests. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D123177	2022-04-06 20:01:47 -05:00
Johannes Doerfert	ba93e4e33e	[OpenMP][NFC] Add missing virtual destructor to silence warning	2022-03-28 22:33:18 -05:00
Shilei Tian	545fcc3d84	[OpenMP][CUDA] Fix potential program crash caused by double free resources As we mentioned in the code comments for function `ResourcePoolTy::release`, at some point there could be two identical resources on the two sides of `Next` mark. It is usually not an issue, unless the following case: 1. Some resources are not returned. 2. We need to iterate the pool and free the element. That will cause double free, which is the case for event pool. Since we don't release events hold by the data map, it can happen that the `Next` mark is not reset, and we have two identical items in the pool. When the pool is destroyed, we will call `cuEventDestroy` twice on the same event. In the best case, we can only observe CUDA errors. In the worst case, it can cause internal failures in CUDART and further crash. This patch fixes the issue by tracking all resources that have been given using an `unordered_set`. We don't remove it when a resource is returned. When the pool is destroyed, we merge the pool (a `vector`) and the set. In this way, we can make sure that the set contains all resources allocated from the device. We just need to iterate the set and free the resource accordingly. For now, only event pool is set to use it. Stream pool is not because we can make sure all streams are returned when the plugin is destroyed. Someone might be wondering, why don't we release all events hold in the data map. That is because, plugins are determined to be destroyed before `libomptarget`. If we can somehow make the plugin outlast `libomptarget`, life will be much easier. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D122014	2022-03-25 22:49:32 -04:00

1 2 3 4 5 ...

306 Commits