llvm-project

Commit Graph

Author	SHA1	Message	Date
Jon Chesterfield	c2574e63ff	[openmp][nfc] Refactor GridValues Remove redundant fields and replace pointer with virtual function Of fourteen fields, three are dead and four can be computed from the remainder. This leaves a couple of currently dead fields in place as they are expected to be used from the deviceRTL shortly. Two of the fields that can be computed are only used from codegen and require a log2() implementation so are inlined into codegen instead. This change leaves the new methods in the same location in the struct as the previous fields for convenience at review. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108380	2021-08-23 16:19:11 +01:00
Jon Chesterfield	b1efeface7	Revert "[openmp][nfc] Refactor GridValues" Failed a nvptx codegen test This reverts commit `2a47a84b40`.	2021-08-20 18:17:27 +01:00
Jon Chesterfield	2a47a84b40	[openmp][nfc] Refactor GridValues Remove redundant fields and replace pointer with virtual function Of fourteen fields, three are dead and four can be computed from the remainder. This leaves a couple of currently dead fields in place as they are expected to be used from the deviceRTL shortly. Two of the fields that can be computed are only used from codegen and require a log2() implementation so are inlined into codegen instead. This change leaves the new methods in the same location in the struct as the previous fields for convenience at review. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108380	2021-08-20 16:41:26 +01:00
Jon Chesterfield	77579b99e9	[openmp][nfc] Replace OMPGridValues array with struct [nfc] Replaces enum indices into an array with a struct. Named the fields to match the enum, leaves memory layout and initialization unchanged. Motivation is to later safely remove dead fields and replace redundant ones with (compile time) computation. It should also be possible to factor some common fields into a base and introduce a gfx10 amdgpu instance with less duplication than the arrays of integers require. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D108339	2021-08-19 13:25:42 +01:00
Joseph Huber	754eb1c210	[OpenMP] Change `__kmpc_free_shared` to include the paired allocation size This patch changes `__kmpc_free_shared` to take an additional argument corresponding to the associated allocation's size. This makes it easier to implement the allocator in the runtime. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106496	2021-07-21 20:56:21 -04:00
Giorgis Georgakoudis	fb0cf01795	Revert "[OpenMP] Codegen aggregate for outlined function captures" This reverts commit `e9c7291cb2`. Fix failing tests	2021-07-19 07:54:26 -07:00
Nikita Popov	2c68ecccc9	[OpaquePtr] Remove uses of CreateGEP() without element type Remove uses of to-be-deprecated API. In cases where the correct element type was not immediately obvious to me, fall back to explicit getPointerElementType().	2021-07-17 22:56:27 +02:00
Giorgis Georgakoudis	e9c7291cb2	[OpenMP] Codegen aggregate for outlined function captures Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102107	2021-07-16 23:27:44 -07:00
Johannes Doerfert	e2cfbfcc0c	[OpenMP] Unified entry point for SPMD & generic kernels in the device RTL In the spirit of TRegions [0], this patch provides a simpler and uniform interface for a kernel to set up the device runtime. The OMPIRBuilder is used for reuse in Flang. A custom state machine will be generated in the follow up patch. The "surplus" threads of the "master warp" will not exit early anymore so we need to use non-aligned barriers. The new runtime will not have an extra warp but also require these non-aligned barriers. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 This was in parts extracted from D59319. Reviewed By: ABataev, JonChesterfield Differential Revision: https://reviews.llvm.org/D101976	2021-07-10 17:53:56 -05:00
Nico Weber	d3e7491333	Revert Attributor patch series Broke check-clang, see https://reviews.llvm.org/D102307#2869065 Ran `git revert -n ebbe149a6f08535ede848a531a601ae6591cfbc5..269416d41908bb670f67af689155d5ab8eea689a`	2021-07-10 16:15:55 -04:00
Johannes Doerfert	1d5711c3ee	[OpenMP] Unified entry point for SPMD & generic kernels in the device RTL In the spirit of TRegions [0], this patch provides a simpler and uniform interface for a kernel to set up the device runtime. The OMPIRBuilder is used for reuse in Flang. A custom state machine will be generated in the follow up patch. The "surplus" threads of the "master warp" will not exit early anymore so we need to use non-aligned barriers. The new runtime will not have an extra warp but also require these non-aligned barriers. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 This was in parts extracted from D59319. Reviewed By: ABataev, JonChesterfield Differential Revision: https://reviews.llvm.org/D101976	2021-07-10 12:32:50 -05:00
Alexey Bataev	b3c80dd894	[OPENMP]Remove const firstprivate allocation as a variable in a constant space. Current implementation is not compatible with asynchronous target regions, need to remove it. Differential Revision: https://reviews.llvm.org/D105375	2021-07-07 05:56:48 -07:00
Aakanksha Patil	3453f3dd46	[AMDGPU] Add gfx1035 target Differential Revision: https://reviews.llvm.org/D104804	2021-06-24 14:32:41 -04:00
Joseph Huber	68d133a3e8	[OpenMP] Simplify GPU memory globalization Summary: Memory globalization is required to maintain OpenMP standard semantics for data sharing between worker and master threads. The GPU cannot share data between its threads so must allocate global or shared memory to store the data in. Currently this is implemented fully in the frontend using the `__kmpc_data_sharing_push_stack` and __kmpc_data_sharing_pop_stack` functions to emulate standard CPU stack sharing. The front-end scans the target region for variables that escape the region and must be shared between the threads. Each variable then has a field created for it in a global record type. This patch replaces this functinality with a single allocation command, effectively mimicing an alloca instruction for the variables that must be shared between the threads. This will be much slower than the current solution, but makes it much easier to optimize as we can analyze each variable independently and determine if it is not captured. In the future, we can replace these calls with an `alloca` and small allocations can be pushed to shared memory. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D97680	2021-06-22 10:52:46 -04:00
Brendon Cahoon	294efbbd3e	Reland "[AMDGPU] Add gfx1013 target" This reverts commit `211e584fa2`. Fixed a use-after-free error that caused the sanitizers to fail.	2021-06-08 21:15:35 -04:00
Brendon Cahoon	211e584fa2	Revert "[AMDGPU] Add gfx1013 target" This reverts commit `ea10a86984`. A sanitizer buildbot reports an error.	2021-06-08 16:29:41 -04:00
Brendon Cahoon	ea10a86984	[AMDGPU] Add gfx1013 target Differential Revision: https://reviews.llvm.org/D103663	2021-06-08 12:49:49 -04:00
Michael Kruse	83ff0ff463	[Clang][OpenMP] Allow unified_shared_memory for Pascal-generation GPUs. The Pascal architecture supports the page migration engine required for unified_shared_memory, as indicated by NVIDIA: * https://developer.nvidia.com/blog/unified-memory-cuda-beginners/ * https://developer.nvidia.com/blog/beyond-gpu-memory-limits-unified-memory-pascal/ * https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements The limitation was introduced in D54493 which justified the cut-off by the requirement for unified addressing. However, Unified Virtual Addressing (UVA) is already available with sm20 (Fermi, Kepler, Maxwell): * https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#basics-of-uva-cuda-memory-management Unified shared memory might even be possible with these, but with migration of entire allocations on kernel startup. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D101595	2021-05-13 17:15:34 -05:00
Aakanksha Patil	464e4dc50f	[AMDGPU] Add gfx1034 target Differential Revision: https://reviews.llvm.org/D102306	2021-05-13 14:25:18 -04:00
Giorgis Georgakoudis	a2dbfb6b72	[OpenMP] Simplify offloading parallel call codegen This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target, host-side parallel regions), data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions. Reviewed By: jdoerfert, Meinersbur Differential Revision: https://reviews.llvm.org/D95976	2021-04-21 18:46:07 -07:00
Johannes Doerfert	03cc8a1ba0	[OpenMP][NFC] Move the `noinline` to the parallel entry point The `noinline` for non-SPMD parallel functions is probably not necessary but as long as we use it we should put it on the outermost parallel function, which is the wrapper, not the actual outlined function. Resolves PR49752 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D99506	2021-03-30 01:12:45 -05:00
Nikita Popov	42eb658f65	[OpaquePtrs] Remove some uses of type-less CreateGEP() (NFC) This removes some (but not all) uses of type-less CreateGEP() and CreateInBoundsGEP() APIs, which are incompatible with opaque pointers. There are a still a number of tricky uses left, as well as many more variation APIs for CreateGEP.	2021-03-12 21:01:16 +01:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
Artem Belevich	2aa01ccec3	[CUDA, NVPTX] Allow targeting sm_86 GPUs. The patch only plumbs through the option necessary for targeting sm_86 GPUs w/o adding any new functionality. Differential Revision: https://reviews.llvm.org/D95974	2021-02-09 11:01:10 -08:00
Johannes Doerfert	bd756286d2	[OpenMP][FIX] Enforce a function boundary for a new data environment Whenever we enter a new OpenMP data environment we want to enter a function to simplify reasoning. Later we probably want to remove the entire specialization wrt. the if clause and pass the result to the runtime, for now this should fix PR48686. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D94315	2021-01-25 22:43:37 -06:00
Thorsten Schütt	2fd11e0b1e	Revert "[NFC, Refactor] Modernize StorageClass from Specifiers.h to a scoped enum (II)" This reverts commit `efc82c4ad2`.	2021-01-04 23:17:45 +01:00
Thorsten Schütt	efc82c4ad2	[NFC, Refactor] Modernize StorageClass from Specifiers.h to a scoped enum (II) Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D93765	2021-01-04 22:58:26 +01:00
Tim Renouf	89d41f3a2b	[AMDGPU] Add gfx1033 target Differential Revision: https://reviews.llvm.org/D90447 Change-Id: If2650fc7f31bbdd49c76e74a9ca8e3734d769761	2020-11-03 16:27:48 +00:00
Tim Renouf	ee3e642627	[AMDGPU] Add gfx90c target This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were previously included in gfx909. Differential Revision: https://reviews.llvm.org/D90419 Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d	2020-11-03 16:27:43 +00:00
JonChesterfield	5d02ca49a2	[libomptarget][nvptx] Undef, weak shared variables [libomptarget][nvptx] Undef, weak shared variables Shared variables on nvptx, and LDS on amdgcn, are uninitialized at the start of kernel execution. Therefore create the variables with undef instead of zeros, motivated in part by the amdgcn back end rejecting LDS+initializer. Common is zero initialized, which seems incompatible with shared. Thus change them to weak, following the direction of https://reviews.llvm.org/rG7b3eabdcd215 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D90248	2020-10-28 14:25:36 +00:00
Stanislav Mekhanoshin	d1beb95d12	[AMDGPU] gfx1032 target Differential Revision: https://reviews.llvm.org/D89487	2020-10-15 12:41:18 -07:00
Tim Renouf	666ef0db20	[AMDGPU] Add gfx602, gfx705, gfx805 targets At AMD, in an internal audit of our code, we found some corner cases where we were not quite differentiating targets enough for some old hardware. This commit is part of fixing that by adding three new targets: * The "Oland" and "Hainan" variants of gfx601 are now split out into gfx602. LLPC (in the GPUOpen driver) and other front-ends could use that to avoid using the shaderZExport workaround on gfx602. * One variant of gfx703 is now split out into gfx705. LLPC and other front-ends could use that to avoid using the shaderSpiCsRegAllocFragmentation workaround on gfx705. * The "TongaPro" variant of gfx802 is now split out into gfx805. TongaPro has a faster 64-bit shift than its former friends in gfx802, and a subtarget feature could be set up for that to take advantage of it. This commit does not make that change; it just adds the target. V2: Add clang changes. Put TargetParser list in order. V3: AMDGCNGPUs table in TargetParser.cpp needs to be in GPUKind order, so fix the GPUKind order. Differential Revision: https://reviews.llvm.org/D88916 Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d	2020-10-10 17:22:22 +01:00
Joseph Huber	3cc1f1fc1d	[OpenMP] Replace OpenMP RTL Functions With OMPIRBuilder and OMPKinds.def Summary: Replace the OpenMP Runtime Library functions used in CGOpenMPRuntimeGPU for OpenMP device code generation with ones in OMPKinds.def and use OMPIRBuilder for generating runtime calls. This allows us to consolidate more OpenMP code generation into the OMPIRBuilder. Future additions to the GPU runtime functions should now go in OMPKinds.def Reviewers: jdoerfert Subscribers: aaron.ballman cfe-commits guansong llvm-commits sstefan1 yaxunl Tags: #OpenMP #LLVM #clang Differential Revision: https://reviews.llvm.org/D88430	2020-10-08 14:00:22 -04:00
Pushpinder Singh	3a12ff0dac	[OpenMP][RTL] Remove dead code RequiresDataSharing was always 0, resulting dead code in device runtime library. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D88829	2020-10-06 05:43:47 -04:00
Yaxun (Sam) Liu	cbd420c5ed	[CUDA][HIP] Fix bound arch for offload action for fat binary Currently CUDA/HIP toolchain uses "unknown" as bound arch for offload action for fat binary. This causes -mcpu or -march with "unknown" added in HIPToolChain::TranslateArgs or CUDAToolChain::TranslateArgs. This causes issue for https://reviews.llvm.org/D88377 since HIP toolchain needs to check -mcpu in HIPToolChain::TranslateArgs. The bound arch of offload action for fat binary is not really used, therefore set it to CudaArch::UNUSED. Differential Revision: https://reviews.llvm.org/D88524	2020-10-02 19:05:51 -04:00
Joseph Huber	1b60f63e4f	Revert "[OpenMP] Replace OpenMP RTL Functions With OMPIRBuilder and OMPKinds.def" Failing tests on Arm due to the tests automatically populating incomatible pointer width architectures. Reverting until the tests are updated. Failing tests: OpenMP/distribute_parallel_for_num_threads_codegen.cpp OpenMP/distribute_parallel_for_if_codegen.cpp OpenMP/distribute_parallel_for_simd_if_codegen.cpp OpenMP/distribute_parallel_for_simd_num_threads_codegen.cpp OpenMP/target_teams_distribute_parallel_for_if_codegen.cpp OpenMP/target_teams_distribute_parallel_for_simd_if_codegen.cpp OpenMP/teams_distribute_parallel_for_if_codegen.cpp OpenMP/teams_distribute_parallel_for_simd_if_codegen.cpp This reverts commit `90eaedda9b`.	2020-09-30 15:12:21 -04:00
Joseph Huber	90eaedda9b	[OpenMP] Replace OpenMP RTL Functions With OMPIRBuilder and OMPKinds.def Summary: Replace the OpenMP Runtime Library functions used in CGOpenMPRuntimeGPU for OpenMP device code generation with ones in OMPKinds.def and use OMPIRBuilder for generating runtime calls. This allows us to consolidate more OpenMP code generation into the OMPIRBuilder. This patch also invalidates specifying target architectures with conflicting pointer sizes. Reviewers: jdoerfert Subscribers: aaron.ballman cfe-commits guansong llvm-commits sstefan1 yaxunl Tags: #OpenMP #Clang #LLVM Differential Revision: https://reviews.llvm.org/D88430	2020-09-30 14:00:01 -04:00
Johannes Doerfert	95a25e4c32	[OpenMP][FIX] Do not use TBAA in type punning reduction GPU code PR46156 When we implement OpenMP GPU reductions we use type punning a lot during the shuffle and reduce operations. This is not always compatible with language rules on aliasing. So far we generated TBAA which later allowed to remove some of the reduce code as accesses and initialization were "known to not alias". With this patch we avoid TBAA in this step, hopefully for all accesses that we need to. Verified on the reproducer of PR46156 and QMCPack. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D86037	2020-08-16 14:38:31 -05:00
Craig Topper	5c1fe4e20f	[Target] Cache the command line derived feature map in TargetOptions. We can use this to remove some calls to initFeatureMap from Sema and CodeGen when a function doesn't have a target attribute. This reduces compile time of the linux kernel where this map is needed to diagnose some inline assembly constraints based on whether sse, avx, or avx512 is enabled. Differential Revision: https://reviews.llvm.org/D85807	2020-08-12 12:37:23 -07:00
Stanislav Mekhanoshin	105608a4c2	[AMDGPU] Added missing gfx1031 cases to CGOpenMPRuntimeGPU.cpp	2020-08-05 12:39:03 -07:00
Saiyedul Islam	160ff83765	[OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 3 Provides AMDGCN and NVPTX specific specialization of getGPUWarpSize, getGPUThreadID, and getGPUNumThreads methods. Adds tests for AMDGCN codegen for these methods in generic and simd modes. Also changes the precondition in InitTempAlloca to be slightly more permissive. Useful for AMDGCN OpenMP codegen where allocas are created with a cast to an address space. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D84260	2020-08-03 05:38:39 +00:00
Saiyedul Islam	fc7d2908ab	[OpenMP] Use common interface to access GPU Grid Values Use common interface for accessing target specific GPU grid values in NVPTX OpenMP codegen as proposed in https://reviews.llvm.org/D80917 Originally authored by Greg Rodgers (@gregrodgers). Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D83492	2020-07-21 05:25:46 +00:00
Saiyedul Islam	c7562e77b3	[OpenMP][NFC] Generalize CGOpenMPRuntimeNVPTX as CGOpenMPRuntimeGPU Refactors CGOpenMPRuntimeNVPTX as CGOpenMPRuntimeGPU to make it a generalization for OpenMP GPU Codegen. Target specific specialized methods for NVPTX are defined in class CGOpenMPRuntimeNVPTX. This paves the way for a clean and maintainable extension to more GPU targets for OpenMP Codegen. For original author (git blame) list of CGOpenMPRuntimeGPU code, look in history of CGOpenMPRuntimeNVPTX.cpp and .h, after this commit. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D83723	2020-07-17 14:38:04 +00:00

1 2

93 Commits