The device runtime contains several calls to __kmpc_get_hardware_num_threads_in_block
and __kmpc_get_hardware_num_blocks. If the thread_limit and the num_teams are constant,
these calls can be folded to the constant value.
Commit D106033 provides the optimization phase; this commit adds the grid-size attributes to
the outlined function. The two attributes are `omp_target_num_teams` and
`omp_target_thread_limit`, and they are added as long as the corresponding values are constant.
Two new functions are created, `getNumThreadsExprForTargetDirective` and
`getNumTeamsExprForTargetDirective`. The original functions, `emitNumTeamsForTargetDirective`
and `emitNumThreadsForTargetDirective`, both identify the expression and emit the code.
However, for the device version of the outlined function we cannot emit anything, so this
is a first attempt at separating the emission of code from the deduction of the values.
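As a rough illustration (the function and variable names below are invented for the example), this is the kind of target region where both clause arguments are compile-time constants, so the outlined kernel can carry `omp_target_num_teams` and `omp_target_thread_limit`:

```c++
// Hypothetical example: num_teams and thread_limit are integer constants,
// so the outlined target function can be annotated with
// omp_target_num_teams=4 and omp_target_thread_limit=64, allowing the
// device runtime queries to be folded.
void saxpy(float *x, float *y, float a, int n) {
  #pragma omp target teams distribute parallel for num_teams(4) \
      thread_limit(64) map(to: x[0:n]) map(tofrom: y[0:n])
  for (int i = 0; i < n; ++i)
    y[i] = a * x[i] + y[i];
}
```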
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106298
Parallel regions are outlined as functions, with captured variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic, since there is a variable number of arguments to forward to the outlined function; (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) on the number of arguments; (3) forwarded arguments must be cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating the captured arguments in a struct that is passed to the fork_call.
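A minimal sketch of the aggregation, with hypothetical struct and function names; it only shows how the captured values can travel through a single pointer instead of a variadic argument list:

```c++
// Illustrative sketch only: names and layout are made up; the point is that
// the captured values are packed into one struct whose address is the single
// argument forwarded to the outlined function.
struct CapturedArgs {
  int *a;
  float scale;
  long n;
};

// Shape of the outlined parallel body after aggregation (simplified).
static void outlined(int * /*global_tid*/, int * /*bound_tid*/,
                     CapturedArgs *args) {
  for (long i = 0; i < args->n; ++i)
    args->a[i] = static_cast<int>(args->a[i] * args->scale);
}

void run(int *a, float scale, long n) {
  CapturedArgs args{a, scale, n};
  // Conceptually the fork becomes a fixed-arity call, e.g.
  //   __kmpc_fork_call(loc, /*nargs=*/1, outlined, &args);
  outlined(nullptr, nullptr, &args); // stand-in for the runtime dispatch
}
```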
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102107
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow-up patch.
The "surplus" threads of the "master warp" will not exit early anymore, so we need to use
non-aligned barriers. The new runtime will not have an extra warp but will also require
these non-aligned barriers.
[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11
This was in parts extracted from D59319.
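For orientation, a schematic sketch of the kernel skeleton such a uniform interface enables; the stub names, signatures, and return-value convention below are placeholders, not the actual device runtime API:

```c++
// Illustrative stand-ins for the kernel set-up/tear-down entry points.
static int device_runtime_init_stub() { return -1; }  // -1: run the user code
static void device_runtime_deinit_stub() {}

static void kernel_body() { /* the outlined target region */ }

// What the OMPIRBuilder conceptually emits for each kernel entry point.
void offloading_kernel_entry() {
  int exec = device_runtime_init_stub();
  if (exec != -1)   // surplus worker threads are serviced elsewhere and exit
    return;
  kernel_body();    // only the main thread runs the target region
  device_runtime_deinit_stub();
}
```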
Reviewed By: ABataev, JonChesterfield
Differential Revision: https://reviews.llvm.org/D101976
Summary:
Memory globalization is required to maintain OpenMP standard semantics for data sharing between
worker and master threads. The GPU cannot share data between its threads, so it must allocate global or
shared memory to store the data in. Currently, this is implemented entirely in the frontend using the
`__kmpc_data_sharing_push_stack` and `__kmpc_data_sharing_pop_stack` functions to emulate standard
CPU stack sharing. The frontend scans the target region for variables that escape the region and
must be shared between the threads. Each such variable then has a field created for it in a global
record type.
This patch replaces this functionality with a single allocation call, effectively mimicking an
alloca instruction for the variables that must be shared between the threads. This will be much
slower than the current solution, but it makes the code much easier to optimize because we can
analyze each variable independently and determine whether it is captured. In the future, we can
replace these calls with an `alloca`, and small allocations can be pushed to shared memory.
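A hypothetical source pattern that needs this treatment, shown for illustration only; the runtime call names are not spelled out in this message, so the lowering is only described in comments:

```c++
// `x` is declared by the main thread but escapes into the parallel region,
// so it cannot live in a private (thread-local) stack slot on the GPU.
void escaping_local() {
  #pragma omp target teams
  {
    int x = 0;           // lowered to a single per-variable allocation call
                         // (conceptually an alloca in shared/global memory)
                         // instead of a field in a global record type
    #pragma omp parallel
    {
      #pragma omp atomic
      x += 1;            // all threads of the team see the same x
    }
    // the matching deallocation call is emitted at the end of the scope
  }
}
```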
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D97680
This patch refactors a subset of Clang OpenMP tests, generating checklines using the update_cc_test_checks script. This refactoring facilitates updating the Clang OpenMP code generation codebase by automating test generation.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D101849
This patch renames replace-function-regex to replace-value-regex to indicate that the existing regex replacement functionality can replace any IR value, not just functions.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D101934
This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading, with corresponding changes in libomptarget: SPMD and non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will later be commonized between target and host-side parallel regions), and data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. The revision also contains changes to OpenMPOpt for remark creation on target offloading regions.
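For illustration, these are the two source-level shapes that are now funneled through the single entry point; the example names are made up, and the emitted IR and the entry point's exact signature are not shown:

```c++
// Source-level sketch only; both parallel regions below are outlined and
// dispatched through the unified parallel runtime call.
void unified_parallel(int *out, int n) {
  // SPMD-friendly shape: the parallel region spans the whole target region.
  #pragma omp target teams distribute parallel for map(tofrom: out[0:n])
  for (int i = 0; i < n; ++i)
    out[i] = i;

  // Generic-mode shape: sequential code precedes the nested parallel region.
  #pragma omp target map(tofrom: out[0:n])
  {
    out[0] = 42;                 // runs on the main device thread
    #pragma omp parallel for
    for (int i = 1; i < n; ++i)
      out[i] = out[0] + i;
  }
}
```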
Reviewed By: jdoerfert, Meinersbur
Differential Revision: https://reviews.llvm.org/D95976
Many OpenMP Clang tests do not RUN for version 4.5 and the default
version. This second patch in the series handles test cases that
require updates to their CHECK lines in addition to new RUN lines for
the default version. It also involves updating the line numbers of pragmas.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D85150
Internally generated functions must be marked as always_inline in most
cases. This patch marks some extra reduction functions and outlined parallel
functions as always_inline for better performance, but only if the
optimization is requested.
llvm-svn: 361269
After the previous patch, which handles the number of threads in parallel
regions more correctly, parallel regions with num_threads clauses can be
executed in SPMD mode.
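An illustrative example of the pattern this enables (the function name and the thread count are arbitrary):

```c++
// After this change, the combined construct below can still be executed in
// SPMD mode even though the parallel region carries a num_threads clause.
void spmd_with_num_threads(int *a, int n) {
  #pragma omp target teams map(tofrom: a[0:n])
  #pragma omp parallel for num_threads(64)
  for (int i = 0; i < n; ++i)
    a[i] *= 2;
}
```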
llvm-svn: 358445
In generic data-sharing mode we do not need to globalize variables and
parameters of reference or pointer type; they are already placed
in global memory.
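A hedged illustration of the case this covers; the function and variable names are invented for the example:

```c++
// The pointer parameter `p` and the reference `r` already refer to storage
// in global memory, so no globalization record is needed for them; only
// by-value locals that escape into the parallel region still need it.
void no_globalization_needed(int *p, int &r) {
  #pragma omp target teams
  {
    #pragma omp parallel
    {
      #pragma omp atomic
      p[0] += 1;   // uses the pointer directly, no data-sharing record
      #pragma omp atomic
      r += 1;
    }
  }
}
```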
llvm-svn: 332380
Differential Revision: https://reviews.llvm.org/D43852
This patch extends the SPMD implementation to all target constructs and guards this implementation under a new flag.
llvm-svn: 326368
This patch adds support for the SPMD construct 'target parallel' on the
NVPTX device. This involves ignoring the num_threads clause on the device
since the number of threads in this combined construct is already set on
the host through the call to __tgt_target_teams().
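A sketch of the combined construct this change targets (illustrative only; the clause value is arbitrary):

```c++
// The thread count for 'target parallel' is decided on the host via the
// __tgt_target_teams() call mentioned above, so the device-side codegen can
// ignore the num_threads clause here.
void target_parallel_spmd(int *a, int n) {
  #pragma omp target parallel num_threads(128) map(tofrom: a[0:n])
  {
    #pragma omp for
    for (int i = 0; i < n; ++i)
      a[i] += 1;
  }
}
```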
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29083
llvm-svn: 292999