llvm-project

Commit Graph

Author	SHA1	Message	Date
Dhruva Chakrabarti	839ac62c50	Revert "[OpenMP] Codegen aggregate for outlined function captures" This reverts commit `7539e9cf81`.	2022-09-15 03:08:46 +00:00
Giorgis Georgakoudis	7539e9cf81	[OpenMP] Codegen aggregate for outlined function captures Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call. Reviewed By: jdoerfert, jhuber6, ABataev Differential Revision: https://reviews.llvm.org/D102107	2022-09-15 00:54:05 +00:00
Johannes Doerfert	178674e23a	[OpenMP][NFC] Remove unused check lines in Clang/OpenMP tests The check lines are not referenced in RUN lines, hence useless. Differential Revision: https://reviews.llvm.org/D128685	2022-06-28 17:18:13 -05:00
Nikita Popov	532dc62b90	[OpaquePtrs][Clang] Add -no-opaque-pointers to tests (NFC) This adds -no-opaque-pointers to clang tests whose output will change when opaque pointers are enabled by default. This is intended to be part of the migration approach described in https://discourse.llvm.org/t/enabling-opaque-pointers-by-default/61322/9. The patch has been produced by replacing %clang_cc1 with %clang_cc1 -no-opaque-pointers for tests that fail with opaque pointers enabled. Worth noting that this doesn't cover all tests, there's a remaining ~40 tests not using %clang_cc1 that will need a followup change. Differential Revision: https://reviews.llvm.org/D123115	2022-04-07 12:09:47 +02:00
Nikita Popov	7a2e12e0a7	[CodeGen][OpenMP] Use correct type in EmitLoadOfPointer() The EmitLoadOfPointer() call already specified the right pointer type, but it did not match the Address we're loading from, so we need to insert a bitcast first.	2022-03-21 15:22:37 +01:00
Joseph Huber	53d5757ea2	[OpenMP] Add kernel string attribute to kernel function This patch adds a function attribute to the kernel function generated in OpenMP offloading. We already create a `nvvm.annotations` metadata node indicating the kernels present in the program. However, this created some indirection when trying to identify if a specific function was an entry. We add a single function attribute for each function now to simplify this. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D118708	2022-02-01 13:49:31 -05:00
hyeongyu kim	1b1c8d83d3	[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions. I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default. Test updates are made as a separate patch: D108453 Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D105169	2022-01-16 18:54:17 +09:00
Joseph Huber	7cdaa5a94e	[OpenMP][FIX] Change globalization alignment to 16 This patch changes the default aligntment from 8 to 16, and encodes this information in the `__kmpc_alloc_shared` runtime call to communicate it to the HeapToStack pass. The previous alignment of 8 was not sufficient for the maximum size of primitive types on 64-bit systems, and needs to be increaesd. This reduces the amount of space availible in the data sharing stack, so this implementation will need to be improved later to include the alignment requirements in the allocation call, and use it properly in the data sharing stack in the runtime. Depends on D115888 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D115971	2021-12-27 16:58:25 -05:00
Nikita Popov	1201a0f395	[OpenMP] Fix incorrect type when casting from uintptr MakeNaturalAlignAddrLValue() expects the pointee type, but the pointer type was passed. As a result, the natural alignment of the pointer (usually 8) was always used in place of the natural alignment of the value type. Differential Revision: https://reviews.llvm.org/D116171	2021-12-23 08:57:11 +01:00
Atmn Patel	737c4a2673	[clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are just code bloat. By removing them, the codebase gets a bit cleaner. Reviewed By: jdoerfert, JonChesterfield, tianshilei1992 Differential Revision: https://reviews.llvm.org/D113421	2021-11-09 15:11:05 -05:00
Atmn Patel	ef717f3852	Revert "[clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files" This reverts commit `81a7cad2ff`.	2021-11-09 02:10:42 -05:00
Atmn Patel	81a7cad2ff	[clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are just code bloat. By removing them, the codebase gets a bit cleaner. Reviewed By: jdoerfert, JonChesterfield, tianshilei1992 Differential Revision: https://reviews.llvm.org/D113421	2021-11-09 01:52:52 -05:00
hyeongyu kim	fd9b099906	Revert "[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default" This reverts commit `aacfbb953e`. Revert "Fix lit test failures in CodeGenCoroutines" This reverts commit `63fff0f5bf`.	2021-11-09 02:15:55 +09:00
hyeongyukim	aacfbb953e	[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions. I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default. Test updates are made as a separate patch: D108453 Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D105169 [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2) This patch updates test files after D105169. Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows: (1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached. (2) The remaining tests are updated manually. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D108453 Resolve lit failures in clang after 8ca4b3e's land Fix lit test failures in clang-ppc* and clang-x64-windows-msvc Fix missing failures in clang-ppc64be* and retry fixing clang-x64-windows-msvc Fix internal_clone(aarch64) inline assembly	2021-11-06 19:19:22 +09:00
Juneyoung Lee	89ad2822af	Revert "[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default" This reverts commit `7584ef766a`.	2021-11-06 15:39:19 +09:00
Juneyoung Lee	7584ef766a	[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions. I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default. Test updates are made as a separate patch: D108453 Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D105169	2021-11-06 15:36:42 +09:00
Juneyoung Lee	f193bcc701	Revert D105169 due to the two-stage failure in ASAN This reverts the following commits: `37ca7a795b` `9aa6c72b92` `705387c507` `8ca4b3ef19` `80dba72a66`	2021-10-18 23:52:46 +09:00
Juneyoung Lee	8ca4b3ef19	[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2) This patch updates test files after D105169. Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows: (1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached. (2) The remaining tests are updated manually. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D108453	2021-10-16 12:01:41 +09:00
hsmahesha	f7de6962c8	[CFE][Codegen][In-progress] Remove CodeGenFunction::InitTempAlloca() Sequel patch to https://reviews.llvm.org/D111293. Remove call to CodeGenFunction::InitTempAlloca() from OpenMP related codegen part. Also remove the metadata `!llvm.access.group` from the updated lit tests. Reviewed By: rjmccall Differential Revision: https://reviews.llvm.org/D111316	2021-10-12 10:01:46 +05:30
Shilei Tian	423d34f74a	[OpenMP][Offloading] Change `bool IsSPMD` to `int8_t Mode` in `__kmpc_target_init` and `__kmpc_target_deinit` This is a follow-up of D110029, which uses bitset to indicate execution mode. This patches makes the changes in the function call. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110279	2021-09-22 17:16:41 -04:00
Giorgis Georgakoudis	ac90dfc43a	Revert "[OpenMP] Codegen aggregate for outlined function captures" This reverts commit `1d66649adf`. Revert to fix AMG GPU issue.	2021-09-21 13:20:39 -07:00
Giorgis Georgakoudis	1d66649adf	[OpenMP] Codegen aggregate for outlined function captures Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call. Reviewed By: jdoerfert, jhuber6 Differential Revision: https://reviews.llvm.org/D102107	2021-09-21 10:50:04 -07:00
Joseph Huber	754eb1c210	[OpenMP] Change `__kmpc_free_shared` to include the paired allocation size This patch changes `__kmpc_free_shared` to take an additional argument corresponding to the associated allocation's size. This makes it easier to implement the allocator in the runtime. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106496	2021-07-21 20:56:21 -04:00
Giorgis Georgakoudis	fb0cf01795	Revert "[OpenMP] Codegen aggregate for outlined function captures" This reverts commit `e9c7291cb2`. Fix failing tests	2021-07-19 07:54:26 -07:00
Giorgis Georgakoudis	e9c7291cb2	[OpenMP] Codegen aggregate for outlined function captures Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102107	2021-07-16 23:27:44 -07:00
Johannes Doerfert	e2cfbfcc0c	[OpenMP] Unified entry point for SPMD & generic kernels in the device RTL In the spirit of TRegions [0], this patch provides a simpler and uniform interface for a kernel to set up the device runtime. The OMPIRBuilder is used for reuse in Flang. A custom state machine will be generated in the follow up patch. The "surplus" threads of the "master warp" will not exit early anymore so we need to use non-aligned barriers. The new runtime will not have an extra warp but also require these non-aligned barriers. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 This was in parts extracted from D59319. Reviewed By: ABataev, JonChesterfield Differential Revision: https://reviews.llvm.org/D101976	2021-07-10 17:53:56 -05:00
Joseph Huber	68d133a3e8	[OpenMP] Simplify GPU memory globalization Summary: Memory globalization is required to maintain OpenMP standard semantics for data sharing between worker and master threads. The GPU cannot share data between its threads so must allocate global or shared memory to store the data in. Currently this is implemented fully in the frontend using the `__kmpc_data_sharing_push_stack` and __kmpc_data_sharing_pop_stack` functions to emulate standard CPU stack sharing. The front-end scans the target region for variables that escape the region and must be shared between the threads. Each variable then has a field created for it in a global record type. This patch replaces this functinality with a single allocation command, effectively mimicing an alloca instruction for the variables that must be shared between the threads. This will be much slower than the current solution, but makes it much easier to optimize as we can analyze each variable independently and determine if it is not captured. In the future, we can replace these calls with an `alloca` and small allocations can be pushed to shared memory. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D97680	2021-06-22 10:52:46 -04:00
Giorgis Georgakoudis	207b08a913	[OpenMP][NFC] Refactor Clang OpenMP tests using update_cc_test_checks This patch refactors a subset of Clang OpenMP tests, generating checklines using the update_cc_test_checks script. This refactoring facilitates updating the Clang OpenMP code generation codebase by automating test generation. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D101849	2021-05-05 20:08:38 -07:00
Giorgis Georgakoudis	78a7d8c4dd	[Utils][NFC] Rename replace-function-regex in update_cc_test_checks This patch renames the replace-function-regex to replace-value-regex to indicate that the existing regex replacement functionality can replace any IR value besides functions. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D101934	2021-05-05 14:19:30 -07:00
Giorgis Georgakoudis	f016c06abb	Revert "[OpenMP][NFC] Refactor Clang OpenMP tests using update_cc_test_checks" This reverts commit `956cae2f09`.	2021-05-04 17:12:32 -07:00
Giorgis Georgakoudis	956cae2f09	[OpenMP][NFC] Refactor Clang OpenMP tests using update_cc_test_checks This patch refactors a subset of Clang OpenMP tests, generating checklines using the update_cc_test_checks script. This refactoring facilitates updating the Clang OpenMP code generation codebase by automating test generation. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D101849	2021-05-04 16:58:45 -07:00
Giorgis Georgakoudis	a2dbfb6b72	[OpenMP] Simplify offloading parallel call codegen This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target, host-side parallel regions), data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions. Reviewed By: jdoerfert, Meinersbur Differential Revision: https://reviews.llvm.org/D95976	2021-04-21 18:46:07 -07:00
JonChesterfield	5d02ca49a2	[libomptarget][nvptx] Undef, weak shared variables [libomptarget][nvptx] Undef, weak shared variables Shared variables on nvptx, and LDS on amdgcn, are uninitialized at the start of kernel execution. Therefore create the variables with undef instead of zeros, motivated in part by the amdgcn back end rejecting LDS+initializer. Common is zero initialized, which seems incompatible with shared. Thus change them to weak, following the direction of https://reviews.llvm.org/rG7b3eabdcd215 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D90248	2020-10-28 14:25:36 +00:00
Alexey Bataev	32ea3397be	[OPENMP]Dynamic globalization for parallel target regions. Summary: Added support for dynamic memory allocation for globalized variables in case if execution of target regions in parallel is required. Reviewers: jdoerfert Subscribers: jholewinski, yaxunl, guansong, sstefan1, cfe-commits, caomhin Tags: #clang Differential Revision: https://reviews.llvm.org/D82324	2020-06-25 08:25:24 -04:00
Eli Friedman	62f3ef2b53	[CGCall] Annotate references with "align" attribute. If we're going to assume references are dereferenceable, we should also assume they're aligned: otherwise, we can't actually dereference them. See also D80072. Differential Revision: https://reviews.llvm.org/D80166	2020-05-19 20:21:30 -07:00
Alexey Bataev	f89cf21337	[OPENMP]Use different addresses for zeroed thread_id/bound_id. When the parallel region is called directly in the sequential region, the zeroed tid/bound id are used. But they must point to the different memory locations as the parameters are marked as noalias. llvm-svn: 375017	2019-10-16 16:59:01 +00:00
Tim Northover	a009a60a91	IR: print value numbers for unnamed function arguments For consistency with normal instructions and clarity when reading IR, it's best to print the %0, %1, ... names of function arguments in definitions. Also modifies the parser to accept IR in that form for obvious reasons. llvm-svn: 367755	2019-08-03 14:28:34 +00:00
Alexey Bataev	8c5555c39a	[OPENMP][NVPTX]Mark more functions as always_inline for better performance. Internally generated functions must be marked as always_inlines in most cases. Patch marks some extra reduction function + outlined parallel functions as always_inline for better performance, but only if the optimization is requested. llvm-svn: 361269	2019-05-21 15:11:58 +00:00
Alexey Bataev	7b3eabdcd2	[OPENMP][NVPTX]Fix PR40893: Size doesn't match for '_openmp_teams_reductions_buffer_$_. nvlink does not handle weak linkage correctly, same symbols with the different sizes are reported as erroneous though the largest size must be chosen instead. Patch fixes this problem by using Internal linkage instead of the Common. llvm-svn: 356072	2019-03-13 18:21:10 +00:00
Alexey Bataev	8061acd501	[OPENMP][NVPTX]Use faster teams reduction algorithm. A faster way to reduce the values in teams reductions was found, the codegen is updated to use this faster algorithm and new runtime functions. llvm-svn: 354479	2019-02-20 16:36:22 +00:00
Alexey Bataev	25d3de8a0a	[OPENMP][NVPTX]Reduce number of barriers in reductions. After the fix for the syncthreads we don't need to generate extra barriers for the parallel reductions. llvm-svn: 350530	2019-01-07 15:45:09 +00:00
Alexey Bataev	8e009036c9	[OPENMP][NVPTX]Use new functions from the runtime library. Updated codegen to use the new functions from the runtime library. llvm-svn: 350415	2019-01-04 17:25:09 +00:00
Alexey Bataev	6a1b06bcd4	[OPENMP][NVPTX]Emit shared memory buffer for reduction as 128 bytes buffer. Seems to me, nvlink has a bug with the proper support of the weakly linked symbols. It does not allow to define several shared memory buffer with the different sizes even with the weak linkage. Instead we always use 128 bytes buffer to prevent nvlink from the error message emission. llvm-svn: 349540	2018-12-18 21:01:42 +00:00
Alexey Bataev	ae51b96f99	[OPENMP][NVPTX]Improved interwarp copy function. Inlined runtime with the current implementation of the interwarp copy function leads to the undefined behavior because of the not quite correct implementation of the barriers. Start using generic __kmpc_barier function instead of the custom made barriers. llvm-svn: 349192	2018-12-14 21:00:58 +00:00
Gheorghe-Teodor Bercea	2b40470c61	[OpenMP] Add a new version of the SPMD deinit kernel function Summary: This patch adds a new runtime for the SPMD deinit kernel function which replaces the previous function. The new function takes as argument the flag which signals whether the runtime is required or not. This enables the compiler to optimize out the part of the deinit function which are not needed. Reviewers: ABataev, caomhin Reviewed By: ABataev Subscribers: jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D54970 llvm-svn: 347915	2018-11-29 20:53:49 +00:00
Alexey Bataev	a116602475	[OPENMP][NVPTX]Basic support for reductions across the teams. Added basic codegen support for the reductions across the teams. llvm-svn: 347715	2018-11-27 21:24:54 +00:00
Alexey Bataev	f2f39be9ed	[OPENMP][NVPTX]Emit correct reduction code for teams/parallel reductions. Fixed previously committed code for the reduction support in teams/parallel constructs taking into account new design of the NVPTX support in the compiler. Teams reduction are not fully functional yet, it is going to be fixed in the following patches. llvm-svn: 347081	2018-11-16 19:38:21 +00:00
Alexey Bataev	9ea3c38597	[OPENMP][NVPTX] Support memory coalescing for globalized variables. Added support for memory coalescing for better performance for globalized variables. From now on all the globalized variables are represented as arrays of 32 elements and each thread accesses these elements using `tid & 31` as index. llvm-svn: 344049	2018-10-09 14:49:00 +00:00
Alexey Bataev	12c62908b5	[OPENMP, NVPTX] Fix reduction of the big data types/structures. If the shuffle is required for the reduced structures/big data type, current code may cause compiler crash because of the loading of the aggregate values. Patch fixes this problem. llvm-svn: 335377	2018-06-22 19:10:38 +00:00
Alexey Bataev	9ff8083d98	[OPENMP] General code improvements. llvm-svn: 330154	2018-04-16 20:16:21 +00:00

1 2

54 Commits