llvm-project

Commit Graph

Author	SHA1	Message	Date
serge-sans-paille	f1985a3f85	Cleanup includes: Transforms/IPO Preprocessor output diff: -238205 lines Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D122183	2022-03-22 10:06:28 +01:00
Johannes Doerfert	4166738c38	[OpenMP][FIX] Do not crash when kernels are debug wrapper functions With debug information enabled (-g) Clang will wrap the actual target region into a new function which is called from the "kernel". The problem is that the "kernel" is now basically a wrapper without all the things we expect. More importantly, if we end up asking for an AAKernelInfo for the "target region function" we might try to turn it into SPMD mode. That used to cause an assertion as that function doesn't have an appropriately named `_exec_mode` global. While the global is going away soon we still need to make sure to properly handle this case, e.g., perform optimizations reliably. Differential Revision: https://reviews.llvm.org/D122043	2022-03-19 14:15:55 -05:00
Johannes Doerfert	59a6b668ab	[OpenMP][FIX] Initialize member to avoid undefined value in debug output	2022-03-17 17:42:32 -05:00
Nikita Popov	875782bd9e	[OpenMPOpt] Avoid pointer element type access during region merging Hardcode the function type as ParallelTask, which is the guaranteed pointee type of this runtime function argument (if pointee types exist). The elimination of the callee bitcast is left for InstCombine. Differential Revision: https://reviews.llvm.org/D120885	2022-03-15 09:52:46 +01:00
Johannes Doerfert	5b4acb20ff	[OpenMP][FIX] Ensure flag to disable de-globalization works properly If the user disables de-globalization we did not seed the AAHeapToShared and AAHeapToStack but we still could end up with them through in-flight lookups. With this patch we disable AAHeapToShared completely if the user disabled de-globalization. Heap-2-stack is still run though. Differential Revision: https://reviews.llvm.org/D121059	2022-03-07 23:43:05 -06:00
Johannes Doerfert	192a34ddb0	[Attributor][OpenMPOpt][FIX] Register simplification callbacks Heap-2-stack and heap-2-shared can replace an allocation call with something else. To avoid us deriving information from the allocator implementation we register a simplification callback now that will force us to stop at the call site. We probably should create the replacement memory eagerly and return that instead though.	2022-03-06 21:28:38 -06:00
Johannes Doerfert	f9c2d6005e	[OpenMP][FIX] Ensure custom state machine works The custom state machine had a check for surplus threads that filtered the main thread if the kernel was executed by a single warp only. We now first check for the main thread, then for surplus threads, so the former is no longer filtered out. Fixes #54214. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D121011	2022-03-04 13:51:19 -05:00
Joseph Huber	6632180745	[OpenMP][NFC] Add an option to print the module before in OpenMPOpt Previously there was a debug flag to print the module after optimizations. Sometimes we wanted to print the module before optimizations so this is being split into two flags. `-openmp-opt-print-module` is now `-openmp-opt-print-module-after`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D120768	2022-03-01 17:09:09 -05:00
Joseph Huber	0136a4401f	[OpenMP] Add an option to limit shared memory usage in OpenMPOpt One of the optimizations performed in OpenMPOpt pushes globalized variables to static shared memory. This is preferable to keeping the runtime call in all cases, however if too many variables are pushed to shared memory the kernel will crash. Since this is an optimization and not something the user specified explicitly, there should be an option to limit this optimization in those cases. This patch introduces the `-openmp-opt-shared-limit=` option to limit the number of bytes that will be placed in shared memory from HeapToShared. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D120079	2022-02-18 08:35:26 -05:00
Joseph Huber	74cacf212b	[OpenMP] Add RTL function to externalization RAII This patch adds the `__kmpc_get_hardware_num_threads_in_block` OpenMP RTL function to the externalization RAII struct. This was getting optimized out and then being replaced with an undefined value once added back in, causing bugs for complex reductions. Fixes #53909. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D120076	2022-02-17 14:30:58 -05:00
Johannes Doerfert	ede248e614	[OpenMP][FIX] The `llvm.amdgcn.s.barrier` is actually not aligned If we assume `llvm.amdgcn.s.barrier` is aligned we may remove it and cause OpenMP GPU applications on the AMD GPU to be stuck or wrongly synchronized. Reported by Carlo Bertolli.	2022-02-11 12:42:50 -06:00
Kazu Hirata	3a3cb929ab	[llvm] Use = default (NFC)	2022-02-06 22:18:35 -08:00
Johannes Doerfert	3c8a4c6f47	[OpenMP] Eliminate redundant barriers in the same block Patch originally by Giorgis Georgakoudis (@ggeorgakoudis), typos and bugs introduced later by me. This patch allows us to remove redundant barriers if they are part of a "consecutive" pair of barriers in a basic block with no impacted memory effect (read or write) in-between them. Memory accesses to local (=thread private) or constant memory are allowed to appear. Technically we could also allow any other memory that is not used to share information between threads, e.g., the result of a malloc that is also not captured. However, it will be easier to do more reasoning once the code is put into an AA. That will also allow us to look through phis/selects reasonably. At that point we should also deal with calls, barriers in different blocks, and other complexities. Differential Revision: https://reviews.llvm.org/D118002	2022-02-01 01:07:50 -06:00
Johannes Doerfert	989674f110	[OpenMP] Ensure to remove noinline from all runtime functions eventually We used to remove noinline from known OpenMP runtime functions (which are declared in OMPKinds.td). Now we remove noinline from all functions with the proper prefixes: __kmpc, _ZN4_OMP (= namespace omp), omp_	2022-02-01 01:07:50 -06:00
Nikita Popov	9e7a2bfcf7	[OpenMPOpt] Add const qualifier (NFC) Make it clear that this large lambda does not modify the vector.	2022-01-26 10:35:57 +01:00
Giorgis Georgakoudis	7cb4c26173	[OMPIRBuilder] Generate aggregate argument for parallel region outlined functions Summary: This patch modifies code generation in OpenMPIRBuilder to pass arguments to the parallel region outlined function in an aggregate (struct), besides the global_tid and bound_tid arguments. It depends on the updated CodeExtractor (see D96854) for support. It mirrors functionality of Clang codegen (see D102107). Differential Revision: https://reviews.llvm.org/D110114	2022-01-25 20:53:45 -05:00
Joseph Huber	5eb49009eb	[OpenMP] Add more identifier to created shared globals Currently we push some variables to a global constant containing shared memory as an optimization. This generated constant had internal linkage and should not have collided with any known identifiers in the translation unit. However, there have been observed cases of this optimization unintentionally colliding with undocumented PTX identifiers. This patch adds a suffix to the created globals to hopefully bypass this. Depends on D118059 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D118068	2022-01-24 20:37:54 -05:00
Joseph Huber	06cfdd5224	[OpenMP][Fix] Properly inherit calling convention Previously in OpenMPOpt we did not correctly inherit the calling convention of the callee when creating new OpenMP runtime calls. This created issues when the calling convention was changed during `GlobalOpt` but a new call was created without the correct calling convention. This led to the call being replaced with a poison value in `InstCombine` due to undefined behaviour and causing large portions of the program to be incorrectly eliminated. This patch correctly inherits the existing calling convention from the callee. Reviewed By: tianshilei1992, jdoerfert Differential Revision: https://reviews.llvm.org/D118059	2022-01-24 20:37:52 -05:00
Johannes Doerfert	b4a7559844	[OpenMP][FIX] Replace ICVs only with values valid at the getter position While we might know the value of an ICV at a getter position it is not always clear that we can simply use it. Verify the value is valid first to avoid invalid IR. Fixes #53300.	2022-01-19 18:40:13 -06:00
Eli Friedman	86cdff0e21	[OpenMPOpt] Use SetVector to store list of kernels. Fixes test failures on reverse-iteration buildbot.	2022-01-19 13:55:32 -08:00
Simon Pilgrim	274359cf09	[OpenMPOpt] Use cast<> instead of dyn_cast<> to avoid dereference of nullptr. NFC	2022-01-08 13:47:35 +00:00
Kazu Hirata	2aed08131d	[llvm] Use true/false instead of 1/0 (NFC) Identified with modernize-use-bool-literals.	2022-01-07 00:39:14 -08:00
Johannes Doerfert	944aa0421c	Reapply "[OpenMP][NFCI] Embed the source location string size in the ident_t" This reverts commit `73ece231ee` and reapplies `7bfcdbcbf3` with mlir changes. Also reverts commit `423ba12971` and includes the unit test changes of `16da214004`.	2021-12-29 01:10:38 -06:00
Mehdi Amini	73ece231ee	Revert "[OpenMP][NFCI] Embed the source location string size in the ident_t" This reverts commit `7bfcdbcbf3`. Broke MLIR build	2021-12-29 06:57:36 +00:00
Johannes Doerfert	3e0c512ce6	[OpenMP] Simplify all stores in the device code Similar to loads, we want to be aggressive when it comes to store simplification. Not everything in LLVM handles dead stores well when address space casts are involved, we can simply ask the Attributor to do it for us though. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D109998	2021-12-29 00:19:38 -06:00
Johannes Doerfert	7bfcdbcbf3	[OpenMP][NFCI] Embed the source location string size in the ident_t One of the unused ident_t fields now holds the size of the string (=const char *) field so we have an easier time dealing with those in the future. Differential Revision: https://reviews.llvm.org/D113126	2021-12-28 23:53:29 -06:00
Johannes Doerfert	9f04a0ea43	[OpenMP][FIX] Make AAExecutionDomain deterministic	2021-12-28 23:53:29 -06:00
Johannes Doerfert	ba70f3a5d9	[OpenMP][FIX] Make heap2shared deterministic Issue #52875 reported non-determinism, this is the first step to avoid it. We iterate over MallocCalls so we should keep the order stable.	2021-12-28 23:53:28 -06:00
Johannes Doerfert	7de5da2a67	[OpenMP][NFC] Move address space enum into OMPConstants header	2021-12-28 23:53:28 -06:00
Joseph Huber	6e220296d7	[OpenMP] Use alignment information in HeapToShared This patch uses the return alignment attribute now present in the `__kmpc_alloc_shared` runtime call to set the alignment of the shared memory global created to replace it. Depends on D115971 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D116319	2021-12-27 16:58:27 -05:00
Joseph Huber	744aa09f52	[OpenMP] Make reduction functions SPMD compatible Reduction functions were guarded before which was wrong, these are SPMD compatible. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D115159	2021-12-06 12:32:02 -05:00
Joseph Huber	9ea5b97203	[OpenMP][FIX] Invalidate the SPMDCompatibilityTracker explicitly Before SPMDzation it was sufficient to add an incompatible instruction to the SPMDCompatibilityTracker. However, now adding instructions means they need guarding. As calls cannot be guarded in general we need to explicitly prevent SPMD mode. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D115158	2021-12-06 12:31:57 -05:00
Joseph Huber	058c312a44	[OpenMP][FIX] SPMDzation guarding needs to account for all reaching kernels If two reaching kernels disagree on the execution mode we cannot guard a function right now. Ensure we do not as we otherwise will cause a deadlock. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D114866	2021-12-01 11:44:32 -05:00
Joseph Huber	7986a5f23e	[OpenMP] Add RTL function to externalization RAII This patch adds the `__kmpc_get_warp_size` OpenMP RTL function to the externalization RAII struct. This was getting optimized out and then being replaced with an undefined value once added back in, causing bugs for complex reductions. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D114802	2021-11-30 10:19:06 -05:00
Kazu Hirata	d243cbf8ea	[llvm] Use isa instead of dyn_cast (NFC)	2021-11-14 19:40:46 -08:00
Joel E. Denny	c9dfe322ee	[OpenMP] Fix main thread barrier for Pascal and amdgpu Fixes what's left of https://bugs.llvm.org/show_bug.cgi?id=51781. Reviewed By: jdoerfert, JonChesterfield, tianshilei1992 Differential Revision: https://reviews.llvm.org/D113602	2021-11-12 11:18:45 -05:00
Joseph Huber	e52937eba0	[OpenMP] Use AAAssumptionInfo to get assumptions in OpenMPOpt This patch uses the abstract attributor introduced in D111054 to get the assumption values instead of the `hasAssumption` function. This also calls it so assumption information should propagate through the device where applicable. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D111445	2021-11-09 17:39:21 -05:00
Johannes Doerfert	d61aac76bf	[OpenMP][FIX] Do not signal SPMD-mode but then keep generic-mode If we assume SPMD-mode during the fixpoint iteration we have to execute the kernel in SPMD-mode. If we change our mind during manifest there is the chance of a mismatch between the simplification, e.g., of `__kmpc_is_spmd_exec_mode` calls, and the execution mode. This problem was introduced in D109438. This patch is a compromise to resolve the problem purely in OpenMP-opt while trying to keep the benefits of D109438 around. This might not always work, see `get_hardware_num_threads_in_block_fold` but it often does. At the same time we do keep value specialization and execution mode in sync. Proper solutions to this problem should be considered. I believe a new execution mode is the easiest way forward (Singleton-SPMD). Alternatively, SPMD-mode execution can be used with a way to provide a new thread_limit (here 1) to the runtime. This is more general and could be useful if we see `num_threads` clauses or workshared loops with small trip counts in the kernel. In either proposal we need to disable the guarding for the kernel (which was the motivation for D109438). Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D112894	2021-11-02 23:22:04 -05:00
Johannes Doerfert	73720c8059	[OpenMP][FIX] Introduce and use a simple generic-mode barrier Before we had aligned barriers the `__kmpc_barrier_simple_spmd` was OK to be used in the custom state machine. Now that SPMD barriers are assumed to be aligned we need to use a "generic" barrier in places that are not aligned. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112893	2021-11-02 23:22:01 -05:00
Johannes Doerfert	e6e440ae5f	[OpenMP][FIX] Ensure guarding uses proper global name Global symbols cannot have arbitrary names so we need to sanitize the string first. Also remove an assertion that is not actually necessary nor true in general. Reviewed By: ggeorgakoudis Differential Revision: https://reviews.llvm.org/D112892	2021-11-02 23:21:53 -05:00
Kazu Hirata	9800731367	[Target, Transforms] Use predecessors instead of pred_begin and pred_end (NFC)	2021-10-24 17:35:35 -07:00
Joseph Huber	f074a6a041	[OpenMP] Add options to change Attributor max iterations in OpenMPOpt This patch adds a new command line option `openmp-opt-max-iterations` that controls the maximum number of iterations the attributor will run for when compiling OpenMP target device code. This patch also adds a remark to indicate when the attributor failed because it did not run for enough iterations. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110749	2021-10-04 09:39:04 -04:00
Kazu Hirata	4f0225f6d2	[Transforms] Migrate from getNumArgOperands to arg_size (NFC) Note that getNumArgOperands is considered a legacy name. See llvm/include/llvm/IR/InstrTypes.h for details.	2021-10-01 09:57:40 -07:00
Joseph Huber	c11ebfea6d	[OpenMP][NFC] Fix linting messages in OpenMPOpt Summary: This patch addresses some linting messages I keep getting in my editor when working on OpenMPOpt.	2021-09-29 16:07:33 -04:00
Joseph Huber	87ce7e65f2	[OpenMP] Add missing distribute definitions to AAKernelInfo Summary: The RTL functions added in https://reviews.llvm.org/D110429 were mistakenly left out from the list of safe runtime calls in AAKernelInfo. This patch adds them in.	2021-09-29 16:06:34 -04:00
Johannes Doerfert	c6457dcae8	[OpenMP][FIX] Be more deliberate about invalidating the AAKernelInfo state This patch fixes a problem when the AAKernelInfo state was invalidated, e.g., due to `optnone` for a kernel, but not all parts indicated the invalidation properly. We further eliminate most full state invalidations as they should never be necessary. Differential Revision: https://reviews.llvm.org/D109468	2021-09-23 00:04:30 -05:00
Johannes Doerfert	0a16c56010	[OpenMP][NFC] Improve debug output	2021-09-23 00:04:29 -05:00
Shilei Tian	423d34f74a	[OpenMP][Offloading] Change `bool IsSPMD` to `int8_t Mode` in `__kmpc_target_init` and `__kmpc_target_deinit` This is a follow-up of D110029, which uses bitset to indicate execution mode. This patch makes the changes in the function call. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110279	2021-09-22 17:16:41 -04:00
Shilei Tian	ca999f7191	[OpenMP][Offloading] Use bitset to indicate execution mode instead of value The execution mode of a kernel is stored in a global variable, whose value means: 0 - SPMD mode; 1 - generic mode; 2 - SPMD mode execution with generic mode semantics. We are going to add support for SIMD execution mode. It will come with another execution mode, such as SIMD-generic mode. As a result, this value-based indicator is not flexible. This patch changes to a bitset-based solution to encode the execution mode. Each position is: [0] - generic mode; [1] - SPMD mode; [2] - SIMD mode (will be added later). In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode execution with generic mode semantics. In the future after we add the support for SIMD mode, `0b1xx` will be in SIMD mode. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110029	2021-09-22 11:40:52 -04:00
Joseph Huber	1cf86df883	[OpenMP] Make sure the Thread ID function is not removed Summary: The thread ID function was reintroduced in D110195, but could potentially be removed by the optimizer. Make the function noinline to preserve the call sites and add it to the externalization RAII so its definition is not removed by the attributor.	2021-09-22 10:13:18 -04:00
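
The bitset encoding described in commit ca999f7191 can be sketched roughly as follows. This is an illustration only: the names `ExecModeBits`, `kGenericBit`, `kSPMDBit`, and `describeExecMode` are hypothetical and are not taken from the LLVM sources. Only the bit positions and the meanings of `0x1`, `0x2`, and `0x3` come from the commit message, and the SIMD bit is left out because the message says it will be added later.

```cpp
#include <cstdint>
#include <string>

// Hypothetical names for the bit positions listed in the commit message.
enum ExecModeBits : std::int8_t {
  kGenericBit = 1 << 0, // position [0]: generic mode
  kSPMDBit = 1 << 1,    // position [1]: SPMD mode
};

// Decode a mode value into a human-readable description.
std::string describeExecMode(std::int8_t mode) {
  const bool generic = (mode & kGenericBit) != 0;
  const bool spmd = (mode & kSPMDBit) != 0;
  if (generic && spmd)
    return "SPMD execution with generic-mode semantics"; // 0x3
  if (spmd)
    return "SPMD"; // 0x2
  if (generic)
    return "generic"; // 0x1
  return "unknown"; // no mode bit set
}
```

Compared with the earlier value-based scheme (0, 1, 2), testing individual bits lets a further mode be added as a new bit position without renumbering the existing ones.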


207 Commits