llvm-project

Commit Graph

Author	SHA1	Message	Date
Joseph Huber	7d57639264	[OpenMP] Add new execution mode for SPMD execution with Generic semantics Qualified kernels can be transformed from generic-mode to SPMD mode using an optimization in OpenMPOpt. This patch introduces a new execution mode to indicate kernels that have been transformed from generic-mode to SPMD-mode. These kernels have SPMD-mode execution, but need generic-mode semantics for scheduling the blocks and threads. Without this, far too few blocks will be scheduled for a generic region, as SPMD mode expects the trip count to be divided by the number of threads. Reviewed By: ggeorgakoudis Differential Revision: https://reviews.llvm.org/D106460	2021-07-21 20:57:28 -04:00
Giorgis Georgakoudis	3f71b425b2	[Attributor] Preserve BBs and instructions added in AA manifests Manifesting AbstractAttributes may add new BBs in the IR. This patch provides an interface to register those BBs in the Attributor so that those BBs and the instructions they contain are not deleted as dead. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106383	2021-07-21 11:27:00 -07:00
Giorgis Georgakoudis	e8439ec893	[OpenMP] Set RequiresFullRuntime false in SPMDization SPMDization in D102307 does not change the RequiresFullRuntime argument of kmpc_target_init/deinit calls. However, the constraints of SPMDization detection for converting a target region to SPMD mode should guarantee that the region does not require full runtime support. Hence, this patch sets RequiresFullRuntime to false for improved execution performance. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105556	2021-07-20 09:54:51 -07:00
Johannes Doerfert	97387fdf6d	[OpenMP] Fix carefully track SPMDCompatibilityTracker We did not properly use SPMDCompatibilityTracker in various places. This patch makes sure we check the validity properly and also fixes the state if we can. Differential Revision: https://reviews.llvm.org/D106085	2021-07-19 22:47:03 -05:00
Shilei Tian	d3454ee8d2	[AbstractAttributor] Fix two issues in folding __kmpc_is_spmd_exec_mode This patch fixes two issues found when folding `__kmpc_is_spmd_exec_mode`: 1. When the set of reaching kernels is empty, it should not fold to generic mode. 2. When creating the AA for the caller while updating information, the dependency should be required. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D106209	2021-07-17 13:13:44 -04:00
Joseph Huber	b910a109f8	[OpenMP][NFC] Update the comment header for optimizations.	2021-07-16 14:13:13 -04:00
Joseph Huber	2c31d5ebfb	[OpenMP] Add IDs to OpenMP remarks This patch adds unique identifiers to the existing OpenMP remarks. This makes it easier to identify the corresponding documentation for each remark, which will be hosted on the OpenMP webpage. Depends on D105898 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105939	2021-07-16 14:07:03 -04:00
Joseph Huber	eef6601b0f	[OpenMP] Rework OpenMP remarks This patch rewrites and reworks a few of the existing remarks to make them more concise and consistent prior to writing the documentation for them. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105898	2021-07-16 14:07:00 -04:00
Shilei Tian	c23da666b5	[Attributor] Add support for compound assignment for ChangeStatus A common use of `ChangeStatus` is as follows: `ChangeStatus Changed = ChangeStatus::UNCHANGED; Changed |= foo();` where `foo` returns `ChangeStatus` as well. Currently `ChangeStatus` doesn't support compound assignment, so we have to write `Changed = Changed | foo();`, which is not that convenient. This patch adds support for compound assignment for `ChangeStatus`. Compound assignment is usually implemented as a member function, and the binary arithmetic operator is then implemented using compound assignment. However, unlike a regular C++ class, an enum class cannot have member functions. As a result, the operators have to be implemented as free functions, as shown in the patch. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106109	2021-07-15 23:51:46 -04:00
Shilei Tian	ca662297d5	[AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible In the device runtime there are many function calls to `__kmpc_is_spmd_exec_mode` to query the execution mode of the current kernel. In many cases, user programs only contain target regions executing in one mode. As a consequence, those runtime function calls will only return one value. If we can get rid of these function calls during compilation, it can potentially improve performance. In this patch, we use `AAKernelInfo` to analyze kernel execution. Basically, for each kernel (device) function `F`, we collect all kernel entries `K` that can reach `F`. A new AA, `AAFoldRuntimeCall`, is created for each call site. In each iteration, it will check all reaching kernel entries, and update the folded value accordingly. In the future we will support more functions. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105787	2021-07-15 18:23:23 -04:00
Shilei Tian	a70ef3f568	Revert "[AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible" This reverts commit `1100e4aafe`.	2021-07-15 11:19:28 -04:00
Shilei Tian	1100e4aafe	[AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible In the device runtime there are many function calls to `__kmpc_is_spmd_exec_mode` to query the execution mode of the current kernel. In many cases, user programs only contain target regions executing in one mode. As a consequence, those runtime function calls will only return one value. If we can get rid of these function calls during compilation, it can potentially improve performance. In this patch, we use `AAKernelInfo` to analyze kernel execution. Basically, for each kernel (device) function `F`, we collect all kernel entries `K` that can reach `F`. A new AA, `AAFoldRuntimeCall`, is created for each call site. In each iteration, it will check all reaching kernel entries, and update the folded value accordingly. In the future we will support more functions. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105787	2021-07-13 22:28:35 -04:00
Johannes Doerfert	792aac9897	[Attributor][NFCI] Add UsedAssumedInformation to more interfaces As with other Attributor interfaces we often want to know if assumed information was used to answer a query. This is important if only known information is allowed or if known information can lead to an early fixpoint. The users have been adjusted but none of them utilizes the new information yet.	2021-07-11 19:18:03 -05:00
Johannes Doerfert	514c033db1	[OpenMP] Detect SPMD compatible kernels and execute them as such In the spirit of TRegions [0], this patch analyzes a kernel and tracks if it can be executed in SPMD-mode. If so, we flip the arguments of the __kmpc_target_init and deinit call to enable the mode. We also update the `<kernel>_exec_mode` flag to indicate to the runtime we changed the mode to SPMD. The code analysis is done interprocedurally by extending the AAKernelInfo abstract attribute to track SPMD compatibility as well. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 Differential Revision: https://reviews.llvm.org/D102307	2021-07-10 18:44:25 -05:00
Johannes Doerfert	8cb7d71355	[OpenMP][FIX] Add missing `)` to remark	2021-07-10 18:40:32 -05:00
Johannes Doerfert	d9659bf6a0	[OpenMP] Create custom state machines for generic target regions In the spirit of TRegions [0], this patch creates a custom state machine for a generic target region based on the potentially called parallel regions. The code analysis is done interprocedurally via an abstract attribute (AAKernelInfo). All outermost parallel regions are collected and we check if there might be unknown outermost parallel regions for which we need an indirect call. Other AAKernelInfo extensions are expected. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 Differential Revision: https://reviews.llvm.org/D101977	2021-07-10 17:57:08 -05:00
Johannes Doerfert	e2cfbfcc0c	[OpenMP] Unified entry point for SPMD & generic kernels in the device RTL In the spirit of TRegions [0], this patch provides a simpler and uniform interface for a kernel to set up the device runtime. The OMPIRBuilder is used for reuse in Flang. A custom state machine will be generated in the follow up patch. The "surplus" threads of the "master warp" will not exit early anymore so we need to use non-aligned barriers. The new runtime will not have an extra warp but will also require these non-aligned barriers. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 This was in parts extracted from D59319. Reviewed By: ABataev, JonChesterfield Differential Revision: https://reviews.llvm.org/D101976	2021-07-10 17:53:56 -05:00
Johannes Doerfert	c1c1fe9385	[Attributor] Reorganize AAHeapToStack In order to simplify future extensions, e.g., the merge of AAHeapToShared into AAHeapToStack, we reorganize AAHeapToStack and the state we keep for each malloc-like call. The result is also less confusing as we only track malloc-like calls, not all calls. Further, we only perform the updates necessary for a malloc-like call to argue it can go to the stack, e.g., we won't check all uses if we moved on to the "must-be-freed" argument. This patch also uses Attributor helpers to simplify the allocated size, alignment, and the potentially freed objects. Overall, this is mostly a reorganization and only the use of the optimistic helpers should change (=improve) the capabilities a bit. Differential Revision: https://reviews.llvm.org/D104993	2021-07-10 16:32:24 -05:00
Nico Weber	d3e7491333	Revert Attributor patch series Broke check-clang, see https://reviews.llvm.org/D102307#2869065 Ran `git revert -n ebbe149a6f08535ede848a531a601ae6591cfbc5..269416d41908bb670f67af689155d5ab8eea689a`	2021-07-10 16:15:55 -04:00
Johannes Doerfert	269416d419	[Attributor][NFCI] Add UsedAssumedInformation to more interfaces As with other Attributor interfaces we often want to know if assumed information was used to answer a query. This is important if only known information is allowed or if known information can lead to an early fixpoint. The users have been adjusted but none of them utilizes the new information yet.	2021-07-10 12:32:51 -05:00
Johannes Doerfert	d39179d7fa	[OpenMP] Detect SPMD compatible kernels and execute them as such In the spirit of TRegions [0], this patch analyzes a kernel and tracks if it can be executed in SPMD-mode. If so, we flip the arguments of the __kmpc_target_init and deinit call to enable the mode. We also update the `<kernel>_exec_mode` flag to indicate to the runtime we changed the mode to SPMD. The code analysis is done interprocedurally by extending the AAKernelInfo abstract attribute to track SPMD compatibility as well. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 Differential Revision: https://reviews.llvm.org/D102307	2021-07-10 12:32:51 -05:00
Johannes Doerfert	f0628c6ff7	[OpenMP] Create custom state machines for generic target regions In the spirit of TRegions [0], this patch creates a custom state machine for a generic target region based on the potentially called parallel regions. The code analysis is done interprocedurally via an abstract attribute (AAKernelInfo). All outermost parallel regions are collected and we check if there might be unknown outermost parallel regions for which we need an indirect call. Other AAKernelInfo extensions are expected. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 Differential Revision: https://reviews.llvm.org/D101977	2021-07-10 12:32:50 -05:00
Johannes Doerfert	1d5711c3ee	[OpenMP] Unified entry point for SPMD & generic kernels in the device RTL In the spirit of TRegions [0], this patch provides a simpler and uniform interface for a kernel to set up the device runtime. The OMPIRBuilder is used for reuse in Flang. A custom state machine will be generated in the follow up patch. The "surplus" threads of the "master warp" will not exit early anymore so we need to use non-aligned barriers. The new runtime will not have an extra warp but will also require these non-aligned barriers. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 This was in parts extracted from D59319. Reviewed By: ABataev, JonChesterfield Differential Revision: https://reviews.llvm.org/D101976	2021-07-10 12:32:50 -05:00
Johannes Doerfert	1eb31d6de3	[Attributor] Reorganize AAHeapToStack In order to simplify future extensions, e.g., the merge of AAHeapToShared into AAHeapToStack, we reorganize AAHeapToStack and the state we keep for each malloc-like call. The result is also less confusing as we only track malloc-like calls, not all calls. Further, we only perform the updates necessary for a malloc-like call to argue it can go to the stack, e.g., we won't check all uses if we moved on to the "must-be-freed" argument. This patch also uses Attributor helpers to simplify the allocated size, alignment, and the potentially freed objects. Overall, this is mostly a reorganization and only the use of the optimistic helpers should change (=improve) the capabilities a bit. Differential Revision: https://reviews.llvm.org/D104993	2021-07-10 12:32:50 -05:00
Joseph Huber	ecabc6684f	[OpenMP] Change analysis remarks to not emit on cold functions The remarks will trigger on some functions that are marked cold, such as the `__muldc3` intrinsic functions. Change the remarks to avoid these functions. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105196	2021-06-30 11:54:24 -04:00
Joseph Huber	0edb87773b	[OpenMP] Add additional remarks for OpenMPOpt This patch adds additional remarks, suggesting the use of `noescape` for failed globalization and indicating when internalization failed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105150	2021-06-30 09:49:25 -04:00
Joseph Huber	57ad2e1067	[OpenMP] Prevent OpenMPOpt from internalizing uncalled functions Currently OpenMPOpt will only check if a function is a kernel before deciding not to internalize it. Any uncalled function that gets internalized will be trivially dead in the module, so this is unnecessary. Depends on D102423 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104890	2021-06-28 16:47:53 -04:00
Joseph Huber	13b2fba239	[OpenMP][NFC] Fix typo in OpenMPOpt	2021-06-28 09:49:14 -04:00
Joseph Huber	4024087731	[OpenMP][NFC] Fix missing argument	2021-06-28 09:15:01 -04:00
Joseph Huber	4a6bd8e3e7	[OpenMP] Increase attributor iterations on the GPU Increase the number of attributor iterations on a GPU target. I forgot to change this in D104416. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104920	2021-06-28 08:50:49 -04:00
Joseph Huber	5ccb7424fa	[OpenMP] Change OpenMPOpt to check openmp metadata The metadata added in D102361 introduces a module flag that we can check to determine if the module was compiled with `-fopenmp` enabled. We can now check for the presence of this instead of scanning the call graph for OpenMP runtime functions. Depends on D102361 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102423	2021-06-25 16:34:22 -04:00
Joseph Huber	1cfdcae653	[Attributor] Fix AAExecutionDomain returning true on invalid states This patch fixes a problem with the AAExecutionDomain attributor not checking if it is in a valid state. This can cause it to incorrectly return that a block is executed in a single threaded context after the attributor failed for any reason. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D103186	2021-06-22 18:12:43 -04:00
Joseph Huber	44feacc736	[OpenMP] Change remaining globalization from an analysis remark to missed After landing the globalization optimizations, the presence of globalization on the device that was not put in shared or stack memory is a failed optimization with performance consequences, so it should be reported as a missed remark. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104735	2021-06-22 16:52:06 -04:00
Joseph Huber	ca1560da72	[OpenMP][NFC] Add new optimizations to OpenMPOpt comment header Summary: Adds mentions of the new globalization optimizations to the OpenMPOpt comment header.	2021-06-22 14:40:31 -04:00
Joseph Huber	b54ccab509	[Attributor] Add an option to increase the max number of iterations Right now the Attributor defaults to 32 fixed point iterations unless it is set explicitly by a command line flag. This patch allows this to be configured when the attributor instance is created. The maximum is then increased in OpenMPOpt if the target is a kernel. This is because the globalization analysis can result in larger iteration counts due to many dependent instances running at once. Depends on D102444 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104416	2021-06-22 14:38:25 -04:00
Joseph Huber	30e36c9b3c	[Attributor] Add interface to emit remarks in Attributor Summary: This patch adds support for the Attributor to emit remarks on behalf of some other pass. The attributor can now optionally take a callback function that returns an OptimizationRemarkEmitter object when given a Function pointer. If this is available, then a remark will be emitted for the corresponding pass name. Depends on D102197 Reviewed By: sstefan1 thegameg Differential Revision: https://reviews.llvm.org/D102444	2021-06-22 14:12:46 -04:00
Joseph Huber	7d69da71dd	[OpenMP] Enable HeapToStack conversion in OpenMPOpt for new RTL globalization calls Summary: The changes to globalization introduced in D97680 introduce a large amount of overhead by default. The old globalization method would always ignore globalization code if executing in SPMD mode. This wasn't strictly correct as data sharing is still possible in SPMD mode. The new interface is correct but introduces globalization code even when unnecessary. This optimization will use the existing HeapToStack transformation in the attributor to allow for unneeded globalization to be replaced with thread-private stack memory. This is done using the newly introduced library instances for the RTL functions added in D102087. Depends on D97818 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102197	2021-06-22 13:23:05 -04:00
Joseph Huber	03d7e61c87	[OpenMP] Internalize functions in OpenMPOpt to improve IPO passes Summary: Currently the attributor needs to give up if a function has external linkage. This means that the optimization introduced in D97818 will only apply to static functions. This change uses the Attributor to internalize OpenMP device routines by making a copy of each function with private linkage and replacing the uses in the module with it. This allows for the optimization to be applied to any regular function. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102824	2021-06-22 12:38:10 -04:00
Joseph Huber	6fc51c9f7d	[OpenMP] Replace GPU globalization calls with shared memory in the middle-end Summary: The changes introduced in D97680 create a simpler interface to code that needs to be globalized. This interface is used to simplify the globalization calls in the middle end. We can check any globalization call that is only called by a single thread in the team and replace it with a static shared memory buffer. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97818	2021-06-22 11:55:44 -04:00
Joseph Huber	68d133a3e8	[OpenMP] Simplify GPU memory globalization Summary: Memory globalization is required to maintain OpenMP standard semantics for data sharing between worker and master threads. The GPU cannot share data between its threads, so it must allocate global or shared memory to store the data in. Currently this is implemented fully in the frontend using the `__kmpc_data_sharing_push_stack` and `__kmpc_data_sharing_pop_stack` functions to emulate standard CPU stack sharing. The front-end scans the target region for variables that escape the region and must be shared between the threads. Each variable then has a field created for it in a global record type. This patch replaces this functionality with a single allocation command, effectively mimicking an alloca instruction for the variables that must be shared between the threads. This will be much slower than the current solution, but makes it much easier to optimize as we can analyze each variable independently and determine if it is not captured. In the future, we can replace these calls with an `alloca` and small allocations can be pushed to shared memory. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D97680	2021-06-22 10:52:46 -04:00
Johannes Doerfert	9a23e673ca	[OpenMP][NFC] Expose AAExecutionDomain and rename its getter The initial use for AAExecutionDomain was to determine if a single thread executes a block. While this is sometimes informative, most of the time, and for other reasons, we actually want to know if it is the "initial thread", i.e., the thread that started execution on the current device. The deduction needs to be adjusted in a follow up as the methods we use right now are looking for the OpenMP thread id, which is reset whenever a thread enters a parallel region. What we basically want is to look for `llvm.nvvm.read.ptx.sreg.ntid.x` and equivalent functions.	2021-06-18 01:07:52 -05:00
Simon Pilgrim	61cdaf66fe	[ADT] Remove APInt/APSInt toString() std::string variants <string> is currently the highest impact header in a clang+llvm build: https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps. This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code. Differential Revision: https://reviews.llvm.org/D103888	2021-06-11 13:19:15 +01:00
Joseph Huber	2db182ff8d	[Diagnostics] Allow emitting analysis and missed remarks on functions Summary: Currently, only `OptimizationRemark`s can be emitted using a Function. Add constructors to allow this for `OptimizationRemarkAnalysis` and `OptimizationRemarkMissed` as well. Reviewed By: jdoerfert thegameg Differential Revision: https://reviews.llvm.org/D102784	2021-05-19 15:10:20 -04:00
Joseph Huber	68abc3d264	[Attributor] Change AAExecutionDomain to only accept intrinsics Summary: The OpenMP runtime functions don't always provide unique thread IDs to determine if a basic block is truly single-threaded. Change the implementation to only check NVPTX intrinsics for now. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102700	2021-05-18 21:19:26 -04:00
Joseph Huber	8b57ed09bd	[OpenMP] Prevent Attributor from deleting functions in OpenMPOptCGSCC pass Summary: This patch prevents the Attributor instances made in the CGSCC pass from deleting functions. This prevents the attributor from changing the call graph while OpenMPOpt is working with it. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102363	2021-05-13 16:35:23 -04:00
Joseph Huber	182831258b	[Attributor] Add AAExecutionDomainInfo interface to OpenMPOpt Summary: Add the AAExecutionDomainInfo attributor instance to OpenMPOpt. This will infer information about the execution domain an instruction might be executed in. Right now this only includes a very crude check for instructions that will be executed by the master thread by comparing a thread-id function with a constant zero. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D101578	2021-05-03 19:24:19 -04:00
Giorgis Georgakoudis	a2dbfb6b72	[OpenMP] Simplify offloading parallel call codegen This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target, host-side parallel regions), data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions. Reviewed By: jdoerfert, Meinersbur Differential Revision: https://reviews.llvm.org/D95976	2021-04-21 18:46:07 -07:00
Joseph Huber	b2ad63d3cf	[OpenMP] Add OpenMPOpt as a Module pass Summary: This patch registers OpenMPOpt as a Module pass in addition to a CGSCC pass. This is so certain optimizations that are sensitive to intact call-sites can happen before inlining. The old `openmpopt` pass name is changed to `openmp-opt-cgscc` and `openmp-opt` calls the Module pass. The current module pass only runs a single check but will be expanded in the future. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D99202	2021-04-20 12:28:58 -04:00
Michael Kruse	b119120673	[clang][OpenMP] Use OpenMPIRBuilder for workshare loops. Initial support for clang to generate loops using the OpenMPIRBuilder. This initial support is intentionally limited to: * Only the worksharing-loop directive. * Recognizes only the nowait clause. * No loop nests with more than one loop. * Untested with templates, exceptions. * Semantic checking left to the existing infrastructure. This patch introduces a new AST node, OMPCanonicalLoop, which becomes the parent of any loop that has to adhere to the restrictions as specified by the OpenMP standard. These restrictions allow OMPCanonicalLoop to provide the following additional information that depends on base language semantics: * The distance function: How many loop iterations there will be before entering the loop nest. * The loop variable function: Conversion from a logical iteration number to the loop variable. These allow the OpenMPIRBuilder to act solely using logical iteration numbers without needing to be concerned with iterator semantics between calling the distance function and determining what the value of the loop variable ought to be. Any OpenMP logic should be done by the OpenMPIRBuilder such that it can be reused by the MLIR OpenMP dialect and thus by flang. The distance and loop variable function are implemented using lambdas (or more exactly: CapturedStmt because lambda implementation is more intertwined with the parser). It is up to the OpenMPIRBuilder how they are called, which depends on what is done with the loop. By default, these are emitted as outlined functions but we might think about emitting them inline as the OpenMPRuntime does. For compatibility with the current OpenMP implementation, even though not necessary for the OpenMPIRBuilder, OMPCanonicalLoop can still be nested within OMPLoopDirectives' CapturedStmt. Although OMPCanonicalLoops are not currently generated when the OpenMPIRBuilder is not enabled, these can just be skipped when not using the OpenMPIRBuilder in case we don't want to make the AST dependent on the EnableOMPBuilder setting. Loop nests with more than one loop require support by the OpenMPIRBuilder (D93268). A simple implementation of non-rectangular loop nests would add another lambda function that returns whether a loop iteration of the rectangular overapproximation is also within its non-rectangular subset. Reviewed By: jdenny Differential Revision: https://reviews.llvm.org/D94973	2021-03-04 22:52:59 -06:00
Johannes Doerfert	5b70c12f3e	[Attributor] Make DepClass a required argument We often used a sub-optimal dependence class in the past because we didn't see the argument. Let's make it explicit so we remember to think about it.	2021-03-04 00:35:52 -06:00
Kazu Hirata	23b0ab2acb	[llvm] Use the default value of drop_begin (NFC)	2021-01-18 10:16:36 -08:00
Kazu Hirata	7dc3575ef2	[llvm] Remove redundant return and continue statements (NFC) Identified with readability-redundant-control-flow.	2021-01-14 20:30:34 -08:00
Kazu Hirata	2efcbe24a7	[llvm] Use llvm::drop_begin (NFC)	2021-01-14 20:30:33 -08:00
Giorgis Georgakoudis	9751705512	[OpenMPOpt][WIP] Expand parallel region merging The existing implementation of parallel region merging applies only to consecutive parallel regions that have speculatable sequential instructions in-between. This patch lifts this limitation to expand merging with any sequential instructions in-between, except calls to unmergeable OpenMP runtime functions. In-between sequential instructions in the merged region are sequentialized in a "master" region and any output values are broadcast to the following parallel regions and the sequential region continuation of the merged region. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D90909	2021-01-11 08:06:23 -08:00
Arthur Eubanks	7fea561eb1	[CGSCC][Coroutine][NewPM] Properly support function splitting/outlining Previously when trying to support CoroSplit's function splitting, we added in a hack that simply added the new function's node into the original function's SCC (https://reviews.llvm.org/D87798). This is incorrect since it might be in its own SCC. Now, more similar to the previous design, we have callers explicitly notify the LazyCallGraph that a function has been split out from another one. In order to properly support CoroSplit, there are two ways functions can be split out. One is the normal expected "outlining" of one function into a new one. The new function may only contain references to other functions that the original did. The original function must reference the new function. The new function may reference the original function, which can result in the new function being in the same SCC as the original function. The weird case is when the original function indirectly references the new function, but the new function directly calls the original function, resulting in the new SCC being a parent of the original function's SCC. This form of function splitting works with CoroSplit's Switch ABI. The second way of splitting is more specific to CoroSplit. CoroSplit's Retcon and Async ABIs split the original function into multiple functions that all reference each other and are referenced by the original function. In order to keep the LazyCallGraph in a valid state, all new functions must be processed together, else some nodes won't be populated. To keep things simple, this only supports the case where all new edges are ref edges, and every new function references every other new function. There can be a reference back from any new function to the original function, putting all functions in the same RefSCC. This also adds asserts that all nodes in a (Ref)SCC can reach all other nodes to prevent future incorrect hacks. The original hacks in https://reviews.llvm.org/D87798 are no longer necessary since all new functions should have been registered before calling updateCGAndAnalysisManagerForPass. This fixes all coroutine tests when opt's -enable-new-pm is true by default. This also fixes PR48190, which was likely due to the previous hack breaking SCC invariants. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D93828	2021-01-06 11:19:15 -08:00
Johannes Doerfert	994bb6eb7d	[OpenMP][NFC] Provide a new remark and documentation If a GPU function is externally reachable, we give up trying to find the (unique) kernel it is called from. This can hinder optimizations. Emit a remark and explain mitigation strategies. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D93439	2020-12-17 14:38:26 -06:00
Johannes Doerfert	dcaec81211	[OpenMP] Use assumptions during ICV tracking The OpenMP 5.1 assumptions `no_openmp` and `no_openmp_routines` allow us to ignore calls that would otherwise prevent ICV tracking. Once we track more ICVs we might need to distinguish the ones that could be impacted even with `no_openmp_routines`. Reviewed By: sstefan1 Differential Revision: https://reviews.llvm.org/D92050	2020-12-15 16:51:34 -06:00
Johannes Doerfert	d08d490a4c	[OpenMPOpt][NFC] Clang format	2020-12-15 16:51:34 -06:00
Alex Zinenko	240dd92432	[OpenMPIRBuilder] forward arguments as pointers to outlined function OpenMPIRBuilder::createParallel outlines the body region of the parallel construct into a new function that accepts any value previously defined outside the region as a function argument. This function is called back by the OpenMP runtime function __kmpc_fork_call, which expects trailing arguments to be pointers. If the region uses a value that is not of a pointer type, e.g. a struct, the produced code would be invalid. In such cases, make createParallel emit IR that stores the value on the stack and passes the pointer to the outlined function instead. The outlined function then loads the value back and uses it as normal. Reviewed By: jdoerfert, llitchev Differential Revision: https://reviews.llvm.org/D92189	2020-12-02 14:59:41 +01:00
Joseph Huber	da8bec47ab	[OpenMP] Add Location Fields to Libomptarget Runtime for Debugging Summary: Add support for passing source locations to libomptarget runtime functions using the ident_t struct present in the rest of the libomp API. This will allow the runtime system to give much more insightful error messages and debugging values. Reviewers: jdoerfert grokos Differential Revision: https://reviews.llvm.org/D87946	2020-11-19 12:01:53 -05:00
Joseph Huber	97e55cfef5	[OpenMP] Add Passing in Original Declaration Names To Mapper API Summary: This patch adds support for passing in the original declaration name in the source file to the libomptarget runtime. This will allow the runtime to provide more intelligent debugging messages. This patch takes the original expression parsed from the OpenMP map / update clause and provides a textual representation if it was explicitly mapped, otherwise it takes the name of the variable declaration as a fallback. The information is passed to the runtime in a global array of strings that matches the existing ident_t source location strings using ";name;filename;column;row;;" Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D89802	2020-11-18 15:28:39 -05:00
Michael Kruse	e5dba2d7e5	[OMPIRBuilder] Start 'Create' methods with lower case. NFC. For consistency with the IRBuilder, OpenMPIRBuilder has method names starting with 'Create'. However, the LLVM coding style has method names starting with lower case letters, as all other OpenMPIRBuilder methods already do. The clang-tidy configuration used by Phabricator also warns about the naming violation, adding noise to the reviews. This patch renames all `OpenMPIRBuilder::CreateXYZ` methods to `OpenMPIRBuilder::createXYZ`, and updates all in-tree callers. I tested check-llvm, check-clang, check-mlir and check-flang to ensure that I did not miss a caller. Reviewed By: mehdi_amini, fghanim Differential Revision: https://reviews.llvm.org/D91109	2020-11-09 19:35:11 -06:00
Benjamin Kramer	207cf71fa9	Revert "[OpenMP] Add Passing in Original Declaration Names To Mapper API" This reverts commit `d981c7b758` and `a87d7b3d44`. Test fails under msan.	2020-10-28 13:58:14 +01:00
Joseph Huber	a87d7b3d44	[OpenMP] Add Passing in Original Declaration Names To Mapper API Summary: This patch adds support for passing in the original declaration name in the source file to the libomptarget runtime. This will allow the runtime to provide more intelligent debugging messages. This patch takes the original expression parsed from the OpenMP map / update clause and provides a textual representation if it was explicitly mapped, otherwise it takes the name of the variable declaration as a fallback. The information is passed to the runtime in a global array of strings that matches the existing ident_t source location strings using ";name;filename;column;row;;". See clang/test/OpenMP/target_map_names.cpp for an example of the generated output for a given map clause. Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D89802	2020-10-27 16:09:19 -04:00
Giorgis Georgakoudis	3a6bfcf2f9	[OpenMPOpt] Merge parallel regions There are cases where generated OpenMP code consists of multiple, consecutive OpenMP parallel regions, either because high-level programming models, such as RAJA or Kokkos, are lowered to OpenMP code, or simply because the programmer parallelized code this way. This optimization merges consecutive parallel OpenMP regions to: (1) reduce the runtime overhead of re-activating a team of threads; (2) enlarge the scope for other OpenMP optimizations, e.g., runtime call deduplication and synchronization elimination. This implementation defensively merges parallel regions, only when they are within the same BB and any in-between instructions are safe to execute in parallel. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D83635	2020-10-09 09:59:04 -07:00
Joseph Huber	82453e759c	[OpenMP] Add Missing Runtime Call for Globalization Remarks Summary: Add a missing runtime call to perform data globalization checks. Reviewers: jdoerfert Subscribers: guansong hiraditya llvm-commits sstefan1 yaxunl Tags: #LLVM #OpenMP Differential Revision: https://reviews.llvm.org/D88621	2020-10-01 21:19:53 -04:00
sstefan1	cb9cfa0d2f	[OpenMPOpt][Fix] Only initialize ICV initial values once. Reviewers: jdoerfert, ggeorgakoudis Differential Revision: https://reviews.llvm.org/D88441	2020-09-29 12:22:58 +02:00
Joseph Huber	a22814194e	[OpenMP] OpenMPOpt Support for Globalization Remarks Summary: This patch adds support for printing analysis messages relating to data globalization on the GPU. This occurs when data is shared between the threads in a GPU context and must be pushed to global or shared memory. Reviewers: jdoerfert Subscribers: guansong hiraditya llvm-commits ormris sstefan1 yaxunl Tags: #OpenMP #LLVM Differential Revision: https://reviews.llvm.org/D88243	2020-09-24 18:23:12 -04:00
Hamilton Tobon Mosquera	bd31abc1d0	[OpenMPOpt] Refactored "issue" and "wait" declarations for data map runtime call. Refactored __tgt_target_data_begin_mapper_<issue|wait> to receive the handle as an input/output argument. This was prompted by the compiler warning about returning the handle by copy. Differential Revision: https://reviews.llvm.org/D88029	2020-09-22 10:50:17 -05:00
Wei Wang	4eef14f978	[OpenMPOpt] Assume indirect call always changes ICV When checking call sites, give special handling to indirect calls, as the callee may be unknown and can lead to a nullptr dereference later. Assume conservatively that the ICV always changes in such a case. Reviewed By: sstefan1 Differential Revision: https://reviews.llvm.org/D87104	2020-09-04 09:05:32 -07:00
Hamilton Tobon Mosquera	1d3d9b9cd8	[OpenMPOpt][NFC] Moving constants as struct static attributes	2020-08-31 19:05:00 -05:00
Hamilton Tobon Mosquera	8931add617	[OpenMPOpt][HideMemTransfersLatency] Get values stored in offload arrays getValuesInOffloadArrays goes through the offload arrays in __tgt_target_data_begin_mapper getting the values stored in them before the call is issued. call void @__tgt_target_data_begin_mapper(arg0, arg1, i8** %offload_baseptrs, i8** %offload_ptrs, i64* %offload_sizes, ...) Differential Revision: https://reviews.llvm.org/D86300	2020-08-31 15:33:05 -05:00
sstefan1	5dfd7cc46c	Reland [OpenMPOpt] ICV tracking for calls The problem with module slice has been addressed in D86319. Introduce two new AAs. AAICVTrackerFunctionReturned which checks if a function can have a unique ICV value after it is finished, and AAICVCallSiteReturned which checks AAICVTrackerFunctionReturned for a call site. This enables us to check the value of a call and if it changes the ICV. This also changes the approach in `getReplacementValues()` to a worklist-based approach so we can explore all relevant BBs. Differential Revision: https://reviews.llvm.org/D85544	2020-08-30 11:27:48 +02:00
sstefan1	8d8ce85b23	[Attributor] Introduce module slice. Summary: The module slice describes which functions we can analyze and transform while working on an SCC as part of the Attributor-CGSCC pass. So far we simply restricted it to the SCC. Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D86319	2020-08-30 10:30:44 +02:00
serge-sans-paille	4e29d25669	Fix OpenMP deduplicateRuntimeCalls return status Differential Revision: https://reviews.llvm.org/D86705	2020-08-27 15:01:04 +02:00
Johannes Doerfert	1de70a724e	Revert "[OpenMPOpt] ICV tracking for calls" This commit breaks certain OpenMP codes (on power) because it expanded the Attributor scope without telling the Attributor about the SCC extent. See: https://reviews.llvm.org/D85544#2227611 This reverts commit `b0b32e6490`.	2020-08-20 00:00:35 -05:00
Hamilton Tobon Mosquera	bd2fa1819b	[OpenMPOpt][HideMemTransfersLatency] Moving the 'wait' counterpart of __tgt_target_data_begin_mapper canBeMovedDownwards checks if the "wait" counterpart of __tgt_target_data_begin_mapper can be moved downwards, returning a pointer to the instruction that might require/modify the data transferred, and returning null if the movement is not possible or not worth it. The function splitTargetDataBeginRTC receives that returned instruction and instead of moving the "wait" it creates it at that point. Differential Revision: https://reviews.llvm.org/D86155	2020-08-19 11:42:22 -05:00
sstefan1	b0b32e6490	[OpenMPOpt] ICV tracking for calls Introduce two new AAs. AAICVTrackerFunctionReturned which checks if a function can have a unique ICV value after it is finished, and AAICVCallSiteReturned which checks AAICVTrackerFunctionReturned for a call site. This enables us to check the value of a call and if it changes the ICV. This also changes the approach in `getReplacementValues()` to a worklist-based approach so we can explore all relevant BBs. Differential Revision: https://reviews.llvm.org/D85544	2020-08-19 11:43:12 +02:00
Hamilton Tobon Mosquera	496f8e5b36	[OpenMPOpt][HideMemTransfersLatency] Split __tgt_target_data_begin_mapper into its "issue" and "wait" counterparts. WIP that tries to hide the latency of runtime calls that involve host to device memory transfers by splitting them into their "issue" and "wait" versions. The "issue" is moved upwards as much as possible. The "wait" is moved downwards as much as possible. The "issue" issues the memory transfer asynchronously, returning a handle. The "wait" waits on the returned handle for the memory transfer to finish. The movement itself is not implemented yet.	2020-08-17 20:56:10 -05:00
Roman Lebedev	351d234d86	[OpenMPOpt] Most SCC's are uninteresting, don't waste time on them (up to 16x faster) Summary: This seems obvious in hindsight, but the result is surprising. I've measured compile-time of `-openmpopt` pass standalone on RawSpeed unity build, and while there is some OpenMP stuff, most is not OpenMP. But nonetheless the pass does a lot of costly preparations before ever trying to look for OpenMP stuff in SCC. Numbers (n=25): 0.094624s -> 0.005976s, a 93.68% improvement, or ~16x Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: yaxunl, hiraditya, guansong, llvm-commits, sstefan1 Tags: #llvm Differential Revision: https://reviews.llvm.org/D84689	2020-07-27 23:36:34 +03:00
Giorgis Georgakoudis	694ded37b9	[OpenMPOpt] Fix preserved analyses return	2020-07-14 23:18:43 -07:00
Johannes Doerfert	fec1f2109f	[OpenMP] Emit remarks during GPU state machine optimization Since D83271 we can optimize the GPU state machine to avoid spurious call edges that increase the register usage of kernels. With this patch we inform the user whether this optimization is happening, and why or why not. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D83707	2020-07-14 22:33:57 -05:00
Luofan Chen	233af8958e	[Attributor] Create getter function for the ID of the abstract attribute Summary: The `getIdAddr()` function returns the address of the ID of the abstract attribute Reviewers: jdoerfert, sstefan1, uenoku, homerdin, baziotis Reviewed By: jdoerfert Subscribers: okura, hiraditya, uenoku, kuter, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83172	2020-07-15 09:55:18 +08:00
Michael Liao	81db614411	Fix `-Wunused-variable` warnings. NFC.	2020-07-11 10:09:44 -04:00
Johannes Doerfert	dce6bc18c4	[OpenMP][FIX] remove unused variable and long if-else chain MSVC throws an error if you use "too many" if-else in a row: `Frontend/OpenMP/OMPKinds.def(570): fatal error C1061: compiler limit: blocks nested too deeply` We work around it now...	2020-07-11 02:37:57 -05:00
Mehdi Amini	c44702bcdf	Remove unused variable `KMPC_KERNEL_PARALLEL_WORK_FN_PTR_ARG_NO` (NFC) This fixes a compiler warning.	2020-07-11 07:17:28 +00:00
Johannes Doerfert	5b0581aedc	[OpenMP] Replace function pointer uses in GPU state machine In non-SPMD mode we create a state machine like code to identify the parallel region the GPU worker threads should execute next. The identification uses the parallel region function pointer as that allows it to work even if the kernel (=target region) and the parallel region are in separate TUs. However, taking the address of a function comes with various downsides. With this patch we will identify the most common situation and replace the function pointer use with a dummy global symbol (for identification purposes only). That means, if the parallel region is only called from a single target region (or kernel), we do not use the function pointer of the parallel region to identify it but a new global symbol. Fixes PR46450. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D83271	2020-07-11 01:44:00 -05:00
Johannes Doerfert	624d34afff	[OpenMP] Compute a proper module slice for the CGSCC pass The module slice describes which functions we can analyze and transform while working on an SCC as part of the CGSCC OpenMPOpt pass. So far, we simply restricted it to the SCC. In a follow up we will need to have a bigger scope which is why this patch introduces a proper identification of the module slice. In short, everything that has a transitive reference to a function in the SCC or is transitively referenced by one is fair game. Reviewed By: sstefan1 Differential Revision: https://reviews.llvm.org/D83270	2020-07-11 01:44:00 -05:00
Johannes Doerfert	e8039ad4de	[OpenMP] Identify GPU kernels (aka. OpenMP target regions) We now identify GPU kernels, that is entry points into the GPU code. These kernels (can) correspond to OpenMP target regions. With this patch we identify and on request print them via remarks. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D83269	2020-07-11 01:44:00 -05:00
Johannes Doerfert	54bd3751ce	[OpenMP][NFC] Add convenient helper and early exit check	2020-07-11 00:51:51 -05:00
Johannes Doerfert	b726c55709	[OpenMP][NFC] Fix some typos	2020-07-11 00:51:51 -05:00
sstefan1	b8235d2bd8	Reland "[OpenMPOpt] ICV Tracking" This reverts commit `1d542f0ca8`. `recollectUses()` is added to prevent looking at dead uses after Attributor run. This is the first and most basic ICV Tracking implementation. For this first version, we only support deduplication within the same BB. Reviewers: jdoerfert, JonChesterfield, hamax97, jhuber6, uenoku, baziotis, lebedev.ri Differential Revision: https://reviews.llvm.org/D81788	2020-07-11 02:25:57 +02:00
Roman Lebedev	1d542f0ca8	Revert "[OpenMPOpt] ICV Tracking" There appears to be some kind of memory corruption/use-after-free/etc going on here. In particular, in `OpenMPOpt::deleteParallelRegions()`, in `DeleteCallCB()`, `CI` is garbage. Will post reproducer in the original review. This reverts commit `6c4a5e9257`.	2020-07-10 19:00:15 +03:00
sstefan1	6aab27ba85	[OpenMPIRBuilder][Fix] Move llvm::omp::types to OpenMPIRBuilder. Summary: D82193 exposed a problem with global type definitions in `OMPConstants.h`. This causes a race when running in thinLTO mode. Types now live inside of OpenMPIRBuilder to prevent this from happening. Reviewers: jdoerfert Subscribers: yaxunl, hiraditya, guansong, dexonsmith, aaron.ballman, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D83176	2020-07-08 17:23:55 +02:00
sstefan1	6c4a5e9257	[OpenMPOpt] ICV Tracking This is the first and most basic ICV Tracking implementation. For this first version, we only support deduplication within the same BB. Reviewers: jdoerfert, JonChesterfield, hamax97, jhuber6, uenoku, baziotis Differential Revision: https://reviews.llvm.org/D81788	2020-07-04 23:31:50 +02:00
sstefan1	61238d2690	[OpenMPOpt][Fix] Remove double initialization of omp::types.	2020-07-02 19:51:54 +02:00
sstefan1	951e43f357	[OpenMPOpt][NFC] Change ICV macros for initial value This fixes build breaks when system headers are defining FALSE.	2020-06-26 15:34:43 +00:00
sstefan1	0f426935bb	[OpenMPOpt] ICV macro definitions Summary: This defines some basic information about ICVs in `OMPKinds.def`. We also emit remarks with initial values for each function (which are default for now) as a way to test this. Reviewers: jdoerfert, JonChesterfield, hamax97, jhuber6 Subscribers: yaxunl, hiraditya, guansong, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82193	2020-06-24 13:43:35 +02:00
Mehdi Amini	77b79d79c0	Remove "unused" member ModuleSlice from `struct OpenMPOpt` This is fixing warning from clang: warning: private field 'ModuleSlice' is not used [-Wunused-private-field] SmallPtrSetImpl<Function *> &ModuleSlice; ^ Differential Revision: https://reviews.llvm.org/D82027	2020-06-18 03:02:26 +00:00
Eric Christopher	a8dad30388	Revert "Remove unused class variable ModuleSlice." as it was used in debug only code. This reverts commit `07a1749081`.	2020-06-17 14:45:17 -07:00

171 Commits