Commit Graph

32 Commits

Author SHA1 Message Date
Joseph Huber 754eb1c210 [OpenMP] Change `__kmpc_free_shared` to include the paired allocation size
This patch changes `__kmpc_free_shared` to take an additional argument
corresponding to the associated allocation's size. This makes it easier to
implement the allocator in the runtime.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106496
2021-07-21 20:56:21 -04:00
Giorgis Georgakoudis fb0cf01795 Revert "[OpenMP] Codegen aggregate for outlined function captures"
This reverts commit e9c7291cb2.

Fix failing tests
2021-07-19 07:54:26 -07:00
Giorgis Georgakoudis e9c7291cb2 [OpenMP] Codegen aggregate for outlined function captures
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3)  forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102107
2021-07-16 23:27:44 -07:00
Johannes Doerfert e2cfbfcc0c [OpenMP] Unified entry point for SPMD & generic kernels in the device RTL
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow up patch.

The "surplus" threads of the "master warp" will not exit early anymore
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but also require these non-aligned barriers.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

This was in parts extracted from D59319.

Reviewed By: ABataev, JonChesterfield

Differential Revision: https://reviews.llvm.org/D101976
2021-07-10 17:53:56 -05:00
Joseph Huber 68d133a3e8 [OpenMP] Simplify GPU memory globalization
Summary:
Memory globalization is required to maintain OpenMP standard semantics for data sharing between
worker and master threads. The GPU cannot share data between its threads so must allocate global or
shared memory to store the data in. Currently this is implemented fully in the frontend using the
`__kmpc_data_sharing_push_stack` and __kmpc_data_sharing_pop_stack` functions to emulate standard
CPU stack sharing. The front-end scans the target region for variables that escape the region and
must be shared between the threads. Each variable then has a field created for it in a global record
type.

This patch replaces this functinality with a single allocation command, effectively mimicing an
alloca instruction for the variables that must be shared between the threads. This will be much
slower than the current solution, but makes it much easier to optimize as we can analyze each
variable independently and determine if it is not captured. In the future, we can replace these
calls with an `alloca` and small allocations can be pushed to shared memory.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D97680
2021-06-22 10:52:46 -04:00
Giorgis Georgakoudis 207b08a913 [OpenMP][NFC] Refactor Clang OpenMP tests using update_cc_test_checks
This patch refactors a subset of Clang OpenMP tests, generating checklines using the update_cc_test_checks script. This refactoring facilitates updating the Clang OpenMP code generation codebase by automating test generation.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D101849
2021-05-05 20:08:38 -07:00
Giorgis Georgakoudis 78a7d8c4dd [Utils][NFC] Rename replace-function-regex in update_cc_test_checks
This patch renames the replace-function-regex to replace-value-regex to indicate that the existing regex replacement functionality can replace any IR value besides functions.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D101934
2021-05-05 14:19:30 -07:00
Giorgis Georgakoudis f016c06abb Revert "[OpenMP][NFC] Refactor Clang OpenMP tests using update_cc_test_checks"
This reverts commit 956cae2f09.
2021-05-04 17:12:32 -07:00
Giorgis Georgakoudis 956cae2f09 [OpenMP][NFC] Refactor Clang OpenMP tests using update_cc_test_checks
This patch refactors a subset of Clang OpenMP tests, generating checklines using the update_cc_test_checks script. This refactoring facilitates updating the Clang OpenMP code generation codebase by automating test generation.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D101849
2021-05-04 16:58:45 -07:00
Giorgis Georgakoudis a2dbfb6b72 [OpenMP] Simplify offloading parallel call codegen
This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target, host-side parallel regions), data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions.

Reviewed By: jdoerfert, Meinersbur

Differential Revision: https://reviews.llvm.org/D95976
2021-04-21 18:46:07 -07:00
JonChesterfield 5d02ca49a2 [libomptarget][nvptx] Undef, weak shared variables
[libomptarget][nvptx] Undef, weak shared variables

Shared variables on nvptx, and LDS on amdgcn, are uninitialized at
the start of kernel execution. Therefore create the variables with
undef instead of zeros, motivated in part by the amdgcn back end
rejecting LDS+initializer.

Common is zero initialized, which seems incompatible with shared. Thus
change them to weak, following the direction of
https://reviews.llvm.org/rG7b3eabdcd215

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D90248
2020-10-28 14:25:36 +00:00
Alexey Bataev 32ea3397be [OPENMP]Dynamic globalization for parallel target regions.
Summary:
Added support for dynamic memory allocation for globalized variables in
case if execution of target regions in parallel is required.

Reviewers: jdoerfert

Subscribers: jholewinski, yaxunl, guansong, sstefan1, cfe-commits, caomhin

Tags: #clang

Differential Revision: https://reviews.llvm.org/D82324
2020-06-25 08:25:24 -04:00
Eli Friedman 62f3ef2b53 [CGCall] Annotate references with "align" attribute.
If we're going to assume references are dereferenceable, we should also
assume they're aligned: otherwise, we can't actually dereference them.

See also D80072.

Differential Revision: https://reviews.llvm.org/D80166
2020-05-19 20:21:30 -07:00
Alexey Bataev f89cf21337 [OPENMP]Use different addresses for zeroed thread_id/bound_id.
When the parallel region is called directly in the sequential region,
the zeroed tid/bound id are used. But they must point to the different
memory locations as the parameters are marked as noalias.

llvm-svn: 375017
2019-10-16 16:59:01 +00:00
Tim Northover a009a60a91 IR: print value numbers for unnamed function arguments
For consistency with normal instructions and clarity when reading IR,
it's best to print the %0, %1, ... names of function arguments in
definitions.

Also modifies the parser to accept IR in that form for obvious reasons.

llvm-svn: 367755
2019-08-03 14:28:34 +00:00
Alexey Bataev 8c5555c39a [OPENMP][NVPTX]Mark more functions as always_inline for better
performance.

Internally generated functions must be marked as always_inlines in most
cases. Patch marks some extra reduction function + outlined parallel
functions as always_inline for better performance, but only if the
optimization is requested.

llvm-svn: 361269
2019-05-21 15:11:58 +00:00
Alexey Bataev 7b3eabdcd2 [OPENMP][NVPTX]Fix PR40893: Size doesn't match for
'_openmp_teams_reductions_buffer_$_.

nvlink does not handle weak linkage correctly, same symbols with the
different sizes are reported as erroneous though the largest size must
be chosen instead. Patch fixes this problem by using Internal linkage
instead of the Common.

llvm-svn: 356072
2019-03-13 18:21:10 +00:00
Alexey Bataev 8061acd501 [OPENMP][NVPTX]Use faster teams reduction algorithm.
A faster way to reduce the values in teams reductions was found, the
codegen is updated to use this faster algorithm and new runtime functions.

llvm-svn: 354479
2019-02-20 16:36:22 +00:00
Alexey Bataev 25d3de8a0a [OPENMP][NVPTX]Reduce number of barriers in reductions.
After the fix for the syncthreads we don't need to generate extra
barriers for the parallel reductions.

llvm-svn: 350530
2019-01-07 15:45:09 +00:00
Alexey Bataev 8e009036c9 [OPENMP][NVPTX]Use new functions from the runtime library.
Updated codegen to use the new functions from the runtime library.

llvm-svn: 350415
2019-01-04 17:25:09 +00:00
Alexey Bataev 6a1b06bcd4 [OPENMP][NVPTX]Emit shared memory buffer for reduction as 128 bytes
buffer.

Seems to me, nvlink has a bug with the proper support of the weakly
linked symbols. It does not allow to define several shared memory buffer
with the different sizes even with the weak linkage. Instead we always
use 128 bytes buffer to prevent nvlink from the error message emission.

llvm-svn: 349540
2018-12-18 21:01:42 +00:00
Alexey Bataev ae51b96f99 [OPENMP][NVPTX]Improved interwarp copy function.
Inlined runtime with the current implementation of the interwarp copy
function leads to the undefined behavior because of the not quite
correct implementation of the barriers. Start using generic
__kmpc_barier function instead of the custom made barriers.

llvm-svn: 349192
2018-12-14 21:00:58 +00:00
Gheorghe-Teodor Bercea 2b40470c61 [OpenMP] Add a new version of the SPMD deinit kernel function
Summary: This patch adds a new runtime for the SPMD deinit kernel function which replaces the previous function. The new function takes as argument the flag which signals whether the runtime is required or not. This enables the compiler to optimize out the part of the deinit function which are not needed.

Reviewers: ABataev, caomhin

Reviewed By: ABataev

Subscribers: jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D54970

llvm-svn: 347915
2018-11-29 20:53:49 +00:00
Alexey Bataev a116602475 [OPENMP][NVPTX]Basic support for reductions across the teams.
Added basic codegen support for the reductions across the teams.

llvm-svn: 347715
2018-11-27 21:24:54 +00:00
Alexey Bataev f2f39be9ed [OPENMP][NVPTX]Emit correct reduction code for teams/parallel
reductions.

Fixed previously committed code for the reduction support in
teams/parallel constructs taking into account new design of the NVPTX
support in the compiler. Teams reduction are not fully functional yet,
it is going to be fixed in the following patches.

llvm-svn: 347081
2018-11-16 19:38:21 +00:00
Alexey Bataev 9ea3c38597 [OPENMP][NVPTX] Support memory coalescing for globalized variables.
Added support for memory coalescing for better performance for
globalized variables. From now on all the globalized variables are
represented as arrays of 32 elements and each thread accesses these
elements using `tid & 31` as index.

llvm-svn: 344049
2018-10-09 14:49:00 +00:00
Alexey Bataev 12c62908b5 [OPENMP, NVPTX] Fix reduction of the big data types/structures.
If the shuffle is required for the reduced structures/big data type,
current code may cause compiler crash because of the loading of the
aggregate values. Patch fixes this problem.

llvm-svn: 335377
2018-06-22 19:10:38 +00:00
Alexey Bataev 9ff8083d98 [OPENMP] General code improvements.
llvm-svn: 330154
2018-04-16 20:16:21 +00:00
Alexey Bataev e290ec02c7 [OPENMP, NVPTX] Fix codegen for the teams reduction.
Added NUW flags for all the add|mul|sub operations + replaced sdiv by udiv
as we operate on unsigned values only (addresses, converted to integers)

llvm-svn: 329411
2018-04-06 16:03:36 +00:00
Alexey Bataev b2575930b3 [OPENMP] Fix casting in NVPTX support library.
If the reduction required shuffle in the NVPTX codegen, we may need to
cast the reduced value to the integer type. This casting was implemented
incorrectly and may cause compiler crash. Patch fixes this problem.

llvm-svn: 321818
2018-01-04 20:18:55 +00:00
Jonas Hahnfeld 891c7fb19d [OpenMP] Adjust arguments of nvptx runtime functions
In the future the compiler will analyze whether the OpenMP
runtime needs to be (fully) initialized and avoid that overhead
if possible. The functions already take an argument to transfer
that information to the runtime, so pass in the default value 1.
(This is needed for binary compatibility with libomptarget-nvptx
currently being upstreamed.)

Differential Revision: https://reviews.llvm.org/D40354

llvm-svn: 318836
2017-11-22 14:46:49 +00:00
Arpith Chacko Jacob fc711b1f47 [OpenMP] Teams reduction on the NVPTX device.
This patch implements codegen for the reduction clause on
any teams construct for elementary data types.  It builds
on parallel reductions on the GPU.  Subsequently,
the team master writes to a unique location in a global
memory scratchpad.  The last team to do so loads and
reduces this array to calculate the final result.

This patch emits two helper functions that are used by
the OpenMP runtime on the GPU to perform reductions across
teams.

Patch by Tian Jin in collaboration with Arpith Jacob

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29879

llvm-svn: 295335
2017-02-16 16:48:49 +00:00