Commit Graph

902 Commits

Author SHA1 Message Date
Peyton, Jonathan L f7655f3df3 [OpenMP] Fix improper printf format specifier 2021-06-02 11:04:48 -05:00
Hansang Bae 7ba4e96ede [OpenMP] Use new task type/flag for taskwait depend events.
Differential Revision: https://reviews.llvm.org/D103464
2021-06-02 10:16:38 -05:00
Peyton, Jonathan L 2020c981fa [OpenMP] Add L2-Tile equivalence for KNL
When on KNL and L2 or Tile layer is detected, manually add
the corresponding layer which is equivalent.

Differential Revision: https://reviews.llvm.org/D102865
2021-06-01 14:17:13 -05:00
Hansang Bae cf5c94ef08 [OpenMP] Define named constants for interop's foreign runtime ID
Also added missing Fortran definitions for interop support.

Differential Revision: https://reviews.llvm.org/D102883
2021-06-01 13:06:59 -05:00
Hansang Bae 95cefacfe1 [OpenMP] Fix crashing critical section with hint clause
Runtime was using the default lock type without using the hint.

Differential Revision: https://reviews.llvm.org/D102955
2021-05-24 17:25:01 -05:00
AndreyChurbanov aa6e7e8da8 [OpenMP] libomp: move warnings to after library initialization
Warnings on deprecated api cannot be suppressed if the library is not initialized.
With this change it is possible to set KMP_WARNINGS=false to suppress the warnings.

Differential Revision: https://reviews.llvm.org/D102676
2021-05-21 23:47:23 +03:00
Christopher Pulido 4fb0aaf033 [OpenMP] Changes to enable MSVC ARM64 build of libomp
This is the first in a series of changes to the OpenMP runtime
that have been done internally by Microsoft. This patch makes
the necessary changes to enable libomp.dll to build with
the MSVC compiler targeting ARM64.

Differential Revision: https://reviews.llvm.org/D101173
2021-05-11 23:03:12 +03:00
Peyton, Jonathan L c765d140fe [OpenMP] Fix hidden helper + affinity
When KMP_AFFINITY is set, each worker thread's gtid value is used as an
index into the place list to determine the thread's placement. With hidden
helpers enabled, this gtid value is shifted down leading to unexpected
shifted thread placement. This patch restores the previous behavior by
adjusting the mask index to take the number of hidden helper threads
into account.

Hidden helper threads are given the full initial mask and do not
participate in any of the other affinity mechanisms (place partitioning,
balanced affinity). Their affinity is only printed for debug builds.

Differential Revision: https://reviews.llvm.org/D101882
2021-05-11 08:54:22 -05:00
Peyton, Jonathan L 9982f33e2c [OpenMP] Refactor/Rework topology discovery code
This patch does the following:

1) Introduce kmp_topology_t as the runtime-friendly structure (the
corresponding global variable is __kmp_topology) to determine the
exact machine topology which can vary widely among current and future
architectures. The current design is not easy to expand beyond the assumed
three layer topology: sockets, cores, and threads so a rework capable of
using the existing KMP_AFFINITY mechanisms is required.

This new topology structure has:
* The depth and types of the topology
* Ratio count for each consecutive level (e.g., number of cores per
   socket, number of threads per core)
* Absolute count for each level (e.g., 2 sockets, 16 cores, 32 threads)
* Equivalent topology layer map (e.g., Numa domain is equivalent to
   socket, L1/L2 cache equivalent to core)
* Whether it is uniform or not

The hardware threads are represented with the kmp_hw_thread_t
structure. This structure contains the ids (e.g., socket 0, core 1,
thread 0) and other information grabbed from the previous Address
structure. The kmp_topology_t structure contains an array of these.

2) Generalize the KMP_HW_SUBSET envirable for the new
kmp_topology_t structure. The algorithm doesn't assume any order with
tiles,numa domains,sockets,cores,threads. Instead it just parses the
envirable, makes sure it is consistent with the detected topology
(including taking into account equivalent layers) and then trims away
the unneeded subset of hardware threads. To enable this, a new
kmp_hw_subset_t structure is introduced which contains a vector of
items (hardware type, number user wants, offset). Any keyword within
__kmp_hw_get_keyword() can be used as a name and can be shortened as
well. e.g.,
KMP_HW_SUBSET=1s,2numa,4tile,2c,3t can be used on the KNL SNC-4 machine.

3) Simplify topology detection functions so they only do the singular
task of detecting the machine's topology. Printing, and all
canonicalizing functionality is now done afterwards. So many lines of
duplicated code are eliminated.

4) Add new ll_caches and numa_domains to OMP_PLACES, and
consequently, KMP_AFFINITY's granularity setting. All the names within
__kmp_hw_get_keyword() are available for use in OMP_PLACES or
KMP_AFFINITY's granularity setting.

5) Simplify and future-proof code where explicit lists of allowed
affinity settings keywords inside if() conditions.

6) Add x86 CPUID leaf 4 cache detection to existing x2apic id method
so equivalent caches could be detected (in particular for the ll_caches
place).

Differential Revision: https://reviews.llvm.org/D100997
2021-05-03 18:00:24 -05:00
Martin Storsjö 01d27fc408 [OpenMP] Fix warnings due to redundant semicolons. NFC. 2021-05-02 21:51:06 +03:00
Peyton, Jonathan L 4457565757 [OpenMP] Implement GOMP task reductions
Implement the remaining GOMP_* functions to support task reductions
in taskgroup, parallel, loop, and taskloop constructs.  The unused mem
argument to many of the work-sharing constructs has to do with the
scan() directive/ inscan() modifier.  If mem is set, each function
will call KMP_FATAL() and tell the user scan/inscan is unsupported.  The
GOMP reduction implementation is kept separate from our implementation
because of how GOMP presents reduction data and computes the reductions.
GOMP expects the privatized copies to be present even after a #pragma
omp parallel reduction(task:...) region has ended so the data is stored
inside GOMP's uintptr_t* data pseudo-structure.  This style is tightly
coupled with GCC compiler codegen.  There also isn't any init(),
combiner(), fini() functions in GOMP's codegen so the two
implementations were to disparate to try to wrap GOMP's around our own.

Differential Revision: https://reviews.llvm.org/D98806
2021-04-16 16:36:31 -05:00
Peyton, Jonathan L 5ebbb366c4 [OpenMP] Allow affinity to re-detect for child processes
Current atfork() handler for child processes does not reset
the affinity masks array which prevents users from setting their own
affinity in child processes.

Differential Revision: https://reviews.llvm.org/D99218
2021-04-16 16:34:02 -05:00
Hansang Bae 9b98497b44 [OpenMP] Add omp_target_is_accessible() to header files
-- Added omp_target_is_accessible to the header files
-- Added missing const qualifier to device memory routines

Differential Revision: https://reviews.llvm.org/D100420
2021-04-16 07:54:15 -05:00
Hansang Bae 77dc7b4653 [OpenMP] Fix printing routine for OMP_TOOL_VERBOSE_INIT
Also fixed typo in the verbose message.

Differential Revision: https://reviews.llvm.org/D100414
2021-04-14 07:55:26 -05:00
Hansang Bae 3da61ddae7 [OpenMP] Define omp_is_initial_device() variants in omp.h
omp_is_initial_device() is marked as a built-in function in the current
compiler, and user code guarded by this call may be optimized away,
resulting in undesired behavior in some cases. This patch provides a
possible fix for such cases by defining the routine as a variant
function and removing it from builtin list.

Differential Revision: https://reviews.llvm.org/D99447
2021-04-06 16:58:01 -05:00
Peyton, Jonathan L 2aebb7cb3c [OpenMP] Fix incorrect KMP_STRLEN() macro
The second argument to the strnlen_s(str, size) function should be
sizeof(str) when str is a true array of characters with known size
(instead of just a char*). Use type traits to determine if first
parameter is a character array and use the correct size based on that
trait.

Differential Revision: https://reviews.llvm.org/D98209
2021-04-05 09:03:09 -05:00
Hansang Bae 467f39249d [OpenMP] Misc. changes that add or remove pointer/bound checks
-- Added or moved checks to appropriate places.
-- Removed ineffective null check where the pointer is already being
   dereferenced around the code.
-- Initialized variables that can be used without definitions.
-- Added call to dlclose/FreeLibrary in OMPT tool activation.
-- Added a new build compiler definition.

Differential Revision: https://reviews.llvm.org/D98584
2021-03-23 18:55:08 -05:00
Shilei Tian 2df65f87c1 [OpenMP] Fixed a crash in hidden helper thread
It is reported that after enabling hidden helper thread, the program
can hit the assertion `new_gtid < __kmp_threads_capacity` sometimes. The root
cause is explained as follows. Let's say the default `__kmp_threads_capacity` is
`N`. If hidden helper thread is enabled, `__kmp_threads_capacity` will be offset
to `N+8` by default. If the number of threads we need exceeds `N+8`, e.g. via
`num_threads` clause, we need to expand `__kmp_threads`. In
`__kmp_expand_threads`, the expansion starts from `__kmp_threads_capacity`, and
repeatedly doubling it until the new capacity meets the requirement. Let's
assume the new requirement is `Y`.  If `Y` happens to meet the constraint
`(N+8)*2^X=Y` where `X` is the number of iterations, the new capacity is not
enough because we have 8 slots for hidden helper threads.

Here is an example.
```
#include <vector>

int main(int argc, char *argv[]) {
  constexpr const size_t N = 1344;
  std::vector<int> data(N);

#pragma omp parallel for
  for (unsigned i = 0; i < N; ++i) {
    data[i] = i;
  }

#pragma omp parallel for num_threads(N)
  for (unsigned i = 0; i < N; ++i) {
    data[i] += i;
  }

  return 0;
}
```
My CPU is 20C40T, then `__kmp_threads_capacity` is 160. After offset,
`__kmp_threads_capacity` becomes 168. `1344 = (160+8)*2^3`, then the assertions
hit.

Reviewed By: protze.joachim

Differential Revision: https://reviews.llvm.org/D98838
2021-03-18 18:25:36 -04:00
Hansang Bae a6f9cb6adc [OpenMP] Add runtime interface for OpenMP 5.1 error directive
The proposed new interface is for supporting `at(execution)` clause in the
error directive.

Differential Revision: https://reviews.llvm.org/D98448
2021-03-16 08:55:25 -05:00
Peyton, Jonathan L 7085f04573 [OpenMP] Remove unused cpu_stackoffset member 2021-03-15 16:52:04 -05:00
AndreyChurbanov aaf16b80dd [OpenMP] libomp: eliminate pause from atomic CAS loops
For clang this change is NFC cleanup, because clang
never calls atomic functions from runtime library.

Basically, pause is good in spin-loops waiting for something.
Atomic CAS loops do not wait for anything,
each CAS failure means some other thread progressed.

Performance experiments show that the pause only causes unnecessary slowdown
on CPUs with slow pause instruction, no difference on CPUs with fast pause
instruction, removal of the pause gives lesser binary size which is good.

Differential Revision: https://reviews.llvm.org/D97079
2021-03-09 18:30:08 +03:00
AndreyChurbanov e4492b6f31 [OpenMP] NFC: temporarily disable assertion until the bug with dependences is fixed 2021-03-08 22:18:30 +03:00
Peyton, Jonathan L e2738b3758 [OpenMP] Fix potential integer overflow in dynamic schedule code
Restrict the chunk_size * chunk_num to only occur for valid
chunk_nums and reimplement calculating the limit to avoid overflow.

Differential Revision: https://reviews.llvm.org/D96747
2021-03-08 09:43:05 -06:00
tlwilmar 97d000cfc6 Added API for "masked" construct via two entrypoints: __kmpc_masked,
and __kmpc_end_masked. The "master" construct is deprecated. Changed
proc-bind keyword from "master" to "primary". Use of both master
construct and master as proc-bind keyword is still allowed, but
deprecated.

Remove references to "master" in comments and strings, and replace
with "primary" or "primary thread". Function names and variables were
not touched, nor were references to deprecated master construct. These
can be updated over time. No new code should refer to master.
2021-03-05 09:29:57 -06:00
Hansang Bae b6c2f538b2 [OpenMP] Add allocator support for target memory
This is a preview of allocator support for target memory that depends on the
offload runtime API which allocates memory as described below.

llvm_omp_target_alloc_host(size_t size, int device_num);
-- Returns non-migratable memory owned by host.
-- Memory is accessible by host and device(s).

llvm_omp_target_alloc_shared(size_t size, int device_num);
-- Returns migratable memory owned by host and device.
-- Memory is accessible by host and device.

llvm_omp_target_alloc_device(size_t size, int device_num);
-- Returns memory owned by device.
-- Memory is only accessible by device.

New memory space and predefined allocator names are
-- llvm_omp_target_host_mem_space
-- llvm_omp_target_shared_mem_space
-- llvm_omp_target_device_mem_space
-- llvm_omp_target_host_mem_alloc
-- llvm_omp_target_shared_mem_alloc
-- llvm_omp_target_device_mem_alloc

Differential Revision: https://reviews.llvm.org/D96669
2021-03-02 16:45:12 -06:00
AndreyChurbanov 1df6e58e55 [OpenMP] libomp minor cleanup
Cleanup changes:
- check value read from file;
- remove dead code;
- make unsigned variable to read hexadecimal number to;
- add debug assertion to check ref count.

Differential Revision: https://reviews.llvm.org/D96893
2021-02-26 00:44:51 +03:00
AndreyChurbanov 4932101177 [OpenMP] libomp: fix ittnotify stack stitching for teams construct
Stitching id could be overridden causing reference of destroyed object
when number of teams is 1. The patch separates stitching id store
location for teams and parallel nested in teams.

Differential Revision: https://reviews.llvm.org/D96562
2021-02-26 00:23:24 +03:00
Peyton, Jonathan L d12ae7db99 [OpenMP] Fix accidental addition of use omp_lib_kinds
Fortran header accidentally had use omp_lib_kinds added inside a
subroutine and function. This patch removes the lines.
2021-02-25 12:49:56 -06:00
Shilei Tian e5da63d5a9 [OpenMP] Fixed a crash when offloading to x86_64 with target nowait
PR#49334 reports a crash when offloading to x86_64 with `target nowait`,
which is caused by referencing a nullptr. The root cause of the issue is, when
pushing a hidden helper task in `__kmp_push_task`, it also maps the gtid to its
shadow gtid, which is wrong.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97329
2021-02-24 12:37:30 -05:00
Peyton, Jonathan L 56223b1e91 [OpenMP] Help static loop code avoid over/underflow
This code alleviates some pathological loop parameters (lower,
upper, stride) within calculations involved in the static loop code.  It
bounds the chunk size to the trip count if it is greater than the trip
count and also minimizes problematic code for when trip count < nth.

Differential Revision: https://reviews.llvm.org/D96426
2021-02-22 13:22:01 -06:00
Peyton, Jonathan L 1b968467c0 [OpenMP] Remove shutdown attempt on Windows process detach
Only attempt shutdown if lpReserved is NULL. The Windows documentation
states:
When handling DLL_PROCESS_DETACH, a DLL should free resources such as
heap memory only if the DLL is being unloaded dynamically (the
lpReserved parameter is NULL). If the process is terminating (the
lpReserved parameter is non-NULL), all threads in the process except the
current thread either have exited already or have been explicitly
terminated by a call to the ExitProcess function, which might leave some
process resources such as heaps in an inconsistent state. In this case,
it is not safe for the DLL to clean up the resources. Instead, the DLL
should allow the operating system to reclaim the memory.

Differential Revision: https://reviews.llvm.org/D96750
2021-02-22 13:15:06 -06:00
Peyton, Jonathan L 8c73be9d86 [OpenMP] Limit number of dispatch buffers
This patch limits the number of dispatch buffers (used for
loop worksharing construct) to between 1 and 4096.

Differential Revision: https://reviews.llvm.org/D96749
2021-02-22 13:14:28 -06:00
Peyton, Jonathan L 55dff8b2e4 [OpenMP] Update HWLOC code for die level detection
Differential Revision: https://reviews.llvm.org/D96748
2021-02-22 13:05:55 -06:00
AndreyChurbanov 1611e5473c [OpenMP] libomp: cleanup some resource leaks
Close mutexattr and condattr local objects to eliminate resource leaks.

Differential Revision: https://reviews.llvm.org/D96892
2021-02-20 23:27:37 +03:00
Shilei Tian 309b00a42e [OpenMP][NFC] clang-format the whole openmp project
Same script as D95318. Test files are excluded.

Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D97088
2021-02-20 12:46:32 -05:00
AndreyChurbanov cf1ddae7e3 [OpenMP][NFC] replaced 'dependencies' with 'dependences' in comments and debug prints 2021-02-18 00:38:18 +03:00
Martin Storsjö 16428a8d91 [OpenMP] Avoid warnings about unused static functions on windows
Add ifdefs around one function that only is used in unix build
configurations.

Add a void cast for a windows specific function that currently is
unused but may be intended to be used at some point.

Differential Revision: https://reviews.llvm.org/D96584
2021-02-12 21:55:31 +02:00
Martin Storsjö b388c84c09 [OpenMP] Remove two entirely unused variables
Differential Revision: https://reviews.llvm.org/D96583
2021-02-12 21:55:31 +02:00
Martin Storsjö b3d84790fa [OpenMP] Add void casts to silence unused variable warnings
These variables are used only in certain build configurations,
or marked with a todo comment indicating that they should be
used/checked/reported.

Differential Revision: https://reviews.llvm.org/D96582
2021-02-12 21:55:31 +02:00
Martin Storsjö 3f9519b768 [OpenMP] Only use #pragma comment(lib, ...) in MSVC build configurations
MinGW build configurations don't support this pragma (unless
compiling with clang, with -fms-extensions, and linking with
lld), and at least clang warns about it.

This library does end up linked by the cmake files anyway (as
long as the check works properly).

Differential Revision: https://reviews.llvm.org/D96581
2021-02-12 21:55:31 +02:00
AndreyChurbanov 838dcdb5fc [OpenMP] libomp: minor changes to improve library performance
Three minor changes in this patch:
- added UNLIKELY hint to few rarely executed branches;
- replaced couple of run time checks with debug assertions;
- moved check of presence of ittnotify tool from inside the function call.

Differential Revision: https://reviews.llvm.org/D95816
2021-02-12 00:43:13 +03:00
Hansang Bae ffb21e7f05 [OpenMP] Enable omp_get_num_devices() on Windows
This patch enables omp_get_num_devices() and omp_get_initial_device() on
Windows by providing an alternative to dlsym on Windows, and proposes to
add a new libomptarget entry, __tgt_get_num_devices().

Differential Revision: https://reviews.llvm.org/D96182
2021-02-11 14:53:48 -06:00
Nawrin Sultana 4692bb4a8a [OpenMP] Add lower and upper bound in num_teams clause
This patch adds lower-bound and upper-bound to num_teams clause
according to OpenMP 5.1 specification. The initial number of teams
created is implementation defined, but it will be greater than or
equal to lower-bound and less than or equal to upper-bound. If
num_teams clause is not specified, the number of teams created is
implementation defined, but it will be greater or equal to 1.

Differential Revision: https://reviews.llvm.org/D95820
2021-02-10 13:58:50 -06:00
Shilei Tian 3c31b78455 [OpenMP] Fixed an issue that taskwait doesn't work on detachable task
D77609 mistakenly changed the bebavior of task waiting on detachable task that a detachable task is not waited, based on https://lists.llvm.org/pipermail/openmp-dev/2021-February/003836.html. This patch fixed it. Thank Raúl for the report.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95798
2021-02-03 13:12:43 -05:00
Peyton, Jonathan L ffca74b8b8 [OpenMP] Fix sign comparison warnings from GCC
New affinity patch introduced legitimate sign-compare warnings that
clang doesn't report but GCC-10 does. This removes the warnings by
changing two variables types to unsigned.

Differential Revision: https://reviews.llvm.org/D95818
2021-02-02 10:52:16 -06:00
AndreyChurbanov d7b12004bd [OpenMP] libomp: implement nteams-var and teams-thread-limit-var ICVs
The change includes OMP_NUM_TEAMS, OMP_TEAMS_THREAD_LIMIT env variables,
omp_set_num_teams, omp_get_max_teams, omp_set_teams_thread_limit,
omp_get_teams_thread_limit routines.

Differential Revision: https://reviews.llvm.org/D95003
2021-02-01 22:54:11 +03:00
Jonathan Peyton 67773681c0 [OpenMP] Add environment variable to force monotonic dynamic scheduling
This patch introduces a new environment variable to force monotonic
behavior for users that absolutely need it.  This is in anticipation
of 5.0 change that uses non-monotonic behavior for dynamic scheduling
by default. Fixes for that and the actual switch are coming soon.

Differential Revision: https://reviews.llvm.org/D95263
2021-01-29 12:23:27 -06:00
AndreyChurbanov 7f5ad0e071 [OpenMP] libomp: fix build by cl with vs2019
Replace VLA with dynamic allocation using alloca().
This fixes https://bugs.llvm.org/show_bug.cgi?id=48919.

Differential Revision: https://reviews.llvm.org/D95627
2021-01-29 13:16:41 +03:00
Shilei Tian c571b16834 [OpenMP] Disabled profiling in `libomp` by default to unblock link errors
Link error occurred when time profiling in libomp is enabled by default
because `libomp` is assumed to be a C library but the dependence on
`libLLVMSupport` for profiling is a C++ library. Currently the issue blocks all
OpenMP tests in Phabricator.

This patch set a new CMake option `OPENMP_ENABLE_LIBOMP_PROFILING` to
enable/disable the feature. By default it is disabled. Note that once time
profiling is enabled for `libomp`, it becomes a C++ library.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95585
2021-01-28 07:24:32 -05:00
Peyton, Jonathan L 8e67134364 [OpenMP] Fix misleading warning for OMP_PLACES
When OMP_PLACES contains an invalid value, the warning informs the user
that the fallback is OMP_PLACES=threads, but the actual internal setting
is OMP_PLACES=cores and is detected as such with KMP_SETTINGS=1.
This patch informs the user that OMP_PLACES=cores is being used instead
of OMP_PLACES=threads.

Differential Revision: https://reviews.llvm.org/D95170
2021-01-27 14:27:24 -06:00
Peyton, Jonathan L 598c590b3c [OpenMP] Add cpuid leaf 1f topology discovery
This patch adds the new algorithm for topology discovery using cpuid
leaf 1f.  Only the new die level is detected and integrated into the
current affinity mechanisms including KMP_AFFINITY (granularity level
and compact/scatter algorithm), OMP_PLACES=dies, and KMP_HW_SUBSET.

Differential Revision: https://reviews.llvm.org/D95157
2021-01-27 14:27:23 -06:00
Peyton, Jonathan L 9f87c6b47d [OpenMP] Fix HWLOC topology detection for 2.0.x
HWLOC 2.0 has numa nodes as separate children and are not in the main
parent/child topology tree anymore.  This change takes this into
account.  The main topology detection loop in the create_hwloc_map()
routine starts at a hardware thread within the initial affinity mask and
goes up the topology tree setting the socket/core/thread labels
correctly.

This change also introduces some of the more generic changes that the
future kmp_topology_t structure will take advantage of including a
generic ratio & count array (finding all ratios of topology layers like
threads/core cores/socket and finding all counts of each topology
layer), generic radix1 reduction step, generic uniformity check, and
generic printing of topology (en_US.txt)

Differential Revision: https://reviews.llvm.org/D95156
2021-01-27 14:27:23 -06:00
Giorgis Georgakoudis bb40e67318 [OpenMP] Fix building using LLVM_ENABLE_RUNTIMES
Fix when time profiling is enabled.

Related to: D94855

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95398
2021-01-27 06:43:57 -08:00
AndreyChurbanov 498c4b6fc4 [OpenMP] libomp: fix build by clang-cl with vs2019
Problem reported by Joseph Shen <joseph.smeng@gmail.com>.
The patch changes *(&<atomic-var>) to (&<atomic-var>)->load().

Differential Revision: https://reviews.llvm.org/D95485
2021-01-27 12:18:15 +03:00
Nawrin Sultana 927af4b3c5 [OpenMP] Modify OMP_ALLOCATOR environment variable
This patch sets the def-allocator-var ICV based on the environment variables
provided in OMP_ALLOCATOR. Previously, only allowed value for OMP_ALLOCATOR
was a predefined memory allocator. OpenMP 5.1 specification allows predefined
memory allocator, predefined mem space, or predefined mem space with traits in
OMP_ALLOCATOR. If an allocator can not be created using the provided environment
variables, the def-allocator-var is set to omp_default_mem_alloc.

Differential Revision: https://reviews.llvm.org/D94985
2021-01-26 18:27:39 -06:00
Shilei Tian 9d64275ae0 [OpenMP] Added the support for hidden helper task in RTL
The basic design is to create an outer-most parallel team. It is not a regular team because it is only created when the first hidden helper task is encountered, and is only responsible for the execution of hidden helper tasks.  We first use `pthread_create` to create a new thread, let's call it the initial and also the main thread of the hidden helper team. This initial thread then initializes a new root, just like what RTL does in initialization. After that, it directly calls `__kmpc_fork_call`. It is like the initial thread encounters a parallel region. The wrapped function for this team is, for main thread, which is the initial thread that we create via `pthread_create` on Linux, waits on a condition variable. The condition variable can only be signaled when RTL is being destroyed. For other work threads, they just do nothing. The reason that main thread needs to wait there is, in current implementation, once the main thread finishes the wrapped function of this team, it starts to free the team which is not what we want.

Two environment variables, `LIBOMP_NUM_HIDDEN_HELPER_THREADS` and `LIBOMP_USE_HIDDEN_HELPER_TASK`, are also set to configure the number of threads and enable/disable this feature. By default, the number of hidden helper threads is 8.

Here are some open issues to be discussed:
1. The main thread goes to sleeping when the initialization is finished. As Andrey mentioned, we might need it to be awaken from time to time to do some stuffs. What kind of update/check should be put here?

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D77609
2021-01-25 22:16:17 -05:00
Hansang Bae 480cbed31e [OpenMP] Remove unnecessary pointer checks in a few locations
Also, return NULL from unsuccessful OMPT function lookup.

Differential Revision: https://reviews.llvm.org/D95277
2021-01-22 19:18:50 -06:00
Joseph Schuchart edbcc17b7a [OpenMP] libomp: properly initialize buckets in __kmp_dephash_extend
The buckets are initialized in __kmp_dephash_create but when they are extended
the memory is allocated but not NULL'd, potentially leaving some buckets
uninitialized after all entries have been copied into the new allocation.
This commit makes sure the buckets are properly initialized with NULL before
copying the entries.

Differential Revision: https://reviews.llvm.org/D95167
2021-01-22 20:29:46 +03:00
Giorgis Georgakoudis 6b7645dd31 [OpenMP] Add time profiling support in libomp
Profiling has been recently implemented in libomptarget (D93055). This patch enables time profiling support for libomptarget in libomp, to support profiling of multi-threaded execution of offloaded regions.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94855
2021-01-21 09:15:14 -08:00
Hansang Bae 2d911f7c72 [OpenMP] Fix atomic entries for captured logical operation
Added missing code for the captured atomic operation.

Differential Revision: https://reviews.llvm.org/D94848
2021-01-19 09:59:28 -06:00
AndreyChurbanov a60bc55c69 [OpenMP] libomp: cleanup parsing of OMP_ALLOCATOR env variable.
Differential Revision: https://reviews.llvm.org/D94932
2021-01-19 16:21:22 +03:00
Shilei Tian 9bf843bdc8 Revert "[OpenMP] Added the support for hidden helper task in RTL"
This reverts commit ed939f853d.
2021-01-18 06:57:52 -05:00
Shilei Tian ed939f853d [OpenMP] Added the support for hidden helper task in RTL
The basic design is to create an outer-most parallel team. It is not a regular team because it is only created when the first hidden helper task is encountered, and is only responsible for the execution of hidden helper tasks.  We first use `pthread_create` to create a new thread, let's call it the initial and also the main thread of the hidden helper team. This initial thread then initializes a new root, just like what RTL does in initialization. After that, it directly calls `__kmpc_fork_call`. It is like the initial thread encounters a parallel region. The wrapped function for this team is, for main thread, which is the initial thread that we create via `pthread_create` on Linux, waits on a condition variable. The condition variable can only be signaled when RTL is being destroyed. For other work threads, they just do nothing. The reason that main thread needs to wait there is, in current implementation, once the main thread finishes the wrapped function of this team, it starts to free the team which is not what we want.

Two environment variables, `LIBOMP_NUM_HIDDEN_HELPER_THREADS` and `LIBOMP_USE_HIDDEN_HELPER_TASK`, are also set to configure the number of threads and enable/disable this feature. By default, the number of hidden helper threads is 8.

Here are some open issues to be discussed:
1. The main thread goes to sleeping when the initialization is finished. As Andrey mentioned, we might need it to be awaken from time to time to do some stuffs. What kind of update/check should be put here?

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D77609
2021-01-16 14:13:35 -05:00
Terry Wilmarth 4fe17ada55 [OpenMP] Fix hierarchical barrier
Hierarchical barrier is an experimental barrier algorithm that uses aspects
of machine hierarchy to define the barrier tree structure. This patch fixes
offset calculation in hierarchical barrier. The offset is used to store info
on a flag about sleeping threads waiting on a location stored in the flag.
This commit also fixes a potential deadlock in hierarchical barrier when
using infinite blocktime by adjusting the offset value of leaf kids so that
it matches the value of leaf state. It also adds testing of default barriers
with infinite blocktime, and also tests hierarchical barrier algorithm with
both default and infinite blocktime.

Patch by Terry Wilmarth and Nawrin Sultana.

Differential Revision: https://reviews.llvm.org/D94241
2021-01-13 10:22:57 -06:00
Hansang Bae bba3a82b56 [OpenMP] Use persistent memory for omp_large_cap_mem
This change enables volatile use of persistent memory for omp_large_cap_mem*
on supported systems. It depends on libmemkind's support for persistent memory,
and requirements/details can be found at the following url.

https://pmem.io/2020/01/20/memkind-dax-kmem.html

Differential Revision: https://reviews.llvm.org/D94353
2021-01-12 20:35:27 -06:00
Hansang Bae 6f0f022038 [OpenMP] Update allocator trait key/value definitions
Use new definitions introduced in 5.1 specification.

Differential Revision: https://reviews.llvm.org/D94277
2021-01-12 20:09:45 -06:00
Shilei Tian 676c7cb0c0 [OpenMP] Added the support for cache line size 256 for A64FX
Fugaku supercomputer is built with the Fujitsu A64FX microprocessor, whose cache line is 256. In current libomp, we only have cache line size 128 for PPC64 and otherwise 64. This patch added the support of cache line 256 for A64FX. It's worth noting that although A64FX is a variant of AArch64, this property is not shared. As a result, in light of UCX source code (392443ab92/src/ucs/arch/aarch64/cpu.c (L17)), we can only determine by checking whether the CPU is FUJITSU A64FX.

Reviewed By: jdoerfert, Hahnfeld

Differential Revision: https://reviews.llvm.org/D93169
2021-01-09 11:58:47 -05:00
Hansang Bae fb1c528526 [OpenMP] Use c_int/c_size_t in Fortran target memory routine interface
The Fortran interface is now in line with 5.1 specification.

Differential Revision: https://reviews.llvm.org/D94042
2021-01-06 16:28:30 -06:00
Hansang Bae 82a29a62ab [OpenMP] Add definition/interface for target memory routines
The change includes new routines introduced in 5.1 and Fortran
interface.

Differential Revision: https://reviews.llvm.org/D93505
2021-01-04 08:12:57 -06:00
Terry Wilmarth 6b316febb4 [OpenMP] libomp: Handle implicit conversion warnings
This patch partially prepares the runtime source code to be built with
-Wconversion, which should trigger warnings if any implicit conversions
can possibly change a value. For builds done with icc or gcc, all such
warnings are handled in this patch. clang gives a much longer list of
warnings, particularly for sign conversions, which the other compilers
don't report. The -Wconversion flag is commented into cmake files, but
I'm not going to turn it on. If someone thinks it is important, and wants
to fix all the clang warnings, they are welcome to.

Types of changes made here involve either improving the consistency of types
used so that no conversion is needed, or else performing careful explicit
conversions, when we're sure a problem won't arise.

Patch is a combination of changes by Terry Wilmarth and Johnny Peyton.

Differential Revision: https://reviews.llvm.org/D92942
2020-12-31 00:39:57 +03:00
Hansang Bae e1fd202489 [OpenMP] Add definitions for 5.1 interop to omp.h 2020-12-17 13:03:59 -06:00
Peyton, Jonathan L 5aafdd7b88 [OpenMP] Introduce new file wrapper class for runtime
Introduce new kmp_safe_raii_file_t class with RAII semantics for file
open/close. It is essentially a wrapper around the C-style FILE* object.
This also unifies the way we error report if a file can't be opened.

Differential Revision: https://reviews.llvm.org/D92604
2020-12-15 14:46:30 -06:00
Hansang Bae 171ca93c54 [OpenMP] Initialize runtime in the forked child process
This patch enables serial initialization in the forked child process
to fix unstable runtime behavior when used with Python-based AI tools.

Differential Revision: https://reviews.llvm.org/D93230
2020-12-15 07:29:28 -06:00
Hansang Bae c3b5009aa7 [OpenMP] Use RTM lock for OMP lock with synchronization hint
This patch introduces a new RTM lock type based on spin lock which is
used for OMP lock with speculative hint on supported architecture.

Differential Revision: https://reviews.llvm.org/D92615
2020-12-09 19:14:53 -06:00
Nawrin Sultana 540007b427 [OpenMP] Add strict mode in num_tasks and grainsize
This patch adds new API __kmpc_taskloop_5 to accomadate strict
modifier (introduced in OpenMP 5.1) in num_tasks and grainsize
clause.

Differential Revision: https://reviews.llvm.org/D92352
2020-12-09 16:46:30 -06:00
Peyton, Jonathan L fe3b244ef7 [OpenMP] Fix norespect affinity bug for Windows
KMP_AFFINITY=norespect was triggering an error because the underlying
process affinity mask was not updated to include the entire machine.
The Windows documentation states that the thread affinities must be
subsets of the process affinity. This patch also moves the printing
(for KMP_AFFINITY=verbose) of whether the initial mask was respected
out of each topology detection function and to one location where the
initial affinity mask is read.

Differential Revision: https://reviews.llvm.org/D92587
2020-12-09 14:32:48 -06:00
Peyton, Jonathan L 9b7d6a6bff [OpenMP] Fix too long name for shm segment on macOS
Remove the user id component to the shm segment name and just use
the pid like before.

Differential Revision: https://reviews.llvm.org/D92660
2020-12-09 14:31:15 -06:00
AndreyChurbanov fff1abc406 [OpenMP] NFC: comment adjusted 2020-12-07 19:50:14 +03:00
AndreyChurbanov 22558c8501 [OpenMP] libomp: Fix possible NULL dereferences
Check pointer returned by strchr, as it can be NULL in case of broken
format of input string. Introduced new function __kmp_str_loc_numbers
for fast parsing of numbers only in the location string.
Also made some cleanup of __kmp_str_loc_init declaration and usage:
- changed type of init_fname parameter to bool;
- changed input from true to false in places where fname is not used.

Differential Revision: https://reviews.llvm.org/D90962
2020-12-07 19:09:07 +03:00
Joachim Protze a148216b31 [OpenMP][OMPT] Fix OMPT return address guard for gomp interface
D91692 missed various locations in kmp_gsupport, where the scope for
OMPT_STORE_RETURN_ADDRESS is too narrow, i.e. the scope ends before the OMPT
callback is called in some nested function.

This patch fixes the scoping issue, so that all OMPT tests pass, when the
tests are built with gcc.

Differential Revision: https://reviews.llvm.org/D92121
2020-12-05 19:06:28 +01:00
Hansang Bae c4a22224d9 [OpenMP] Add __kmpc_omp_target_task_alloc to dllexport
This patch enables use of the entry on Windows.

Differential Revision: https://reviews.llvm.org/D92618
2020-12-04 08:11:14 -06:00
Terry Wilmarth e0665a9050 [OpenMP] Add support for Intel's umonitor/umwait
These changes add support for Intel's umonitor/umwait usage in wait
code, for architectures that support those intrinsic functions. Usage of
umonitor/umwait is off by default, but can be turned on by setting the
KMP_USER_LEVEL_MWAIT environment variable.

Differential Revision: https://reviews.llvm.org/D91189
2020-12-01 14:07:46 -06:00
AndreyChurbanov 6bf84871e9 [OpenMP] libomp: add UNLIKELY hints to rarely executed branches
Added UNLIKELY hint to one-time or rarely executed branches.
This improves performance of the library on some tasking benchmarks.

Differential Revision: https://reviews.llvm.org/D92322
2020-12-01 16:53:21 +03:00
Todd Erdner 9615890db5 [OpenMP] libomp: change shm name to include UID, call unregister_lib on SIGTERM
With the change to using shared memory, there were a few problems that need to be fixed.
- The previous filename that was used for SHM only used process id. Given that process is
  usually based on 16bit number, this was causing some conflicts on machines. Thus we add
  UID to the name to prevent this.
- It appears under some conditions (SIGTERM, etc) the shared memory files were not getting
  cleaned up. Added a call to clean up the shm files under those conditions. For this user
  needs to set envirable KMP_HANDLE_SIGNALS to true.

Patch by Erdner, Todd <todd.erdner@intel.com>

Differential Revision: https://reviews.llvm.org/D91869
2020-12-01 00:40:47 +03:00
AndreyChurbanov f6f28b44ad [OpenMP] libomp: fix mutexinoutset dependence for proxy tasks
Once __kmp_task_finish is not executed for proxy tasks,
move mutexinoutset dependency code to __kmp_release_deps
which is executed for all task kinds.

Differential Revision: https://reviews.llvm.org/D92326
2020-12-01 00:13:31 +03:00
Martin Storsjö 6b429668de [OpenMP][OMPT] Fix building with OMPT disabled after 6d3b81664a 2020-11-26 10:09:32 +02:00
AndreyChurbanov 9e3e332d27 [OpenMP] libomp: fix non-X86, non-AARCH64 builds
Commit https://reviews.llvm.org/rG7b5254223acbf2ef9cd278070c5a84ab278d7e5f
broke the build for some architectures, because macro KMP_PREFIX_UNDERSCORE
was defined only for x86, x86_64 and aarch64. This patch defines it for other
architectures (as a no-op).

Differential Revision: https://reviews.llvm.org/D92027
2020-11-25 20:40:23 +03:00
Joachim Protze 6d3b81664a [OpenMP][OMPT] Introduce a guard to handle OMPT return address
This is an alternative approach to address inconsistencies pointed out in: D90078
This patch makes sure that the return address is reset, when leaving the scope.
In some cases, I had to move the macro out of an if-statement to have it in the
right scope, in some cases I added an additional block to restrict the scope.

This patch does not handle inconsistencies, which might occur if the return
address is still set when we call into the application.

Test case (repeated_calls.c) provided by @hbae

Differential Revision: https://reviews.llvm.org/D91692
2020-11-25 18:17:44 +01:00
Isabel Thärigen b281a05dac [OpenMP][OMPT] Implement verbose tool loading
OpenMP 5.1 introduces the new env variable
OMP_TOOL_VERBOSE_INIT=(disabled|stdout|stderr|<filename>) to enable verbose
loading and initialization of OMPT tools.
This env variable helps to understand the cause when loading of a tool fails
(e.g., undefined symbols or dependency not in LD_LIBRARY_PATH)
Output of OMP_TOOL_VERBOSE_INIT is added for OMP_DISPLAY_ENV

Tests for this patch are integrated into the different existing tool loading
tests, making these tests more verbose. An Archer specific verbose test is
integrated into an existing Archer test.

Patch prepared by: Isabel Thärigen

Differential Revision: https://reviews.llvm.org/D91464
2020-11-25 18:17:44 +01:00
AndreyChurbanov 7b5254223a [OpenMP] fix asm code for for arm64 (AARCH64) for Darwin/macOS
Adjusted external reference for Darwin/AARCH64 link compatibility.
Made size directive conditional only if __ELF__ defined.

Patch by Michael_Pique <mpique@icloud.com>

Differential Revision: https://reviews.llvm.org/D88252
2020-11-24 13:08:24 +03:00
AndreyChurbanov 5644f734d6 Revert "[OpenMP] Add support for Intel's umonitor/umwait"
This reverts commit 9cfad5f9c5.
2020-11-20 12:16:34 +03:00
AndreyChurbanov 9cfad5f9c5 [OpenMP] Add support for Intel's umonitor/umwait
Patch by tlwilmar (Terry Wilmarth)

Differential Revision: https://reviews.llvm.org/D91189
2020-11-19 22:04:21 +03:00
Hansang Bae 44a11c342c [OpenMP] Use explicit type casting in kmp_atomic.cpp
Differential Revision: https://reviews.llvm.org/D91105
2020-11-17 14:31:13 -06:00
Nawrin Sultana 5439db05e7 [OpenMP] Add omp_realloc implementation
This patch adds omp_realloc function implementation according to
OpenMP 5.1 specification.

Differential Revision: https://reviews.llvm.org/D90971
2020-11-17 13:43:00 -06:00
Peyton, Jonathan L 8647c669a4 [OpenMP] NFC: remove tabs in message catalog file 2020-11-17 10:15:04 -06:00
Peyton, Jonathan L 0454154efd [OpenMP][stats] reset serial state when re-entering serial region
Differential Revision: https://reviews.llvm.org/D90867
2020-11-17 10:09:56 -06:00
Nawrin Sultana 938f1b8581 [OpenMP] Add omp_calloc implementation
This patch adds omp_calloc implementation according to OpenMP 5.1
specification.

Differential Revision: https://reviews.llvm.org/D90967
2020-11-13 14:35:46 -06:00
Shilei Tian 24d0ef0f50 [OpenMP] Fixed a bug when displaying affinity
Currently the affinity format string has initial value. When users set
the format via OMP_AFFINITY_FORMAT, it will overwrite the format string. However,
when copying the format, the tailing null is missing. As a result, if the user
format string is shorter than default value, the remaining part in the default
value still makes effort. This bug is not exposed because the test case doesn't
check the end of a string. It only checks whether given output "contains" the
check string.

Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D91309
2020-11-12 22:27:32 -05:00
Peyton, Jonathan L dd8723d348 [OpenMP] Fix shutdown hang/race bug
The deadlock/race happens when primary thread gets initz lock and tries to join
the worker thread which waits for the same lock in TLS key destructor.
The patch removes the lock and the code of setting TLS value which needed
the lock. Also removed setting TLS from __kmp_unregister_root_current_thread.

Differential Revision: https://reviews.llvm.org/D90647
2020-11-11 13:47:23 -06:00
Joachim Protze 6213ed062b [OpenMP][OMPT] Update the omp-tools header file to reflect 5.1 changes
This doesn't add functionality, but just adds the new types and renames the
master callback to masked callback.

Differential Revision: https://reviews.llvm.org/D90752
2020-11-11 20:13:21 +01:00