This patch adds the `llvm_omp_target_dynamic_shared_alloc` function to
the `omp.h` header file so users can access it by default. Also changed
the name to keep it consistent with the other target allocators. Added
some documentation so users know how to use it. Didn't add the interface
for Fortran since there's no way to test it right now.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D123246
The target allocators have been supported for NVPTX offloading for
awhile. The tests should use the allocators instead of calling the
functions manually. Also the comments indicating these being a preview
should be removed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D123242
This change adds support for ompt_callback_dispatch with the new
dispatch chunk type introduced in 5.2. Definitions of the new
ompt_work_loop types were also added in the header file.
Differential Revision: https://reviews.llvm.org/D122107
Register constraint switched to "=q" which means very specifically (from
https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints)
> Any register accessible as rl. In 32-bit mode, a, b, c, and d; in 64-bit
mode, any integer register.
Older gcc versions (8.x and below) were trying to use esi or edi for the
8 bit flag variable, but it wound up displaying this error in the end:
kmp_lock.cpp: In function ‘void __kmp_spin_backoff(kmp_backoff_t*)’:
kmp_lock.cpp:2684:1: error: unsupported size for integer register
Hence the correct restriction is "=q" instead of "=r".
Fixes: https://github.com/llvm/llvm-project/issues/53309
Differential Revision: https://reviews.llvm.org/D120519
Before this patch task priorities were ignored, that was a valid implementation
as the task priority is a hint according to OpenMP specification.
Implemented shared list of sorted (high -> low) task deques one per task
priority value. Tasks execution changed to first check if priority tasks ready
for execution exist, and these tasks executed before others;
otherwise usual tasks execution mechanics work.
Differential Revision: https://reviews.llvm.org/D119676
Essentially removed the "use omp_lib_kinds" statement and replaced it
with import to maintain consistency (and avoid compilation error
in case the omp_lib_kinds.mod file is not accessible) in header file.
The import is required to access entities in host scoping unit.
Differential Revision: https://reviews.llvm.org/D120707
Introduce KMP_COMPILER_ICX macro to represent compilation with oneAPI
compiler.
Fixup flag detection and compiler ID detection in CMake. Older CMake's
detect IntelLLVM as Clang.
Fix compiler warnings.
Fixup many of the tests to have non-empty parallel regions as they are
elided by oneAPI compiler.
The __kmp_hidden_helper_threads_num set to N+1 if user requested N threads.
Thus number of worker hidden helper threads corresponds to user request,
main thread of helper team excluded as it does not participate in actual work.
This also fixes divide-by-0 issue in the code.
Fixes#48656
Differential Revision: https://reviews.llvm.org/D119586
As @mstorsjo wrote in https://reviews.llvm.org/D117945#inline-1132920 :
> This change seems to have broken one aspect: When doing `ninja
install` I now get a warning saying `Error copying file "libomp.dll" to
"libiomp5md.dll".`, and `libiomp5md.dll` isn't installed.
>
> I believe the reason is that the inline cmake snippet is written to
`runtime/src/cmake_install.cmake` and then executed on install, but on
install, `${CMAKE_INSTALL_BINDIR}` isn't set (as `GNUInstallDirs` isn't
included there). Should this maybe expand `${CMAKE_INSTALL_BINDIR}`
right here instead of deferring it to the install cmake, or what's the
right course of action?
I agree that is the right course of action. We also agreed to restore the `CMAKE_INSTALL_PREFIX` that was there before, too.
Reviewed By: mstorsjo
Differential Revision: https://reviews.llvm.org/D118528
This patch fixes the link error on Windows caused by `interop`
functions.
Reviewed By: mstorsjo
Differential Revision: https://reviews.llvm.org/D118524
This patch fixes the issue that numbers assigned to `interop` functions were already taken in `openmp/runtime/src/dllexports`.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D118523
This implements the runtime portion of the interop directive.
It expects the frontend and IRBuilder portions to be in place
for proper execution. It currently works only for GPUs
and has several TODOs that should be addressed going forward.
Reviewed By: RaviNarayanaswamy
Differential Revision: https://reviews.llvm.org/D106674
This patch allows Openmp runtime atomic functions operating on x87 high-precision
to be present only in Openmp runtime for x86 architectures
The functions affected are:
__kmpc_atomic_10
__kmpc_atomic_20
__kmpc_atomic_cmplx10_add
__kmpc_atomic_cmplx10_div
__kmpc_atomic_cmplx10_mul
__kmpc_atomic_cmplx10_sub
__kmpc_atomic_float10_add
__kmpc_atomic_float10_div
__kmpc_atomic_float10_mul
__kmpc_atomic_float10_sub
__kmpc_atomic_float10_add_fp
__kmpc_atomic_float10_div_fp
__kmpc_atomic_float10_mul_fp
__kmpc_atomic_float10_sub_fp
__kmpc_atomic_float10_max
__kmpc_atomic_float10_min
Differential Revision: https://reviews.llvm.org/D117473
Add use of TPAUSE (from WAITPKG) to the runtime for Intel hardware,
with an envirable to turn it on in a particular C-state. Always uses
TPAUSE if it is selected and enabled by Intel hardware and presence of
WAITPKG, and if not, falls back to old way of checking
__kmp_use_yield, etc.
Differential Revision: https://reviews.llvm.org/D115758
This is the original patch in my GNUInstallDirs series, now last to merge as the final piece!
It arose as a new draft of D28234. I initially did the unorthodox thing of pushing to that when I wasn't the original author, but since I ended up
- Using `GNUInstallDirs`, rather than mimicking it, as the original author was hesitant to do but others requested.
- Converting all the packages, not just LLVM, effecting many more projects than LLVM itself.
I figured it was time to make a new revision.
I have used this patch series (and many back-ports) as the basis of https://github.com/NixOS/nixpkgs/pull/111487 for my distro (NixOS), which was merged last spring (2021). It looked like people were generally on board in D28234, but I make note of this here in case extra motivation is useful.
---
As pointed out in the original issue, a central tension is that LLVM already has some partial support for these sorts of things. Variables like `COMPILER_RT_INSTALL_PATH` have already been dealt with. Variables like `LLVM_LIBDIR_SUFFIX` however, will require further work, so that we may use `CMAKE_INSTALL_LIBDIR`.
These remaining items will be addressed in further patches. What is here is now rote and so we should get it out of the way before dealing more intricately with the remainder.
Reviewed By: #libunwind, #libc, #libc_abi, compnerd
Differential Revision: https://reviews.llvm.org/D99484
This is the original patch in my GNUInstallDirs series, now last to merge as the final piece!
It arose as a new draft of D28234. I initially did the unorthodox thing of pushing to that when I wasn't the original author, but since I ended up
- Using `GNUInstallDirs`, rather than mimicking it, as the original author was hesitant to do but others requested.
- Converting all the packages, not just LLVM, effecting many more projects than LLVM itself.
I figured it was time to make a new revision.
I have used this patch series (and many back-ports) as the basis of https://github.com/NixOS/nixpkgs/pull/111487 for my distro (NixOS), which was merged last spring (2021). It looked like people were generally on board in D28234, but I make note of this here in case extra motivation is useful.
---
As pointed out in the original issue, a central tension is that LLVM already has some partial support for these sorts of things. Variables like `COMPILER_RT_INSTALL_PATH` have already been dealt with. Variables like `LLVM_LIBDIR_SUFFIX` however, will require further work, so that we may use `CMAKE_INSTALL_LIBDIR`.
These remaining items will be addressed in further patches. What is here is now rote and so we should get it out of the way before dealing more intricately with the remainder.
Reviewed By: #libunwind, #libc, #libc_abi, compnerd
Differential Revision: https://reviews.llvm.org/D99484
In most cases, hidden helper task behave similar as detached tasks. That means,
for example, if we have to wait for detached tasks, we have to do the same thing
for hidden helper tasks as well. This patch adds the missing condition for hidden
helper task accordingly along with detached task.
Reviewed By: AndreyChurbanov
Differential Revision: https://reviews.llvm.org/D107316
This patch allows the user to request all resources of a particular
layer (or core-attribute). The syntax of KMP_HW_SUBSET is modified
so the number of units requested is optional or can be replaced with an
'*' character.
e.g., KMP_HW_SUBSET=c:intel_atom@3 will use all the cores after offset 3
e.g., KMP_HW_SUBSET=*c:intel_core will use all the big cores
e.g., KMP_HW_SUBSET=*s,*c,1t will use all the sockets, all cores per
each socket and 1 thread per core.
Differential Revision: https://reviews.llvm.org/D115826
Just defensive CMake-ing. I pulled this from D115544 and D99484 which
are blocked on some lldb CI failures I don't yet understand. Hoping to land
something smaller in the meantime.
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D115566
This reverts commit 492de35df4.
I tried to apply John's changes in 8d897ec915 that were expected to
fix his patch but that didn't work unfortunately.
Reverting this again to fix the macOS bots and leave him more time to
investigate the issue.
This reverts commit 797b50d4be.
See the original D99484. @mib who noticed the original problem could not longer
reproduce it, after I tried and also failed. We are threfore hoping it went
away on its own!
Reviewed By: mib
Differential Revision: https://reviews.llvm.org/D115544
Allow filtering of resources based on core attributes. There are two new
attributes added:
1) Core Type (intel_atom, intel_core)
2) Core Efficiency (integer) where the higher the efficiency, the more
performant the core
On hybrid architectures , e.g., Alder Lake, users can specify
KMP_HW_SUBSET=4c:intel_atom,4c:intel_core to select the first four Atom
and first four Big cores. The can also use the efficiency syntax. e.g.,
KMP_HW_SUBSET=2c:eff0,2c:eff1
Differential Revision: https://reviews.llvm.org/D114901
Regardless that specification requires thread_limit to be positive,
it is better to warn user instead of crash in case the value is negative.
Differential Revision: https://reviews.llvm.org/D115340
Teach the HWLOC topology method how to detect Atom and Core
types so hybrid CPUs are properly detected and represented when using
the HWLOC topology method.
Differential Revision: https://reviews.llvm.org/D112270
The current implementation of Windows Processor Groups has
a separate topology method to handle them. This patch deprecates
that specific method and uses the regular CPUID topology
method by default and inserts the Windows Processor Group objects
in the topology manually.
Notes:
* The preference for processor groups is lowered to a value less than
socket so that the user will see sockets in the KMP_AFFINITY=verbose
output instead of processor groups when sockets=processor groups.
* The topology's capacity is modified to handle additional topology layers
without the need for reallocation.
* If a user asks for a granularity setting that is "above" the processor
group layer, then the granularity is adjusted "down" to the processor
group since this is the coarsest layer available for threads.
Differential Revision: https://reviews.llvm.org/D112273
If some CPUs are offline, then make sure they are not included in the
fullMask even if norespect is given to KMP_AFFINITY.
Differential Revision: https://reviews.llvm.org/D112274
Remove restriction forcing users to specify the KMP_HW_SUBSET value in
topology order. This patch sorts the user KMP_HW_SUBSET value before
trying to apply it. For example: 1s,4c,2t is equivalent to 2t,1s,4c
Differential Revision: https://reviews.llvm.org/D112027
It is better to set all barrier patterns to use "dist" when at least
one environment variable specifies "dist". Otherwise if only one
environment is set to "dist" and others left blank inadvertently,
it would result in mixing dist barrier with default hyper barrier
pattern.
Differential Revision: https://reviews.llvm.org/D112597
This is a new draft of D28234. I previously did the unorthodox thing of
pushing to it when I wasn't the original author, but since this version
- Uses `GNUInstallDirs`, rather than mimics it, as the original author
was hesitant to do but others requested.
- Is much broader, effecting many more projects than LLVM itself.
I figured it was time to make a new revision.
I am using this patch (and many back-ports) as the basis of
https://github.com/NixOS/nixpkgs/pull/111487 for my distro (NixOS). It
looked like people were generally on board in D28234, but I make note of
this here in case extra motivation is useful.
---
As pointed out in the original issue, a central tension is that LLVM
already has some partial support for these sorts of things. For example
`LLVM_LIBDIR_SUFFIX`, or `COMPILER_RT_INSTALL_PATH`. Because it's not
quite clear yet what to do about those, we are holding off on changing
libdirs and `compiler-rt`. for this initial PR.
---
On the advice of @lebedev.ri, I am splitting this up a bit per
subproject, starting with LLVM. To allow it to be more easily reviewed. This and the subsequent patch must be landed together, as this will not build alone. But the rest can be landed on their own.
Reviewed By: compnerd
Differential Revision: https://reviews.llvm.org/D100810
According to dlsym description, the value of symbol could be NULL,
and there is no error in this case. Thus dlerror will also return NULL in
this case. We need to check the value returned by dlerror before printing it.
Differential Revision: https://reviews.llvm.org/D112174
Declarations of 5.1 atomic entries were added under
"#if KMP_ARCH_X86 || KMP_ARCH_X86_64" in kmp_atomic.h,
but definitions of the functions missed architecture guard in kmp_atomic.cpp.
As a result mangled symbols were available on non-x86 architecture.
The patch eliminates these unexpected symbols from the library.
Differential Revision: https://reviews.llvm.org/D112261
__ompt_get_task_info_internal function is adapted to support thread_num
determination during the execution of multiple nested serialized
parallel regions enclosed by a regular parallel region.
Consider the following program that contains parallel region R1 executed
by two threads. Let the worker thread T of region R1 executes serialized
parallel regions R2 that encloses another serialized parallel region R3.
Note that the thread T is the master thread of both R2 and R3 regions.
Assume that __ompt_get_task_info_internal function is called with the
argument "ancestor_level == 1" during the execution of region R3.
The function should determine the "thread_num" of the thread T inside
the team of region R2, whose implicit task is at level 1 inside the
hierarchy of active tasks. Since the thread T is the master thread of
region R2, one should expected that "thread_num" takes a value 0.
After the while loop finishes, the following stands: "lwt != NULL",
"prev_lwt == NULL", "prev_team" represents the team information about
the innermost serialized parallel region R3. This results in executing
the assignment "thread_num = prev_team->t.t_master_tid". Note that
"prev_team->t.t_master_tid" was initialized at the moment of
R2’s creation and represents the "thread_num" of the thread T inside
the region R1 which encloses R2. Since the thread T is the worker thread
of the region R1, "the thread_num" takes value 1, which is a contradiction.
This patch proposes to use "lwt" instead of "prev_lwt" when determining
the "thread_num". If "lwt" exists, the task at the requested level belongs
to the serialized parallel region. Since the serialized parallel region
is executed by one thread only, the "thread_num" takes value 0.
Similarly, assume that __ompt_get_task_info_internal function is called
with the argument "ancestor_level == 2" during the execution of region R3.
The function should determine the "thread_num" of the thread T inside the
team of region R1. Since the thread is the worker inside the region R1,
one should expected that "thread_num" takes value 1. After the loop finishes,
the following stands: "lwt == NULL", "prev_lwt != NULL", "prev_team" represents
the team information about the innermost serialized parallel region R3.
This leads to execution of the assignment "thread_num = 0", which causes
a contradiction.
Ignoring the "prev_lwt" leads to executing the assignment
"thread_num = prev_team->t.t_master_tid" instead. From the previous explanation,
it is obvious that "thread_num" takes value 1.
Note that the "prev_lwt" variable is marked as unnecessary and thus removed.
This patch introduces the test case which represents the OpenMP program
described earlier in the summary.
Differential Revision: https://reviews.llvm.org/D110699
__kmp_fork_call sets the enter_frame of the active task (th_curren_task)
before new parallel region begins. After the region is finished, the
enter_frame is cleared.
The old implementation of __kmpc_fork_call didn’t clear the enter_frame of
active task.
Also, the way of initializing the enter_frame of the active task was wrong.
Consider the following two OpenMP programs.
The first program: Let R1 be the serialized parallel region that encloses
another serialized parallel region R2. Assume that thread that executes R2 is
going to create a new serialized parallel region R3 by executing
__kmpc_fork_call. This thread is responsible to set enter_frame of R2's
implicit task. Note that the information about R2's implicit task is present
inside master_th->th.th_current_task at this moment, while lwt represents the
information about R1's implicit task. The old implementation uses lwt and
resets enter_frame of R1's implicit task instead of R2's implicit task. The
new implementation uses master_th->th.th_current_task instead.
The second program: Consider the OpenMP program that contains parallel region
R1 which encloses an explicit task T. Assume that thread should create another
parallel region R2 during the execution of the T. The __kmpc_fork_call is
responsible to create R2 and set enter frame of T whose information is present
inside the master_th->th.th_current_task.
Old implementation tries to set the frame of
parent_team->t.t_implicit_task_taskdata[tid] which corresponds to the implicit
task of the R1, instead of T.
Differential Revision: https://reviews.llvm.org/D112419
KMP_API_NAME_GOMP_PARALLEL_SECTIONS function was missing the task frame support.
This patch introduced a fix responsible to set properly the exit_frame of
the innermost implicit task that corresponds to the parallel section construct,
as well as the enter_frame of the task that encloses the mentioned implicit task.
This patch also introduced a simple test case sections_serialized.c that contains
serialized parallel section construct and validates whether the mentioned
task frames are set correctly.
Differential Revision: https://reviews.llvm.org/D112205
This patch allows to simplify compiler implementation on "taskwait nowait"
construct. The "taskwait nowait" is semantically equivalent to the empty task.
Instead of creating an empty routine as a task entry, compiler can just send
NULL pointer to the runtime. Then the runtime will make all the work with
dependences and return because of the absent task routine.
Differential Revision: https://reviews.llvm.org/D112015
__ompt_get_task_info_internal is now able to determine the right value of the
“thread_num” argument during the execution of an explicit task.
During the execution of a while loop that iterates over the ancestor tasks
hierarchy, the “prev_team” variable was always set to “team” variable at the
beginning of each loop iteration.
Assume that the program contains a parallel region which encloses an explicit
task executed by the worker thread of the region. Also assume that the tool
inquires the “thread_num” of a worker thread for the implicit task that
corresponds to the region (task at “ancestor_level == 1”) and expects to
receive the value of “thread_num > 0”.
After the loop finishes, both “team” and “prev_team” variables are equal and
point to the team information of the parallel region.
The “thread_num” is set to “prev_team->t.t_master_tid”, that is equal to
“team->t.t_master_tid”. In this case, “team->t.t_master_tid” is 0, since
the master thread of the region is the initial master thread of the program.
This leads to a contradiction.
To prevent this, “prev_team” variable is set to “team” variable only at the
time when the loop that has already encountered the implicit task (“taskdata”
variable contains the information about an implicit task) continues iterating
over the implicit task’s ancestors, if any.
After the mentioned loop finishes, the “prev_team” variable might be equal to
NULL. This means that the task at requested “ancestor_level” belongs to the
innermost parallel region, so the “thread_num” will be determined by calling
the “__kmp_get_tid”.
To prove that this patch works, the test case “explicit_task_thread_num.c” is
provided.
It contains the example of the program explained earlier in the summary.
Differential Revision: https://reviews.llvm.org/D110473
Detect, through CPUID.1A, and show user different core types through
KMP_AFFINITY=verbose mechanism. Offer future runtime optimizations
__kmp_is_hybrid_cpu() to know whether running on a hybrid system or not.
Differential Revision: https://reviews.llvm.org/D110435
This patch implements teams affinity on the host.
The default is spread. A user can specify either spread, close, or
primary using KMP_TEAMS_PROC_BIND environment variable. Unlike
OMP_PROC_BIND, KMP_TEAMS_PROC_BIND is only a single value and is not a
list of values. The values follow the same semantics under the OpenMP
specification for parallel regions except T is the number of teams in
a league instead of the number of threads in a parallel region.
Differential Revision: https://reviews.llvm.org/D109921
Added functions those implement "atomic compare".
Though clang does not use library interfaces to implement OpenMP atomics,
the functions added for consistency.
Also added missed functions for 80-bit floating min/max atomics.
Differential Revision: https://reviews.llvm.org/D110109
Replaced storing of ittnotify domain array index into
location info structure (which is now read-only) with storing of
(location info address + ittnotify domain + team size) into hash map.
Replaced __kmp_itt_barrier_domains and __kmp_itt_imbalance_domains arrays with
__kmp_itt_barrier_domains hash map; __kmp_itt_region_domains and
__kmp_itt_region_team_size arrays with __kmp_itt_region_domains hash map.
Basic functionality did not change (at least tried to not change).
The patch fixes https://bugs.llvm.org/show_bug.cgi?id=48644.
Differential Revision: https://reviews.llvm.org/D111580
The minor code refactorization introduces the TASK_TIED constant inside
kmp_gsupprot.cpp as a replacement for the literal value 1.
The mentioned constant is now used in both kmp_tasking.cpp and
kmp_gsupport.cpp files.
Differential Revision: https://reviews.llvm.org/D110441
The indirect lock table can exhibit a race condition during initializing
and setting/unsetting locks. This occurs if the lock table is
resized by one thread (during an omp_init_lock) and accessed (during an
omp_set|unset_lock) by another thread.
The test runtime/test/lock/omp_init_lock.c test exposed this issue and
will fail if run enough times.
This patch restructures the lock table so pointer/iterator validity is
always kept. Instead of reallocating a single table to a larger size, the
lock table begins preallocated to accommodate 8K locks. Each row of the
table is allocated as needed with each row allowing 1K locks. If the 8K
limit is reached for the initial table, then another table, capable of
holding double the number of locks, is allocated and linked
as the next table. The indices stored in the user's locks take this
linked structure into account when finding the lock within the table.
Differential Revision: https://reviews.llvm.org/D109725
The third-party ittnotify sources updated from https://github.com/intel/ittapi.
Changes applied:
- llvm license aded to all files; initial BSD license saved in LICENSE.txt;
- clang-formatted;
- renamed *.c to *.cpp, similar to what we did with all our sources;
- added #include "kmp_config.h" with definition of INTEL_ITTNOTIFY_PREFIX macro
into ittnotify_static.cpp.
Differential Revision: https://reviews.llvm.org/D109333
GOMP depobjs are represented as a two intptr_t array. The first
element is the base address of the dependency and the second element
is the flag indicating the type the depobj represents.
Differential Revision: https://reviews.llvm.org/D108790
New omp_all_memory task dependence type is implemented.
Library recognizes the new type via either
(dependence_address == NULL && dependence_flag == 0x80)
or
(dependence_address == SIZE_MAX).
A task with new dependence type depends on each preceding task
with any dependence type (kind of a dependence barrier).
Differential Revision: https://reviews.llvm.org/D108574
The new interface only marks begin/end of a scope construct for
corresponding OMPT events, and we can use existing interfaces for
reduction operations.
Differential Revision: https://reviews.llvm.org/D108062
This patch changes the default monotonicity of dynamic schedule from
monotonic to non-monotonic when no modifier is specified.
Differential Revision: https://reviews.llvm.org/D109026
As discussed in D107121, task wait doesn't work when a regular task T depends on
a detached task or a hidden helper task T' in a serialized team. The root cause is,
since the team is serialized, the last task will not be tracked by
`td_incomplete_child_tasks`. When T' is finished, it first releases its
dependences, and then decrements its parent counter. So far so good. For the thread
that is running task wait, if at the moment it is still spinning and trying to
execute tasks, it is fine because it can detect the new task and execute it.
However, if it happends to finish the function `flag.execute_tasks(...)`, it will
be broken because `td_incomplete_child_tasks` is 0 now.
In this patch, we update the rule to track children tasks a little bit. If the
task team encounters a proxy task or a hidden helper task, all following tasks
will be tracked.
Reviewed By: AndreyChurbanov
Differential Revision: https://reviews.llvm.org/D107496
These changes don't come under OMPD guard as it is a movement of existing code to capture parallel behavior correctly.
"Runtime Entry Points for OMPD" like "ompd_bp_parallel_begin" and "ompd_bp_parallel_begin" should be placed at the correct execution point for the debugging tool to access proper handles/data.
Without the below changes, in certain cases, debugging tool will pick the wrong parallel and task handle.
Reviewed By: @hbae
Differential Revision: https://reviews.llvm.org/D100366
This patch replaces the current implementation, overwrites `gtid` and `thread`,
with `__kmpc_give_task`.
Reviewed By: AndreyChurbanov
Differential Revision: https://reviews.llvm.org/D106977
KMP_SSCANF only evaluates to sscanf_s within
#if KMP_OS_WINDOWS && KMP_MSVC_COMPAT
so we need to pass the sscanf_s specific parameters within a similar
condition.
Differential Revision: https://reviews.llvm.org/D108196
* Add comment to help ensure new construct data are added in two places
* Check for division by zero in the loop worksharing code
* Check for syntax errors in parrange parsing
Differential Revision: https://reviews.llvm.org/D105929
On Windows, the documentation states that when using sscanf_s,
each %c and %s specifier must also have additional size parameter.
This patch adds the size parameter in the one place where %c is
used.
Differential Revision: https://reviews.llvm.org/D105931
Android provides ashmem/ASharedMemory support on newer releases, which
we can use if requested by openmp users on Android.
Also refactor the preprocessor check for using shared memory to
kmp_config.h.cmake.
Differential Revision: https://reviews.llvm.org/D107181
This patch fixes the "performance regression" reported in https://bugs.llvm.org/show_bug.cgi?id=51235. In fact it has nothing to do with performance. The root cause is, the stolen task is not allowed to execute by another thread because by default it is tied task. Since hidden helper task will always be executed by hidden helper threads, it should be untied.
Reviewed By: protze.joachim
Differential Revision: https://reviews.llvm.org/D107121
When loading libomptarget, the init function in libomptarget/src/rtl.cpp
will search for the libomptarget_start_tool function using libdl.
libomptarget_start_tool will pass those OMPT callbacks related to target
constructs to libomptarget
Differential Revision: https://reviews.llvm.org/D99803
Fix for https://bugs.llvm.org/show_bug.cgi?id=49723.
Eliminated references from task dependency hash to node allocated on stack,
thus eliminated accesses to stale memory. So the node now never freed.
Uncommented assertion which triggered when stale memory accessed.
Removed unneeded ref count increment for stack allocated node.
Differential Revision: https://reviews.llvm.org/D106705
Put declarations/definitions of unused variables under corresponding macros
to silence clang build warnings.
Differential Revision: https://reviews.llvm.org/D106608
Two-level distributed barrier is a new experimental barrier designed
for Intel hardware that has better performance in some cases than the
default hyper barrier.
This barrier is designed to handle fine granularity parallelism where
barriers are used frequently with little compute and memory access
between barriers. There is no need to use it for codes with few
barriers and large granularity compute, or memory intensive
applications, as little difference will be seen between this barrier
and the default hyper barrier. This barrier is designed to work
optimally with a fixed number of threads, and has a significant setup
time, so should NOT be used in situations where the number of threads
in a team is varied frequently.
The two-level distributed barrier is off by default -- hyper barrier
is used by default. To use this barrier, you must set all barrier
patterns to use this type, because it will not work with other barrier
patterns. Thus, to turn it on, the following settings are required:
KMP_FORKJOIN_BARRIER_PATTERN=dist,dist
KMP_PLAIN_BARRIER_PATTERN=dist,dist
KMP_REDUCTION_BARRIER_PATTERN=dist,dist
Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER,
and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed
barrier.
Patch fixed for ITTNotify disabled builds and non-x86 builds
Co-authored-by: Jonathan Peyton <jonathan.l.peyton@intel.com>
Co-authored-by: Vladislav Vinogradov <vlad.vinogradov@intel.com>
Differential Revision: https://reviews.llvm.org/D103121
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=49066.
For detachable tasks, the assumption breaks that the proxy task cannot have
remaining child tasks when the proxy completes.
In stead of increment/decrement the incomplete task count, a high-order bit
is flipped to mark and wait for the incomplete proxy task.
Differential Revision: https://reviews.llvm.org/D101082
Bug 50022 [0] reports target nowait fails in certain case, which is added in this
patch. The root cause of the failure is, when the second task is created, its
parent's `td_incomplete_child_tasks` will not be incremented because there is no
parallel region here thus its team is serialized. Therefore, when the initial
thread is waiting for its unfinished children tasks, it thought there is only
one, the first task, because it is hidden helper task, so it is tracked. The
second task will only be pushed to the queue when the first task is finished.
However, when the first task finishes, it first decrements the counter of its
parent, and then release dependences. Once the counter is decremented, the thread
will move on because its counter is reset, but actually, the second task has not
been executed at all. As a result, since in this case, the main function finishes,
then `libomp` starts to destroy. When the second task is pushed somewhere, all
some of the structures might already have already been destroyed, then anything
could happen.
This patch simply moves `__kmp_release_deps` ahead of decrement of the counter.
In this way, we can make sure that the initial thread is aware of the existence
of another task(s) so it will not move on. In addition, in order to tackle
dependence chain starting with hidden helper thread, when hidden helper task is
encountered, we force the task to release dependences.
Reference:
[0] https://bugs.llvm.org/show_bug.cgi?id=50022
Reviewed By: AndreyChurbanov
Differential Revision: https://reviews.llvm.org/D106519
In current implementation, if a regular task depends on a hidden helper task,
and when the hidden helper task is releasing its dependences, it directly calls
`__kmp_omp_task`. This could cause a problem that if `__kmp_push_task` returns
`TASK_NOT_PUSHED`, the task will be executed immediately. However, the hidden
helper threads are assumed to only execute hidden helper tasks. This could cause
problems because when calling `__kmp_omp_task`, the encountering gtid, which is
not the real one of the thread, is passed.
This patch uses `__kmp_give_task`, but because it is a static function, a new
wrapper `__kmpc_give_task` is added.
Reviewed By: AndreyChurbanov
Differential Revision: https://reviews.llvm.org/D106572
Standalone build for OpenMP runtime using GCC is giving -Wcomment
warnings where a backslash newline is encountered in the // style
comment. This switches the // style for /* style to silence the
warnings.
This patch includes a few changes to improve task allocation
performance slightly. These changes are enough to restore performance
drop observed after introducing hidden helper.
Differential Revision: https://reviews.llvm.org/D105715
There is no guarantee that the space allocated in `libname`
is enough to accomodate the whole `dl_info.dli_fname`,
because it could e.g. have an suffix - `.5`,
and that highlights another problem - what it should do about suffxies,
and should it do anything to resolve the symlinks before changing the filename?
```
$ LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib" ./src/utilities/rstest/rstest -c /tmp/f49137920.NEF
dl_info.dli_fname "/usr/local/lib/libomp.so.5"
strlen(dl_info.dli_fname) 26
lib_path_length 14
lib_path_length + 12 26
=================================================================
==30949==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60300000002a at pc 0x000000548648 bp 0x7ffdfa0aa780 sp 0x7ffdfa0a9f40
WRITE of size 27 at 0x60300000002a thread T0
#0 0x548647 in strcpy (/home/lebedevri/rawspeed/build-Clang-SANITIZE/src/utilities/rstest/rstest+0x548647)
#1 0x7fb9e3e3d234 in ompd_init() /repositories/llvm-project/openmp/runtime/src/ompd-specific.cpp:102:5
#2 0x7fb9e3dcb446 in __kmp_do_serial_initialize() /repositories/llvm-project/openmp/runtime/src/kmp_runtime.cpp:6742:3
#3 0x7fb9e3dcb40b in __kmp_get_global_thread_id_reg /repositories/llvm-project/openmp/runtime/src/kmp_runtime.cpp:251:7
#4 0x59e035 in main /home/lebedevri/rawspeed/build-Clang-SANITIZE/../src/utilities/rstest/rstest.cpp:491
#5 0x7fb9e3762d09 in __libc_start_main csu/../csu/libc-start.c:308:16
#6 0x4df449 in _start (/home/lebedevri/rawspeed/build-Clang-SANITIZE/src/utilities/rstest/rstest+0x4df449)
0x60300000002a is located 0 bytes to the right of 26-byte region [0x603000000010,0x60300000002a)
allocated by thread T0 here:
#0 0x55cc5d in malloc (/home/lebedevri/rawspeed/build-Clang-SANITIZE/src/utilities/rstest/rstest+0x55cc5d)
#1 0x7fb9e3e3d224 in ompd_init() /repositories/llvm-project/openmp/runtime/src/ompd-specific.cpp:101:17
#2 0x7fb9e3762d09 in __libc_start_main csu/../csu/libc-start.c:308:16
SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/lebedevri/rawspeed/build-Clang-SANITIZE/src/utilities/rstest/rstest+0x548647) in strcpy
Shadow bytes around the buggy address:
0x0c067fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c067fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c067fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c067fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c067fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c067fff8000: fa fa 00 00 00[02]fa fa fa fa fa fa fa fa fa fa
0x0c067fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c067fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==30949==ABORTING
Aborted
```
The annotations in libomp were never built by default. The annotations are
also superseded by the annotations which the OMPT tool libarcher.so provides.
With respect to libarcher, libomp behaves as if libarcher would be the last
element of OMP_TOOL_LIBARARIES. I.e., if no other OMPT tool gets active,
libarcher will check if an OpenMP application is built with TSan.
Since libarcher gets loaded by default, enabling LIBOMP_TSAN_SUPPORT would
result in redundant annotations for TSan, which slightly differ in details
and coverage (e.g. task dependencies are not handled well by the annotations
in libomp).
This patch removes all TSan annotations from the OpenMP runtime code.
Differential Revision: https://reviews.llvm.org/D103767
This patch includes the following changes to address a few issues when
using hidden helper task.
- Assertion is triggered when there are inadvertent calls to hidden
helper functions on non-Linux OS
- Added deinit code in __kmp_internal_end_library function to fix random
shutdown crashes
- Moved task data access into the lock-guarded region in __kmp_push_task
Differential Revision: https://reviews.llvm.org/D105308
This reverts commit eab1fd389b.
This commit fixed a problem with 25073a4ecf (D103121) which is the one
we actually need to revert to unblock non-X86 builds of OpenMP. Can be
reapplied, or merged into, D103121 as it goes in again.
Normalized bounds of chunk of iterations to steal from are inclusive,
so upper bound should not be decremented in expression to check.
Problem was in attempt to steal iterations 0:0, that caused assertion after
wrong decrement. Reported in comment to https://reviews.llvm.org/D103648.
Differential Revision: https://reviews.llvm.org/D104880